Class: Queries::Otu::Autocomplete

Inherits:
Query::Autocomplete
Defined in:
lib/queries/otu/autocomplete.rb

Overview

See Query::Autocomplete for the optimization strategy applied to each name. There are four classes of name, each handled with the same strategy: OTU name, original TaxonName, TaxonName, and CommonName. A global priority is then applied, pulling the best names from each sub-strategy to the top.

Constant Summary

QUERIES =

Keys are method names. The existence of each method is checked before its query is requested.

{
  # OTU
  otu_name_exact: {priority: 1},
  autocomplete_exact_id: {priority: 1},
  autocomplete_identifier_cached_exact: {priority: 1},
  otu_name_start_match: {priority: 200},
  otu_name_similarity: {priority: 220},

  # TaxonName
  autocomplete_taxon_name: {priority: nil}, # Priority is slotted from 10 .. 20
  # These are all approximately covered in the blanket taxon_name autocomplete
  # taxon_name_name_exact: {priority: 10},
  # taxon_name_identifier_exact: {priority: 10},
  # taxon_name_name_start_match: {priority: 100},
  # taxon_name_name_high_cuttoff: {priority: 200},

  # CommonName
  # These should all be covered/moved to common_name_autocomplete,
  autocomplete_common_name_exact: {priority: 100},
  autocomplete_common_name_like: {priority: 1000}
  # common_name_identifier_exact: {priority: 10},
  # common_name_name_start_match: {priority: 100},
  # common_name_name_similarity: {priority: 200},
}.freeze

Instance Attribute Summary

Attributes inherited from Query::Autocomplete

#dynamic_limit, #project_id, #query_string

Attributes inherited from Query

#query_string, #terms

Instance Method Summary

Methods inherited from Query::Autocomplete

#autocomplete_cached, #autocomplete_cached_wildcard_anywhere, #autocomplete_common_name_exact, #autocomplete_common_name_like, #autocomplete_exact_id, #autocomplete_exactly_named, #autocomplete_named, #autocomplete_ordered_wildcard_pieces_in_cached, #cached_facet, #combine_or_clauses, #common_name_name, #common_name_table, #common_name_wild_pieces, #exactly_named, #fragments, #integers, #least_levenshtein, #match_wildcard_end_in_cached, #match_wildcard_in_cached, #named, #only_ids, #only_integers?, #parent, #parent_child_join, #parent_child_where, #pieces, #scope, #string_fragments, #wildcard_wrapped_integers, #wildcard_wrapped_years, #with_cached, #with_cached_like, #with_id, #with_project_id, #year_letter, #years

Methods inherited from Query

#alphabetic_strings, #alphanumeric_strings, base_name, #base_name, #build_terms, #cached_facet, #end_wildcard, #levenshtein_distance, #match_ordered_wildcard_pieces_in_cached, #no_terms?, referenced_klass, #referenced_klass, #referenced_klass_except, #referenced_klass_intersection, #referenced_klass_union, #start_and_end_wildcard, #start_wildcard, #table, #wildcard_pieces

Constructor Details

#initialize(string, project_id: nil, having_taxon_name_only: false, with_taxon_name: nil, exact: 'false') ⇒ Autocomplete

Returns a new instance of Autocomplete.



# File 'lib/queries/otu/autocomplete.rb', line 56

def initialize(string, project_id: nil, having_taxon_name_only: false, with_taxon_name: nil, exact: 'false')
  super(string, project_id:)
  @having_taxon_name_only = boolean_param({having_taxon_name_only:}, :having_taxon_name_only)
  @with_taxon_name = boolean_param({with_taxon_name:}, :with_taxon_name)

  # TODO: move to mode
  @exact = boolean_param({exact:}, :exact)
end

Instance Attribute Details

#exact ⇒ Boolean

Corresponds to the &exact=<"true"|"false"> request parameter. If true, only results where #name = query_string are returned (no fuzzy matching).

Returns:

  • (Boolean)

    set from &exact=<"true"|"false">; if true, only results where #name = query_string are returned (no fuzzy matching)



# File 'lib/queries/otu/autocomplete.rb', line 27

def exact
  @exact
end

#having_taxon_name_only ⇒ Object

Returns Boolean or nil. true: only return Otus with `name` = nil; false or nil: no effect.

Returns:

  • Boolean, nil
    true: only return Otus with `name` = nil
    false, nil: no effect



# File 'lib/queries/otu/autocomplete.rb', line 16

def having_taxon_name_only
  @having_taxon_name_only
end

#with_taxon_name ⇒ Object

Returns Boolean or nil. true: the OTU must have a taxon name; false: the OTU must not have a taxon name; nil: ignored.

Returns:

  • Boolean, nil
    true: OTU must have a taxon name
    false: OTU must not have a taxon name
    nil: ignored



# File 'lib/queries/otu/autocomplete.rb', line 22

def with_taxon_name
  @with_taxon_name
end

Instance Method Details

#api_autocomplete ⇒ Object

Maintains the valid_taxon_name_id needed for the API.

Considerations:

The chain otus -> taxon names -> valid taxon name_id <- otu can return more OTUs than the original query,
   because there can be multiple OTUs for the valid name of an invalid original result.
   For now, the first valid OTU for the name is picked with distinct on().


# File 'lib/queries/otu/autocomplete.rb', line 119

def api_autocomplete
  @with_taxon_name = true

  # This limit() has more impact now. Since all
  # names are loaded large matches can swamp exact names
  # before priority ordering is applied. May require tuning.
  otus = compact_priorities( autocomplete_base.limit(30) )

  otu_order = otus.map(&:id).uniq

  f = ::Otu.where(id: otu_order)
    .joins('left join taxon_names t1 on otus.taxon_name_id = t1.id')
    .joins('left join otus o2 on t1.cached_valid_taxon_name_id = o2.taxon_name_id')
    .select('distinct on (otus.id) otus.id, otus.name, otus.taxon_name_id, COALESCE(o2.id, otus.id) as otu_valid_id')

  f.sort_by.with_index { |item, idx| [(otu_order.index(item.id) || 999), (idx || 999)] }
end
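
The final line of the listing restores the priority order after the second query, which returns rows in database order. A minimal sketch of that re-ordering step in plain Ruby follows; `ValidRow` and `restore_order` are hypothetical names used only for illustration, and arrays stand in for the ActiveRecord relation.

```ruby
# Hypothetical stand-in for a row returned by the second query.
ValidRow = Struct.new(:id, :otu_valid_id)

# Restore the order captured before the second query.
# Ids missing from otu_order sort last (999), as in the listing;
# the row index breaks ties so the result is deterministic.
def restore_order(rows, otu_order)
  rows.sort_by.with_index { |row, idx| [otu_order.index(row.id) || 999, idx] }
end

otu_order = [42, 7, 19] # order produced by priority ranking
rows = [                # order returned by the database
  ValidRow.new(7, 7),
  ValidRow.new(19, 3),
  ValidRow.new(42, 42)
]

p restore_order(rows, otu_order).map(&:id) # => [42, 7, 19]
```

Array#index is a linear scan, which is cheap at the 30-row limit used here. For longer lists, a lookup hash built once with `otu_order.each_with_index.to_h` would avoid the repeated scans.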

#api_autocomplete_extended ⇒ Array

An autocomplete result that permits displaying the TaxonName as originally matched. Note that otu: is really only useful when displaying OTUs without &having_taxon_name_only=true. We don't, for example, make use of this element there.

Returns:

  • (Array)

    of { otu:, label_target:, otu_valid_id: }



# File 'lib/queries/otu/autocomplete.rb', line 172

def api_autocomplete_extended
  otu_queries = QUERIES.dup
  otu_queries.delete(:autocomplete_taxon_name)

  base_otus = autocomplete_base(otu_queries).limit(30)
  taxon_name_otus = autocomplete_taxon_name_extended

  r = []

  base_otus.each do |o|
    r.push({
      otu: o, # contains priority
      label_target: o
    })
  end

  taxon_name_otus.each do |o|
    r.push({
      otu: o,
      label_target: (o.label_target_taxon_name_id ? ::TaxonName.find(o.label_target_taxon_name_id) : o.taxon_name )  # is o.taxon_name true?!
    })
  end

  # Keep a unique set of otu + label (to render)
  seen = Set.new

  # The compacted result
  compact = []

  r.each do |h|
    g = h[:label_target].id.to_s + h[:label_target].class.name
    m = [ h[:otu].id, g ]
    next if seen.include?( m )
    seen << m
    compact.push h
  end

  compact.sort!{|c,d| (c[:otu].priority || 999) <=> (d[:otu].priority || 999 )}

  # TODO: Refactor to remove extra query and assignment of otu_valid_id.  This is ugly.
  otu_order = compact.collect{|d| d[:otu].id}

  # Extra query is painful.
  f = ::Otu.where(id: otu_order)
    .joins('left join taxon_names t1 on otus.taxon_name_id = t1.id')
#         .joins('left join otus o2 on t1.cached_valid_taxon_name_id = o2.taxon_name_id')
    .joins('left join otus o2 on t1.cached_valid_taxon_name_id = o2.taxon_name_id and o2.taxon_name_id <> otus.taxon_name_id') # See https://github.com/sfg-taxonpages/orthoptera/issues/90
    .select('distinct on (otus.id) otus.id, otus.name, otus.taxon_name_id, COALESCE(o2.id, otus.id) as otu_valid_id')

  compact.each do |h|
    h[:otu_valid_id] = f.select{|j| j.id == h[:otu].id}.first.otu_valid_id
  end

  compact
end
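
The compaction step in the listing keeps one entry per (OTU, label) pair and then orders by priority, with missing priorities sorted last. A minimal sketch of that step in isolation follows; `ToyOtu`, `ToyTaxonName`, and `compact_extended` are hypothetical names, and the structs carry only the fields the step reads.

```ruby
require 'set'

# Hypothetical stand-ins for Otu and TaxonName records.
ToyOtu = Struct.new(:id, :priority)
ToyTaxonName = Struct.new(:id)

# Drop repeated (otu, label) pairs, then order by priority (nil sorts last).
def compact_extended(results)
  seen = Set.new
  compact = results.select do |h|
    label = h[:label_target]
    key = [h[:otu].id, "#{label.id}#{label.class.name}"]
    seen.add?(key) # returns nil when the key was already present
  end
  # The index keeps entries with equal priority in their original order.
  compact.sort_by.with_index { |h, i| [h[:otu].priority || 999, i] }
end

by_otu_name = ToyOtu.new(1, 200)
by_taxon_name = ToyOtu.new(1, 12.5)
results = [
  { otu: by_otu_name, label_target: by_otu_name },            # matched on OTU name
  { otu: by_taxon_name, label_target: ToyTaxonName.new(5) },  # matched on a taxon name
  { otu: by_taxon_name, label_target: ToyTaxonName.new(5) },  # repeat, dropped
  { otu: ToyOtu.new(2, nil), label_target: ToyTaxonName.new(6) }
]

compact_extended(results).each do |h|
  puts "otu #{h[:otu].id} labelled by #{h[:label_target].class.name} #{h[:label_target].id}"
end
```

The same OTU id appears twice in the output with different labels, which is the case the listing is built to preserve.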

#autocomplete ⇒ Object



# File 'lib/queries/otu/autocomplete.rb', line 244

def autocomplete
  compact_priorities( autocomplete_base.limit(40) )
end

#autocomplete_base(targets = QUERIES) ⇒ Object



# File 'lib/queries/otu/autocomplete.rb', line 248

def autocomplete_base(targets = QUERIES)
  queries = []

  targets.each do |q, p|
    if self.respond_to?(q)

      a = send(q)
      next if a.nil? # query has returned nil

      y = p[:priority]

      a = scope_autocomplete(a)

      a = a.select("otus.*, #{y} as priority") unless y.nil?

      queries.push a
    end
  end

  queries.compact!
  referenced_klass_union(queries).order('priority')
end
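
The pattern in this method (a hash of method names to priorities, a respond_to? guard, and a priority-ordered union) can be sketched without ActiveRecord. In the sketch below, `ToyAutocomplete`, its `NAMES` list, and the `Hit` struct are hypothetical stand-ins, and arrays replace the SQL scopes that the real method unions.

```ruby
# Hypothetical result row: a matched name tagged with its priority.
Hit = Struct.new(:name, :priority)

class ToyAutocomplete
  # Hypothetical data; the real class queries the otus table.
  NAMES = ['Aus', 'Aus bus', 'Bus aus'].freeze

  # Keys are method names. :not_implemented has no method behind it
  # and is skipped by the respond_to? guard.
  QUERIES = {
    name_exact: { priority: 1 },
    name_start_match: { priority: 200 },
    not_implemented: { priority: 5 }
  }.freeze

  def initialize(query_string)
    @query_string = query_string
  end

  def name_exact
    NAMES.select { |n| n == @query_string }
  end

  def name_start_match
    NAMES.select { |n| n.downcase.start_with?(@query_string.downcase) }
  end

  def autocomplete_base(targets = QUERIES)
    hits = []
    targets.each do |method_name, options|
      next unless respond_to?(method_name)

      result = send(method_name)
      next if result.nil?

      result.each { |n| hits << Hit.new(n, options[:priority]) }
    end
    # Lowest priority number first; the index keeps ties in insertion order.
    hits.each_with_index.sort_by { |hit, i| [hit.priority, i] }.map(&:first)
  end
end

ToyAutocomplete.new('Aus').autocomplete_base.each do |h|
  puts "#{h.priority}\t#{h.name}"
end
```

Note that 'Aus' is returned twice, once per matching query. Duplicates of this kind are what compact_priorities removes afterwards.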

#autocomplete_taxon_name ⇒ Scope

Pulls the result of a TaxonName autocomplete, maintains the order returned, and re-casts the result in terms of an OTU query. Expensive, but maintaining order is key.

Returns:

  • (Scope)

    the TaxonName autocomplete result, in its original order, re-cast as an OTU query (nil when no taxon names match)



# File 'lib/queries/otu/autocomplete.rb', line 96

def autocomplete_taxon_name
  taxon_names = Queries::TaxonName::Autocomplete.new(query_string, exact:, project_id:).autocomplete # an array, not a query

  ids = taxon_names.collect{|n| n.is_combination? ? n.cached_valid_taxon_name_id : n.id} # TODO: Experiment with :cached_valid_taxon_name_id) # We assume we want to land on Valid OTUs, but see #
  return nil if ids.empty?

  min = 10.0
  max = 20.0
  scale = (max - min) / ids.count.to_f

  # TODO: optimize *
  base_query.select("otus.*, ((#{min} + row_number() OVER ())::float * #{scale}) as priority") # small incrementing numbers for priority
    .joins("INNER JOIN ( SELECT unnest(ARRAY[#{ids.join(',')}]) AS id, row_number() OVER () AS row_num ) AS id_order ON otus.taxon_name_id = id_order.id")
    .order('id_order.row_num')
end
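
The priority arithmetic in the SQL select above can be worked through in plain Ruby. The sketch below assumes row_number() starts at 1 (standard SQL behavior) and evaluates the expression as it is parenthesized in the listing, (min + row_number) * scale. `priority_slotted` is a hypothetical alternative, shown only for comparison against the "10 .. 20" comment in QUERIES; it is not taken from the source.

```ruby
MIN = 10.0
MAX = 20.0

# Priority as the SQL expression is written: (min + row_number) * scale.
def priority_as_written(row_number, count)
  scale = (MAX - MIN) / count.to_f
  (MIN + row_number) * scale
end

# Hypothetical alternative that stays inside MIN..MAX: min + row_number * scale.
def priority_slotted(row_number, count)
  scale = (MAX - MIN) / count.to_f
  MIN + row_number * scale
end

count = 4
(1..count).each do |n|
  puts format('row %d: as written %.1f, slotted %.1f',
              n, priority_as_written(n, count), priority_slotted(n, count))
end
```

With four ids the expression as written yields 27.5, 30.0, 32.5, 35.0, while the alternative yields 12.5, 15.0, 17.5, 20.0. Both preserve the order of the taxon name results, but the range produced by the expression as written shifts with the number of ids, so it may be worth checking against the intended 10 .. 20 band.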

#autocomplete_taxon_name_extended ⇒ Object



# File 'lib/queries/otu/autocomplete.rb', line 138

def autocomplete_taxon_name_extended
  taxon_names = Queries::TaxonName::Autocomplete.new(query_string, exact:, project_id:).autocomplete # an array, not a query

  ids = taxon_names.collect{|n|
    [
      (n.is_combination? ? n.cached_valid_taxon_name_id : n.id), # Points to the OTU target, if there is one
      n.id,  # points to the label target
    ]
  }

  return ::Otu.none if ids.empty?

  ids.uniq!

  min = 10.0
  max = 20.0
  scale = (max - min) / ids.count.to_f

  # TODO: optimize *
  otus = base_query.select("otus.*, label_target_taxon_name_id, ((#{min} + row_number() OVER ())::float * #{scale}) as priority") # small incrementing numbers for priority
    .joins("INNER JOIN ( SELECT unnest(ARRAY[#{ids.map(&:first).join(',')}]) AS id, unnest(ARRAY[#{ids.map(&:last).join(',')}]) AS label_target_taxon_name_id, row_number() OVER () AS row_num ) AS id_order ON otus.taxon_name_id = id_order.id")
    .order('id_order.row_num')

  otus = scope_autocomplete(otus).includes(:taxon_name)

  otus
end

#base_query ⇒ Object



# File 'lib/queries/otu/autocomplete.rb', line 65

def base_query
  q = ::Otu.all
  q = q.where(project_id:) if project_id.any?
  q
end

#compact_priorities(otus) ⇒ Object

Doesn't work for the extended results, as the same OTU can appear there with different labels.



# File 'lib/queries/otu/autocomplete.rb', line 231

def compact_priorities(otus)
  # Mmmmarg!
  # We may have the same name at different priorities, strike all but the highest/first.
  r = []
  i = {}
  otus.each do |o|
    next if i[o.id]
    r.push o
    i[o.id] = true
  end
  r
end
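
A short usage sketch of the de-duplication above, assuming the input is already ordered by priority. `Row` is a hypothetical stand-in for an Otu record carrying a priority column, and the method body is a compact restatement of the listing.

```ruby
# Hypothetical stand-in for an Otu row with a computed priority.
Row = Struct.new(:id, :name, :priority)

# Keep the first occurrence of each id; later duplicates are dropped.
def compact_priorities(otus)
  seen = {}
  otus.each_with_object([]) do |o, kept|
    next if seen[o.id]

    seen[o.id] = true
    kept << o
  end
end

rows = [
  Row.new(7, 'Aus', 1),      # exact match
  Row.new(9, 'Aus bus', 15),
  Row.new(7, 'Aus', 200)     # same OTU again, from a weaker match
]

compact_priorities(rows).each { |r| puts "#{r.id} #{r.name} (#{r.priority})" }
```

For a plain Array, `rows.uniq(&:id)` gives the same result, since Array#uniq also keeps the first occurrence of each key.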

#otu_name_exact ⇒ Object



# File 'lib/queries/otu/autocomplete.rb', line 71

def otu_name_exact
  base_query.where(otus: {name: query_string})
end

#otu_name_similarity ⇒ Object

All records that meet the similarity cutoff.

  • This is intended as a generic replacement for wildcarded results.

Observations:

- Was similarity(); now experimenting with word_similarity.
- 3-letter matches are going to be low probability; matches kick in at 4.


# File 'lib/queries/otu/autocomplete.rb', line 86

def otu_name_similarity
  base_query
    .where('otus.name % ?', query_string)
    .where( ApplicationRecord.sanitize_sql_array(["word_similarity('%s', otus.name) > 0.33", query_string]))
    .order('otus.name, length(otus.name)')
end
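
To give a feel for why short query strings sit near the cutoff, here is a simplified approximation of trigram similarity in plain Ruby. It follows the documented pg_trgm convention of padding each word with two leading spaces and one trailing space, but it is only a sketch: it models similarity(), not word_similarity(), which compares against extents of the second string and will generally score differently. The method names are hypothetical.

```ruby
require 'set'

# Simplified trigram extraction: lowercase, split on non-alphanumerics,
# pad each word with two leading spaces and one trailing space.
def trigrams(string)
  string.downcase.scan(/[[:alnum:]]+/).each_with_object(Set.new) do |word, set|
    padded = "  #{word} "
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# Shared trigrams divided by all distinct trigrams.
def similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = (ta | tb).size
  union.zero? ? 0.0 : (ta & tb).size.to_f / union
end

%w[cat cata catal].each do |query|
  puts format('%-6s vs catalog: %.3f', query, similarity(query, 'catalog'))
end
```

In this approximation each extra matching letter adds one shared trigram, so the score climbs as the query lengthens, and a three-letter prefix of a seven-letter name lands only just above a threshold of about 0.3.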

#otu_name_start_match ⇒ Object



# File 'lib/queries/otu/autocomplete.rb', line 75

def otu_name_start_match
  base_query.where('otus.name ilike ?', query_string + '%')
end

#scope_autocomplete(query) ⇒ Object



# File 'lib/queries/otu/autocomplete.rb', line 271

def scope_autocomplete(query)
  query = query.joins(:taxon_name) if with_taxon_name
  query = query.where.missing(:taxon_name) if with_taxon_name == false
  query = query.joins(:taxon_name).where(otus: {name: nil}) if having_taxon_name_only
  query
end
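
The three-state handling of with_taxon_name (true, false, nil) combined with having_taxon_name_only can be illustrated on plain arrays. The sketch below is an analogue, not the real scope: `ToyRecord` and `scope_filter` are hypothetical, and a nil taxon_name_id is assumed to be equivalent to a missing taxon_name association (that is, no dangling foreign keys).

```ruby
# Hypothetical stand-in for an Otu row.
ToyRecord = Struct.new(:id, :name, :taxon_name_id)

# Array-based analogue of scope_autocomplete.
#   with_taxon_name: true  -> must have a taxon name
#                    false -> must not have one
#                    nil   -> no filtering
#   having_taxon_name_only: true -> has a taxon name and no OTU name
def scope_filter(records, with_taxon_name: nil, having_taxon_name_only: false)
  records = records.reject { |r| r.taxon_name_id.nil? } if with_taxon_name
  records = records.select { |r| r.taxon_name_id.nil? } if with_taxon_name == false
  records = records.select { |r| r.taxon_name_id && r.name.nil? } if having_taxon_name_only
  records
end

records = [
  ToyRecord.new(1, 'my otu', nil), # OTU name only
  ToyRecord.new(2, nil, 10),       # taxon name only
  ToyRecord.new(3, 'named', 11)    # both
]

p scope_filter(records, with_taxon_name: true).map(&:id)
p scope_filter(records, with_taxon_name: false).map(&:id)
p scope_filter(records, having_taxon_name_only: true).map(&:id)
```

The explicit `== false` comparison matters: a bare falsy check would treat nil the same as false and filter when no preference was given.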