Class: Queries::Otu::Autocomplete

Inherits:
Query::Autocomplete
Defined in:
lib/queries/otu/autocomplete.rb

Overview

See Query::Autocomplete for the optimization strategy applied per name. There are 4 classes of name, each of which uses the same strategy: OTU name, Original TaxonName, TaxonName, and CommonName. A global priority is then applied, pulling the best names from each sub-strategy to the top.

Constant Summary

QUERIES =

Keys are method names. The existence of each method is checked before its query is requested.

{
  # OTU
  otu_name_exact: {priority: 1},
  autocomplete_exact_id: {priority: 1},
  autocomplete_identifier_cached_exact: {priority: 1},
  otu_name_start_match: {priority: 200},
  otu_name_similarity: {priority: 220},

  # TaxonName
  autocomplete_taxon_name: {priority: nil}, # Priority is slotted from 10 .. 20
  # These are all approximately covered in the blanket taxon_name autocomplete
  # taxon_name_name_exact: {priority: 10},
  # taxon_name_identifier_exact: {priority: 10},
  # taxon_name_name_start_match: {priority: 100},
  # taxon_name_name_high_cuttoff: {priority: 200},

  # CommonName
  # These should all be covered/moved to common_name_autocomplete,
  autocomplete_common_name_exact: {priority: 100},
  autocomplete_common_name_like: {priority: 1000}
  # common_name_identifier_exact: {priority: 10},
  # common_name_name_start_match: {priority: 100},
  # common_name_name_similarity: {priority: 200},
}.freeze
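
The lookup described above (keys are method names, checked before they are called, then ranked by priority) can be sketched without a database. This is a minimal illustration, not the class's actual code: SketchAutocomplete, NAMES, and the three keys are hypothetical, and plain arrays stand in for ActiveRecord scopes.

```ruby
# Hypothetical stand-in for the autocomplete class; no database involved.
class SketchAutocomplete
  QUERIES = {
    name_exact: { priority: 1 },
    name_start_match: { priority: 200 },
    not_implemented: { priority: 50 } # no such method, so it is skipped below
  }.freeze

  NAMES = ['Aus bus', 'Aus', 'Cus dus'].freeze

  def initialize(query_string)
    @query_string = query_string
  end

  def name_exact
    NAMES.select { |n| n == @query_string }
  end

  def name_start_match
    NAMES.select { |n| n.downcase.start_with?(@query_string.downcase) }
  end

  # Collect [priority, name] pairs from every key that is an actual method,
  # then order by priority, breaking ties by original position.
  def prioritized
    rows = []
    QUERIES.each do |method_name, options|
      next unless respond_to?(method_name)

      send(method_name).each { |name| rows << [options[:priority], name] }
    end
    rows.each_with_index.sort_by { |(priority, _), i| [priority, i] }.map(&:first)
  end
end

rows = SketchAutocomplete.new('Aus').prioritized
# => [[1, "Aus"], [200, "Aus bus"], [200, "Aus"]]
```

The same name can appear at more than one priority, which is why a later de-duplication pass is needed.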

Instance Attribute Summary

Attributes inherited from Query::Autocomplete

#dynamic_limit, #project_id, #query_string

Attributes inherited from Query

#query_string, #terms

Instance Method Summary

Methods inherited from Query::Autocomplete

#autocomplete_cached, #autocomplete_cached_wildcard_anywhere, #autocomplete_common_name_exact, #autocomplete_common_name_like, #autocomplete_exact_id, #autocomplete_exactly_named, #autocomplete_named, #autocomplete_ordered_wildcard_pieces_in_cached, #combine_or_clauses, #common_name_name, #common_name_table, #common_name_wild_pieces, #exactly_named, #fragments, #integers, #least_levenshtein, #match_wildcard_end_in_cached, #match_wildcard_in_cached, #named, #only_ids, #only_integers?, #parent, #parent_child_join, #parent_child_where, #pieces, #scope, #string_fragments, #wildcard_wrapped_integers, #wildcard_wrapped_years, #with_cached, #with_cached_like, #with_id, #with_project_id, #year_letter, #years

Methods inherited from Query

#alphabetic_strings, #alphanumeric_strings, base_name, #base_name, #build_terms, #cached_facet, #end_wildcard, #levenshtein_distance, #match_ordered_wildcard_pieces_in_cached, #no_terms?, referenced_klass, #referenced_klass, #referenced_klass_except, #referenced_klass_intersection, #referenced_klass_union, #start_and_end_wildcard, #start_wildcard, #table, #wildcard_pieces

Constructor Details

#initialize(string, project_id: nil, having_taxon_name_only: false, with_taxon_name: nil, exact: 'false') ⇒ Autocomplete

Returns a new instance of Autocomplete.



# File 'lib/queries/otu/autocomplete.rb', line 56

def initialize(string, project_id: nil, having_taxon_name_only: false, with_taxon_name: nil, exact: 'false')
  super(string, project_id:)
  @having_taxon_name_only = boolean_param({having_taxon_name_only:}, :having_taxon_name_only)
  @with_taxon_name = boolean_param({with_taxon_name:}, :with_taxon_name)

  # TODO: move to mode
  @exact = boolean_param({exact:}, :exact)
end

Instance Attribute Details

#exact ⇒ Boolean

Corresponds to the request parameter &exact=<"true"|"false">. If "true", only results where #name = query_string are returned (no fuzzy matching).

Returns:

  • (Boolean)



# File 'lib/queries/otu/autocomplete.rb', line 27

def exact
  @exact
end

#having_taxon_name_only ⇒ Object

Returns:

  • (Boolean, nil)

    true - only return Otus with `name` = nil
    false, nil - no effect



# File 'lib/queries/otu/autocomplete.rb', line 16

def having_taxon_name_only
  @having_taxon_name_only
end

#with_taxon_name ⇒ Object

Returns:

  • (Boolean, nil)

    true - OTU must have taxon name
    false - OTU must not have taxon name
    nil - ignored



# File 'lib/queries/otu/autocomplete.rb', line 22

def with_taxon_name
  @with_taxon_name
end

Instance Method Details

#api_autocomplete ⇒ Object

Maintains valid_taxon_name_id needed for API.

Considerations:

The chain otus -> taxon names -> valid taxon name_id <- otu can return more OTUs than the original query,
   because there can be multiple OTUs for the valid name of an invalid original result.
   Right now we pick the first valid OTU for the name with distinct on().


# File 'lib/queries/otu/autocomplete.rb', line 118

def api_autocomplete
  @with_taxon_name = true

  # This limit() has more impact now. Since all
  # names are loaded large matches can swamp exact names
  # before priority ordering is applied. May require tuning.
  otus = compact_priorities( autocomplete_base.limit(30) )

  otu_order = otus.map(&:id).uniq

  f = ::Otu.where(id: otu_order)
        .joins('left join taxon_names t1 on otus.taxon_name_id = t1.id')
        .joins('left join otus o2 on t1.cached_valid_taxon_name_id = o2.taxon_name_id')
        .select('distinct on (otus.id) otus.id, otus.name, otus.taxon_name_id, COALESCE(o2.id, otus.id) as otu_valid_id')

  f.sort_by.with_index { |item, idx| [(otu_order.index(item.id) || 999), (idx || 999)] }
end
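
The final sort_by.with_index call restores the priority order that the second query does not preserve. A minimal sketch of that idiom, using a hypothetical Row struct in place of Otu records:

```ruby
Row = Struct.new(:id, :name)

otu_order = [7, 3, 9] # ids in the priority order produced by the first query

# The follow-up query may return rows in any order, and (in this made-up
# example) one row whose id is not in otu_order at all.
fetched = [Row.new(3, 'b'), Row.new(9, 'c'), Row.new(7, 'a'), Row.new(42, 'z')]

# Sort by position in otu_order; unknown ids sink to the end (999),
# and the original index breaks ties.
sorted = fetched.sort_by.with_index do |item, idx|
  [(otu_order.index(item.id) || 999), idx]
end

sorted_ids = sorted.map(&:id)
# => [7, 3, 9, 42]
```

Note that the result is an Array, not a scope, so it cannot be chained with further query methods.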

#autocomplete ⇒ Object



# File 'lib/queries/otu/autocomplete.rb', line 149

def autocomplete
  compact_priorities( autocomplete_base.limit(40) )
end

#autocomplete_base ⇒ Object



# File 'lib/queries/otu/autocomplete.rb', line 153

def autocomplete_base
  queries = []

  QUERIES.each do |q, p|
    if self.respond_to?(q)

      a = send(q)
      next if a.nil? # query has returned nil

      y = p[:priority]

      a = a.joins(:taxon_name) if with_taxon_name
      a = a.where.missing(:taxon_name) if with_taxon_name == false
      a = a.joins(:taxon_name).where(otus: {name: nil}) if having_taxon_name_only

      a = a.select("otus.*, #{y} as priority") unless y.nil?

      queries.push a
    end
  end

  queries.compact!
  referenced_klass_union(queries).order('priority')
end
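
The three filter lines in the loop treat with_taxon_name as a tri-state value (true, false, nil) and having_taxon_name_only as a plain flag. A rough in-memory sketch of those semantics follows. FilterOtu and filter_otus are hypothetical, and a non-nil taxon_name_id is assumed to stand in for a successful join to taxon_names.

```ruby
FilterOtu = Struct.new(:id, :name, :taxon_name_id)

# Approximates the scope filters with array operations.
def filter_otus(otus, with_taxon_name: nil, having_taxon_name_only: false)
  otus = otus.select { |o| o.taxon_name_id } if with_taxon_name            # joins(:taxon_name)
  otus = otus.reject { |o| o.taxon_name_id } if with_taxon_name == false   # where.missing(:taxon_name)
  otus = otus.select { |o| o.taxon_name_id && o.name.nil? } if having_taxon_name_only
  otus
end

otus = [
  FilterOtu.new(1, 'field morphotype', nil),
  FilterOtu.new(2, nil, 10),
  FilterOtu.new(3, 'named and linked', 11)
]

with_name    = filter_otus(otus, with_taxon_name: true).map(&:id)          # => [2, 3]
without_name = filter_otus(otus, with_taxon_name: false).map(&:id)         # => [1]
unfiltered   = filter_otus(otus).map(&:id)                                 # => [1, 2, 3]
name_only    = filter_otus(otus, having_taxon_name_only: true).map(&:id)   # => [2]
```

Comparing against false explicitly matters: a bare falsy check would also fire for nil, which is meant to disable the filter.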

#autocomplete_taxon_name ⇒ Scope

Pulls the result of a TaxonName autocomplete, maintains the order returned, and re-casts the result in terms of an OTU query. Expensive, but maintaining the order is key. Returns nil when the TaxonName autocomplete finds nothing.

Returns:

  • (Scope)



# File 'lib/queries/otu/autocomplete.rb', line 96

def autocomplete_taxon_name
  taxon_names = Queries::TaxonName::Autocomplete.new(query_string, exact:, project_id:).autocomplete # an array, not a query

  ids = taxon_names.map(&:id) # TODO: Experiment with :cached_valid_taxon_name_id) # We assume we want to land on Valid OTUs, but see #
  return nil if ids.empty?

  min = 10.0
  max = 20.0
  scale = (max - min) / ids.count.to_f

  base_query.select("otus.*, ((#{min} + row_number() OVER ())::float * #{scale}) as priority") # small incrementing numbers for priority
    .joins("INNER JOIN ( SELECT unnest(ARRAY[#{ids.join(',')}]) AS id, row_number() OVER () AS row_num ) AS id_order ON otus.taxon_name_id = id_order.id")
    .order('id_order.row_num')
end
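
The priority arithmetic in the select can be checked outside SQL. The sketch below evaluates the expression as it appears in the listing, (min + row_number) * scale, next to min + row_number * scale, which is only an assumption about what "slotted from 10 .. 20" in the QUERIES comment might intend. Both helper names are hypothetical.

```ruby
# Mirrors the SQL expression as written: (min + row_number) * scale
def priorities_as_written(min, max, count)
  scale = (max - min) / count.to_f
  (1..count).map { |row_number| (min + row_number) * scale }
end

# Assumed alternative (not the source's code): min + row_number * scale
def priorities_slotted(min, max, count)
  scale = (max - min) / count.to_f
  (1..count).map { |row_number| min + row_number * scale }
end

as_written = priorities_as_written(10.0, 20.0, 5)
# => [22.0, 24.0, 26.0, 28.0, 30.0]
slotted = priorities_slotted(10.0, 20.0, 5)
# => [12.0, 14.0, 16.0, 18.0, 20.0]
```

Either form increases with row_number, so the relative order of taxon name matches is preserved; only the range they occupy relative to the other priorities differs.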

#base_query ⇒ Object



# File 'lib/queries/otu/autocomplete.rb', line 65

def base_query
  q = ::Otu.all
  q = q.where(project_id:) if project_id.any?
  q
end

#compact_priorities(otus) ⇒ Object



# File 'lib/queries/otu/autocomplete.rb', line 136

def compact_priorities(otus)
  # Mmmmarg!
  # We may have the same name at different priorities, strike all but the highest/first.
  r = []
  i = {}
  otus.each do |o|
    next if i[o.id]
    r.push o
    i[o.id] = true
  end
  r
end
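
The de-duplication above keeps the first occurrence of each id, which is the best-ranked one when the input is already ordered by priority. A small sketch with a hypothetical DedupOtu struct shows the effect, and that for plain arrays it matches Array#uniq with a block:

```ruby
DedupOtu = Struct.new(:id, :name, :priority)

# Same idea as the method above: keep the first record seen for each id.
def compact_priorities(otus)
  seen = {}
  otus.each_with_object([]) do |o, kept|
    next if seen[o.id]

    seen[o.id] = true
    kept << o
  end
end

ordered = [
  DedupOtu.new(1, 'Aus bus', 1),   # exact match
  DedupOtu.new(2, 'Aus cus', 15),  # taxon name match
  DedupOtu.new(1, 'Aus bus', 200)  # same OTU again, from a weaker strategy
]

compacted = compact_priorities(ordered)
kept_priorities = compacted.map(&:priority)
# => [1, 15]
```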

#otu_name_exact ⇒ Object



# File 'lib/queries/otu/autocomplete.rb', line 71

def otu_name_exact
  base_query.where(otus: {name: query_string})
end

#otu_name_similarity ⇒ Object

All records that meet the similarity cutoff.

  • This is intended as a generic replacement for wildcarded results.

Observations:

- Was similarity(); experimenting with word_similarity.
- 3-letter matches are going to be low probability; matches kick in at 4.


# File 'lib/queries/otu/autocomplete.rb', line 86

def otu_name_similarity
  base_query
    .where('otus.name % ?', query_string)
    .where(ApplicationRecord.sanitize_sql_array(["word_similarity('%s', otus.name) > 0.33", query_string]))
    .order('otus.name, length(otus.name)')
end
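
To get a feel for why very short queries rarely pass a trigram cutoff, the following sketch approximates PostgreSQL's similarity() in plain Ruby. It is a simplification under stated assumptions: words are lowercased, padded with two leading spaces and one trailing space, and the score is shared trigrams over total distinct trigrams. It does not reproduce word_similarity(), which the method above actually uses, and both helper names are hypothetical.

```ruby
require 'set'

# Distinct trigrams of every word in the text, pg_trgm-style padding assumed.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).flat_map do |word|
    padded = "  #{word} "
    (0..padded.length - 3).map { |i| padded[i, 3] }
  end.to_set
end

# Shared trigrams divided by all distinct trigrams of both strings.
def trigram_similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  return 0.0 if ta.empty? || tb.empty?

  (ta & tb).size.to_f / (ta | tb).size
end

three_letters = trigram_similarity('aus', 'aus bus') # 4 shared of 7 distinct
two_letters   = trigram_similarity('au', 'aus bus')  # 2 shared of 8 distinct, i.e. 0.25
identical     = trigram_similarity('aus', 'aus')     # 1.0
```

In this approximation the two-letter query already falls below the 0.3 default threshold of the % operator, while the three-letter query clears it only because it matches a whole word of the name.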

#otu_name_start_match ⇒ Object



# File 'lib/queries/otu/autocomplete.rb', line 75

def otu_name_start_match
  base_query.where('otus.name ilike ?', query_string + '%')
end