Class: Queries::Otu::Autocomplete
- Inherits:
- Query::Autocomplete
  - Object
  - Query
  - Query::Autocomplete
  - Queries::Otu::Autocomplete
- Defined in:
- lib/queries/otu/autocomplete.rb
Overview
See Query::Autocomplete for the optimization strategy applied per name. There are 4 classes of name, each of which uses the same strategy: OTU name, Original TaxonName, TaxonName, and CommonName. We then apply a global priority, pulling the best names from each sub-strategy to the top.
Constant Summary
- QUERIES =
Keys are method names. Existence of method is checked before requesting the query
{
  # OTU
  otu_name_exact: {priority: 1},
  autocomplete_exact_id: {priority: 1},
  autocomplete_identifier_cached_exact: {priority: 1},
  otu_name_start_match: {priority: 200},
  otu_name_similarity: {priority: 220},

  # TaxonName
  autocomplete_taxon_name: {priority: nil}, # Priority is slotted from 10 .. 20

  # These are all approximately covered in the blanket taxon_name autocomplete
  # taxon_name_name_exact: {priority: 10},
  # taxon_name_identifier_exact: {priority: 10},
  # taxon_name_name_start_match: {priority: 100},
  # taxon_name_name_high_cuttoff: {priority: 200},

  # CommonName
  # These should all be covered/moved to common_name_autocomplete,
  autocomplete_common_name_exact: {priority: 100},
  autocomplete_common_name_like: {priority: 1000}
  # common_name_identifier_exact: {priority: 10},
  # common_name_name_start_match: {priority: 100},
  # common_name_name_similarity: {priority: 200},
}.freeze
Instance Attribute Summary
- #exact ⇒ Boolean
  &exact=<"true"|"false">. If "true", only results where #name = query_string are returned (no fuzzy matching).
- #having_taxon_name_only ⇒ Object
  Boolean or nil. true: only return OTUs that have a taxon name and whose `name` is nil. false or nil: no effect.
- #with_taxon_name ⇒ Object
  Boolean or nil. true: OTU must have a taxon name. false: OTU must not have a taxon name. nil: ignored.
Attributes inherited from Query::Autocomplete
#dynamic_limit, #project_id, #query_string
Attributes inherited from Query
Instance Method Summary
- #api_autocomplete ⇒ Object
  Maintains valid_taxon_name_id needed for API.
- #api_autocomplete_extended ⇒ Array
  An autocomplete result that permits displaying the TaxonName as originally matched.
- #autocomplete ⇒ Object
- #autocomplete_base(targets = QUERIES) ⇒ Object
- #autocomplete_taxon_name ⇒ Scope
  Pull the result of a TaxonName autocomplete.
- #autocomplete_taxon_name_extended ⇒ Object
- #base_query ⇒ Object
- #compact_priorities(otus) ⇒ Object
  Doesn't work for extended, as we can have the same OTU with different labels.
- #initialize(string, project_id: nil, having_taxon_name_only: false, with_taxon_name: nil, exact: 'false') ⇒ Autocomplete (constructor)
  A new instance of Autocomplete.
- #otu_name_exact ⇒ Object
- #otu_name_similarity ⇒ Object
  All records that meet the similarity cutoff. This is intended as a generic replacement for wildcarded results.
- #otu_name_start_match ⇒ Object
- #scope_autocomplete(query) ⇒ Object
Methods inherited from Query::Autocomplete
#autocomplete_cached, #autocomplete_cached_wildcard_anywhere, #autocomplete_common_name_exact, #autocomplete_common_name_like, #autocomplete_exact_id, #autocomplete_exactly_named, #autocomplete_named, #autocomplete_ordered_wildcard_pieces_in_cached, #cached_facet, #combine_or_clauses, #common_name_name, #common_name_table, #common_name_wild_pieces, #exactly_named, #fragments, #integers, #least_levenshtein, #match_wildcard_end_in_cached, #match_wildcard_in_cached, #named, #only_ids, #only_integers?, #parent, #parent_child_join, #parent_child_where, #pieces, #scope, #string_fragments, #wildcard_wrapped_integers, #wildcard_wrapped_years, #with_cached, #with_cached_like, #with_id, #with_project_id, #year_letter, #years
Methods inherited from Query
#alphabetic_strings, #alphanumeric_strings, base_name, #base_name, #build_terms, #cached_facet, #end_wildcard, #levenshtein_distance, #match_ordered_wildcard_pieces_in_cached, #no_terms?, referenced_klass, #referenced_klass, #referenced_klass_except, #referenced_klass_intersection, #referenced_klass_union, #start_and_end_wildcard, #start_wildcard, #table, #wildcard_pieces
Constructor Details
#initialize(string, project_id: nil, having_taxon_name_only: false, with_taxon_name: nil, exact: 'false') ⇒ Autocomplete
Returns a new instance of Autocomplete.
# File 'lib/queries/otu/autocomplete.rb', line 56
def initialize(string, project_id: nil, having_taxon_name_only: false, with_taxon_name: nil, exact: 'false')
  super(string, project_id:)
  @having_taxon_name_only = boolean_param({having_taxon_name_only:}, :having_taxon_name_only)
  @with_taxon_name = boolean_param({with_taxon_name:}, :with_taxon_name)

  # TODO: move to mode
  @exact = boolean_param({exact:}, :exact)
end
Instance Attribute Details
#exact ⇒ Boolean
Returns the value of &exact=<"true"|"false">. If "true", only results where #name = query_string are returned (no fuzzy matching).
# File 'lib/queries/otu/autocomplete.rb', line 27
def exact
  @exact
end
#having_taxon_name_only ⇒ Object
Returns Boolean or nil. true: only return OTUs that have a taxon name and whose `name` is nil. false or nil: no effect.
# File 'lib/queries/otu/autocomplete.rb', line 16
def having_taxon_name_only
  @having_taxon_name_only
end
#with_taxon_name ⇒ Object
Returns Boolean or nil. true: OTU must have a taxon name. false: OTU must not have a taxon name. nil: ignored.
# File 'lib/queries/otu/autocomplete.rb', line 22
def with_taxon_name
  @with_taxon_name
end
Instance Method Details
#api_autocomplete ⇒ Object
Maintains valid_taxon_name_id needed for API.
Considerations:
The path otus -> taxon names -> valid taxon name_id <- otu can return more OTUs than the original query, because there can be multiple OTUs for the valid name of an invalid original result.
Right now we pick the first valid OTU for the name with distinct on().
# File 'lib/queries/otu/autocomplete.rb', line 119
def api_autocomplete
  @with_taxon_name = true

  # This limit() has more impact now. Since all
  # names are loaded large matches can swamp exact names
  # before priority ordering is applied. May require tuning.
  otus = compact_priorities( autocomplete_base.limit(30) )

  otu_order = otus.map(&:id).uniq

  f = ::Otu.where(id: otu_order)
    .joins('left join taxon_names t1 on otus.taxon_name_id = t1.id')
    .joins('left join otus o2 on t1.cached_valid_taxon_name_id = o2.taxon_name_id')
    .select('distinct on (otus.id) otus.id, otus.name, otus.taxon_name_id, COALESCE(o2.id, otus.id) as otu_valid_id')

  f.sort_by.with_index { |item, idx| [(otu_order.index(item.id) || 999), (idx || 999)] }
end
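The final line of the method re-sorts the rows from the second query so that they follow the priority order established by the first pass. The sketch below illustrates only that step in plain Ruby. `Row` and `restore_order` are hypothetical names introduced for illustration, and the rows are in-memory structs rather than ActiveRecord objects.

```ruby
# Hypothetical stand-in for a row returned by the second query.
Row = Struct.new(:id, :otu_valid_id)

# Re-sort rows so they follow the order of ids in `order`.
# Ids missing from `order` sort last (999); the original index breaks ties.
def restore_order(rows, order)
  rows.sort_by.with_index { |row, idx| [order.index(row.id) || 999, idx] }
end

otu_order = [42, 7, 19] # priority order from the first pass
rows = [Row.new(7, 7), Row.new(19, 3), Row.new(42, 42)] # arbitrary database order

sorted = restore_order(rows, otu_order)
puts sorted.map(&:id).inspect
```

Because `order.index` is a linear scan, this approach is only reasonable for short lists such as the 30 rows requested here.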
#api_autocomplete_extended ⇒ Array
An autocomplete result that permits displaying the TaxonName as originally matched. Note that otu: is really only useful when displaying OTUs without &having_taxon_name_only=true. We don't, for example, make use of this element there.
# File 'lib/queries/otu/autocomplete.rb', line 172
def api_autocomplete_extended
  otu_queries = QUERIES.dup
  otu_queries.delete(:autocomplete_taxon_name)

  base_otus = autocomplete_base(otu_queries).limit(30)
  taxon_name_otus = autocomplete_taxon_name_extended

  r = []

  base_otus.each do |o|
    r.push({
      otu: o, # contains priority
      label_target: o
    })
  end

  taxon_name_otus.each do |o|
    r.push({
      otu: o,
      label_target: (o.label_target_taxon_name_id ? ::TaxonName.find(o.label_target_taxon_name_id) : o.taxon_name ) # is o.taxon_name true?!
    })
  end

  # Keep a unique set of otu + label (to render)
  seen = Set.new

  # The compacted result
  compact = []

  r.each do |h|
    g = h[:label_target].id.to_s + h[:label_target].class.name
    m = [ h[:otu].id, g ]

    next if seen.include?( m )
    seen << m
    compact.push h
  end

  compact.sort!{|c,d| (c[:otu].priority || 999) <=> (d[:otu].priority || 999 )}

  # TODO: Refactor to remove extra query and assignment of otu_valid_id. This is ugly.

  otu_order = compact.collect{|d| d[:otu].id}

  # Extra query is painful.
  f = ::Otu.where(id: otu_order)
    .joins('left join taxon_names t1 on otus.taxon_name_id = t1.id')
    # .joins('left join otus o2 on t1.cached_valid_taxon_name_id = o2.taxon_name_id')
    .joins('left join otus o2 on t1.cached_valid_taxon_name_id = o2.taxon_name_id and o2.taxon_name_id <> otus.taxon_name_id') # See https://github.com/sfg-taxonpages/orthoptera/issues/90
    .select('distinct on (otus.id) otus.id, otus.name, otus.taxon_name_id, COALESCE(o2.id, otus.id) as otu_valid_id')

  compact.each do |h|
    h[:otu_valid_id] = f.select{|j| j.id == h[:otu].id}.first.otu_valid_id
  end

  compact
end
#autocomplete ⇒ Object
# File 'lib/queries/otu/autocomplete.rb', line 244
def autocomplete
  compact_priorities( autocomplete_base.limit(40) )
end
#autocomplete_base(targets = QUERIES) ⇒ Object
# File 'lib/queries/otu/autocomplete.rb', line 248
def autocomplete_base(targets = QUERIES)
  queries = []

  targets.each do |q, p|
    if self.respond_to?(q)

      a = send(q)

      next if a.nil? # query has returned nil

      y = p[:priority]

      a = scope_autocomplete(a)
      a = a.select("otus.*, #{y} as priority") unless y.nil?

      queries.push a
    end
  end

  queries.compact!

  referenced_klass_union(queries).order('priority')
end
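The dispatch pattern above (look up each key as a method, skip it if the method is missing or returns nil, tag its results with the configured priority, then order the union by priority) can be sketched with in-memory data. Everything in this example is hypothetical: `ToyAutocomplete`, `Hit`, `TARGETS`, and `NAMES` are illustration-only names, and arrays of ids stand in for the ActiveRecord scopes that the real method unions in SQL.

```ruby
# Hypothetical result record; the real method returns a scope of OTUs.
Hit = Struct.new(:id, :name, :priority)

# Keys are method names, as in QUERIES.
TARGETS = {
  exact_match: { priority: 1 },
  start_match: { priority: 200 },
  not_implemented: { priority: 500 } # skipped: no such method is defined
}.freeze

class ToyAutocomplete
  NAMES = { 1 => 'Aus', 2 => 'Aus bus', 3 => 'Cus' }.freeze

  def initialize(term)
    @term = term
  end

  def exact_match
    NAMES.select { |_, name| name == @term }.keys
  end

  def start_match
    NAMES.select { |_, name| name.start_with?(@term) }.keys
  end

  # Collect hits from every available sub-query, then order by priority.
  def merged(targets = TARGETS)
    hits = []
    targets.each do |method_name, options|
      next unless respond_to?(method_name)

      ids = send(method_name)
      next if ids.nil?

      ids.each { |id| hits << Hit.new(id, NAMES[id], options[:priority]) }
    end
    # The index keeps the sort stable for equal priorities.
    hits.sort_by.with_index { |hit, index| [hit.priority, index] }
  end
end

ToyAutocomplete.new('Aus').merged.each do |hit|
  puts "#{hit.priority}: #{hit.name} (id #{hit.id})"
end
```

Note that the same id can appear at more than one priority in the merged list; in the class documented here, removing those repeats is left to #compact_priorities.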
#autocomplete_taxon_name ⇒ Scope
Returns the result of a TaxonName autocomplete, re-cast in terms of an OTU query while maintaining the order returned. Expensive, but maintaining order is key.
# File 'lib/queries/otu/autocomplete.rb', line 96
def autocomplete_taxon_name
  taxon_names = Queries::TaxonName::Autocomplete.new(query_string, exact:, project_id:).autocomplete # an array, not a query

  ids = taxon_names.collect{|n| n.is_combination? ? n.cached_valid_taxon_name_id : n.id} # TODO: Experiment with :cached_valid_taxon_name_id)
  # We assume we want to land on Valid OTUs, but see #
  return nil if ids.empty?

  min = 10.0
  max = 20.0
  scale = (max - min) / ids.count.to_f # TODO: optimize *

  base_query.select("otus.*, ((#{min} + row_number() OVER ())::float * #{scale}) as priority") # small incrementing numbers for priority
    .joins("INNER JOIN ( SELECT unnest(ARRAY[#{ids.join(',')}]) AS id, row_number() OVER () AS row_num ) AS id_order ON otus.taxon_name_id = id_order.id")
    .order('id_order.row_num')
end
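To see which priority values the SQL expression produces, it can help to evaluate it outside the database. Reading the expression literally, the cast and the multiplication apply to the whole parenthesized sum, giving `(min + row_number) * scale` with `scale = (max - min) / count`. The sketch below computes that in Ruby under the assumption that `row_number()` runs from 1 to `count`, that is, exactly one OTU row per matched id. `taxon_name_priorities` is a hypothetical helper, not part of the class.

```ruby
# Evaluate the priority expression as it is written in the SQL:
#   (min + row_number) * scale, where scale = (max - min) / count
# Assumes row_number() runs 1..count (one OTU row per matched id).
def taxon_name_priorities(count, min: 10.0, max: 20.0)
  scale = (max - min) / count.to_f
  (1..count).map { |row_number| (min + row_number) * scale }
end

puts taxon_name_priorities(5).inspect
puts taxon_name_priorities(20).inspect
```

Under these assumptions the values increase with the row number, so relative order is preserved, but the range they span depends on the number of matches rather than being fixed. This is worth keeping in mind when comparing them with the fixed priorities in QUERIES.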
#autocomplete_taxon_name_extended ⇒ Object
# File 'lib/queries/otu/autocomplete.rb', line 138
def autocomplete_taxon_name_extended
  taxon_names = Queries::TaxonName::Autocomplete.new(query_string, exact:, project_id:).autocomplete # an array, not a query

  ids = taxon_names.collect{|n|
    [
      (n.is_combination? ? n.cached_valid_taxon_name_id : n.id), # Points to the OTU target, if there is one
      n.id, # points to the label target
    ]
  }

  return ::Otu.none if ids.empty?

  ids.uniq!

  min = 10.0
  max = 20.0
  scale = (max - min) / ids.count.to_f # TODO: optimize *

  otus = base_query.select("otus.*, label_target_taxon_name_id, ((#{min} + row_number() OVER ())::float * #{scale}) as priority") # small incrementing numbers for priority
    .joins("INNER JOIN ( SELECT unnest(ARRAY[#{ids.map(&:first).join(',')}]) AS id, unnest(ARRAY[#{ids.map(&:last).join(',')}]) AS label_target_taxon_name_id, row_number() OVER () AS row_num ) AS id_order ON otus.taxon_name_id = id_order.id")
    .order('id_order.row_num')

  otus = scope_autocomplete(otus).includes(:taxon_name)

  otus
end
#base_query ⇒ Object
# File 'lib/queries/otu/autocomplete.rb', line 65
def base_query
  q = ::Otu.all
  q = q.where(project_id:) if project_id.any?
  q
end
#compact_priorities(otus) ⇒ Object
Doesn’t work for extended, as we can have the same OTU with different labels
# File 'lib/queries/otu/autocomplete.rb', line 231
def compact_priorities(otus)
  # Mmmmarg!
  # We may have the same name at different priorities, strike all but the highest/first.
  r = []
  i = {}
  otus.each do |o|
    next if i[o.id]
    r.push o
    i[o.id] = true
  end
  r
end
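The method keeps the first occurrence of each OTU id in a list that is already ordered by priority. For plain arrays, `Array#uniq` with a block behaves the same way, which makes the contrast with the extended results easy to show. The following is a minimal sketch using hypothetical stand-ins (`OtuRow`, `compact_by_id`, `compact_by_id_and_label`), not a proposed replacement for the method.

```ruby
# Hypothetical stand-in for an OTU row carrying its priority.
OtuRow = Struct.new(:id, :priority)

# Keep the first occurrence of each id; input is assumed sorted by priority.
def compact_by_id(rows)
  rows.uniq(&:id)
end

# In extended results the same OTU may appear under several labels,
# so the uniqueness key has to include the label as well.
def compact_by_id_and_label(entries)
  entries.uniq { |entry| [entry[:otu_id], entry[:label]] }
end

rows = [OtuRow.new(5, 1), OtuRow.new(9, 12.5), OtuRow.new(5, 200), OtuRow.new(2, 220)]
puts compact_by_id(rows).map(&:id).inspect

entries = [
  { otu_id: 5, label: 'Aus bus' },
  { otu_id: 5, label: 'Cus bus' }, # same OTU, different matched name
  { otu_id: 5, label: 'Aus bus' }  # exact repeat, dropped
]
puts compact_by_id_and_label(entries).length
```

Compacting by id alone would collapse the three entries to one, which is why this method is noted as unsuitable for the extended case.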
#otu_name_exact ⇒ Object
# File 'lib/queries/otu/autocomplete.rb', line 71
def otu_name_exact
  base_query.where(otus: {name: query_string})
end
#otu_name_similarity ⇒ Object
All records that meet the similarity cutoff. This is intended as a generic replacement for wildcarded results.
Observations:
- was similarity(), experimenting with word_similarity
- 3 letter matches are going to be low probability, matches kick in at 4
# File 'lib/queries/otu/autocomplete.rb', line 86
def otu_name_similarity
  base_query
    .where('otus.name % ?', query_string)
    .where( ApplicationRecord.sanitize_sql_array(["word_similarity('%s', otus.name) > 0.33", query_string]))
    .order('otus.name, length(otus.name)')
end
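The `%` operator in the first condition is the PostgreSQL pg_trgm similarity operator, which passes when `similarity()` exceeds `pg_trgm.similarity_threshold` (0.3 by default). As a rough intuition for what that score measures, the sketch below approximates `similarity()` in plain Ruby for ASCII alphanumeric input. It is an approximation written for illustration: `trigrams` and `trigram_similarity` are hypothetical helpers, and `word_similarity()`, used in the second condition, is a different function that scores the query against the best-matching portion of the name, so its values will generally differ.

```ruby
require 'set'

# Trigrams roughly as pg_trgm builds them: lowercase, split on
# non-alphanumerics, pad each word with two leading spaces and one trailing.
def trigrams(string)
  string.downcase.scan(/[a-z0-9]+/).flat_map do |word|
    padded = "  #{word} "
    (0..padded.length - 3).map { |i| padded[i, 3] }
  end.to_set
end

# Shared trigrams divided by all distinct trigrams in either string.
def trigram_similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = (ta | tb).size
  return 0.0 if union.zero?

  (ta & tb).size.to_f / union
end

puts trigram_similarity('word', 'two words').round(6)
```

Short queries yield few trigrams, so a small number of mismatches changes the ratio sharply; longer queries give more stable scores.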
#otu_name_start_match ⇒ Object
# File 'lib/queries/otu/autocomplete.rb', line 75
def otu_name_start_match
  base_query.where('otus.name ilike ?', query_string + '%')
end
#scope_autocomplete(query) ⇒ Object
# File 'lib/queries/otu/autocomplete.rb', line 271
def scope_autocomplete(query)
  query = query.joins(:taxon_name) if with_taxon_name
  query = query.where.missing(:taxon_name) if with_taxon_name == false
  query = query.joins(:taxon_name).where(otus: {name: nil}) if having_taxon_name_only
  query
end
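The three-state behaviour of `with_taxon_name` (true, false, nil) and the additional `having_taxon_name_only` restriction can be mirrored on in-memory records. The sketch below is an illustration under stated assumptions: `Record` and `filter_records` are hypothetical, and a non-nil `taxon_name_id` is assumed to always point at an existing taxon name, which is what the inner join requires in the real query.

```ruby
# Hypothetical in-memory stand-in for an OTU row.
Record = Struct.new(:id, :name, :taxon_name_id)

# Mirrors the three conditions of scope_autocomplete on plain arrays.
def filter_records(records, with_taxon_name: nil, having_taxon_name_only: false)
  out = records
  out = out.select { |r| r.taxon_name_id } if with_taxon_name          # must have a taxon name
  out = out.reject { |r| r.taxon_name_id } if with_taxon_name == false # must not have one
  # Taxon name present and no OTU name of its own.
  out = out.select { |r| r.taxon_name_id && r.name.nil? } if having_taxon_name_only
  out
end

records = [
  Record.new(1, 'Aus sp. 1', nil), # named OTU without a taxon name
  Record.new(2, nil, 10),          # OTU identified only by its taxon name
  Record.new(3, 'voucher A', 11)   # OTU with both a name and a taxon name
]

puts filter_records(records, with_taxon_name: true).map(&:id).inspect
puts filter_records(records, with_taxon_name: false).map(&:id).inspect
puts filter_records(records, having_taxon_name_only: true).map(&:id).inspect
```

The explicit `== false` comparison matters: a nil value means "ignore this filter", so a simple falsy check would wrongly exclude OTUs with taxon names.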