Class: Queries::Otu::Autocomplete
- Inherits: Query::Autocomplete
  - Object
  - Query
  - Query::Autocomplete
  - Queries::Otu::Autocomplete
- Defined in: lib/queries/otu/autocomplete.rb
Overview
See Query::Autocomplete for the optimization strategy per name. There are 4 classes of name, each of which uses the same strategy: OTU name, original TaxonName, TaxonName, and CommonName. We then apply a global priority, pulling the best names from each sub-strategy to the top.
Constant Summary
- QUERIES =
  Keys are method names. The existence of each method is checked before the query is requested.

  {
    # OTU
    autocomplete_taxon_name_hybrid: {priority: 1},
    otu_name_exact: {priority: 2}, # Was 1
    autocomplete_exact_id: {priority: 2},
    autocomplete_identifier_cached_exact: {priority: 3},
    otu_name_start_match: {priority: 200},
    otu_name_similarity: {priority: 220},

    # TaxonName
    autocomplete_taxon_name: {priority: nil}, # Priority is slotted from 10 .. 20

    # These are all approximately covered in the blanket taxon_name autocomplete
    # taxon_name_name_exact: {priority: 10},
    # taxon_name_identifier_exact: {priority: 10},
    # taxon_name_name_start_match: {priority: 100},
    # taxon_name_name_high_cuttoff: {priority: 200},

    # CommonName
    # These should all be covered/moved to common_name_autocomplete,
    autocomplete_common_name_exact: {priority: 300},
    autocomplete_common_name_like: {priority: 1000}

    # common_name_identifier_exact: {priority: 10},
    # common_name_name_start_match: {priority: 100},
    # common_name_name_similarity: {priority: 200},
  }.freeze
Instance Attribute Summary
- #exact ⇒ Boolean
  &exact=<"true"|"false">. If 'true', only results where #name = query_string are returned (no fuzzy matching).
- #having_taxon_name_only ⇒ Object
  Boolean, nil. true - only return Otus with name = nil; false, nil - no effect.
- #include_common_names ⇒ Object
  Boolean, nil. true - 'pre-load' common names with otus; false, nil - ignored.
- #include_taxon_name ⇒ Object
  Only pertinent to the TaxonName autocomplete.
- #with_taxon_name ⇒ Object
  Boolean, nil. true - OTU must have taxon name; false - OTU must not have taxon name; nil - ignored.
Attributes inherited from Query::Autocomplete
#dynamic_limit, #project_id, #query_string
Attributes inherited from Query
Instance Method Summary
- #api_autocomplete ⇒ Object
  DEPRECATED. Maintains valid_taxon_name_id needed for API.
- #api_autocomplete_extended ⇒ Array
  An autocomplete result that permits displaying the TaxonName as originally matched.
- #autocomplete ⇒ Object
- #autocomplete_base(targets = QUERIES) ⇒ Object
- #autocomplete_taxon_name ⇒ Scope
  Pulls the result of a TaxonName autocomplete.
- #autocomplete_taxon_name_extended ⇒ Object
- #autocomplete_taxon_name_hybrid ⇒ Object
  For names like Tapinoma CASC_2231.
- #base_query ⇒ Object
- #compact_priorities(otus) ⇒ Object
  Doesn't work for extended, as we can have the same OTU with different labels.
- #initialize(string, project_id: nil, having_taxon_name_only: false, with_taxon_name: nil, exact: 'false', include_common_names: false, include_taxon_name: false) ⇒ Autocomplete (constructor)
  A new instance of Autocomplete.
- #otu_name_exact ⇒ Object
- #otu_name_similarity ⇒ Object
  All records that meet the similarity cutoff. This is intended as a generic replacement for wildcarded results.
- #otu_name_start_match ⇒ Object
- #scope_autocomplete(query) ⇒ Object
Methods inherited from Query::Autocomplete
#autocomplete_cached, #autocomplete_cached_wildcard_anywhere, #autocomplete_common_name_exact, #autocomplete_common_name_like, #autocomplete_exact_id, #autocomplete_exactly_named, #autocomplete_named, #autocomplete_ordered_wildcard_pieces_in_cached, #cached_facet, #combine_or_clauses, #common_name_name, #common_name_table, #common_name_wild_pieces, #exactly_named, #fragments, #integers, #least_levenshtein, #match_wildcard_end_in_cached, #match_wildcard_in_cached, #named, #only_ids, #only_integers?, #parent, #parent_child_join, #parent_child_where, #pieces, #safe_integers, #scope, #string_fragments, #wildcard_wrapped_integers, #wildcard_wrapped_years, #with_cached, #with_cached_like, #with_id, #with_project_id, #year_letter, #years
Methods inherited from Query
#alphabetic_strings, #alphanumeric_strings, base_name, #base_name, #build_terms, #cached_facet, #end_wildcard, #levenshtein_distance, #match_ordered_wildcard_pieces_in_cached, #no_terms?, referenced_klass, #referenced_klass, #referenced_klass_except, #referenced_klass_intersection, #referenced_klass_union, #start_and_end_wildcard, #start_wildcard, #table, #wildcard_pieces
Constructor Details
#initialize(string, project_id: nil, having_taxon_name_only: false, with_taxon_name: nil, exact: 'false', include_common_names: false, include_taxon_name: false) ⇒ Autocomplete
Returns a new instance of Autocomplete.
# File 'lib/queries/otu/autocomplete.rb', line 70

def initialize(
  string,
  project_id: nil,
  having_taxon_name_only: false,
  with_taxon_name: nil,
  exact: 'false',
  include_common_names: false,
  include_taxon_name: false
)
  super(string, project_id:)

  @having_taxon_name_only = boolean_param({having_taxon_name_only:}, :having_taxon_name_only)
  @with_taxon_name = boolean_param({with_taxon_name:}, :with_taxon_name)

  # TODO: move to mode
  @exact = boolean_param({exact:}, :exact)

  @include_common_names = boolean_param({include_common_names:}, :include_common_names)
  @include_taxon_name = boolean_param({include_taxon_name:}, :include_taxon_name)
end
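The constructor passes every flag through boolean_param, whose definition is not shown on this page. Purely as an illustration, the sketch below shows one way a request-style value such as exact: 'false' could be coerced to true, false, or nil. The helper name coerce_boolean and the set of accepted values are assumptions, not the library's implementation.

```ruby
# Hypothetical stand-in for a boolean parameter coercion helper.
# Accepts real booleans, the strings 'true'/'false', or nil.
def coerce_boolean(value)
  case value
  when true, 'true' then true
  when false, 'false' then false
  else nil # anything else (including nil) is treated as "not set"
  end
end

coerce_boolean('false') # the String default used for exact: becomes false
coerce_boolean(nil)     # stays nil, which the tri-state attributes treat as "ignored"
```

Keeping nil distinct from false matters for attributes such as with_taxon_name, where the two values are documented to behave differently.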
Instance Attribute Details
#exact ⇒ Boolean
&exact=<"true"|"false">. If 'true', only results where #name = query_string are returned (no fuzzy matching).

# File 'lib/queries/otu/autocomplete.rb', line 27

def exact
  @exact
end
#having_taxon_name_only ⇒ Object
Returns Boolean, nil.
true - only return Otus with name = nil
false, nil - no effect

# File 'lib/queries/otu/autocomplete.rb', line 16

def having_taxon_name_only
  @having_taxon_name_only
end
#include_common_names ⇒ Object
Returns Boolean, nil.
true - 'pre-load' common names with otus
false, nil - ignored

# File 'lib/queries/otu/autocomplete.rb', line 32

def include_common_names
  @include_common_names
end
#include_taxon_name ⇒ Object
Only pertinent to the TaxonName autocomplete.

# File 'lib/queries/otu/autocomplete.rb', line 40

def include_taxon_name
  @include_taxon_name
end
#with_taxon_name ⇒ Object
Returns Boolean, nil.
true - OTU must have taxon name
false - OTU must not have taxon name
nil - ignored

# File 'lib/queries/otu/autocomplete.rb', line 22

def with_taxon_name
  @with_taxon_name
end
Instance Method Details
#api_autocomplete ⇒ Object
DEPRECATED. Maintains valid_taxon_name_id needed for API.
Considerations:
otus -> taxon names -> valid taxon name_id <- otu can return more OTUs than the original query,
because there can be multiple OTUs for the valid name of an invalid original result.
Right now we pick the first valid OTU for the name with distinct on().

# File 'lib/queries/otu/autocomplete.rb', line 154

def api_autocomplete
  @with_taxon_name = true

  # This limit() has more impact now. Since all
  # names are loaded large matches can swamp exact names
  # before priority ordering is applied. May require tuning.
  otus = compact_priorities( autocomplete_base.limit(30) )

  otu_order = otus.map(&:id).uniq

  f = ::Otu.where(id: otu_order)
    .joins('left join taxon_names t1 on otus.taxon_name_id = t1.id')
    .joins('left join otus o2 on t1.cached_valid_taxon_name_id = o2.taxon_name_id')
    .select('distinct on (otus.id) otus.id, otus.name, otus.taxon_name_id, COALESCE(o2.id, otus.id) as otu_valid_id')

  f.sort_by.with_index { |item, idx| [(otu_order.index(item.id) || 9999), (idx || 9999)] }
end
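The closing sort_by in api_autocomplete puts the rows of the second query back into the priority order captured in otu_order, since a where(id: ...) lookup does not by itself preserve the order of the ids it was given. A minimal sketch of that reordering step with plain Ruby objects follows; the FetchedRow struct and the sample ids are illustrative assumptions.

```ruby
FetchedRow = Struct.new(:id, :name)

# Ids in the order produced by the prioritized autocomplete.
otu_order = [42, 7, 19]

# Rows as a database might hand them back, in some other order.
fetched = [FetchedRow.new(7, 'b'), FetchedRow.new(19, 'c'), FetchedRow.new(42, 'a')]

# Sort by position in otu_order; ids missing from otu_order sink to the end,
# and the fetch index breaks ties.
sorted = fetched.sort_by.with_index { |item, idx| [(otu_order.index(item.id) || 9999), idx] }

sorted.map(&:id) # => [42, 7, 19]
```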
#api_autocomplete_extended ⇒ Array
An autocomplete result that permits displaying the TaxonName as originally matched. Note that otu: is really only useful when displaying otus without &having_taxon_name_only=true. We don't, for example, make use of this element there.

# File 'lib/queries/otu/autocomplete.rb', line 228

def api_autocomplete_extended
  otu_queries = QUERIES.dup
  otu_queries.delete(:autocomplete_taxon_name)

  base_otus = autocomplete_base(otu_queries).limit(30)
  taxon_name_otus = autocomplete_taxon_name_extended

  r = []

  base_otus.each do |o|
    r.push({
      otu: o, # contains priority
      label_target: o
    })
  end

  taxon_name_otus.each do |o|
    r.push({
      otu: o,
      label_target: (o.label_target_taxon_name_id ? ::TaxonName.find(o.label_target_taxon_name_id) : o.taxon_name ) # is o.taxon_name true?!
    })
  end

  # Keep a unique set of otu + label (to render)
  seen = Set.new

  # The compacted result
  compact = []

  r.each do |h|
    g = h[:label_target].id.to_s + h[:label_target].class.name
    m = [ h[:otu].id, g ]

    next if seen.include?( m )
    seen << m
    compact.push h
  end

  compact.sort!{|c,d| (c[:otu].priority || 999) <=> (d[:otu].priority || 999 )}

  # TODO: Refactor to remove extra query and assignment of otu_valid_id. This is ugly.
  otu_order = compact.collect{|d| d[:otu].id}

  # Extra query is painful.
  f = ::Otu.where(id: otu_order)
    .joins('left join taxon_names t1 on otus.taxon_name_id = t1.id')
    # .joins('left join otus o2 on t1.cached_valid_taxon_name_id = o2.taxon_name_id')
    .joins('left join otus o2 on t1.cached_valid_taxon_name_id = o2.taxon_name_id and o2.taxon_name_id <> otus.taxon_name_id') # See https://github.com/sfg-taxonpages/orthoptera/issues/90
    .select('distinct on (otus.id) otus.id, otus.name, otus.taxon_name_id, COALESCE(o2.id, otus.id) as otu_valid_id')

  compact.each do |h|
    h[:otu_valid_id] = f.select{|j| j.id == h[:otu].id}.first.otu_valid_id
  end

  compact
end
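The middle of api_autocomplete_extended de-duplicates on the pair (OTU id, label target) and then sorts by priority, with missing priorities sorting last. The sketch below reproduces just that step on in-memory data. The OtuRow and LabelRow structs and all sample values are assumptions made for illustration.

```ruby
require 'set'

OtuRow = Struct.new(:id, :priority)
LabelRow = Struct.new(:id)

otu_a = OtuRow.new(1, 200)
otu_b = OtuRow.new(2, nil)  # no priority assigned
otu_c = OtuRow.new(1, 12.5) # same OTU id as otu_a, reached via a different label

rows = [
  { otu: otu_a, label_target: otu_a },
  { otu: otu_a, label_target: otu_a },          # exact duplicate, dropped
  { otu: otu_b, label_target: LabelRow.new(9) },
  { otu: otu_c, label_target: LabelRow.new(9) } # same OTU id, different label, kept
]

seen = Set.new
compact = rows.each_with_object([]) do |h, acc|
  # Key on the OTU id plus the label's id and class name.
  key = [h[:otu].id, h[:label_target].id.to_s + h[:label_target].class.name]
  next if seen.include?(key)
  seen << key
  acc << h
end

# Rows without a priority sort last.
compact.sort_by! { |h| h[:otu].priority || 999 }

compact.map { |h| [h[:otu].id, h[:otu].priority] } # => [[1, 12.5], [1, 200], [2, nil]]
```

Because the key includes the label, the same OTU can legitimately appear more than once, which is why compact_priorities is not used on this path.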
#autocomplete ⇒ Object
# File 'lib/queries/otu/autocomplete.rb', line 301

def autocomplete
  compact_priorities( autocomplete_base.limit(40) )
end
#autocomplete_base(targets = QUERIES) ⇒ Object
# File 'lib/queries/otu/autocomplete.rb', line 305

def autocomplete_base(targets = QUERIES)
  queries = []

  targets.each do |q, p|
    if self.respond_to?(q)
      a = send(q)
      next if a.nil? # query has returned nil

      y = p[:priority]
      a = scope_autocomplete(a)
      a = a.select("otus.*, #{y} as priority") unless y.nil?
      queries.push a
    end
  end

  queries.compact!

  q = referenced_klass_union(queries).order('priority')

  q = include_common_names ? q.includes(:common_names) : q
  q = include_taxon_name ? q.includes(:taxon_name) : q

  q
end
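The loop in autocomplete_base is a table-driven dispatch: each key names a method, the method is called only if the object responds to it, nil results are skipped, and every surviving result is stamped with the priority from the table. The toy class below sketches that pattern without a database. TinyAutocomplete, its strategies, and the sample names are all hypothetical.

```ruby
# Hypothetical miniature of the dispatch loop.
class TinyAutocomplete
  STRATEGIES = {
    exact_match: { priority: 2 },
    start_match: { priority: 200 },
    not_implemented: { priority: 5 } # skipped: no such method exists
  }.freeze

  NAMES = ['Aus', 'Aus bus', 'Cus'].freeze

  def initialize(query_string)
    @query_string = query_string
  end

  def exact_match
    NAMES.select { |n| n == @query_string }
  end

  def start_match
    hits = NAMES.select { |n| n.start_with?(@query_string) }
    hits.empty? ? nil : hits # nil signals "nothing to contribute"
  end

  def results(targets = STRATEGIES)
    found = []
    targets.each do |method_name, options|
      next unless respond_to?(method_name)
      rows = send(method_name)
      next if rows.nil?
      rows.each { |name| found << [name, options[:priority]] }
    end
    # Order by priority; the insertion index keeps ties in a stable order.
    found.each_with_index.sort_by { |(_, priority), i| [priority, i] }.map(&:first)
  end
end

TinyAutocomplete.new('Aus').results
# => [["Aus", 2], ["Aus", 200], ["Aus bus", 200]]
```

The same name can surface under several strategies, as "Aus" does here; in the class documented above, removing such repeats is left to compact_priorities.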
#autocomplete_taxon_name ⇒ Scope
Returns the result of a TaxonName autocomplete, re-cast in terms of an OTU query. The order returned is maintained; this is expensive, but maintaining order is key.

# File 'lib/queries/otu/autocomplete.rb', line 130

def autocomplete_taxon_name
  taxon_names = Queries::TaxonName::Autocomplete.new(query_string, exact:, project_id:).autocomplete # an array, not a query

  ids = taxon_names.collect{|n| n.is_combination? ? n.cached_valid_taxon_name_id : n.id} # TODO: Experiment with :cached_valid_taxon_name_id)
  # We assume we want to land on Valid OTUs, but see #

  return nil if ids.empty?

  min = 10.0
  max = 20.0
  scale = (max - min) / ids.count.to_f

  # TODO: optimize *
  base_query.select("otus.*, ((#{min} + row_number() OVER ())::float * #{scale}) as priority") # small incrementing numbers for priority
    .joins("INNER JOIN ( SELECT unnest(ARRAY[#{ids.join(',')}]) AS id, row_number() OVER () AS row_num ) AS id_order ON otus.taxon_name_id = id_order.id")
    .order('id_order.row_num')
end
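QUERIES notes that TaxonName results are slotted into priorities 10 .. 20, which places them between the exact OTU matches and the looser OTU name matches. The sketch below shows one way to spread N already-ordered ids evenly over such a band. The helper name slot_priorities and the formula min + position * scale are assumptions about the intent; the SQL expression in the method groups its terms differently, so the numbers here are illustrative only.

```ruby
# Hypothetical helper: assign evenly spaced priorities within (min, max]
# to ids that are already in their preferred order.
def slot_priorities(ids, min: 10.0, max: 20.0)
  return {} if ids.empty? # avoid dividing by zero
  scale = (max - min) / ids.count.to_f
  ids.each_with_index.to_h { |id, i| [id, min + (i + 1) * scale] }
end

slot_priorities([31, 8, 15, 4])
# 31 -> 12.5, 8 -> 15.0, 15 -> 17.5, 4 -> 20.0
```

Earlier ids receive smaller numbers, so the order returned by the TaxonName autocomplete survives a later sort on priority.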
#autocomplete_taxon_name_extended ⇒ Object
# File 'lib/queries/otu/autocomplete.rb', line 172

def autocomplete_taxon_name_extended
  taxon_names = Queries::TaxonName::Autocomplete.new(query_string, exact:, project_id:).autocomplete # an array, not a query

  ids = taxon_names.collect{|n|
    [
      (n.is_combination? ? n.cached_valid_taxon_name_id : n.id), # Points to the OTU target, if there is one
      n.id, # points to the label target
    ]
  }

  return ::Otu.none if ids.empty?

  ids.uniq!

  min = 10.0
  max = 20.0
  scale = (max - min) / ids.count.to_f

  # TODO: optimize *
  otus = base_query
    .select(<<~SQL.squish)
    .joins(<<~SQL.squish)
      INNER JOIN (
        SELECT
          unnest(ARRAY[#{ids.map(&:first).join(',')}]) AS id,
          unnest(ARRAY[#{ids.map(&:last).join(',')}]) AS label_target_taxon_name_id,
          row_number() OVER () AS row_num
      ) AS id_order
      ON otus.taxon_name_id = id_order.id
    SQL
    .order('id_order.row_num')

  otus = scope_autocomplete(otus)

  # We could currently get away with using .includes here, but if we were
  # to ever filter or group `otus` on a non-otu table like id_order then
  # .includes would do a join on the associated table below and we could
  # get duplicate otu.id result rows that would be de-duplicated by rails,
  # losing vital (non-dup) id_order info. So just always do preload here
  # instead.
  otus = include_taxon_name ? otus.preload(:taxon_name) : otus
  otus = include_common_names ? otus.preload(:common_names) : otus

  otus
end
#autocomplete_taxon_name_hybrid ⇒ Object
For names like Tapinoma CASC_2231.

# File 'lib/queries/otu/autocomplete.rb', line 116

def autocomplete_taxon_name_hybrid
  if terms.length == 2
    base_query
      .joins(:taxon_name)
      .where('taxon_names.cached % ? AND otus.name % ?', terms.first, terms.second)
      .order('taxon_names.cached, otus.name, length(taxon_names.cached), length(otus.name)')
  else
    nil
  end
end
#base_query ⇒ Object
# File 'lib/queries/otu/autocomplete.rb', line 87

def base_query
  q = ::Otu.all
  q = q.where(project_id:) if project_id.any? # TODO: this needs to be a wrapping layer check, not here
  q
end
#compact_priorities(otus) ⇒ Object
Doesn't work for extended, as we can have the same OTU with different labels.

# File 'lib/queries/otu/autocomplete.rb', line 287

def compact_priorities(otus)
  # Mmmmarg!
  # We may have the same name at different priorities, strike all but the highest/first.
  r = []
  i = {}
  otus.each do |o|
    next if i[o.id]
    r.push o
    i[o.id] = true
  end
  r
end
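compact_priorities keeps the first occurrence of each OTU id and drops later ones, so when the input is already ordered by priority, each OTU survives at its best priority. The same effect can be sketched on plain objects with Array#uniq, which also retains the first element per key. The Candidate struct and the sample values are assumptions.

```ruby
Candidate = Struct.new(:id, :priority)

# Already ordered by priority, as an ORDER BY on priority would leave them.
candidates = [
  Candidate.new(5, 2),
  Candidate.new(9, 12.5),
  Candidate.new(5, 200), # same OTU matched again by a weaker strategy
  Candidate.new(9, 220)
]

# Array#uniq keeps the first element seen for each block value.
compacted = candidates.uniq { |c| c.id }

compacted.map { |c| [c.id, c.priority] } # => [[5, 2], [9, 12.5]]
```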
#otu_name_exact ⇒ Object
# File 'lib/queries/otu/autocomplete.rb', line 93

def otu_name_exact
  base_query.where(otus: {name: query_string})
end
#otu_name_similarity ⇒ Object
All records that meet the similarity cutoff.
- This is intended as a generic replacement for wildcarded results.
Observations:
- Was similarity(); experimenting with word_similarity.
- 3-letter matches are going to be low probability; matches kick in at 4.

# File 'lib/queries/otu/autocomplete.rb', line 108

def otu_name_similarity
  base_query
    .where('otus.name % ?', query_string)
    .where( ApplicationRecord.sanitize_sql_array(["word_similarity('%s', otus.name) > 0.33", query_string]))
    .order('otus.name, length(otus.name)')
end
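To give a feel for why short queries rarely clear a trigram cutoff, the sketch below computes a plain trigram similarity in Ruby. It assumes the % operator and word_similarity used above are the PostgreSQL pg_trgm ones, and it approximates pg_trgm's similarity() for ASCII input only. It does not reproduce word_similarity, so the numbers show the trend rather than the exact values the query would see. The helper names are hypothetical.

```ruby
require 'set'

# Hypothetical helper: trigram set for a string. Lowercases, splits on
# non-alphanumerics, and pads each word with two leading spaces and one
# trailing space before cutting it into 3-character windows.
def trigrams(string)
  string.downcase.scan(/[a-z0-9]+/).flat_map { |word|
    padded = "  #{word} "
    (0..padded.length - 3).map { |i| padded[i, 3] }
  }.to_set
end

# Shared trigrams divided by all distinct trigrams of both strings.
def trigram_similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = (ta | tb).size
  return 0.0 if union.zero?
  (ta & tb).size.to_f / union
end

trigram_similarity('tap', 'Tapinoma')  # 3 shared of 10 distinct trigrams
trigram_similarity('tapi', 'Tapinoma') # 4 shared of 10 distinct trigrams
```

In this example the three-letter query shares 3 of 10 trigrams with the target and the four-letter query shares 4 of 10, which is in line with the observation that matches start to kick in at four letters.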
#otu_name_start_match ⇒ Object
# File 'lib/queries/otu/autocomplete.rb', line 97

def otu_name_start_match
  base_query.where('otus.name ilike ?', query_string + '%')
end
#scope_autocomplete(query) ⇒ Object
# File 'lib/queries/otu/autocomplete.rb', line 334

def scope_autocomplete(query)
  query = query.joins(:taxon_name) if with_taxon_name
  query = query.where.missing(:taxon_name) if with_taxon_name == false
  query = query.where(otus: {name: nil}) if having_taxon_name_only
  query
end
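scope_autocomplete treats with_taxon_name as a three-state flag: true and false apply opposite filters, while nil applies none, which is why the second line compares against false explicitly. A minimal in-memory analogue of the three rules is sketched below. The ScopedRecord struct, the scope_records helper, and the sample data are assumptions.

```ruby
ScopedRecord = Struct.new(:name, :taxon_name_id)

# Hypothetical in-memory analogue of the three scoping rules.
def scope_records(records, with_taxon_name: nil, having_taxon_name_only: false)
  records = records.select { |r| r.taxon_name_id } if with_taxon_name
  records = records.reject { |r| r.taxon_name_id } if with_taxon_name == false # nil must not match
  records = records.select { |r| r.name.nil? } if having_taxon_name_only
  records
end

records = [
  ScopedRecord.new('CASC_2231', 10),
  ScopedRecord.new(nil, 11),
  ScopedRecord.new('unplaced morphospecies', nil)
]

scope_records(records, with_taxon_name: true).size        # => 2
scope_records(records, with_taxon_name: false).size       # => 1
scope_records(records, with_taxon_name: nil).size         # => 3
scope_records(records, having_taxon_name_only: true).size # => 1
```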