Module: Vendor::Colrapi
- Defined in:
- lib/vendor/colrapi.rb
Overview
A middle-layer wrapper between Colrapi and TaxonWorks
Constant Summary
- DATASETS =
{
  col: '3LR',           # The Human edited compilation
  col_extended: '3LXR'  # Human plus algorithmic extensions
}.freeze
Class Method Summary
-
.align_classification(taxonworks_object, colrapi_result) ⇒ Array
2 row alignment facilitator.
-
.ancestors(taxon_id) ⇒ Array<Hash>
Returns the ancestor classification chain for a CoL taxon.
-
.ancestors_via_parent_id(dataset_id, taxon_id, max_depth: 20) ⇒ Array<Hash>
Builds an ancestor chain for external/denormed datasets by following the parentId field of successive taxon records.
-
.build_extension(col_result, project_id, dataset_id: nil) ⇒ Hash
Builds an alignment hash comparing a CoL nameusage result against TaxonNames in the project.
-
.col_backbone_dataset?(dataset_id) ⇒ Boolean
Returns true when dataset_id refers to one of the CoL backbone datasets that support the classification subresource for ancestor retrieval.
-
.col_rank_sort(rank_name, col_code) ⇒ Object
Maps a CoL rank name ('genus', 'family', …) and CoL nomenclatural code ('zoological', 'botanical', 'bacterial', 'viral') to the TaxonWorks RANK_SORT index.
-
.collection_object_scientific_name(collection_object) ⇒ Object
Returns text only: the cached taxon name, or the OTU name when the OTU has no taxon name, for the most recent determination. Open question: extend to buffered with GNA in the middle layer?
-
.datasets(q:, limit: 20) ⇒ Array<Hash>
Searches CoL datasets by name string.
-
.extract_subgenus_name(name) ⇒ Object
Subgenus names in CoL classification arrive as "Genus (Subgenus)" combinations.
-
.name_status(taxonworks_object, colrapi_result) ⇒ Object
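Compares the TaxonWorks name for an object against CoL usages and returns { taxonworks_name:, col_usages: [ { usage: {}, accepted: {} } ], provisional_status: }.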
-
.search(name_string, dataset_id: nil) ⇒ Hash
Searches the Catalogue of Life by name string.
-
.uninomial_name(name_hash) ⇒ String?
Returns the single-word name component suitable for storing as a TaxonWorks Protonym name.
Class Method Details
.align_classification(taxonworks_object, colrapi_result) ⇒ Array
2 row alignment facilitator
# File 'lib/vendor/colrapi.rb', line 27
def self.align_classification(taxonworks_object, colrapi_result)
  r = []
end
.ancestors(taxon_id) ⇒ Array<Hash>
Returns the ancestor classification chain for a CoL taxon.
Uses Colrapi.taxon with subresource: 'classification'. Response is an Array of hashes with keys: 'id', 'name' (String, not hash), 'authorship', 'rank', 'label', 'labelHtml'.
Only valid for backbone datasets (DATASETS[:col], DATASETS[:col_extended]). For external datasets use ancestors_via_parent_id instead.
# File 'lib/vendor/colrapi.rb', line 150
def self.ancestors(taxon_id)
  ::Colrapi.taxon(DATASETS[:col], taxon_id: taxon_id, subresource: 'classification')
rescue => e
  Rails.logger.warn "Vendor::Colrapi.ancestors error: #{e.message}"
  []
end
.ancestors_via_parent_id(dataset_id, taxon_id, max_depth: 20) ⇒ Array<Hash>
Builds an ancestor chain for external/denormed datasets by following the parentId field of successive taxon records.
External datasets (like the Mammal Diversity Database) are ingested into ChecklistBank without a pre-built classification subresource. Instead, each nameusage record carries a parentId pointing to the immediate parent within the same dataset.
Returns entries in the same format as the classification subresource used by ancestors():
{ 'id', 'name' (String uninomial), 'rank', 'authorship', 'label', 'labelHtml' }
The chain is collected proximal-first (immediate parent first) and reversed before it is returned, so the returned array runs distal-first (see the comment in the method body). The starting taxon itself is NOT included; only its ancestors are.
# File 'lib/vendor/colrapi.rb', line 184
def self.ancestors_via_parent_id(dataset_id, taxon_id, max_depth: 20)
  chain = []
  visited = Set.new

  initial = ::Colrapi.taxon(dataset_id, taxon_id: taxon_id)
  return chain if initial.blank?

  current_id = initial['parentId']

  max_depth.times do
    break if current_id.blank? || visited.include?(current_id)
    visited << current_id

    taxon = ::Colrapi.taxon(dataset_id, taxon_id: current_id)
    break if taxon.blank?

    chain << {
      'id' => current_id,
      'name' => uninomial_name(taxon['name']).to_s,
      'rank' => taxon.dig('name', 'rank'),
      'authorship' => taxon.dig('name', 'authorship'),
      'label' => taxon.fetch('label', '').to_s,
      'labelHtml' => taxon.fetch('labelHtml', '').to_s
    }

    current_id = taxon['parentId']
  end

  # Return distal-first (kingdom before genus) to match the classification subresource
  # order returned by ancestors(), so build_extension can treat both paths uniformly.
  chain.reverse
rescue => e
  Rails.logger.warn "Vendor::Colrapi.ancestors_via_parent_id error: #{e.message}"
  []
end
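The parentId traversal described above can be tried without network access. The following is a minimal sketch, assuming an in-memory Hash in place of Colrapi.taxon; `walk_parents`, `SAMPLE_RECORDS`, `CYCLIC_RECORDS`, and all ids are hypothetical, and the records are simplified to flat 'name' and 'rank' strings.

```ruby
require 'set'

# Hypothetical in-memory stand-in for Colrapi.taxon(dataset_id, taxon_id:).
# Ids are made up; records are simplified to flat 'name' and 'rank' strings.
SAMPLE_RECORDS = {
  'sp'  => { 'parentId' => 'gen', 'name' => 'sapiens',   'rank' => 'species' },
  'gen' => { 'parentId' => 'fam', 'name' => 'Homo',      'rank' => 'genus' },
  'fam' => { 'parentId' => nil,   'name' => 'Hominidae', 'rank' => 'family' }
}.freeze

# Two records pointing at each other, to exercise the visited-set guard.
CYCLIC_RECORDS = {
  'a' => { 'parentId' => 'b', 'name' => 'A', 'rank' => 'genus' },
  'b' => { 'parentId' => 'a', 'name' => 'B', 'rank' => 'genus' }
}.freeze

# Follows parentId links upward, stopping on a missing record, a repeated id,
# or after max_depth steps. The starting record itself is not added.
def walk_parents(records, taxon_id, max_depth: 20)
  chain = []
  visited = Set.new

  initial = records[taxon_id]
  return chain if initial.nil?

  current_id = initial['parentId']
  max_depth.times do
    break if current_id.nil? || visited.include?(current_id)
    visited << current_id

    taxon = records[current_id]
    break if taxon.nil?

    chain << { 'id' => current_id, 'name' => taxon['name'], 'rank' => taxon['rank'] }
    current_id = taxon['parentId']
  end

  # Collected immediate-parent-first, then reversed as in the method above.
  chain.reverse
end

p walk_parents(SAMPLE_RECORDS, 'sp').map { |a| a['name'] }  # => ["Hominidae", "Homo"]
p walk_parents(CYCLIC_RECORDS, 'a').size                    # => 2
```

The visited set and max_depth both bound the loop, so a malformed dataset with circular parentId links costs at most max_depth lookups.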
.build_extension(col_result, project_id, dataset_id: nil) ⇒ Hash
Builds an alignment hash comparing a CoL nameusage result against TaxonNames in the project.
col_result is a flat nameusage hash as returned by search (no 'usage' wrapper):
{ 'id' => '6MB3T', 'status' => 'accepted',
'name' => { 'scientificName' => 'Homo sapiens', 'rank' => 'species',
'authorship' => 'Linnaeus, 1758',
'combinationAuthorship' => { 'authors' => [...], 'year' => '1758' } },
'label' => 'Homo sapiens Linnaeus, 1758', … }
Classification entries from ancestors() have:
{ 'id' => '636X2', 'name' => 'Homo', 'rank' => 'genus', 'label' => 'Homo', … }
Note: in classification entries 'name' is a plain String, not a hash.
# File 'lib/vendor/colrapi.rb', line 239
def self.build_extension(col_result, project_id, dataset_id: nil)
  col_key = col_result['id']
  col_name = uninomial_name(col_result['name'])
  col_rank = col_result.dig('name', 'rank')&.downcase
  col_name = extract_subgenus_name(col_name) if col_rank == 'subgenus'
  col_status = col_result['status']
  col_authorship = col_result.dig('name', 'authorship')
  col_year = col_result.dig('name', 'combinationAuthorship', 'year') ||
    col_result.dig('name', 'basionymOrCombinationAuthorship', 'year')
  # CoL nomenclatural code: 'zoological', 'botanical', 'bacterial', 'viral'
  col_code = col_result.dig('name', 'code')

  # Dataset used for the search (target row).
  col_dataset_id = dataset_id.presence || DATASETS[:col]

  # Backbone datasets (main CoL, extended CoL) expose a classification subresource.
  # External/denormed datasets must be traversed via iterative parentId lookups.
  # Ancestor records carry the dataset_id of whichever source they came from.
  #
  # For synonyms with a single accepted target, CoL attaches the synonym under its accepted
  # name in the tree, so fetching classification via the synonym's own ID returns the accepted
  # name in the chain. Use the accepted name's ID for the lookup instead, then strip it out
  # by ID (CoL's classification endpoint includes the queried taxon itself as the most
  # proximal entry).
  # Ambiguous synonyms have no single accepted target, so we look up via their own ID.
  non_accepted = col_status.present? && col_status != 'accepted'
  accepted_id = col_status == 'synonym' ? col_result.dig('accepted', 'id').presence : nil
  ancestor_lookup_key = accepted_id || col_key

  ancestor_chain, ancestor_dataset_id = if ancestor_lookup_key.present?
    if col_backbone_dataset?(col_dataset_id)
      [ancestors(ancestor_lookup_key), DATASETS[:col]]
    else
      [ancestors_via_parent_id(col_dataset_id, ancestor_lookup_key), col_dataset_id]
    end
  else
    [[], col_dataset_id]
  end

  ancestor_chain = ancestor_chain.reject { |a| a['id'] == accepted_id } if accepted_id

  # For any non-accepted name, strip ancestors at or below the name's own rank.
  # CoL places non-accepted names under their accepted name, so the accepted name's
  # classification chain can include same- or lower-ranked entries that are not valid
  # parents of the queried name (e.g. a genus synonym whose accepted name is a subgenus).
  if non_accepted && col_rank.present?
    target_sort = col_rank_sort(col_rank, col_code)
    if target_sort
      ancestor_chain = ancestor_chain.reject { |a|
        anc_sort = col_rank_sort(a['rank']&.downcase, col_code)
        anc_sort && anc_sort >= target_sort
      }
    end
  end

  # Drop suprakingdom ranks (e.g. 'domain') that have no equivalent in TaxonWorks
  # nomenclatural codes. Kingdom is the highest rank we include.
  # CoL classification returns proximal→distal (immediate parent first); reverse to kingdom-first.
  ancestor_chain = ancestor_chain.reject { |a| a['rank']&.downcase == 'domain' }.reverse

  alignment = ancestor_chain.map do |ancestor|
    rank = ancestor['rank']&.downcase
    # In classification entries 'name' is a plain String (the uninomial name)
    anc_name = ancestor['name'].is_a?(String) ? ancestor['name'] : ancestor.dig('name', 'scientificName')
    anc_name = extract_subgenus_name(anc_name) if rank == 'subgenus'
    col_id = ancestor['id']

    scope = ::TaxonName.where(cached: anc_name) # !!!
    scope = scope.where(project_id:) if project_id.present?
    tw_record = scope.first

    {
      rank:,
      col_name: anc_name,
      col_id:,
      dataset_id: ancestor_dataset_id,
      col_authorship: ancestor['authorship'].presence,
      taxonworks_id: tw_record&.id,
      taxonworks_name: tw_record&.cached,
      match: tw_record ? 'exact' : 'none'
    }
  end

  {
    col_key:,
    col_name:,
    col_status:,
    col_authorship:,
    col_year:,
    col_rank:,
    col_code:,
    col_dataset_id:,
    alignment:
  }
end
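The synonym handling in build_extension is easier to follow in isolation. The sketch below is a plain-Ruby illustration, not the module's code: `ancestor_lookup` is a hypothetical helper, the ids are placeholders, and the 'ambiguous synonym' status string is an assumption about how CoL labels such records.

```ruby
# Hypothetical helper isolating the lookup-key rule: a synonym with a single
# accepted target is looked up via the accepted id; everything else uses its own id.
def ancestor_lookup(col_result)
  accepted_id = col_result['status'] == 'synonym' ? col_result.dig('accepted', 'id') : nil
  accepted_id = nil if accepted_id.to_s.empty?
  { key: accepted_id || col_result['id'], accepted_id: accepted_id }
end

# Placeholder records; ids and the 'ambiguous synonym' label are assumptions.
ACCEPTED  = { 'id' => 'A1', 'status' => 'accepted' }.freeze
SYNONYM   = { 'id' => 'S1', 'status' => 'synonym', 'accepted' => { 'id' => 'A1' } }.freeze
AMBIGUOUS = { 'id' => 'S2', 'status' => 'ambiguous synonym' }.freeze

puts ancestor_lookup(ACCEPTED)[:key]   # A1
puts ancestor_lookup(SYNONYM)[:key]    # A1
puts ancestor_lookup(AMBIGUOUS)[:key]  # S2

# The accepted taxon appears in its own classification chain, so it is removed by id.
chain = [{ 'id' => 'A1', 'rank' => 'species' }, { 'id' => 'G1', 'rank' => 'genus' }]
lookup = ancestor_lookup(SYNONYM)
chain = chain.reject { |a| a['id'] == lookup[:accepted_id] } if lookup[:accepted_id]
p chain.map { |a| a['id'] }  # => ["G1"]
```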
.col_backbone_dataset?(dataset_id) ⇒ Boolean
Returns true when dataset_id refers to one of the CoL backbone datasets that support the classification subresource for ancestor retrieval. External/denormed datasets (e.g. Mammal Diversity Database, dataset 9802) do not have this subresource and require iterative parentId traversal instead.
# File 'lib/vendor/colrapi.rb', line 164
def self.col_backbone_dataset?(dataset_id)
  DATASETS.values.include?(dataset_id.to_s)
end
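As a quick illustration, the membership test can be reproduced outside Rails. In this sketch the dataset keys are copied from the DATASETS constant above; the `BackboneSketch` module and `backbone_dataset?` are hypothetical stand-ins for the class method.

```ruby
# Standalone sketch; the keys are copied from Vendor::Colrapi::DATASETS.
module BackboneSketch
  DATASETS = {
    col: '3LR',
    col_extended: '3LXR'
  }.freeze

  # Hypothetical stand-in for Vendor::Colrapi.col_backbone_dataset?
  def self.backbone_dataset?(dataset_id)
    # to_s lets Integer ids such as 9802 be compared with the String keys.
    DATASETS.values.include?(dataset_id.to_s)
  end
end

puts BackboneSketch.backbone_dataset?('3LR')   # true
puts BackboneSketch.backbone_dataset?('3LXR')  # true
puts BackboneSketch.backbone_dataset?(9802)    # false
puts BackboneSketch.backbone_dataset?(nil)     # false
```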
.col_rank_sort(rank_name, col_code) ⇒ Object
Maps a CoL rank name ('genus', 'family', …) and CoL nomenclatural code ('zoological', 'botanical', 'bacterial', 'viral') to the TaxonWorks RANK_SORT index. Higher index = more specific rank. Returns nil when unresolvable.
# File 'lib/vendor/colrapi.rb', line 352
def self.col_rank_sort(rank_name, col_code)
  return nil if rank_name.blank?
  lookup = case col_code
           when 'zoological' then ::ICZN_LOOKUP
           when 'botanical' then ::ICN_LOOKUP
           when 'bacterial' then ::ICNP_LOOKUP
           when 'viral' then ::ICVCN_LOOKUP
           else ::ICZN_LOOKUP
           end
  rank_class = lookup[rank_name]
  rank_class ? ::RANK_SORT[rank_class] : nil
end
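To show how the sort index is used, here is a minimal sketch with hypothetical miniature tables. The real ICZN_LOOKUP and RANK_SORT constants come from TaxonWorks and their contents are not reproduced here; the table values, `RankSortSketch`, `rank_sort`, and `strictly_above` are all assumptions for illustration.

```ruby
# Hypothetical miniature tables standing in for the TaxonWorks constants.
module RankSortSketch
  ICZN_LOOKUP = {
    'family' => 'Family', 'genus' => 'Genus', 'subgenus' => 'Subgenus', 'species' => 'Species'
  }.freeze
  RANK_SORT = { 'Family' => 0, 'Genus' => 1, 'Subgenus' => 2, 'Species' => 3 }.freeze

  # Same two-step lookup as col_rank_sort, for a single code:
  # rank name -> rank class -> sort index (nil when unresolvable).
  def self.rank_sort(rank_name)
    return nil if rank_name.nil? || rank_name.empty?
    rank_class = ICZN_LOOKUP[rank_name]
    rank_class ? RANK_SORT[rank_class] : nil
  end

  # Keeps only ancestors ranked strictly above the target, as build_extension
  # does for non-accepted names. Ancestors with an unresolvable rank are kept.
  def self.strictly_above(ancestors, target_rank)
    target = rank_sort(target_rank)
    return ancestors unless target
    ancestors.reject do |a|
      sort = rank_sort(a['rank'])
      sort && sort >= target
    end
  end
end

chain = [
  { 'name' => 'Aidae', 'rank' => 'family' },
  { 'name' => 'Aus',   'rank' => 'genus' },
  { 'name' => 'Bus',   'rank' => 'subgenus' }
]
p RankSortSketch.strictly_above(chain, 'genus').map { |a| a['name'] }  # => ["Aidae"]
```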
.collection_object_scientific_name(collection_object) ⇒ Object
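Returns text only: the cached taxon name, or the OTU name when the OTU has no taxon name, for the most recent determination. Returns nil when there is no collection object or no determination. Open question: extend to buffered with GNA in the middle layer?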
# File 'lib/vendor/colrapi.rb', line 368
def self.collection_object_scientific_name(collection_object)
  return nil if collection_object.nil?
  if a = collection_object.taxon_determinations.order(:position)&.first
    if a.otu.taxon_name
      a.otu.taxon_name.cached
    else
      a.otu.name
    end
  else
    nil
  end
end
.datasets(q:, limit: 20) ⇒ Array<Hash>
Searches CoL datasets by name string.
Returns an array of dataset summaries, each containing at least 'id', 'title', and 'alias'. Used by the preferences UI to let users pick a target dataset.
# File 'lib/vendor/colrapi.rb', line 112
def self.datasets(q:, limit: 20)
  # Text search by title/name.
  text_results = begin
    result = ::Colrapi.dataset(q: q, limit: limit)
    (result['result'] || []).map { |d| { 'id' => d['key'].to_s, 'title' => d['title'], 'alias' => d['alias'] } }
  rescue => e
    Rails.logger.warn "Vendor::Colrapi.datasets text search error: #{e.message}"
    []
  end

  # Direct lookup by dataset ID: `q` may itself be a key like '3LXR'.
  # Colrapi.dataset(dataset_id:) returns a single hash, not a paged result.
  direct_hit = begin
    d = ::Colrapi.dataset(dataset_id: q)
    d.is_a?(Hash) && d['key'].present? ? { 'id' => d['key'].to_s, 'title' => d['title'], 'alias' => d['alias'] } : nil
  rescue
    nil
  end

  seen = {}
  [direct_hit, *text_results].compact.each_with_object([]) do |d, arr|
    next if seen[d['id']]
    seen[d['id']] = true
    arr << d
  end
end
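The final merge step places a direct-ID hit ahead of the text results and drops repeated ids. A minimal sketch of that step follows, assuming Array#uniq with a block as an equivalent to the `seen` hash used above; `merge_dataset_hits` is a hypothetical helper and the titles are placeholders.

```ruby
# Hypothetical helper isolating the merge: direct hit first, then text results,
# keeping only the first entry for each id. Array#uniq keeps first occurrences.
def merge_dataset_hits(direct_hit, text_results)
  [direct_hit, *text_results].compact.uniq { |d| d['id'] }
end

# Placeholder summaries; the titles are not real dataset titles.
direct = { 'id' => '3LXR', 'title' => 'placeholder A', 'alias' => nil }
text   = [
  { 'id' => '3LR',  'title' => 'placeholder B', 'alias' => nil },
  { 'id' => '3LXR', 'title' => 'placeholder A', 'alias' => nil }
]

p merge_dataset_hits(direct, text).map { |d| d['id'] }  # => ["3LXR", "3LR"]
p merge_dataset_hits(nil, text).map { |d| d['id'] }     # => ["3LR", "3LXR"]
```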
.extract_subgenus_name(name) ⇒ Object
Subgenus names in CoL classification arrive as "Genus (Subgenus)" combinations. Extract just the subgenus epithet from inside the parentheses when present.
# File 'lib/vendor/colrapi.rb', line 344
def self.extract_subgenus_name(name)
  return name if name.nil?
  name[/\(([^)]+)\)/, 1] || name
end
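A short usage sketch of the extraction, with the method body copied into a top-level helper so it runs on its own. The names 'Aus' and 'Bus' are conventional placeholders, not real taxa.

```ruby
# Same expression as the method body above, as a standalone helper.
def extract_subgenus_name(name)
  return name if name.nil?
  # String#[] with a regexp and capture index returns the captured text or nil.
  name[/\(([^)]+)\)/, 1] || name
end

p extract_subgenus_name('Aus (Bus)')  # => "Bus"
p extract_subgenus_name('Aus')        # => "Aus"
p extract_subgenus_name(nil)          # => nil
```

Names without parentheses pass through unchanged, so the helper is safe to call on any rank.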
.name_status(taxonworks_object, colrapi_result) ⇒ Object
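Compares the TaxonWorks name for an object against CoL usages and returns a hash of the form:
{ taxonworks_name:, col_usages: [ { usage: {}, accepted: {} } ], provisional_status: }
provisional_status is :accepted by default, :undeterminable when the result has no matches, and :synonym when a usage with status 'synonym' has the same name as the TaxonWorks name.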
# File 'lib/vendor/colrapi.rb', line 46
def self.name_status(taxonworks_object, colrapi_result)
  o = taxonworks_object

  r = {
    taxonworks_name: collection_object_scientific_name(o),
    col_usages: [],
    provisional_status: :accepted,
  }

  if colrapi_result.dig('total') == 0
    r[:provisional_status] = :undeterminable
    return r
  end

  colrapi_result['result'].each do |u|
    i = u['usage']

    d = { usage: {}, accepted: {} }

    d[:usage][:name] = i.dig *%w{name scientificName}
    d[:usage][:status] = i['status']

    if i['accepted']
      d[:accepted][:name] = i.dig *%w{accepted name scientificName}
      d[:accepted][:status] = i.dig *%w{accepted status}
    end

    if d[:usage][:status] == 'synonym' && (d[:usage][:name] == r[:taxonworks_name])
      r[:provisional_status] = :synonym
    end

    r[:col_usages].push d
  end

  r
end
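The shape of the input and output is easier to see with a concrete call. The following is a simplified, Rails-free sketch: `name_status_sketch` is a hypothetical variant that takes the TaxonWorks name as a String rather than a collection object, and the response hash is hand-written placeholder data with each entry wrapped in 'usage', as the method above expects.

```ruby
# Simplified variant of name_status: the name is passed in directly.
def name_status_sketch(taxonworks_name, colrapi_result)
  r = { taxonworks_name: taxonworks_name, col_usages: [], provisional_status: :accepted }

  if colrapi_result['total'] == 0
    r[:provisional_status] = :undeterminable
    return r
  end

  colrapi_result['result'].each do |u|
    i = u['usage']
    d = { usage: { name: i.dig('name', 'scientificName'), status: i['status'] }, accepted: {} }

    if i['accepted']
      d[:accepted][:name] = i.dig('accepted', 'name', 'scientificName')
      d[:accepted][:status] = i.dig('accepted', 'status')
    end

    # A synonym usage whose name equals the TaxonWorks name flags the whole result.
    r[:provisional_status] = :synonym if d[:usage][:status] == 'synonym' && d[:usage][:name] == taxonworks_name
    r[:col_usages] << d
  end

  r
end

# Placeholder response; 'Aus bus' and 'Aus cus' are not real names.
SAMPLE_RESPONSE = {
  'total' => 1,
  'result' => [
    { 'usage' => {
      'status' => 'synonym',
      'name' => { 'scientificName' => 'Aus bus' },
      'accepted' => { 'status' => 'accepted', 'name' => { 'scientificName' => 'Aus cus' } }
    } }
  ]
}.freeze

status = name_status_sketch('Aus bus', SAMPLE_RESPONSE)
p status[:provisional_status]               # => :synonym
p status[:col_usages].first[:accepted][:name]  # => "Aus cus"
```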
.search(name_string, dataset_id: nil) ⇒ Hash
Searches the Catalogue of Life by name string.
The Colrapi gem takes dataset_id as a positional first argument.
Response structure: { 'total' => Integer, 'result' => Array }
Each result entry is a flat nameusage hash with keys:
'id', 'status', 'name' (hash with 'scientificName', 'rank', 'authorship', …),
'label', 'labelHtml', 'parentId', etc.
# File 'lib/vendor/colrapi.rb', line 96
def self.search(name_string, dataset_id: nil)
  target = dataset_id.presence || DATASETS[:col]
  ::Colrapi.nameusage(target, q: name_string, limit: 20)
rescue => e
  Rails.logger.warn "Vendor::Colrapi.search error: #{e.message}"
  { 'total' => 0, 'result' => [] }
end
.uninomial_name(name_hash) ⇒ String?
Returns the single-word name component suitable for storing as a TaxonWorks Protonym name. CoL's scientificName is the full combination (e.g. "Homo sapiens"), but TaxonWorks Protonym requires just the uninomial or epithet. Priority, as implemented: infraspecificEpithet (infraspecific) > specificEpithet (species) > uninomial (higher) > scientificName fallback. Returns nil when name_hash is nil.
# File 'lib/vendor/colrapi.rb', line 334
def self.uninomial_name(name_hash)
  return nil if name_hash.nil?
  name_hash['infraspecificEpithet'].presence ||
    name_hash['specificEpithet'].presence ||
    name_hash['uninomial'].presence ||
    name_hash['scientificName']
end
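The fallback order can be checked with a few hand-written name hashes. This is a plain-Ruby sketch: `present_value` is a hypothetical substitute for ActiveSupport's `presence`, and the sample hashes contain only the keys the method reads, with placeholder names apart from the "Homo sapiens" example used in this section.

```ruby
# Hypothetical substitute for ActiveSupport's Object#presence:
# returns nil for nil or blank strings, otherwise the value itself.
def present_value(value)
  (value.nil? || value.to_s.strip.empty?) ? nil : value
end

# Same fallback order as the method above.
def uninomial_name(name_hash)
  return nil if name_hash.nil?
  present_value(name_hash['infraspecificEpithet']) ||
    present_value(name_hash['specificEpithet']) ||
    present_value(name_hash['uninomial']) ||
    name_hash['scientificName']
end

SUBSPECIES = { 'scientificName' => 'Aus bus cus', 'specificEpithet' => 'bus',
               'infraspecificEpithet' => 'cus' }.freeze
SPECIES    = { 'scientificName' => 'Homo sapiens', 'specificEpithet' => 'sapiens' }.freeze
GENUS      = { 'scientificName' => 'Homo', 'uninomial' => 'Homo' }.freeze
BARE       = { 'scientificName' => 'Aus bus' }.freeze

p uninomial_name(SUBSPECIES)  # => "cus"
p uninomial_name(SPECIES)     # => "sapiens"
p uninomial_name(GENUS)       # => "Homo"
p uninomial_name(BARE)        # => "Aus bus"
```

Checking infraspecificEpithet first matters because a subspecies record carries both epithets, and only the lowest one is the Protonym name.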