Class: Autoselect::TaxonName::ColCreator
- Inherits:
- Object
  - Object
  - Autoselect::TaxonName::ColCreator
- Defined in:
- lib/autoselect/taxon_name/col_creator.rb
Overview
Creates TaxonName records (and CoL URI identifiers) from a Catalog of Life selection.
Receives an ordered list of rows (distal → proximal, i.e. kingdom first, target last). Each row represents one name to be created or an existing TaxonName to be used as a parent. Rows with a taxonworks_id are already in the project and serve only as parent anchors.
TODO: Rename to remove the abbreviation, e.g. CatalogueOfLifeCreator.
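The row ordering described above can be illustrated with a small plain-Ruby sketch. The hash keys mirror the ones #call reads (taxonworks_id, col_id, dataset_id, col_name, col_rank, col_authorship, col_year); the ids and dataset number are invented placeholders, and the partition step is only an illustration, not part of the class.

```ruby
# Hypothetical input, ordered distal → proximal (kingdom first, target last).
# All ids below are made up for illustration.
rows = [
  { taxonworks_id: 101, col_name: 'Animalia',   col_rank: 'kingdom' }, # already in the project
  { taxonworks_id: 102, col_name: 'Arthropoda', col_rank: 'phylum' },  # already in the project
  { taxonworks_id: nil, col_id: 'DEMO1', dataset_id: 1,
    col_name: 'Insecta', col_rank: 'class',
    col_authorship: 'Linnaeus, 1758', col_year: nil }                  # to be created
]

# Rows carrying a taxonworks_id act only as parent anchors; the rest are candidates
# for creation. Array#partition keeps the original order within each group.
anchors, to_create = rows.partition { |row| !row[:taxonworks_id].to_s.empty? }

puts "anchors:   #{anchors.map { |r| r[:col_name] }.inspect}"
puts "to create: #{to_create.map { |r| r[:col_name] }.inspect}"
```

In this example the last anchor (Arthropoda) would become the parent of the first created name.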
Defined Under Namespace
Classes: CreationError
Constant Summary
- COL_BASE_URI =
TODO:
'https://www.catalogueoflife.org/data/taxon/'.freeze
- COL_CODE_LOOKUP_MAP =
Maps CoL code strings to the primary lookup constant to use for rank resolution.
{ 'zoological' => :ICZN_LOOKUP, 'botanical' => :ICN_LOOKUP, 'bacterial' => :ICNP_LOOKUP, 'viral' => :ICVCN_LOOKUP }.freeze
Instance Method Summary
-
#call ⇒ Hash
Returns { taxon_name_id: Integer, created_ids: Array }, where created_ids holds the ids of the TaxonName records created by this call.
-
#initialize(rows:, project_id:, user_id:, col_code: nil) ⇒ ColCreator
constructor
A new instance of ColCreator.
-
#project_root_id ⇒ Object
private
Returns the TaxonName id of the project's Root node.
-
#resolve_rank_class(rank_string) ⇒ Object
private
Maps a human-readable CoL rank string to a TaxonWorks rank_class string.
-
#split_authorship(col_authorship, col_year) ⇒ Object
private
TODO: Needs to be in an isolated library.
Constructor Details
#initialize(rows:, project_id:, user_id:, col_code: nil) ⇒ ColCreator
Returns a new instance of ColCreator.
# File 'lib/autoselect/taxon_name/col_creator.rb', line 49

def initialize(rows:, project_id:, user_id:, col_code: nil)
  @rows = rows
  @project_id = project_id.to_i
  @user_id = user_id.to_i
  @col_code = col_code.to_s.downcase.presence
end
Instance Method Details
#call ⇒ Hash
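Returns { taxon_name_id: Integer, created_ids: Array }. taxon_name_id is the id of the last name created or, failing that, the last parent anchor seen (the project Root if no row applies); created_ids lists the ids of the newly created TaxonName records in creation order.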
# File 'lib/autoselect/taxon_name/col_creator.rb', line 58

def call
  parent_id = project_root_id
  created_ids = []

  ::ActiveRecord::Base.transaction do
    @rows.each do |row|
      if row[:taxonworks_id].present?
        parent_id = row[:taxonworks_id].to_i
        next
      end

      rank_class = resolve_rank_class(row[:col_rank])
      # Skip rows whose rank is entirely unknown to TaxonWorks: we cannot create a
      # valid Protonym without a rank_class (e.g. CoL-only ranks like 'domain').
      next if rank_class.nil?

      author, year = split_authorship(row[:col_authorship], row[:col_year])

      begin
        tn = ::TaxonName.create!(
          type: 'Protonym',
          name: row[:col_name],
          parent_id:,
          rank_class:,
          verbatim_author: author,
          year_of_publication: year,
          project_id: @project_id,
          by: @user_id,
        )
      rescue ::ActiveRecord::RecordInvalid => e
        raise CreationError.new(
          e.record.errors.full_messages.join(', '),
          col_name: row[:col_name],
          col_id: row[:col_id]
        )
      end

      if row[:col_id].present?
        begin
          ::Identifier::Global::Uri::ChecklistBank.create!(
            taxon_id: row[:col_id],
            dataset_id: row[:dataset_id],
            identifier_object: tn,
            project_id: @project_id,
            by: @user_id
          )
        rescue ::ActiveRecord::RecordInvalid => e
          raise CreationError.new(
            "CoL identifier for #{row[:col_name]}: #{e.record.errors.full_messages.join(', ')}",
            col_name: row[:col_name],
            col_id: row[:col_id]
          )
        end
      end

      created_ids << tn.id
      parent_id = tn.id
    end
  end

  { taxon_name_id: parent_id, created_ids: }
end
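The way #call threads parent_id through the rows can be sketched without Rails. The following is a simplified illustration only: DemoStore and demo_walk_rows are hypothetical stand-ins (not part of TaxonWorks), the store replaces TaxonName.create!, and a nil col_rank stands in for a rank that resolve_rank_class cannot resolve. Identifier creation, the transaction, and error wrapping are omitted.

```ruby
# Hypothetical in-memory stand-in for TaxonName.create!: hands out sequential ids
# and remembers each record's parent.
class DemoStore
  attr_reader :records

  def initialize(first_id)
    @next_id = first_id
    @records = {}
  end

  def create(name:, parent_id:)
    id = @next_id
    @next_id += 1
    @records[id] = { name: name, parent_id: parent_id }
    id
  end
end

# Walks rows distal → proximal, carrying the current parent id forward.
def demo_walk_rows(rows, root_id:, store:)
  parent_id = root_id
  created_ids = []

  rows.each do |row|
    if row[:taxonworks_id]
      parent_id = row[:taxonworks_id].to_i # existing name: use as anchor only
      next
    end
    next if row[:col_rank].nil? # stand-in for an unresolvable rank: skip the row

    id = store.create(name: row[:col_name], parent_id: parent_id)
    created_ids << id
    parent_id = id # the next row is created beneath this one
  end

  { taxon_name_id: parent_id, created_ids: created_ids }
end

store = DemoStore.new(500)
rows = [
  { taxonworks_id: 7, col_name: 'Animalia', col_rank: 'kingdom' },
  { col_name: 'Arthropoda', col_rank: 'phylum' },
  { col_name: 'Insecta', col_rank: 'class' }
]
result = demo_walk_rows(rows, root_id: 1, store: store)
puts result.inspect
```

Note that a skipped row does not change parent_id, so the following row is attached to the most recent anchor or created name.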
#project_root_id ⇒ Object (private)
Returns the TaxonName id of the project's Root node.
# File 'lib/autoselect/taxon_name/col_creator.rb', line 125

def project_root_id
  ::Project.find(@project_id).root_taxon_name.id
end
#resolve_rank_class(rank_string) ⇒ Object (private)
Maps a human-readable CoL rank string to a TaxonWorks rank_class string.
When @col_code is recognized (e.g. 'botanical' → ICN_LOOKUP), that lookup is tried first so that plant names receive ICN rank classes rather than ICZN ones. It then falls back through the remaining codes in a fixed order (ICZN, ICN, ICNP, ICVCN), so a rank resolves whenever any code recognizes it.
Returns nil for unknown ranks; callers should skip creation for those rows.
# File 'lib/autoselect/taxon_name/col_creator.rb', line 139

def resolve_rank_class(rank_string)
  return nil if rank_string.blank?

  r = rank_string.to_s.downcase

  primary_key = COL_CODE_LOOKUP_MAP[@col_code]
  if primary_key
    primary_lookup = Object.const_get("::#{primary_key}")
    result = primary_lookup[r]
    return result if result
  end

  # Fall through all codes (excluding the primary already tried)
  fallbacks = [:ICZN_LOOKUP, :ICN_LOOKUP, :ICNP_LOOKUP, :ICVCN_LOOKUP].reject { |k| k == primary_key }
  fallbacks.each do |key|
    result = Object.const_get("::#{key}")[r] # TODO: BAD
    return result if result
  end

  nil
end
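The primary-then-fallback order can be seen in isolation with a minimal sketch. DEMO_LOOKUPS and demo_resolve_rank are hypothetical names, and the table contents are invented placeholders; the real ICZN_LOOKUP, ICN_LOOKUP, ICNP_LOOKUP and ICVCN_LOOKUP constants are defined elsewhere in TaxonWorks and are not reproduced here.

```ruby
# Placeholder lookup tables keyed by CoL code. Values are invented strings,
# not real TaxonWorks rank classes.
DEMO_LOOKUPS = {
  'zoological' => { 'family' => 'demo-iczn-family', 'subphylum' => 'demo-iczn-subphylum' },
  'botanical'  => { 'family' => 'demo-icn-family',  'forma'     => 'demo-icn-forma' }
}.freeze

def demo_resolve_rank(rank_string, col_code)
  return nil if rank_string.nil? || rank_string.strip.empty?

  r = rank_string.downcase

  # Try the table for the recognized code first, then the others in declaration
  # order. Array#partition preserves the original key order within each group.
  primary, rest = DEMO_LOOKUPS.keys.partition { |code| code == col_code }
  (primary + rest).each do |code|
    result = DEMO_LOOKUPS[code][r]
    return result if result
  end

  nil
end

puts demo_resolve_rank('Family', 'botanical').inspect  # primary table wins
puts demo_resolve_rank('Family', nil).inspect          # no primary: first table in order
puts demo_resolve_rank('forma', 'zoological').inspect  # found only via fallback
puts demo_resolve_rank('domain', 'botanical').inspect  # unknown everywhere
```

Unlike the listing above, the sketch reads its tables from a single hash rather than resolving constants by name with Object.const_get; that is a simplification for the example, not a description of the class.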
#split_authorship(col_authorship, col_year) ⇒ Object (private)
TODO: Needs to be in an isolated library
Splits a CoL authorship string into [verbatim_author, year_of_publication].
TaxonWorks Protonym validates that verbatim_author contains no digits, so the year must be stored separately in year_of_publication.
CoL provides authorship in two forms:
"Linnaeus, 1758" → author "Linnaeus", year 1758
"(Chatton, 1925) Whittaker & Margulis, 1978" → author "Chatton", year 1925
(basionym takes precedence when no explicit col_year is given)
An explicit col_year (from the target row's combinationAuthorship.year) takes precedence. If no year can be extracted the author string is returned as-is with year nil.
# File 'lib/autoselect/taxon_name/col_creator.rb', line 176

def split_authorship(col_authorship, col_year)
  return [nil, nil] if col_authorship.blank?

  # Prefer the explicit year already extracted server-side (present for the target row).
  year = col_year.presence&.to_i

  # When authorship leads with "(Basionym Author, YYYY)" and no explicit year was
  # supplied, take author and year from the basionym parenthetical, not the trailing
  # combination authorship.
  if year.nil? && (m = col_authorship.match(/\A\(([^)]+),\s*(\d{4})\)/))
    author = m[1].gsub(/,?\s*\b\d{4}\b/, '').strip
    author = nil if author.blank?
    return [author, m[2].to_i]
  end

  # Strip any trailing ", YYYY" or " YYYY" from the authorship string.
  # Also strip years embedded inside parenthetical groups, e.g. "(Chatton, 1925)".
  author = col_authorship
    .gsub(/,?\s*\b\d{4}\b/, '') # remove ", 1925" and " 1925" occurrences
    .gsub(/\(\s*\)/, '')        # clean up empty parens left behind
    .gsub(/,\s*\z/, '')         # strip trailing comma
    .strip
    .then { |a| a.match?(/\A\([^)]*\)\z/) ? a : a.gsub(/\([^)]*\)\s*/, '').strip }

  # If no explicit year was given, extract the last 4-digit year from the string
  # (the combination year, not the basionym year in parentheses).
  if year.nil?
    if (m = col_authorship.scan(/\b(\d{4})\b/).last)
      year = m[0].to_i
    end
  end

  author = nil if author.blank?
  [author, year]
end
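To try the splitting rules outside Rails, the following sketch restates them in plain Ruby. demo_split_authorship is a hypothetical name; ActiveSupport's blank? and presence are replaced with explicit nil/empty checks, which is assumed to be equivalent for the String and Integer inputs used here. The regular expressions are the ones shown in the listing above.

```ruby
def demo_split_authorship(col_authorship, col_year = nil)
  return [nil, nil] if col_authorship.nil? || col_authorship.strip.empty?

  # An explicit year wins when present (accepts an Integer or a numeric String).
  year = col_year.to_s.strip.empty? ? nil : col_year.to_i

  # Leading "(Author, YYYY)" with no explicit year: use the basionym author and year.
  if year.nil? && (m = col_authorship.match(/\A\(([^)]+),\s*(\d{4})\)/))
    author = m[1].gsub(/,?\s*\b\d{4}\b/, '').strip
    return [author.empty? ? nil : author, m[2].to_i]
  end

  author = col_authorship
    .gsub(/,?\s*\b\d{4}\b/, '') # drop every 4-digit year and its leading comma
    .gsub(/\(\s*\)/, '')        # drop parentheses emptied by the previous step
    .gsub(/,\s*\z/, '')         # drop a trailing comma
    .strip
    .then { |a| a.match?(/\A\([^)]*\)\z/) ? a : a.gsub(/\([^)]*\)\s*/, '').strip }

  # Without an explicit year, fall back to the last 4-digit year in the original string.
  year ||= col_authorship.scan(/\b(\d{4})\b/).last&.first&.to_i

  [author.empty? ? nil : author, year]
end

p demo_split_authorship('Linnaeus, 1758')
p demo_split_authorship('(Chatton, 1925) Whittaker & Margulis, 1978')
p demo_split_authorship('(Chatton, 1925) Whittaker & Margulis, 1978', 1978)
p demo_split_authorship('Smith')
```

The third call shows the effect of an explicit col_year: the basionym branch is skipped, the parenthetical group is removed, and the combination authors are returned with the supplied year.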