Class: Autoselect::TaxonName::ColCreator

Inherits:
Object
Defined in:
lib/autoselect/taxon_name/col_creator.rb

Overview

Creates TaxonName records (and CoL URI identifiers) from a Catalog of Life selection.

Receives an ordered list of rows (distal → proximal, i.e. kingdom first, target last). Each row represents one name to be created or an existing TaxonName to be used as a parent. Rows with a taxonworks_id are already in the project and serve only as parent anchors.
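To make the row convention concrete, here is a minimal sketch that walks a hypothetical selection and reports which rows act as parent anchors and which would be created. The row keys mirror those read by #call. The names, ids, and the plan_rows helper are illustrative assumptions, and no database is touched.

```ruby
# Hypothetical rows, ordered distal → proximal (kingdom first, target last).
# The col_id values are made up for illustration.
ROWS = [
  { col_name: 'Animalia', col_rank: 'kingdom', taxonworks_id: 101 },
  { col_name: 'Chordata', col_rank: 'phylum',  col_id: 'X1' },
  { col_name: 'Mammalia', col_rank: 'class',   col_id: 'X2', col_authorship: 'Linnaeus, 1758' }
].freeze

# plan_rows is an illustrative helper, not part of ColCreator. It mimics how
# the parent id is threaded from one row to the next.
def plan_rows(rows, root_id:)
  parent = root_id
  rows.map do |row|
    # Plain-Ruby stand-in for ActiveSupport's `present?`.
    if row[:taxonworks_id].to_s.empty?
      step = { action: :create, name: row[:col_name], parent: parent }
      # Placeholder for the id the new record would receive on creation.
      parent = "new:#{row[:col_name]}"
    else
      # Existing record: nothing is created, it only becomes the next parent.
      parent = row[:taxonworks_id].to_i
      step = { action: :anchor, name: row[:col_name], parent: nil }
    end
    step
  end
end

plan_rows(ROWS, root_id: 1).each { |step| puts step.inspect }
```

In this sketch the anchor row resets the parent to 101, so Chordata would be created under the existing Animalia record and Mammalia under the newly created Chordata.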

TODO: Rename to remove the abbreviation, e.g. CatalogueOfLifeCreator.

Defined Under Namespace

Classes: CreationError

Constant Summary

COL_BASE_URI =

TODO:

'https://www.catalogueoflife.org/data/taxon/'.freeze
COL_CODE_LOOKUP_MAP =

Maps CoL code strings to the primary lookup constant to use for rank resolution.

{
  'zoological' => :ICZN_LOOKUP,
  'botanical' => :ICN_LOOKUP,
  'bacterial' => :ICNP_LOOKUP,
  'viral' => :ICVCN_LOOKUP
}.freeze

Instance Method Summary

Constructor Details

#initialize(rows:, project_id:, user_id:, col_code: nil) ⇒ ColCreator

Returns a new instance of ColCreator.



# File 'lib/autoselect/taxon_name/col_creator.rb', line 49

def initialize(rows:, project_id:, user_id:, col_code: nil)
  @rows = rows
  @project_id = project_id.to_i
  @user_id = user_id.to_i
  @col_code = col_code.to_s.downcase.presence
end

Instance Method Details

#call ⇒ Hash

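Returns { taxon_name_id: Integer, created_ids: Array }. taxon_name_id is the id of the last name created or anchored (normally the target); created_ids lists the ids of the names created by this call, in creation order.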

Returns:

  • (Hash)

    { taxon_name_id: Integer, created_ids: Array }

Raises:

  • (CreationError)

    if any record fails validation; the transaction is fully rolled back.
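Callers will usually want to rescue CreationError and report which CoL name failed. The sketch below shows that pattern in isolation. The CreationError class here is a stand-in whose constructor matches the way #call raises it (a message plus col_name: and col_id: keywords). The attribute readers and the attempt helper are assumptions, since the real class body is not shown on this page.

```ruby
# Stand-in for ColCreator::CreationError. Only the constructor signature is
# taken from the source listing; the readers are assumed.
class CreationError < StandardError
  attr_reader :col_name, :col_id

  def initialize(message, col_name: nil, col_id: nil)
    super(message)
    @col_name = col_name
    @col_id = col_id
  end
end

# attempt is a hypothetical wrapper: it returns the block's result on success
# and a small error hash when creation fails.
def attempt
  value = yield
  { ok: true, result: value }
rescue CreationError => e
  { ok: false, error: e.message, col_name: e.col_name, col_id: e.col_id }
end

# Simulated failure; in real use the block would call the creator instead.
failure = attempt do
  raise CreationError.new("Name can't be blank", col_name: '', col_id: 'X1')
end
puts failure.inspect
```

Because the whole run happens inside one transaction, a rescued failure means nothing was persisted, so the caller can safely correct the input and retry.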



# File 'lib/autoselect/taxon_name/col_creator.rb', line 58

def call
  parent_id = project_root_id
  created_ids = []

  ::ActiveRecord::Base.transaction do
    @rows.each do |row|
      if row[:taxonworks_id].present?
        parent_id = row[:taxonworks_id].to_i
        next
      end

      rank_class = resolve_rank_class(row[:col_rank])

      # Skip rows whose rank is entirely unknown to TaxonWorks — we cannot create a
      # valid Protonym without a rank_class (e.g. CoL-only ranks like 'domain').
      next if rank_class.nil?

      author, year = split_authorship(row[:col_authorship], row[:col_year])

      begin
        tn = ::TaxonName.create!(
          type:                'Protonym',
          name:                row[:col_name],
          parent_id:,
          rank_class:,
          verbatim_author:     author,
          year_of_publication: year,
          project_id:          @project_id,
          by:                  @user_id,
        )
      rescue ::ActiveRecord::RecordInvalid => e
        raise CreationError.new(
          e.record.errors.full_messages.join(', '),
          col_name: row[:col_name],
          col_id:   row[:col_id]
        )
      end

      if row[:col_id].present?
        begin
          ::Identifier::Global::Uri::ChecklistBank.create!(
            taxon_id:          row[:col_id],
            dataset_id:        row[:dataset_id],
            identifier_object: tn,
            project_id:        @project_id,
            by:                @user_id
          )
        rescue ::ActiveRecord::RecordInvalid => e
          raise CreationError.new(
            "CoL identifier for #{row[:col_name]}: #{e.record.errors.full_messages.join(', ')}",
            col_name: row[:col_name],
            col_id:   row[:col_id]
          )
        end
      end

      created_ids << tn.id
      parent_id = tn.id
    end
  end

  { taxon_name_id: parent_id, created_ids: }
end

#project_root_id ⇒ Object (private)

Returns the TaxonName id of the project's Root node.



# File 'lib/autoselect/taxon_name/col_creator.rb', line 125

def project_root_id
  ::Project.find(@project_id).root_taxon_name.id
end

#resolve_rank_class(rank_string) ⇒ Object (private)

Maps a human-readable CoL rank string to a TaxonWorks rank_class string.

When @col_code is recognized (e.g. 'botanical' → ICN_LOOKUP), that lookup is tried first so that plant names receive ICN rank classes rather than ICZN ones. If it has no match, or @col_code is not recognized, the remaining lookups are tried in the order ICZN, ICN, ICNP, ICVCN so that any rank resolves if possible.

Returns nil for unknown ranks; callers should skip creation for those rows.
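The primary-then-fallback order can be sketched with small stand-in tables. Everything in this example is an assumption made for illustration: the two lookup hashes are tiny placeholders for the real ICZN_LOOKUP and ICN_LOOKUP constants, the rank class strings are not the values TaxonWorks defines, and resolve is a hypothetical name.

```ruby
# Placeholder lookups keyed by symbol, so no Object.const_get is needed.
LOOKUPS = {
  ICZN_LOOKUP: { 'family' => 'Iczn::Family', 'class' => 'Iczn::Class' },
  ICN_LOOKUP:  { 'family' => 'Icn::Family',  'forma' => 'Icn::Forma' }
}.freeze

CODE_TO_LOOKUP = {
  'zoological' => :ICZN_LOOKUP,
  'botanical'  => :ICN_LOOKUP
}.freeze

# Returns the first rank class found, trying the lookup for col_code first
# and then every other lookup in declaration order. Returns nil if none match.
def resolve(rank_string, col_code)
  rank = rank_string.to_s.downcase
  return nil if rank.strip.empty?

  primary = CODE_TO_LOOKUP[col_code.to_s.downcase]
  # compact drops a nil primary; uniq keeps the primary from being tried twice.
  order = [primary, *LOOKUPS.keys].compact.uniq

  order.each do |key|
    hit = LOOKUPS[key][rank]
    return hit if hit
  end
  nil
end

p resolve('Family', 'botanical')  # => "Icn::Family"
p resolve('family', nil)          # => "Iczn::Family"
p resolve('forma', 'zoological')  # => "Icn::Forma"
p resolve('domain', 'zoological') # => nil
```

Holding the tables in one hash is one possible way to avoid the constant lookup by name that the source listing flags with a TODO; it is a design suggestion, not what the class currently does.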



# File 'lib/autoselect/taxon_name/col_creator.rb', line 139

def resolve_rank_class(rank_string)
  return nil if rank_string.blank?
  r = rank_string.to_s.downcase

  primary_key = COL_CODE_LOOKUP_MAP[@col_code]

  if primary_key
    primary_lookup = Object.const_get("::#{primary_key}")
    result = primary_lookup[r]
    return result if result
  end

  # Fall through all codes (excluding the primary already tried)
  fallbacks = [:ICZN_LOOKUP, :ICN_LOOKUP, :ICNP_LOOKUP, :ICVCN_LOOKUP].reject { |k| k == primary_key }
  fallbacks.each do |key|
    result = Object.const_get("::#{key}")[r] # TODO: BAD
    return result if result
  end

  nil
end

#split_authorship(col_authorship, col_year) ⇒ Object (private)

TODO: Needs to be in an isolated library

Splits a CoL authorship string into [verbatim_author, year_of_publication].

TaxonWorks Protonym validates that verbatim_author contains no digits, so the year must be stored separately in year_of_publication.

CoL provides authorship in two forms:

"Linnaeus, 1758" → author "Linnaeus", year 1758
"(Chatton, 1925) Whittaker & Margulis, 1978" → author "Chatton", year 1925 (the basionym takes precedence when no explicit col_year is given)

An explicit col_year (from the target row's combinationAuthorship.year) takes precedence over any year found in the string. If no year can be extracted, the cleaned author string is returned with year nil.
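The rules above can be tried outside Rails with a plain-Ruby sketch. It follows the logic of the source listing but replaces ActiveSupport's blank? and presence with a local blank? helper, which is an assumption of this example, so treat it as an approximation of the method rather than the method itself.

```ruby
# Local stand-in for ActiveSupport's `blank?`.
def blank?(value)
  value.nil? || value.to_s.strip.empty?
end

def split_authorship(col_authorship, col_year = nil)
  return [nil, nil] if blank?(col_authorship)

  # An explicit year wins over anything found in the string.
  year = blank?(col_year) ? nil : col_year.to_i

  # Leading "(Author, YYYY)" with no explicit year: use the basionym.
  if year.nil? && (m = col_authorship.match(/\A\(([^)]+),\s*(\d{4})\)/))
    author = m[1].gsub(/,?\s*\b\d{4}\b/, '').strip
    return [author.empty? ? nil : author, m[2].to_i]
  end

  author = col_authorship
    .gsub(/,?\s*\b\d{4}\b/, '') # drop ", 1925" and " 1925"
    .gsub(/\(\s*\)/, '')        # drop parentheses left empty
    .gsub(/,\s*\z/, '')         # drop a trailing comma
    .strip
  # Keep a string that is one whole parenthetical; otherwise drop parentheticals.
  author = author.gsub(/\([^)]*\)\s*/, '').strip unless author.match?(/\A\([^)]*\)\z/)

  # No explicit year: fall back to the last four-digit number in the string.
  year ||= col_authorship.scan(/\b\d{4}\b/).last&.to_i

  [author.empty? ? nil : author, year]
end

p split_authorship('Linnaeus, 1758')
# => ["Linnaeus", 1758]
p split_authorship('(Chatton, 1925) Whittaker & Margulis, 1978')
# => ["Chatton", 1925]
p split_authorship('(Chatton, 1925) Whittaker & Margulis, 1978', 1978)
# => ["Whittaker & Margulis", 1978]
p split_authorship('Smith')
# => ["Smith", nil]
```

The third call shows the effect of an explicit col_year: the basionym branch is skipped, the parenthetical is removed, and the combination authors are returned.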



# File 'lib/autoselect/taxon_name/col_creator.rb', line 176

def split_authorship(col_authorship, col_year)
  return [nil, nil] if col_authorship.blank?

  # Prefer the explicit year already extracted server-side (present for the target row).
  year = col_year.presence&.to_i

  # When authorship leads with "(Basionym Author, YYYY)" and no explicit year was
  # supplied, take author and year from the basionym parenthetical — not the trailing
  # combination authorship.
  if year.nil? && (m = col_authorship.match(/\A\(([^)]+),\s*(\d{4})\)/))
    basionym_author = m[1].gsub(/,?\s*\b\d{4}\b/, '').strip
    basionym_author = nil if basionym_author.blank?
    return [basionym_author, m[2].to_i]
  end

  # Strip any trailing ", YYYY" or " YYYY" from the authorship string.
  # Also strip years embedded inside parenthetical groups, e.g. "(Chatton, 1925)".
  author = col_authorship
    .gsub(/,?\s*\b\d{4}\b/, '')   # remove ", 1925" and " 1925" occurrences
    .gsub(/\(\s*\)/, '')          # clean up empty parens left behind
    .gsub(/,\s*\z/, '')           # strip trailing comma
    .strip
    .then { |a| a.match?(/\A\([^)]*\)\z/) ? a : a.gsub(/\([^)]*\)\s*/, '').strip }

  # If no explicit year was given, extract the last 4-digit year from the string
  # (the combination year, not the basionym year in parentheses).
  if year.nil?
    if (m = col_authorship.scan(/\b(\d{4})\b/).last)
      year = m[0].to_i
    end
  end

  author = nil if author.blank?
  [author, year]
end