Module: Export::Coldp::Files::Taxon
- Defined in:
- lib/export/coldp/files/taxon.rb
Overview
Concepts not mapped:
`namePhrase` - e.g. `sensu lato` this would come from OTU#name
Notes
-
ColDP importer has a normalizing step that recognizes some names no longer point to any OTU
-
CoLDP can not handle assertions that a name that is currently treated as (invalid) was useds as a name (valid) for previously valid concept, i.e. CoL does not track alternative past concept heirarchies
Constant Summary collapse
- IRI_MAP =
{ extinct: 'https://api.checklistbank.org/datapackage#Taxon.extinct', # 1,0 temporal_range_end: 'https://api.checklistbank.org/datapackage#Taxon.temporal_range_end', # from https://api.checklistbank.org/vocab/geotime temporal_range_start: 'https://api.checklistbank.org/datapackage#Taxon.temporal_range_end', # from https://api.checklistbank.org/vocab/geotime lifezone: 'https://api.checklistbank.org/datapackage#Taxon.lifezone', # from https://api.checklistbank.org/vocab/lifezone remarks: 'https://github.com/catalogueoflife/coldp#Taxon.remarks', namePhrase: 'https://github.com/catalogueoflife/coldp#Taxon.namePhrase', link: 'https://api.checklistbank.org/vocab/term/col:link' }.freeze
- SKIPPED_RANKS =
%w{ NomenclaturalRank::Iczn::SpeciesGroup::Superspecies NomenclaturalRank::Iczn::SpeciesGroup::Supersuperspecies }.freeze
Class Method Summary collapse
-
.according_to_date(otu) ⇒ Object
Potentially reference Confidence level confidence_validated_at (last time this confidence level was deemed OK).
-
.according_to_id(otu) ⇒ Object
A reference to the publication of the person who established the taxonomic concept TW has a plurality of sources that reference this concept, it’s a straightforward map It is somewhat unclear how/whether CoL will use this concept.
- .attributes(otus, target) ⇒ Object
- .generate(otu, otus, project_members, reference_csv = nil, prefer_unlabelled_otus = true) ⇒ Object
- .link(link_base_url, otu) ⇒ Object
- .predicate_value(otu, predicate) ⇒ Object
-
.provisional(otu) ⇒ Object
return [Boolean, nil] TODO - reason in TW this is provisional name.
-
.reference_id(sources) ⇒ Object
“supporting the taxonomic concept” Potentially- all other Citations tied to Otu, what exactly supports a concept?.
- .remarks(otu, taxon_remarks_vocab_id) ⇒ Object
-
.scrutinizer(otu) ⇒ Object
TODO: this will be lookups on Confidence loaded into memory The scrutinizer concept is unused at present We’re looking for the canonical implementation of it before we implement/extrapolate from data here.
- .scrutinizer_date(otu) ⇒ Object
-
.scrutinizer_id(otu) ⇒ Object
ORCID version of above.
Class Method Details
.according_to_date(otu) ⇒ Object
Potentially reference
Confidence level
confidence_validated_at (last time this confidence level was deemed OK)
79 80 81 82 83 84 |
# File 'lib/export/coldp/files/taxon.rb', line 79 def self.according_to_date(otu) # a) Dynamic - !! most recent updated_at stamp for *any* OTU tied data -> this is a big grind: if so add cached_touched_on_date to Otu # b) modify Confidence level to include date # c) review what SFs does in their model nil end |
.according_to_id(otu) ⇒ Object
A reference to the publication of the person who established the taxonomic concept
TW has a plurality of sources that reference this concept, it's a straightforward map
It is somewhat unclear how/whether CoL will use this concept
72 73 74 |
# File 'lib/export/coldp/files/taxon.rb', line 72 def self.according_to_id(otu) nil end |
.attributes(otus, target) ⇒ Object
106 107 108 109 110 111 112 113 114 |
# File 'lib/export/coldp/files/taxon.rb', line 106 def self.attributes(otus, target) a = DataAttribute.with(otu_scope: otus) .joins("JOIN otu_scope on data_attributes.attribute_subject_id = otu_scope.id AND data_attributes.attribute_subject_type = 'Otu'") .joins(:predicate) .select("data_attributes.attribute_subject_id, STRING_AGG(data_attributes.value::text, ',') AS #{target}") .where(predicate: { uri: IRI_MAP[target] }) .group('data_attributes.attribute_subject_id') .map{|a| [a.id, a.send(target)]}.to_h end |
.generate(otu, otus, project_members, reference_csv = nil, prefer_unlabelled_otus = true) ⇒ Object
116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 |
# File 'lib/export/coldp/files/taxon.rb', line 116 def self.generate(otu, otus, project_members, reference_csv = nil, prefer_unlabelled_otus = true) # Until we have RC5 articulations we are simplifying handling the fact # that one taxon name can be used for many OTUs. Track to see that # an OTU with a given taxon name does not already exist # `taxon_name_id: nil` - unify via Ruby hash keys observed_taxon_name_ids = { } # TODO: optional Taxon.alternativeID field allows inclusion of external identifiers: https://github.com/CatalogueOfLife/coldp#alternativeid-1 https://github.com/CatalogueOfLife/coldp#identifiers # e.g., gbif:2704179,col:6W3C4,BOLD:AAJ2287,wikidata:Q157571 targets = otus .left_joins(:sources) .select("otus.*, STRING_AGG(sources.id::text, ',') AS aggregate_source_ids") .left_joins(:taxon_name) .where("taxon_names.cached NOT LIKE '%SPECIFIED%'") # TODO: likley not doing what we think it is .group('otus.id') attributes = {} # Make one big lookup IRI_MAP.each do |k,v| attributes[k] = attributes(otus, k) end link_base_url = attributes[:link][otu.id] root_otu_id = otu.id parent_id_lookup = Otu.parent_otu_ids(otus, skip_ranks: SKIPPED_RANKS).map{|a| [a.id, a.valid_ancestor_otu_ids&.split(',')&.first&.to_i]}.to_h text = ::CSV.generate(col_sep: "\t") do |csv| csv << %w{ ID parentID nameID namePhrase provisional accordingToID scrutinizer scrutinizerID scrutinizerDate referenceID extinct temporalRangeStart temporalRangeEnd environment link remarks modified modifiedBy } targets.find_each do |o| parent_id = parent_id_lookup[o.id] parent_id = (root_otu_id == o.id ? nil : parent_id ) # TODO: This was excluding OTUs that were being excluded downstream previously # This should never happen now since parent ambiguity is caught above! # can be removed in theory # TODO: remove once RC5 better modelled next if observed_taxon_name_ids[o.taxon_name_id] observed_taxon_name_ids[o.taxon_name_id] = nil # TODO: NOT SPECIFIED is left out no from Name, but not populating a tracking list # # If this is required add it to the `target` scope above # some names are skipped (e.g., if they have NOT SPECIFIED names) csv << [ o.id, # ID (Taxon) parent_id, # parentID (Taxon) o.taxon_name_id, # nameID (Name) attributes[:namePhrase][o.id], # namePhrase nil, # provisional provisional(o) nil, # accordingToID according_to_id(o) nil, # scrutinizer scrutinizer(o) nil, # scrutinizerID scrutinizer_id(o) nil, # scrutizinerDate scrutinizer_date(o) o.aggregate_source_ids, # referenceID attributes[:extinct][o.id], # extinct attributes[:temporal_range_start][o.id], # temporalRangeStart attributes[:temporal_range_end][o.id], # temporalRangeEnd attributes[:lifezone][o.id], # environment (formerly named lifezone) link(link_base_url, o), # link Export::Coldp.sanitize_remarks(attributes[:remarks][o.id]), # remarks Export::Coldp.modified(o[:updated_at]), # modified Export::Coldp.modified_by(o[:updated_by_id], project_members) # modifiedBy ] end end sources = Source.with(name_scope: targets.unscope(:select).select(:id)) .joins(:citations) .joins("JOIN name_scope ns on ns.id = citations.citation_object_id AND citations.citation_object_type = 'Otu'") .distinct Export::Coldp::Files::Reference.add_reference_rows(sources, reference_csv, project_members) if reference_csv text end |
.link(link_base_url, otu) ⇒ Object
86 87 88 |
# File 'lib/export/coldp/files/taxon.rb', line 86 def self.link(link_base_url, otu) link_base_url&.gsub('{id}', otu.id.to_s) unless link_base_url.nil? end |
.predicate_value(otu, predicate) ⇒ Object
28 29 30 31 |
# File 'lib/export/coldp/files/taxon.rb', line 28 def self.predicate_value(otu, predicate) return nil unless IRI_MAP[predicate] otu.data_attributes.joins(:predicate).where(controlled_vocabulary_terms: {uri: IRI_MAP[predicate]}).first&.value end |
.provisional(otu) ⇒ Object
return [Boolean, nil]
TODO - reason in TW this is provisional name
35 36 37 38 39 40 41 42 43 44 45 |
# File 'lib/export/coldp/files/taxon.rb', line 35 def self.provisional(otu) # nomen dubium # incertae sedis # unresolved homonym, without replacement # # # # * if two OTUs for same name are in OTU set then both have to be provisional # * missaplication (?) nil end |
.reference_id(sources) ⇒ Object
“supporting the taxonomic concept” Potentially- all other Citations tied to Otu, what exactly supports a concept?
100 101 102 103 104 |
# File 'lib/export/coldp/files/taxon.rb', line 100 def self.reference_id(sources) i = sources.pluck(:id) return i.join(',') if i.any? nil end |
.remarks(otu, taxon_remarks_vocab_id) ⇒ Object
90 91 92 93 94 95 96 |
# File 'lib/export/coldp/files/taxon.rb', line 90 def self.remarks(otu, taxon_remarks_vocab_id) if !taxon_remarks_vocab_id.nil? && otu.data_attributes.where(controlled_vocabulary_term_id: taxon_remarks_vocab_id).any? otu.data_attributes.where(controlled_vocabulary_term_id: taxon_remarks_vocab_id).pluck(:value).join('|') else nil end end |
.scrutinizer(otu) ⇒ Object
TODO: this will be lookups on Confidence loaded into memory The scrutinizer concept is unused at present We’re looking for the canonical implementation of it before we implement/extrapolate from data here.
* crawl attribution for inference on higher/lower
* UI/methods to assign/spam/visualize throught
* project preference (!! should project preferences has reference ids? !!)
according to is the curator responsible for this OTU, comma delimited list of curators We could also look at time-stamp data to detect “staleness” of an OTU concept
56 57 58 |
# File 'lib/export/coldp/files/taxon.rb', line 56 def self.scrutinizer(otu) nil end |
.scrutinizer_date(otu) ⇒ Object
65 66 67 |
# File 'lib/export/coldp/files/taxon.rb', line 65 def self.scrutinizer_date(otu) nil end |
.scrutinizer_id(otu) ⇒ Object
ORCID version of above
61 62 63 |
# File 'lib/export/coldp/files/taxon.rb', line 61 def self.scrutinizer_id(otu) nil end |