Class: DwcOccurrence

Inherits:
ApplicationRecord
Includes:
Housekeeping
Defined in:
app/models/dwc_occurrence.rb

Overview

A Darwin Core Record for the Occurrence core. Fields are generated from Ruby dwc-meta, which references the same spec that is used in the IPT and the Dwc Assistant. Each record references a specific CollectionObject, AssertedDistribution, or FieldOccurrence.

Important: This is a cache/index. Data here are periodically destroyed and regenerated from multiple tables in TW.

DWC attributes are camelCase to facilitate matching. dwcClass is a replacement for the Rails-reserved ‘Class’.

All DC attributes (attributes not in DwcOccurrence::TW_ATTRIBUTES) in this table are namespaced to dc (“purl.org/dc/terms/”, “rs.tdwg.org/dwc/terms/”)

README:

There is a two-part strategy to building the index. 1) An individual record is rebuilt on request by passing the `build=true` parameter, e.g. `collection_objects/123/dwc*?build=true`.
2) Wipe and rebuild on some schedule. It would in theory be possible to track and rebuild whenever a property of any related class was created (or updated); however,
this is a lot of overhead to inject/code for a lot of models. It would also add latency at numerous stages, which could impact UI performance.

Several terms are introduced in code:

* ghost - a DwcOccurrence record whose dwc_occurrence_object has been destroyed (i.e. an error in cleanup, should ideally never happen)
* stale - an _approximation_ that checks whether the index was built _earlier_ than the most recent update of its related records
* flagged (for rebuild) - a record related to the dwc_occurrence_object(s) has been updated, triggering the need for re-indexing 1 or more records

TODO: The basisOfRecord CVTs are not super informative.

We know collection object is definitely 1:1 with PreservedSpecimen, however
AssertedDistribution could be HumanObservation (if source is person), or ... what? if
it's a published record. Seems we need a 'PublishedAssertation', just like we model the data.

Gotchas.

* updated_at is set by touching the record, not via housekeeping.

Constant Summary

DC_NAMESPACE =
'http://rs.tdwg.org/dwc/terms/'.freeze
TW_ATTRIBUTES =

Not yet implemented, but likely needed (at an even higher level) ? :id

[
  :id,
  :project_id,
  :created_at,
  :updated_at,
  :created_by_id,
  :updated_by_id,
  :dwc_occurrence_object_type,
  :dwc_occurrence_object_id
].freeze
HEADER_CONVERTERS =
{
  'dwcClass' => 'class',
}.freeze
NOMENCLATURE_RANKS =

Supported ranks (fields in db)

[
  :kingdom,
  :phylum,
  :dwcClass,
  :order,
  :superfamily,
  :family,
  :subfamily,
  :tribe,
  :subtribe,
  :genus,
  :specificEpithet
].freeze

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from Housekeeping

#has_polymorphic_relationship?

Methods inherited from ApplicationRecord

transaction_with_retry

Instance Attribute Details

#occurrence_identifier ⇒ Object

Returns the value of attribute occurrence_identifier.



# File 'app/models/dwc_occurrence.rb', line 90

def occurrence_identifier
  @occurrence_identifier
end

Class Method Details

.annotates? ⇒ Boolean

Returns:

  • (Boolean)


# File 'app/models/dwc_occurrence.rb', line 137

def self.annotates?
  false
end

.by_collection_object_filter(filter_scope: nil, project_id: nil) ⇒ Object

TODO: use filters. Returns scopes by a collection object filter.



# File 'app/models/dwc_occurrence.rb', line 166

def self.by_collection_object_filter(filter_scope: nil, project_id: nil)
  return DwcOccurrence.none if project_id.nil? || filter_scope.nil?

  c = ::CollectionObject.arel_table
  d = arel_table

  # TODO: hackish
  k = ::CollectionObject.select('coscope.id').from( '(' + filter_scope.to_sql + ') as coscope ' )

  a = self.object_join('CollectionObject')
    .where('dwc_occurrences.project_id = ?', project_id)
    .where(dwc_occurrence_object_id: k)
    .select(::DwcOccurrence.target_columns) # TODO !! Will have to change when AssertedDistribution and other types merge in
  a
end

.computed_columns ⇒ Scope

Returns the columns inferred to have data.

Returns:

  • (Scope)

    the columns inferred to have data



# File 'app/models/dwc_occurrence.rb', line 218

def self.computed_columns
  select(target_columns)
end

.empty_fields ⇒ Array

Returns an Array of column names, as symbols, that are blank in ALL projects (not just this one).

Returns:

  • (Array)

    of column names as symbols that are blank in ALL projects (not just this one)



# File 'app/models/dwc_occurrence.rb', line 184

def self.empty_fields
  empty_in_all_projects = ActiveRecord::Base.connection.execute("select attname
  from pg_stats
  where tablename = 'dwc_occurrences'
  and most_common_vals is null
  and most_common_freqs is null
  and histogram_bounds is null
  and correlation is null
  and null_frac = 1;").pluck('attname').map(&:to_sym)

  empty_in_all_projects #  - target_columns
end

.excluded_columns ⇒ Array

Returns an Array of symbols.

Returns:

  • (Array)

    of symbols



# File 'app/models/dwc_occurrence.rb', line 212

def self.excluded_columns
  ::DwcOccurrence.columns.collect{|c| c.name.to_sym} - (self.target_columns - [:dwc_occurrence_object_id, :dwc_occurrence_object_type])
end
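
The set arithmetic in `.excluded_columns` can be hard to read in one line. The following is a minimal plain-Ruby sketch of it, assuming small hypothetical column lists (`ALL_COLUMNS`, `TARGET_COLUMNS`, and the `JOIN_ONLY` constant are illustrative stand-ins, not names from the model). It shows that the two join-only columns end up excluded even though they are listed in the targets.

```ruby
# Hypothetical column lists, for illustration only; in the model these come
# from DwcOccurrence.columns and DwcOccurrence.target_columns.
ALL_COLUMNS = %i[
  id project_id created_at updated_at
  dwc_occurrence_object_id dwc_occurrence_object_type
  occurrenceID basisOfRecord scientificName
].freeze

TARGET_COLUMNS = %i[
  id occurrenceID basisOfRecord
  dwc_occurrence_object_id dwc_occurrence_object_type
  scientificName
].freeze

# Columns that are needed for joins but should not be exported.
JOIN_ONLY = %i[dwc_occurrence_object_id dwc_occurrence_object_type].freeze

# The join-only columns are removed from the targets first, so subtracting
# the remaining targets from all columns leaves them in the excluded list.
def excluded_columns(all_columns, target_columns)
  all_columns - (target_columns - JOIN_ONLY)
end

p excluded_columns(ALL_COLUMNS, TARGET_COLUMNS)
# => [:project_id, :created_at, :updated_at, :dwc_occurrence_object_id, :dwc_occurrence_object_type]
```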

.object_join(target) ⇒ Object



# File 'app/models/dwc_occurrence.rb', line 141

def self.object_join(target)
  return DwcOccurrence.none unless ['CollectionObject', 'AssertedDistribution', 'FieldOccurrence'].include?(target)
  a = arel_table
  b = target.safe_constantize.arel_table # hmm - :: required
  j = a.join(b).on(a[:dwc_occurrence_object_type].eq(target).and(a[:dwc_occurrence_object_id].eq(b[:id])))
  joins(j.join_sources)
end

.scoped_by_otu(otu) ⇒ Scope

Returns all DwcOccurrences for the Otu

  • Includes synonymy (coordinate OTUs).

Returns:

  • (Scope)

    all DwcOccurrences for the Otu

    • Includes synonymy (coordinate OTUs).



# File 'app/models/dwc_occurrence.rb', line 152

def self.scoped_by_otu(otu)
  if otu.taxon_name_id.present?
    ::Queries::DwcOccurrence::Filter.new({
      taxon_name_id: otu.taxon_name_id,
    }).all
  else
    ::Queries::DwcOccurrence::Filter.new({
      otu_id: otu.id,
    }).all
  end
end

.stale(kind = 'CollectionObject') ⇒ Object (protected)



# File 'app/models/dwc_occurrence.rb', line 368

def self.stale(kind = 'CollectionObject')
  tbl = kind.tableize
  DwcOccurrence.joins("LEFT JOIN #{tbl} tbl on dwc_occurrences.dwc_occurrence_object_id = tbl.id")
    .where('tbl.id IS NULL and dwc_occurrences.dwc_occurrence_object_type = ?', kind )
end

.sweep ⇒ Object (protected)

Delete all DwcOccurrence records where object is missing.



# File 'app/models/dwc_occurrence.rb', line 361

def self.sweep
  %w{CollectionObject AssertedDistribution FieldOccurrence}.each do |k|
    stale(k).delete_all
  end
  true
end

.target_columns ⇒ Array

!! TODO: When we come to adding AssertedDistributions, FieldOccurrences, etc. we will have to make this more flexible

Returns:

  • (Array)

    of symbols



# File 'app/models/dwc_occurrence.rb', line 201

def self.target_columns
  [:id, # must be in position 0
   :occurrenceID,
   :basisOfRecord,
   :dwc_occurrence_object_id,   # !! We don't want this, but need it in joins, it is removed in trim via `.excluded_columns` below
   :dwc_occurrence_object_type, # !! ^
  ] + CollectionObject::DwcExtensions::DWC_OCCURRENCE_MAP.keys
end

Instance Method Details

#as_json(options = {}) ⇒ Object

Strips nils when `to_json` is used.



# File 'app/models/dwc_occurrence.rb', line 93

def as_json(options = {})
  super(options.merge(except: attributes.keys.select{ |key| self[key].nil? }))
end

#asserted_distribution ⇒ Object



# File 'app/models/dwc_occurrence.rb', line 114

def asserted_distribution
  dwc_occurrence_object_type == 'AssertedDistribution' ? dwc_occurrence_object : nil
end

#basis ⇒ Object



# File 'app/models/dwc_occurrence.rb', line 222

def basis
  case dwc_occurrence_object_type
  when 'CollectionObject'
    if dwc_occurrence_object.is_fossil?
      return 'FossilSpecimen'
    else
      return 'PreservedSpecimen'
    end
  when 'AssertedDistribution'
    # Used to fork b/b Source::Human and Source::Bibtex:
    case dwc_occurrence_object.source&.type || dwc_occurrence_object.sources.order(cached_nomenclature_date: :DESC).first.type
    when 'Source::Bibtex'
      return 'MaterialCitation'
    when 'Source::Human'
      return 'HumanObservation'
    else # Not recommended at this point
      return 'Occurrence'
    end
  when 'FieldOccurrence'
    if dwc_occurrence_object.machine_output?
      return 'MachineObservation'
    else
      return 'HumanObservation'
    end
  end

  'Undefined'
end
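
The branching in `#basis` amounts to a lookup from object type (plus one flag per type) to a basisOfRecord value. The sketch below restates it in plain Ruby so it can be run without a database; `basis_of_record` and its keyword arguments are hypothetical stand-ins for the ActiveRecord calls (`is_fossil?`, the source `type`, `machine_output?`), not part of the model.

```ruby
# Hypothetical stand-in for #basis: plain values replace the ActiveRecord
# object so the mapping can be read and exercised in isolation.
def basis_of_record(object_type, fossil: false, source_type: nil, machine_output: false)
  case object_type
  when 'CollectionObject'
    fossil ? 'FossilSpecimen' : 'PreservedSpecimen'
  when 'AssertedDistribution'
    case source_type
    when 'Source::Bibtex' then 'MaterialCitation'
    when 'Source::Human'  then 'HumanObservation'
    else 'Occurrence' # fallback for any other source type
    end
  when 'FieldOccurrence'
    machine_output ? 'MachineObservation' : 'HumanObservation'
  else
    'Undefined'
  end
end

puts basis_of_record('CollectionObject', fossil: true)
puts basis_of_record('AssertedDistribution', source_type: 'Source::Bibtex')
puts basis_of_record('FieldOccurrence')
```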

#collecting_event ⇒ Object



# File 'app/models/dwc_occurrence.rb', line 122

def collecting_event
  collection_object&.collecting_event || field_occurrence&.collecting_event
end

#collection_object ⇒ Object



# File 'app/models/dwc_occurrence.rb', line 110

def collection_object
  dwc_occurrence_object_type == 'CollectionObject' ? dwc_occurrence_object : nil
end

#create_object_uuid ⇒ Object (protected)



# File 'app/models/dwc_occurrence.rb', line 347

def create_object_uuid
  @occurrence_identifier = Identifier::Global::Uuid::TaxonworksDwcOccurrence.create!(
    identifier_object: dwc_occurrence_object,
    by: dwc_occurrence_object&.creator, # revisit, why required?
    project_id: dwc_occurrence_object&.project_id, # Current.project_id,  # revisit, why required?
    is_generated: true)
end

#dwc_json ⇒ Object

Returns Hash

  • Legally formatted DwC fields only, with things like `dwcClass` translated

  • Only fields with values returned

  • Keys are sorted.

Returns:

  • Hash

    • Legally formatted DwC fields only, with things like `dwcClass` translated

    • Only fields with values returned

    • Keys are sorted



# File 'app/models/dwc_occurrence.rb', line 102

def dwc_json
  a = as_json.reject!{|k,v| TW_ATTRIBUTES.include?(k.to_sym) || v.nil?}
  HEADER_CONVERTERS.keys.each do |k|
    a[ HEADER_CONVERTERS[k] ] = a.delete(k) if a[k]
  end
  a.sort.to_h
end
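
To make the three steps of `#dwc_json` concrete (drop TW attributes and nils, rename headers, sort keys), here is a minimal sketch that applies them to a plain Hash. The constants are copied from the Constant Summary; `dwc_fields` and the sample `row` are hypothetical, and the sketch uses the non-mutating `reject` rather than `reject!`.

```ruby
TW_ATTRIBUTES = %i[
  id project_id created_at updated_at created_by_id updated_by_id
  dwc_occurrence_object_type dwc_occurrence_object_id
].freeze

HEADER_CONVERTERS = { 'dwcClass' => 'class' }.freeze

# Hypothetical helper operating on a String-keyed Hash, standing in for the
# Hash that as_json returns on the model.
def dwc_fields(attributes)
  # 1. Drop TaxonWorks bookkeeping columns and empty values.
  a = attributes.reject { |k, v| TW_ATTRIBUTES.include?(k.to_sym) || v.nil? }
  # 2. Rename internal column names to their DwC term.
  HEADER_CONVERTERS.each { |from, to| a[to] = a.delete(from) if a.key?(from) }
  # 3. Sort by key.
  a.sort.to_h
end

row = {
  'id' => 1,
  'project_id' => 2,
  'genus' => 'Aus',
  'dwcClass' => 'Insecta',
  'family' => nil,
  'basisOfRecord' => 'PreservedSpecimen'
}

p dwc_fields(row)
# => {"basisOfRecord"=>"PreservedSpecimen", "class"=>"Insecta", "genus"=>"Aus"}
```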

#field_occurrence ⇒ Object



# File 'app/models/dwc_occurrence.rb', line 118

def field_occurrence
  dwc_occurrence_object_type == 'FieldOccurrence' ? dwc_occurrence_object : nil
end

#generate_uuid_if_required(force = false) ⇒ Object

TODO: quick check if occurrenceID exists in table?! <-> locking sync !?

Parameters:

  • force (Boolean) (defaults to: false)

    true - do not rely on occurrenceID; create the identifier if none is loaded and the object exists
    false - check if occurrenceID is present; if it is, assume the identifier (still) exists



# File 'app/models/dwc_occurrence.rb', line 263

def generate_uuid_if_required(force = false)
  if force # really make sure there is an object to work with
    create_object_uuid if !occurrence_identifier && !dwc_occurrence_object.nil? # TODO: can be simplified when inverse_of/validation added to identifiers
  else # assume if occurrenceID is not blank identifier is present
    if occurrenceID.blank?
      create_object_uuid if !occurrence_identifier && !dwc_occurrence_object.nil? # TODO: can be simplified when inverse_of/validation added to identifiers
    end
  end
end
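
The two branches of `#generate_uuid_if_required` reduce to a single yes/no decision about whether a new identifier should be created. A minimal sketch of that decision follows, assuming plain values in place of the model's state; `create_uuid?` and its keyword arguments are hypothetical names introduced only for illustration.

```ruby
# Hypothetical predicate mirroring the branches of #generate_uuid_if_required.
#   identifier_loaded - whether occurrence_identifier is already set
#   object_present    - whether dwc_occurrence_object exists
def create_uuid?(force:, occurrence_id:, identifier_loaded:, object_present:)
  can_create = !identifier_loaded && object_present
  return can_create if force

  # Not forced: a non-blank occurrenceID is taken as evidence that the
  # identifier already exists, so nothing is created.
  blank = occurrence_id.nil? || occurrence_id.strip.empty?
  blank && can_create
end

puts create_uuid?(force: false, occurrence_id: 'abc-123', identifier_loaded: false, object_present: true)
puts create_uuid?(force: true,  occurrence_id: 'abc-123', identifier_loaded: false, object_present: true)
```

With `force: false` the cached occurrenceID short-circuits the check; with `force: true` only the loaded identifier and the presence of the object matter.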

#is_stale? ⇒ Boolean

!! This is a spot check, it’s not (yet) coded to be comprehensive.
!! You should request a full rebuild (rebuild=true) at display time to ensure an up-to-date individual record.

Returns:

  • (Boolean)

    By looking at the data, determine if a related record has been updated since this record was last updated.



# File 'app/models/dwc_occurrence.rb', line 281

def is_stale?
  case dwc_occurrence_object_type
  when 'CollectionObject'
    times = is_stale_metadata.values
    n = read_attribute(:updated_at)

    times.each do |v|
      return true if v > n
    end

    return false
  else # AssertedDistribution
    return  dwc_occurrence_object.updated_at > updated_at
  end
end
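
The CollectionObject branch above compares each related timestamp against the index record's own `updated_at`. A minimal sketch of that comparison, assuming a Hash shaped like the one returned by `#is_stale_metadata`; the `stale?` helper and the sample times are hypothetical.

```ruby
# Hypothetical helper: index_time plays the role of the DwcOccurrence's
# updated_at, related_times the role of the #is_stale_metadata Hash.
def stale?(index_time, related_times)
  # Stale as soon as any related record was updated after the index was built.
  related_times.values.compact.any? { |t| t > index_time }
end

index_time = Time.utc(2024, 1, 10)

related = {
  collection_object: Time.utc(2024, 1, 9),
  collecting_event: Time.utc(2024, 1, 12) # updated after the index was built
}

puts stale?(index_time, related)
# => true
```

Because only the timestamps listed in the Hash are inspected, a `false` result does not prove the record is current; it is the approximation described in the overview.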

#is_stale_metadata ⇒ Object



# File 'app/models/dwc_occurrence.rb', line 297

def is_stale_metadata
  case dwc_occurrence_object_type
  when 'CollectionObject'

    o = CollectionObject.select(:id, :updated_at, :collecting_event_id).find_by(id: dwc_occurrence_object_id)
    ce = CollectingEvent.select(:id, :updated_at).find_by(id: o.collecting_event_id)

    td =  dwc_occurrence_object&.taxon_determinations.order(:position).first

    tdr = if td&.otu&.taxon_name&.cached_name_and_author_year != scientificName
            td.updated_at
          else
            nil
          end

    tc = if fieldNumber != o.dwc_field_number
           collecting_event.identifiers.where(type: 'Identifier::Local::FieldNumber').first.updated_at
         else
           nil
         end

    return {
      collection_object: o.updated_at, # Shouldn't be necessary since on_save rebuilds, but cheap here
      collecting_event: ce&.updated_at,
      trip_code: tc,
      taxon_determination: dwc_occurrence_object.taxon_determinations.order(:position)&.first&.updated_at,
      taxon_determination_reorder: tdr,
      taxon_determination_roles: dwc_occurrence_object.taxon_determinations.order(:position)&.first&.updated_at,
      biocuration_classification: dwc_occurrence_object.biocuration_classifications.order(:updated_at).first&.updated_at,
      georeferences: dwc_occurrence_object.georeferences.order(:updated_at).first&.updated_at,

      data_attributes: dwc_occurrence_object.data_attributes.order(:updated_at).first&.updated_at,

      collection_object_roles: dwc_occurrence_object.roles.order(:updated_at).first&.updated_at,
      collecting_event_data_attributes: dwc_occurrence_object.collecting_event&.data_attributes&.order(:updated_at)&.first&.updated_at,
      collecting_event_roles: dwc_occurrence_object.collecting_event&.roles&.order(:updated_at)&.first&.updated_at
      # citations?
      # tags?!
    }.select{|k,v| !v.nil?}

  else # AssertedDistribution
    {
      asserted_distribution: dwc_occurrence_object.updated_at,
      # TODO: Citations
    }
  end
end

#otu ⇒ Object



# File 'app/models/dwc_occurrence.rb', line 126

def otu
  case dwc_occurrence_object_type
  when 'AssertedDistribution'
    dwc_occurrence_object.otu
  when 'CollectionObject'
    collection_object.otu
  when 'FieldOccurrence'
    field_occurrence.otu
  end
end

#set_metadata_attributes ⇒ Object (protected)



# File 'app/models/dwc_occurrence.rb', line 355

def set_metadata_attributes
  write_attribute( :basisOfRecord, basis)
  write_attribute( :occurrenceID, occurrence_identifier&.identifier)  # TODO: Slightly janky to touch this here, might not be needed with new hooks
end

#uuid_identifier_scope ⇒ Object



# File 'app/models/dwc_occurrence.rb', line 251

def uuid_identifier_scope
  dwc_occurrence_object&.identifiers&.where('identifiers.type like ?', 'Identifier::Global::Uuid%')&.order(:position)
end