Module: Export::Dwca

Defined in:
lib/export/dwca.rb

Overview

Darwin Core Archive (DWC-A) shared constants and utilities

Defined Under Namespace

Modules: Checklist, Eml, GbifProfile, Occurrence

Constant Summary

INDEX_VERSION =

!! If you change this file or related DwC files, update the INDEX_VERSION constant.

Each version records a date on which indexing changed significantly enough that all or most of the index should be regenerated. To add a version, generate a timestamp with Time.now in IRB.

[
  '2021-10-12 17:00:00.000000 -0500',    # First major refactor
  '2021-10-15 17:00:00.000000 -0500',    # Minor  Excludes footprintWKT, and references to GeographicArea in gazetteer; new form of media links
  '2021-11-04 17:00:00.000000 -0500',    # Minor  Removes '|', fixes some mappings
  '2021-11-08 13:00:00.000000 -0500',    # PENDING: Minor  Adds depth mappings
  '2021-11-30 13:00:00.000000 -0500',    # Fix inverted long,lat
  '2022-01-21 16:30:00.000000 -0500',    # basisOfRecord can now be FossilSpecimen; occurrenceId exporting; adds redundant time fields
  '2022-03-31 16:30:00.000000 -0500',    # collectionCode, occurrenceRemarks and various small fixes
  '2022-04-28 16:30:00.000000 -0500',    # add dwcOccurrenceStatus
  '2022-09-28 16:30:00.000000 -0500',    # add phylum, class, order, higherClassification
  '2023-04-03 16:30:00.000000 -0500',    # add associatedTaxa; updating InternalAttributes is now reflected in index
  '2023-12-14 16:30:00.000000 -0500',    # add verbatimLabel
  '2023-12-21 11:00:00.000000 -0500',    # add caste (via biocuration), identificationRemarks
  '2024-09-13 11:00:00.000000 -0500',    # enable collectionCode, object and collecting event related IDs
  '2026-03-21 12:00:00.000000 -0500'     # add otu_id to dwc_occurrences
].freeze
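
As a rough illustration of how a version list like this might be consulted, the sketch below parses the timestamps and checks whether a record indexed at a given time predates the newest version. The helper name stale_index? and the shortened versions array are assumptions made for this example; they are not part of Export::Dwca.

```ruby
require 'time'

# Shortened, illustrative copy of the version list (not the real constant).
versions = [
  '2024-09-13 11:00:00.000000 -0500',
  '2026-03-21 12:00:00.000000 -0500'
].freeze

# Hypothetical helper: true when indexed_at predates the newest version,
# i.e. the record would need to be re-indexed.
def stale_index?(indexed_at, versions)
  indexed_at < Time.parse(versions.last)
end

needs_rebuild = stale_index?(Time.parse('2025-01-01 00:00:00 -0500'), versions)
```
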
DELIMITER =

Delimiter used for concatenating multiple values in DwC fields. Used when multiple items (e.g., references, media, identifiers) need to be represented in a single Darwin Core field.

' | '.freeze
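
A minimal sketch of how such a delimiter is typically applied: several values are joined into one field and can be split back apart. The sample identifiers are made up, and the local variable delimiter simply mirrors the constant's value.

```ruby
delimiter = ' | ' # same value as DELIMITER above

identifiers = ['ABC 123', 'XYZ 456'] # made-up sample values

# Collapse several values into a single Darwin Core field.
field = identifiers.join(delimiter)

# Recover the individual values from the field.
values = field.split(delimiter)
```
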
DEFAULT_CHECKLIST_DESCRIPTION =
'A zip file containing a Darwin Core Archive checklist.'.freeze

Class Method Summary

Class Method Details

.build_index_async(klass, record_scope, predicate_extensions: {}) ⇒ Hash

Returns Metadata including total, start_time, and sample global ids.

Parameters:

  • klass (Class)

    ActiveRecord class, e.g. CollectionObject

  • record_scope (ActiveRecord::Relation)

    Scope of records to index

Returns:

  • (Hash)

    Metadata including total, start_time, and sample global ids



# File 'lib/export/dwca.rb', line 44

def self.build_index_async(klass, record_scope, predicate_extensions: {})
  s = record_scope.order(:id)
  ::DwcaCreateIndexJob.perform_later(klass.to_s, sql_scope: s.to_sql)
  index_metadata(klass, s)
end

.checklist_download_async(core_otu_scope_params, request_url, extensions: [], accepted_name_mode: Checklist::Data::REPLACE_WITH_ACCEPTED_NAME, description_topics: [], download_name: nil, download_description: nil, project_id: nil) ⇒ Download::DwcArchive::Checklist

Create a DwC-A checklist download asynchronously

Parameters:

  • core_otu_scope_params (Hash)

    OTU query parameters

  • request_url (String)

    URL of the request

  • extensions (Array<Symbol>) (defaults to: [])

    Extensions to include (e.g., [:distribution, :references])

  • accepted_name_mode (String) (defaults to: Checklist::Data::REPLACE_WITH_ACCEPTED_NAME)

    How to handle unaccepted names ('replace_with_accepted_name' or 'accepted_name_usage_id')

  • description_topics (Array<Integer>) (defaults to: [])

    Ordered list of topic IDs for description extension

  • download_name (String, nil) (defaults to: nil)

    Optional custom name for the Download record

  • download_description (String, nil) (defaults to: nil)

    Optional custom description for the Download record

  • project_id (Integer) (defaults to: nil)

    Project ID

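Returns:

  • (Download::DwcArchive::Checklist)

    The newly created Download record; the archive itself is built by the enqueued job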



# File 'lib/export/dwca.rb', line 131

def self.checklist_download_async(core_otu_scope_params, request_url, extensions: [], accepted_name_mode: Checklist::Data::REPLACE_WITH_ACCEPTED_NAME, description_topics: [], download_name: nil, download_description: nil, project_id: nil)
  filename = "dwc_checklist_#{DateTime.now}.zip"
  display_name = download_name.presence || "DwC Checklist on #{Time.now}."
  description = download_description.presence || DEFAULT_CHECKLIST_DESCRIPTION

  download = ::Download::DwcArchive::Checklist.create!(
    name: display_name,
    description: description,
    filename: filename,
    request: request_url,
    expires: 2.days.from_now
  )

  DwcaCreateChecklistDownloadJob.perform_later(
    download.id,
    core_otu_scope_params:,
    extensions:,
    accepted_name_mode:,
    description_topics:,
    project_id:
  )

  download
end

.download_async(core_scope, request_url, predicate_extensions: {}, taxonworks_extensions: [], extension_scopes: {}, project_id: nil, user_id: nil) ⇒ Download::DwcArchive

Create a DwC-A occurrence download asynchronously

Parameters:

  • core_scope (ActiveRecord::Relation)

    Scope of DwcOccurrence records

  • request_url (String)

    URL of the request

  • predicate_extensions (Hash) (defaults to: {})

    Predicate extensions to include

  • taxonworks_extensions (Array) (defaults to: [])

    TaxonWorks extensions to include

  • extension_scopes (Hash) (defaults to: {})

    Additional extension scopes

  • project_id (Integer) (defaults to: nil)

    Project ID

  • user_id (Integer) (defaults to: nil)

    User ID for housekeeping in the background job

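Returns:

  • (Download::DwcArchive)

    The newly created Download record; the archive itself is built by the enqueued job

Raises:

  • (TaxonWorks::Error)

    If project_id or user_id is nil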


# File 'lib/export/dwca.rb', line 92

def self.download_async(core_scope, request_url, predicate_extensions: {}, taxonworks_extensions: [], extension_scopes: {}, project_id: nil, user_id: nil)
  raise TaxonWorks::Error, 'project_id is required in Export::Dwca::download_async!' if project_id.nil?
  raise TaxonWorks::Error, 'user_id is required in Export::Dwca::download_async!' if user_id.nil?

  name = "dwc_occurrences_#{DateTime.now}.zip"

  download = ::Download::DwcArchive.create!(
    name: "DwC Archive for occurrences on #{Time.now}.",
    description: 'A zip file containing a Darwin Core Archive of occurrence records.',
    filename: name,
    request: request_url,
    expires: 2.days.from_now
  )

  DwcaCreateDownloadJob.perform_later(
    download.id,
    core_scope: core_scope.to_sql,
    extension_scopes:,
    predicate_extensions:,
    taxonworks_extensions:,
    project_id:,
    user_id:
  )

  download
end
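
The method above follows a validate, create, enqueue, return pattern. The sketch below reproduces that flow in plain Ruby so it can be run without Rails. SketchError, SketchQueue, SketchDownload, and sketch_download_async are hypothetical stand-ins invented for this example; they are not TaxonWorks classes, and the real method creates a database record and enqueues a job.

```ruby
# Hypothetical stand-ins; none of these exist in TaxonWorks.
class SketchError < StandardError; end

# Records what would have been enqueued instead of running a job.
class SketchQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def perform_later(download_id, **options)
    @jobs << [download_id, options]
  end
end

SketchDownload = Struct.new(:id, :name)

def sketch_download_async(core_sql, queue:, project_id: nil, user_id: nil)
  # Fail fast, before anything is created or enqueued.
  raise SketchError, 'project_id is required' if project_id.nil?
  raise SketchError, 'user_id is required' if user_id.nil?

  download = SketchDownload.new(1, "DwC Archive for occurrences on #{Time.now}.")

  # The scope travels as a SQL string, mirroring core_scope.to_sql above.
  queue.perform_later(download.id, core_scope: core_sql,
                                   project_id: project_id, user_id: user_id)

  # The caller gets the record immediately; the work happens later.
  download
end

queue = SketchQueue.new
download = sketch_download_async('SELECT 1', queue: queue, project_id: 1, user_id: 2)
```
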

.index_metadata(klass, record_scope) ⇒ Hash{Symbol=>Integer, Time, Array}

Parameters:

  • klass (Class)

    ActiveRecord class

  • record_scope (ActiveRecord::Relation)

    Scope of records

Returns:

  • (Hash{Symbol=>Integer, Time, Array})

    Metadata with :total, :start_time, and :sample keys


# File 'lib/export/dwca.rb', line 53

def self.index_metadata(klass, record_scope)
  # Global IDs of the first and last records bracket the sample.
  a = record_scope.first&.to_global_id&.to_s
  b = record_scope.last&.to_global_id&.to_s

  t = record_scope.size

  metadata = {
    total: t,
    start_time: Time.zone.now,
    sample: [a, b].compact
  }

  if b && (t > 2)
    # Pick up to 9 records at evenly spaced row numbers (ordered by id).
    max = 9
    max = t if t < 9

    ids = klass
      .select('*')
      .from("(select id, type, ROW_NUMBER() OVER (ORDER BY id ASC) rn from (#{record_scope.to_sql}) b ) a")
      .where("a.rn % ((SELECT COUNT(*) FROM (#{record_scope.to_sql}) c) / #{max}) = 0")
      .limit(max)
      .collect { |o| o.to_global_id.to_s }

    # Place the sampled IDs between the first and last global IDs.
    metadata[:sample].insert(1, *ids)
  end

  metadata[:sample].uniq!
  metadata
end
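
The sampling logic above can be hard to read inside SQL strings. The following plain-Ruby sketch mimics it on an array of integer IDs: keep the first and last, then add up to nine rows whose 1-based row number is a multiple of total / max. The method name sketch_sample is hypothetical, and it works on raw IDs rather than global ID strings.

```ruby
# Hypothetical pure-Ruby analogue of the SQL sampling; not part of Export::Dwca.
def sketch_sample(ids, max = 9)
  sorted = ids.sort
  total = sorted.size
  sample = [sorted.first, sorted.last].compact

  if total > 2
    max = total if total < max
    step = total / max # integer division, as in the SQL expression

    picked = sorted.each_with_index
                   .select { |_id, i| ((i + 1) % step).zero? } # row numbers start at 1
                   .map(&:first)
                   .first(max)

    # Sampled IDs go between the first and last entries.
    sample.insert(1, *picked)
  end

  sample.uniq
end

large = sketch_sample((1..100).to_a)
small = sketch_sample((1..5).to_a)
```

With 100 IDs the step is 11, so every eleventh row is picked; with five IDs the step is 1, so every row is picked and uniq removes the duplicated first and last entries.
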

.output_csv(tbl) ⇒ String

Returns TSV content.

Parameters:

  • tbl (Array<Array>)

    table data

Returns:

  • (String)

    TSV content



# File 'lib/export/dwca.rb', line 33

def self.output_csv(tbl)
  output = StringIO.new
  tbl.each do |row|
    output.puts ::CSV.generate_line(row, col_sep: "\t", encoding: Encoding::UTF_8)
  end
  output.string
end
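
Despite its name, the method emits tab-separated values. A standalone sketch of the same approach, using only the standard csv and stringio libraries, is shown below. The method name sketch_output_tsv and the sample rows are made up for illustration.

```ruby
require 'csv'
require 'stringio'

# Hypothetical standalone version of the row-by-row TSV writer.
def sketch_output_tsv(tbl)
  output = StringIO.new
  tbl.each do |row|
    # generate_line already ends each row with a newline; puts does not add another.
    output.puts CSV.generate_line(row, col_sep: "\t")
  end
  output.string
end

tsv = sketch_output_tsv([%w[occurrenceID scientificName], ['1', 'Aus bus']])
```
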