Module: Export::Dwca
- Defined in:
- lib/export/dwca.rb,
lib/export/dwca/data.rb
Defined Under Namespace
Modules: GbifProfile Classes: Data
Constant Summary
- INDEX_VERSION =
INDEX_VERSION tracks dates at which the indexing changed significantly enough that all, or most, of the index should be regenerated. To add a version, use `Time.now` via IRB.
[
  '2021-10-12 17:00:00.000000 -0500', # First major refactor
  '2021-10-15 17:00:00.000000 -0500', # Minor. Excludes footprintWKT, and references to GeographicArea in gazetteer; new form of media links
  '2021-11-04 17:00:00.000000 -0500', # Minor. Removes '|', fixes some mappings
  '2021-11-08 13:00:00.000000 -0500', # PENDING: Minor. Adds depth mappings
  '2021-11-30 13:00:00.000000 -0500', # Fix inverted long,lat
  '2022-01-21 16:30:00.000000 -0500', # basisOfRecord can now be FossilSpecimen; occurrenceId exporting; adds redundant time fields
  '2022-03-31 16:30:00.000000 -0500', # collectionCode, occurrenceRemarks and various small fixes
  '2022-04-28 16:30:00.000000 -0500', # add dwcOccurrenceStatus
  '2022-09-28 16:30:00.000000 -0500', # add phylum, class, order, higherClassification
  '2023-04-03 16:30:00.000000 -0500', # add associatedTaxa; updating InternalAttributes is now reflected in index
  '2023-12-14 16:30:00.000000 -0500', # add verbatimLabel
  '2023-12-21 11:00:00.000000 -0500', # add caste (via biocuration), identificationRemarks
  '2024-09-13 11:00:00.000000 -0500'  # enable collectionCode, object and collecting event related IDs
].freeze
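The constant's intended use can be sketched in plain Ruby, outside Rails: a cached index row is stale when it was built before the most recent version timestamp. The trimmed `INDEX_VERSION_SAMPLE` array and the `index_stale?` helper below are illustrative, not TaxonWorks API.

```ruby
require 'time'

# Illustrative subset of INDEX_VERSION; entries are timestamp strings.
INDEX_VERSION_SAMPLE = [
  '2021-10-12 17:00:00.000000 -0500',
  '2024-09-13 11:00:00.000000 -0500'
].freeze

# A row built before the latest version timestamp should be regenerated.
def index_stale?(built_at)
  built_at < Time.parse(INDEX_VERSION_SAMPLE.last)
end

puts index_stale?(Time.parse('2023-01-01 00:00:00 -0500')) # => true
```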
Class Method Summary
-
.build_index_async(klass, record_scope, predicate_extensions: {}) ⇒ Object
When we re-index a large set of data, we run it in the background.
-
.download_async(record_scope, request = nil, extension_scopes: {}, predicate_extensions: {}, taxonworks_extensions: {}) ⇒ Download
The download object containing the archive.
- .index_metadata(klass, record_scope) ⇒ Hash{Symbol=>Integer, Time, Array}
Class Method Details
.build_index_async(klass, record_scope, predicate_extensions: {}) ⇒ Object
When we re-index a large set of data, we run it in the background. To determine when it is done, we poll for the last record to be indexed.
# File 'lib/export/dwca.rb', line 73

def self.build_index_async(klass, record_scope, predicate_extensions: {})
  s = record_scope.order(:id)
  ::DwcaCreateIndexJob.perform_later(klass.to_s, sql_scope: s.to_sql)
  index_metadata(klass, s)
end
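The "poll by the last record" completion check can be sketched in plain Ruby. Everything here is hypothetical scaffolding, not TaxonWorks API: `scope_ids` stands in for the ids of the ordered scope, and `indexed_ids` for ids already rebuilt by the background job; because the scope is ordered by `id`, indexing is done once the final id has been rebuilt.

```ruby
# Hypothetical polling predicate: the job is complete when the last id of
# the ordered scope appears among the ids already (re)indexed.
def reindex_complete?(scope_ids, indexed_ids)
  indexed_ids.include?(scope_ids.last)
end

scope_ids   = [10, 11, 12]
indexed_ids = [10, 11]
p reindex_complete?(scope_ids, indexed_ids)        # => false
p reindex_complete?(scope_ids, indexed_ids + [12]) # => true
```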
.download_async(record_scope, request = nil, extension_scopes: {}, predicate_extensions: {}, taxonworks_extensions: {}) ⇒ Download
Returns the download object containing the archive.
# File 'lib/export/dwca.rb', line 38

def self.download_async(record_scope, request = nil, extension_scopes: {}, predicate_extensions: {}, taxonworks_extensions: {})
  name = "dwc-a_#{DateTime.now}.zip"

  download = ::Download::DwcArchive.create!(
    name: "DwC Archive generated at #{Time.now.utc}.",
    description: 'A Darwin Core archive.',
    filename: name,
    request:,
    expires: 2.days.from_now,
    total_records: record_scope.size # Was having problems with count(). TODO: increment after, when extensions are allowed.
  )

  # Note we pass a string with the record scope
  ::DwcaCreateDownloadJob.perform_later(
    download,
    core_scope: record_scope.to_sql,
    extension_scopes:,
    predicate_extensions:,
    taxonworks_extensions:,
  )

  download
end
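The attributes assembled for the `Download::DwcArchive` record can be sketched without Rails. This is a plain-Ruby approximation: `2.days.from_now` is ActiveSupport, replaced here by arithmetic on `Time.now`, and the hash merely mirrors the `create!` arguments above rather than creating anything.

```ruby
require 'date'

name    = "dwc-a_#{DateTime.now}.zip"
expires = Time.now + (2 * 24 * 60 * 60) # plain-Ruby stand-in for 2.days.from_now

# Mirrors (a subset of) the attributes download_async passes to create!.
attrs = {
  name: "DwC Archive generated at #{Time.now.utc}.",
  description: 'A Darwin Core archive.',
  filename: name,
  expires: expires
}

puts attrs[:filename].match?(/\Adwc-a_.+\.zip\z/) # => true
```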
.index_metadata(klass, record_scope) ⇒ Hash{Symbol=>Integer, Time, Array}
# File 'lib/export/dwca.rb', line 80

def self.index_metadata(klass, record_scope)
  a = record_scope.first&.to_global_id&.to_s # TODO: this should be UUID?
  b = record_scope.last&.to_global_id&.to_s # TODO: this should be UUID?
  t = record_scope.size # was having problems with count

  # Local name `metadata` reconstructed; the identifier was elided in the extracted source.
  metadata = { total: t, start_time: Time.zone.now, sample: [a, b].compact }

  if b && (t > 2)
    max = 9
    max = t if t < 9

    ids = klass
      .select('*')
      .from("(select id, type, ROW_NUMBER() OVER (ORDER BY id ASC) rn from (#{record_scope.to_sql}) b ) a")
      .where("a.rn % ((SELECT COUNT(*) FROM (#{record_scope.to_sql}) c) / #{max}) = 0")
      .limit(max)
      .collect{|o| o.to_global_id.to_s}

    metadata[:sample].insert(1, *ids)
  end

  metadata[:sample].uniq!
  metadata # return the Hash, per the declared return type
end
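The `ROW_NUMBER` / modulo SQL above picks up to `max` evenly spaced rows from the scope. The same technique can be sketched in plain Ruby over an array of ids; `evenly_spaced_sample` is an illustrative name, not part of the module.

```ruby
# Number the ids 1..n, keep every (n / max)-th one, cap at max samples —
# mirroring: WHERE rn % (count / max) = 0 LIMIT max.
def evenly_spaced_sample(ids, max = 9)
  step = ids.size / max
  return ids.dup if step.zero? # fewer ids than max: keep them all

  ids.each_with_index
     .select { |_, i| (i + 1) % step == 0 }
     .map(&:first)
     .first(max)
end

p evenly_spaced_sample((1..100).to_a) # => [11, 22, 33, 44, 55, 66, 77, 88, 99]
```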