Module: Export::Dwca
- Defined in:
- lib/export/dwca.rb
Defined Under Namespace
Modules: Eml, GbifProfile, Occurrence
Constant Summary
- INDEX_VERSION =
Versions track the dates on which indexing changed significantly enough that all or most of the index should be regenerated. To add a version, use `Time.now` via IRB.
[
  '2021-10-12 17:00:00.000000 -0500', # First major refactor
  '2021-10-15 17:00:00.000000 -0500', # Minor Excludes footprintWKT, and references to GeographicArea in gazetteer; new form of media links
  '2021-11-04 17:00:00.000000 -0500', # Minor Removes '|', fixes some mappings
  '2021-11-08 13:00:00.000000 -0500', # PENDING: Minor Adds depth mappings
  '2021-11-30 13:00:00.000000 -0500', # Fix inverted long,lat
  '2022-01-21 16:30:00.000000 -0500', # basisOfRecord can now be FossilSpecimen; occurrenceId exporting; adds redundant time fields
  '2022-03-31 16:30:00.000000 -0500', # collectionCode, occurrenceRemarks and various small fixes
  '2022-04-28 16:30:00.000000 -0500', # add dwcOccurrenceStatus
  '2022-09-28 16:30:00.000000 -0500', # add phylum, class, order, higherClassification
  '2023-04-03 16:30:00.000000 -0500', # add associatedTaxa; updating InternalAttributes is now reflected in index
  '2023-12-14 16:30:00.000000 -0500', # add verbatimLabel
  '2023-12-21 11:00:00.000000 -0500', # add caste (via biocuration), identificationRemarks
  '2024-09-13 11:00:00.000000 -0500'  # enable collectionCode, object and collecting event related IDs
].freeze
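One way such a version list might be used is to decide whether an index record predates the most recent significant change. The following is a minimal sketch under that assumption; `index_stale?` is a hypothetical helper, not part of `Export::Dwca`, and the `indexed_at` argument stands in for whatever timestamp the index record carries.

```ruby
require 'time'

# Hypothetical helper (not part of Export::Dwca): returns true when a record
# was indexed before the most recent entry in a list of version strings.
def index_stale?(indexed_at, versions)
  latest = versions.map { |v| Time.parse(v) }.max
  indexed_at < latest
end

# Usage, with two of the version strings from the constant above:
versions = [
  '2023-12-21 11:00:00.000000 -0500',
  '2024-09-13 11:00:00.000000 -0500'
]
index_stale?(Time.parse('2024-01-01 00:00:00 -0500'), versions)
```

Taking the maximum parsed time rather than the last element keeps the check correct even if entries are ever added out of order.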
Class Method Summary
- .build_index_async(klass, record_scope, predicate_extensions: {}) ⇒ Object
Re-indexing a large set of data runs in the background.
- .download_async(record_scope, request = nil, extension_scopes: {}, predicate_extensions: {}, taxonworks_extensions: {}, project_id: nil) ⇒ Download
Creates a DwC-A download asynchronously by enqueuing a job.
- .index_metadata(klass, record_scope) ⇒ Hash{Symbol=>Integer, Time, Array}
Class Method Details
.build_index_async(klass, record_scope, predicate_extensions: {}) ⇒ Object
Re-indexing a large set of data runs in the background. To determine when it is done, poll for the last record to be indexed.
# File 'lib/export/dwca.rb', line 93
def self.build_index_async(klass, record_scope, predicate_extensions: {})
  s = record_scope.order(:id)
  # The scope is passed to the job as a SQL string
  ::DwcaCreateIndexJob.perform_later(klass.to_s, sql_scope: s.to_sql)
  index_metadata(klass, s)
end
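The polling described above could be sketched as follows. This is an illustration only: `wait_until_indexed` is a hypothetical helper, and the block is assumed to return true once the last record in the scope has been indexed; how that check is performed is left to the caller.

```ruby
# Hypothetical helper (not part of Export::Dwca): calls the block repeatedly
# until it returns a truthy value or the timeout (in seconds) elapses.
# Returns true on success and false on timeout.
def wait_until_indexed(timeout: 60, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep(interval)
  end
end

# Usage: the block stands in for a check on the last record to be indexed.
attempts = 0
wait_until_indexed(timeout: 5, interval: 0) { (attempts += 1) >= 3 }
```

A monotonic clock is used for the deadline so that system clock adjustments do not shorten or extend the wait.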
.download_async(record_scope, request = nil, extension_scopes: {}, predicate_extensions: {}, taxonworks_extensions: {}, project_id: nil) ⇒ Download
Creates a DwC-A download asynchronously by enqueuing a job.
# File 'lib/export/dwca.rb', line 54
def self.download_async(record_scope, request = nil, extension_scopes: {}, predicate_extensions: {}, taxonworks_extensions: {}, project_id: nil)
  raise TaxonWorks::Error, 'project_id is required in Export::Dwca::download_async!' if project_id.nil?

  name = "dwc-a_#{DateTime.now}.zip"

  # TODO: move fixed attributes to model
  download = ::Download::DwcArchive.create!(
    name: "DwC Archive generated at #{Time.now.utc}.",
    description: 'A Darwin Core archive.',
    filename: name,
    request:,
    expires: 2.days.from_now,
    total_records: record_scope.size # Was having problems with count() TODO: increment after when extensions are allowed.
  )

  # Note we pass a string with the record scope
  ::DwcaCreateDownloadJob.perform_later(
    download.id,
    core_scope: record_scope.to_sql,
    extension_scopes:,
    predicate_extensions:,
    taxonworks_extensions:,
    project_id:
  )

  download
end
.index_metadata(klass, record_scope) ⇒ Hash{Symbol=>Integer, Time, Array}
# File 'lib/export/dwca.rb', line 100
def self.index_metadata(klass, record_scope)
  a = record_scope.first&.to_global_id&.to_s # TODO: this should be UUID?
  b = record_scope.last&.to_global_id&.to_s # TODO: this should be UUID?

  t = record_scope.size # was having problems with count

  metadata = {
    total: t,
    start_time: Time.zone.now,
    sample: [a, b].compact
  }

  if b && (t > 2)
    max = 9
    max = t if t < 9

    # Pick up to `max` evenly spaced records between the first and the last
    ids = klass
      .select('*')
      .from("(select id, type, ROW_NUMBER() OVER (ORDER BY id ASC) rn from (#{record_scope.to_sql}) b ) a")
      .where("a.rn % ((SELECT COUNT(*) FROM (#{record_scope.to_sql}) c) / #{max}) = 0")
      .limit(max)
      .collect{|o| o.to_global_id.to_s}

    metadata[:sample].insert(1, *ids)
  end

  metadata[:sample].uniq!
  metadata
end
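The sampling rule in the SQL above (keep every row whose 1-based row number is a multiple of `count / max`, up to `max` rows) can be illustrated in plain Ruby. This is a sketch of the arithmetic only, operating on an array of integer ids rather than a database scope; `sample_ids` is a hypothetical name, and it assumes rows come back in id order, which the SQL `LIMIT` does not strictly guarantee.

```ruby
# Hypothetical pure-Ruby illustration of the sampling rule (not part of
# Export::Dwca). Returns the first id, up to `max` evenly spaced ids, and
# the last id, with duplicates removed.
def sample_ids(ids, max = 9)
  sorted = ids.sort
  return sorted.uniq if sorted.size <= 2 # mirrors the `t > 2` guard

  max = [max, sorted.size].min
  step = sorted.size / max # integer division, as in the SQL

  # ROW_NUMBER() is 1-based, so compare (index + 1) against the step
  picked = sorted.each_with_index
                 .select { |_id, i| ((i + 1) % step).zero? }
                 .map(&:first)
                 .first(max)

  [sorted.first, *picked, sorted.last].uniq
end

# Usage: with 20 ids the step is 20 / 9 = 2, so every second id is picked.
sample_ids((1..20).to_a)
```

Because the step uses integer division, scopes whose size is not a multiple of `max` can match more than `max` rows; the limit then keeps only the first `max` of them.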