Module: Utilities::DarwinCore::Compact
- Defined in: lib/utilities/darwin_core/compact.rb
Overview
Methods for compacting (merging rows) in DarwinCore tables.
Constant Summary
- MALE_STRINGS = /\Amale/i
- FEMALE_STRINGS = /\Afemale/i
- ADULT_STRINGS = /\Aadult/i
- EXUVIA_STRINGS = /\Aexuvia/i
- NYMPH_STRINGS = /\Anymph/i
- COMPACT_DELIMITER = '|'
- APPENDED_COLUMNS = %w[lifeStage sex otherCatalogNumbers associatedMedia].freeze
- SUMMED_COLUMNS = %w[individualCount].freeze
- DERIVED_COLUMNS = %w[adultMale adultFemale immatureNymph exuvia].freeze
- SKIP_VALIDATION_COLUMNS = %w[id occurrenceID dwc_occurrence_object_id dwc_occurrence_object_type].freeze
  Columns excluded from the differing-values validation check. These are housekeeping or per-row identity fields that are expected to differ across rows sharing a catalogNumber.
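The string patterns are anchored at the start of the value with `\A` and are case-insensitive. The following is a minimal sketch of what that implies for matching; the module name `PatternSketch` and the `classify` helper are hypothetical and not part of the documented module. It strips the value first, as the method listings on this page do.

```ruby
# Hypothetical illustration; only the two regular expressions are copied
# from the constant summary.
module PatternSketch
  MALE_STRINGS   = /\Amale/i
  FEMALE_STRINGS = /\Afemale/i

  # Classify a raw sex value using the start-anchored patterns.
  def self.classify(value)
    text = value.to_s.strip
    return :male if text.match?(MALE_STRINGS)
    return :female if text.match?(FEMALE_STRINGS)

    :other
  end
end

['Male', 'males', 'FEMALE', ' male ', 'adult male', ''].each do |value|
  puts "#{value.inspect} => #{PatternSketch.classify(value)}"
end
```

Because of the `\A` anchor, 'female' does not match MALE_STRINGS even though it contains 'male', and a value such as 'adult male' matches neither pattern.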
Class Method Summary
- .add_derived_columns(row) ⇒ void (private)
  Add derived columns to a single (non-grouped) row.
- .add_derived_columns_from_group(merged, rows) ⇒ void (private)
  Add derived columns from a group of pre-merge rows.
- .by_catalog_number(table, preview: false) ⇒ void
  Merge rows with identical catalogNumber values.
- .ensure_derived_headers(table) ⇒ void (private)
  Ensure derived column headers are present in the table.
- .merge_group(table, catalog_number, rows) ⇒ Hash (private)
  Merge a group of rows into a single row.
- .validate_group(table, catalog_number, rows) ⇒ void (private)
  Validate a group of rows sharing a catalogNumber.
Class Method Details
.add_derived_columns(row) ⇒ void (private)
This method returns an undefined value.
Add derived columns to a single (non-grouped) row.
    # File 'lib/utilities/darwin_core/compact.rb', line 182
    def self.add_derived_columns(row)
      count = row['individualCount'].to_i
      sex_value = row['sex'].to_s.strip
      life_stage_value = row['lifeStage'].to_s.strip

      row['adultMale'] = (sex_value.match?(MALE_STRINGS) ? count : 0).to_s
      row['adultFemale'] = (sex_value.match?(FEMALE_STRINGS) ? count : 0).to_s
      row['immatureNymph'] = (life_stage_value.match?(NYMPH_STRINGS) ? count : 0).to_s
      row['exuvia'] = (life_stage_value.match?(EXUVIA_STRINGS) ? count : 0).to_s
    end
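The single-row derivation can be tried outside the application with a plain Hash standing in for a row. This is a sketch under that assumption: `SingleRowSketch.derive` is a hypothetical stand-alone copy of the logic above, and unlike the documented method it returns a new Hash rather than mutating the row.

```ruby
# Hypothetical stand-alone copy of the single-row logic, for illustration only.
module SingleRowSketch
  MALE_STRINGS   = /\Amale/i
  FEMALE_STRINGS = /\Afemale/i
  NYMPH_STRINGS  = /\Anymph/i
  EXUVIA_STRINGS = /\Aexuvia/i

  # Returns a copy of the row with the four derived columns filled in.
  def self.derive(row)
    count = row['individualCount'].to_i # nil or '' becomes 0
    sex = row['sex'].to_s.strip
    life_stage = row['lifeStage'].to_s.strip

    row.merge(
      'adultMale'     => (sex.match?(MALE_STRINGS) ? count : 0).to_s,
      'adultFemale'   => (sex.match?(FEMALE_STRINGS) ? count : 0).to_s,
      'immatureNymph' => (life_stage.match?(NYMPH_STRINGS) ? count : 0).to_s,
      'exuvia'        => (life_stage.match?(EXUVIA_STRINGS) ? count : 0).to_s
    )
  end
end

p SingleRowSketch.derive('individualCount' => '3', 'sex' => 'male', 'lifeStage' => 'adult')
p SingleRowSketch.derive('individualCount' => '2', 'sex' => 'male', 'lifeStage' => 'nymph')
```

Two things follow from the listing as written: derived values are stored as strings, and the male and female counts depend only on the sex column, so a row with sex 'male' and lifeStage 'nymph' contributes to both adultMale and immatureNymph.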
.add_derived_columns_from_group(merged, rows) ⇒ void (private)
This method returns an undefined value.
Add derived columns from a group of pre-merge rows.
    # File 'lib/utilities/darwin_core/compact.rb', line 155
    def self.add_derived_columns_from_group(merged, rows)
      adult_male_count = 0
      adult_female_count = 0
      immature_nymph_count = 0
      exuvia_count = 0

      rows.each do |row|
        count = row['individualCount'].to_i
        sex_value = row['sex'].to_s.strip
        life_stage_value = row['lifeStage'].to_s.strip

        adult_male_count += count if sex_value.match?(MALE_STRINGS)
        adult_female_count += count if sex_value.match?(FEMALE_STRINGS)
        immature_nymph_count += count if life_stage_value.match?(NYMPH_STRINGS)
        exuvia_count += count if life_stage_value.match?(EXUVIA_STRINGS)
      end

      merged['adultMale'] = adult_male_count.to_s
      merged['adultFemale'] = adult_female_count.to_s
      merged['immatureNymph'] = immature_nymph_count.to_s
      merged['exuvia'] = exuvia_count.to_s
    end
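The group tally above can be expressed as a table-driven loop. The sketch below is an alternative formulation, not the module's own code: `GroupTallySketch`, its `RULES` table, and `tally` are hypothetical names, and rows are assumed to be plain Hashes with string keys.

```ruby
# Hypothetical table-driven variant of the group tally, for illustration only.
module GroupTallySketch
  # derived column => [source column, pattern]
  RULES = {
    'adultMale'     => ['sex', /\Amale/i],
    'adultFemale'   => ['sex', /\Afemale/i],
    'immatureNymph' => ['lifeStage', /\Anymph/i],
    'exuvia'        => ['lifeStage', /\Aexuvia/i]
  }.freeze

  # Sum individualCount per derived column across the pre-merge rows.
  def self.tally(rows)
    totals = RULES.keys.to_h { |name| [name, 0] }
    rows.each do |row|
      count = row['individualCount'].to_i
      RULES.each do |name, (column, pattern)|
        totals[name] += count if row[column].to_s.strip.match?(pattern)
      end
    end
    totals.transform_values(&:to_s)
  end
end

rows = [
  { 'individualCount' => '2', 'sex' => 'male',   'lifeStage' => 'adult' },
  { 'individualCount' => '3', 'sex' => 'female', 'lifeStage' => 'adult' },
  { 'individualCount' => '1', 'sex' => '',       'lifeStage' => 'nymph' }
]
p GroupTallySketch.tally(rows)
```

The tally has to run on the pre-merge rows: once sex and lifeStage have been joined with the delimiter, a per-row count can no longer be attributed to a single sex or life stage.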
.by_catalog_number(table, preview: false) ⇒ void
This method returns an undefined value.
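Merge rows with identical catalogNumber values. Rows without a catalogNumber are excluded from compaction but tracked in table.skipped_rows. When preview is true, multi-row groups are still validated and table.skipped_rows is still set, but no rows are merged and the table's rows and headers are left unchanged.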
    # File 'lib/utilities/darwin_core/compact.rb', line 36
    def self.by_catalog_number(table, preview: false)
      with_catalog_number, without_catalog_number = table.rows.partition { |row| row['catalogNumber'].to_s.strip.present? }

      table.skipped_rows = without_catalog_number

      grouped = with_catalog_number.group_by { |row| row['catalogNumber'] }

      merged_rows = []

      grouped.each do |catalog_number, rows_in_group|
        if rows_in_group.size == 1
          row = rows_in_group.first
          unless preview
            add_derived_columns(row)
            merged_rows << row
          end
          next
        end

        validate_group(table, catalog_number, rows_in_group)

        unless preview
          merged = merge_group(table, catalog_number, rows_in_group)
          merged_rows << merged
        end
      end

      unless preview
        ensure_derived_headers(table)
        table.instance_variable_set(:@rows, merged_rows)
      end
    end
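The first two steps of the method, partitioning and grouping, can be sketched with core Ruby alone. `PartitionSketch` and `split_and_group` are hypothetical names, rows are assumed to be plain Hashes, and `strip.empty?` stands in for ActiveSupport's `present?`, which is an approximation for values containing only ASCII whitespace.

```ruby
# Hypothetical illustration of the partition and group_by steps.
module PartitionSketch
  # Returns [groups keyed by catalogNumber, rows lacking a catalogNumber].
  def self.split_and_group(rows)
    with_number, without_number = rows.partition do |row|
      !row['catalogNumber'].to_s.strip.empty?
    end
    [with_number.group_by { |row| row['catalogNumber'] }, without_number]
  end
end

rows = [
  { 'catalogNumber' => 'A1', 'sex' => 'male' },
  { 'catalogNumber' => 'A1', 'sex' => 'female' },
  { 'catalogNumber' => 'B2', 'sex' => 'male' },
  { 'catalogNumber' => '  ', 'sex' => 'male' },
  { 'catalogNumber' => nil,  'sex' => 'female' }
]
grouped, skipped = PartitionSketch.split_and_group(rows)
puts "groups: #{grouped.keys.inspect}, skipped rows: #{skipped.size}"
```

Note that, as in the listing, the blank check strips the value but the grouping key is the raw catalogNumber, so in this sketch 'A1' and 'A1 ' would land in separate groups.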
.ensure_derived_headers(table) ⇒ void (private)
This method returns an undefined value.
Ensure derived column headers are present in the table.
    # File 'lib/utilities/darwin_core/compact.rb', line 197
    def self.ensure_derived_headers(table)
      DERIVED_COLUMNS.each do |col|
        table.headers << col unless table.headers.include?(col)
      end
    end
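The header step is idempotent: a derived column that is already present is not appended again. A small sketch, assuming headers are a plain Array of strings; `HeaderSketch.ensure_headers` is a hypothetical stand-in that takes the array directly instead of a table.

```ruby
# Hypothetical illustration of idempotent header appending.
module HeaderSketch
  DERIVED_COLUMNS = %w[adultMale adultFemale immatureNymph exuvia].freeze

  # Appends missing derived columns in place and returns the same array.
  def self.ensure_headers(headers)
    DERIVED_COLUMNS.each do |col|
      headers << col unless headers.include?(col)
    end
    headers
  end
end

headers = %w[catalogNumber sex exuvia]
HeaderSketch.ensure_headers(headers)
HeaderSketch.ensure_headers(headers) # second call changes nothing
p headers
```

Existing headers keep their position; only the missing derived columns are appended, in DERIVED_COLUMNS order.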
.merge_group(table, catalog_number, rows) ⇒ Hash (private)
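Merge a group of rows into a single row. The merged row starts as a copy of the first row in the group. Each column in SUMMED_COLUMNS is replaced by the integer sum across the group (stored as a string), each column in APPENDED_COLUMNS by the unique non-empty values joined with COMPACT_DELIMITER, and the derived columns are then computed from the pre-merge rows.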
    # File 'lib/utilities/darwin_core/compact.rb', line 131
    def self.merge_group(table, catalog_number, rows)
      merged = rows.first.dup

      # Sum individualCount
      SUMMED_COLUMNS.each do |col|
        merged[col] = rows.sum { |r| r[col].to_i }.to_s
      end

      # Append unique values with delimiter
      APPENDED_COLUMNS.each do |col|
        unique_values = rows.map { |r| r[col].to_s.strip }.reject(&:empty?).uniq
        merged[col] = unique_values.join(COMPACT_DELIMITER)
      end

      add_derived_columns_from_group(merged, rows)

      merged
    end
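The summing and appending steps can be exercised on plain Hashes. This is a sketch that omits the derived-column step; `MergeSketch.merge` is a hypothetical name, and the `locality` column in the sample data is an assumption used only to show what happens to columns that are neither summed nor appended.

```ruby
# Hypothetical illustration of the sum and append steps of a merge.
module MergeSketch
  COMPACT_DELIMITER = '|'
  APPENDED_COLUMNS  = %w[lifeStage sex otherCatalogNumbers associatedMedia].freeze
  SUMMED_COLUMNS    = %w[individualCount].freeze

  def self.merge(rows)
    merged = rows.first.dup # other columns keep the first row's value

    SUMMED_COLUMNS.each do |col|
      merged[col] = rows.sum { |r| r[col].to_i }.to_s
    end

    APPENDED_COLUMNS.each do |col|
      values = rows.map { |r| r[col].to_s.strip }.reject(&:empty?).uniq
      merged[col] = values.join(COMPACT_DELIMITER)
    end

    merged
  end
end

rows = [
  { 'catalogNumber' => 'A1', 'individualCount' => '2', 'sex' => 'male',   'lifeStage' => 'adult',  'locality' => 'X' },
  { 'catalogNumber' => 'A1', 'individualCount' => '3', 'sex' => 'female', 'lifeStage' => 'adult ', 'locality' => 'Y' }
]
p MergeSketch.merge(rows)
```

In this sample the merged row has individualCount '5', sex 'male|female', and lifeStage 'adult' (values are stripped before de-duplication). Appended columns with no values in any row become empty strings, and locality keeps the first row's value 'X'.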
.validate_group(table, catalog_number, rows) ⇒ void (private)
This method returns an undefined value.
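Validate a group of rows sharing a catalogNumber. Appends an :error entry to table.errors for each column, other than the appended, summed, and skip-validation columns, whose values differ across the group. Appends a :warning entry when a row's sex is present but neither male nor female and its lifeStage is not adult, and when a row's lifeStage is present but is not adult, nymph, or exuvia.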
    # File 'lib/utilities/darwin_core/compact.rb', line 79
    def self.validate_group(table, catalog_number, rows)
      operated_columns = APPENDED_COLUMNS + SUMMED_COLUMNS + SKIP_VALIDATION_COLUMNS

      (table.headers - operated_columns).each do |column|
        values = rows.map { |r| r[column] }.uniq
        if values.size > 1
          table.errors << {
            type: :error,
            catalog_number:,
            column:,
            message: "Differing values in '#{column}'",
            values:
          }
        end
      end

      rows.each do |row|
        sex_value = row['sex'].to_s.strip
        life_stage_value = row['lifeStage'].to_s.strip

        if sex_value.present? && !sex_value.match?(MALE_STRINGS) && !sex_value.match?(FEMALE_STRINGS)
          if !life_stage_value.match?(ADULT_STRINGS)
            table.errors << {
              type: :warning,
              catalog_number:,
              column: 'sex',
              message: "Non-adult/non-standard sex '#{sex_value}' with lifeStage '#{life_stage_value}'",
              values: [sex_value, life_stage_value]
            }
          end
        end

        if life_stage_value.present? && !life_stage_value.match?(ADULT_STRINGS)
          unless life_stage_value.match?(NYMPH_STRINGS) || life_stage_value.match?(EXUVIA_STRINGS)
            table.errors << {
              type: :warning,
              catalog_number:,
              column: 'lifeStage',
              message: "Non-adult lifeStage '#{life_stage_value}'",
              values: [life_stage_value]
            }
          end
        end
      end
    end
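The differing-values check can be sketched in isolation. `ValidateSketch.differing_columns` is a hypothetical helper that returns the offending columns instead of appending to table.errors; headers and rows are assumed to be a plain Array and plain Hashes, and the `locality` column is sample data.

```ruby
# Hypothetical illustration of the differing-values check.
module ValidateSketch
  # Appended, summed, and skip-validation columns, as in the constant summary.
  OPERATED_COLUMNS = %w[
    lifeStage sex otherCatalogNumbers associatedMedia
    individualCount
    id occurrenceID dwc_occurrence_object_id dwc_occurrence_object_type
  ].freeze

  # Returns one entry per checked column that has more than one distinct value.
  def self.differing_columns(headers, rows)
    (headers - OPERATED_COLUMNS).filter_map do |column|
      values = rows.map { |r| r[column] }.uniq
      { column: column, values: values } if values.size > 1
    end
  end
end

headers = %w[id catalogNumber sex locality]
rows = [
  { 'id' => '1', 'catalogNumber' => 'A1', 'sex' => 'male',   'locality' => 'X' },
  { 'id' => '2', 'catalogNumber' => 'A1', 'sex' => 'female', 'locality' => 'Y' }
]
p ValidateSketch.differing_columns(headers, rows)
```

Here id and sex differ but are exempt, so only locality is reported. The comparison uses raw values, so in this sketch nil and an empty string, or 'X' and 'X ', count as differing.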