Class: Utilities::DarwinCore::Table

Inherits:
Object
Defined in:
lib/utilities/darwin_core/table.rb

Overview

A wrapper that holds Darwin Core Occurrence data as native Ruby objects. Accepts input as a parsed CSV object, a TSV string, or a path to a TSV file. Produces output as a parsed CSV object, a TSV string, or a TSV file.

Author:

  • Claude (>50% of code)

Constructor Details

#initialize(csv: nil, tsv_string: nil, file: nil) ⇒ Table

Constructs a Table from exactly one of three input types: a parsed CSV object, a TSV string, or a path to a TSV file.

Parameters:

  • csv (CSV, nil) (defaults to: nil)

    a parsed CSV object with headers

  • tsv_string (String, nil) (defaults to: nil)

    a TSV-formatted string (tab-delimited, first row is headers)

  • file (String, nil) (defaults to: nil)

    path to a TSV file

Raises:

  • (ArgumentError)

    unless exactly one of csv:, tsv_string:, or file: is given


# File 'lib/utilities/darwin_core/table.rb', line 31

def initialize(csv: nil, tsv_string: nil, file: nil)
  @errors = []
  @skipped_rows = []
  @headers = []
  @rows = []

  sources = [csv, tsv_string, file].compact
  raise ArgumentError, 'Provide exactly one of csv:, tsv_string:, or file:' unless sources.size == 1

  if csv
    load_from_csv(csv)
  elsif tsv_string
    load_from_tsv_string(tsv_string)
  elsif file
    load_from_file(file)
  end
end
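
The "exactly one source" check can be tried in isolation. The following is a minimal sketch, not part of the class: `pick_source` is a hypothetical helper that mirrors the constructor's check and reports which keyword was supplied. Note that only `nil` counts as "not given", so an empty string still counts as a source.

```ruby
# Hypothetical helper mirroring the constructor's argument check.
# It is not defined in Utilities::DarwinCore::Table.
def pick_source(csv: nil, tsv_string: nil, file: nil)
  # Hash#compact drops only nil values, so '' or false would still count.
  sources = { csv: csv, tsv_string: tsv_string, file: file }.compact
  raise ArgumentError, 'Provide exactly one of csv:, tsv_string:, or file:' unless sources.size == 1

  sources.keys.first
end

chosen = pick_source(tsv_string: "catalogNumber\tcountry\nA1\tPeru\n")

# Supplying two sources (or none) raises ArgumentError.
message = begin
  pick_source(tsv_string: "a\tb\n", file: 'occurrences.tsv')
rescue ArgumentError => e
  e.message
end
```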

Instance Attribute Details

#errors ⇒ Array<Hash>

Returns error and warning log entries from compaction or validation. Each entry has the form { type: :error|:warning, column:, message:, values: }.

Returns:

  • (Array<Hash>)

    error and warning log entries from compaction or validation; each entry has the form { type: :error|:warning, column:, message:, values: }



# File 'lib/utilities/darwin_core/table.rb', line 20

def errors
  @errors
end
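
Since each log entry is a plain Hash, the log can be triaged with ordinary Enumerable calls. The sketch below assumes only the documented entry shape; the sample entries, their messages, and the `summarize_log` helper are hypothetical and do not come from the library.

```ruby
# Hypothetical sample entries in the documented shape:
# { type: :error|:warning, column:, message:, values: }
entries = [
  { type: :error,   column: 'country',    message: 'conflicting values', values: %w[Peru Chile] },
  { type: :warning, column: 'recordedBy', message: 'values differ',      values: %w[Smith smith] },
  { type: :error,   column: 'country',    message: 'conflicting values', values: %w[Peru Brazil] }
]

# Hypothetical helper: count entries by type and collect affected columns.
def summarize_log(entries)
  errors, warnings = entries.partition { |entry| entry[:type] == :error }
  {
    error_count: errors.size,
    warning_count: warnings.size,
    columns: entries.map { |entry| entry[:column] }.uniq
  }
end

summary = summarize_log(entries)
```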

#headers ⇒ Array<String> (readonly)

Returns column headers.

Returns:

  • (Array<String>)

    column headers



# File 'lib/utilities/darwin_core/table.rb', line 11

def headers
  @headers
end

#rows ⇒ Array<Hash> (readonly)

Returns the data rows, each a Hash keyed by header string.

Returns:

  • (Array<Hash>)

    the data rows, each a Hash keyed by header string



# File 'lib/utilities/darwin_core/table.rb', line 15

def rows
  @rows
end

#skipped_rows ⇒ Array<Hash>

Returns rows excluded from compaction (e.g. no catalogNumber).

Returns:

  • (Array<Hash>)

    rows excluded from compaction (e.g. no catalogNumber)



# File 'lib/utilities/darwin_core/table.rb', line 24

def skipped_rows
  @skipped_rows
end

Instance Method Details

#compact(by: :catalog_number, preview: false) ⇒ Utilities::DarwinCore::Table

Compact rows by merging on a key column.

Parameters:

  • by (Symbol) (defaults to: :catalog_number)

    the compaction strategy; :catalog_number is the only strategy handled, and any other value raises ArgumentError

  • preview (Boolean) (defaults to: false)

    if true, validate only; do not modify data

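Returns:

  • (Utilities::DarwinCore::Table)

    self, so calls can be chained

Raises:

  • (ArgumentError)

    if by is not a known compaction strategy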



# File 'lib/utilities/darwin_core/table.rb', line 85

def compact(by: :catalog_number, preview: false)
  case by
  when :catalog_number
    Utilities::DarwinCore::Compact.by_catalog_number(self, preview:)
  else
    raise ArgumentError, "Unknown compact strategy: #{by}"
  end
  self
end
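
The merge rules themselves live in Utilities::DarwinCore::Compact, which is not shown on this page. Purely as an illustration of what "merging on a key column" can mean, the sketch below groups rows by a key, sets aside rows with a blank key, and logs a conflict when a column holds more than one distinct value. Everything in it (the `compact_by_key` helper, the conflict rule, the message text) is an assumption, not the library's behavior.

```ruby
# Hypothetical illustration only. This is NOT the logic of
# Utilities::DarwinCore::Compact.by_catalog_number.
def compact_by_key(rows, key:)
  errors = []
  # Rows with a nil or blank key cannot be grouped, so they are set aside.
  skipped, keyed = rows.partition { |row| row[key].to_s.strip.empty? }

  merged = keyed.group_by { |row| row[key] }.map do |_id, group|
    group.first.keys.each_with_object({}) do |column, out|
      values = group.map { |row| row[column] }.reject { |v| v.to_s.strip.empty? }.uniq
      if values.size > 1
        # Entry shape follows the documented #errors format.
        errors << { type: :error, column: column, message: 'conflicting values', values: values }
      end
      out[column] = values.first
    end
  end

  { rows: merged, skipped_rows: skipped, errors: errors }
end

result = compact_by_key(
  [
    { 'catalogNumber' => 'A1', 'country' => 'Peru', 'recordedBy' => nil },
    { 'catalogNumber' => 'A1', 'country' => nil,    'recordedBy' => 'Smith' },
    { 'catalogNumber' => nil,  'country' => 'Chile', 'recordedBy' => 'Jones' }
  ],
  key: 'catalogNumber'
)
```

In this sketch the two A1 rows collapse into one row carrying both the country and the collector, while the row without a catalogNumber ends up in the skipped list.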

#load_from_csv(csv) ⇒ Object (private)



# File 'lib/utilities/darwin_core/table.rb', line 97

def load_from_csv(csv)
  @headers = csv.headers.map(&:to_s)
  csv.each do |row|
    @rows << headers.each_with_object({}) { |h, hash| hash[h] = row[h] }
  end
end

#load_from_file(path) ⇒ Object (private)

Raises:

  • (ArgumentError)

    if no file exists at path


# File 'lib/utilities/darwin_core/table.rb', line 109

def load_from_file(path)
  raise ArgumentError, "File not found: #{path}" unless File.exist?(path)
  load_from_tsv_string(File.read(path))
end

#load_from_tsv_string(tsv_string) ⇒ Object (private)



# File 'lib/utilities/darwin_core/table.rb', line 104

def load_from_tsv_string(tsv_string)
  csv = ::CSV.parse(tsv_string, col_sep: "\t", headers: true)
  load_from_csv(csv)
end
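
The two loaders above can be reproduced with the standard csv library alone. This sketch parses a small TSV string the same way and builds the headers and rows structures; the sample data is made up. One point worth knowing: an empty unquoted field parses to nil, not to an empty string. Because no quote_char option is passed, stray double quotes in a field may also cause CSV to raise CSV::MalformedCSVError.

```ruby
require 'csv'

# Made-up sample data; the second record has an empty country field.
tsv = "catalogNumber\tcountry\nA1\tPeru\nA2\t\n"

# Same parse call as load_from_tsv_string: tab-delimited, first row is headers.
table = CSV.parse(tsv, col_sep: "\t", headers: true)

# Same conversion as load_from_csv: one Hash per row, keyed by header string.
headers = table.headers.map(&:to_s)
rows = table.map do |row|
  headers.each_with_object({}) { |h, hash| hash[h] = row[h] }
end

# headers => ["catalogNumber", "country"]
# rows    => [{"catalogNumber"=>"A1", "country"=>"Peru"},
#             {"catalogNumber"=>"A2", "country"=>nil}]
```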

#to_csv ⇒ CSV

Returns a parsed CSV object with headers, built by serializing the rows as tab-delimited text and parsing the result again.

Returns:

  • (CSV)

    a parsed CSV object with headers



# File 'lib/utilities/darwin_core/table.rb', line 51

def to_csv
  output = ::CSV.generate(col_sep: "\t", headers: headers, write_headers: true) do |csv_out|
    rows.each do |row|
      csv_out << headers.map { |h| row[h] }
    end
  end

  ::CSV.parse(output, col_sep: "\t", headers: true)
end
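
The generate-then-parse round trip can be tried outside the class. In this sketch, `rows_to_csv_table` is a hypothetical standalone function that takes headers and rows as arguments instead of reading them from a Table. With `headers: true`, `CSV.parse` returns a `CSV::Table`, so that is the concrete type a caller works with.

```ruby
require 'csv'

# Hypothetical standalone version of the round trip used by #to_csv.
def rows_to_csv_table(headers, rows)
  output = CSV.generate(col_sep: "\t", headers: headers, write_headers: true) do |out|
    # Emit values in header order so columns line up with the header row.
    rows.each { |row| out << headers.map { |h| row[h] } }
  end
  CSV.parse(output, col_sep: "\t", headers: true)
end

csv_table = rows_to_csv_table(
  %w[catalogNumber country],
  [{ 'catalogNumber' => 'A1', 'country' => 'Peru' }]
)

first_country = csv_table[0]['country']
```

Because the result has the same shape that the constructor's csv: parameter expects, it can be passed to another Table or to any code that works with parsed CSV data.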

#to_file(path) ⇒ String

Write TSV data to a file.

Parameters:

  • path (String)

    output file path

Returns:

  • (String)

    the path written to



# File 'lib/utilities/darwin_core/table.rb', line 75

def to_file(path)
  File.write(path, to_tsv)
  path
end
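
A short sketch of the write-and-return-path pattern, using a temporary directory so nothing is left behind. `write_tsv` is a hypothetical standalone stand-in for #to_file that takes the TSV text as an argument; the file name is arbitrary. `File.write` itself returns the number of bytes written, which is why the method returns the path explicitly.

```ruby
require 'tmpdir'

# Hypothetical stand-in for #to_file: writes the text, then returns the path.
def write_tsv(path, tsv)
  File.write(path, tsv)
  path
end

tsv_text = "catalogNumber\tcountry\nA1\tPeru\n"

# Dir.mktmpdir removes the directory when the block exits.
read_back = Dir.mktmpdir do |dir|
  path = write_tsv(File.join(dir, 'occurrences.tsv'), tsv_text)
  File.read(path)
end
```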

#to_tsv ⇒ String

Returns TSV-formatted string.

Returns:

  • (String)

    TSV-formatted string



# File 'lib/utilities/darwin_core/table.rb', line 63

def to_tsv
  lines = [headers.join("\t")]
  rows.each do |row|
    lines << headers.map { |h| row[h] }.join("\t")
  end
  lines.join("\n") + "\n"
end
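
The listing above joins values with tabs and newlines directly, without quoting or escaping. The sketch below reproduces that logic in a hypothetical standalone function, `rows_to_tsv`, to show two consequences: nil values become empty fields, and a value that itself contains a tab shifts the columns of its line. The sample data is made up.

```ruby
# Hypothetical standalone copy of the join logic shown above.
def rows_to_tsv(headers, rows)
  lines = [headers.join("\t")]
  rows.each { |row| lines << headers.map { |h| row[h] }.join("\t") }
  lines.join("\n") + "\n"
end

headers = %w[catalogNumber locality]

# nil is rendered as an empty field.
clean = rows_to_tsv(headers, [{ 'catalogNumber' => 'A1', 'locality' => nil }])
# clean == "catalogNumber\tlocality\nA1\t\n"

# An embedded tab is written as-is, so the line gains an extra field.
shifted = rows_to_tsv(headers, [{ 'catalogNumber' => 'A2', 'locality' => "Lima\tPeru" }])
field_count = shifted.lines.last.chomp.split("\t").size
# field_count == 3, one more than the two headers
```

If the data may contain tabs or newlines, one option is to sanitize values before they reach the table. Whether that is appropriate depends on the downstream consumer.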