Class: Queries::TaxonName::Tabular

Inherits:

Query::Autocomplete

Object
Query
Query::Autocomplete
Queries::TaxonName::Tabular

show all

Includes:: Helpers

Defined in:: lib/queries/taxon_name/tabular.rb

Overview

Summarize data across a nomenclatural hierarchy.

Results are Arrays, not AR objects.

The approach is to use OVER/PARTITION to build a ranked list, then to summarize only those ranked data requested. When we select without preserving all ranks, then the data are not sortable according to their overall nesting.

The approach only succeeds on the hierarchical data because we cache the current valid id for every name, this allows us to build an ordered, nested by hierarchy list across not only valid, but also combination and invalid names.

These were useful references:

* https://sonnym.github.io/2017/06/05/common-table-expressions-in-activerecord-a-case-study-of-quantiles/
* https://blog.codeship.com/folding-postgres-window-functions-into-rails/

Instance Attribute Summary collapse

#ancestor ⇒ Protonym^?
#columns ⇒ Array

Of hashes, with name of column, and SQL field projection definition { header: ‘foo’ projected: ‘taxon_names.id’ }.
#combinations ⇒ Boolean

If true than also include Combinations in counts.
#fieldsets ⇒ Array of Strings

Named sets of columns to include valid values are - nomenclatural_stats - observations.
#limit ⇒ Object

!! CAREFUL, too small limits will not properly nest all names !! defaults to 1000.
#query ⇒ the build query
#rank_data ⇒ Array

ORDER MATTERS.
#rank_id_fields ⇒ Object

Returns the value of attribute rank_id_fields.
#ranks ⇒ Array

ORDER matters.
#taxon_name_id ⇒ Integer

Required, the id scoping the result set.
#validity ⇒ Boolean

If true then only include valid names.

Attributes inherited from Query::Autocomplete

#dynamic_limit, #project_id, #query_string

Attributes inherited from Query

#query_string, #terms

Instance Method Summary collapse

#all ⇒ Object
#base_query ⇒ Object
#build_query ⇒ Object
#coalesce_ranks(fields) ⇒ Object
#column_headers ⇒ Object
#finest_rank_id ⇒ Object
#hierarchy_table ⇒ Object
#initialize(params = {}) ⇒ Tabular constructor

A new instance of Tabular.
#initialize_columns ⇒ Object

These are generic headers used for existing queries, ultimately subclass this with a reference to core, if it can be abstracted as such.
#nomenclatural_stats_set(query) ⇒ Object
#nomenclature_stats_column(query, rank, valid = nil, index = 0) ⇒ Object
#observations_set(query) ⇒ Object

Only calculates for OTUs, not through to get CollectionObjects, nor Extracts.
#otu_table ⇒ Object
#projected_columns ⇒ Object
#rank_classes ⇒ Object
#rank_over(source_table, valid_source_table) ⇒ Object

Lets us generate a sort order across valid, invalid, and combinations.

Constructor Details

#initialize(params = {}) ⇒ `Tabular`

Returns a new instance of Tabular.

Parameters:

params (Hash) (defaults to: {}) —

keys are symbols

# File 'lib/queries/taxon_name/tabular.rb', line 72

def initialize(params = {})
  super(nil, project_id: params[:project_id]) # We don't actually use project_id here

  @taxon_name_id = params[:taxon_name_id]
  @combinations = boolean_param(params, :combinations)
  @limit = params[:limit]
  @ranks = params[:ranks]
  @rank_data = params[:rank_data]
  @validity = boolean_param(params, :validity)
  @fieldsets = params[:fieldsets]
  @rank_id_fields = []

  initialize_columns
  build_query
end

Instance Attribute Details

#ancestor ⇒ `Protonym`^?

Returns:

(Protonym, nil)



60
61
62

# File 'lib/queries/taxon_name/tabular.rb', line 60

def ancestor
  @ancestor
end

#columns ⇒ `Array`

Returns of hashes, with name of column, and SQL field projection definition

{ header: 'foo'
 projected: 'taxon_names.id'  }.

Returns:

(Array) —
of hashes, with name of column, and SQL field projection definition
```
{ header: 'foo'
 projected: 'taxon_names.id'  }
```



68
69
70

# File 'lib/queries/taxon_name/tabular.rb', line 68

def columns
  @columns
end

#combinations ⇒ `Boolean`

Returns if true than also include Combinations in counts.

Returns:

(Boolean) —

if true than also include Combinations in counts



41
42
43

# File 'lib/queries/taxon_name/tabular.rb', line 41

def combinations
  @combinations
end

#fieldsets ⇒ `Array of Strings`

Returns named sets of columns to include valid values are

- nomenclatural_stats
- observations.

Returns:

(Array of Strings) —
named sets of columns to include valid values are
```
- nomenclatural_stats
- observations
```



48
49
50

# File 'lib/queries/taxon_name/tabular.rb', line 48

def fieldsets
  @fieldsets
end

#limit ⇒ `Object`

!! CAREFUL, too small limits will not properly nest all names !!

defaults to 1000



37
38
39

# File 'lib/queries/taxon_name/tabular.rb', line 37

def limit
  @limit
end

#query ⇒ `the build query`

Returns:

(the build query)



57
58
59

# File 'lib/queries/taxon_name/tabular.rb', line 57

def query
  @query
end

#rank_data ⇒ `Array`

ORDER MATTERS

Returns:

(Array) —

like ‘family’, ‘genus’, ‘species’



33
34
35

# File 'lib/queries/taxon_name/tabular.rb', line 33

def rank_data
  @rank_data
end

#rank_id_fields ⇒ `Object`

Returns the value of attribute rank_id_fields.



62
63
64

# File 'lib/queries/taxon_name/tabular.rb', line 62

def rank_id_fields
  @rank_id_fields
end

#ranks ⇒ `Array`

ORDER matters

Returns:

(Array) —

like ‘family’, ‘genus’, ‘species’



28
29
30

# File 'lib/queries/taxon_name/tabular.rb', line 28

def ranks
  @ranks
end

#taxon_name_id ⇒ `Integer`

Returns required, the id scoping the result set.

Returns:

(Integer) —

required, the id scoping the result set



23
24
25

# File 'lib/queries/taxon_name/tabular.rb', line 23

def taxon_name_id
  @taxon_name_id
end

#validity ⇒ `Boolean`

Returns if true then only include valid names.

Returns:

(Boolean) —

if true then only include valid names



52
53
54

# File 'lib/queries/taxon_name/tabular.rb', line 52

def validity
  @validity
end

Instance Method Details

#all ⇒ `Object`



233
234
235

# File 'lib/queries/taxon_name/tabular.rb', line 233

def all
  ApplicationRecord.connection.execute(@query.to_sql)
end

#base_query ⇒ `Object`

# File 'lib/queries/taxon_name/tabular.rb', line 150

def base_query
  q = table
  h = hierarchy_table

  # Scope all names in the result
  a = table[:id].eq(h[:descendant_id])
    .and(h[:ancestor_id].eq(taxon_name_id) )
  a = a.and(table[:cached_is_valid].eq(true)) if validity
  #a = a.and(table[:cached_valid_taxon_name_id].eq(table[:id])) if validity

  # Start a query
  q = q.join(h, Arel::Nodes::InnerJoin).on(a)

  # TODO: not always true now
  # All results can reference an otu
  q.join(otu_table, Arel::Nodes::OuterJoin).on(
    otu_table[:taxon_name_id].eq( table[:id] ).and( otu_table[:id].not_eq(nil)))
end

#build_query ⇒ `Object`

# File 'lib/queries/taxon_name/tabular.rb', line 169

def build_query
  q = base_query

  # Add a join to the valid parent_id
  valid_table = table.alias('valid_taxon_names')
  q = q.join(valid_table, Arel::Nodes::InnerJoin).on(
    table[:cached_valid_taxon_name_id].eq(valid_table[:id])
  )

  # Setup the basic joins for each requested rank,
  # these are re-used in data gathering queries.
  ranks.each_with_index do |r, i|
    l = table.alias("j_#{i}")

    a = l[:id].eq(hierarchy_table[:descendant_id])
    a = a.and( l[:rank_class].eq( Ranks.lookup(ancestor.nomenclatural_code, ranks[i])) )

    q = q.join(l, Arel::Nodes::OuterJoin).on(a)

    @columns.push( {header: r, projected:  l[:name].as( ranks[i] )  } )

    @rank_id_fields.push l[:id]
  end

  # add individual sets of data columns
  fieldsets.each do |f|
    q = send(f + '_set', q)
  end

  # TODO: rank_classes getter should merge nil when combinations,
  # i.e. we likely don't need nil
  w = table[:rank_class].in(rank_classes)
  w = w.or(table[:rank_class].eq(nil)) if combinations

  q = q.project(*projected_columns)
  q = q.from(table).where(w).distinct
  q = q.take(limit.to_i) if limit

  @query = table.project( Arel::Nodes::SqlLiteral.new('*') ).from(q.as('p')).order( 'cached_valid_taxon_name_id', 'ro' ) # if every field had a value this would work
end

#coalesce_ranks(fields) ⇒ `Object`



219
220
221

# File 'lib/queries/taxon_name/tabular.rb', line 219

def coalesce_ranks(fields)
  Arel::Nodes::NamedFunction.new('coalesce', [fields])
end

#column_headers ⇒ `Object`



117
118
119

# File 'lib/queries/taxon_name/tabular.rb', line 117

def column_headers
  columns.collect{|c| c[:header]}
end

#finest_rank_id ⇒ `Object`



215
216
217

# File 'lib/queries/taxon_name/tabular.rb', line 215

def finest_rank_id
  coalesce_ranks(rank_id_fields.reverse)
end

#hierarchy_table ⇒ `Object`



142
143
144

# File 'lib/queries/taxon_name/tabular.rb', line 142

def hierarchy_table
  Arel::Table.new(:taxon_name_hierarchies)
end

#initialize_columns ⇒ `Object`

These are generic headers used for existing queries, ultimately subclass this with a reference to core, if it can be abstracted as such

# File 'lib/queries/taxon_name/tabular.rb', line 90

def initialize_columns
  @columns = []
  valid_table = table.alias('valid_taxon_names')

  @columns.push({ header: 'ro', projected: rank_over(table, valid_table) })

  @columns.push({header: 'taxon_name_id', projected: table[:id].as('taxon_name_id') } )
  @columns.push({header: 'cached_valid_taxon_name_id', projected: table[:cached_valid_taxon_name_id].as('cached_valid_taxon_name_id') } )
  @columns.push({header: 'cached', projected: table[:cached].as('cached') } )
  @columns.push({header: 'cached_author_year', projected: table[:cached_author_year].as('cached_author_year') } )
  @columns.push({header: 'cached_is_valid', projected: table[:cached_is_valid].as('cached_is_valid') } )

  if fieldsets.include?('observations')
    @columns.push({header: 'otu_id', projected: otu_table[:id].as('otu_id')  } )
    @columns.push({header: 'otu_name', projected:  otu_table[:name].as('otu_name')} )
  end
end

#nomenclatural_stats_set(query) ⇒ `Object`

# File 'lib/queries/taxon_name/tabular.rb', line 299

def nomenclatural_stats_set(query)
  c = rank_data

  i = 0
  c.each do |r|
    %w{valid invalid}.each do |v|
      nomenclature_stats_column(query, r, v, i)
      i += 1
    end
  end

  if combinations
    nomenclature_stats_column(query, 'combination', nil, i)
  end

  query
end

#nomenclature_stats_column(query, rank, valid = nil, index = 0) ⇒ `Object`

# File 'lib/queries/taxon_name/tabular.rb', line 317

def nomenclature_stats_column(query, rank, valid = nil, index = 0)
  i = index
  v = valid
  r = rank

  h = hierarchy_table

  # May be alias?
  t = table

  s = [v, r].compact.join('_')
  a = ['ns_o', r, i, v].join('_')
  @columns.push({header: s, projected: "\"#{a}\".\"#{s}\" as #{s}" })

  x = t.where(
    # h[:generations].gt(0)
    t[:cached_is_valid].eq( v == 'valid' ? true : false )
    .and( t[:rank_class].eq( Ranks.lookup(ancestor.nomenclatural_code, r) ) )
  ).join(h, Arel::Nodes::InnerJoin).on(

    h[:descendant_id].eq( t[:cached_valid_taxon_name_id] )  ) # h[:descendant_id].eq( t[:id] ) )
    .project(
      h[:ancestor_id],
      t[:id].count.as( s )
    ).group(h[:ancestor_id]) # TODO: Might be wrong
      .as(a)

    query.join(x, Arel::Nodes::OuterJoin).on( x[:ancestor_id].eq( finest_rank_id  ))
    query
end

#observations_set(query) ⇒ `Object`

Only calculates for OTUs, not through to get CollectionObjects, nor Extracts

# File 'lib/queries/taxon_name/tabular.rb', line 239

def observations_set(query)
  o = ::Observation.arel_table

  # Descriptors on OTUs scored
  f = 'descriptors_scored_for_otus'
  fa = 'fs_d1'
  @columns.push({header: f, projected: '"' + fa + '"."' + f + '" as ' + f } )

  z = o.project(
    o[:observation_object_id],
    o[:observation_object_type],
    o[:descriptor_id].count(true).as(f) # count(true) == distinct
  ).group(
    o[:observation_object_id],
    o[:observation_object_type],
  ).as(fa)

  query.join(z, Arel::Nodes::OuterJoin).on(
    z[:observation_object_id].eq(otu_table[:id])
    .and(z[:observation_object_type].eq('Otu'))
  )

  # Observations on OTUs
  f = 'otu_observation_count'
  fa = 'fs_o1'
  @columns.push({header: f, projected: '"' + fa +  '"."' + f + '" as ' + f })

  x = o.project(
    o[:observation_object_id],
    o[:observation_object_type],
    o[:id].count.as(f)
  ).group(o[:observation_object_id], o[:observation_object_type]).as(fa)

  query.join(x, Arel::Nodes::OuterJoin).on(x[:observation_object_id].eq(otu_table[:id]).and( x[:observation_object_type].eq('Otu')  ))

  # Depictions on observations on OTUs
  f = 'otu_observation_depictions'
  fa = 'fs_o2'
  @columns.push( {header: f, projected: '"' + fa + '"."' + f + '" as ' + f } )

  p = ::Depiction.arel_table
  y = o.join(p, Arel::Nodes::InnerJoin).on(
    o[:id].eq(p[:depiction_object_id]).and(
      p[:depiction_object_type].eq('Observation')
    )
  ).project(
    o[:observation_object_id], # an OTU id
    p[:depiction_object_type],
    p[:id].count.as(f)
  ).group(
    o[:observation_object_id],
    p[:depiction_object_type]
  ).as(fa)

  query.join(y, Arel::Nodes::OuterJoin).on(
    y[:observation_object_id].eq(otu_table[:id])
  )
  query
end

#otu_table ⇒ `Object`



146
147
148

# File 'lib/queries/taxon_name/tabular.rb', line 146

def otu_table
  ::Otu.arel_table
end

#projected_columns ⇒ `Object`



121
122
123

# File 'lib/queries/taxon_name/tabular.rb', line 121

def projected_columns
  columns.collect{|c| c[:projected]}
end

#rank_classes ⇒ `Object`



210
211
212

# File 'lib/queries/taxon_name/tabular.rb', line 210

def rank_classes
  ranks.collect{|r| Ranks.lookup(ancestor.nomenclatural_code, r)}
end

#rank_over(source_table, valid_source_table) ⇒ `Object`

Lets us generate a sort order across valid, invalid, and combinations

# File 'lib/queries/taxon_name/tabular.rb', line 224

def rank_over(source_table, valid_source_table)
  Arel::Nodes::Over.new(
    Arel::Nodes::SqlLiteral.new('rank()'),
    Arel::Nodes::Window.new.partition(
      [ valid_source_table[:parent_id] ]
    ).order( source_table[:cached_valid_taxon_name_id], source_table[:cached], source_table[:name] )
  ).as('ro')
end

Class: Queries::TaxonName::Tabular

Overview

Instance Attribute Summary collapse

Attributes inherited from Query::Autocomplete

Attributes inherited from Query

Instance Method Summary collapse

Methods included from Helpers

Methods inherited from Query::Autocomplete

Methods inherited from Query

Constructor Details

#initialize(params = {}) ⇒ Tabular

Instance Attribute Details

#ancestor ⇒ Protonym?

#columns ⇒ Array

#combinations ⇒ Boolean

#fieldsets ⇒ Array of Strings

#limit ⇒ Object

#query ⇒ the build query

#rank_data ⇒ Array

#rank_id_fields ⇒ Object

#ranks ⇒ Array

#taxon_name_id ⇒ Integer

#validity ⇒ Boolean

Instance Method Details

#all ⇒ Object

#base_query ⇒ Object

#build_query ⇒ Object

#coalesce_ranks(fields) ⇒ Object

#column_headers ⇒ Object

#finest_rank_id ⇒ Object

#hierarchy_table ⇒ Object

#initialize_columns ⇒ Object

#nomenclatural_stats_set(query) ⇒ Object

#nomenclature_stats_column(query, rank, valid = nil, index = 0) ⇒ Object

#observations_set(query) ⇒ Object

#otu_table ⇒ Object

#projected_columns ⇒ Object

#rank_classes ⇒ Object

#rank_over(source_table, valid_source_table) ⇒ Object

#initialize(params = {}) ⇒ `Tabular`

#ancestor ⇒ `Protonym`^?

#columns ⇒ `Array`

#combinations ⇒ `Boolean`

#fieldsets ⇒ `Array of Strings`

#limit ⇒ `Object`

#query ⇒ `the build query`

#rank_data ⇒ `Array`

#rank_id_fields ⇒ `Object`

#ranks ⇒ `Array`

#taxon_name_id ⇒ `Integer`

#validity ⇒ `Boolean`

#all ⇒ `Object`

#base_query ⇒ `Object`

#build_query ⇒ `Object`

#coalesce_ranks(fields) ⇒ `Object`

#column_headers ⇒ `Object`

#finest_rank_id ⇒ `Object`

#hierarchy_table ⇒ `Object`

#initialize_columns ⇒ `Object`

#nomenclatural_stats_set(query) ⇒ `Object`

#nomenclature_stats_column(query, rank, valid = nil, index = 0) ⇒ `Object`

#observations_set(query) ⇒ `Object`

#otu_table ⇒ `Object`

#projected_columns ⇒ `Object`

#rank_classes ⇒ `Object`

#rank_over(source_table, valid_source_table) ⇒ `Object`