Class: Descriptor::Gene

Inherits:
Descriptor
Defined in:
app/models/descriptor/gene.rb

Overview

A Descriptor::Gene defines a set of sequences, i.e. a column in a “matrix” whose cells contain Sequences with a specific set of attributes (e.g. forward and reverse primers), as defined by GeneAttributes.

A user may define the set of sequences returned by the descriptor via a logical expression, for example: show me all sequences with this set of forward primers and that set of reverse primers. The logic can be expanded as extensively as needed, up to a maximum of 52 attributes, because each attribute reference is compressed to a single letter (a-z, then A-Z; see #gene_attribute_term_index).

@!attribute cached_gene_attribute_sql
 @return [String]
   An automatically composed SQL fragment that corresponds to #gene_attribute_logic.  Used in #sequences.

Constant Summary

Constants inherited from Descriptor

ALTERNATE_VALUES_FOR

Constants included from SoftValidation

SoftValidation::ANCESTORS_WITH_SOFT_VALIDATIONS

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods inherited from Descriptor

human_name, #qualitative?, #short_name_is_shorter, #sv_short_name_is_short, #type_is_subclassed

Methods included from SoftValidation

#clear_soft_validations, #fix_soft_validations, #soft_fixed?, #soft_valid?, #soft_validate, #soft_validated?, #soft_validations

Methods included from Housekeeping

#has_polymorphic_relationship?

Methods included from ActiverecordUtilities

#trim_attributes

Instance Attribute Details

- (Object) base_on_sequence

A Sequence. If provided, that sequence's description (its related sequence relationships) is cloned to this Descriptor::Gene as gene attributes.



# File 'app/models/descriptor/gene.rb', line 26

def base_on_sequence
  @base_on_sequence
end

- (String) gene_attribute_logic

A logical expression describing how the gene attributes (e.g. primers) should be interpreted when returning sequences. Call @gene_attribute.to_logic_literal for the format of individual gene attribute references. Use parentheses, ` AND ` and ` OR ` to compose the statements. For example:

(SequenceRelationship::ForwardPrimer.2 OR SequenceRelationship::ForwardPrimer.3) AND SequenceRelationship::ReversePrimer.4

Returns:

  • (String)

    A logical expression describing how the gene attributes (e.g. primers) should be interpreted when returning sequences. Call @gene_attribute.to_logic_literal for the format of individual gene attribute references. Use parentheses, ` AND ` and ` OR ` to compose the statements. For example:

    (SequenceRelationship::ForwardPrimer.2 OR SequenceRelationship::ForwardPrimer.3) AND SequenceRelationship::ReversePrimer.4
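
As a rough illustration of composing such an expression, the sketch below rebuilds the example from individual references. The `logic_literal` helper is hypothetical: it only mimics the `Type.id` shape visible in the example and is not the actual `to_logic_literal` implementation.

```ruby
# Hypothetical helper mirroring the "Type.id" shape seen in the example above.
def logic_literal(relationship_type, sequence_id)
  "#{relationship_type}.#{sequence_id}"
end

forward = [2, 3].map { |id| logic_literal('SequenceRelationship::ForwardPrimer', id) }
reverse = logic_literal('SequenceRelationship::ReversePrimer', 4)

# Parentheses group the OR terms; ' AND ' and ' OR ' are space-delimited.
expression = "(#{forward.join(' OR ')}) AND #{reverse}"
puts expression
```

The spaces around `AND` and `OR` matter, since #compress_logic only recognizes the operators when they are surrounded by whitespace.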


# File 'app/models/descriptor/gene.rb', line 20

class Descriptor::Gene < Descriptor

  has_many :gene_attributes, inverse_of: :descriptor, foreign_key: :descriptor_id
  accepts_nested_attributes_for :gene_attributes

  # A Sequence. If provided, its related sequence relationships are cloned to this Descriptor::Gene as gene attributes.
  attr_accessor :base_on_sequence

  before_validation :add_gene_attributes, if: -> {base_on_sequence.present?}

  validate :gene_attribute_logic_compresses, if: :gene_attribute_logic_changed?
  validate :gene_attribute_logic_parses, if: -> {
    ActiveSupport::Deprecation.silence do
      gene_attribute_logic_changed? && !errors.any?
    end
  }
  validate :gene_attribute_logic_matches_gene_attributes, if: :gene_attribute_logic_changed?

  after_save :cache_gene_attribute_logic_sql, if: -> {
    ActiveSupport::Deprecation.silence do
      saved_change_to_gene_attribute_logic? && valid?
    end
  }

  # @return [Scope]
  #   Sequences using AND for the supplied target attributes
  #
  # @param target_attributes [Array<Array>]
  #   one AND group from #sequence_query_set, like [[sequence_relationship_type, sequence_id], ...]
  def self.sequences_for_gene_attributes(object_sequence_id = nil, target_attributes = [], table_alias = nil)
    return Sequence.none if target_attributes.empty?

    s  = Sequence.arel_table
    sr = SequenceRelationship.arel_table

    a = s.alias("a_#{table_alias}")

    b = s.project(a[Arel.star]).from(a)
          .join(sr)
          .on(sr['object_sequence_id'].eq(a['id']))

    i = 0
    target_attributes.each do |sequence_type, id|
      sr_a = sr.alias("#{table_alias}_#{i}")
      b    = b.join(sr_a).on(
        sr_a['object_sequence_id'].eq(a['id']),
        sr_a['type'].eq(sequence_type),
        sr_a['subject_sequence_id'].eq(id)
      )
      i    += 1
    end

    b = b.group(a['id']).having(sr['object_sequence_id'].count.gteq(target_attributes.count))
    b = b.as("z_#{table_alias}")

    Sequence.joins(Arel::Nodes::InnerJoin.new(b, Arel::Nodes::On.new(b['id'].eq(s['id']))))
  end

  # @return [Array]
  #   of Arrays, like [[sequence_id, sequence_relationship_type], [sequence_id, sequence_relationship_type]]
  def self.gene_attribute_pairs(target_gene_attributes = GeneAttribute.none)
    target_gene_attributes.pluck(:sequence_id, :sequence_relationship_type)
  end

  # @return [Scope]
  #   Sequences as determined by #gene_attribute_logic, or
  #   if that is nil, #sequences_matching_any_gene_attributes
  def sequences
    return Sequence.none if !gene_attributes.all.any?
    return sequences_matching_any_gene_attributes if gene_attribute_logic.blank?
    Sequence.from("(#{cached_gene_attribute_sql}) as sequences").distinct
  end

  # @return [Scope]
  #   a Sequence scope that matches ALL, and only ALL gene attributes
  #   AHA from http://stackoverflow.com/questions/28568205/rails-4-arel-join-on-subquery
  def strict_and_sequences
    return Sequence.none if !gene_attributes.all.any?

    data = gene_attribute_pairs

    s  = Sequence.arel_table
    sr = SequenceRelationship.arel_table

    j = s.alias('j') # required for group/having purposes

    b = s.project(j[Arel.star]).from(j)
          .join(sr)
          .on(sr['object_sequence_id'].eq(j['id']))

    # Build an aliased join for each set of attributes
    data.each do |id, type|
      sr_a = sr.alias("b#{id}")
      b    = b.join(sr_a).on(
        sr_a['object_sequence_id'].eq(j['id']),
        sr_a['type'].eq(type),
        sr_a['subject_sequence_id'].eq(id)
      )
    end

    # match only those sequences with exactly these attributes, no more, no less
    b = b.group(j['id']).having(sr['object_sequence_id'].count.eq(data.count))

    b = b.as('join_alias')

    Sequence.joins(Arel::Nodes::InnerJoin.new(b, Arel::Nodes::On.new(b['id'].eq(s['id']))))
  end

  # @return [Scope]
  #   Sequences matching any #gene_attributes
  #   !! This ignores logic in gene_attribute_logic !!
  def sequences_matching_any_gene_attributes
    return Sequence.none if !gene_attributes.all.any?

    sr = SequenceRelationship.arel_table

    clauses = gene_attribute_pairs.collect {|subject_sequence_id, type|
      sr[:subject_sequence_id].eq(subject_sequence_id)
        .and(sr[:type].eq(type))
    }

    q = clauses.shift
    clauses.each do |c|
      q = q.or(c)
    end

    Sequence.joins(:related_sequence_relationships).where(q.to_sql).references(:sequence_relationships).distinct
  end

  # @return [Array]
  #   of Arrays, like [[sequence_id, sequence_relationship_type], [sequence_id, sequence_relationship_type]]
  def gene_attribute_pairs
    Descriptor::Gene.gene_attribute_pairs(gene_attributes.all)
  end

  # @return [Boolean]
  #   true if the current logic statement contains the gene_attribute
  def contains_logic_for?(gene_attribute)
    gene_attribute_logic =~ /#{gene_attribute.to_logic_literal}/ ? true : false
  end

  # @return [Array]
  def sequence_query_set
    attributes_from_or_queries(
      Utilities::Logic.or_queries(compress_logic)
    )
  end

  # @return [Array]
  def attributes_from_or_queries(queries)
    translate = gene_attribute_term_index.invert
    a         = []
    queries.each do |v|
      b = []
      v.split(//).each do |axiom|
        b.push translate[axiom].split(/\./)
      end
      a.push b
    end
    a
  end

  # @return [Hash]
  #   a lookup linking key/value terms to their single letter representation
  #   note that
  def gene_attribute_term_index
    symbols = ("a".."z").to_a + ("A".."Z").to_a
    matches = gene_attribute_logic.scan(/([A-Za-z:]+\.[\d]+)/).flatten
    h       = {}
    matches.each_with_index do |m, i|
      h[m] = symbols[i]
    end
    h
  end

  # @return [String]
  #   the logic with each key/value (SequenceRelationshipType.SequenceID) term translated into a single letter term,
  #   'OR' translated to '+', and 'AND' translated to '.'
  def compress_logic
    a = gene_attribute_logic.dup
    b = gene_attribute_term_index

    b.each do |k, v|
      # Match whole terms: ABC should only match ABC, not AB.
      # Uses a lookahead to accomplish this by checking whether the
      # next character is whitespace, a closing parenthesis, or
      # the end of the string
      a.gsub!(/#{k}(?=[\s)]|$)/, v)
    end
    a.gsub!(/\s+OR\s+/, '+')
    a.gsub!(/\s+AND\s+/, '.')
    a.gsub!(/\s+/, '')
    a
  end

  # See use in GeneAttribute
  def extend_gene_attribute_logic(gene_attribute, logic = :and)
    # String#to_sym! does not exist, and String#downcase! returns nil when nothing changes
    logic = logic.to_s.downcase.to_sym unless logic.kind_of?(Symbol)
    raise if ![:and, :or].include?(logic)

    append_gene_attribute_logic(gene_attribute, logic)
    cache_gene_attribute_logic_sql
  end

  protected

  def gene_attribute_logic_compresses
    if compress_logic.match?(/[^a-zA-Z\(\)\.\+]/)
      errors.add(:gene_attribute_logic, "is invalidly formed (likely a bad sequence_relationship_type)")
    end
  end

  def gene_attribute_logic_parses
    begin
      Utilities::Logic.parse_logic(compress_logic).to_s.split('+')
    rescue Parslet::ParseFailed => e
      errors.add(:gene_attribute_logic, "is invalidly formed: #{e.to_s}")
    end
  end

  def gene_attribute_logic_matches_gene_attributes
    a = gene_attribute_term_index.keys
    b = gene_attributes.collect {|ga| ga.to_logic_literal}
    c = a - b
    d = b - a
    errors.add(:gene_attribute_logic, "provided logic without matching gene attribute: #{c.join(';')}") if !c.empty?
    errors.add(:gene_attribute_logic, "gene attribute (#{d.join(';')}) not referenced in provided logic") if !d.empty?
    !errors.any?
  end

  def add_gene_attributes
    base_on_sequence.related_sequence_relationships.each do |sa|
      gene_attributes.build(sequence: sa.sequence, type: sa.type)
    end
  end

  def append_gene_attribute_logic(gene_attribute, logic = :and)
    # Join with the requested operator, ' AND ' or ' OR '
    v = [gene_attribute_logic, gene_attribute.to_logic_literal].compact.join(" #{logic.to_s.upcase} ")
    update_column(:gene_attribute_logic, v)
  end

  def build_gene_attribute_logic_sql
    queries = []
    sequence_query_set.each_with_index do |target_attributes, i|
      queries.push Descriptor::Gene.sequences_for_gene_attributes(id, target_attributes, "uq#{i}")
    end

    queries.collect {|q| "(#{q.to_sql})"}.join(' UNION ')
  end

  def cache_gene_attribute_logic_sql
    update_column(:cached_gene_attribute_sql, build_gene_attribute_logic_sql)
  end

end

Class Method Details

+ (Array) gene_attribute_pairs(target_gene_attributes = GeneAttribute.none)

Returns an Array of Arrays, like [[sequence_id, sequence_relationship_type], [sequence_id, sequence_relationship_type]]

Returns:

  • (Array)

    of Arrays, like [[sequence_id, sequence_relationship_type], [sequence_id, sequence_relationship_type]]



# File 'app/models/descriptor/gene.rb', line 80

def self.gene_attribute_pairs(target_gene_attributes = GeneAttribute.none)
  target_gene_attributes.pluck(:sequence_id, :sequence_relationship_type)
end

+ (Scope) sequences_for_gene_attributes(object_sequence_id = nil, target_attributes = [], table_alias = nil)

Returns Sequences using AND for the supplied target attributes

Parameters:

  • target_attributes (Array<Array>)

    one AND group from #sequence_query_set, like [[sequence_relationship_type, sequence_id], ...]

Returns:

  • (Scope)

    Sequences using AND for the supplied target attributes



# File 'app/models/descriptor/gene.rb', line 49

def self.sequences_for_gene_attributes(object_sequence_id = nil, target_attributes = [], table_alias = nil)
  return Sequence.none if target_attributes.empty?

  s  = Sequence.arel_table
  sr = SequenceRelationship.arel_table

  a = s.alias("a_#{table_alias}")

  b = s.project(a[Arel.star]).from(a)
        .join(sr)
        .on(sr['object_sequence_id'].eq(a['id']))

  i = 0
  target_attributes.each do |sequence_type, id|
    sr_a = sr.alias("#{table_alias}_#{i}")
    b    = b.join(sr_a).on(
      sr_a['object_sequence_id'].eq(a['id']),
      sr_a['type'].eq(sequence_type),
      sr_a['subject_sequence_id'].eq(id)
    )
    i    += 1
  end

  b = b.group(a['id']).having(sr['object_sequence_id'].count.gteq(target_attributes.count))
  b = b.as("z_#{table_alias}")

  Sequence.joins(Arel::Nodes::InnerJoin.new(b, Arel::Nodes::On.new(b['id'].eq(s['id']))))
end

Instance Method Details

- (Object) add_gene_attributes (protected)



# File 'app/models/descriptor/gene.rb', line 250

def add_gene_attributes
  base_on_sequence.related_sequence_relationships.each do |sa|
    gene_attributes.build(sequence: sa.sequence, type: sa.type)
  end
end

- (Object) append_gene_attribute_logic(gene_attribute, logic = :and) (protected)



# File 'app/models/descriptor/gene.rb', line 256

def append_gene_attribute_logic(gene_attribute, logic = :and)
  # Join with the requested operator, ' AND ' or ' OR '
  v = [gene_attribute_logic, gene_attribute.to_logic_literal].compact.join(" #{logic.to_s.upcase} ")
  update_column(:gene_attribute_logic, v)
end

- (Array) attributes_from_or_queries(queries)

Returns:

  • (Array)


# File 'app/models/descriptor/gene.rb', line 169

def attributes_from_or_queries(queries)
  translate = gene_attribute_term_index.invert
  a         = []
  queries.each do |v|
    b = []
    v.split(//).each do |axiom|
      b.push translate[axiom].split(/\./)
    end
    a.push b
  end
  a
end
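
To make the decoding step concrete, here is a minimal standalone sketch of the same idea. It assumes the OR queries arrive as strings of single letters such as `"ac"` (inferred from the `split(//)` call above, not from the `Utilities::Logic` source), and `TERM_INDEX` and `decode_or_queries` are illustrative names only.

```ruby
# Assumed term index, shaped like the Hash returned by gene_attribute_term_index.
TERM_INDEX = {
  'SequenceRelationship::ForwardPrimer.2' => 'a',
  'SequenceRelationship::ForwardPrimer.3' => 'b',
  'SequenceRelationship::ReversePrimer.4' => 'c'
}.freeze

# Turn compressed AND groups (e.g. "ac") back into [type, id] pairs.
def decode_or_queries(queries, term_index)
  letter_to_term = term_index.invert
  queries.map do |group|
    group.chars.map { |letter| letter_to_term.fetch(letter).split('.') }
  end
end

decoded = decode_or_queries(%w[ac bc], TERM_INDEX)
decoded.each { |group| p group }
```

Note that the ids come back as Strings, because they are recovered by splitting the literal on the period.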

- (Object) build_gene_attribute_logic_sql (protected)



# File 'app/models/descriptor/gene.rb', line 261

def build_gene_attribute_logic_sql
  queries = []
  sequence_query_set.each_with_index do |target_attributes, i|
    queries.push Descriptor::Gene.sequences_for_gene_attributes(id, target_attributes, "uq#{i}")
  end

  queries.collect {|q| "(#{q.to_sql})"}.join(' UNION ')
end
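
The final line joins one parenthesized subquery per AND group with `UNION`. A minimal sketch of just that string assembly follows; the SQL strings are placeholders, not the SQL that Arel actually generates for this model.

```ruby
# Placeholder subqueries standing in for the output of q.to_sql.
subqueries = [
  'SELECT sequences.* FROM sequences WHERE sequences.id = 1',
  'SELECT sequences.* FROM sequences WHERE sequences.id = 2'
]

# Wrap each subquery in parentheses, then combine them with UNION.
union_sql = subqueries.map { |sql| "(#{sql})" }.join(' UNION ')
puts union_sql
```

Since SQL `UNION` (as opposed to `UNION ALL`) removes duplicate rows, a sequence satisfying several AND groups should appear once in the combined result.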

- (Object) cache_gene_attribute_logic_sql (protected)



# File 'app/models/descriptor/gene.rb', line 270

def cache_gene_attribute_logic_sql
  update_column(:cached_gene_attribute_sql, build_gene_attribute_logic_sql)
end

- (String) compress_logic

Returns the logic with each key/value (SequenceRelationshipType.SequenceID) term translated into a single letter term, 'OR' translated to '+', and 'AND' translated to '.'

Returns:

  • (String)

    the logic with each key/value (SequenceRelationshipType.SequenceID) term translated into a single letter term, 'OR' translated to '+', and 'AND' translated to '.'



# File 'app/models/descriptor/gene.rb', line 198

def compress_logic
  a = gene_attribute_logic.dup
  b = gene_attribute_term_index

  b.each do |k, v|
    # Match whole terms: ABC should only match ABC, not AB.
    # Uses a lookahead to accomplish this by checking whether the
    # next character is whitespace, a closing parenthesis, or
    # the end of the string
    a.gsub!(/#{k}(?=[\s)]|$)/, v)
  end
  a.gsub!(/\s+OR\s+/, '+')
  a.gsub!(/\s+AND\s+/, '.')
  a.gsub!(/\s+/, '')
  a
end
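
The compression can be tried outside the model with a small standalone sketch. The `compress` function below takes the term index as an argument and, unlike the method above, escapes each term with `Regexp.escape` so the period is matched literally; both of these are choices made for this example.

```ruby
# Standalone variant of the compression step: terms become letters,
# ' OR ' becomes '+', ' AND ' becomes '.', remaining whitespace is dropped.
def compress(logic, term_index)
  out = logic.dup
  term_index.each do |term, letter|
    # The lookahead keeps "X.2" from matching the front of "X.23".
    out.gsub!(/#{Regexp.escape(term)}(?=[\s)]|$)/, letter)
  end
  out.gsub!(/\s+OR\s+/, '+')
  out.gsub!(/\s+AND\s+/, '.')
  out.gsub!(/\s+/, '')
  out
end

index = {
  'SequenceRelationship::ForwardPrimer.2' => 'a',
  'SequenceRelationship::ForwardPrimer.3' => 'b',
  'SequenceRelationship::ReversePrimer.4' => 'c'
}
logic = '(SequenceRelationship::ForwardPrimer.2 OR SequenceRelationship::ForwardPrimer.3) ' \
        'AND SequenceRelationship::ReversePrimer.4'

puts compress(logic, index)
```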

- (Boolean) contains_logic_for?(gene_attribute)

Returns true if the current logic statement contains the gene_attribute

Returns:

  • (Boolean)

    true if the current logic statement contains the gene_attribute



# File 'app/models/descriptor/gene.rb', line 157

def contains_logic_for?(gene_attribute)
  gene_attribute_logic =~ /#{gene_attribute.to_logic_literal}/ ? true : false
end
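
One thing to be aware of: the literal is interpolated into the regular expression unescaped and without a trailing boundary, so a reference ending in `.2` also matches text ending in `.23`. The sketch below contrasts that behavior with a stricter check; `references_term?` is a hypothetical name and is not part of the class.

```ruby
# Naive check, equivalent in spirit to the method above.
def naive_reference?(logic, literal)
  logic =~ /#{literal}/ ? true : false
end

# Stricter check: escape the literal, and require that it is not preceded
# by more type characters nor followed by more digits.
def references_term?(logic, literal)
  /(?<![A-Za-z:])#{Regexp.escape(literal)}(?!\d)/.match?(logic.to_s)
end

logic = 'SequenceRelationship::ForwardPrimer.23 AND SequenceRelationship::ReversePrimer.4'

puts naive_reference?(logic, 'SequenceRelationship::ForwardPrimer.2')
puts references_term?(logic, 'SequenceRelationship::ForwardPrimer.2')
puts references_term?(logic, 'SequenceRelationship::ForwardPrimer.23')
```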

- (Object) extend_gene_attribute_logic(gene_attribute, logic = :and)

See use in GeneAttribute



# File 'app/models/descriptor/gene.rb', line 216

def extend_gene_attribute_logic(gene_attribute, logic = :and)
  # String#to_sym! does not exist, and String#downcase! returns nil when nothing changes
  logic = logic.to_s.downcase.to_sym unless logic.kind_of?(Symbol)
  raise if ![:and, :or].include?(logic)

  append_gene_attribute_logic(gene_attribute, logic)
  cache_gene_attribute_logic_sql
end
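
Normalizing the `logic` argument is easy to get wrong with bang methods, so a small sketch may help. `normalize_logic` is a hypothetical helper used only here; raising `ArgumentError` is a choice made for this example, whereas the method above uses a bare `raise`.

```ruby
# Accepts 'AND', 'and', :and, 'Or', ... and returns :and or :or.
def normalize_logic(logic)
  value = logic.to_s.downcase.to_sym
  raise ArgumentError, "logic must be :and or :or, got #{logic.inspect}" unless %i[and or].include?(value)

  value
end

# String#downcase! returns nil when the receiver is already lowercase,
# which is why chaining another call onto it is unsafe.
p 'and'.dup.downcase!

p normalize_logic('AND')
p normalize_logic(:or)
```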

- (Object) gene_attribute_logic_compresses (protected)



# File 'app/models/descriptor/gene.rb', line 226

def gene_attribute_logic_compresses
  if compress_logic.match?(/[^a-zA-Z\(\)\.\+]/)
    errors.add(:gene_attribute_logic, "is invalidly formed (likely a bad sequence_relationship_type)")
  end
end
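
The check relies on the compressed form containing only letters, parentheses, `.` and `+`; anything else means part of the original logic was not recognized as a term or an operator. A minimal sketch of that test on its own, with `compressed_logic_valid?` as an illustrative name:

```ruby
# Characters permitted once logic has been compressed.
INVALID_COMPRESSED_CHARACTER = /[^a-zA-Z().+]/

def compressed_logic_valid?(compressed)
  !compressed.match?(INVALID_COMPRESSED_CHARACTER)
end

puts compressed_logic_valid?('(a+b).c')
# An underscore left over from an unrecognized type name fails the check.
puts compressed_logic_valid?('(Forward_a+b).c')
```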

- (Object) gene_attribute_logic_matches_gene_attributes (protected)



# File 'app/models/descriptor/gene.rb', line 240

def gene_attribute_logic_matches_gene_attributes
  a = gene_attribute_term_index.keys
  b = gene_attributes.collect {|ga| ga.to_logic_literal}
  c = a - b
  d = b - a
  errors.add(:gene_attribute_logic, "provided logic without matching gene attribute: #{c.join(';')}") if !c.empty?
  errors.add(:gene_attribute_logic, "gene attribute (#{d.join(';')}) not referenced in provided logic") if !d.empty?
  !errors.any?
end
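
The two error cases come from Array differences taken in each direction. The sketch below shows the same comparison with plain arrays; the literals are made-up sample values.

```ruby
# Terms found in the logic string (sample values).
logic_terms = [
  'SequenceRelationship::ForwardPrimer.2',
  'SequenceRelationship::ReversePrimer.9'
]

# Literals for the gene attributes attached to the descriptor (sample values).
attribute_literals = [
  'SequenceRelationship::ForwardPrimer.2',
  'SequenceRelationship::ReversePrimer.4'
]

# In the logic, but no gene attribute backs it.
without_attribute = logic_terms - attribute_literals
# A gene attribute exists, but the logic never mentions it.
unreferenced = attribute_literals - logic_terms

p without_attribute
p unreferenced
```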

- (Object) gene_attribute_logic_parses (protected)



# File 'app/models/descriptor/gene.rb', line 232

def gene_attribute_logic_parses
  begin
    Utilities::Logic.parse_logic(compress_logic).to_s.split('+')
  rescue Parslet::ParseFailed => e
    errors.add(:gene_attribute_logic, "is invalidly formed: #{e.to_s}")
  end
end

- (Array) gene_attribute_pairs

Returns an Array of Arrays, like [[sequence_id, sequence_relationship_type], [sequence_id, sequence_relationship_type]]

Returns:

  • (Array)

    of Arrays, like [[sequence_id, sequence_relationship_type], [sequence_id, sequence_relationship_type]]



# File 'app/models/descriptor/gene.rb', line 151

def gene_attribute_pairs
  Descriptor::Gene.gene_attribute_pairs(gene_attributes.all)
end

- (Hash) gene_attribute_term_index

Returns a lookup linking key/value terms to their single letter representation note that

Returns:

  • (Hash)

    a lookup linking key/value terms to their single letter representation note that



# File 'app/models/descriptor/gene.rb', line 185

def gene_attribute_term_index
  symbols = ("a".."z").to_a + ("A".."Z").to_a
  matches = gene_attribute_logic.scan(/([A-Za-z:]+\.[\d]+)/).flatten
  h       = {}
  matches.each_with_index do |m, i|
    h[m] = symbols[i]
  end
  h
end
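
The 52-letter alphabet built here is where the 52-attribute ceiling comes from. The standalone sketch below shows the mapping on a sample expression. It differs from the method above in two labeled ways: it de-duplicates repeated terms with `uniq`, and it raises when more than 52 distinct terms are present. Both are assumptions added for illustration.

```ruby
SYMBOLS = (('a'..'z').to_a + ('A'..'Z').to_a).freeze

# Map each distinct "Type.id" term to a single letter, in order of appearance.
def term_index(logic)
  terms = logic.scan(/[A-Za-z:]+\.\d+/).uniq # uniq is an addition for this sketch
  raise ArgumentError, "at most #{SYMBOLS.size} terms are supported" if terms.size > SYMBOLS.size

  terms.each_with_index.map { |term, i| [term, SYMBOLS[i]] }.to_h
end

logic = '(SequenceRelationship::ForwardPrimer.2 OR SequenceRelationship::ForwardPrimer.3) ' \
        'AND SequenceRelationship::ReversePrimer.4'

index = term_index(logic)
index.each { |term, letter| puts "#{letter} => #{term}" }
```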

- (Array) sequence_query_set

Returns:

  • (Array)


# File 'app/models/descriptor/gene.rb', line 162

def sequence_query_set
  attributes_from_or_queries(
    Utilities::Logic.or_queries(compress_logic)
  )
end
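
The source of `Utilities::Logic.or_queries` is not shown on this page. Judging from how its result is consumed, it appears to expand the compressed logic into a list of AND groups, one per OR branch. The sketch below is a hypothetical stand-in for that expansion, written as a small recursive-descent parser over the compressed notation (`+` for OR, `.` for AND); it is not the library's implementation.

```ruby
# expr := term ('+' term)*      OR: concatenate the alternatives
def parse_expr(chars)
  groups = parse_term(chars)
  while chars.first == '+'
    chars.shift
    groups += parse_term(chars)
  end
  groups
end

# term := factor ('.' factor)*  AND: combine every left group with every right group
def parse_term(chars)
  groups = parse_factor(chars)
  while chars.first == '.'
    chars.shift
    right = parse_factor(chars)
    groups = groups.product(right).map { |l, r| l + r }
  end
  groups
end

# factor := letter | '(' expr ')'
def parse_factor(chars)
  c = chars.shift
  if c == '('
    inner = parse_expr(chars)
    raise ArgumentError, 'missing closing parenthesis' unless chars.shift == ')'
    inner
  elsif c.to_s.match?(/\A[a-zA-Z]\z/)
    [c]
  else
    raise ArgumentError, "unexpected #{c.inspect}"
  end
end

# Hypothetical equivalent of the expansion step: "(a+b).c" becomes ["ac", "bc"].
def or_groups(compressed)
  chars = compressed.chars
  groups = parse_expr(chars)
  raise ArgumentError, 'unexpected trailing input' unless chars.empty?
  groups
end

p or_groups('(a+b).c')
```

Each resulting string is one AND group, which is the shape #attributes_from_or_queries expects when it splits a query into single letters.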

- (Scope) sequences

Returns Sequences as determined by #gene_attribute_logic, or if that is nil, #sequences_matching_any_gene_attributes

Returns:

  • (Scope)

    Sequences as determined by #gene_attribute_logic, or if that is nil, #sequences_matching_any_gene_attributes



# File 'app/models/descriptor/gene.rb', line 87

def sequences
  return Sequence.none if !gene_attributes.all.any?
  return sequences_matching_any_gene_attributes if gene_attribute_logic.blank?
  Sequence.from("(#{cached_gene_attribute_sql}) as sequences").distinct
end

- (Scope) sequences_matching_any_gene_attributes

Returns Sequences matching any #gene_attributes !! This ignores logic in gene_attribute_logic !!

Returns:

  • (Scope)

    Sequences matching any #gene_attributes !! This ignores logic in gene_attribute_logic !!



# File 'app/models/descriptor/gene.rb', line 131

def sequences_matching_any_gene_attributes
  return Sequence.none if !gene_attributes.all.any?

  sr = SequenceRelationship.arel_table

  clauses = gene_attribute_pairs.collect {|subject_sequence_id, type|
    sr[:subject_sequence_id].eq(subject_sequence_id)
      .and(sr[:type].eq(type))
  }

  q = clauses.shift
  clauses.each do |c|
    q = q.or(c)
  end

  Sequence.joins(:related_sequence_relationships).where(q.to_sql).references(:sequence_relationships).distinct
end

- (Scope) strict_and_sequences

Returns a Sequence scope that matches ALL, and only ALL, gene attributes. Based on stackoverflow.com/questions/28568205/rails-4-arel-join-on-subquery

Returns:

  • (Scope)

    a Sequence scope that matches ALL, and only ALL, gene attributes



# File 'app/models/descriptor/gene.rb', line 96

def strict_and_sequences
  return Sequence.none if !gene_attributes.all.any?

  data = gene_attribute_pairs

  s  = Sequence.arel_table
  sr = SequenceRelationship.arel_table

  j = s.alias('j') # required for group/having purposes

  b = s.project(j[Arel.star]).from(j)
        .join(sr)
        .on(sr['object_sequence_id'].eq(j['id']))

  # Build an aliased join for each set of attributes
  data.each do |id, type|
    sr_a = sr.alias("b#{id}")
    b    = b.join(sr_a).on(
      sr_a['object_sequence_id'].eq(j['id']),
      sr_a['type'].eq(type),
      sr_a['subject_sequence_id'].eq(id)
    )
  end

  # match only those sequences with exactly these attributes, no more, no less
  b = b.group(j['id']).having(sr['object_sequence_id'].count.eq(data.count))

  b = b.as('join_alias')

  Sequence.joins(Arel::Nodes::InnerJoin.new(b, Arel::Nodes::On.new(b['id'].eq(s['id']))))
end