Module: Utilities::Strings

Defined in:
lib/utilities/strings.rb

Overview

Methods that receive or generate a String. The methods in this library should be completely independent of TaxonWorks (i.e. ultimately gemifiable).

Class Method Details

.alphabetic_strings(string) ⇒ Array

Splits a string on special characters, returning an array of the strings that do not contain digits.

It splits on accented characters but not on underscores. The method is used for building wildcard searches, so splitting on accents gives those searches a form of pseudo accent-insensitivity.

Parameters:

  • string (String)

Returns:

  • (Array)

    the string split on whitespace and special characters, with any fragment containing a digit removed



# File 'lib/utilities/strings.rb', line 109

def self.alphabetic_strings(string)
  return [] if string.nil? || string.length == 0
  string.split(/\W/).select { |b| !(b =~ /\d/) }.reject { |b| b.empty? }
end
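
A short usage sketch of the splitting behaviour described above. The method body is copied from the listing into a top-level stand-in so the snippet runs without TaxonWorks; the sample strings are made up for illustration.

```ruby
# Stand-in copy of the listing above, defined at top level for illustration only.
def alphabetic_strings(string)
  return [] if string.nil? || string.length == 0
  string.split(/\W/).select { |b| !(b =~ /\d/) }.reject { |b| b.empty? }
end

# Underscores are word characters, so "bus_cus" survives intact; "1920" is dropped.
p alphabetic_strings('Aus bus_cus 1920 (Smith)')  # => ["Aus", "bus_cus", "Smith"]

# \w only covers ASCII word characters, so the accented letter counts as \W
# and acts as a split point.
p alphabetic_strings("M\u00FCller")               # => ["M", "ller"]
```

A wildcard search built from `["M", "ller"]` can then match both the accented and unaccented spellings.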

.authorship_sentence(last_names = []) ⇒ String?

TODO: DEPRECATE (doesn't belong here because to_sentence is Rails?)

Parameters:

  • last_names (Array) (defaults to: [])

Returns:

  • (String, nil)


# File 'lib/utilities/strings.rb', line 96

def self.authorship_sentence(last_names = [])
  return nil if last_names.empty?
  last_names.to_sentence(two_words_connector: ' & ', last_word_connector: ' & ')
end
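
Since `to_sentence` comes from ActiveSupport, the method as listed is not plain Ruby. One possible Rails-free equivalent is sketched below; it is an assumption about how the same output could be produced, not code from the library.

```ruby
# Hypothetical plain-Ruby replacement (not the library's code): names are joined
# with ', ' and the final name is attached with ' & '.
def authorship_sentence(last_names = [])
  return nil if last_names.empty?
  return last_names.first.to_s if last_names.size == 1
  [last_names[0..-2].join(', '), last_names.last].join(' & ')
end

p authorship_sentence(%w[Smith])              # => "Smith"
p authorship_sentence(%w[Smith Jones])        # => "Smith & Jones"
p authorship_sentence(%w[Smith Jones Brown])  # => "Smith, Jones & Brown"
p authorship_sentence([])                     # => nil
```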

.encode_with_utf8(string) ⇒ String, false

Forces the string to UTF-8 when its encoding is compatible, and returns false otherwise. !! Needing this is a bad sign: you should know your encoding before it gets this far.

Parameters:

  • string (String)

Returns:

  • (String, false)

    the string relabelled as UTF-8, or false if the string is nil or its encoding is incompatible



# File 'lib/utilities/strings.rb', line 117

def self.encode_with_utf8(string)
  return false if string.nil?
  if Encoding.compatible?('test'.encode(Encoding::UTF_8), string)
    string.force_encoding(Encoding::UTF_8)
  else
    false
  end
end
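
The warning above can be made concrete with a small sketch. The method body is copied from the listing; the sample strings are assumptions chosen to show that `force_encoding` relabels bytes without converting them.

```ruby
# Stand-in copy of the listing above, for illustration only.
def encode_with_utf8(string)
  return false if string.nil?
  if Encoding.compatible?('test'.encode(Encoding::UTF_8), string)
    string.force_encoding(Encoding::UTF_8)
  else
    false
  end
end

# An ISO-8859-1 string is ASCII-compatible, so it is relabelled as UTF-8.
# The bytes are not converted, so the result is not valid UTF-8.
latin1 = "caf\u00E9".encode(Encoding::ISO_8859_1)
relabelled = encode_with_utf8(latin1)
p relabelled.encoding == Encoding::UTF_8  # => true
p relabelled.valid_encoding?              # => false

# UTF-16 is not ASCII-compatible, so the method returns false.
p encode_with_utf8('abc'.encode(Encoding::UTF_16LE))  # => false
```

When the source encoding is known, `String#encode` converts the bytes properly instead of relabelling them.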

.escape_single_quote(string) ⇒ String

Adds a second single quote to escape each apostrophe in SQL query strings. Returns nil for blank input.

Parameters:

  • string (String)

Returns:

  • (String)


# File 'lib/utilities/strings.rb', line 58

def self.escape_single_quote(string)
  return nil if string.blank?
  string.gsub("'", "''")
end
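
A minimal usage sketch. `blank?` is an ActiveSupport method, so the stand-in below approximates it with a nil/whitespace check (an assumption made only so the snippet runs in plain Ruby).

```ruby
# Stand-in for the listing above; .blank? is approximated for plain Ruby.
def escape_single_quote(string)
  return nil if string.nil? || string.strip.empty?
  string.gsub("'", "''")
end

p escape_single_quote("O'Brien")  # => "O''Brien"
p escape_single_quote('')         # => nil
```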

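.generate_md5(text) ⇒ String?

Lowercases the text, strips whitespace and basic punctuation (. , ; : ? !), and returns the MD5 hex digest of what remains. Returns nil for blank input.

Parameters:

  • text (String)

Returns:

  • (String, nil)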


# File 'lib/utilities/strings.rb', line 37

def self.generate_md5(text)
  return nil if text.blank?
  text = text.downcase.gsub(/[\s\.,;:\?!]*/, '')
  Digest::MD5.hexdigest(text)
end
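
The normalization step means that strings differing only in case, whitespace, or basic punctuation hash to the same value. A sketch with made-up inputs follows; `blank?` is approximated with a nil/whitespace check so the snippet runs without ActiveSupport.

```ruby
require 'digest/md5'

# Stand-in for the listing above; .blank? is approximated for plain Ruby.
def generate_md5(text)
  return nil if text.nil? || text.strip.empty?
  text = text.downcase.gsub(/[\s\.,;:\?!]*/, '')
  Digest::MD5.hexdigest(text)
end

a = generate_md5('Smith, J. 1920. A title!')
b = generate_md5('smith j 1920 a title')

p a == b    # => true, both normalize to "smithj1920atitle"
p a.length  # => 32, a hex String rather than a Digest::MD5 object
```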

.increment_contained_integer(string) ⇒ String, Boolean

Increments the first integer encountered in the string, keeping only the non-integer text immediately before and after it (see tests). Returns false if no number is found.

Parameters:

  • string (String)

Returns:

  • (String, Boolean)

    the string with its first integer incremented, or false if no number is found



# File 'lib/utilities/strings.rb', line 48

def self.increment_contained_integer(string)
  string =~ /([^\d]*)(\d+)([^\d]*)/
  a, b, c = $1, $2, $3
  return false if b.nil?
  [a, (b.to_i + 1), c].compact.join
end
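
What "only the immediate non-integer strings before and after" means in practice is easiest to see by example. The stand-in below copies the listing; the inputs are illustrative.

```ruby
# Stand-in copy of the listing above, for illustration only.
def increment_contained_integer(string)
  string =~ /([^\d]*)(\d+)([^\d]*)/
  a, b, c = $1, $2, $3
  return false if b.nil?
  [a, (b.to_i + 1), c].compact.join
end

p increment_contained_integer('abc9def')  # => "abc10def"
p increment_contained_integer('a1b2')     # => "a2b", text after the second integer is dropped
p increment_contained_integer('abc')      # => false
```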

.integers(string) ⇒ Array<String>

Gets the whitespace-separated integers from a string.

Parameters:

  • string (String)

Returns:

  • (Array<String>)

    of strings representing integers



# File 'lib/utilities/strings.rb', line 143

def self.integers(string)
  return [] if string.nil? || string.length == 0
  string.split(/\s+/).select { |t| is_i?(t) }
end

.is_i?(string) ⇒ Boolean

See stackoverflow.com/questions/1235863/test-if-a-string-is-basically-an-integer-in-quotes-using-ruby

Note: might check out the CSV::Converters constants to see how they handle this.

Allows '02' … hmm

Parameters:

  • string (String)

Returns:

  • (Boolean)

    whether the string is an integer (positive or negative)



# File 'lib/utilities/strings.rb', line 69

def self.is_i?(string)
  /\A[-+]?\d+\z/ === string
end
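
A quick sketch of which inputs the pattern accepts, including the leading-zero case noted above. The method body is copied from the listing; the inputs are illustrative.

```ruby
# Stand-in copy of the listing above, for illustration only.
def is_i?(string)
  /\A[-+]?\d+\z/ === string
end

inputs = ['42', '-7', '+3', '02', '1.5', 'abc', '', nil]
inputs.each { |s| puts "#{s.inspect} => #{is_i?(s)}" }
# The first four are true (signs and leading zeros are accepted);
# the rest are false, including nil, since Regexp#=== is false for non-strings.
```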

.nil_squish_strip(string) ⇒ String?

Strips leading and trailing whitespace and condenses internal whitespace, modifying the string in place; returns nil (not an empty string) if nothing is left.

Parameters:

  • string (String)

Returns:

  • (String, nil)

    the string with leading and trailing whitespace removed and internal whitespace condensed, or nil (not an empty string) if nothing is left



# File 'lib/utilities/strings.rb', line 27

def self.nil_squish_strip(string)
  if !string.nil?
    string.squish!
    string = nil if string == ''
  end
  string
end
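
`squish!` is an ActiveSupport extension, so the listing depends on Rails. The sketch below is a hypothetical plain-Ruby variant (not the library's code) that produces the same result for the illustrated inputs; unlike the listing, it does not modify its argument.

```ruby
# Hypothetical Rails-free variant: collapse whitespace runs to one space,
# strip both ends, and return nil when nothing is left.
def nil_squish_strip(string)
  return nil if string.nil?
  squished = string.gsub(/[[:space:]]+/, ' ').strip
  squished.empty? ? nil : squished
end

p nil_squish_strip("  Aus \t bus\n")  # => "Aus bus"
p nil_squish_strip("  \n ")           # => nil
p nil_squish_strip(nil)               # => nil
```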

.nil_strip(string) ⇒ String?

Strips leading and trailing whitespace in place, leaves internal whitespace as is, and returns nil if nothing is left.

Parameters:

  • string (String)

Returns:

  • (String, nil)

    the string with leading and trailing whitespace removed and internal whitespace left as is, or nil if nothing is left



# File 'lib/utilities/strings.rb', line 15

def self.nil_strip(string)
  # string should have content or be empty
  if !string.nil?
    string.strip!
    string = nil if string == ''
  end
  string
end
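
Because the listing calls `strip!`, the caller's string is modified as well as returned. A small sketch with made-up inputs (the method body is copied from the listing):

```ruby
# Stand-in copy of the listing above, for illustration only.
def nil_strip(string)
  if !string.nil?
    string.strip!
    string = nil if string == ''
  end
  string
end

original = String.new('  Aus  bus  ')
result = nil_strip(original)

p result                   # => "Aus  bus" (internal whitespace untouched)
p result.equal?(original)  # => true, the same object was edited in place

p nil_strip(String.new('   '))  # => nil
```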

.nil_wrap(pre = nil, content = nil, post = nil) ⇒ String?

Returns nil if content is blank; otherwise joins pre, content, and post (skipping any that are nil) and returns the resulting string.

Parameters:

  • pre (String) (defaults to: nil)
  • content (String) (defaults to: nil)
  • post (String) (defaults to: nil)

Returns:

  • (String, nil)


# File 'lib/utilities/strings.rb', line 88

def self.nil_wrap(pre = nil, content = nil, post = nil)
  return nil if content.blank?
  [pre, content, post].compact.join.html_safe
end

.only_integers?(string) ⇒ Boolean

Returns true if the query string only contains integers separated by whitespace.

Returns:

  • (Boolean)

    true if the query string only contains integers separated by whitespace



# File 'lib/utilities/strings.rb', line 150

def self.only_integers?(string)
  !(string =~ /[^\d\s]/i) && !integers(string).empty?
end
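
A usage sketch combining `integers`, `is_i?`, and `only_integers?`. The bodies are copied from the listings on this page; the inputs are illustrative.

```ruby
# Stand-in copies of the listings on this page, for illustration only.
def is_i?(string)
  /\A[-+]?\d+\z/ === string
end

def integers(string)
  return [] if string.nil? || string.length == 0
  string.split(/\s+/).select { |t| is_i?(t) }
end

def only_integers?(string)
  !(string =~ /[^\d\s]/i) && !integers(string).empty?
end

p integers('12 abc -3 4.5')  # => ["12", "-3"]
p only_integers?('1 2  3')   # => true
p only_integers?('1 a 2')    # => false
p only_integers?('   ')      # => false, no integers at all
p only_integers?('-1 2')     # => false, the sign is neither a digit nor whitespace
```

Note the last case: `is_i?` accepts a leading sign, but `only_integers?` rejects any character that is not a digit or whitespace.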

.parse_authorship(authorship) ⇒ Array

Parse a scientificAuthorship field to extract author and year information.

If the format matches ICZN, adds parentheses around author name (if detected)

Parameters:

  • authorship (String)

Returns:

  • (Array)
    author_name, year


# File 'lib/utilities/strings.rb', line 159

def self.parse_authorship(authorship)
  return [] if authorship.to_s.strip.empty?
  if (authorship_matchdata = authorship.match(/\(?(?<author>.+?),? (?<year>\d{4})?\)?/))

    author_name = authorship_matchdata[:author]
    year = authorship_matchdata[:year]

    # author name should be wrapped in parentheses if the verbatim authorship was
    if authorship.start_with?('(') and authorship.end_with?(')')
      author_name = '(' + author_name + ')'
    end

  else
    # Fall back to simple name + date parsing
    author_name = verbatim_author(authorship)
    year = year_of_publication(authorship)
  end

  [author_name, year]
end
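
The sketch below exercises both branches of the method: the regular-expression match and the fallback to `verbatim_author` and `year_of_publication`. The method bodies are copied from the listings on this page (with `and` written as `&&`); the inputs are made up.

```ruby
# Stand-in copies of the listings on this page, for illustration only.
def years(string)
  return [] if string.nil?
  string.scan(/\d{4}/).to_a.uniq
end

def verbatim_author(author_year_string)
  return nil if author_year_string.to_s.strip.empty?
  author_end_index = author_year_string.rindex(' ')
  author_end_index ||= author_year_string.length
  author_year_string[0...author_end_index]
end

def year_of_publication(author_year)
  return nil if author_year.to_s.strip.empty?
  split_author_year = author_year.split(' ')
  year = split_author_year[split_author_year.length - 1]
  year =~ /\A\d+\z/ ? year : years(author_year).last.to_s
end

def parse_authorship(authorship)
  return [] if authorship.to_s.strip.empty?
  if (authorship_matchdata = authorship.match(/\(?(?<author>.+?),? (?<year>\d{4})?\)?/))
    author_name = authorship_matchdata[:author]
    year = authorship_matchdata[:year]
    if authorship.start_with?('(') && authorship.end_with?(')')
      author_name = '(' + author_name + ')'
    end
  else
    author_name = verbatim_author(authorship)
    year = year_of_publication(authorship)
  end
  [author_name, year]
end

p parse_authorship('Smith, 1920')          # => ["Smith", "1920"]
p parse_authorship('(Smith, 1920)')        # => ["(Smith)", "1920"]
p parse_authorship('Smith')                # => ["Smith", ""] (no space, so the fallback runs)
p parse_authorship('Smith & Jones, 1920')  # => ["Smith", nil]
p parse_authorship('')                     # => []
```

With the pattern exactly as listed, the lazy `author` group stops at the first space, which is why the multi-author input yields only the first name and no year.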

.random_string(string_length) ⇒ String?

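Builds a stub string of the requested length from shuffled lowercase letters a-z. No letter repeats, so the result is at most 26 characters long. Returns nil when the length is 0.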

Parameters:

  • string_length (Integer)

Returns:

  • (String, nil)

    a stub string of the requested length, or nil when the length is 0



# File 'lib/utilities/strings.rb', line 7

def self.random_string(string_length)
  return nil if string_length.to_i == 0
  ('a'..'z').to_a.shuffle[0, string_length].join
end
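
Because the method takes a slice of a shuffled alphabet, the result never repeats a letter and cannot exceed 26 characters. A sketch showing both properties (method body copied from the listing):

```ruby
# Stand-in copy of the listing above, for illustration only.
def random_string(string_length)
  return nil if string_length.to_i == 0
  ('a'..'z').to_a.shuffle[0, string_length].join
end

s = random_string(10)
p s.length                  # => 10
p s.chars.uniq.length       # => 10, shuffle never repeats a letter
p random_string(40).length  # => 26, the alphabet runs out
p random_string(0)          # => nil
```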

.sanitize_for_csv(string) ⇒ String, param

The goal is to sanitize an individual string such that it is usable in a TAB-delimited, UTF-8 column. See Download.

TODO: Likely need to handle quotes, and write better UTF compliance tests.

~~ Technically \n is allowed!

Parameters:

  • string (String)

Returns:

  • (String, param)

    the string with newlines and tabs replaced by spaces, or the parameter unchanged if it is blank



# File 'lib/utilities/strings.rb', line 78

def self.sanitize_for_csv(string)
  return string if string.blank?
  string.to_s.gsub(/\n|\t/, ' ')
end
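
A minimal usage sketch with a made-up value. `blank?` is approximated with a nil/whitespace check so the snippet runs without ActiveSupport.

```ruby
# Stand-in for the listing above; .blank? is approximated for plain Ruby.
def sanitize_for_csv(string)
  return string if string.nil? || string.to_s.strip.empty?
  string.to_s.gsub(/\n|\t/, ' ')
end

p sanitize_for_csv("Aus\tbus\ncus")  # => "Aus bus cus"
p sanitize_for_csv(nil)              # => nil (blank input is returned unchanged)
```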

.verbatim_author(author_year_string) ⇒ String?

Parameters:

  • author_year_string (String)

Returns:

  • (String, nil)


# File 'lib/utilities/strings.rb', line 193

def self.verbatim_author(author_year_string)
  return nil if author_year_string.to_s.strip.empty?  # alternative to .blank?
  author_end_index = author_year_string.rindex(' ')
  author_end_index ||= author_year_string.length
  author_year_string[0...author_end_index]
end
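
The method keeps everything before the last space, so any punctuation attached to the author name is kept too. A sketch with illustrative inputs (method body copied from the listing):

```ruby
# Stand-in copy of the listing above, for illustration only.
def verbatim_author(author_year_string)
  return nil if author_year_string.to_s.strip.empty?
  author_end_index = author_year_string.rindex(' ')
  author_end_index ||= author_year_string.length
  author_year_string[0...author_end_index]
end

p verbatim_author('Smith & Jones 1920')  # => "Smith & Jones"
p verbatim_author('Smith, 1920')         # => "Smith," (the comma is kept)
p verbatim_author('Smith')               # => "Smith"
p verbatim_author('  ')                  # => nil
```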

.year_letter(string) ⇒ String?

Returns the letter(s) immediately following a four-digit year, taken from the first year in the string that has such a suffix.

`Smith, 1920a. ... ` returns `a`.

Returns:

  • (String, nil)

    the letter(s) immediately following a four-digit year, or nil if no year has a letter suffix

    `Smith, 1920a. ... ` returns `a`
    


# File 'lib/utilities/strings.rb', line 135

def self.year_letter(string)
  # Capture the letters directly after a four-digit year; nil when nothing matches.
  string.match(/\d{4}([a-zA-Z]+)/).to_a.last
end
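
A usage sketch, assuming the character class is intended to be `[a-zA-Z]` (any ASCII letter). The inputs other than the documented `Smith, 1920a` example are made up.

```ruby
# Stand-in for the listing above, written with the assumed [a-zA-Z] class.
def year_letter(string)
  string.match(/\d{4}([a-zA-Z]+)/).to_a.last
end

p year_letter('Smith, 1920a. ... ')  # => "a"
p year_letter('Smith, 1920B')        # => "B"
p year_letter('Smith, 1920')         # => nil
```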

.year_of_publication(author_year) ⇒ String?

Parameters:

  • author_year (String)

Returns:

  • (String, nil)


# File 'lib/utilities/strings.rb', line 182

def self.year_of_publication(author_year)
  return nil if author_year.to_s.strip.empty?   # alternative to .blank?
  split_author_year = author_year.split(' ')
  year = split_author_year[split_author_year.length - 1]
  # try matching last element first, otherwise scan entire string for year
  # Maybe we don't need regex match and can use years(author_year) exclusively?
  year =~ /\A\d+\z/ ? year : years(author_year).last.to_s
end
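
A sketch of the two paths through the method: the last token is used when it is all digits, otherwise the last match from `years` is used. Both bodies are copied from the listings on this page; the inputs are illustrative.

```ruby
# Stand-in copies of the listings on this page, for illustration only.
def years(string)
  return [] if string.nil?
  string.scan(/\d{4}/).to_a.uniq
end

def year_of_publication(author_year)
  return nil if author_year.to_s.strip.empty?
  split_author_year = author_year.split(' ')
  year = split_author_year[split_author_year.length - 1]
  year =~ /\A\d+\z/ ? year : years(author_year).last.to_s
end

p year_of_publication('Smith, 1920')        # => "1920" (last token is all digits)
p year_of_publication('Smith, 1920a')       # => "1920" (falls back to years)
p year_of_publication('Smith 1920 [1921]')  # => "1921" (last year found)
p year_of_publication('Smith')              # => "" (empty string, not nil)
p year_of_publication('')                   # => nil
```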

.years(string) ⇒ Array

Returns:

  • (Array)

    the unique four-digit sequences found in the string, in order of first appearance


# File 'lib/utilities/strings.rb', line 127

def self.years(string)
  return [] if string.nil?
  string.scan(/\d{4}/).to_a.uniq
end