Module: Utilities::Strings

Defined in:
lib/utilities/strings.rb

Overview

Methods that recieve or generate a String. This methods in this library should be completely independant (i.e. ultimately gemifiable) from TaxonWorks.

Class Method Summary collapse

Class Method Details

.alphabetic_strings(string) ⇒ Array

Splits a string on special characters, returning an array of the strings that do not contain digits.

It splits on accent characters, and does not split on underscores. The method is used for building wildcard searches, so splitting on accents creates pseudo accent insensitivity in searches.

Parameters:

  • string (String)

Returns:

  • (Array)

    whitespace and special character split, then any string containing a digit eliminated



112
113
114
115
# File 'lib/utilities/strings.rb', line 112

def self.alphabetic_strings(string)
  return [] if string.nil? || string.length == 0
  string.split(/[^[[:word:]]]+/).select { |b| !(b =~ /\d/) }.reject { |b| b.empty? }
end

.alphanumeric_strings(string) ⇒ Object



117
118
119
120
# File 'lib/utilities/strings.rb', line 117

def self.alphanumeric_strings(string)
  return [] if string.nil? || string.length == 0
  string.split(/[^[[:word:]]]+/).reject { |b| b.empty? }
end

.authorship_sentence(last_names = []) ⇒ String?

TODO: DEPRECATE (doesn't belong here because to_sentence is Rails?

Parameters:

  • last_names (Array) (defaults to: [])

Returns:

  • (String, nil)


99
100
101
102
# File 'lib/utilities/strings.rb', line 99

def self.authorship_sentence(last_names = [])
  return nil if last_names.empty?
  last_names.to_sentence(two_words_connector: ' & ', last_word_connector: ' & ')
end

.encode_with_utf8(string) ⇒ String, false

Returns !! this is a bad sign, you should know your encoding before it gets to needing this.

Parameters:

  • string (String)

Returns:

  • (String, false)

    !! this is a bad sign, you should know your encoding before it gets to needing this



125
126
127
128
129
130
131
132
# File 'lib/utilities/strings.rb', line 125

def self.encode_with_utf8(string)
  return false if string.nil?
  if Encoding.compatible?('test'.encode(Encoding::UTF_8), string)
    string.force_encoding(Encoding::UTF_8)
  else
    false
  end
end

.escape_single_quote(string) ⇒ String

Adds a second single quote to escape apostrophe in SQL query strings

Parameters:

  • string (String)

Returns:

  • (String)


60
61
62
63
# File 'lib/utilities/strings.rb', line 60

def self.escape_single_quote(string)
  return nil if string.blank?
  string.gsub("'", "''")
end

.generate_md5(text) ⇒ Digest::MD5

Parameters:

  • text (String)

Returns:

  • (Digest::MD5)


39
40
41
42
43
# File 'lib/utilities/strings.rb', line 39

def self.generate_md5(text)
  return nil if text.blank?
  text = text.downcase.gsub(/[\s\.,;:\?!]*/, '')
  Digest::MD5.hexdigest(text)
end

.increment_contained_integer(string) ⇒ String, Boolean

Increments the first integer encountered in the string, wrapping it in only the immediate non integer strings before and after (see tests). Returns false if no number is found

Parameters:

  • string (String)

Returns:

  • (String, Boolean)

    increments the first integer encountered in the string, wrapping it in only the immediate non integer strings before and after (see tests). Returns false if no number is found



50
51
52
53
54
55
# File 'lib/utilities/strings.rb', line 50

def self.increment_contained_integer(string)
  string =~ /([^\d]*)(\d+)([^\d]*)/
  a, b, c = $1, $2, $3
  return false if b.nil?
  [a, (b.to_i + 1), c].compact.join
end

.integers(string) ⇒ Array<String>

Get numbers separated by spaces from a string

Parameters:

  • string (String)

Returns:

  • (Array<String>)

    of strings representing integers



151
152
153
154
# File 'lib/utilities/strings.rb', line 151

def self.integers(string)
  return [] if string.nil? || string.length == 0
  string.split(/\s+/).select { |t| is_i?(t) }
end

.is_i?(string) ⇒ Boolean

see stackoverflow.com/questions/1235863/test-if-a-string-is-basically-an-integer-in-quotes-using-ruby Note: Might checkout CSV::Converters constants to see how they handle this Allows '02' … hmm

Parameters:

  • string (String)

Returns:

  • (Boolean)

    whether the string is an integer (positive or negative)



71
72
73
# File 'lib/utilities/strings.rb', line 71

def self.is_i?(string)
  /\A[-+]?\d+\z/ === string
end

.nil_squish_strip(string) ⇒ String?

Returns strips pre/post fixed space and condenses internal spaces, and also but returns nil (not empty string) if nothing is left.

Parameters:

  • string (String)

Returns:

  • (String, nil)

    strips pre/post fixed space and condenses internal spaces, and also but returns nil (not empty string) if nothing is left



27
28
29
30
31
32
33
34
35
# File 'lib/utilities/strings.rb', line 27

def self.nil_squish_strip(string)
  a = string.dup
  if !a.nil?
    a.delete("\u0000")
    a.squish!
    a = nil if a == ''
  end
  a 
end

.nil_strip(string) ⇒ String?

Returns strips space, leaves internal whitespace as is, returns nil if nothing is left.

Parameters:

  • string (String)

Returns:

  • (String, nil)

    strips space, leaves internal whitespace as is, returns nil if nothing is left



15
16
17
18
19
20
21
22
# File 'lib/utilities/strings.rb', line 15

def self.nil_strip(string) # string should have content or be empty
  a = string.dup
  if !a.nil?
    a.strip!
    a = nil if a == ''
  end
  a 
end

.nil_wrap(pre = nil, content = nil, post = nil) ⇒ String?

Return nil if content.nil?, else wrap and return string if provided

Parameters:

  • pre (String) (defaults to: nil)
  • content (String) (defaults to: nil)
  • post (String) (defaults to: nil)

Returns:

  • (String, nil)

    return nil if content.nil?, else wrap and return string if provided



91
92
93
94
# File 'lib/utilities/strings.rb', line 91

def self.nil_wrap(pre = nil, content = nil, post = nil)
  return nil if content.blank?
  [pre, content, post].compact.join
end

.only_integers?(string) ⇒ Boolean

Returns true if the query string only contains integers separated by whitespace.

Returns:

  • (Boolean)

    true if the query string only contains integers separated by whitespace



158
159
160
# File 'lib/utilities/strings.rb', line 158

def self.only_integers?(string)
  !(string =~ /[^\d\s]/i) && !integers(string).empty?
end

.parse_authorship(authorship) ⇒ Array

Parse a scientificAuthorship field to extract author and year information.

If the format matches ICZN, adds parentheses around author name (if detected)

Parameters:

  • authorship (String)

Returns:

  • (Array)
    author_name, year


167
168
169
170
171
172
173
174
# File 'lib/utilities/strings.rb', line 167

def self.parse_authorship(authorship)
  return [] if (authorship = authorship.to_s.strip).empty?

  year_match = /(,|\s)\s*(?<year>\d+)(?<paren>\))?$/.match(authorship)
  author_name = "#{authorship[..(year_match&.offset(0)&.first || 0)-1]}#{year_match&.[](:paren)}"

  [author_name, year_match&.[](:year)]
end

.random_string(string_length) ⇒ String?

Returns stub a string of a certain length.

Parameters:

  • string_length (Integer)

Returns:

  • (String, nil)

    stub a string of a certain length



7
8
9
10
# File 'lib/utilities/strings.rb', line 7

def self.random_string(string_length)
  return nil if string_length.to_i == 0
  ('a'..'z').to_a.shuffle[0, string_length].join
end

.sanitize_for_csv(string) ⇒ String, param

Returns the goal is to sanitizie an individual string such that it is usable in TAB delimited, UTF-8, column. See Download TODO: Likely need to handle quotes, and write better UTF compliancy tests ~~ Technically n is allowed!.

Parameters:

  • string (String)

Returns:

  • (String, param)

    the goal is to sanitizie an individual string such that it is usable in TAB delimited, UTF-8, column. See Download TODO: Likely need to handle quotes, and write better UTF compliancy tests ~~ Technically n is allowed!



80
81
82
83
84
# File 'lib/utilities/strings.rb', line 80

def self.sanitize_for_csv(string)
  a = string.dup
  return a if a.blank? # TODO: .blank is Rails, not OK here
  a.to_s.gsub(/\n|\t/, ' ')
end

.verbatim_author(author_year_string) ⇒ String?

Parameters:

  • author_year_string (String)

Returns:

  • (String, nil)


189
190
191
192
193
194
# File 'lib/utilities/strings.rb', line 189

def self.verbatim_author(author_year_string)
  return nil if author_year_string.to_s.strip.empty?  # alternative to .blank?
  author_end_index = author_year_string.rindex(' ')
  author_end_index ||= author_year_string.length
  author_year_string[0...author_end_index]
end

.year_letter(string) ⇒ String?

Returns the immediately following letter recognized as coming directly past the first year

`Smith, 1920a. ... ` returns `a`.

Returns:

  • (String, nil)

    the immediately following letter recognized as coming directly past the first year

    `Smith, 1920a. ... ` returns `a`
    


143
144
145
# File 'lib/utilities/strings.rb', line 143

def self.year_letter(string)
  string.match(/\d{4}([a-zAZ]+)/).to_a.last
end

.year_of_publication(author_year) ⇒ String?

Parameters:

  • author_year (String)

Returns:

  • (String, nil)


178
179
180
181
182
183
184
185
# File 'lib/utilities/strings.rb', line 178

def self.year_of_publication(author_year)
  return nil if author_year.to_s.strip.empty?   # alternative to .blank?
  split_author_year = author_year.split(' ')
  year = split_author_year[split_author_year.length - 1]
  # try matching last element first, otherwise scan entire string for year
  # Maybe we don't need regex match and can use years(author_year) exclusively?
  year =~ /\A\d+\z/ ? year : years(author_year).last.to_s
end

.years(string) ⇒ Array

Returns:

  • (Array)


135
136
137
138
# File 'lib/utilities/strings.rb', line 135

def self.years(string)
  return [] if string.nil?
  string.scan(/\d{4}/).to_a.uniq
end