Module: Utilities::Strings

Defined in:
lib/utilities/strings.rb

Overview

Methods that receive or generate a String. The methods in this library should be completely independent of TaxonWorks (i.e. ultimately gemifiable).

Class Method Details

.a_label(string) ⇒ Object

Returns the string preceded with “a” or “an”, or nil if the string is empty.

Returns:

  • (String, nil) the string preceded with “a” or “an”



# File 'lib/utilities/strings.rb', line 11

def self.a_label(string)
  return nil if string.to_s.length == 0
  (string =~ /\A[aeiou]/i ? 'an ' : 'a ') + string
end
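
A minimal usage sketch. The method is re-declared locally, copied from the listing above, so the snippet runs on its own; the sample words are made up. Note that the choice of article is purely orthographic (first letter), not phonetic.

```ruby
module Utilities
  module Strings
    # Local copy of the method shown above.
    def self.a_label(string)
      return nil if string.to_s.length == 0
      (string =~ /\A[aeiou]/i ? 'an ' : 'a ') + string
    end
  end
end

p Utilities::Strings.a_label('otu')      # => "an otu"
p Utilities::Strings.a_label('specimen') # => "a specimen"
p Utilities::Strings.a_label('hour')     # => "a hour" (only the first letter is inspected)
p Utilities::Strings.a_label('')         # => nil
```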

.alphabetic_strings(string) ⇒ Array

Splits a string on special characters, returning an array of the strings that do not contain digits.

It splits on accent characters, and does not split on underscores. The method is used for building wildcard searches, so splitting on accents creates pseudo accent insensitivity in searches.

Parameters:

  • string (String)

Returns:

  • (Array)

    whitespace and special character split, then any string containing a digit eliminated



# File 'lib/utilities/strings.rb', line 124

def self.alphabetic_strings(string)
  return [] if string.nil? || string.length == 0
  string.split(/[^[[:word:]]]+/).select { |b| !(b =~ /\d/) }.reject { |b| b.empty? }
end

.alphanumeric_strings(string) ⇒ Object



# File 'lib/utilities/strings.rb', line 129

def self.alphanumeric_strings(string)
  return [] if string.nil? || string.length == 0
  string.split(/[^[[:word:]]]+/).reject { |b| b.empty? }
end
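
A short sketch contrasting .alphabetic_strings with .alphanumeric_strings. Both methods are re-declared locally from the listings above so the snippet is self-contained; the input string is a made-up example.

```ruby
module Utilities
  module Strings
    # Local copies of the two methods shown above.
    def self.alphabetic_strings(string)
      return [] if string.nil? || string.length == 0
      string.split(/[^[[:word:]]]+/).select { |b| !(b =~ /\d/) }.reject { |b| b.empty? }
    end

    def self.alphanumeric_strings(string)
      return [] if string.nil? || string.length == 0
      string.split(/[^[[:word:]]]+/).reject { |b| b.empty? }
    end
  end
end

s = 'Aus bus_cus 1920 sp. n3'

# Underscores are word characters, so "bus_cus" stays in one piece.
p Utilities::Strings.alphabetic_strings(s)
# => ["Aus", "bus_cus", "sp"]

# Same split, but fragments containing digits are kept.
p Utilities::Strings.alphanumeric_strings(s)
# => ["Aus", "bus_cus", "1920", "sp", "n3"]
```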

.authorship_sentence(last_names = []) ⇒ String?

TODO: DEPRECATE (doesn't belong here because to_sentence is Rails?)

Parameters:

  • last_names (Array) (defaults to: [])

Returns:

  • (String, nil)


# File 'lib/utilities/strings.rb', line 111

def self.authorship_sentence(last_names = [])
  return nil if last_names.empty?
  last_names.to_sentence(two_words_connector: ' & ', last_word_connector: ' & ')
end

.encode_with_utf8(string) ⇒ String, false

Returns the string force-encoded as UTF-8, or false if it is nil or not compatible with UTF-8. !! Needing this is a bad sign: you should know your encoding before it gets this far.

Parameters:

  • string (String)

Returns:

  • (String, false)

    the string force-encoded as UTF-8, or false. !! Needing this is a bad sign: you should know your encoding before it gets this far



# File 'lib/utilities/strings.rb', line 137

def self.encode_with_utf8(string)
  return false if string.nil?
  if Encoding.compatible?('test'.encode(Encoding::UTF_8), string)
    string.force_encoding(Encoding::UTF_8)
  else
    false
  end
end

.escape_single_quote(string) ⇒ String

Adds a second single quote to escape apostrophes in SQL query strings. Returns nil if the string is blank.

Parameters:

  • string (String)

Returns:

  • (String)


# File 'lib/utilities/strings.rb', line 72

def self.escape_single_quote(string)
  return nil if string.blank?
  string.gsub("'", "''")
end

.generate_md5(text) ⇒ Digest::MD5

Parameters:

  • text (String)

Returns:

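  • (Digest::MD5)

    in practice the hex digest String returned by Digest::MD5.hexdigest, computed after downcasing and removing whitespace and basic punctuation; nil if text is blank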


# File 'lib/utilities/strings.rb', line 51

def self.generate_md5(text)
  return nil if text.blank?
  text = text.downcase.gsub(/[\s\.,;:\?!]*/, '')
  Digest::MD5.hexdigest(text)
end

.increment_contained_integer(string) ⇒ String, Boolean

Increments the first integer encountered in the string, keeping only the non-integer text immediately before and after it (see tests); anything beyond that is dropped. Returns false if no number is found.

Parameters:

  • string (String)

Returns:

  • (String, Boolean)

    the string with its first integer incremented, keeping only the non-integer text immediately before and after it (see tests). Returns false if no number is found



# File 'lib/utilities/strings.rb', line 62

def self.increment_contained_integer(string)
  string =~ /([^\d]*)(\d+)([^\d]*)/
  a, b, c = $1, $2, $3
  return false if b.nil?
  [a, (b.to_i + 1), c].compact.join
end
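
A small sketch of the behavior described above, with the method re-declared locally from the listing so that it runs standalone. The inputs are made-up examples; note that text after the second run of digits is dropped and that leading zeros are not preserved.

```ruby
module Utilities
  module Strings
    # Local copy of the method shown above.
    def self.increment_contained_integer(string)
      string =~ /([^\d]*)(\d+)([^\d]*)/
      a, b, c = $1, $2, $3
      return false if b.nil?
      [a, (b.to_i + 1), c].compact.join
    end
  end
end

p Utilities::Strings.increment_contained_integer('a1b')          # => "a2b"
p Utilities::Strings.increment_contained_integer('abc123def456') # => "abc124def"
p Utilities::Strings.increment_contained_integer('009')          # => "10"
p Utilities::Strings.increment_contained_integer('abc')          # => false
```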

.integers(string) ⇒ Array<String>

Get numbers separated by spaces from a string

Parameters:

  • string (String)

Returns:

  • (Array<String>)

    of strings representing integers



# File 'lib/utilities/strings.rb', line 163

def self.integers(string)
  return [] if string.nil? || string.length == 0
  string.split(/\s+/).select { |t| is_i?(t) }
end
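
A usage sketch, assuming the listing above together with .is_i? (documented separately on this page); both are re-declared locally so the snippet is self-contained. The input is a made-up example. The results are Strings, not Integers.

```ruby
module Utilities
  module Strings
    # Local copies of .is_i? and .integers as listed on this page.
    def self.is_i?(string)
      /\A[-+]?\d+\z/ === string
    end

    def self.integers(string)
      return [] if string.nil? || string.length == 0
      string.split(/\s+/).select { |t| is_i?(t) }
    end
  end
end

# Tokens are split on whitespace; only tokens that are wholly integers survive.
p Utilities::Strings.integers('specimen 12 -3 4a 007') # => ["12", "-3", "007"]
p Utilities::Strings.integers('no numbers here')       # => []
p Utilities::Strings.integers(nil)                     # => []
```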

.is_i?(string) ⇒ Boolean

See stackoverflow.com/questions/1235863/test-if-a-string-is-basically-an-integer-in-quotes-using-ruby

Note: Might check out CSV::Converters constants to see how they handle this.

Allows '02', which is treated as OK because '02'.to_i returns 2.

Parameters:

  • string (String)

Returns:

  • (Boolean)

    whether the string is an integer (positive or negative)



# File 'lib/utilities/strings.rb', line 83

def self.is_i?(string)
  /\A[-+]?\d+\z/ === string
end
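
A few illustrative calls, with the method re-declared locally from the listing above so that the snippet runs on its own. The inputs are made up. Because the pattern is anchored with \A and \z, surrounding whitespace or a decimal point makes the check fail.

```ruby
module Utilities
  module Strings
    # Local copy of the method shown above.
    def self.is_i?(string)
      /\A[-+]?\d+\z/ === string
    end
  end
end

p Utilities::Strings.is_i?('42')  # => true
p Utilities::Strings.is_i?('-42') # => true
p Utilities::Strings.is_i?('+7')  # => true
p Utilities::Strings.is_i?('02')  # => true
p Utilities::Strings.is_i?('4.2') # => false
p Utilities::Strings.is_i?(' 42') # => false
p Utilities::Strings.is_i?('')    # => false
p Utilities::Strings.is_i?(nil)   # => false
```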

.linearize(string, separator = ' | ') ⇒ Object



# File 'lib/utilities/strings.rb', line 4

def self.linearize(string, separator = ' | ')
  return nil if string.to_s.length == 0
  string.gsub(/\n|(\r\n)/, separator)
end
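
This method has no description, so here is a sketch of what the listing above does: it replaces each line break (LF or CRLF) with the separator. The method is re-declared locally to keep the snippet self-contained; the sample text is made up.

```ruby
module Utilities
  module Strings
    # Local copy of the method shown above.
    def self.linearize(string, separator = ' | ')
      return nil if string.to_s.length == 0
      string.gsub(/\n|(\r\n)/, separator)
    end
  end
end

text = "line one\nline two\r\nline three"

p Utilities::Strings.linearize(text)       # => "line one | line two | line three"
p Utilities::Strings.linearize(text, '; ') # => "line one; line two; line three"
p Utilities::Strings.linearize('')         # => nil
```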

.nil_squish_strip(string) ⇒ String?

Strips leading and trailing whitespace and condenses internal whitespace, but returns nil (not an empty string) if nothing is left.

Parameters:

  • string (String)

Returns:

  • (String, nil)

    the string with leading and trailing whitespace stripped and internal whitespace condensed; nil (not an empty string) if nothing is left



# File 'lib/utilities/strings.rb', line 39

def self.nil_squish_strip(string)
  a = string.dup
  if !a.nil?
    a.delete!("\u0000") # remove NUL characters in place
    a.squish!
    a = nil if a == ''
  end
  a
end

.nil_strip(string) ⇒ String?

Strips leading and trailing whitespace, leaves internal whitespace as is, and returns nil if nothing is left.

Parameters:

  • string (String)

Returns:

  • (String, nil)

    the string with leading and trailing whitespace stripped and internal whitespace left as is; nil if nothing is left



# File 'lib/utilities/strings.rb', line 27

def self.nil_strip(string) # string should have content or be empty
  a = string.dup
  if !a.nil?
    a.strip!
    a = nil if a == ''
  end
  a
end
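
A brief sketch of the behavior, with the method re-declared locally from the listing above so that it runs without the rest of the library. The inputs are made up. Because the method works on a copy, the caller's string is left untouched.

```ruby
module Utilities
  module Strings
    # Local copy of the method shown above.
    def self.nil_strip(string)
      a = string.dup
      if !a.nil?
        a.strip!
        a = nil if a == ''
      end
      a
    end
  end
end

original = '  a  b  '

p Utilities::Strings.nil_strip(original) # => "a  b"
p original                               # => "  a  b  " (unchanged)
p Utilities::Strings.nil_strip("\t\n")   # => nil
p Utilities::Strings.nil_strip(nil)      # => nil
```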

.nil_wrap(pre = nil, content = nil, post = nil) ⇒ String?

Returns nil if content is blank; otherwise returns content wrapped in pre and post, using whichever of the two is provided.

Parameters:

  • pre (String) (defaults to: nil)
  • content (String) (defaults to: nil)
  • post (String) (defaults to: nil)

Returns:

  • (String, nil)

    nil if content is blank; otherwise content wrapped in pre and post, using whichever of the two is provided



# File 'lib/utilities/strings.rb', line 103

def self.nil_wrap(pre = nil, content = nil, post = nil)
  return nil if content.blank?
  [pre, content, post].compact.join
end

.only_integer(string) ⇒ Integer?

Return an integer if and only if the string is a single integer, otherwise nil

Parameters:

  • string (String)

Returns:

  • (Integer, nil)

    return an integer if and only if the string is a single integer, otherwise nil



# File 'lib/utilities/strings.rb', line 172

def self.only_integer(string)
  if is_i?(string)
    string.to_i
  else
    nil
  end
end
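
A usage sketch, assuming the listing above plus .is_i? (documented separately on this page); both are re-declared locally so the snippet is self-contained. The inputs are made up.

```ruby
module Utilities
  module Strings
    # Local copies of .is_i? and .only_integer as listed on this page.
    def self.is_i?(string)
      /\A[-+]?\d+\z/ === string
    end

    def self.only_integer(string)
      if is_i?(string)
        string.to_i
      else
        nil
      end
    end
  end
end

p Utilities::Strings.only_integer('42')  # => 42
p Utilities::Strings.only_integer('-7')  # => -7
p Utilities::Strings.only_integer('007') # => 7
p Utilities::Strings.only_integer('4 2') # => nil
p Utilities::Strings.only_integer('4.2') # => nil
```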

.only_integers?(string) ⇒ Boolean

Returns true if the query string only contains integers separated by whitespace.

Returns:

  • (Boolean)

    true if the query string only contains integers separated by whitespace



# File 'lib/utilities/strings.rb', line 182

def self.only_integers?(string)
  !(string =~ /[^\d\s]/i) && !integers(string).empty?
end
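
A sketch of the check, assuming the listing above together with .integers and .is_i? from this page; all three are re-declared locally so that the snippet runs on its own. The inputs are made up. Note that a sign character counts as a non-digit here, so signed integers make the check fail even though .is_i? accepts them.

```ruby
module Utilities
  module Strings
    # Local copies of the three methods as listed on this page.
    def self.is_i?(string)
      /\A[-+]?\d+\z/ === string
    end

    def self.integers(string)
      return [] if string.nil? || string.length == 0
      string.split(/\s+/).select { |t| is_i?(t) }
    end

    def self.only_integers?(string)
      !(string =~ /[^\d\s]/i) && !integers(string).empty?
    end
  end
end

p Utilities::Strings.only_integers?('1 22 333') # => true
p Utilities::Strings.only_integers?('12 ab')    # => false
p Utilities::Strings.only_integers?('1 -2')     # => false
p Utilities::Strings.only_integers?('')         # => false
```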

.parse_authorship(authorship) ⇒ Array

Parse a scientificAuthorship field to extract author and year information.

If the format matches ICZN, adds parentheses around author name (if detected)

Parameters:

  • authorship (String)

Returns:

  • (Array)
    author_name, year


# File 'lib/utilities/strings.rb', line 191

def self.parse_authorship(authorship)
  return [] if (authorship = authorship.to_s.strip).empty?

  year_match = /(,|\s)\s*(?<year>\d+)(?<paren>\))?$/.match(authorship)
  author_name = "#{authorship[..(year_match&.offset(0)&.first || 0)-1]}#{year_match&.[](:paren)}"

  [author_name, year_match&.[](:year)]
end
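
A sketch of how the parsing above behaves on a few made-up authorship strings. The method is re-declared locally from the listing so the snippet is self-contained; it relies on beginless ranges, so it assumes Ruby 2.7 or later.

```ruby
module Utilities
  module Strings
    # Local copy of the method shown above.
    def self.parse_authorship(authorship)
      return [] if (authorship = authorship.to_s.strip).empty?

      year_match = /(,|\s)\s*(?<year>\d+)(?<paren>\))?$/.match(authorship)
      author_name = "#{authorship[..(year_match&.offset(0)&.first || 0)-1]}#{year_match&.[](:paren)}"

      [author_name, year_match&.[](:year)]
    end
  end
end

p Utilities::Strings.parse_authorship('Smith, 1920')        # => ["Smith", "1920"]
# The closing parenthesis after the year is moved back onto the author name.
p Utilities::Strings.parse_authorship('(Smith, 1920)')      # => ["(Smith)", "1920"]
p Utilities::Strings.parse_authorship('Smith & Jones 1920') # => ["Smith & Jones", "1920"]
# No trailing year: the whole string is the author name and the year is nil.
p Utilities::Strings.parse_authorship('Smith')              # => ["Smith", nil]
p Utilities::Strings.parse_authorship('')                   # => []
```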

.random_string(string_length) ⇒ String?

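Returns a stub string of the given length, built from shuffled distinct lowercase letters (so at most 26 characters), or nil if the length is zero.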

Parameters:

  • string_length (Integer)

Returns:

  • (String, nil)

    a stub string of the given length (at most 26 characters), or nil if the length is zero



# File 'lib/utilities/strings.rb', line 19

def self.random_string(string_length)
  return nil if string_length.to_i == 0
  ('a'..'z').to_a.shuffle[0, string_length].join
end
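
A sketch of the properties that follow from the listing above, with the method re-declared locally so that it runs standalone. The output is random, so only its shape is shown. Since the letters come from a single shuffle of 'a'..'z', no letter repeats and the result is capped at 26 characters.

```ruby
module Utilities
  module Strings
    # Local copy of the method shown above.
    def self.random_string(string_length)
      return nil if string_length.to_i == 0
      ('a'..'z').to_a.shuffle[0, string_length].join
    end
  end
end

s = Utilities::Strings.random_string(8)
p s.length                       # => 8
p s.match?(/\A[a-z]+\z/)         # => true
p s.chars.uniq.length == 8       # => true (letters never repeat)

p Utilities::Strings.random_string(40).length # => 26
p Utilities::Strings.random_string(0)         # => nil
```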

.sanitize_for_csv(string) ⇒ String, param

Returns the string with newlines and tabs replaced by spaces. The goal is to sanitize an individual string such that it is usable in a TAB-delimited, UTF-8 column. See Download. TODO: Likely need to handle quotes, and write better UTF compliance tests. Technically \n is allowed!

Parameters:

  • string (String)

Returns:

  • (String, param)

    the string with newlines and tabs replaced by spaces, so that it is usable in a TAB-delimited, UTF-8 column. See Download. TODO: Likely need to handle quotes, and write better UTF compliance tests. Technically \n is allowed!



# File 'lib/utilities/strings.rb', line 92

def self.sanitize_for_csv(string)
  a = string.dup
  return a if a.blank? # TODO: .blank is Rails, not OK here
  a.to_s.gsub(/\n|\t/, ' ')
end

.verbatim_author(author_year_string) ⇒ String?

Parameters:

  • author_year_string (String)

Returns:

  • (String, nil)


# File 'lib/utilities/strings.rb', line 213

def self.verbatim_author(author_year_string)
  return nil if author_year_string.to_s.strip.empty?  # alternative to .blank?
  author_end_index = author_year_string.rindex(' ')
  author_end_index ||= author_year_string.length
  author_year_string[0...author_end_index]
end
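
This method has no description, so here is a sketch of what the listing above does: it returns everything before the last space. The method is re-declared locally to keep the snippet self-contained, and the inputs are made up. Note that a comma preceding the year is retained.

```ruby
module Utilities
  module Strings
    # Local copy of the method shown above.
    def self.verbatim_author(author_year_string)
      return nil if author_year_string.to_s.strip.empty?
      author_end_index = author_year_string.rindex(' ')
      author_end_index ||= author_year_string.length
      author_year_string[0...author_end_index]
    end
  end
end

p Utilities::Strings.verbatim_author('Smith 1920')          # => "Smith"
p Utilities::Strings.verbatim_author('Smith & Jones, 1920') # => "Smith & Jones,"
# With no space at all, the whole string is returned.
p Utilities::Strings.verbatim_author('Smith')               # => "Smith"
p Utilities::Strings.verbatim_author('   ')                 # => nil
```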

.year_letter(string) ⇒ String?

Returns the letter or letters immediately following the first four-digit year in the string, or nil if there are none.

`Smith, 1920a. ... ` returns `a`.

Returns:

  • (String, nil)

    the letter or letters immediately following the first four-digit year in the string

    `Smith, 1920a. ... ` returns `a`
    


# File 'lib/utilities/strings.rb', line 155

def self.year_letter(string)
  string.match(/\d{4}([a-zA-Z]+)/).to_a.last
end

.year_of_publication(author_year) ⇒ String?

Parameters:

  • author_year (String)

Returns:

  • (String, nil)


# File 'lib/utilities/strings.rb', line 202

def self.year_of_publication(author_year)
  return nil if author_year.to_s.strip.empty?   # alternative to .blank?
  split_author_year = author_year.split(' ')
  year = split_author_year[split_author_year.length - 1]
  # try matching last element first, otherwise scan entire string for year
  # Maybe we don't need regex match and can use years(author_year) exclusively?
  year =~ /\A\d+\z/ ? year : years(author_year).last.to_s
end
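
A sketch of the two code paths above: the last whitespace-separated token is used if it is all digits, otherwise the method falls back to .years (documented separately on this page). Both methods are re-declared locally so the snippet runs on its own; the inputs are made up. When no year is found at all, the result is an empty string rather than nil.

```ruby
module Utilities
  module Strings
    # Local copies of .years and .year_of_publication as listed on this page.
    def self.years(string)
      return [] if string.nil?
      string.scan(/\d{4}/).to_a.uniq
    end

    def self.year_of_publication(author_year)
      return nil if author_year.to_s.strip.empty?
      split_author_year = author_year.split(' ')
      year = split_author_year[split_author_year.length - 1]
      year =~ /\A\d+\z/ ? year : years(author_year).last.to_s
    end
  end
end

p Utilities::Strings.years('Smith, 1920; Jones 1921, 1920') # => ["1920", "1921"]

p Utilities::Strings.year_of_publication('Smith, 1920')  # => "1920"
# Last token is "1920a", so the fallback scan for four digits is used.
p Utilities::Strings.year_of_publication('Smith, 1920a') # => "1920"
p Utilities::Strings.year_of_publication('Smith')        # => ""
p Utilities::Strings.year_of_publication('')             # => nil
```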

.years(string) ⇒ Array

Returns:

  • (Array)


# File 'lib/utilities/strings.rb', line 147

def self.years(string)
  return [] if string.nil?
  string.scan(/\d{4}/).to_a.uniq
end