Module: Utilities::Strings

Defined in:
lib/utilities/strings.rb

Overview

Methods that receive or generate a String. The methods in this library should be completely independent of TaxonWorks (i.e. ultimately gemifiable).

Constant Summary

CLEANABLE =
[
  "\u0000", # NUL - null character
  "\u0010", # DLE - data link escape
  "\u009A", # SCI - single character introducer
  "\u007F", # DEL - delete
  # "\u00A0", # no-break space !! Rails .blank? is true
].freeze

Class Method Details

.a_label(string) ⇒ Object

Returns the string preceded by “a” or “an”, or nil if the string is empty.

Returns:

  • (String, nil) the string preceded by “a” or “an”



# File 'lib/utilities/strings.rb', line 30

def self.a_label(string)
  return nil if string.to_s.length == 0
  (string =~ /\A[aeiou]/i ? 'an ' : 'a ') + string
end
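
A minimal usage sketch follows. The module name StringsSketch is hypothetical; the method body is copied from the listing above so the snippet runs without TaxonWorks.

```ruby
# Hypothetical stand-alone copy of the method, for illustration only.
module StringsSketch
  def self.a_label(string)
    return nil if string.to_s.length == 0
    (string =~ /\A[aeiou]/i ? 'an ' : 'a ') + string
  end
end

puts StringsSketch.a_label('otu')      # an otu
puts StringsSketch.a_label('specimen') # a specimen
p StringsSketch.a_label('')            # nil
```

The article is chosen from the first letter only, not from pronunciation, so a word such as “hour” yields “a hour”.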

.alphabetic_strings(string) ⇒ Array

Splits a string on special characters, returning an array of the strings that do not contain digits.

It splits on accent characters, and does not split on underscores. The method is used for building wildcard searches, so splitting on accents creates pseudo accent insensitivity in searches.

Use alphanumeric_strings instead to allow searches by page number, year, etc.

Parameters:

  • string (String)

Returns:

  • (Array)

    the string split on whitespace and special characters, with any element containing a digit removed



# File 'lib/utilities/strings.rb', line 148

def self.alphabetic_strings(string)
  return [] if string.nil? || string.length == 0
  string.split(/[^[[:word:]]]+/).select { |b| !(b =~ /\d/) }.reject { |b| b.empty? }
end

.alphanumeric_strings(string) ⇒ Object

Splits a string on whitespace and special characters, keeping elements that contain digits. This allows searches by page number, year, etc.



# File 'lib/utilities/strings.rb', line 154

def self.alphanumeric_strings(string)
  return [] if string.nil? || string.length == 0
  string.split(/[^[[:word:]]]+/).reject { |b| b.empty? }
end
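
To make the difference between alphabetic_strings and alphanumeric_strings concrete, here is a minimal sketch. The module name StringsSketch and the sample query are assumptions for illustration; the method bodies are copied from the listings above.

```ruby
# Hypothetical stand-alone copies of the two methods, for illustration only.
module StringsSketch
  # Split on runs of non-word characters, then drop tokens containing a digit.
  def self.alphabetic_strings(string)
    return [] if string.nil? || string.length == 0
    string.split(/[^[[:word:]]]+/).select { |b| !(b =~ /\d/) }.reject { |b| b.empty? }
  end

  # Same split, but tokens containing digits are kept.
  def self.alphanumeric_strings(string)
    return [] if string.nil? || string.length == 0
    string.split(/[^[[:word:]]]+/).reject { |b| b.empty? }
  end
end

query = 'Smith & Jones, 1920: p_12 new-species'
p StringsSketch.alphanumeric_strings(query)
# ["Smith", "Jones", "1920", "p_12", "new", "species"]
p StringsSketch.alphabetic_strings(query)
# ["Smith", "Jones", "new", "species"]
```

The underscore counts as a word character, so `p_12` stays a single token and is then dropped by alphabetic_strings because it contains digits.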

.asciify(string) ⇒ Object

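Returns the string with <br>, <i>, and <b> tags replaced by their AsciiDoc equivalents, or nil if the string is empty. The string is modified in place.

Returns:

  • (String, nil) the string with <br>, <i>, and <b> tags replaced by their AsciiDoc equivalents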



# File 'lib/utilities/strings.rb', line 14

def self.asciify(string)
  return nil if string.to_s.length == 0

  string.gsub!(/<br>/, "\n")
  string.gsub!(/<i>|<\/i>/, '_')
  string.gsub!(/<b>|<\/b>/, '**')
  string
end
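
The sketch below illustrates the replacements and the fact that the argument itself is modified, since the method uses gsub!. The module name StringsSketch and the sample markup are assumptions; the body is copied from the listing above.

```ruby
# Hypothetical stand-alone copy of the method, for illustration only.
module StringsSketch
  def self.asciify(string)
    return nil if string.to_s.length == 0

    string.gsub!(/<br>/, "\n")
    string.gsub!(/<i>|<\/i>/, '_')
    string.gsub!(/<b>|<\/b>/, '**')
    string
  end
end

html = +'<i>Aus bus</i><br><b>new species</b>' # unary plus gives a mutable string
result = StringsSketch.asciify(html)
puts result
# _Aus bus_
# **new species**
puts html.equal?(result) # true: the argument was changed in place
```

If the original string must be preserved, one option is to pass a duplicate (for example `html.dup`).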

.authorship_sentence(last_names = []) ⇒ String?

TODO: DEPRECATE (doesn’t belong here because to_sentence is Rails?)

Parameters:

  • last_names (Array) (defaults to: [])

Returns:

  • (String, nil)


# File 'lib/utilities/strings.rb', line 134

def self.authorship_sentence(last_names = [])
  return nil if last_names.empty?
  last_names.to_sentence(two_words_connector: ' & ', last_word_connector: ' & ')
end

.clean(string) ⇒ String?

Removes every CLEANABLE character from a copy of the string and returns nil if nothing is left.

Parameters:

  • string (String)

Returns:

  • (String, nil)


# File 'lib/utilities/strings.rb', line 63

def self.clean(string)
  a = string.dup
  if !a.nil?
    a.gsub!(/#{CLEANABLE.join('|')}/, '')
    a = nil if a == ''
  end
  a
end

.cleanable?(string) ⇒ Boolean

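Returns whether the string contains any CLEANABLE character or any whitespace.

Parameters:

  • string (String)

Returns:

  • (Boolean)

    truthy (the index of the first match) if the string contains a CLEANABLE character or whitespace, nil otherwise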



# File 'lib/utilities/strings.rb', line 57

def self.cleanable?(string)
  string =~ /#{CLEANABLE.join('|')}|\s/
end
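
The following sketch shows clean and cleanable? side by side. The module name StringsSketch and the sample strings are assumptions; the constant and method bodies are copied from the listings above (the commented-out no-break space is omitted).

```ruby
# Hypothetical stand-alone copies, for illustration only.
module StringsSketch
  CLEANABLE = ["\u0000", "\u0010", "\u009A", "\u007F"].freeze

  # Returns the index of the first CLEANABLE or whitespace character, or nil.
  def self.cleanable?(string)
    string =~ /#{CLEANABLE.join('|')}|\s/
  end

  # Removes CLEANABLE characters from a copy; nil if nothing is left.
  def self.clean(string)
    a = string.dup
    if !a.nil?
      a.gsub!(/#{CLEANABLE.join('|')}/, '')
      a = nil if a == ''
    end
    a
  end
end

p StringsSketch.cleanable?("Aus\u0000bus")   # 3
p StringsSketch.cleanable?('Aus bus')        # 3 (whitespace also matches)
p StringsSketch.cleanable?('Ausbus')         # nil
p StringsSketch.clean("Aus\u0000 bus\u007F") # "Aus bus"
p StringsSketch.clean("\u0010")              # nil
```

Note that cleanable? matches ordinary whitespace while clean leaves whitespace alone, so a string can still be "cleanable" after it has been cleaned.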

.encode_with_utf8(string) ⇒ String, false

Returns the string forced to UTF-8 if its encoding is compatible with UTF-8, false otherwise. !! Needing this is a bad sign: you should know your encoding before it gets this far.

Parameters:

  • string (String)

Returns:

  • (String, false)

    the string forced to UTF-8, or false if the string is nil or not compatible with UTF-8



# File 'lib/utilities/strings.rb', line 162

def self.encode_with_utf8(string)
  return false if string.nil?
  if Encoding.compatible?('test'.encode(Encoding::UTF_8), string)
    string.force_encoding(Encoding::UTF_8)
  else
    false
  end
end

.escape_single_quote(string) ⇒ String

Escapes apostrophes for SQL query strings by doubling each single quote. Returns nil if the string is blank.

Parameters:

  • string (String)

Returns:

  • (String)


# File 'lib/utilities/strings.rb', line 95

def self.escape_single_quote(string)
  return nil if string.blank?
  string.gsub("'", "''")
end

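.generate_md5(text) ⇒ String?

Parameters:

  • text (String)

Returns:

  • (String, nil)

    the hexadecimal MD5 digest of the text after downcasing and removing whitespace and basic punctuation (. , ; : ? !), or nil if the text is blank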


# File 'lib/utilities/strings.rb', line 74

def self.generate_md5(text)
  return nil if text.blank?
  text = text.downcase.gsub(/[\s\.,;:\?!]*/, '')
  Digest::MD5.hexdigest(text)
end
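
Because the text is normalized before hashing, strings that differ only in case, whitespace, or basic punctuation produce the same digest. The sketch below illustrates this. The module name StringsSketch is hypothetical, and the Rails-only `blank?` guard is replaced here by a plain-Ruby check, which is an assumption made so the snippet runs outside Rails.

```ruby
require 'digest/md5'

# Hypothetical stand-alone variant of the method, for illustration only.
module StringsSketch
  def self.generate_md5(text)
    # Assumption: plain-Ruby stand-in for Rails' text.blank?
    return nil if text.nil? || text.strip.empty?
    normalized = text.downcase.gsub(/[\s\.,;:\?!]*/, '')
    Digest::MD5.hexdigest(normalized)
  end
end

a = StringsSketch.generate_md5('Aus bus, Smith 1920.')
b = StringsSketch.generate_md5('aus   bus Smith 1920')
puts a == b   # true
puts a.length # 32
```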

.increment_contained_integer(string) ⇒ String, Boolean

Increments the first integer encountered in the string, keeping only the non-integer text immediately before and after it (see tests). Returns false if no number is found.

Parameters:

  • string (String)

Returns:

  • (String, Boolean)

    the string with its first integer incremented, keeping only the non-integer text immediately before and after it (see tests); false if no number is found



# File 'lib/utilities/strings.rb', line 85

def self.increment_contained_integer(string)
  string =~ /([^\d]*)(\d+)([^\d]*)/
  a, b, c = $1, $2, $3
  return false if b.nil?
  [a, (b.to_i + 1), c].compact.join
end
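
A short sketch of what "only the immediate non-integer strings" means in practice. The module name StringsSketch and the sample identifiers are assumptions; the body is copied from the listing above.

```ruby
# Hypothetical stand-alone copy of the method, for illustration only.
module StringsSketch
  def self.increment_contained_integer(string)
    string =~ /([^\d]*)(\d+)([^\d]*)/
    a, b, c = $1, $2, $3
    return false if b.nil?
    [a, (b.to_i + 1), c].compact.join
  end
end

p StringsSketch.increment_contained_integer('ABC-0099-x') # "ABC-100-x"
p StringsSketch.increment_contained_integer('a1b2c')      # "a2b"
p StringsSketch.increment_contained_integer('abc')        # false
```

Two consequences are visible here: leading zeros are lost because the digits pass through to_i, and anything from the second integer onward (`2c` above) is dropped.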

.integers(string) ⇒ Array<String>

Get numbers separated by spaces from a string

Parameters:

  • string (String)

Returns:

  • (Array<String>)

    of strings representing integers



# File 'lib/utilities/strings.rb', line 188

def self.integers(string)
  return [] if string.nil? || string.length == 0
  string.split(/\s+/).select { |t| is_i?(t) }
end

.is_i?(string) ⇒ Boolean

See stackoverflow.com/questions/1235863/test-if-a-string-is-basically-an-integer-in-quotes-using-ruby.

Note: might check out the CSV::Converters constants to see how they handle this.

Allows ‘02’, which is treated as OK because ‘02’.to_i returns 2.

Parameters:

  • string (String)

Returns:

  • (Boolean)

    whether the string is an integer (positive or negative)



# File 'lib/utilities/strings.rb', line 106

def self.is_i?(string)
  /\A[-+]?\d+\z/ === string
end
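
The sketch below shows is_i? together with integers, which relies on it. The module name StringsSketch and the sample strings are assumptions; the bodies are copied from the listings above.

```ruby
# Hypothetical stand-alone copies, for illustration only.
module StringsSketch
  # True only when the whole string is an optionally signed run of digits.
  def self.is_i?(string)
    /\A[-+]?\d+\z/ === string
  end

  # Whitespace-separated tokens that pass is_i?, returned as strings.
  def self.integers(string)
    return [] if string.nil? || string.length == 0
    string.split(/\s+/).select { |t| is_i?(t) }
  end
end

p StringsSketch.is_i?('02')   # true
p StringsSketch.is_i?('-7')   # true
p StringsSketch.is_i?('1.5')  # false
p StringsSketch.is_i?(nil)    # false
p StringsSketch.integers('pages 12 -3 4a 007') # ["12", "-3", "007"]
```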

.linearize(string, separator = ' | ') ⇒ Object

Replaces each newline in the string with the separator; returns nil if the string is empty.



# File 'lib/utilities/strings.rb', line 23

def self.linearize(string, separator = ' | ')
  return nil if string.to_s.length == 0
  string.gsub(/\n|(\r\n)/, separator)
end
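
A brief sketch of linearize with the default and a custom separator. The module name StringsSketch and the sample text are assumptions; the body is copied from the listing above.

```ruby
# Hypothetical stand-alone copy of the method, for illustration only.
module StringsSketch
  def self.linearize(string, separator = ' | ')
    return nil if string.to_s.length == 0
    string.gsub(/\n|(\r\n)/, separator)
  end
end

text = "line one\nline two\r\nline three"
puts StringsSketch.linearize(text)       # line one | line two | line three
puts StringsSketch.linearize(text, '; ') # line one; line two; line three
p StringsSketch.linearize('')            # nil
```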

.nil_strip(string) ⇒ String?

Strips leading and trailing whitespace, leaves internal whitespace as is, and returns nil if nothing is left.

Parameters:

  • string (String)

Returns:

  • (String, nil)

    the stripped string with internal whitespace left as is, or nil if nothing is left



# File 'lib/utilities/strings.rb', line 46

def self.nil_strip(string) # string should have content or be empty
  a = string.dup
  if !a.nil?
    a.strip!
    a = nil if a == ''
  end
  a
end
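
The sketch below shows that nil_strip works on a copy and collapses whitespace-only input to nil. The module name StringsSketch and the sample strings are assumptions; the body is copied from the listing above.

```ruby
# Hypothetical stand-alone copy of the method, for illustration only.
module StringsSketch
  def self.nil_strip(string)
    a = string.dup
    if !a.nil?
      a.strip!
      a = nil if a == ''
    end
    a
  end
end

original = '  Aus  bus  '
p StringsSketch.nil_strip(original) # "Aus  bus"
p original                          # "  Aus  bus  " (argument is untouched)
p StringsSketch.nil_strip('   ')    # nil
p StringsSketch.nil_strip(nil)      # nil
```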

.nil_wrap(pre = nil, content = nil, post = nil) ⇒ String?

Returns nil if content is blank; otherwise joins pre, content, and post (skipping any that are nil) into a single string.

Parameters:

  • pre (String) (defaults to: nil)
  • content (String) (defaults to: nil)
  • post (String) (defaults to: nil)

Returns:

  • (String, nil)

    nil if content is blank, otherwise content wrapped in pre and post where provided



# File 'lib/utilities/strings.rb', line 126

def self.nil_wrap(pre = nil, content = nil, post = nil)
  return nil if content.blank?
  [pre, content, post].compact.join
end

.only_integer(string) ⇒ Integer?

Return an integer if and only if the string is a single integer, otherwise nil

Parameters:

  • string (String)

Returns:

  • (Integer, nil)

    return an integer if and only if the string is a single integer, otherwise nil



# File 'lib/utilities/strings.rb', line 197

def self.only_integer(string)
  if is_i?(string)
    string.to_i
  else
    nil
  end
end

.only_integers?(string) ⇒ Boolean

Returns true if the query string only contains integers separated by whitespace.

Returns:

  • (Boolean)

    true if the query string only contains integers separated by whitespace



# File 'lib/utilities/strings.rb', line 207

def self.only_integers?(string)
  !(string =~ /[^\d\s]/i) && !integers(string).empty?
end
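
The sketch below contrasts only_integer with only_integers?. The module name StringsSketch and the sample strings are assumptions; the helper methods are repeated so the snippet is self-contained, and only_integer is written with a ternary that is equivalent to the if/else in its listing.

```ruby
# Hypothetical stand-alone copies, for illustration only.
module StringsSketch
  def self.is_i?(string)
    /\A[-+]?\d+\z/ === string
  end

  def self.integers(string)
    return [] if string.nil? || string.length == 0
    string.split(/\s+/).select { |t| is_i?(t) }
  end

  # Integer when the whole string is one integer, otherwise nil.
  def self.only_integer(string)
    is_i?(string) ? string.to_i : nil
  end

  # True when the string holds nothing but digits and whitespace,
  # and at least one integer is present.
  def self.only_integers?(string)
    !(string =~ /[^\d\s]/i) && !integers(string).empty?
  end
end

p StringsSketch.only_integer('42')     # 42
p StringsSketch.only_integer('42 43')  # nil
p StringsSketch.only_integers?('12 34') # true
p StringsSketch.only_integers?('12 -3') # false
p StringsSketch.only_integers?('   ')   # false
```

Signed values are accepted by only_integer but make only_integers? return false, because any character other than a digit or whitespace disqualifies the string.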

.parse_authorship(authorship) ⇒ Array

Parse a scientificAuthorship field to extract author and year information.

If the format matches ICZN, adds parentheses around author name (if detected)

Parameters:

  • authorship (String)

Returns:

  • (Array)
    author_name, year


# File 'lib/utilities/strings.rb', line 216

def self.parse_authorship(authorship)
  return [] if (authorship = authorship.to_s.strip).empty?

  year_match = /(,|\s)\s*(?<year>\d+)(?<paren>\))?$/.match(authorship)
  author_name = "#{authorship[..(year_match&.offset(0)&.first || 0)-1]}#{year_match&.[](:paren)}"

  [author_name, year_match&.[](:year)]
end
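
The sketch below runs parse_authorship on a few authorship strings. The module name StringsSketch and the sample names are assumptions; the body is copied from the listing above. The beginless range (`[..n]`) needs Ruby 2.7 or later.

```ruby
# Hypothetical stand-alone copy of the method, for illustration only.
module StringsSketch
  def self.parse_authorship(authorship)
    return [] if (authorship = authorship.to_s.strip).empty?

    # A trailing year, preceded by a comma or whitespace, optionally followed by ")".
    year_match = /(,|\s)\s*(?<year>\d+)(?<paren>\))?$/.match(authorship)
    # Everything before the match, plus the closing parenthesis if one was found.
    author_name = "#{authorship[..(year_match&.offset(0)&.first || 0)-1]}#{year_match&.[](:paren)}"

    [author_name, year_match&.[](:year)]
  end
end

p StringsSketch.parse_authorship('Smith & Jones, 1920') # ["Smith & Jones", "1920"]
p StringsSketch.parse_authorship('(Smith, 1920)')       # ["(Smith)", "1920"]
p StringsSketch.parse_authorship('Smith')               # ["Smith", nil]
p StringsSketch.parse_authorship('  ')                  # []
```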

.random_string(string_length) ⇒ String?

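Returns a random stub string of lowercase letters of the requested length. Letters are not repeated, so the result is at most 26 characters long.

Parameters:

  • string_length (Integer)

Returns:

  • (String, nil)

    a random string of the requested length (at most 26 characters), or nil if the length is 0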



# File 'lib/utilities/strings.rb', line 38

def self.random_string(string_length)
  return nil if string_length.to_i == 0
  ('a'..'z').to_a.shuffle[0, string_length].join
end
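
Because the method shuffles the 26 lowercase letters and takes a slice, the result never repeats a letter and cannot exceed 26 characters. The sketch below illustrates this. The module name StringsSketch is hypothetical; the body is copied from the listing above.

```ruby
# Hypothetical stand-alone copy of the method, for illustration only.
module StringsSketch
  def self.random_string(string_length)
    return nil if string_length.to_i == 0
    ('a'..'z').to_a.shuffle[0, string_length].join
  end
end

stub = StringsSketch.random_string(8)
puts stub.length                             # 8
puts stub.chars.uniq.length                  # 8 (no repeated letters)
puts StringsSketch.random_string(40).length  # 26 (capped by the alphabet)
p StringsSketch.random_string(0)             # nil
```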

.sanitize_for_csv(string) ⇒ String, param

Sanitizes an individual string so that it is usable in a TAB-delimited, UTF-8 column by replacing each newline and tab with a space. See Download.

TODO: Likely need to handle quotes, and write better UTF compliance tests. Technically \n is allowed!

Parameters:

  • string (String)

Returns:

  • (String, param)

    the sanitized string, or a copy of the parameter unchanged if it is blank



# File 'lib/utilities/strings.rb', line 115

def self.sanitize_for_csv(string)
  a = string.dup
  return a if a.blank? # TODO: .blank is Rails, not OK here
  a.to_s.gsub(/\n|\t/, ' ')
end

.verbatim_author(author_year_string) ⇒ String?

Parameters:

  • author_year_string (String)

Returns:

  • (String, nil)


# File 'lib/utilities/strings.rb', line 238

def self.verbatim_author(author_year_string)
  return nil if author_year_string.to_s.strip.empty?  # alternative to .blank?
  author_end_index = author_year_string.rindex(' ')
  author_end_index ||= author_year_string.length
  author_year_string[0...author_end_index]
end
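
The method cuts the string at its last space, which the sketch below illustrates. The module name StringsSketch and the sample strings are assumptions; the body is copied from the listing above.

```ruby
# Hypothetical stand-alone copy of the method, for illustration only.
module StringsSketch
  def self.verbatim_author(author_year_string)
    return nil if author_year_string.to_s.strip.empty?
    author_end_index = author_year_string.rindex(' ')
    author_end_index ||= author_year_string.length
    author_year_string[0...author_end_index]
  end
end

p StringsSketch.verbatim_author('Smith & Jones 1920') # "Smith & Jones"
p StringsSketch.verbatim_author('Smith, 1920')        # "Smith,"
p StringsSketch.verbatim_author('Smith')              # "Smith"
p StringsSketch.verbatim_author('')                   # nil
```

Note that a comma before the year is kept, since only the text after the last space is removed.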

.year_letter(string) ⇒ String?

Returns the letter(s) immediately following the first four-digit year.

`Smith, 1920a. ... ` returns `a`.

Returns:

  • (String, nil)

    the letter(s) immediately following the first four-digit year, or nil if there are none

    `Smith, 1920a. ... ` returns `a`
    


# File 'lib/utilities/strings.rb', line 180

def self.year_letter(string)
  string.match(/\d{4}([a-zA-Z]+)/).to_a.last
end
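
A short sketch of year_letter follows. The module name StringsSketch and the sample citations are assumptions, and the character class is written as `[a-zA-Z]` on the assumption that any ASCII letter is meant to be recognized.

```ruby
# Hypothetical stand-alone version of the method, for illustration only.
module StringsSketch
  def self.year_letter(string)
    # Assumption: any ASCII letter directly after four digits counts.
    string.match(/\d{4}([a-zA-Z]+)/).to_a.last
  end
end

p StringsSketch.year_letter('Smith, 1920a. Some title') # "a"
p StringsSketch.year_letter('Smith, 1920B. Some title') # "B"
p StringsSketch.year_letter('Smith, 1920. Some title')  # nil
```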

.year_of_publication(author_year) ⇒ String?

Parameters:

  • author_year (String)

Returns:

  • (String, nil)


# File 'lib/utilities/strings.rb', line 227

def self.year_of_publication(author_year)
  return nil if author_year.to_s.strip.empty?   # alternative to .blank?
  split_author_year = author_year.split(' ')
  year = split_author_year[split_author_year.length - 1]
  # try matching last element first, otherwise scan entire string for year
  # Maybe we don't need regex match and can use years(author_year) exclusively?
  year =~ /\A\d+\z/ ? year : years(author_year).last.to_s
end

.years(string) ⇒ Array

Returns:

  • (Array)


# File 'lib/utilities/strings.rb', line 172

def self.years(string)
  return [] if string.nil?
  string.scan(/\d{4}/).to_a.uniq
end
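
The sketch below shows years together with year_of_publication, which falls back to it. The module name StringsSketch and the sample strings are assumptions; the bodies are copied from the listings above.

```ruby
# Hypothetical stand-alone copies, for illustration only.
module StringsSketch
  # Every distinct run of four digits, in order of first appearance.
  def self.years(string)
    return [] if string.nil?
    string.scan(/\d{4}/).to_a.uniq
  end

  def self.year_of_publication(author_year)
    return nil if author_year.to_s.strip.empty?
    split_author_year = author_year.split(' ')
    year = split_author_year[split_author_year.length - 1]
    # Use the last token if it is all digits, otherwise scan for years.
    year =~ /\A\d+\z/ ? year : years(author_year).last.to_s
  end
end

p StringsSketch.years('Smith 1920, 1921 and 1920')   # ["1920", "1921"]
p StringsSketch.year_of_publication('Smith, 1920')   # "1920"
p StringsSketch.year_of_publication('Smith, 1920a')  # "1920"
p StringsSketch.year_of_publication('Smith')         # ""
p StringsSketch.year_of_publication('')              # nil
```

When the input is not blank but contains no year, the result is an empty string rather than nil, because the fallback calls to_s on a nil value.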