Module: Utilities::Strings
- Defined in: lib/utilities/strings.rb
Overview
Methods that receive or generate a String. The methods in this library should be completely independent of TaxonWorks (i.e. ultimately gemifiable).
Class Method Summary
-
.a_label(string) ⇒ Object
Returns the string preceded by “a” or “an”, or nil if the string is empty.
-
.alphabetic_strings(string) ⇒ Array
Splits a string on special characters, returning an array of the strings that do not contain digits.
-
.alphanumeric_strings(string) ⇒ Object
Splits a string on special characters like .alphabetic_strings, but keeps strings that contain digits, which allows searches by page number, year, etc.
-
.asciify(string) ⇒ Object
Returns a String, or nil; replaces <br>, <i>, and <b> tags with their AsciiDoc equivalents.
-
.authorship_sentence(last_names = []) ⇒ String?
Joins last names into a string such as "Smith, Jones & Brown". TODO: DEPRECATE (doesn't belong here, because to_sentence comes from Rails' ActiveSupport).
-
.encode_with_utf8(string) ⇒ String, false
Forces UTF-8 encoding when Encoding.compatible? allows it, otherwise returns false. Needing this is a bad sign: you should know your encoding before it gets this far.
-
.escape_single_quote(string) ⇒ String
Adds a second single quote to escape apostrophes in SQL query strings.
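- .generate_md5(text) ⇒ String?
Returns the MD5 hex digest of the text after downcasing it and removing whitespace and the characters . , ; : ? !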
-
.increment_contained_integer(string) ⇒ String, Boolean
Increments the first integer encountered in the string, wrapping it in only the immediate non-integer strings before and after (see tests).
-
.integers(string) ⇒ Array<String>
Returns the whitespace-separated tokens of a string that are integers.
-
.is_i?(string) ⇒ Boolean
Returns true if the string is an optionally signed integer. See stackoverflow.com/questions/1235863/test-if-a-string-is-basically-an-integer-in-quotes-using-ruby for background. Note: might check out the CSV::Converters constants to see how they handle this. Allows '02', which is treated as OK because '02'.to_i returns 2.
- .linearize(string, separator = ' | ') ⇒ Object
-
.nil_squish_strip(string) ⇒ String?
Strips leading and trailing whitespace and condenses internal whitespace, but returns nil (not an empty string) if nothing is left.
-
.nil_strip(string) ⇒ String?
Strips leading and trailing whitespace, leaving internal whitespace as is; returns nil if nothing is left.
-
.nil_wrap(pre = nil, content = nil, post = nil) ⇒ String?
Returns nil if content is blank; otherwise joins pre, content, and post (skipping any that are nil).
-
.only_integer(string) ⇒ Integer?
Return an integer if and only if the string is a single integer, otherwise nil.
-
.only_integers?(string) ⇒ Boolean
True if the query string only contains integers separated by whitespace.
-
.parse_authorship(authorship) ⇒ Array
Parse a scientificAuthorship field to extract author and year information.
-
.random_string(string_length) ⇒ String?
Returns a random string of distinct lowercase letters of the given length (at most 26), for use as a stub.
-
.sanitize_for_csv(string) ⇒ String, param
Sanitizes an individual string so that it is usable in a TAB-delimited, UTF-8 column.
- .verbatim_author(author_year_string) ⇒ String?
-
.year_letter(string) ⇒ String?
Returns the letter(s) immediately following the first four-digit year, e.g. `Smith, 1920a.` returns `a`.
- .year_of_publication(author_year) ⇒ String?
- .years(string) ⇒ Array
Class Method Details
.a_label(string) ⇒ Object
Returns the string preceded by “a” or “an”, or nil if the string is empty.
    # File 'lib/utilities/strings.rb', line 22
    def self.a_label(string)
      return nil if string.to_s.length == 0
      (string =~ /\A[aeiou]/i ? 'an ' : 'a ') + string
    end
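The rule only checks whether the first character is a vowel letter, not how the word is pronounced. The sketch below copies the method body into a top-level method (so it runs without TaxonWorks) and shows both the intended cases and where the heuristic misfires.

```ruby
# Standalone copy of Utilities::Strings.a_label, for illustration only.
def a_label(string)
  return nil if string.to_s.length == 0
  # "an" is chosen purely on whether the first character is a vowel letter.
  (string =~ /\A[aeiou]/i ? 'an ' : 'a ') + string
end

p a_label('specimen') # => "a specimen"
p a_label('Otu')      # => "an Otu"
p a_label('')         # => nil
# Pronunciation is ignored, so these read oddly:
p a_label('unicorn')  # => "an unicorn"
p a_label('hour')     # => "a hour"
```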
.alphabetic_strings(string) ⇒ Array
Splits a string on special characters, returning an array of the strings that do not contain digits.
It splits on accent characters, and does not split on underscores. The method is used for building wildcard searches, so splitting on accents creates pseudo accent insensitivity in searches.
See .alphanumeric_strings, which keeps strings with digits and so allows searches by page number, year, etc.
    # File 'lib/utilities/strings.rb', line 136
    def self.alphabetic_strings(string)
      return [] if string.nil? || string.length == 0
      string.split(/[^[[:word:]]]+/).select { |b| !(b =~ /\d/) }.reject { |b| b.empty? }
    end
.alphanumeric_strings(string) ⇒ Object
Splits a string on special characters like .alphabetic_strings, but keeps strings that contain digits, which allows searches by page number, year, etc.
    # File 'lib/utilities/strings.rb', line 142
    def self.alphanumeric_strings(string)
      return [] if string.nil? || string.length == 0
      string.split(/[^[[:word:]]]+/).reject { |b| b.empty? }
    end
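Running both splitters on the same string makes the difference concrete: underscores stay inside tokens, and only alphanumeric_strings keeps tokens with digits. This is a standalone sketch; the method bodies are copied into top-level methods so they run without TaxonWorks, and the sample query is made up.

```ruby
# Standalone copies of the two splitters, for comparison.
def alphabetic_strings(string)
  return [] if string.nil? || string.length == 0
  # Split on runs of non-word characters, then drop tokens containing digits.
  string.split(/[^[[:word:]]]+/).select { |b| !(b =~ /\d/) }.reject { |b| b.empty? }
end

def alphanumeric_strings(string)
  return [] if string.nil? || string.length == 0
  # Same split, but tokens with digits are kept.
  string.split(/[^[[:word:]]]+/).reject { |b| b.empty? }
end

query = 'Smith, 1920: p. 12-14 (fig_3)'
p alphabetic_strings(query)   # => ["Smith", "p"]
p alphanumeric_strings(query) # => ["Smith", "1920", "p", "12", "14", "fig_3"]
```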
.asciify(string) ⇒ Object
Returns a String, or nil if the input is empty. Replaces <br>, <i>, and <b> tags (and the closing </i> and </b>) with their AsciiDoc equivalents: a newline, _ and **.
    # File 'lib/utilities/strings.rb', line 6
    def self.asciify(string)
      return nil if string.to_s.length == 0
      string.gsub!(/<br>/, "\n")
      string.gsub!(/<i>|<\/i>/, '_')
      string.gsub!(/<b>|<\/b>/, '**')
      string
    end
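Because asciify uses gsub!, it modifies the caller's string in place, and it raises FrozenError when given a frozen string that contains one of the tags. The sketch below is a non-mutating alternative (an assumption, not the TaxonWorks code) that chains gsub instead. Only the exact lowercase <br>, <i>, and <b> forms are matched, so <br/> or <BR> pass through unchanged.

```ruby
# Non-mutating variant of asciify (a sketch, not the TaxonWorks code):
# gsub returns a new String, so the caller's string is left untouched
# and frozen strings are accepted.
def asciify(string)
  return nil if string.to_s.length == 0
  string
    .gsub(/<br>/, "\n")
    .gsub(/<i>|<\/i>/, '_')
    .gsub(/<b>|<\/b>/, '**')
end

html = '<i>Aus bus</i><br><b>Note</b>'.freeze
p asciify(html) # => "_Aus bus_\n**Note**"
p html          # => "<i>Aus bus</i><br><b>Note</b>"
```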
.authorship_sentence(last_names = []) ⇒ String?
Joins last names into a string such as "Smith, Jones & Brown". TODO: DEPRECATE (doesn't belong here, because to_sentence comes from Rails' ActiveSupport).
    # File 'lib/utilities/strings.rb', line 122
    def self.authorship_sentence(last_names = [])
      return nil if last_names.empty?
      last_names.to_sentence(two_words_connector: ' & ', last_word_connector: ' & ')
    end
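The TODO notes that to_sentence comes from Rails. A plain-Ruby replacement is straightforward; the sketch below assumes ActiveSupport's default words connector of ", " alongside the two " & " connectors passed above, and reuses the method name only for illustration.

```ruby
# Plain-Ruby sketch of authorship_sentence without ActiveSupport.
# Assumed connectors: ", " between names, " & " before the last name.
def authorship_sentence(last_names = [])
  return nil if last_names.empty?
  return last_names.first.to_s if last_names.length == 1
  "#{last_names[0..-2].join(', ')} & #{last_names.last}"
end

p authorship_sentence(%w[Smith])             # => "Smith"
p authorship_sentence(%w[Smith Jones])       # => "Smith & Jones"
p authorship_sentence(%w[Smith Jones Brown]) # => "Smith, Jones & Brown"
p authorship_sentence([])                    # => nil
```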
.encode_with_utf8(string) ⇒ String, false
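Returns the string with its encoding forced to UTF-8 when Encoding.compatible? allows it, otherwise false (also false for nil). Needing this is a bad sign: you should know your encoding before it gets this far.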
    # File 'lib/utilities/strings.rb', line 150
    def self.encode_with_utf8(string)
      return false if string.nil?
      if Encoding.compatible?('test'.encode(Encoding::UTF_8), string)
        string.force_encoding(Encoding::UTF_8)
      else
        false
      end
    end
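Encoding.compatible? answers whether two strings can be concatenated, not whether the bytes are valid UTF-8. Because 'test' is ASCII-only, it is compatible with almost any string, so the guard rarely returns false. The plain-Ruby sketch below, using a made-up Latin-1 input, shows that force_encoding only relabels bytes, whereas encode actually transcodes them.

```ruby
# "café" stored as ISO-8859-1 bytes (é is the single byte 0xE9).
latin1 = "caf\xE9".b.force_encoding(Encoding::ISO_8859_1)

# An ASCII-only string is compatible with it, so the guard would pass.
compatible = Encoding.compatible?('test'.encode(Encoding::UTF_8), latin1)
p compatible                  # => #<Encoding:ISO-8859-1>

# force_encoding relabels the same bytes, which are not valid UTF-8 here.
relabelled = latin1.dup.force_encoding(Encoding::UTF_8)
p relabelled.valid_encoding?  # => false

# encode converts the bytes, producing valid UTF-8.
converted = latin1.encode(Encoding::UTF_8)
p converted.valid_encoding?   # => true
```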
.escape_single_quote(string) ⇒ String
Doubles each single quote to escape apostrophes in SQL query strings; returns nil if the string is blank.
    # File 'lib/utilities/strings.rb', line 83
    def self.escape_single_quote(string)
      return nil if string.blank?
      string.gsub("'", "''")
    end
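Doubling a quote is the standard SQL way to write a literal ' inside a quoted string. A Rails-free sketch follows, assuming "blank" can be approximated as nil or whitespace-only; where the database driver supports bound parameters, those avoid manual escaping altogether.

```ruby
# Rails-free sketch: a nil/whitespace check approximates String#blank?.
def escape_single_quote(string)
  return nil if string.nil? || string.strip.empty?
  string.gsub("'", "''") # each ' becomes ''
end

p escape_single_quote("O'Brien")       # => "O''Brien"
p escape_single_quote("it's Smith's")  # => "it''s Smith''s"
p escape_single_quote('   ')           # => nil
```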
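.generate_md5(text) ⇒ String?
Returns the MD5 hex digest of the text after downcasing it and removing whitespace and the characters . , ; : ? !, or nil if the text is blank.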
    # File 'lib/utilities/strings.rb', line 62
    def self.generate_md5(text)
      return nil if text.blank?
      text = text.downcase.gsub(/[\s\.,;:\?!]*/, '')
      Digest::MD5.hexdigest(text)
    end
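Because the text is downcased and stripped of whitespace and . , ; : ? ! before hashing, strings that differ only in those respects share a digest, which makes the value usable as a loose duplicate-detection key. Digest::MD5.hexdigest returns a 32-character hexadecimal String. A Rails-free sketch, with a nil/whitespace check standing in for blank?:

```ruby
require 'digest/md5'

# Rails-free sketch of generate_md5.
def generate_md5(text)
  return nil if text.nil? || text.strip.empty?
  # Drop whitespace and . , ; : ? ! so formatting differences do not matter.
  normalized = text.downcase.gsub(/[\s\.,;:\?!]*/, '')
  Digest::MD5.hexdigest(normalized)
end

a = generate_md5('Smith, J. 1920!')
b = generate_md5('smith j 1920')
p a == b   # => true
p a.length # => 32
```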
.increment_contained_integer(string) ⇒ String, Boolean
Increments the first integer encountered in the string, wrapping it in only the immediate non-integer strings before and after (see tests). Returns false if no integer is found.
    # File 'lib/utilities/strings.rb', line 73
    def self.increment_contained_integer(string)
      string =~ /([^\d]*)(\d+)([^\d]*)/
      a, b, c = $1, $2, $3
      return false if b.nil?
      [a, (b.to_i + 1), c].compact.join
    end
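Only the first integer and the non-digit text immediately around it survive, and leading zeros are lost because the number passes through to_i. This standalone sketch uses the same regular expression but reads the captures from a MatchData object instead of the $1..$3 globals:

```ruby
# Standalone sketch of increment_contained_integer (same regex as above).
def increment_contained_integer(string)
  m = /([^\d]*)(\d+)([^\d]*)/.match(string)
  return false if m.nil?
  # prefix, incremented integer, suffix; join turns the Integer into text
  [m[1], (m[2].to_i + 1), m[3]].compact.join
end

p increment_contained_integer('Box 12')       # => "Box 13"
p increment_contained_integer('Foo 12 bar 3') # => "Foo 13 bar "
p increment_contained_integer('007')          # => "8"
p increment_contained_integer('no digits')    # => false
```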
.integers(string) ⇒ Array<String>
Returns the whitespace-separated tokens of a string that are integers.
    # File 'lib/utilities/strings.rb', line 176
    def self.integers(string)
      return [] if string.nil? || string.length == 0
      string.split(/\s+/).select { |t| is_i?(t) }
    end
.is_i?(string) ⇒ Boolean
Returns true if the string is an optionally signed integer. See stackoverflow.com/questions/1235863/test-if-a-string-is-basically-an-integer-in-quotes-using-ruby for background. Note: might check out the CSV::Converters constants to see how they handle this. Allows '02', which is treated as OK because '02'.to_i returns 2.
    # File 'lib/utilities/strings.rb', line 94
    def self.is_i?(string)
      /\A[-+]?\d+\z/ === string
    end
.linearize(string, separator = ' | ') ⇒ Object
    # File 'lib/utilities/strings.rb', line 15
    def self.linearize(string, separator = ' | ')
      return nil if string.to_s.length == 0
      string.gsub(/\n|(\r\n)/, separator)
    end
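Both \n and \r\n line endings collapse to the separator, but a lone \r (old Mac line ending) is left as is. A standalone copy for experimenting:

```ruby
# Standalone copy of linearize.
def linearize(string, separator = ' | ')
  return nil if string.to_s.length == 0
  string.gsub(/\n|(\r\n)/, separator)
end

p linearize("line one\r\nline two\nline three") # => "line one | line two | line three"
p linearize("a\nb", '; ')                       # => "a; b"
p linearize("a\rb")                             # => "a\rb"
```

If lone \r characters matter, a pattern such as /\r\n|\r|\n/ would cover all three endings; that is a suggestion, not part of the original.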
.nil_squish_strip(string) ⇒ String?
Strips leading and trailing whitespace and condenses internal whitespace, but returns nil (not an empty string) if nothing is left.
    # File 'lib/utilities/strings.rb', line 50
    def self.nil_squish_strip(string)
      a = string.dup
      if !a.nil?
        a.delete!("\u0000") # delete! mutates a; plain delete would return a discarded copy
        a.squish!
        a = nil if a == ''
      end
      a
    end
.nil_strip(string) ⇒ String?
Strips leading and trailing whitespace, leaving internal whitespace as is; returns nil if nothing is left.
    # File 'lib/utilities/strings.rb', line 38
    def self.nil_strip(string) # string should have content or be empty
      a = string.dup
      if !a.nil?
        a.strip!
        a = nil if a == ''
      end
      a
    end
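A Rails-free sketch of both helpers, assuming ActiveSupport's squish can be reproduced by replacing runs of [[:space:]] with one space and then stripping. It reassigns the result of String#delete, because delete (without the bang) returns a new string and leaves the receiver unchanged.

```ruby
# Rails-free sketches of nil_squish_strip and nil_strip.
def nil_squish_strip(string)
  return nil if string.nil?
  a = string.delete("\u0000")           # remove NUL characters
  a = a.gsub(/[[:space:]]+/, ' ').strip # squish: collapse, then trim
  a.empty? ? nil : a
end

def nil_strip(string)
  return nil if string.nil?
  a = string.strip # internal whitespace is left untouched
  a.empty? ? nil : a
end

p nil_squish_strip("  Aus \n\t bus  ") # => "Aus bus"
p nil_squish_strip("a\u0000b")         # => "ab"
p nil_squish_strip('   ')              # => nil
p nil_strip('  Aus  bus ')             # => "Aus  bus"
p nil_strip('')                        # => nil
```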
.nil_wrap(pre = nil, content = nil, post = nil) ⇒ String?
Returns nil if content is blank; otherwise joins pre, content, and post (skipping any that are nil).
    # File 'lib/utilities/strings.rb', line 114
    def self.nil_wrap(pre = nil, content = nil, post = nil)
      return nil if content.blank?
      [pre, content, post].compact.join
    end
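nil_wrap is handy for optional fragments, such as parentheses around a name that may be missing. A Rails-free sketch, assuming blank? can be approximated by a nil or whitespace-only check (ActiveSupport's blank? also treats false and empty arrays or hashes as blank):

```ruby
# Rails-free sketch: nil or whitespace-only content counts as blank.
def nil_wrap(pre = nil, content = nil, post = nil)
  return nil if content.nil? || content.to_s.strip.empty?
  [pre, content, post].compact.join
end

p nil_wrap('(', 'Smith', ')') # => "(Smith)"
p nil_wrap('(', '', ')')      # => nil
p nil_wrap(nil, 1920, ', ')   # => "1920, "
```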
.only_integer(string) ⇒ Integer?
Returns an integer if and only if the string is a single integer, otherwise nil.
    # File 'lib/utilities/strings.rb', line 185
    def self.only_integer(string)
      if is_i?(string)
        string.to_i
      else
        nil
      end
    end
.only_integers?(string) ⇒ Boolean
Returns true if the query string only contains integers separated by whitespace.
    # File 'lib/utilities/strings.rb', line 195
    def self.only_integers?(string)
      !(string =~ /[^\d\s]/i) && !integers(string).empty?
    end
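The integer helpers all build on is_i?, but they do not agree on signs: integers and only_integer accept "-4", while only_integers? rejects any string containing a character other than digits and whitespace. Standalone copies for comparison:

```ruby
# Standalone copies of the integer helpers.
def is_i?(string)
  /\A[-+]?\d+\z/ === string
end

def integers(string)
  return [] if string.nil? || string.length == 0
  string.split(/\s+/).select { |t| is_i?(t) }
end

def only_integer(string)
  is_i?(string) ? string.to_i : nil
end

def only_integers?(string)
  # Any character other than a digit or whitespace (including "-") fails.
  !(string =~ /[^\d\s]/i) && !integers(string).empty?
end

p is_i?('02')             # => true
p is_i?('3.0')            # => false
p integers('12 abc 7 -4') # => ["12", "7", "-4"]
p only_integer('02')      # => 2
p only_integer('12 13')   # => nil
p only_integers?('12 13') # => true
p only_integers?('12 -4') # => false
```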
.parse_authorship(authorship) ⇒ Array
Parse a scientificAuthorship field to extract author and year information.
If the format matches the ICZN parenthesized form, e.g. "(Smith, 1920)", the closing parenthesis is kept on the author name, giving "(Smith)".
    # File 'lib/utilities/strings.rb', line 204
    def self.parse_authorship(authorship)
      return [] if (authorship = authorship.to_s.strip).empty?

      year_match = /(,|\s)\s*(?<year>\d+)(?<paren>\))?$/.match(authorship)
      authorship = "#{authorship[..(year_match&.offset(0)&.first || 0)-1]}#{year_match&.[](:paren)}"

      [authorship, year_match&.[](:year)]
    end
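A standalone copy of parse_authorship with a few sample inputs. The year comes back as a String (or nil), and the beginless range in the slice requires Ruby 2.7 or later.

```ruby
# Standalone copy of parse_authorship.
def parse_authorship(authorship)
  return [] if (authorship = authorship.to_s.strip).empty?
  # Trailing year, optionally followed by the ")" of "(Author, Year)".
  year_match = /(,|\s)\s*(?<year>\d+)(?<paren>\))?$/.match(authorship)
  # Keep everything before the match (all of it if there is no match),
  # re-attaching a captured ")".
  authorship = "#{authorship[..(year_match&.offset(0)&.first || 0) - 1]}#{year_match&.[](:paren)}"
  [authorship, year_match&.[](:year)]
end

p parse_authorship('Linnaeus, 1758') # => ["Linnaeus", "1758"]
p parse_authorship('(Smith, 1920)')  # => ["(Smith)", "1920"]
p parse_authorship('Smith')          # => ["Smith", nil]
p parse_authorship('   ')            # => []
```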
.random_string(string_length) ⇒ String?
Returns a random string of distinct lowercase letters of the given length (at most 26), for use as a stub, or nil if the length is 0.
    # File 'lib/utilities/strings.rb', line 30
    def self.random_string(string_length)
      return nil if string_length.to_i == 0
      ('a'..'z').to_a.shuffle[0, string_length].join
    end
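Since shuffle permutes the 26 letters without repetition, random_string never repeats a letter and returns at most 26 characters, however large string_length is. The sketch below pairs a copy of the method with a hypothetical random_string_with_repeats (not part of TaxonWorks) that samples with replacement:

```ruby
# Standalone copy: distinct letters, so at most 26 characters.
def random_string(string_length)
  return nil if string_length.to_i == 0
  ('a'..'z').to_a.shuffle[0, string_length].join
end

# Hypothetical variant: letters may repeat, so any length works.
def random_string_with_repeats(string_length)
  return nil if string_length.to_i <= 0
  letters = ('a'..'z').to_a
  Array.new(string_length) { letters.sample }.join
end

p random_string(5).length               # => 5
p random_string(40).length              # => 26
p random_string_with_repeats(40).length # => 40
```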
.sanitize_for_csv(string) ⇒ String, param
Sanitizes an individual string so that it is usable in a TAB-delimited, UTF-8 column. See Download. TODO: Likely need to handle quotes, and write better UTF compliance tests. Technically \n is allowed!
    # File 'lib/utilities/strings.rb', line 103
    def self.sanitize_for_csv(string)
      a = string.dup
      return a if a.blank? # TODO: .blank is Rails, not OK here
      a.to_s.gsub(/\n|\t/, ' ')
    end
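The method returns blank values unchanged, and ActiveSupport treats a whitespace-only string such as "\t" as blank, so a lone tab would pass straight into a TAB-delimited row. The Rails-free sketch below sanitizes every non-nil value and, as an added assumption about what a TAB-delimited consumer needs, also replaces \r:

```ruby
# Rails-free sketch: no early return for whitespace-only strings, and \r
# is replaced along with \t and \n (an assumption beyond the original).
def sanitize_for_csv(string)
  return string if string.nil?
  string.to_s.gsub(/[\t\r\n]/, ' ')
end

row = ['Aus bus', "Field\tnote", "line 1\r\nline 2", "\t"].map { |v| sanitize_for_csv(v) }
line = row.join("\t")
p line.split("\t", -1).length # => 4
p row[2]                      # => "line 1  line 2"
```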
.verbatim_author(author_year_string) ⇒ String?
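    # File 'lib/utilities/strings.rb', line 226
    def self.verbatim_author(author_year_string)
      return nil if author_year_string.to_s.strip.empty? # alternative to .blank?
      author_end_index = author_year_string.rindex(' ')
      author_end_index ||= author_year_string.length
      author_year_string[0...author_end_index]
    end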
.year_letter(string) ⇒ String?
Returns the letter(s) immediately following the first four-digit year, e.g. `Smith, 1920a. ...` returns `a`.
    # File 'lib/utilities/strings.rb', line 168
    def self.year_letter(string)
      string.match(/\d{4}([a-zA-Z]+)/).to_a.last
    end
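year_letter returns every consecutive letter after the first four-digit run, not just one, and nil when there is none. This standalone sketch uses the range [a-zA-Z], so upper-case suffixes are matched too:

```ruby
# Standalone sketch of year_letter.
def year_letter(string)
  # MatchData#to_a is [full_match, capture]; nil.to_a is [], so last is nil.
  string.match(/\d{4}([a-zA-Z]+)/).to_a.last
end

p year_letter('Smith, 1920a. Title') # => "a"
p year_letter('Smith, 1920ab')       # => "ab"
p year_letter('Smith, 1920B')        # => "B"
p year_letter('Smith, 1920')         # => nil
```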
.year_of_publication(author_year) ⇒ String?
    # File 'lib/utilities/strings.rb', line 215
    def self.year_of_publication(author_year)
      return nil if author_year.to_s.strip.empty? # alternative to .blank?
      split_author_year = author_year.split(' ')
      year = split_author_year[split_author_year.length - 1]
      # try matching last element first, otherwise scan entire string for year
      # Maybe we don't need regex match and can use years(author_year) exclusively?
      year =~ /\A\d+\z/ ? year : years(author_year).last.to_s
    end
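The author/year split is purely positional: verbatim_author keeps a trailing comma, and year_of_publication returns an empty string (not nil) when no year is found. Standalone copies, with years inlined so the sketch runs on its own (local variable names are illustrative):

```ruby
# Standalone copies of years, verbatim_author, and year_of_publication.
def years(string)
  return [] if string.nil?
  string.scan(/\d{4}/).uniq
end

def verbatim_author(author_year_string)
  return nil if author_year_string.to_s.strip.empty?
  # Everything before the last space; the whole string if there is none.
  author_end_index = author_year_string.rindex(' ') || author_year_string.length
  author_year_string[0...author_end_index]
end

def year_of_publication(author_year)
  return nil if author_year.to_s.strip.empty?
  year = author_year.split(' ').last
  # Prefer a purely numeric last token, else the last 4-digit run anywhere.
  year =~ /\A\d+\z/ ? year : years(author_year).last.to_s
end

p verbatim_author('Smith, 1920')     # => "Smith,"
p verbatim_author('Smith')           # => "Smith"
p year_of_publication('Smith, 1920') # => "1920"
p year_of_publication('Smith 1920a') # => "1920"
p year_of_publication('Smith')       # => ""
```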
.years(string) ⇒ Array
    # File 'lib/utilities/strings.rb', line 160
    def self.years(string)
      return [] if string.nil?
      string.scan(/\d{4}/).to_a.uniq
    end
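Any run of four digits counts as a year, so longer numbers produce false positives, and repeated years are returned once in order of first appearance. A standalone copy with sample inputs (made up for illustration):

```ruby
# Standalone copy of years (scan already returns an Array).
def years(string)
  return [] if string.nil?
  string.scan(/\d{4}/).uniq
end

p years('Smith 1920, Jones 1920; Brown 1921') # => ["1920", "1921"]
p years('catalog no. 123456')                 # => ["1234"]
p years(nil)                                  # => []
```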