Module: Utilities::Strings
- Defined in:
- lib/utilities/strings.rb
Overview
Methods that receive or generate a String. The methods in this library should be completely independent of TaxonWorks (i.e. ultimately gemifiable).
Constant Summary
- CLEANABLE =
[
  "\u0000", # NUL - null character
  "\u0010", # DLE - data link escape
  "\u009A", # SCI - single character introducer
  "\u007F", # DEL - delete
  # "\u00A0", # no-break space !! Rails .blank? is true
].freeze
Class Method Summary
-
.a_label(string) ⇒ Object
String, nil. The string preceded with “a” or “an”.
-
.alphabetic_strings(string) ⇒ Array
Splits a string on special characters, returning an array of the strings that do not contain digits.
-
.alphanumeric_strings(string) ⇒ Object
Like .alphabetic_strings, but keeps strings that contain digits, which allows searches by page number, year, etc.
-
.asciify(string) ⇒ Object
String, nil. Replaces <br>, <i>, <b> tags with their AsciiDoc equivalents.
-
.authorship_sentence(last_names = []) ⇒ String?
TODO: DEPRECATE (doesn’t belong here because to_sentence is Rails?).
- .clean(string) ⇒ String?
-
.cleanable?(string) ⇒ Boolean
Boolean.
-
.encode_with_utf8(string) ⇒ String, false
!! Needing this is a bad sign: you should know your encoding before it gets this far.
-
.escape_single_quote(string) ⇒ String
Adds a second single quote to escape apostrophe in SQL query strings.
- .generate_md5(text) ⇒ Digest::MD5
-
.increment_contained_integer(string) ⇒ String, Boolean
Increments the first integer encountered in the string, wrapping it in only the immediately adjacent non-integer strings before and after (see tests).
-
.integers(string) ⇒ Array<String>
Get numbers separated by spaces from a string.
-
.is_i?(string) ⇒ Boolean
See stackoverflow.com/questions/1235863/test-if-a-string-is-basically-an-integer-in-quotes-using-ruby. Note: might check out the CSV::Converters constants to see how they handle this. Allows ‘02’, which is treated as OK because '02'.to_i returns 2.
- .linearize(string, separator = ' | ') ⇒ Object
-
.nil_strip(string) ⇒ String?
Strips space, leaves internal whitespace as is, returns nil if nothing is left.
-
.nil_wrap(pre = nil, content = nil, post = nil) ⇒ String?
Returns nil if content is blank; otherwise joins whichever of pre, content, and post are provided and returns the result.
-
.only_integer(string) ⇒ Integer?
Return an integer if and only if the string is a single integer, otherwise nil.
-
.only_integers?(string) ⇒ Boolean
True if the query string only contains integers separated by whitespace.
-
.parse_authorship(authorship) ⇒ Array
Parse a scientificAuthorship field to extract author and year information.
-
.random_string(string_length) ⇒ String?
Stub a string of a certain length.
-
.sanitize_for_csv(string) ⇒ String, param
The goal is to sanitize an individual string such that it is usable in a TAB-delimited, UTF-8 column.
- .verbatim_author(author_year_string) ⇒ String?
-
.year_letter(string) ⇒ String?
The letter immediately following the first year, e.g. the “a” in ‘Smith, 1920a’.
- .year_of_publication(author_year) ⇒ String?
- .years(string) ⇒ Array
Class Method Details
.a_label(string) ⇒ Object
Returns String, nil. The string preceded with “a” or “an”.
# File 'lib/utilities/strings.rb', line 30
def self.a_label(string)
  return nil if string.to_s.length == 0
  (string =~ /\A[aeiou]/i ? 'an ' : 'a ') + string
end
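A short sketch of how the vowel check behaves. The body above is copied into a throwaway top-level method, an assumption made only so the snippet runs without TaxonWorks loaded. Note that the test looks at the first letter, not the first sound.

```ruby
# Standalone copy of the documented body, for illustration only.
def a_label(string)
  return nil if string.to_s.length == 0
  (string =~ /\A[aeiou]/i ? 'an ' : 'a ') + string
end

a_label('otu')       # => "an otu"
a_label('specimen')  # => "a specimen"
a_label('hour')      # => "a hour" (spelling-based, so silent letters are not handled)
a_label('')          # => nil
```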
.alphabetic_strings(string) ⇒ Array
Splits a string on special characters, returning an array of the strings that do not contain digits.
It splits on accent characters, and does not split on underscores. The method is used for building wildcard searches, so splitting on accents creates pseudo accent insensitivity in searches.
Use .alphanumeric_strings instead to allow searches by page number, year, etc.
# File 'lib/utilities/strings.rb', line 148
def self.alphabetic_strings(string)
  return [] if string.nil? || string.length == 0
  string.split(/[^[[:word:]]]+/).select { |b| !(b =~ /\d/) }.reject { |b| b.empty? }
end
.alphanumeric_strings(string) ⇒ Object
Like .alphabetic_strings, but keeps strings that contain digits, which allows searches by page number, year, etc.
# File 'lib/utilities/strings.rb', line 154
def self.alphanumeric_strings(string)
  return [] if string.nil? || string.length == 0
  string.split(/[^[[:word:]]]+/).reject { |b| b.empty? }
end
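The difference between the two splitters is easiest to see side by side. The sketch below uses standalone top-level copies (an assumption, so it runs without the library); `alphabetic_strings` is written here as a filter over `alphanumeric_strings`, which should be equivalent to the documented body. The sample query is made up.

```ruby
# Standalone copies for illustration; same regular expression as documented.
def alphanumeric_strings(string)
  return [] if string.nil? || string.length == 0
  string.split(/[^[[:word:]]]+/).reject { |b| b.empty? }
end

def alphabetic_strings(string)
  # Drop every fragment that contains a digit.
  alphanumeric_strings(string).reject { |b| b =~ /\d/ }
end

query = 'Smith & Jones, 1920: p_44 notes'
alphanumeric_strings(query)  # => ["Smith", "Jones", "1920", "p_44", "notes"]
alphabetic_strings(query)    # => ["Smith", "Jones", "notes"]
```

Underscores count as word characters, so `p_44` stays in one piece and is then dropped by the digit filter.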
.asciify(string) ⇒ Object
Returns String, nil. Replaces <br>, <i>, <b> tags with their AsciiDoc equivalents.
# File 'lib/utilities/strings.rb', line 14
def self.asciify(string)
  return nil if string.to_s.length == 0

  string.gsub!(/<br>/, "\n")
  string.gsub!(/<i>|<\/i>/, '_')
  string.gsub!(/<b>|<\/b>/, '**')
  string
end
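Because the body uses `gsub!`, the argument itself is modified and then returned. A minimal sketch, using a standalone copy of the method (an assumption so it runs outside TaxonWorks) and an invented sample string:

```ruby
# Standalone copy of the documented body, for illustration only.
def asciify(string)
  return nil if string.to_s.length == 0

  string.gsub!(/<br>/, "\n")
  string.gsub!(/<i>|<\/i>/, '_')
  string.gsub!(/<b>|<\/b>/, '**')
  string
end

html = +'<i>Aus bus</i><br><b>type</b>'  # unary plus yields an unfrozen String
result = asciify(html)
result               # => "_Aus bus_\n**type**"
result.equal?(html)  # => true, the caller's string was changed in place
```

Callers that need the original markup afterwards would have to pass a `dup`; a frozen string would raise on the first `gsub!`.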
.authorship_sentence(last_names = []) ⇒ String?
TODO: DEPRECATE (doesn’t belong here because to_sentence is Rails?).
# File 'lib/utilities/strings.rb', line 134
def self.authorship_sentence(last_names = [])
  return nil if last_names.empty?
  last_names.to_sentence(two_words_connector: ' & ', last_word_connector: ' & ')
end
.clean(string) ⇒ String?
# File 'lib/utilities/strings.rb', line 63
def self.clean(string)
  a = string.dup
  if !a.nil?
    a.gsub!(/#{CLEANABLE.join('|')}/, '')
    a = nil if a == ''
  end
  a
end
.cleanable?(string) ⇒ Boolean
Returns Boolean.
# File 'lib/utilities/strings.rb', line 57
def self.cleanable?(string)
  string =~ /#{CLEANABLE.join('|')}|\s/
end
.encode_with_utf8(string) ⇒ String, false
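Returns the string with its encoding forced to UTF-8, or false if the string is nil or not compatible with UTF-8. !! Needing this is a bad sign: you should know your encoding before it gets this far.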
# File 'lib/utilities/strings.rb', line 162
def self.encode_with_utf8(string)
  return false if string.nil?
  if Encoding.compatible?('test'.encode(Encoding::UTF_8), string)
    string.force_encoding(Encoding::UTF_8)
  else
    false
  end
end
.escape_single_quote(string) ⇒ String
Adds a second single quote to escape apostrophe in SQL query strings
# File 'lib/utilities/strings.rb', line 95
def self.escape_single_quote(string)
  return nil if string.blank?
  string.gsub("'", "''")
end
.generate_md5(text) ⇒ Digest::MD5
# File 'lib/utilities/strings.rb', line 74
def self.generate_md5(text)
  return nil if text.blank?
  text = text.downcase.gsub(/[\s\.,;:\?!]*/, '')
  Digest::MD5.hexdigest(text)
end
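The normalization step means that strings differing only in case, whitespace, or the listed punctuation hash to the same value. The sketch below is a standalone approximation: `text.nil? || text.strip.empty?` stands in for Rails' `blank?` (an assumption), and the sample strings are invented.

```ruby
require 'digest'

# Standalone approximation of the documented body, for illustration only.
def generate_md5(text)
  return nil if text.nil? || text.strip.empty?  # plain-Ruby stand-in for blank?
  normalized = text.downcase.gsub(/[\s\.,;:\?!]*/, '')
  Digest::MD5.hexdigest(normalized)
end

a = generate_md5('Aus bus, 1920.')
b = generate_md5('aus BUS 1920')
a == b     # => true, both normalize to "ausbus1920"
a.length   # => 32 (a hexadecimal String)
```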
.increment_contained_integer(string) ⇒ String, Boolean
Increments the first integer encountered in the string, wrapping it in only the immediately adjacent non-integer strings before and after (see tests). Returns false if no number is found.
# File 'lib/utilities/strings.rb', line 85
def self.increment_contained_integer(string)
  string =~ /([^\d]*)(\d+)([^\d]*)/
  a, b, c = $1, $2, $3
  return false if b.nil?
  [a, (b.to_i + 1), c].compact.join
end
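"Only the immediate non-integer strings" has a consequence worth seeing: anything after the second run of non-digits is dropped, and leading zeros are lost. A sketch using a standalone method that swaps the `$1`/`$2`/`$3` globals for `String#match` (an assumed, equivalent rewrite):

```ruby
# Standalone rewrite of the documented body using MatchData instead of $1..$3.
def increment_contained_integer(string)
  m = string.match(/([^\d]*)(\d+)([^\d]*)/)
  return false if m.nil?
  [m[1], m[2].to_i + 1, m[3]].join
end

increment_contained_integer('abc123def')  # => "abc124def"
increment_contained_integer('a1b2c')      # => "a2b" (the trailing "2c" is not kept)
increment_contained_integer('lot 009')    # => "lot 10" (zero padding is lost)
increment_contained_integer('no digits')  # => false
```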
.integers(string) ⇒ Array<String>
Get numbers separated by spaces from a string
# File 'lib/utilities/strings.rb', line 188
def self.integers(string)
  return [] if string.nil? || string.length == 0
  string.split(/\s+/).select { |t| is_i?(t) }
end
.is_i?(string) ⇒ Boolean
See stackoverflow.com/questions/1235863/test-if-a-string-is-basically-an-integer-in-quotes-using-ruby. Note: might check out the CSV::Converters constants to see how they handle this. Allows ‘02’, which is treated as OK because '02'.to_i returns 2.
# File 'lib/utilities/strings.rb', line 106
def self.is_i?(string)
  /\A[-+]?\d+\z/ === string
end
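A brief sketch of what `is_i?` accepts and how `integers` builds on it. Both are standalone top-level copies of the documented bodies (an assumption so the snippet runs on its own); the inputs are invented.

```ruby
# Standalone copies of the documented bodies, for illustration only.
def is_i?(string)
  /\A[-+]?\d+\z/ === string
end

def integers(string)
  return [] if string.nil? || string.length == 0
  string.split(/\s+/).select { |t| is_i?(t) }
end

is_i?('02')    # => true
is_i?('-7')    # => true
is_i?('4.5')   # => false
is_i?(42)      # => false (Regexp#=== is false for non-String arguments)
integers('12 abc -3 4.5 007')  # => ["12", "-3", "007"]
```

The results of `integers` stay Strings, so leading zeros survive until the caller converts them.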
.linearize(string, separator = ' | ') ⇒ Object
# File 'lib/utilities/strings.rb', line 23
def self.linearize(string, separator = ' | ')
  return nil if string.to_s.length == 0
  string.gsub(/\n|(\r\n)/, separator)
end
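This method has no description in the source; judging from the body, it flattens multi-line text onto one line by replacing each line break with a separator. A sketch with a standalone copy (an assumption so it runs without the library):

```ruby
# Standalone copy of the documented body, for illustration only.
def linearize(string, separator = ' | ')
  return nil if string.to_s.length == 0
  string.gsub(/\n|(\r\n)/, separator)
end

linearize("a\nb\r\nc")   # => "a | b | c"
linearize("a\nb", '; ')  # => "a; b"
linearize('')            # => nil
```

Both Unix (`\n`) and Windows (`\r\n`) line endings are replaced as a unit.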
.nil_strip(string) ⇒ String?
Strips leading and trailing whitespace, leaves internal whitespace as is, and returns nil if nothing is left.
# File 'lib/utilities/strings.rb', line 46
def self.nil_strip(string) # string should have content or be empty
  a = string.dup
  if !a.nil?
    a.strip!
    a = nil if a == ''
  end
  a
end
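The three cases (content, whitespace only, nil) can be sketched as follows, using a standalone copy of the documented body (an assumption so it runs on its own):

```ruby
# Standalone copy of the documented body, for illustration only.
def nil_strip(string)
  a = string.dup
  if !a.nil?
    a.strip!
    a = nil if a == ''
  end
  a
end

original = "  a  b \n"
nil_strip(original)  # => "a  b" (internal double space kept)
original             # => "  a  b \n" (the caller's string is untouched)
nil_strip('   ')     # => nil
nil_strip(nil)       # => nil
```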
.nil_wrap(pre = nil, content = nil, post = nil) ⇒ String?
Returns nil if content is blank; otherwise joins whichever of pre, content, and post are provided and returns the result.
# File 'lib/utilities/strings.rb', line 126
def self.nil_wrap(pre = nil, content = nil, post = nil)
  return nil if content.blank?
  [pre, content, post].compact.join
end
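A typical use is adding brackets or separators only when there is something to wrap. The sketch below is a standalone approximation in which `content.to_s.strip.empty?` stands in for Rails' `blank?` (an assumption); the arguments are invented.

```ruby
# Standalone approximation of the documented body, for illustration only.
def nil_wrap(pre = nil, content = nil, post = nil)
  return nil if content.to_s.strip.empty?  # plain-Ruby stand-in for blank?
  [pre, content, post].compact.join
end

nil_wrap('(', '1920', ')')    # => "(1920)"
nil_wrap(nil, 'Smith', ', ')  # => "Smith, "
nil_wrap('(', nil, ')')       # => nil, so no stray "()" is produced
```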
.only_integer(string) ⇒ Integer?
Return an integer if and only if the string is a single integer, otherwise nil
# File 'lib/utilities/strings.rb', line 197
def self.only_integer(string)
  if is_i?(string)
    string.to_i
  else
    nil
  end
end
.only_integers?(string) ⇒ Boolean
Returns true if the query string only contains integers separated by whitespace.
# File 'lib/utilities/strings.rb', line 207
def self.only_integers?(string)
  !(string =~ /[^\d\s]/i) && !integers(string).empty?
end
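One subtlety is that `only_integer` accepts a sign while `only_integers?` does not, because its first test rejects any character that is not a digit or whitespace. A sketch with standalone copies of the documented bodies and the two helpers they rely on (an assumption so the snippet runs on its own):

```ruby
# Standalone copies of the documented bodies, for illustration only.
def is_i?(string)
  /\A[-+]?\d+\z/ === string
end

def integers(string)
  return [] if string.nil? || string.length == 0
  string.split(/\s+/).select { |t| is_i?(t) }
end

def only_integer(string)
  is_i?(string) ? string.to_i : nil
end

def only_integers?(string)
  !(string =~ /[^\d\s]/i) && !integers(string).empty?
end

only_integer('-2')          # => -2
only_integer('1 2')         # => nil
only_integers?('1 22 333')  # => true
only_integers?('1 -2')      # => false (the "-" fails the digit/whitespace test)
only_integers?('   ')       # => false (no integers at all)
```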
.parse_authorship(authorship) ⇒ Array
Parse a scientificAuthorship field to extract author and year information.
If the format matches ICZN, adds parentheses around the author name (if detected).
# File 'lib/utilities/strings.rb', line 216
def self.parse_authorship(authorship)
  return [] if (authorship = authorship.to_s.strip).empty?

  year_match = /(,|\s)\s*(?<year>\d+)(?<paren>\))?$/.match(authorship)
  # (the local variable name `author_name` is a reconstruction)
  author_name = "#{authorship[..(year_match&.offset(0)&.first || 0)-1]}#{year_match&.[](:paren)}"

  [author_name, year_match&.[](:year)]
end
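The return value is a two-element array of author string and year. The sketch below is a standalone method built around the documented regular expression; the local variable names are assumptions, and the beginless range (`[..n]`) requires Ruby 2.7 or later. The sample values are invented.

```ruby
# Standalone sketch around the documented regular expression; local names are illustrative.
def parse_authorship(authorship)
  return [] if (authorship = authorship.to_s.strip).empty?

  year_match = /(,|\s)\s*(?<year>\d+)(?<paren>\))?$/.match(authorship)
  # Everything before the year separator, plus the closing parenthesis if one followed the year.
  cut = (year_match&.offset(0)&.first || 0) - 1
  author_name = "#{authorship[..cut]}#{year_match&.[](:paren)}"

  [author_name, year_match&.[](:year)]
end

parse_authorship('Smith, 1920')          # => ["Smith", "1920"]
parse_authorship('(Smith, 1920)')        # => ["(Smith)", "1920"]
parse_authorship('Smith & Jones, 1920')  # => ["Smith & Jones", "1920"]
parse_authorship('Smith')                # => ["Smith", nil]
parse_authorship('')                     # => []
```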
.random_string(string_length) ⇒ String?
Returns a stub string of a certain length.
# File 'lib/utilities/strings.rb', line 38
def self.random_string(string_length)
  return nil if string_length.to_i == 0
  ('a'..'z').to_a.shuffle[0, string_length].join
end
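Since the body takes a slice of the shuffled alphabet, two properties follow: no letter repeats, and the result is never longer than 26 characters. A sketch with a standalone copy (an assumption so it runs on its own); the actual letters differ on every call, so only lengths are shown.

```ruby
# Standalone copy of the documented body, for illustration only.
def random_string(string_length)
  return nil if string_length.to_i == 0
  ('a'..'z').to_a.shuffle[0, string_length].join
end

s = random_string(8)
s.length                  # => 8
s.chars.uniq.length       # => 8, letters never repeat
random_string(40).length  # => 26, capped at the size of the alphabet
random_string(0)          # => nil
```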
.sanitize_for_csv(string) ⇒ String, param
The goal is to sanitize an individual string such that it is usable in a TAB-delimited, UTF-8 column. See Download. TODO: Likely need to handle quotes, and write better UTF compliance tests. Technically \n is allowed!
# File 'lib/utilities/strings.rb', line 115
def self.sanitize_for_csv(string)
  a = string.dup
  return a if a.blank? # TODO: .blank is Rails, not OK here
  a.to_s.gsub(/\n|\t/, ' ')
end
.verbatim_author(author_year_string) ⇒ String?
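# File 'lib/utilities/strings.rb', line 238
def self.verbatim_author(author_year_string)
  return nil if author_year_string.to_s.strip.empty? # alternative to .blank?
  # (the local variable name `author_end_index` is a reconstruction)
  author_end_index = author_year_string.rindex(' ')
  author_end_index ||= author_year_string.length
  author_year_string[0...author_end_index]
end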
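This method has no description in the source; judging from the body, it returns everything before the last space, which is assumed to separate the author part from the year. A sketch with a standalone method (the local variable name is an assumption; the inputs are invented):

```ruby
# Standalone sketch of the documented logic; the local name is illustrative.
def verbatim_author(author_year_string)
  return nil if author_year_string.to_s.strip.empty?
  author_end_index = author_year_string.rindex(' ') || author_year_string.length
  author_year_string[0...author_end_index]
end

verbatim_author('Smith & Jones 1920')  # => "Smith & Jones"
verbatim_author('Smith, 1920')         # => "Smith," (a trailing comma is kept)
verbatim_author('Smith')               # => "Smith"
verbatim_author('  ')                  # => nil
```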
.year_letter(string) ⇒ String?
Returns the letter immediately following the first year:
`Smith, 1920a. ... ` returns `a`.
# File 'lib/utilities/strings.rb', line 180
def self.year_letter(string)
  # NOTE: the character class [a-zAZ] matches a-z plus the literal letters A and Z only
  string.match(/\d{4}([a-zAZ]+)/).to_a.last
end
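A sketch of the matching behaviour, using a standalone copy of the documented body (an assumption so it runs on its own). The character class is reproduced exactly as documented, so most uppercase suffixes are not recognized.

```ruby
# Standalone copy of the documented body, regular expression unchanged.
def year_letter(string)
  string.match(/\d{4}([a-zAZ]+)/).to_a.last
end

year_letter('Smith, 1920a. Some title')  # => "a"
year_letter('Smith, 1920')               # => nil
year_letter('Smith, 1920B')              # => nil ("B" is not in [a-zAZ])
```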
.year_of_publication(author_year) ⇒ String?
# File 'lib/utilities/strings.rb', line 227
def self.year_of_publication(author_year)
  return nil if author_year.to_s.strip.empty? # alternative to .blank?
  # (the local variable name `split_author_year` is a reconstruction)
  split_author_year = author_year.split(' ')
  year = split_author_year[split_author_year.length - 1]
  # try matching last element first, otherwise scan entire string for year
  # Maybe we don't need regex match and can use years(author_year) exclusively?
  year =~ /\A\d+\z/ ? year : years(author_year).last.to_s
end
.years(string) ⇒ Array
# File 'lib/utilities/strings.rb', line 172
def self.years(string)
  return [] if string.nil?
  string.scan(/\d{4}/).to_a.uniq
end
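`years` collects every distinct run of four digits, and `year_of_publication` falls back to it when the last whitespace-separated token is not purely numeric. A sketch with standalone methods (the local name `last_token` is an assumption; the inputs are invented):

```ruby
# Standalone sketches of the documented logic, for illustration only.
def years(string)
  return [] if string.nil?
  string.scan(/\d{4}/).to_a.uniq
end

def year_of_publication(author_year)
  return nil if author_year.to_s.strip.empty?
  last_token = author_year.split(' ').last
  # Prefer a purely numeric last token, otherwise scan the whole string.
  last_token =~ /\A\d+\z/ ? last_token : years(author_year).last.to_s
end

years('Smith 1920, 1920a; Jones 2001')  # => ["1920", "2001"]
year_of_publication('Smith, 1920')      # => "1920"
year_of_publication('Smith, 1920a')     # => "1920" (found by the fallback scan)
year_of_publication('Smith')            # => "" (empty String, not nil)
```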