databionics.text
Class StringUtils

java.lang.Object
  extended by databionics.text.StringUtils

public class StringUtils
extends java.lang.Object

Utility methods for Strings.

See Also:
SimilarString

Constructor Summary
StringUtils()
 
Method Summary
static java.lang.String alignRight(int value, int length)
          Right align a number by adding spaces on the left up to the specified length.
static java.lang.String alignRight(java.lang.String text, int length)
          Right align a string by adding spaces on the left up to the specified length.
static java.lang.String balance(java.lang.String s)
          Balance parentheses, braces and brackets. The approach is simple and could be made much more sophisticated later.
protected static java.lang.String balanceOne(java.lang.String s, java.lang.String open, java.lang.String close)
          Balance opening and closing strings.
static java.lang.String correctSpelling(java.lang.String s)
          Correct the spelling of a string as typically found in song titles.
static float extNGramMetric(int n, java.lang.String first, java.lang.String second)
          Calculate the extended n-gram metric distance of two strings.
static java.lang.String formatFilesize(int s)
          Format a file size.
static java.lang.String getRegExp(java.lang.String s)
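          Build a regular expression that matches this String and similar ones, e.g. with an article in front, a different spelling of German umlauts, or different punctuation and spacing.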
static boolean isInt(java.lang.String s)
          Test whether a string is an integer.
static java.lang.String longer(java.lang.String first, java.lang.String second)
          Return the longer of two strings.
static float nGramMetric(int n, java.lang.String first, java.lang.String second)
          Calculate the n-gram metric distance of two strings.
static java.lang.String normalize(java.lang.String s)
          Normalize a String, that is, make it lowercase, remove all non-word characters such as spaces and punctuation, remove articles, and replace German umlauts.
static int occurrences(java.lang.String of, java.lang.String in)
          Count the number of occurrences of one String in another.
static java.lang.String removeArticles(java.lang.String s)
          Remove all German, English and French articles from a String.
static java.lang.String removeNonWordChars(java.lang.String s)
          Remove all non-word characters from a string, i.e. all characters matching the regular expression \W+.
static java.lang.String replace(java.lang.String what, java.lang.String with, java.lang.String s)
          Replace parts of a string.
static java.lang.String replaceUmlauts(java.lang.String s)
          Replace German umlauts with their official replacements.
static StringList toExtNGrams(int n, java.lang.String s)
          Build extended n-grams of a string by padding it with _ at the start and end.
static StringList toNGrams(int n, java.lang.String s)
          Build the n-grams of a string.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StringUtils

public StringUtils()
Method Detail

longer

public static java.lang.String longer(java.lang.String first,
                                      java.lang.String second)
Return the longer of two strings.

Parameters:
first - First string.
second - Second string.
Returns:
Returns longer String.
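A minimal sketch of the behaviour described above. The tie-breaking rule (returning first when both strings have the same length) and the absence of null handling are assumptions, not documented behaviour of StringUtils.longer.

```java
// Hypothetical sketch of StringUtils.longer; the tie-breaking rule is an assumption.
class LongerSketch {
    /** Returns the longer of two strings; on equal length, first is returned (assumed). */
    static String longer(String first, String second) {
        return second.length() > first.length() ? second : first;
    }

    public static void main(String[] args) {
        System.out.println(longer("Abba", "Beatles")); // prints Beatles
        System.out.println(longer("Blur", "Muse"));    // prints Blur (tie: first wins, by assumption)
    }
}
```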

toNGrams

public static StringList toNGrams(int n,
                                  java.lang.String s)
Build the n-grams of a string.

Parameters:
n - Length of n-grams.
s - String.
Returns:
Returns StringList of n-grams.
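An n-gram here is a contiguous substring of length n. The sketch below assumes n is positive and uses java.util.List<String> in place of the library's StringList, whose API is not shown on this page; it illustrates the idea rather than reproducing the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of toNGrams; List<String> stands in for databionics' StringList.
class NGramSketch {
    /** All contiguous substrings of length n, in order of occurrence (n assumed >= 1). */
    static List<String> toNGrams(int n, String s) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams; // empty when s is shorter than n
    }

    public static void main(String[] args) {
        System.out.println(toNGrams(2, "abba")); // prints [ab, bb, ba]
    }
}
```

A string of length L yields L - n + 1 n-grams, so strings shorter than n yield none, which is one reason for the extended variant.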

toExtNGrams

public static StringList toExtNGrams(int n,
                                     java.lang.String s)
Build extended n-grams of a string by padding it with _ at the start and end.

Parameters:
n - Length of n-grams.
s - String.
Returns:
Returns StringList of n-grams.
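Padding makes the first and last characters appear in their own n-grams, which gives word starts and ends extra weight. The sketch below pads with a single _ on each side, as the description suggests; many n-gram implementations pad with n - 1 characters instead, so the real padding width is an assumption. List<String> again stands in for StringList.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of toExtNGrams; one '_' per side is an assumed padding width.
class ExtNGramSketch {
    static List<String> toNGrams(int n, String s) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    /** Pads s with '_' at the start and end, then builds ordinary n-grams. */
    static List<String> toExtNGrams(int n, String s) {
        return toNGrams(n, "_" + s + "_");
    }

    public static void main(String[] args) {
        System.out.println(toExtNGrams(2, "abc")); // prints [_a, ab, bc, c_]
    }
}
```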

nGramMetric

public static float nGramMetric(int n,
                                java.lang.String first,
                                java.lang.String second)
Calculate the n-gram metric distance of two strings.

Parameters:
n - Length of n-grams.
first - First string.
second - Second string.
Returns:
Returns value of n-gram metric.
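The page does not say which formula the metric uses. As an illustration only, the sketch below swaps in a common choice, the Dice coefficient turned into a distance: 1 - 2|A ∩ B| / (|A| + |B|), where A and B are the sets of n-grams of the two strings. 0 means identical n-gram sets and 1 means no shared n-gram; the real StringUtils.nGramMetric may differ.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: Dice-based distance over n-gram sets (formula assumed, not documented).
class NGramMetricSketch {
    static Set<String> grams(int n, String s) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + n <= s.length(); i++) out.add(s.substring(i, i + n));
        return out;
    }

    static float nGramMetric(int n, String first, String second) {
        Set<String> a = grams(n, first);
        Set<String> b = grams(n, second);
        if (a.isEmpty() && b.isEmpty()) return 0f; // both too short to compare (assumed)
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);                       // shared n-grams
        return 1f - 2f * common.size() / (a.size() + b.size());
    }

    public static void main(String[] args) {
        // bigrams {ni, ig, gh, ht} vs {na, ac, ch, ht}: one shared, 1 - 2/8
        System.out.println(nGramMetric(2, "night", "nacht")); // prints 0.75
    }
}
```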

extNGramMetric

public static float extNGramMetric(int n,
                                   java.lang.String first,
                                   java.lang.String second)
Calculate the extended n-gram metric distance of two strings.

Parameters:
n - Length of n-grams.
first - First string.
second - Second string.
Returns:
Returns value of extended n-gram metric.
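A plausible reading is that the extended metric applies the same distance to padded strings. The sketch assumes the Dice-based distance from the previous example and one _ of padding per side, both unconfirmed. It shows why padding matters: with n = 3, strings of length 2 have no plain trigrams at all, so the plain metric cannot tell "ab" from "cd", while the extended one can.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: extended metric = assumed Dice distance on '_'-padded strings.
class ExtNGramMetricSketch {
    static Set<String> grams(int n, String s) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + n <= s.length(); i++) out.add(s.substring(i, i + n));
        return out;
    }

    static float nGramMetric(int n, String first, String second) {
        Set<String> a = grams(n, first);
        Set<String> b = grams(n, second);
        if (a.isEmpty() && b.isEmpty()) return 0f; // nothing to compare
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return 1f - 2f * common.size() / (a.size() + b.size());
    }

    /** Same metric on strings padded with one '_' per side (padding width assumed). */
    static float extNGramMetric(int n, String first, String second) {
        return nGramMetric(n, "_" + first + "_", "_" + second + "_");
    }

    public static void main(String[] args) {
        System.out.println(nGramMetric(3, "ab", "cd"));    // prints 0.0 (no trigrams on either side)
        System.out.println(extNGramMetric(3, "ab", "cd")); // prints 1.0 ({_ab, ab_} vs {_cd, cd_})
    }
}
```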

replace

public static java.lang.String replace(java.lang.String what,
                                       java.lang.String with,
                                       java.lang.String s)
Replace parts of a string.

Parameters:
what - String to replace.
with - Replacement String.
s - Replace all occurrences in this String.
Returns:
Returns new String.
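A sketch of literal (non-regex) replacement with the same argument order as StringUtils.replace: what, with, then the target string. Returning s unchanged for an empty what is an assumption. On Java 5 and later, s.replace(what, with) gives the same result for non-empty what.

```java
// Hypothetical sketch of StringUtils.replace (literal replacement of all occurrences).
class ReplaceSketch {
    static String replace(String what, String with, String s) {
        if (what.isEmpty()) return s; // assumed: nothing to replace
        StringBuilder out = new StringBuilder();
        int from = 0;
        int hit;
        while ((hit = s.indexOf(what, from)) >= 0) {
            out.append(s, from, hit).append(with); // copy text before the match, then the replacement
            from = hit + what.length();            // continue after the match
        }
        return out.append(s.substring(from)).toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("a", "o", "banana")); // prints bonono
    }
}
```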

removeNonWordChars

public static java.lang.String removeNonWordChars(java.lang.String s)
Remove all non-word characters from a string, i.e. all characters matching the regular expression \W+.

Parameters:
s - Remove from this String.
Returns:
Returns new String.
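With java.util.regex, \W by default means any character outside [a-zA-Z0-9_]. A sketch assuming the method is a plain replaceAll shows a consequence worth knowing: German umlauts count as non-word characters and are deleted too, unless the (?U) flag (Java 7+) makes \W Unicode-aware.

```java
// Hypothetical sketch of removeNonWordChars, assuming a plain replaceAll on \W+.
class NonWordSketch {
    static String removeNonWordChars(String s) {
        // \W matches every character outside [a-zA-Z0-9_], including umlauts such as \u00f6
        return s.replaceAll("\\W+", "");
    }

    public static void main(String[] args) {
        System.out.println(removeNonWordChars("Don't Stop Me Now!")); // prints DontStopMeNow
        System.out.println(removeNonWordChars("Mot\u00f6rhead"));     // prints Motrhead
    }
}
```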

removeArticles

public static java.lang.String removeArticles(java.lang.String s)
Remove all German, English and French articles from a String.

Parameters:
s - Remove from this String.
Returns:
Returns new String.
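The page does not list which articles are removed. The sketch below uses a short, hypothetical list of German, English and French articles, matched case-insensitively as whole words, plus the elided French form l'. It is meant only to show the technique.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of removeArticles; the article list is illustrative, not the library's.
class ArticleSketch {
    private static final Pattern ARTICLES = Pattern.compile(
            "(?i)\\b(?:the|an?|der|die|das|ein|eine|le|la|les|une?)\\b\\s*|\\bl'");

    static String removeArticles(String s) {
        return ARTICLES.matcher(s).replaceAll("").trim();
    }

    public static void main(String[] args) {
        System.out.println(removeArticles("The Beatles"));     // prints Beatles
        System.out.println(removeArticles("Die Toten Hosen")); // prints Toten Hosen
        System.out.println(removeArticles("L'amour"));         // prints amour
    }
}
```

Because whole words are matched anywhere in the string, an article in the middle of a title ("Live a Little") is removed as well.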

replaceUmlauts

public static java.lang.String replaceUmlauts(java.lang.String s)
Replace German umlauts with their official replacements.

Parameters:
s - Replace in this String.
Returns:
Returns new String.
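The official German transliteration writes ä, ö and ü as ae, oe and ue. The sketch below applies it to lower- and uppercase umlauts, written as Unicode escapes so the source compiles under any file encoding. Whether the real method also handles ß, or decomposed umlauts (a vowel followed by a combining diaeresis), is not stated.

```java
// Hypothetical sketch of replaceUmlauts (precomposed umlauts only).
class UmlautSketch {
    static String replaceUmlauts(String s) {
        return s.replace("\u00e4", "ae").replace("\u00f6", "oe").replace("\u00fc", "ue")  // ä ö ü
                .replace("\u00c4", "Ae").replace("\u00d6", "Oe").replace("\u00dc", "Ue"); // Ä Ö Ü
    }

    public static void main(String[] args) {
        System.out.println(replaceUmlauts("Mot\u00f6rhead")); // prints Motoerhead
    }
}
```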

normalize

public static java.lang.String normalize(java.lang.String s)
Normalize a String, that is, make it lowercase, remove all non-word characters such as spaces and punctuation, remove articles, and replace German umlauts.

Parameters:
s - Normalize this String.
Returns:
Returns new String.
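The order of the steps matters. Taken literally, the order in the description would delete umlauts (java.util.regex treats them as \W by default) and strip the spaces that whole-word article matching depends on. The sketch therefore lowercases, replaces umlauts, removes articles, and only then strips non-word characters. This order and the short article list are assumptions about the real implementation.

```java
import java.util.Locale;

// Hypothetical sketch of normalize; step order and article list are assumed.
class NormalizeSketch {
    static String normalize(String s) {
        String t = s.toLowerCase(Locale.ROOT);
        // 1. Umlauts first: \W would otherwise delete them.
        t = t.replace("\u00e4", "ae").replace("\u00f6", "oe").replace("\u00fc", "ue");
        // 2. Articles while word boundaries still exist (illustrative list).
        t = t.replaceAll("\\b(?:the|an?|der|die|das|le|la|les)\\b", "");
        // 3. Finally strip spaces, punctuation and other non-word characters.
        return t.replaceAll("\\W+", "");
    }

    public static void main(String[] args) {
        System.out.println(normalize("The Beatles"));         // prints beatles
        System.out.println(normalize("Die \u00c4rzte!"));     // prints aerzte
    }
}
```

Two titles can then be compared for equality after normalization, e.g. "Die Ärzte" and "Aerzte" both normalize to "aerzte".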

getRegExp

public static java.lang.String getRegExp(java.lang.String s)
Build a regular expression that matches this String and similar ones, e.g. with an article in front, a different spelling of German umlauts, or different punctuation and spacing.

Parameters:
s - Match this String.
Returns:
Returns regular expression as String.
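One way to build such a pattern is sketched below; the actual rules used by getRegExp are not documented. The sketch drops a leading article from the input and allows an optional one from a hypothetical list, lets each umlaut written as ä, ö or ü also match ae, oe or ue, permits any run of non-word characters between letters, and matches case-insensitively.

```java
import java.util.Locale;

// Hypothetical sketch of getRegExp; the article list and matching rules are assumptions.
class RegExpSketch {
    private static final String ARTICLE = "(?:the|an?|der|die|das|le|la|les)";

    static String getRegExp(String s) {
        // Drop a leading article from the input; the pattern re-allows one optionally.
        String core = s.toLowerCase(Locale.ROOT).replaceFirst("^\\s*" + ARTICLE + "\\s+", "");
        StringBuilder re = new StringBuilder("(?iu)^\\W*(?:" + ARTICLE + "\\W+)?");
        boolean first = true;
        for (char c : core.toCharArray()) {
            String alt;
            switch (c) {
                case '\u00e4': alt = "(?:\u00e4|ae)"; break;
                case '\u00f6': alt = "(?:\u00f6|oe)"; break;
                case '\u00fc': alt = "(?:\u00fc|ue)"; break;
                default:
                    if (!Character.isLetterOrDigit(c)) continue; // spaces and punctuation are skipped
                    alt = String.valueOf(c);
            }
            if (!first) re.append("\\W*"); // any punctuation or spacing between characters
            re.append(alt);
            first = false;
        }
        return re.append("\\W*$").toString();
    }
}
```

For example, the pattern built from "Die Ärzte" matches "Aerzte" and "die ärzte!" but not "Die Toten Hosen".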

occurrences

public static int occurrences(java.lang.String of,
                              java.lang.String in)
Count the number of occurrences of one String in another.

Parameters:
of - Search for this String.
in - Search in this String.
Returns:
Returns number of occurrences.
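The description does not say whether overlapping matches count. The sketch below counts non-overlapping occurrences (so "aba" occurs twice in "abababa", not three times) and returns 0 for an empty search string; both choices are assumptions.

```java
// Hypothetical sketch of occurrences; non-overlapping counting is assumed.
class OccurrencesSketch {
    static int occurrences(String of, String in) {
        if (of.isEmpty()) return 0; // assumed: an empty pattern counts as zero
        int count = 0;
        for (int i = in.indexOf(of); i >= 0; i = in.indexOf(of, i + of.length())) {
            count++; // resume searching after the end of this match
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(occurrences("aba", "abababa")); // prints 2
    }
}
```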

correctSpelling

public static java.lang.String correctSpelling(java.lang.String s)
Correct the spelling of a string as typically found in song titles, that is, normalize the spelling of Mr./Mrs. and German umlauts, make the first letter of each word uppercase, and add a space before "(" and after ")".

Parameters:
s - Correct this String.
Returns:
Returns new String.
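A partial sketch of these rules: spacing around parentheses, a hypothetical Mr./Mrs. rule (any case, dot optional, rewritten as "Mr." or "Mrs."), and capitalizing the first letter of each word, including a word right after "(". The umlaut normalization mentioned above is left out because its exact form is not described.

```java
// Hypothetical, partial sketch of correctSpelling; the Mr./Mrs. rule is an assumption.
class SpellingSketch {
    static String correctSpelling(String s) {
        String t = s.replace("(", " (").replace(")", ") ") // space before "(" and after ")"
                    .replaceAll("\\s+", " ").trim();        // collapse the extra spaces
        t = t.replaceAll("(?i)\\bmrs\\b\\.?", "Mrs.").replaceAll("(?i)\\bmr\\b\\.?", "Mr.");
        StringBuilder out = new StringBuilder(t.length());
        boolean startOfWord = true;
        for (char c : t.toCharArray()) {
            out.append(startOfWord ? Character.toUpperCase(c) : c);
            startOfWord = Character.isWhitespace(c) || c == '('; // next char starts a word
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(correctSpelling("mr. blue sky(live)")); // prints Mr. Blue Sky (Live)
    }
}
```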

balance

public static java.lang.String balance(java.lang.String s)
Balance parentheses, braces and brackets. The approach is simple and could be made much more sophisticated later.

Parameters:
s - Correct this String.
Returns:
Returns new String.

balanceOne

protected static java.lang.String balanceOne(java.lang.String s,
                                             java.lang.String open,
                                             java.lang.String close)
Balance opening and closing strings.

Parameters:
s - Correct this String.
open - Opening string, e.g. "(".
close - Closing string, e.g. ")".
Returns:
Returns new String.
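A count-based sketch of balanceOne and of balance built on top of it. The strategy (append a closing string for each unclosed opening string, prepend an opening string for each stray closing string) is an assumption. Because only counts are compared, a string like ")(" is left unchanged, which is the kind of limitation the "could be much more sophisticated" note hints at.

```java
// Hypothetical sketch of balance/balanceOne; the count-based strategy is assumed.
class BalanceSketch {
    /** Non-overlapping occurrences of part in s. */
    private static int count(String part, String s) {
        if (part.isEmpty()) return 0;
        int n = 0;
        for (int i = s.indexOf(part); i >= 0; i = s.indexOf(part, i + part.length())) n++;
        return n;
    }

    static String balanceOne(String s, String open, String close) {
        int missing = count(open, s) - count(close, s);
        StringBuilder out = new StringBuilder(s);
        for (; missing > 0; missing--) out.append(close);   // close unclosed openers at the end
        for (; missing < 0; missing++) out.insert(0, open);  // open stray closers at the start
        return out.toString();
    }

    static String balance(String s) {
        s = balanceOne(s, "(", ")");
        s = balanceOne(s, "[", "]");
        return balanceOne(s, "{", "}");
    }

    public static void main(String[] args) {
        System.out.println(balance("Song (Live"));  // prints Song (Live)
        System.out.println(balance("Remix)"));      // prints (Remix)
    }
}
```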

alignRight

public static java.lang.String alignRight(java.lang.String text,
                                          int length)
Right align a string by adding spaces on the left up to the specified length.

Parameters:
text - Text to align
length - Length of result
Returns:
Text right aligned
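A sketch of left-padding with spaces. The documentation does not say what happens when the text is already longer than length; this sketch returns it unchanged rather than truncating it, which is an assumption.

```java
// Hypothetical sketch of alignRight(String, int); no truncation is assumed.
class AlignSketch {
    static String alignRight(String text, int length) {
        StringBuilder out = new StringBuilder();
        for (int i = text.length(); i < length; i++) out.append(' '); // pad on the left
        return out.append(text).toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + alignRight("42", 5) + "]"); // prints [   42]
    }
}
```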

alignRight

public static java.lang.String alignRight(int value,
                                          int length)
Right align a number by adding spaces on the left up to the specified length.

Parameters:
value - Number to align
length - Length of result
Returns:
Number right aligned
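The numeric overload can be sketched as formatting the number and then padding it. For length of at least 1, this should give the same result as String.format(Locale.ROOT, "%" + length + "d", value); the sign of negative numbers counts toward the length.

```java
// Hypothetical sketch of alignRight(int, int).
class AlignIntSketch {
    static String alignRight(int value, int length) {
        String digits = String.valueOf(value); // includes '-' for negative numbers
        StringBuilder out = new StringBuilder();
        for (int i = digits.length(); i < length; i++) out.append(' ');
        return out.append(digits).toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + alignRight(7, 3) + "]");   // prints [  7]
        System.out.println("[" + alignRight(-12, 4) + "]"); // prints [ -12]
    }
}
```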

isInt

public static boolean isInt(java.lang.String s)
Test whether a string is an integer.

Parameters:
s - String
Returns:
True if string is an integer
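A common way to implement this check is to attempt Integer.parseInt and catch NumberFormatException. Whether StringUtils.isInt accepts a leading "+", surrounding spaces, or values outside the int range is not documented. With this sketch, "+5" is accepted, " 5" and "2147483648" are rejected, and null yields false because parseInt(null) throws NumberFormatException.

```java
// Hypothetical sketch of isInt based on Integer.parseInt.
class IsIntSketch {
    static boolean isInt(String s) {
        try {
            Integer.parseInt(s); // optional sign, decimal digits, must fit in an int
            return true;
        } catch (NumberFormatException e) {
            return false;        // also covers null input
        }
    }

    public static void main(String[] args) {
        System.out.println(isInt("42"));  // prints true
        System.out.println(isInt("4.2")); // prints false
    }
}
```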

formatFilesize

public static java.lang.String formatFilesize(int s)
Format a file size.

Parameters:
s - file size
Returns:
file size as string
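The output format is not documented. The sketch below assumes the size is given in bytes, uses binary steps of 1024, and prints one decimal place for KB and larger. Unit names, rounding and thresholds are all illustrative choices. Since the parameter is an int (at most about 2 GB), GB is the largest unit needed.

```java
import java.util.Locale;

// Hypothetical sketch of formatFilesize; units, base 1024 and one decimal are assumptions.
class FilesizeSketch {
    static String formatFilesize(int s) {
        if (s < 1024) return s + " B";
        String[] units = {"KB", "MB", "GB"};
        double size = s;
        int u = -1;
        do {
            size /= 1024;
            u++;
        } while (size >= 1024 && u < units.length - 1);
        return String.format(Locale.ROOT, "%.1f %s", size, units[u]);
    }

    public static void main(String[] args) {
        System.out.println(formatFilesize(1536));    // prints 1.5 KB
        System.out.println(formatFilesize(5242880)); // prints 5.0 MB
    }
}
```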


Copyright © 2005-2006 Databionics Research Group. All Rights Reserved.