databionics.text
Class SimilarString

java.lang.Object
  extended by databionics.text.SimilarString

public class SimilarString
extends java.lang.Object

A String that can be compared to other strings by common metrics.

See Also:
StringUtils

Constructor Summary
SimilarString()
          Standard constructor without arguments.
SimilarString(java.lang.String value)
           
 
Method Summary
 int countSimilarExtTrigramMetric(StringList compare)
          Count how many Strings in the given list are similar to this String, based on the trigram metric and a threshold of 0.5.
 float extNGramMetric(int n, java.lang.String compare)
          Calculate extended n-grams metric distance of string and argument.
 boolean isSimilarTo(java.lang.String compare)
          Check whether the String is similar to another String.
 float nGramMetric(int n, java.lang.String compare)
          Calculate n-grams metric distance of string and argument.
 java.lang.String normalize()
          Normalize the String: make it lowercase, remove all non-word characters such as spaces and punctuation, remove articles, and replace German umlauts.
static boolean similar(java.lang.String first, java.lang.String second)
          Check whether the Strings are similar. Two strings qualify as similar if 1) they are equal, 2) both are non-empty and one is contained in the other, or 3) their trigram metric is greater than 0.5.
 StringList toExtNGrams(int n)
          Build extended n-grams of string by adding _ at start and end.
 StringList toNGrams(int n)
          Build n-grams of string.
 java.lang.String toString()
          String representation
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SimilarString

public SimilarString()
Standard constructor without arguments.


SimilarString

public SimilarString(java.lang.String value)


Method Detail

toString

public java.lang.String toString()
String representation

Overrides:
toString in class java.lang.Object
Returns:
Returns the string.

toNGrams

public StringList toNGrams(int n)
Build n-grams of string.

Parameters:
n - Length of n-grams.
Returns:
Returns StringList of n-grams.
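
The following is a minimal sketch of n-gram construction, assuming that an n-gram is every contiguous substring of length n and that strings shorter than n yield no n-grams. The class NGramSketch and its use of java.util.List in place of StringList are hypothetical and not part of databionics.text; the real method may differ in details such as duplicate handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of databionics.text.
final class NGramSketch {

    // Returns every contiguous substring of length n, in order of appearance.
    // Assumption: a string shorter than n produces an empty list.
    static List<String> toNGrams(String s, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(toNGrams("hello", 3)); // prints [hel, ell, llo]
    }
}
```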

toExtNGrams

public StringList toExtNGrams(int n)
Build extended n-grams of string by adding _ at start and end.

Parameters:
n - Length of n-grams.
Returns:
Returns StringList of n-grams.
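
As an illustration of the padding step, the sketch below assumes that exactly one underscore is added on each side before the n-grams are cut; the documentation does not say how many padding characters are used, so this is a guess. ExtNGramSketch is a hypothetical class, and java.util.List stands in for StringList.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of databionics.text.
final class ExtNGramSketch {

    // Pads the string with one '_' on each side (an assumption), then returns
    // every contiguous substring of length n of the padded string.
    static List<String> toExtNGrams(String s, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        String padded = "_" + s + "_";
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= padded.length(); i++) {
            grams.add(padded.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(toExtNGrams("cat", 3)); // prints [_ca, cat, at_]
    }
}
```

Under this assumption the padding gives the first and last characters of the string their own n-grams, so matches at word boundaries contribute to the comparison.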

nGramMetric

public float nGramMetric(int n,
                         java.lang.String compare)
Calculate n-grams metric distance of string and argument.

Parameters:
n - Length of n-grams.
compare - String to compare to.
Returns:
Returns value of n-gram metric.
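
The formula behind the metric is not documented here. Since similar treats a trigram metric greater than 0.5 as a match, higher values presumably mean more similar strings. One common n-gram score with that property is the Dice coefficient over n-gram sets, 2 * |A ∩ B| / (|A| + |B|). The sketch below uses it purely as an assumption; NGramMetricSketch and diceScore are hypothetical names, and the library may compute something different.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of databionics.text.
final class NGramMetricSketch {

    // Collects the distinct n-grams of s.
    static Set<String> nGramSet(String s, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    // Dice coefficient over n-gram sets (an assumed metric): 0 means no shared
    // n-grams, 1 means identical n-gram sets.
    static float diceScore(int n, String a, String b) {
        Set<String> gramsA = nGramSet(a, n);
        Set<String> gramsB = nGramSet(b, n);
        if (gramsA.isEmpty() && gramsB.isEmpty()) {
            return 0f; // avoid 0/0 when neither string has any n-gram
        }
        Set<String> shared = new HashSet<>(gramsA);
        shared.retainAll(gramsB);
        return 2f * shared.size() / (gramsA.size() + gramsB.size());
    }

    public static void main(String[] args) {
        // "night" and "nacht" share only the bigram "ht": 2 * 1 / (4 + 4)
        System.out.println(diceScore(2, "night", "nacht")); // prints 0.25
    }
}
```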

extNGramMetric

public float extNGramMetric(int n,
                            java.lang.String compare)
Calculate extended n-grams metric distance of string and argument.

Parameters:
n - Length of n-grams.
compare - String to compare to.
Returns:
Returns value of extended n-gram metric.

countSimilarExtTrigramMetric

public int countSimilarExtTrigramMetric(StringList compare)
Count how many Strings in the given list are similar to this String, based on the trigram metric and a threshold of 0.5.

Parameters:
compare - Strings to compare to.
Returns:
Returns number of similar Strings.

normalize

public java.lang.String normalize()
Normalize the String: make it lowercase, remove all non-word characters such as spaces and punctuation, remove articles, and replace German umlauts.

Returns:
Returns new String.
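
A rough sketch of such a normalization follows. Several details are assumptions, since the documentation does not state them: the article list (a few English and German articles), the umlaut replacements (ae, oe, ue, ss), treating only a-z and 0-9 as word characters, and the order of the steps. Articles are removed before spaces here, because they can no longer be recognized as whole words afterwards. NormalizeSketch is a hypothetical class.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of databionics.text.
final class NormalizeSketch {

    static String normalize(String s) {
        // 1) Lowercase, independent of the default locale.
        String out = s.toLowerCase(Locale.ROOT);
        // 2) Replace German umlauts and sharp s (assumed transliteration).
        out = out.replace("\u00e4", "ae")
                 .replace("\u00f6", "oe")
                 .replace("\u00fc", "ue")
                 .replace("\u00df", "ss");
        // 3) Remove articles as whole words (assumed list), while word
        //    boundaries still exist.
        out = out.replaceAll("\\b(the|a|an|der|die|das)\\b", "");
        // 4) Remove everything that is not a lowercase letter or digit.
        out = out.replaceAll("[^a-z0-9]", "");
        return out;
    }

    public static void main(String[] args) {
        // \u00dc is U-umlaut, \u00df is sharp s.
        System.out.println(normalize("The \u00dcber-Stra\u00dfe!")); // prints ueberstrasse
    }
}
```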

isSimilarTo

public boolean isSimilarTo(java.lang.String compare)
Check whether the String is similar to another String. Two strings qualify as similar if 1) they are equal, 2) both are non-empty and one is contained in the other, or 3) their trigram metric is greater than 0.5.

Returns:
Whether it's similar.

similar

public static boolean similar(java.lang.String first,
                              java.lang.String second)
Check whether the Strings are similar. Two strings qualify as similar if 1) they are equal, 2) both are non-empty and one is contained in the other, or 3) their trigram metric is greater than 0.5.

Parameters:
first - First string.
second - Second string.
Returns:
Whether it's similar.
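
The three rules can be sketched as a short chain of checks. The first two follow the description directly. For the third, the sketch assumes a Dice coefficient over plain (unpadded) trigram sets, because the actual trigram metric is not specified here. SimilarSketch and trigramScore are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of databionics.text.
final class SimilarSketch {

    // Collects the distinct trigrams of s.
    static Set<String> trigrams(String s) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3));
        }
        return grams;
    }

    // Assumed trigram metric: Dice coefficient over trigram sets.
    static float trigramScore(String a, String b) {
        Set<String> gramsA = trigrams(a);
        Set<String> gramsB = trigrams(b);
        if (gramsA.isEmpty() && gramsB.isEmpty()) {
            return 0f;
        }
        Set<String> shared = new HashSet<>(gramsA);
        shared.retainAll(gramsB);
        return 2f * shared.size() / (gramsA.size() + gramsB.size());
    }

    static boolean similar(String first, String second) {
        // Rule 1: equal strings are similar.
        if (first.equals(second)) {
            return true;
        }
        // Rule 2: both non-empty and one contained in the other.
        if (!first.isEmpty() && !second.isEmpty()
                && (first.contains(second) || second.contains(first))) {
            return true;
        }
        // Rule 3: trigram metric strictly greater than 0.5.
        return trigramScore(first, second) > 0.5f;
    }

    public static void main(String[] args) {
        System.out.println(similar("database", "data"));        // rule 2
        System.out.println(similar("similarity", "similarly")); // rule 3
        System.out.println(similar("night", "nacht"));          // no rule applies
    }
}
```

In this sketch "similarity" and "similarly" share 5 of their 8 and 7 trigrams, giving 2 * 5 / 15, which is above the threshold, while "night" and "nacht" share none.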


Copyright © 2005-2006 Databionics Research Group. All Rights Reserved.