Approximate string matching cheat sheet

  1. Levenshtein distance : if the pattern is coil, foil differs by one substitution, coils by one insertion, oil by one deletion, and foal by two substitutions.
  2. Damerau–Levenshtein distance : Like Levenshtein but including transpositions among its allowable operations.
  3. Jaro–Winkler distance : designed and best suited for short strings such as person names.
  4. Smith–Waterman algorithm : performs local sequence alignment for determining similar regions between two strings, instead of looking at the total sequence.
  5. Needleman–Wunsch algorithm : divides a large sequence into a series of smaller problems and uses the solutions to the smaller problems to reconstruct a solution to the larger problem.
  6. Soundex : a phonetic algorithm for indexing names by sound, as pronounced in English.
  7. Metaphone : improves on Soundex by using variations and inconsistencies in English spelling and pronunciation.

No comments: