    use Text::Levenshtein qw(distance);

    print distance("foo","four");
    # prints "2"

    my @words     = qw/ four foo bar /;
    my @distances = distance("foo",@words);

    print "@distances";
    # prints "2 0 3"
To learn more about the Levenshtein metric, have a look at the Wikipedia page <http://en.wikipedia.org/wiki/Levenshtein_distance>.
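As a rough illustration of the metric itself, here is a minimal pure-Perl sketch of the usual unit-cost dynamic programming approach. The name `edit_distance` is a hypothetical helper made up for this example; it is not part of this module's interface and is not necessarily how the module computes its result.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper, not part of Text::Levenshtein: unit-cost edit
# distance, computed one row of the table at a time.
sub edit_distance {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my $n = scalar @t;

    # Distance from the empty prefix of $s to each prefix of $t.
    my @prev = (0 .. $n);

    for my $i (1 .. scalar @s) {
        # Turning the first $i characters of $s into '' takes $i deletions.
        my @cur = ($i);
        for my $j (1 .. $n) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution, or a free match
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

print edit_distance('foo', 'four'), "\n";      # prints 2
print edit_distance('brown', 'green'), "\n";   # prints 3
```

Keeping only two rows keeps memory proportional to the length of the second string rather than to the product of both lengths.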
    $distance = distance('brown', 'green');
    # returns 3, as 'r' and 'n' don't change
Instead of a single second string, you can pass a list of strings. Each string will be compared to the first string passed, and a list of the edit distances returned:
    @words     = qw/ green trainee brains /;
    @distances = distance('brown', @words);
    # returns (3, 5, 3)
Unlike "distance()", "fastdistance()" only takes two strings, and returns the edit distance between them.
    use Text::Levenshtein 0.11 qw/ distance /;

    $distance = distance($word1, $word2, {ignore_diacritics => 1});
If you turn on this option, then Unicode::Collate will be loaded, and used when comparing characters in the words.
Early versions of "Text::Levenshtein" didn't support this option, so you should require version 0.11 or later, as above.
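To give a feel for what a collation-based character comparison looks like, here is a small sketch using Unicode::Collate directly. The collator settings (`level => 1`, no normalization) and the helper name `same_char` are assumptions chosen for illustration; the module's own configuration may differ. Note that a level-1 collator ignores case as well as diacritics.

```perl
use strict;
use warnings;
use Unicode::Collate;

# Assumption: a level-1 collator, which compares base letters only and
# therefore treats accented and unaccented forms (and case) as equal.
my $collator = Unicode::Collate->new(normalization => undef, level => 1);

# Hypothetical helper: the character test a distance loop could use
# instead of a plain "eq".
sub same_char {
    my ($x, $y) = @_;
    return $collator->eq($x, $y) ? 1 : 0;
}

# "\x{E9}" is LATIN SMALL LETTER E WITH ACUTE.
printf "plain eq:  %d\n", ("\x{E9}" eq 'e') ? 1 : 0;
printf "same_char: %d\n", same_char("\x{E9}", 'e');
```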
Text::LevenshteinXS and Text::Levenshtein::XS are both implementations of the Levenshtein algorithm that require a C compiler, but they will be a lot faster than this module.
The Damerau-Levenshtein edit distance is like the Levenshtein distance, but in addition to insertion, deletion and substitution, it also considers the transposition of two adjacent characters to be a single edit. The module Text::Levenshtein::Damerau defaults to using a pure Perl implementation, but if you've installed Text::Levenshtein::Damerau::XS then it will be a lot quicker.
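The effect of counting a transposition as one edit can be sketched as follows. This is the restricted "optimal string alignment" variant, written from scratch for illustration; `osa_distance` is a hypothetical name, and the code is not taken from Text::Levenshtein::Damerau, which may implement the unrestricted form.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: optimal string alignment distance, i.e. Levenshtein
# plus a single-edit swap of two adjacent characters.
sub osa_distance {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my ($m, $n) = (scalar @s, scalar @t);

    my @d;
    $d[$_][0] = $_ for 0 .. $m;    # delete everything
    $d[0][$_] = $_ for 0 .. $n;    # insert everything

    for my $i (1 .. $m) {
        for my $j (1 .. $n) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $d[$i][$j] = min(
                $d[$i - 1][$j] + 1,            # deletion
                $d[$i][$j - 1] + 1,            # insertion
                $d[$i - 1][$j - 1] + $cost,    # substitution or match
            );
            # Adjacent transposition: "ab" in $s lines up with "ba" in $t.
            if (   $i > 1 && $j > 1
                && $s[$i - 1] eq $t[$j - 2]
                && $s[$i - 2] eq $t[$j - 1])
            {
                $d[$i][$j] = min($d[$i][$j], $d[$i - 2][$j - 2] + 1);
            }
        }
    }
    return $d[$m][$n];
}

print osa_distance('abcd', 'acbd'), "\n";   # prints 1
```

Under the plain Levenshtein distance the same pair costs 2, since the swap has to be expressed as two substitutions.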
Text::WagnerFischer is an implementation of the Wagner-Fischer edit distance, which is similar to the Levenshtein distance, but applies different weights to each edit type.
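A weighted edit distance can be sketched by replacing the fixed cost of 1 with a per-operation weight. The function name `weighted_distance` and its hash-based weight argument are invented for this example and do not reflect the interface of Text::WagnerFischer.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: edit distance with a separate weight for each
# operation. Weights not supplied by the caller default to 1.
sub weighted_distance {
    my ($s, $t, $weights) = @_;
    my %w = (insert => 1, delete => 1, substitute => 1, %{ $weights || {} });
    my @s = split //, $s;
    my @t = split //, $t;
    my $n = scalar @t;

    # Building each prefix of $t from '' costs one insertion per character.
    my @prev = map { $_ * $w{insert} } 0 .. $n;

    for my $i (1 .. scalar @s) {
        my @cur = ($i * $w{delete});
        for my $j (1 .. $n) {
            my $sub = $s[$i - 1] eq $t[$j - 1] ? 0 : $w{substitute};
            push @cur, min(
                $prev[$j] + $w{delete},
                $cur[$j - 1] + $w{insert},
                $prev[$j - 1] + $sub,
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

print weighted_distance('foo', 'four'), "\n";                       # prints 2
print weighted_distance('foo', 'four', { substitute => 2 }), "\n";  # prints 3
```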
Text::Brew is an implementation of the Brew edit distance, which is another algorithm based on edit weights.
Text::Fuzzy provides a number of operations for partial or fuzzy matching of text based on edit distance. Text::Fuzzy::PP is a pure Perl implementation of the same interface.
String::Similarity takes two strings and returns a value between 0 (meaning entirely different) and 1 (meaning identical). It appears to be based on edit distance.
Text::Dice calculates Dice's coefficient <https://en.wikipedia.org/wiki/Sørensen–Dice_coefficient> for two strings. This formula was originally developed to measure the similarity of two different populations in ecological research.
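For comparison with the edit-distance measures above, here is a minimal sketch of Dice's coefficient applied to strings. It assumes the strings are compared as multisets of character bigrams, which is a common choice but not confirmed to be what Text::Dice does; `bigrams` and `dice_coefficient` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: all overlapping two-character substrings of $s.
sub bigrams {
    my ($s) = @_;
    return map { substr($s, $_, 2) } 0 .. length($s) - 2;
}

# Hypothetical helper: 2 * shared / (total bigrams in both strings).
sub dice_coefficient {
    my ($s, $t) = @_;
    my @x = bigrams($s);
    my @y = bigrams($t);
    return 0 unless @x && @y;    # strings shorter than 2 have no bigrams

    my %count;
    $count{$_}++ for @x;

    my $shared = 0;
    for my $pair (@y) {
        # Each bigram in $s can be matched at most once.
        if ($count{$pair}) {
            $count{$pair}--;
            $shared++;
        }
    }
    return 2 * $shared / (@x + @y);
}

print dice_coefficient('night', 'nacht'), "\n";   # prints 0.25
```

Unlike an edit distance, the result is a similarity: 1 means the bigram multisets are identical and 0 means they share nothing.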
Josh Goldberg then took over maintenance and released versions between 2004 and 2008.
Neil Bowers (NEILB on CPAN) is now maintaining this module. Version 0.07 was a complete rewrite, based on one of the algorithms on the Wikipedia page.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.