Safe Haskell | None |
---|---|
Language | Haskell2010 |
Synopsis
- oneShot :: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) => (QuickSearch uid2 -> Int -> Scorer -> String -> [Match Score (Entry String uid2)]) -> Int -> [(String, uid1)] -> [(String, uid2)] -> Scorer -> [(Entry String uid1, [Match Score (Entry String uid2)])]
- oneShotTopNMatches :: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) => Int -> [(String, uid1)] -> [(String, uid2)] -> Scorer -> [(Entry String uid1, [Match Score (Entry String uid2)])]
- oneShotMatchesWithThreshold :: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) => Int -> [(String, uid1)] -> [(String, uid2)] -> Scorer -> [(Entry String uid1, [Match Score (Entry String uid2)])]
- damerauLevenshteinNorm :: Text -> Text -> Ratio Int
- jaro :: Text -> Text -> Ratio Int
- jaroWinkler :: Text -> Text -> Ratio Int
Documentation
oneShot
:: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) | |
=> (QuickSearch uid2 -> Int -> Scorer -> String -> [Match Score (Entry String uid2)]) | Match retrieval function to be converted into a one-shot batch function |
-> Int | Int argument passed through to the match retrieval function (for example, N for topNMatches or the score threshold for matchesWithThreshold). |
-> [(String, uid1)] | List of entries to be processed |
-> [(String, uid2)] | List of entries making up the search space |
-> Scorer | Similarity function with type (Text -> Text -> Ratio Int) |
-> [(Entry String uid1, [Match Score (Entry String uid2)])] | List of entries and their matches. |
Turn a match retrieval function into a one-shot batch function. Instead of creating a QuickSearch for reuse, this creates it in the background and discards it when done.
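The one-shot pattern can be sketched without the library. In the block below, `Index`, `SimpleScorer`, `oneShotSketch`, `firstNScored` and `exactScorer` are hypothetical stand-ins, not part of quicksearch: the "index" is a plain list rather than a real QuickSearch, and scores are plain `Int`s. The only point illustrated is that the index is built once from the search space, shared by every query, and dropped afterwards. Load it into GHCi to try it.

```haskell
-- Hypothetical stand-in for a QuickSearch: it just holds the search-space
-- entries (the real QuickSearch is a proper lookup structure).
newtype Index uid = Index [(String, uid)]

-- Hypothetical simplified scorer producing an Int score.
type SimpleScorer = String -> String -> Int

-- Generic one-shot wrapper: build the index once from the search space,
-- run the retrieval function for every entry to be processed, and let the
-- index become garbage once the result has been consumed.
oneShotSketch
  :: (Index uid2 -> Int -> SimpleScorer -> String -> [(Int, (String, uid2))])
  -> Int
  -> [(String, uid1)]
  -> [(String, uid2)]
  -> SimpleScorer
  -> [((String, uid1), [(Int, (String, uid2))])]
oneShotSketch retrieve n entries searchSpace scorer =
  [ (entry, retrieve index n scorer name) | entry@(name, _) <- entries ]
  where
    index = Index searchSpace -- a single shared value for all queries

-- Toy retrieval function: score candidates in order and keep the first n.
firstNScored :: Index uid -> Int -> SimpleScorer -> String -> [(Int, (String, uid))]
firstNScored (Index candidates) n scorer query =
  take n [ (scorer query name, c) | c@(name, _) <- candidates ]

-- Toy scorer: 100 for identical strings, 0 otherwise.
exactScorer :: SimpleScorer
exactScorer a b = if a == b then 100 else 0
```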
oneShotTopNMatches
:: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) | |
=> Int | N: Number of matches to return |
-> [(String, uid1)] | List of entries to be processed |
-> [(String, uid2)] | List of entries making up the search space |
-> Scorer | Similarity function with type (Text -> Text -> Ratio Int) |
-> [(Entry String uid1, [Match Score (Entry String uid2)])] | List of entries and up to N of the best matches. |
One-shot version of topNMatches. Builds the QuickSearch in the background and discards it when finished.
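The "up to N best matches" contract can be illustrated with a brute-force sketch. `topNSketch` and `prefixScore` are hypothetical names, scores are assumed to be plain `Int`s, and the real topNMatches works on a prebuilt QuickSearch rather than on a bare list, so treat this only as a description of the result shape.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical brute-force "top N": score every candidate, sort by
-- descending score, keep at most n. Ties keep their input order because
-- sortBy is stable.
topNSketch
  :: Int -> (String -> String -> Int) -> String -> [(String, uid)]
  -> [(Int, (String, uid))]
topNSketch n scorer query candidates =
  take n (sortBy (comparing (Down . fst)) scored)
  where
    scored = [ (scorer query name, c) | c@(name, _) <- candidates ]

-- Toy scorer: length of the common prefix of the two strings.
prefixScore :: String -> String -> Int
prefixScore a b = length (takeWhile id (zipWith (==) a b))
```

Asking for more matches than there are candidates simply returns all of them, which is what "up to N" means here.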
oneShotMatchesWithThreshold
:: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) | |
=> Int | Score threshold above which to return matches |
-> [(String, uid1)] | List of entries to be processed |
-> [(String, uid2)] | List of entries making up the search space |
-> Scorer | Similarity function with type (Text -> Text -> Ratio Int) |
-> [(Entry String uid1, [Match Score (Entry String uid2)])] | List of entries and their matches above the score threshold. |
One-shot version of matchesWithThreshold. Builds the QuickSearch in the background and discards it when finished.
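A brute-force sketch of "matches above the score threshold" follows. It assumes a 0-100 `Int` score and a strict `>` comparison; both are assumptions, and the real matchesWithThreshold may treat the boundary differently. `thresholdSketch` and `percentScore` are hypothetical names.

```haskell
-- Hypothetical brute-force threshold filter: keep every candidate whose
-- score is strictly above the threshold, in input order.
thresholdSketch
  :: Int -> (String -> String -> Int) -> String -> [(String, uid)]
  -> [(Int, (String, uid))]
thresholdSketch threshold scorer query candidates =
  [ (s, c) | c@(name, _) <- candidates, let s = scorer query name, s > threshold ]

-- Toy 0-100 scorer: share of positions, relative to the longer string,
-- where both strings hold the same character (rounded down).
percentScore :: String -> String -> Int
percentScore a b
  | longest == 0 = 100
  | otherwise    = 100 * agree `div` longest
  where
    longest = max (length a) (length b)
    agree   = length (filter id (zipWith (==) a b))
```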
damerauLevenshteinNorm :: Text -> Text -> Ratio Int
Return normalized Damerau-Levenshtein distance between two Text values. 0 signifies no similarity between the strings, while 1 means exact match.
See also: https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance.
Heads up, before version 0.3.0 this function returned Ratio Natural.
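A minimal sketch of the normalization described above, assuming the restricted (optimal string alignment) variant of Damerau-Levenshtein and normalization by the length of the longer string. It works on String rather than Text, and `osaDistance` and `damerauLevenshteinNormSketch` are hypothetical names; text-metrics may implement a different variant, so results are not guaranteed to match damerauLevenshteinNorm on every input.

```haskell
import Data.Array (array, listArray, (!))
import Data.Ratio (Ratio, (%))

-- Restricted Damerau-Levenshtein (optimal string alignment) distance,
-- computed with a lazily filled dynamic-programming table.
osaDistance :: String -> String -> Int
osaDistance a b = table ! (m, n)
  where
    m = length a
    n = length b
    xa = listArray (1, m) a
    xb = listArray (1, n) b
    table = array ((0, 0), (m, n)) [ ((i, j), cell i j) | i <- [0 .. m], j <- [0 .. n] ]
    cell 0 j = j
    cell i 0 = i
    cell i j =
      let cost = if xa ! i == xb ! j then 0 else 1
          best = minimum
            [ table ! (i - 1, j) + 1        -- deletion
            , table ! (i, j - 1) + 1        -- insertion
            , table ! (i - 1, j - 1) + cost -- substitution (or match)
            ]
      in if i > 1 && j > 1 && xa ! i == xb ! (j - 1) && xa ! (i - 1) == xb ! j
           then min best (table ! (i - 2, j - 2) + 1) -- adjacent transposition
           else best

-- 1 - distance / longer length; two empty strings count as an exact match.
damerauLevenshteinNormSketch :: String -> String -> Ratio Int
damerauLevenshteinNormSketch a b
  | longest == 0 = 1
  | otherwise    = 1 - osaDistance a b % longest
  where
    longest = max (length a) (length b)
```

With this sketch, "abcd" against "abdc" is one transposition apart and scores 3 % 4.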
jaro :: Text -> Text -> Ratio Int
Return Jaro distance between two Text values. The returned value is in the range from 0 (no similarity) to 1 (exact match).
While the algorithm is clear enough for artificial examples (like those in the linked Wikipedia article), for arbitrary strings it may be hard to decide which of the two strings should be considered the one with the “reference” order of characters (the order of matching characters is an essential part of the definition of the algorithm). We therefore consider the first string the “reference” string (the one with the correct order of characters). Thus, in general,
jaro a b ≠ jaro b a
This asymmetry can be found in all implementations of the algorithm on the internet, AFAIK.
See also: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
Heads up, before version 0.3.0 this function returned Ratio Natural.
Since: text-metrics-0.2.0
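A rough sketch of the Jaro computation that follows the convention above of treating the first argument as the reference: each of its characters, left to right, greedily claims the first unclaimed equal character of the second string inside the match window (half the longer length, minus one). This is not the text-metrics code; it works on String, `jaroSketch` is a hypothetical name, and edge cases may differ from jaro.

```haskell
import Data.List (sort)
import Data.Ratio (Ratio, (%))

-- Jaro similarity sketch: (m/|a| + m/|b| + (m - t)/m) / 3, where m is the
-- number of matched characters and t is the number of transpositions.
jaroSketch :: String -> String -> Ratio Int
jaroSketch a b
  | null a && null b = 1 -- two empty strings: treat as an exact match
  | m == 0           = 0 -- no matching characters at all
  | otherwise        = (m % la + m % lb + (m - t) % m) / 3
  where
    la = length a
    lb = length b
    -- Characters match only if their positions differ by at most this much.
    window = max 0 (max la lb `div` 2 - 1)
    -- (index in a, index in b) for every matched pair, in order of a.
    pairs = claim (zip [0 ..] a) []
    claim :: [(Int, Char)] -> [(Int, Int)] -> [(Int, Int)]
    claim [] acc = reverse acc
    claim ((i, c) : rest) acc =
      case [ j | (j, d) <- zip [0 ..] b
               , d == c
               , abs (i - j) <= window
               , j `notElem` map snd acc ] of
        (j : _) -> claim rest ((i, j) : acc)
        []      -> claim rest acc
    m = length pairs
    -- Matched characters in the order they occur in a and in b.
    inA = [ a !! i | (i, _) <- pairs ]
    inB = [ b !! j | j <- sort (map snd pairs) ]
    -- Half the number of positions where the two orders disagree.
    t = length (filter id (zipWith (/=) inA inB)) `div` 2
```

For example, MARTHA against MARHTA gives 17 % 18 and DWAYNE against DUANE gives 37 % 45, the usual textbook values. Because the first argument drives the greedy claiming, swapping the arguments can change the result on some inputs.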
jaroWinkler :: Text -> Text -> Ratio Int
Return Jaro-Winkler distance between two Text values. The returned value is in the range from 0 (no similarity) to 1 (exact match).
See also: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
Heads up, before version 0.3.0 this function returned Ratio Natural.
Since: text-metrics-0.2.0
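Jaro-Winkler adds a bonus for a shared prefix to a Jaro score j: jw = j + l * p * (1 - j), where l is the common prefix length. The sketch below assumes the commonly cited parameters p = 1/10 and l capped at 4, and takes an already computed Jaro score as input; `winklerBoost` is a hypothetical name, and text-metrics may differ in details.

```haskell
import Data.Ratio (Ratio, (%))

-- Winkler's prefix boost applied to a precomputed Jaro score (sketch),
-- assuming scaling factor p = 1/10 and a prefix counted up to 4 characters.
winklerBoost :: Ratio Int -> String -> String -> Ratio Int
winklerBoost jaroScore a b = jaroScore + (l % 10) * (1 - jaroScore)
  where
    -- Length of the common prefix, capped at 4.
    l = min 4 (length (takeWhile id (zipWith (==) a b)))
```

Applied to the Jaro score 17 % 18 of MARTHA and MARHTA (common prefix MAR), it gives 173 % 180, roughly 0.961.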