Safe Haskell | None |
---|---|
Language | Haskell2010 |
Synopsis
- buildQuickSearch :: (Hashable uid, Eq uid) => [(String, uid)] -> QuickSearch uid
- rawBuildQuickSearch :: (Hashable uid, Eq uid) => [Entry Text uid] -> QuickSearch uid
- topNMatches :: (Hashable uid, Eq uid) => QuickSearch uid -> Int -> Scorer -> String -> [Match Score (Entry String uid)]
- matchesWithThreshold :: (Hashable uid, Eq uid) => QuickSearch uid -> Int -> Scorer -> String -> [Match Score (Entry String uid)]
- batch :: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) => (QuickSearch uid2 -> Int -> Scorer -> String -> [Match Score (Entry String uid2)]) -> QuickSearch uid2 -> Int -> Scorer -> [(String, uid1)] -> [(Entry String uid1, [Match Score (Entry String uid2)])]
- batchTopNMatches :: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) => QuickSearch uid2 -> Int -> Scorer -> [(String, uid1)] -> [(Entry String uid1, [Match Score (Entry String uid2)])]
- batchMatchesWithThreshold :: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) => QuickSearch uid2 -> Int -> Scorer -> [(String, uid1)] -> [(Entry String uid1, [Match Score (Entry String uid2)])]
- type Token = Text
- newtype Entry name uid = Entry (name, uid)
- type Score = Int
- type Scorer = Text -> Text -> Ratio Int
- data Match score entry
- newtype QuickSearch uid = QuickSearch ([Entry Text uid], HashMap Token (HashSet uid))
- damerauLevenshteinNorm :: Text -> Text -> Ratio Int
- jaro :: Text -> Text -> Ratio Int
- jaroWinkler :: Text -> Text -> Ratio Int
Documentation
buildQuickSearch
:: (Hashable uid, Eq uid) | |
=> [(String, uid)] | List of entries to be searched |
-> QuickSearch uid | QuickSearch object holding token partitions |
Given a list of pairs of (String, uid) to be searched, create a QuickSearch object.
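Here is a minimal sketch of what the token partition built by buildQuickSearch could look like. It assumes that names are tokenized by lowercasing and splitting on whitespace, which this documentation does not confirm. It also uses Data.Map and Data.Set from containers in place of HashMap and HashSet, so it runs with GHC's boot packages alone. The names tokenize and buildIndex are hypothetical and not part of the QuickSearch API.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical tokenizer (an assumption, not the library's):
-- lowercase the name, then split it on whitespace.
tokenize :: Text -> [Text]
tokenize = T.words . T.toLower

-- Hypothetical index builder: map each token to the set of UIDs
-- whose names contain that token.
buildIndex :: Ord uid => [(String, uid)] -> Map.Map Text (Set.Set uid)
buildIndex entries =
  Map.fromListWith Set.union
    [ (tok, Set.singleton uid)
    | (name, uid) <- entries
    , tok <- tokenize (T.pack name)
    ]

main :: IO ()
main = print (buildIndex [("Acme Corp", 1 :: Int), ("Acme Ltd", 2), ("Globex Corp", 3)])
```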
rawBuildQuickSearch
:: (Hashable uid, Eq uid) | |
=> [Entry Text uid] | List of entries to be searched |
-> QuickSearch uid | QuickSearch object holding token partitions |
Given a list of entries to be searched, create a QuickSearch object.
topNMatches
:: (Hashable uid, Eq uid) | |
=> QuickSearch uid | QuickSearch object holding token partitions |
-> Int | N: Number of results to return |
-> Scorer | String similarity function of type (Text -> Text -> Ratio Int) |
-> String | String to be searched |
-> [Match Score (Entry String uid)] | Top N most similar entries |
Given a QuickSearch object, scorer, and string, return the top N matches.
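The selection step can be sketched as follows: score each candidate with the scorer, sort by descending score, and keep the first N. This is a self-contained illustration, not the library's code. It skips the token-partition step and scores every candidate. It also assumes that a Ratio Int similarity becomes an Int Score by scaling to 0-100 and rounding, because the documentation only says that Score = Int. The demo uses a toy prefix scorer, and toScore, prefixScorer, and topN are hypothetical names.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))
import Data.Ratio (Ratio, (%))
import Data.Text (Text)
import qualified Data.Text as T

-- Same shape as the documented Scorer type.
type Scorer = Text -> Text -> Ratio Int

-- Assumption: a similarity ratio in [0, 1] is scaled to an Int score in
-- [0, 100]. The library's actual conversion is not documented here.
toScore :: Ratio Int -> Int
toScore r = round (r * 100)

-- Toy scorer for the demo: length of the common prefix divided by the
-- length of the longer string (1 when both strings are empty).
prefixScorer :: Scorer
prefixScorer a b
  | longest == 0 = 1
  | otherwise = common % longest
  where
    common = length (takeWhile (uncurry (==)) (T.zip a b))
    longest = max (T.length a) (T.length b)

-- Hypothetical top-N selection: score every candidate against the query,
-- sort by descending score (stable for ties), and keep the first n.
topN :: Int -> Scorer -> String -> [(String, uid)] -> [(Int, (String, uid))]
topN n scorer query candidates =
  take n . sortOn (Down . fst) $
    [ (toScore (scorer (T.pack query) (T.pack name)), entry)
    | entry@(name, _) <- candidates
    ]

main :: IO ()
main = print (topN 2 prefixScorer "acme" [("acme", 1 :: Int), ("acme corp", 2), ("globex", 3)])
```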
matchesWithThreshold
:: (Hashable uid, Eq uid) | |
=> QuickSearch uid | QuickSearch object holding token partitions |
-> Int | Threshold score above which to return results |
-> Scorer | String similarity function of type (Text -> Text -> Ratio Int) |
-> String | String to be searched |
-> [Match Score (Entry String uid)] | All entries with a score above the threshold |
Given a QuickSearch object, scorer, and string, return all matches with a score greater than the given threshold.
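Threshold filtering differs from top-N selection only in how results are kept. Every candidate whose score exceeds the threshold survives, and the number of results is not capped. The sketch below assumes, for illustration only, that Ratio Int similarities are scaled to Int scores in 0-100. It uses a toy word-level Jaccard scorer. The names toScore, wordJaccard, and aboveThreshold are hypothetical, not QuickSearch functions.

```haskell
import Data.Ratio (Ratio, (%))
import qualified Data.Set as Set
import Data.Text (Text)
import qualified Data.Text as T

-- Same shape as the documented Scorer type.
type Scorer = Text -> Text -> Ratio Int

-- Assumption: similarity ratios in [0, 1] map to Int scores in [0, 100].
toScore :: Ratio Int -> Int
toScore r = round (r * 100)

-- Toy scorer: Jaccard similarity of the lowercase word sets
-- (1 when both strings have no words).
wordJaccard :: Scorer
wordJaccard a b
  | Set.null union = 1
  | otherwise = Set.size common % Set.size union
  where
    wa = Set.fromList (T.words (T.toLower a))
    wb = Set.fromList (T.words (T.toLower b))
    common = Set.intersection wa wb
    union = Set.union wa wb

-- Hypothetical threshold filter: keep every candidate whose score is
-- strictly greater than the threshold, in input order.
aboveThreshold :: Int -> Scorer -> String -> [(String, uid)] -> [(Int, (String, uid))]
aboveThreshold threshold scorer query candidates =
  [ (score, entry)
  | entry@(name, _) <- candidates
  , let score = toScore (scorer (T.pack query) (T.pack name))
  , score > threshold
  ]

main :: IO ()
main = print (aboveThreshold 30 wordJaccard "acme corp" [("Acme Corp", 1 :: Int), ("Acme Ltd", 2), ("Globex", 3)])
```

This documentation does not say whether the library sorts threshold results by score. The sketch simply keeps the input order.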
batch
:: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) | |
=> (QuickSearch uid2 -> Int -> Scorer -> String -> [Match Score (Entry String uid2)]) | A match retrieval function, such as topNMatches |
-> QuickSearch uid2 | QuickSearch object holding token partitions |
-> Int | Reference value passed to the match retrieval function: N for topNMatches, the threshold for matchesWithThreshold |
-> Scorer | String similarity function of type (Text -> Text -> Ratio Int) |
-> [(String, uid1)] | List of entries to be processed |
-> [(Entry String uid1, [Match Score (Entry String uid2)])] | List of entries and the results returned for each. |
Turn a match retrieval function into one that works on lists of entries.
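The lifting that batch performs can be sketched generically. The sketch runs the single-query function once per input pair, uses the pair's name as the query, and keeps each pair next to its results. It is hypothetical: batchSketch and prefixRetrieve are invented for illustration. It also returns plain (String, uid) pairs instead of Entry values, and it leaves the search structure, scorer, and match types abstract so that it compiles without the library.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical stand-in for batch. The search structure qs, scorer sc,
-- and match type m are left abstract so the sketch type-checks on its own.
batchSketch
  :: (qs -> Int -> sc -> String -> [m]) -- single-query retrieval function
  -> qs                                 -- search structure
  -> Int                                -- N or threshold, passed through unchanged
  -> sc                                 -- scorer, passed through unchanged
  -> [(String, uid)]                    -- queries paired with their own UIDs
  -> [((String, uid), [m])]             -- each query next to its results
batchSketch retrieve qs ref scorer queries =
  [ (query, retrieve qs ref scorer name) | query@(name, _) <- queries ]

-- Toy retrieval function for the demo: the first n pool strings that start
-- with the query. It needs no scorer, so the scorer argument is just ().
prefixRetrieve :: [String] -> Int -> () -> String -> [String]
prefixRetrieve pool n () q = take n (filter (q `isPrefixOf`) pool)

main :: IO ()
main = print (batchSketch prefixRetrieve ["apple", "apricot", "banana"] 1 () [("ap", 'a'), ("ba", 'b')])
```

Going by the documented signatures, batch topNMatches has exactly the type of batchTopNMatches. Likewise, batch matchesWithThreshold has exactly the type of batchMatchesWithThreshold.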
batchTopNMatches
:: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) | |
=> QuickSearch uid2 | QuickSearch object holding token partitions |
-> Int | N: Number of results to return |
-> Scorer | String similarity function of type (Text -> Text -> Ratio Int) |
-> [(String, uid1)] | List of entries to be processed |
-> [(Entry String uid1, [Match Score (Entry String uid2)])] | List of entries and up to the top N matches for each. |
Version of topNMatches that processes lists of entries instead of strings.
batchMatchesWithThreshold
:: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) | |
=> QuickSearch uid2 | QuickSearch object holding token partitions |
-> Int | Threshold score above which to return results |
-> Scorer | String similarity function of type (Text -> Text -> Ratio Int) |
-> [(String, uid1)] | List of entries to be processed |
-> [(Entry String uid1, [Match Score (Entry String uid2)])] | List of entries and their matches above the score threshold. |
Version of matchesWithThreshold that processes lists of entries instead of strings.
newtype Entry name uid
Structure associating a name with its unique identifier.
Entry (name, uid)
data Match score entry
Structure associating a Score with an Entry, used to hold search results.
newtype QuickSearch uid
A list of the entries to be searched, paired with a HashMap that associates each token with the HashSet of UIDs of the entries containing that token.
Instances
Show uid => Show (QuickSearch uid)
Defined in QuickSearch.Internal.Matcher
showsPrec :: Int -> QuickSearch uid -> ShowS
show :: QuickSearch uid -> String
showList :: [QuickSearch uid] -> ShowS
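This documentation does not say how the token map is used during a search. One plausible use, assumed here rather than confirmed, is to restrict scoring to entries that share at least one token with the query. The sketch uses Data.Map and Data.Set in place of HashMap and HashSet. The name candidateUids, the lowercase-and-whitespace tokenization, and demoIndex are all hypothetical.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical candidate lookup: the union of the UID sets for every
-- query token. Tokens missing from the index contribute nothing.
candidateUids :: Ord uid => Map.Map Text (Set.Set uid) -> Text -> Set.Set uid
candidateUids index query =
  Set.unions [Map.findWithDefault Set.empty tok index | tok <- T.words (T.toLower query)]

-- Small hand-built index for the demo.
demoIndex :: Map.Map Text (Set.Set Int)
demoIndex =
  Map.fromList
    [ (T.pack "acme", Set.fromList [1, 2])
    , (T.pack "corp", Set.fromList [1, 3])
    , (T.pack "globex", Set.fromList [3])
    ]

main :: IO ()
main = print (candidateUids demoIndex (T.pack "ACME Industries"))
```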
damerauLevenshteinNorm :: Text -> Text -> Ratio Int
Return normalized Damerau-Levenshtein distance between two Text values. 0 signifies no similarity between the strings, while 1 means an exact match.
See also: https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance.
Heads up: before version 0.3.0, this function returned Ratio Natural.
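The following naive sketch makes the normalization concrete. It computes the optimal string alignment distance, a restricted form of Damerau-Levenshtein in which no substring is edited more than once. As an assumption, it normalizes that distance as 1 - d / max(|a|, |b|) and returns 1 for identical strings. It is not text-metrics' implementation and may disagree with it on some inputs. The names osaDistance and damerauLevenshteinNormSketch are hypothetical.

```haskell
import Data.Array (listArray, range, (!))
import Data.Ratio (Ratio, (%))
import Data.Text (Text)
import qualified Data.Text as T

-- Optimal string alignment distance: insertions, deletions, substitutions,
-- and swaps of two adjacent characters each cost 1. Naive O(m * n) table,
-- filled lazily through a self-referencing boxed array.
osaDistance :: Text -> Text -> Int
osaDistance a b = table ! (m, n)
  where
    m = T.length a
    n = T.length b
    xs = listArray (1, m) (T.unpack a)
    ys = listArray (1, n) (T.unpack b)
    table = listArray ((0, 0), (m, n)) [cell i j | (i, j) <- range ((0, 0), (m, n))]
    cell i 0 = i
    cell 0 j = j
    cell i j =
      let cost = if xs ! i == ys ! j then 0 else 1
          best =
            minimum
              [ table ! (i - 1, j) + 1 -- deletion
              , table ! (i, j - 1) + 1 -- insertion
              , table ! (i - 1, j - 1) + cost -- substitution or match
              ]
       in if i > 1 && j > 1 && xs ! i == ys ! (j - 1) && xs ! (i - 1) == ys ! j
            then min best (table ! (i - 2, j - 2) + 1) -- adjacent transposition
            else best

-- Assumed normalization: 1 for an exact match, otherwise
-- 1 - distance / length of the longer string.
damerauLevenshteinNormSketch :: Text -> Text -> Ratio Int
damerauLevenshteinNormSketch a b
  | d == 0 = 1
  | otherwise = 1 - d % max (T.length a) (T.length b)
  where
    d = osaDistance a b

main :: IO ()
main = print (damerauLevenshteinNormSketch (T.pack "kitten") (T.pack "sitting"))
```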
jaro :: Text -> Text -> Ratio Int
Return Jaro distance between two Text values. The returned value is in the range from 0 (no similarity) to 1 (exact match).
While the algorithm is pretty clear for artificial examples (like those from the linked Wikipedia article), for arbitrary strings it may be hard to decide which of the two strings should be considered the one with the “reference” order of characters (the order of matching characters is an essential part of the definition of the algorithm). For this reason, we consider the first string the “reference” string (the one with the correct order of characters). Thus, in general,
jaro a b ≠ jaro b a
This asymmetry can be found in all implementations of the algorithm on the internet, AFAIK.
See also: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
Heads up: before version 0.3.0, this function returned Ratio Natural.
Since: text-metrics-0.2.0
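The textbook Jaro formula can be sketched with exact Ratio Int arithmetic. The formula is sim = (m/|a| + m/|b| + (m - t)/m) / 3, with sim = 0 when m = 0. Here m is the number of matching characters at most floor(max(|a|, |b|) / 2) - 1 positions apart, and t is half the number of matched characters that appear in a different order. Like the description above, the sketch scans the first string and treats it as the reference. The name jaroSketch is hypothetical, and this is a straightforward textbook version, not text-metrics' code.

```haskell
import Data.List (foldl')
import Data.Ratio (Ratio, (%))
import qualified Data.Set as Set
import Data.Text (Text)
import qualified Data.Text as T

-- Textbook Jaro similarity with exact rational arithmetic. The first
-- string is scanned left to right and treated as the reference order.
jaroSketch :: Text -> Text -> Ratio Int
jaroSketch a b
  | m == 0 = 0
  | otherwise = (m % la + m % lb + (2 * m - k) % (2 * m)) / 3
  where
    s1 = T.unpack a
    s2 = T.unpack b
    la = length s1
    lb = length s2
    -- Characters match only if they are at most this many positions apart.
    window = max 0 (max la lb `div` 2 - 1)
    -- Greedily pair each character of s1 with the first unused equal
    -- character of s2 inside the window; remember which s2 indices are used.
    (revMatched1, used2) = foldl' step ([], Set.empty) (zip [0 ..] s1)
    step (acc, used) (i, c) =
      case [j | (j, d) <- zip [0 ..] s2, abs (i - j) <= window, d == c, not (Set.member j used)] of
        (j : _) -> (c : acc, Set.insert j used)
        [] -> (acc, used)
    matched1 = reverse revMatched1
    -- Matched characters of s2, in s2's own order.
    matched2 = [d | (j, d) <- zip [0 ..] s2, Set.member j used2]
    m = length matched1
    -- k counts positions where the two matched sequences disagree;
    -- t = k / 2, so (m - t) / m = (2m - k) / (2m).
    k = length (filter id (zipWith (/=) matched1 matched2))

main :: IO ()
main = print (jaroSketch (T.pack "MARTHA") (T.pack "MARHTA"))
```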
jaroWinkler :: Text -> Text -> Ratio Int
Return Jaro-Winkler distance between two Text values. The returned value is in the range from 0 (no similarity) to 1 (exact match).
See also: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
Heads up: before version 0.3.0, this function returned Ratio Natural.
Since: text-metrics-0.2.0
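Jaro-Winkler raises the Jaro similarity j of two strings that share a prefix: jw = j + l * p * (1 - j). Here l is the length of the common prefix, capped at 4, and p is conventionally 1/10. The sketch below applies only that adjustment to a Jaro value supplied by the caller, for example the result of jaro a b. The name winklerBoost is hypothetical. The constants are the textbook defaults, not necessarily the ones text-metrics uses.

```haskell
import Data.Ratio (Ratio, (%))
import Data.Text (Text)
import qualified Data.Text as T

-- Textbook Winkler adjustment of a Jaro similarity j:
--   jw = j + l * p * (1 - j), with p = 1/10 and l = common prefix length (max 4).
winklerBoost :: Ratio Int -> Text -> Text -> Ratio Int
winklerBoost j a b = j + fromIntegral l * (1 % 10) * (1 - j)
  where
    l = length (takeWhile (uncurry (==)) (take 4 (T.zip a b)))

main :: IO ()
main = print (winklerBoost (17 % 18) (T.pack "MARTHA") (T.pack "MARHTA"))
```

MARTHA and MARHTA have a Jaro similarity of 17/18 and share the prefix MAR, so l = 3. The boost therefore gives 17/18 + 3/180 = 173/180, or about 0.961.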