Name-QuickSearch-0.2.0.0
Safe Haskell: None
Language: Haskell2010

QuickSearch.OneShot

Documentation

oneShot

Arguments

:: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) 
=> (QuickSearch uid2 -> Int -> Scorer -> Text -> [Match Score (Entry Text uid2)])

Match retrieval function to be converted into a one-shot

-> Int

The Int argument passed through to the match retrieval function (for example, N for topNMatches or the score threshold for matchesWithThreshold).

-> [(Text, uid1)]

List of entries to be processed

-> [(Text, uid2)]

List of entries making up the search space

-> Scorer

Similarity function with type (Text -> Text -> Ratio Int)

-> [(Entry Text uid1, [Match Score (Entry Text uid2)])]

List of entries and their matches.

Turn a match retrieval function into a one-shot batch function. Instead of creating a QuickSearch for reuse, this creates it in the background and discards it when done.
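As a rough illustration, the sketch below passes topNMatches to oneShot. It assumes that topNMatches is exported from the package's QuickSearch module with the retrieval-function type shown above; that import is not confirmed by this page. The sample names and ids are made up.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Text (Text)
import QuickSearch (topNMatches)  -- assumed export; not confirmed by this page
import QuickSearch.OneShot (jaroWinkler, oneShot)

-- Entries to look up, tagged with caller-chosen ids (sample data).
queries :: [(Text, Int)]
queries = [("Jon Smith", 1), ("Ann Lee", 2)]

-- Search space, tagged with its own ids (sample data).
targets :: [(Text, Int)]
targets = [("John Smith", 101), ("Anne Leigh", 102), ("Bob Stone", 103)]

-- The Int argument (here 2) is handed to topNMatches as its N.
results = oneShot topNMatches 2 queries targets jaroWinkler

main :: IO ()
main =
  -- Print how many matches each query received.
  mapM_ (print . length . snd) results
```

Under that assumption, this call should behave like oneShotTopNMatches 2 queries targets jaroWinkler.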

oneShotTopNMatches

Arguments

:: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) 
=> Int

N: Number of matches to return

-> [(Text, uid1)]

List of entries to be processed

-> [(Text, uid2)]

List of entries making up the search space

-> Scorer

Similarity function with type (Text -> Text -> Ratio Int)

-> [(Entry Text uid1, [Match Score (Entry Text uid2)])]

List of entries and up to N of the best matches.

One-shot version of topNMatches. Builds the QuickSearch in the background and discards it when finished.
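A minimal usage sketch, based only on the signature above. It shows that the two id types may differ (Int for the entries to be processed, Text for the search space). The data and the names customers, cities, and results are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Text (Text)
import QuickSearch.OneShot (damerauLevenshteinNorm, oneShotTopNMatches)

-- Entries to be processed: uid1 is Int here (sample data).
customers :: [(Text, Int)]
customers = [("Sprngfield", 1), ("Shelbyvile", 2)]

-- Search space: uid2 is Text here (sample data).
cities :: [(Text, Text)]
cities = [("Springfield", "SPR"), ("Shelbyville", "SHE"), ("Capital City", "CAP")]

-- Ask for at most one best match per customer entry.
results = oneShotTopNMatches 1 customers cities damerauLevenshteinNorm

main :: IO ()
main =
  -- Each result pairs an entry with its list of matches.
  mapM_ (print . length . snd) results
```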

oneShotMatchesWithThreshold

Arguments

:: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) 
=> Int

Score threshold above which to return matches

-> [(Text, uid1)]

List of entries to be processed

-> [(Text, uid2)]

List of entries making up the search space

-> Scorer

Similarity function with type (Text -> Text -> Ratio Int)

-> [(Entry Text uid1, [Match Score (Entry Text uid2)])]

List of entries and their matches above the score threshold.

One-shot version of matchesWithThreshold. Builds the QuickSearch in the background and discards it when finished.
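A hedged sketch of thresholded matching, based on the signature above. The threshold value 90 is a placeholder: this page does not state the scale of Score, so check it before choosing a real cutoff. The data and the names incoming, known, threshold, results, and matched are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Text (Text)
import QuickSearch.OneShot (jaroWinkler, oneShotMatchesWithThreshold)

-- Entries to be processed (sample data).
incoming :: [(Text, Int)]
incoming = [("Acme Corp", 1), ("Globex LLC", 2), ("Zzyzx Holdings", 3)]

-- Search space (sample data).
known :: [(Text, Int)]
known = [("ACME Corporation", 10), ("Globex", 20)]

-- Placeholder threshold; the Score scale is an assumption.
threshold :: Int
threshold = 90

results = oneShotMatchesWithThreshold threshold incoming known jaroWinkler

-- Keep only the entries that received at least one match.
matched = filter (not . null . snd) results

main :: IO ()
main = print (length matched, length results)
```

Filtering on the match list is one simple way to separate entries with candidates from entries that need manual review.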

jaro :: Text -> Text -> Ratio Int

Return Jaro distance between two Text values. Returned value is in the range from 0 (no similarity) to 1 (exact match).

While the algorithm is clear for artificial examples (like those from the linked Wikipedia article), for arbitrary strings it may be hard to decide which of the two strings should be treated as having the “reference” order of characters (the order of matching characters is an essential part of the definition of the algorithm). For this reason, the first string is treated as the “reference” string (the one with the correct order of characters). Thus generally,

jaro a b ≠ jaro b a

This asymmetry can be found in all implementations of the algorithm on the internet, AFAIK.
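The sketch below scores one pair in both argument orders so the two results can be compared directly. The helper symmetricJaro is hypothetical and not part of the library: it averages the two directions, which is one simple way to obtain an order-independent score when no string is a natural reference.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Ratio (Ratio)
import Data.Text (Text)
import QuickSearch.OneShot (jaro)

-- Score a pair in both argument orders.
bothOrders :: Text -> Text -> (Ratio Int, Ratio Int)
bothOrders a b = (jaro a b, jaro b a)

-- Hypothetical helper: average of the two directions, so the
-- result does not depend on argument order.
symmetricJaro :: Text -> Text -> Ratio Int
symmetricJaro a b = (jaro a b + jaro b a) / 2

main :: IO ()
main = do
  print (bothOrders "DWAYNE" "DUANE")
  print (symmetricJaro "DWAYNE" "DUANE")
```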

See also: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance

Note: before text-metrics 0.3.0, this function returned Ratio Natural.

Since: text-metrics-0.2.0

jaroWinkler :: Text -> Text -> Ratio Int

Return Jaro-Winkler distance between two Text values. Returned value is in the range from 0 (no similarity) to 1 (exact match).

See also: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance

Note: before text-metrics 0.3.0, this function returned Ratio Natural.

Since: text-metrics-0.2.0

damerauLevenshteinNorm :: Text -> Text -> Ratio Int

Return normalized Damerau-Levenshtein distance between two Text values. 0 signifies no similarity between the strings, while 1 means exact match.

See also: https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance.

Note: before text-metrics 0.3.0, this function returned Ratio Natural.
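Since all three scorers share the type Text -> Text -> Ratio Int, they can be swapped freely. The sketch below runs each one on the same pair and converts the resulting ratio to a percentage. The names scorers and toPercent are hypothetical helpers introduced for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Ratio (Ratio)
import Data.Text (Text)
import QuickSearch.OneShot (damerauLevenshteinNorm, jaro, jaroWinkler)

-- The three scorers documented on this page, labelled for printing.
scorers :: [(String, Text -> Text -> Ratio Int)]
scorers =
  [ ("jaro", jaro)
  , ("jaroWinkler", jaroWinkler)
  , ("damerauLevenshteinNorm", damerauLevenshteinNorm)
  ]

-- Convert a ratio in [0, 1] to a percentage.
toPercent :: Ratio Int -> Double
toPercent r = 100 * realToFrac r

main :: IO ()
main =
  -- Print each scorer's result for the same pair of strings.
  mapM_
    (\(name, f) -> putStrLn (name ++ ": " ++ show (toPercent (f "kitten" "sitting"))))
    scorers
```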