Name-QuickSearch-0.2.0.0
Safe HaskellNone
LanguageHaskell2010

QuickSearch

Synopsis

Documentation

buildTokenPartitions Source #

Arguments

:: (Hashable uid, Eq uid) 
=> [Entry Text uid]

List of entries

-> HashMap Token (HashSet uid)

A map of Token -> [uids]

Given the list of entries to be held by QuickSearch, return a HashMap keyed on tokens from the strings in the entries, where the associated HashMap value is the list of uids of entries containing the token.

buildQuickSearch Source #

Arguments

:: (Hashable uid, Eq uid) 
=> [(Text, uid)]

List of entries to be searched

-> QuickSearch uid

QuickSearch object holding token partitions

Given a list of pairs of (T.Text, uid) to be searched, create a QuickSearch object.

rawBuildQuickSearch Source #

Arguments

:: (Hashable uid, Eq uid) 
=> [Entry Text uid]

List of entries to be searched

-> QuickSearch uid

QuickSearch object holding token partitions

Given a list of entries to be searched, create a QuickSearch object.

matchesWithThreshold Source #

Arguments

:: (Hashable uid, Eq uid) 
=> QuickSearch uid

QuickSearch object holding token partitions

-> Int

Threshold score above which to return results

-> Scorer

String similarity function of type (Text -> Text -> Ratio Int)

-> Text

String to be searched

-> [Match Score (Entry Text uid)]

Top N most similar entries

Given a QuickSearch object, scorer, and string, return all matches with a score greater than the given threshold.

topNMatches Source #

Arguments

:: (Hashable uid, Eq uid) 
=> QuickSearch uid

QuickSearch object holding token partitions

-> Int

N: Number of results to return

-> Scorer

String similarity function of type (Text -> Text -> Ratio Int)

-> Text

String to be searched

-> [Match Score (Entry Text uid)]

Top N most similar entries

Given a QuickSearch object, scorer, and string, return the top N matches.

batch Source #

Arguments

:: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) 
=> (QuickSearch uid2 -> Int -> Scorer -> Text -> [Match Score (Entry Text uid2)])

A match retrieval function, such as topNMatches

-> QuickSearch uid2

QuickSearch object holding token partitions

-> Int

The reference number for the match retrieval function. N for topNMatches, threshold for matchesWithThreshold

-> Scorer

String similarity function of type (Text -> Text -> Ratio Int)

-> [(Text, uid1)]

List of entries to be processed

-> [(Entry Text uid1, [Match Score (Entry Text uid2)])]

List of entries and the results returned for each.

Turn a match retrieval function into one that works on lists of entries.

batchTopNMatches Source #

Arguments

:: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) 
=> QuickSearch uid2

QuickSearch object holding token partitions

-> Int

N: Number of results to return

-> Scorer

String similarity function of type (Text -> Text -> Ratio Int)

-> [(Text, uid1)]

List of entries to be processed

-> [(Entry Text uid1, [Match Score (Entry Text uid2)])]

List of entries and up to the top N matches for each.

Version of topNMatches that processes lists of entries instead of strings.

batchMatchesWithThreshold Source #

Arguments

:: (Hashable uid1, Eq uid1, Hashable uid2, Eq uid2) 
=> QuickSearch uid2

QuickSearch object holding token partitions

-> Int

Score threshold above which to keep results

-> Scorer

String similarity function of type (Text -> Text -> Ratio Int)

-> [(Text, uid1)]

List of entries to be processed

-> [(Entry Text uid1, [Match Score (Entry Text uid2)])]

List of entries and their matches above the score threshold.

Version of matchesWithThreshold that processes lists of entries instead of strings.

type Score = Int Source #

newtype Entry name uid Source #

Structure associating a name with its unique identifier

Constructors

Entry (name, uid) 

Instances

Instances details
Bifunctor Entry Source # 
Instance details

Defined in QuickSearch.Internal.Filter

Methods

bimap :: (a -> b) -> (c -> d) -> Entry a c -> Entry b d #

first :: (a -> b) -> Entry a c -> Entry b c #

second :: (b -> c) -> Entry a b -> Entry a c #

(Eq name, Eq uid) => Eq (Entry name uid) Source # 
Instance details

Defined in QuickSearch.Internal.Filter

Methods

(==) :: Entry name uid -> Entry name uid -> Bool #

(/=) :: Entry name uid -> Entry name uid -> Bool #

(Show name, Show uid) => Show (Entry name uid) Source # 
Instance details

Defined in QuickSearch.Internal.Filter

Methods

showsPrec :: Int -> Entry name uid -> ShowS #

show :: Entry name uid -> String #

showList :: [Entry name uid] -> ShowS #

entryName :: Entry name uid -> name Source #

Name accessor for an Entry

entryUID :: Entry name uid -> uid Source #

UID accessor for an Entry

newtype Match score entry Source #

Structure associating a Score with an Entry, for holding search results

Constructors

Match (score, entry) 

Instances

Instances details
Bifunctor Match Source # 
Instance details

Defined in QuickSearch.Internal.Matcher

Methods

bimap :: (a -> b) -> (c -> d) -> Match a c -> Match b d #

first :: (a -> b) -> Match a c -> Match b c #

second :: (b -> c) -> Match a b -> Match a c #

(Eq score, Eq entry) => Eq (Match score entry) Source # 
Instance details

Defined in QuickSearch.Internal.Matcher

Methods

(==) :: Match score entry -> Match score entry -> Bool #

(/=) :: Match score entry -> Match score entry -> Bool #

(Show score, Show entry) => Show (Match score entry) Source # 
Instance details

Defined in QuickSearch.Internal.Matcher

Methods

showsPrec :: Int -> Match score entry -> ShowS #

show :: Match score entry -> String #

showList :: [Match score entry] -> ShowS #

newtype QuickSearch uid Source #

List of entries to be searched and a HashMap associating tokens with HashSets of UIDs related to entries containing the tokens.

Constructors

QuickSearch ([Entry Text uid], HashMap Token (HashSet uid)) 

Instances

Instances details
Show uid => Show (QuickSearch uid) Source # 
Instance details

Defined in QuickSearch.Internal.Matcher

Methods

showsPrec :: Int -> QuickSearch uid -> ShowS #

show :: QuickSearch uid -> String #

showList :: [QuickSearch uid] -> ShowS #

damerauLevenshteinNorm :: Text -> Text -> Ratio Int #

Return normalized Damerau-Levenshtein distance between two Text values. 0 signifies no similarity between the strings, while 1 means exact match.

See also: https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance.

Heads up, before version 0.3.0 this function returned Ratio Natural.

jaro :: Text -> Text -> Ratio Int #

Return Jaro distance between two Text values. Returned value is in the range from 0 (no similarity) to 1 (exact match).

While the algorithm is pretty clear for artificial examples (like those from the linked Wikipedia article), for arbitrary strings, it may be hard to decide which of two strings should be considered as one having “reference” order of characters (order of matching characters in an essential part of the definition of the algorithm). This makes us consider the first string the “reference” string (with correct order of characters). Thus generally,

jaro a b ≠ jaro b a

This asymmetry can be found in all implementations of the algorithm on the internet, AFAIK.

See also: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance

Heads up, before version 0.3.0 this function returned Ratio Natural.

Since: text-metrics-0.2.0

jaroWinkler :: Text -> Text -> Ratio Int #

Return Jaro-Winkler distance between two Text values. Returned value is in range from 0 (no similarity) to 1 (exact match).

See also: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance

Heads up, before version 0.3.0 this function returned Ratio Natural.

Since: text-metrics-0.2.0