Search Handler
Source code in tokensmith/search/handler.py
count
Counts the occurrences of a query in the index.
contains
Checks if a query is present in the index.
positions
Returns an unordered list of positions where query starts in tokens.
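To make the semantics of count, contains, and positions concrete, here is a minimal sketch over a plain Python list. It is a naive linear scan written for illustration, not the code in tokensmith/search/handler.py; the standalone function signatures and the example token list are assumptions.

```python
from typing import List, Sequence


def positions(tokens: Sequence[int], query: Sequence[int]) -> List[int]:
    """Return every index at which `query` starts in `tokens` (naive scan)."""
    q = list(query)
    m = len(q)
    return [i for i in range(len(tokens) - m + 1) if list(tokens[i:i + m]) == q]


def count(tokens: Sequence[int], query: Sequence[int]) -> int:
    """Number of occurrences of `query` in `tokens`."""
    return len(positions(tokens, query))


def contains(tokens: Sequence[int], query: Sequence[int]) -> bool:
    """True if `query` occurs at least once in `tokens`."""
    return count(tokens, query) > 0


# Hypothetical token IDs, for illustration only.
tokens = [1, 2, 3, 1, 2, 4, 1, 2]
print(positions(tokens, [1, 2]))  # [0, 3, 6]
print(count(tokens, [1, 2]))      # 3
print(contains(tokens, [2, 1]))   # False
```

The scan happens to return positions in ascending order; the documented method makes no such ordering guarantee.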
count_next
Count the occurrences of each token directly following query.
batch_count_next
Count the occurrences of each token directly following each query in a batch.
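The next-token counts described for count_next and batch_count_next can be sketched as follows. This is an illustrative reimplementation over a plain list, not the handler's code; returning a dict keyed by token ID is an assumption, and the real methods may return a different shape (for example, a vocabulary-sized list).

```python
from collections import Counter
from typing import Dict, List, Sequence


def count_next(tokens: Sequence[int], query: Sequence[int]) -> Dict[int, int]:
    """Count each token that directly follows an occurrence of `query`."""
    q = list(query)
    m = len(q)
    counts: Counter = Counter()
    # Stop one short of the end so every match has a following token.
    for i in range(len(tokens) - m):
        if list(tokens[i:i + m]) == q:
            counts[tokens[i + m]] += 1
    return dict(counts)


def batch_count_next(
    tokens: Sequence[int], queries: Sequence[Sequence[int]]
) -> List[Dict[int, int]]:
    """Apply count_next to every query, preserving query order."""
    return [count_next(tokens, q) for q in queries]


# Hypothetical token IDs, for illustration only.
tokens = [1, 2, 3, 1, 2, 4, 1, 2]
print(count_next(tokens, [1, 2]))           # {3: 1, 4: 1}
print(batch_count_next(tokens, [[1], [2]]))  # [{2: 3}, {3: 1, 4: 1}]
```

The final occurrence of [1, 2] sits at the end of the list and has no follower, so it contributes nothing to the counts.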
sample_smoothed
Sample num_samples sequences of length k that follow query, based on the previous (n - 1) characters (the n-gram prefix). Uses a Kneser-Ney smoothed conditional distribution. If fewer than (n - 1) characters are available, all available characters are used.
sample_unsmoothed
Sample num_samples sequences of length k that follow query, based on the previous (n - 1) characters (the n-gram prefix). If fewer than (n - 1) characters are available, all available characters are used.
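A rough sketch of unsmoothed n-gram sampling, as described above, might look like this. It is not the handler's implementation: the helper next_counts, the seed parameter, the argument order, and the choice to return only the newly sampled tokens are all assumptions made for illustration.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def next_counts(tokens: Sequence[int], prefix: Sequence[int]) -> Dict[int, int]:
    """Hypothetical helper: counts of tokens directly following `prefix`."""
    p = list(prefix)
    m = len(p)
    counts: Counter = Counter()
    for i in range(len(tokens) - m):
        if list(tokens[i:i + m]) == p:
            counts[tokens[i + m]] += 1
    return dict(counts)


def sample_unsmoothed(
    tokens: Sequence[int],
    query: Sequence[int],
    n: int,
    k: int,
    num_samples: int,
    seed: int = 0,
) -> List[List[int]]:
    """Draw `num_samples` continuations of up to `k` tokens from raw n-gram counts."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        seq = list(query)
        for _ in range(k):
            # Condition on the last (n - 1) tokens, or on everything if fewer exist.
            prefix = seq[-(n - 1):] if n > 1 else []
            followers = next_counts(tokens, prefix)
            if not followers:
                break  # unseen prefix: an unsmoothed model has nothing to sample
            choices, weights = zip(*followers.items())
            seq.append(rng.choices(choices, weights=weights)[0])
        samples.append(seq[len(query):])
    return samples


# Hypothetical corpus in which every token has exactly one possible successor.
tokens = [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(sample_unsmoothed(tokens, [1], n=2, k=3, num_samples=2))  # [[2, 3, 1], [2, 3, 1]]
```

The early exit on an unseen prefix shows why a smoothed variant is useful: smoothing assigns probability mass to continuations that the raw counts never observed.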
get_smoothed_probs
Get the interpolated Kneser-Ney smoothed token probability distribution using all previous tokens in the query.
batch_get_smoothed_probs
Get the interpolated Kneser-Ney smoothed token probability distribution using all previous tokens in each query.
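For intuition, interpolated Kneser-Ney smoothing can be sketched for the bigram case only. The documented methods condition on all previous tokens in the query; this simplified version looks only at the last one. The function name, the vocab_size parameter, and the default delta of 0.75 are assumptions, not the handler's API or defaults.

```python
from collections import Counter
from typing import List, Sequence


def kn_bigram_probs(
    tokens: Sequence[int], context: int, vocab_size: int, delta: float = 0.75
) -> List[float]:
    """Interpolated Kneser-Ney P(w | context) for every w in range(vocab_size)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    # Continuation counts: distinct left contexts per word, distinct followers per context.
    preceders = Counter(w for (_, w) in bigrams)
    followers = Counter(v for (v, _) in bigrams)
    total_types = len(bigrams)
    # Lower-order distribution: how many distinct contexts each word completes.
    p_cont = [preceders[w] / total_types for w in range(vocab_size)]

    ctx_total = sum(c for (v, _), c in bigrams.items() if v == context)
    if ctx_total == 0:
        return p_cont  # unseen context: fall back to the continuation distribution

    # Mass removed by discounting is redistributed through p_cont.
    lam = delta * followers[context] / ctx_total
    return [
        max(bigrams[(context, w)] - delta, 0.0) / ctx_total + lam * p_cont[w]
        for w in range(vocab_size)
    ]


# Hypothetical corpus over a 3-token vocabulary.
tokens = [0, 1, 0, 2, 0, 1]
probs = kn_bigram_probs(tokens, context=0, vocab_size=3)
print(probs, sum(probs))
```

Token 0 never follows itself in this corpus, yet it still receives nonzero probability after context 0, because the discounted mass is spread according to the continuation distribution. A batched variant would simply apply the same function to each query's context.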
estimate_delta
Warning: O(k**n), where k is the vocabulary size; use with caution. Improves smoothed model quality by replacing the default delta hyperparameters for models of order n and below with improved estimates computed over the entire index. See https://people.eecs.berkeley.edu/~klein/cs294-5/chen_goodman.pdf, page 16.
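One widely used discount estimate from the absolute-discounting literature is D = n1 / (n1 + 2 * n2), where n1 and n2 are the numbers of distinct n-grams that occur exactly once and exactly twice. The sketch below computes it for a single order over a plain list. Whether estimate_delta uses exactly this formula is an assumption, and the function name and return convention are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def estimate_discount(tokens: Sequence[int], order: int) -> Optional[float]:
    """Estimate D = n1 / (n1 + 2 * n2) from n-gram count-of-counts."""
    ngrams = Counter(
        tuple(tokens[i:i + order]) for i in range(len(tokens) - order + 1)
    )
    count_of_counts = Counter(ngrams.values())
    n1 = count_of_counts[1]  # n-grams seen exactly once
    n2 = count_of_counts[2]  # n-grams seen exactly twice
    if n1 + 2 * n2 == 0:
        return None  # not enough evidence to estimate a discount
    return n1 / (n1 + 2 * n2)


# Hypothetical corpus, for illustration only.
tokens = [0, 1, 0, 2, 0, 1]
print(estimate_discount(tokens, order=2))  # 0.6
```

Enumerating every n-gram in the index is what makes this kind of estimate expensive at higher orders, which is consistent with the warning above.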