Thursday, May 5, 2016

Bags of Words vs. Bags of Things

What most users don’t know is that the search engine is answering a different question than the one they are asking – it returns the documents that contain the words they entered (or synonyms, if the application designers have provided them), which is sometimes not exactly “what” they are looking for.

https://lucidworks.com/blog/2014/07/02/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/
The user is telling the search engine what they want. They are looking for specific things, not specific words. The more that we can do to redress this mismatch between tokens and things, the better the user experience will be.
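
To make the mismatch concrete, here is a minimal sketch in Python. It is not how Lucene or any particular engine is implemented; the tiny document set, the `KNOWN_PHRASES` dictionary, and the function names are all made up for illustration. It compares plain bag-of-words matching with matching after known multi-word phrases have been folded into single tokens.

```python
import re

# Hypothetical phrase dictionary: word sequences that name a single "thing".
KNOWN_PHRASES = {("red", "sofa")}

# Hypothetical two-document "index", for illustration only.
DOCS = {
    "d1": "Red sofa with oak legs",
    "d2": "Red pillows for a grey sofa",
}


def tokenize(text):
    """Bag of words: lowercase the text and split it into single words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_tokenize(text, phrases=KNOWN_PHRASES, max_len=3):
    """Bag of things: fold known multi-word phrases into one token each."""
    words = tokenize(text)
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, down to two words.
        for n in range(max_len, 1, -1):
            candidate = tuple(words[i:i + n])
            if len(candidate) == n and candidate in phrases:
                tokens.append("_".join(candidate))
                i += n
                break
        else:
            # No phrase starts here, so keep the single word.
            tokens.append(words[i])
            i += 1
    return tokens


def search(query, tokenizer):
    """Return ids of documents that contain every query token."""
    query_tokens = set(tokenizer(query))
    return sorted(
        doc_id
        for doc_id, text in DOCS.items()
        if query_tokens <= set(tokenizer(text))
    )


print(search("red sofa", tokenize))         # ['d1', 'd2']
print(search("red sofa", phrase_tokenize))  # ['d1']
```

With plain words, the query "red sofa" also matches the document about red pillows and a grey sofa, because both words appear somewhere in it. Once "red sofa" is treated as one token, only the document that is actually about a red sofa comes back.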