When do you really need full text search capabilities? Need help understanding tradeoffs

Should we add Elastic for our very simple searches on metadata fields that have a few words (<10) such as titles? Does this constitute requiring "Full Text Search"?

We have a database w/ postgres to store all our metadata. Most of our searches are filters for exact matches, but we have some metadata fields that are unstructured text, such as titles. I don't currently expect us to need to search for text beyond a few words per field (titles, artists, taglines, etc...). We also need to to filter or sort by date ranges, region, duration, etc...

For those fields that have a few words, I believe we would still want to do some analysis like lowercasing, removing punctuation, and tokenization. Then do we really need elasticsearch w/ full blown TF/IDF, scoring, boosting, etc... or are there some simple things that could be done to augment our metadata database?

With postgresql supporting simpler functionality through LIKE, ~, etc.. and more extensive full text search features, the decision between ES vs postgres appears more nuanced.

We ideally want to reduce latency for when reads reflect writes in our system. We have a low QPS service that serves users within a fixed region, but users can make rather large writes. Ideally that latency would be reduced, as we don't want to confuse users that their reads did not occur. ES currently adds to that latency, because we currently scan for updates in postgres, create the updated document, and push to ES.

For our cases, we want high precision and are okay with lower recall. But if the word is mistyped (e.g. "are not" versus "aren't") for a title, then it's a case where we actually want to recall the correct form. We don't care for documents that might have a few of the words for the most part, so this is what I mean by "lower recall".

The scoring you can get from full text search is often quite useful. Especially for text fields with more than a few words. Playing with the analysis stuff can get you a long way too.

But Postgres is a very respectable database and if it works for you should use it. If you need sorting based on match quality that it can't provide or you need horizontal scalability you can't get from it or something like that then we are here for you. We have a ton of great stuff, but postgres does too. Different stuff.

Anyway. So long as it works and you have a disaster recovery plan like restoring from a backup or whatever then feel good.

1 Like

Oh and when I read more closely I see you use es for some search already..... You can use GIN to do some of that stuff in postgres if you like, but it's difficult. At least it was when I last looked.

One thing you could look at is refresh=wait_for in es. It'll block the response from a bulk or index operation until it is visible. You could use that to signal folks as soon as it become available. Hopefully to cut down on the confusion from the latency.

1 Like

Thank you very much @nik9000

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.