I've just implemented an autocomplete suggestion feature using Elasticsearch, and it works like a charm.
The index size is just about 100 KB, containing roughly 500 filter combinations which clients then use to make further requests to the webserver.
Considering that Elasticsearch uses about 1.2 GB of memory, I feel like I'm using a sledgehammer to crack a nut here.
Is Elasticsearch suitable for small datasets like this? Or are there other more lightweight frameworks for such a use case?
I really could not find much on this topic; in other threads people consider datasets of several GB to be small.
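To give a rough idea of the request involved, here is a simplified sketch; the index and field names are placeholders and it assumes a completion-type mapping, so it's not exactly my setup:

```python
import requests

# Simplified sketch of the autocomplete request. The index ("filters") and the
# completion field ("suggest") are placeholder names, not the real mapping.
body = {
    "suggest": {
        "filter-suggest": {
            "prefix": "res",
            "completion": {"field": "suggest", "size": 10},
        }
    }
}

response = requests.post("http://localhost:9200/filters/_search", json=body, timeout=5)
for option in response.json()["suggest"]["filter-suggest"][0]["options"]:
    print(option["text"])
```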
As @warkolm notes, it seems excessive, and I'd suggest a simple text file, JSON, or SQLite. For each you need to consider how you update it, i.e. how static the data is, and for SQLite, your multi-threading needs (as I recall, some languages/libraries don't handle this well, including and especially Golang).
Depending on your language, SQLite is probably the best option, as it easily handles data up to a GB, uses SQL (which is nice), does locking, handles updates, etc.
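As a minimal sketch of what that could look like with Python's built-in sqlite3 module (the table and column names here are made up for illustration):

```python
import sqlite3

# Tiny lookup table for a few hundred filter combinations; schema is illustrative.
conn = sqlite3.connect("filters.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS filter_combinations (
           id    INTEGER PRIMARY KEY,
           label TEXT NOT NULL,
           query TEXT NOT NULL
       )"""
)
conn.execute("CREATE INDEX IF NOT EXISTS idx_label ON filter_combinations(label)")
conn.commit()

def suggest(prefix, limit=10):
    """Return filter combinations whose label starts with the given prefix."""
    cur = conn.execute(
        "SELECT label, query FROM filter_combinations "
        "WHERE label LIKE ? ORDER BY label LIMIT ?",
        (prefix + "%", limit),
    )
    return cur.fetchall()
```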
If you don't update often, then a text/JSON file works, but use atomic updates: write the file under a temp name and then rename it over the current one so the swap is atomic (any readers will retain their open file handle on the old file). Never write in place to a text file that others can be reading, of course.
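In Python, for example, that write-to-temp-then-rename pattern looks roughly like this (os.replace is atomic as long as the temp file is on the same filesystem as the target):

```python
import json
import os
import tempfile

def write_json_atomically(path, data):
    """Write data to a temp file in the target directory, then atomically
    swap it into place; existing readers keep their handle on the old file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())    # make sure the bytes are on disk first
        os.replace(tmp_path, path)  # atomic rename over the current file
    except BaseException:
        os.unlink(tmp_path)
        raise
```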
The website I have implemented the solution for already uses a relational database, but it mostly relies on Doctrine ORM, which often results in performance issues even when adding native queries to speed things up.
I kind of assumed that for such a small dataset, my approach would be considered excessive.
Still, it keeps me wondering: where do you draw the line?
Since my post yesterday I went ahead and indexed the biggest chunks of data using Logstash. Still, the index sizes are in total under 100 MB.
Using Elasticsearch for geo queries, text search, and various bool queries far outperforms any SQL query, while offering a great API and documentation.
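For illustration, a typical query on my side looks roughly like this; the index, field names, and values are simplified placeholders, and the location field is assumed to be mapped as geo_point:

```python
import requests

# Simplified bool + geo_distance query; index and field names are placeholders.
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"name": "pizza"}}
            ],
            "filter": [
                {"term": {"category": "restaurant"}},
                {
                    "geo_distance": {
                        "distance": "5km",
                        "location": {"lat": 52.52, "lon": 13.405},
                    }
                },
            ],
        }
    }
}

response = requests.post("http://localhost:9200/places/_search", json=query, timeout=5)
for hit in response.json()["hits"]["hits"]:
    print(hit["_source"])
```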
Even if I were to implement all these functionalities myself, I wonder whether it would be worth the time and effort when I can just rely on a proven technology like Elasticsearch.
After all, an additional GB of RAM is not that excessive when you consider the development cost of doing everything yourself efficiently. Or does Elasticsearch come with significant overhead besides the RAM requirements?
Ah, I'm confused; I thought you had a tiny 100 KB data set of 500 items, but this is really a 100 MB geo-based system? That's totally different. Agreed that ES might be good for that even at this scale, as you note.
The only issue is how to replicate or back up the data, i.e. running 2-3 nodes and so on, depending on how reliable it needs to be.