Hi,
I am new to Elasticsearch and I am unable to figure out the ideal configuration for my setup.
I have 2 nodes, 1 master/data node and 1 data node, on 2 machines with 8 GB RAM each, each running a 2.6 GHz Core processor.
I have a month's backlog of data that I want to load into Elasticsearch from Postgres.
How am I doing it currently?
I am using the Python elasticsearch API to send bulk requests to Elasticsearch in batches of 1000 documents. It runs well at first, but after a while I start seeing errors like:
- Out of memory (Java heap size)
- Flush failed
- The data node is unable to get a response from the master
- Master responded too late
- and finally, request timeouts
I have tried reducing the batch size to 250 documents and increasing the timeout to 3000s, but I still end up with the same errors eventually.
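For reference, here is a minimal sketch of the kind of batching described above. It is not the actual import script: the helper names `to_action` and `batches`, the `id` column, and the 5 MB byte cap are assumptions made for illustration. It caps each bulk request by payload size as well as by document count, since a batch of 250 large documents can still be a heavy request.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def to_action(index: str, doc_type: str, row: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap one database row as a bulk 'index' action (elasticsearch.helpers format).

    Assumes every row carries an 'id' column (hypothetical schema).
    """
    return {
        "_op_type": "index",
        "_index": index,
        "_type": doc_type,
        "_id": row["id"],
        "_source": row,
    }


def batches(
    actions: Iterable[Dict[str, Any]],
    max_docs: int = 250,
    max_bytes: int = 5 * 1024 * 1024,  # assumed cap, tune for the cluster
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of actions limited by both document count and serialized size."""
    batch: List[Dict[str, Any]] = []
    size = 0
    for action in actions:
        action_bytes = len(json.dumps(action).encode("utf-8"))
        # Close the current batch if adding this action would exceed either limit.
        if batch and (len(batch) >= max_docs or size + action_bytes > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(action)
        size += action_bytes
    if batch:
        yield batch
```

Each yielded batch could then be passed to `elasticsearch.helpers.bulk(client, batch)`. The client's `helpers.streaming_bulk` also accepts `chunk_size` and `max_chunk_bytes` arguments that apply similar limits, so a hand-written splitter like this may not be needed.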
Data model: we create 1 index per user, and each index contains 3 types. At the moment we have around 50 users on our platform and the total data size is around 15 GB. The data will be updated once a day by a batch process, so it's bound to grow.
What are we using Elasticsearch for?
Mainly for aggregating data points (group-by and filters); there is no NLP or text search involved yet.
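To make the use case concrete, here is a minimal sketch of the kind of group-by-and-filter request meant above, built as a plain Python dict. The function name `build_groupby_body` and all field names are hypothetical, since the real mappings are not shown here. The `bool` query with a `filter` clause assumes Elasticsearch 2.x or later; older versions express the filter differently.

```python
import json
from typing import Any, Dict


def build_groupby_body(
    filter_field: str,
    filter_value: Any,
    group_field: str,
    metric_field: str,
    max_groups: int = 10,
) -> Dict[str, Any]:
    """Build a search body: filter documents, group them, average a metric per group."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "query": {
            "bool": {"filter": [{"term": {filter_field: filter_value}}]}
        },
        "aggs": {
            "groups": {
                # Roughly 'GROUP BY group_field', limited to the top buckets.
                "terms": {"field": group_field, "size": max_groups},
                # Per-group average of the metric field.
                "aggs": {"metric_avg": {"avg": {"field": metric_field}}},
            }
        },
    }


# Hypothetical field names, for illustration only.
body = build_groupby_body("status", "active", "category", "amount")
print(json.dumps(body, indent=2))
```

A body like this would be sent with the Python client's `search` method against one user's index.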
- Am I using Elasticsearch for the wrong purpose?
- Is my data model wrong for this kind of machine?
- Do I need better hardware?
- Or can I improve this with some configuration changes?
Thanks in advance.
P.S. Apologies, there is a lot of information in the guide. I tried to go through it and make a few choices, but so far those are definitely not working for us.