Errors while doing bulk update, Am I doing this wrong?

heaven00 · October 26, 2015, 5:27am

Hi,

I am new to Elasticsearch and can't figure out the right configuration for my setup.

I have two nodes, one master/data node and one data-only node, on two machines with 8 GB of RAM and a 2.6 GHz Core processor each.
I have a month's backlog of data that I want to load from PostgreSQL into Elasticsearch.

How am I doing it currently?
I am using the Python Elasticsearch client to send bulk requests in batches of 1,000 documents. It runs well for the first few documents, but after that I start seeing errors like:

Out of memory (Java heap space)
Flush failed
The data node is unable to get a response from the master
The master responded too late
and finally, request timeouts
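For reference, the bulk API takes newline-delimited JSON: one action/metadata line followed by one source line per document, with a trailing newline. Below is a minimal sketch of cutting rows into batches and building those bodies, assuming each row arrives as a dict with an `id` field; the index name `user-42` and type `event` are hypothetical. The official Python client's `helpers.bulk` does this chunking for you through its `chunk_size` argument, so this only illustrates what goes over the wire per request.

```python
import json
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def chunked(rows: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield lists of at most `size` rows (e.g. 1000 or 250 per bulk request)."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_body(batch: List[Dict], index: str, doc_type: str) -> str:
    """Build an NDJSON _bulk body: an action line, then the document source."""
    lines = []
    for row in batch:
        # "index" creates the document, or replaces it if this _id already exists.
        meta = {"index": {"_index": index, "_type": doc_type, "_id": row["id"]}}
        lines.append(json.dumps(meta))
        lines.append(json.dumps(row))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical rows standing in for the PostgreSQL backlog.
rows = [{"id": i, "value": i * 10} for i in range(5)]
bodies = [bulk_body(b, "user-42", "event") for b in chunked(rows, 2)]
print(len(bodies))  # 3 requests: 2 + 2 + 1 documents
```

Each body is a separate request, so the batch size bounds how much the node has to buffer per request; it does not change the total amount of data indexed.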

I have tried reducing the batch size to 250 documents and increasing the timeout to 3000 s, but I still eventually end up with the same errors.

The data model: we create one index per user, and each index contains three types. At the moment we have around 50 users on our platform, and the total data size is around 15 GB. The data will be updated daily in one batch process, so it's bound to grow.
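To put the index-per-user model in numbers: every index brings its own shards, and each shard is a Lucene index with its own memory overhead. A rough count, assuming the pre-7.0 defaults of 5 primary shards and 1 replica per index (the thread does not say whether these were changed):

```python
def shard_copies(indices: int, primaries: int = 5, replicas: int = 1) -> int:
    """Total shard copies (primaries plus replicas) the cluster must host."""
    return indices * primaries * (1 + replicas)


def per_node(total: int, data_nodes: int) -> float:
    """Average shard copies per data node, assuming even allocation."""
    return total / data_nodes


total = shard_copies(indices=50)        # one index per user, 50 users
print(total)                            # 500
print(per_node(total, data_nodes=2))    # 250.0
```

Types share their index's shards, so the three types per index do not add to this count.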

What are we using Elasticsearch for?
Mainly for aggregating data points (group-by and filters); there is no NLP or full-text search involved yet.
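A "filter, then group by" request maps onto a query plus a `terms` aggregation. Here is a minimal sketch of such a search body built as a Python dict; the field names `status` and `country` are hypothetical. It is written in the `bool`/`filter` form used from Elasticsearch 2.0 on; 1.x clusters express the same filter with the `filtered` query.

```python
import json
from typing import Any, Dict


def grouped_count_query(filter_field: str, filter_value: Any,
                        group_field: str, top_n: int = 10) -> Dict[str, Any]:
    """Search body: filter documents, then count them per value of group_field."""
    return {
        "size": 0,  # skip the hits; only the aggregation result is needed
        "query": {"bool": {"filter": [{"term": {filter_field: filter_value}}]}},
        "aggs": {
            # One bucket per distinct value of group_field, largest first.
            "by_group": {"terms": {"field": group_field, "size": top_n}}
        },
    }


body = grouped_count_query("status", "active", "country")
print(json.dumps(body, indent=2))
```

Filters in this position are not scored, which suits aggregation-only workloads like the one described here.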

Am I using Elasticsearch for the wrong purpose?
Is my data model wrong for this kind of machine?
Do I need better hardware?
Or can I improve this with some configuration changes?

Thanks in advance

P.S. Apologies; there is a lot of information in the guide. I tried to go through it and make a few choices, but so far those are definitely not working for us.

warkolm · October 26, 2015, 5:58am

How much data is being stored when you get the OOM?

heaven00 · October 26, 2015, 6:29am

around 200MB

warkolm · October 26, 2015, 6:30am

That's pretty odd then.

What are your ES settings?

heaven00 · October 26, 2015, 6:31am

Zen multicast discovery is disabled, and everything else is at the defaults so far.

Just two nodes: one master/data and one data-only.

Master/Data Node
cluster.name: elasticsearch
node.master: true
node.data: true
bootstrap.mlockall: true
http.port: 9200
discovery.zen.ping.multicast.enabled: false

Data Node
cluster.name: elasticsearch
node.master: false
node.data: true
bootstrap.mlockall: true
http.port: 9200
discovery.zen.ping.multicast.enabled: false

magnusbaeck · October 26, 2015, 6:42am

How big is the JVM heap? With 8 GB RAM a 4 GB heap size would be in order, but unless you've changed the default I believe you're stuck with 1 GB.
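The usual rule of thumb behind that suggestion is to give the JVM about half of the machine's RAM, leaving the rest to the OS file cache, and to stay below roughly 31 GB so the JVM can keep using compressed object pointers. As arithmetic, assuming that rule:

```python
def suggested_heap_gb(ram_gb: float, cap_gb: float = 31.0) -> float:
    """Half of physical RAM, capped so compressed oops stay enabled."""
    return min(ram_gb / 2, cap_gb)


print(suggested_heap_gb(8))    # 4.0 for the 8 GB machines in this thread
print(suggested_heap_gb(128))  # 31.0
```

On 1.x and 2.x releases the heap is typically set through the `ES_HEAP_SIZE` environment variable (for example `ES_HEAP_SIZE=4g`) before starting each node.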

warkolm · October 26, 2015, 6:51am

Also, if you have multicast disabled, you need to set a unicast host list.
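In elasticsearch.yml the unicast list is the `discovery.zen.ping.unicast.hosts` setting. A small sketch of the check described above, treating each node's settings as a flat dict; the host name `es-master` is hypothetical, and multicast is treated as enabled when unset, as it was by default in 1.x.

```python
from typing import Dict, List


def discovery_problems(settings: Dict[str, object]) -> List[str]:
    """Return problems with a node's discovery settings (empty list if none)."""
    problems = []
    multicast = settings.get("discovery.zen.ping.multicast.enabled", True)
    hosts = settings.get("discovery.zen.ping.unicast.hosts") or []
    if multicast is False and not hosts:
        problems.append("multicast disabled but discovery.zen.ping.unicast.hosts is empty")
    return problems


data_node = {
    "cluster.name": "elasticsearch",
    "node.master": False,
    "node.data": True,
    "discovery.zen.ping.multicast.enabled": False,
}
print(discovery_problems(data_node))   # one problem reported
data_node["discovery.zen.ping.unicast.hosts"] = ["es-master"]
print(discovery_problems(data_node))   # []
```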

heaven00 · October 26, 2015, 6:52am

The unicast list containing the master's address is there.

heaven00 · October 26, 2015, 6:53am

Yes, this seems to be the problem. I will update it and get back to you.

Thank you very much

heaven00 · October 26, 2015, 9:03am

Thanks guys

The bulk processing seems to be working now

Related topics

Topic	Category	Replies	Views	Activity
Elasticsearch sizing	Elasticsearch	3	545	January 27, 2018
Timeout	Elasticsearch	4	900	July 6, 2017
Elasticsearch 1.5.2 High JVM Heap in 2 nodes even without bulk indexing	Elasticsearch	1	852	July 5, 2017
Scalability problems	Elasticsearch	7	552	May 23, 2020
BulkUpdate elasticsearch issue	Elasticsearch	4	778	July 5, 2017