Errors while doing bulk update: am I doing this wrong?

Hi,

I am new to Elasticsearch and I am unable to figure out the ideal configuration for my setup.

I have 2 nodes (1 master/data node and 1 data-only node) on two machines, each with 8 GB RAM and a 2.6 GHz Core processor.
I have a month's backlog of data that I want to load into Elasticsearch from Postgres.

How am I doing it currently?
I am using the Python Elasticsearch API to send bulk requests to Elasticsearch in batches of 1000 documents. It runs well for the first few batches; after that I start seeing errors like:

  • Out of Memory, java heap size.
  • Flush Fail
  • The data node is unable to get response from master
  • Master responded too late
  • and finally request Timeout

I have tried reducing the number of documents to 250 and increasing the timeout to 3000s but I still end up with the same errors eventually.
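For reference, the batching described above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual code: the index name `user_42`, the type name `event`, and the shape of `rows` are hypothetical, and `send_batch` stands in for whatever call actually submits a batch (for example `elasticsearch.helpers.bulk(client, batch)` in the Python client, which also has its own `chunk_size` parameter).

```python
from itertools import islice


def to_actions(rows, index, doc_type):
    """Turn source rows (dicts with an 'id' key) into bulk action dicts."""
    for row in rows:
        yield {
            "_op_type": "index",
            "_index": index,
            "_type": doc_type,  # mapping types exist only in older Elasticsearch versions
            "_id": row["id"],
            "_source": row,
        }


def batched(iterable, size):
    """Yield lists of at most `size` items without loading everything into memory."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_in_batches(rows, index, doc_type, send_batch, size=250):
    """Hand rows to send_batch in batches; return the number of documents sent."""
    total = 0
    for batch in batched(to_actions(rows, index, doc_type), size):
        send_batch(batch)  # hypothetical hook, e.g. a wrapper around helpers.bulk
        total += len(batch)
    return total


if __name__ == "__main__":
    # Hypothetical rows; in the poster's setup these would come from Postgres.
    rows = ({"id": i, "value": i * 2} for i in range(1000))
    sizes = []
    sent = index_in_batches(rows, "user_42", "event", lambda b: sizes.append(len(b)))
    print(sent, sizes)
```

Because the rows are consumed lazily, only one batch is held in client memory at a time. Note that a smaller batch size only reduces the size of each request; it does not help if the nodes themselves are short on heap.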

The data model: we create 1 index per user, and each index contains 3 types. At the moment we have around 50 users on our platform and the total data size is around 15 GB. The data will be updated in 1 batch process daily, so it's bound to increase.
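To put rough numbers on this data model, here is a minimal sketch of the shard arithmetic. It assumes the index settings were left at the defaults of older Elasticsearch releases (5 primary shards and 1 replica per index); the thread does not state the actual index settings, so treat those two values as an assumption.

```python
def total_shards(indices, primaries_per_index=5, replicas=1):
    """Total shard copies (primaries plus replicas) across all indices."""
    return indices * primaries_per_index * (1 + replicas)


users = 50                      # one index per user, from the post
nodes = 2                       # cluster size, from the post
shards = total_shards(users)    # assumed defaults: 5 primaries, 1 replica
print(shards, shards // nodes)  # 500 250
```

Each shard is a separate Lucene index with its own overhead, so under these assumptions every node would carry a few hundred shards for only about 15 GB of data. That may be worth revisiting as the number of users grows.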

What are we using Elasticsearch for?
Mainly for aggregating data points (group-by and filters); there is no NLP or textual search involved yet.

  • Am I using Elasticsearch for the wrong purpose?
  • Is my data model wrong for this kind of machine?
  • Do I need better hardware?
  • Or can I improve this with some configuration changes?

Thanks in Advance :smile:

P.S. Apologies, there is a lot of information in the guide. I tried to go through it and make a few choices, but so far those are definitely not working for us.

How much data is being stored when you get the OOM?

around 200MB

That's pretty odd then.

What are your ES settings?

Zen multicast discovery is disabled, and everything else is set to the defaults so far.

Just 2 nodes, with 1 as master/data and 1 as data only.

Master/Data Node
cluster.name: elasticsearch
node.master: true
node.data: true
bootstrap.mlockall: true
http.port: 9200
discovery.zen.ping.multicast.enabled: false

Data Node
cluster.name: elasticsearch
node.master: false
node.data: true
bootstrap.mlockall: true
http.port: 9200
discovery.zen.ping.multicast.enabled: false

How big is the JVM heap? With 8 GB RAM a 4 GB heap size would be in order, but unless you've changed the default I believe you're stuck with 1 GB.
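One way to answer that question is to look at the JVM section of the nodes stats API (`GET _nodes/stats/jvm`). The sketch below only parses such a response. The sample payload is made up for illustration and trimmed to the fields used, and the node IDs and names are hypothetical.

```python
def heap_report(nodes_stats):
    """Map node name -> (max heap in GiB, heap used percent) from a nodes-stats response."""
    report = {}
    for node in nodes_stats["nodes"].values():
        mem = node["jvm"]["mem"]
        max_gib = mem["heap_max_in_bytes"] / 1024 ** 3
        report[node["name"]] = (round(max_gib, 2), mem["heap_used_percent"])
    return report


# Hypothetical, trimmed response; in practice it would come from
# GET _nodes/stats/jvm (e.g. client.nodes.stats(metric="jvm") in the Python client).
sample = {
    "nodes": {
        "node-id-1": {"name": "master-data", "jvm": {"mem": {
            "heap_max_in_bytes": 1024 ** 3, "heap_used_percent": 97}}},
        "node-id-2": {"name": "data-only", "jvm": {"mem": {
            "heap_max_in_bytes": 1024 ** 3, "heap_used_percent": 91}}},
    }
}

for name, (max_gib, used) in heap_report(sample).items():
    print(f"{name}: heap max {max_gib} GiB, {used}% used")
```

In the Elasticsearch versions that use `bootstrap.mlockall`, the heap is typically set through the `ES_HEAP_SIZE` environment variable (for example `ES_HEAP_SIZE=4g`); newer versions use `-Xms`/`-Xmx` in `jvm.options` instead.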

In addition to that, if you have multicast disabled then you need to set a unicast host list.

The unicast list containing the master's address is there.

Yes, this seems to be the problem. I will update it and get back to you guys :smiley:

Thank you very much

Thanks guys

The bulk processing seems to be working now :smile: