Errors while doing a bulk update: am I doing this wrong?


(Heaven00) #1

Hi,

I am new to Elasticsearch and I am unable to figure out the ideal configuration for my setup.

I have 2 nodes (1 master/data node and 1 data-only node) on two machines, each with 8 GB RAM and a 2.6 GHz core processor.
I have a month's backlog of data that I want to load into Elasticsearch from PostgreSQL.

How am I doing it currently?
I am using the Python Elasticsearch API to send bulk requests to Elasticsearch in batches of 1000 documents. It runs well for the first few batches; after that I start seeing errors like:

  • Out of Memory, java heap size.
  • Flush Fail
  • The data node is unable to get response from master
  • Master responded too late
  • and finally request Timeout

I have tried reducing the number of documents to 250 and increasing the timeout to 3000s but I still end up with the same errors eventually.
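
For reference, the batching described above can be sketched roughly as follows. This is only an illustration, not the exact script: the `rows` generator, the helper names `make_actions` and `chunked`, and the index and type names are all made up, and the commented-out `helpers.bulk` call assumes the official `elasticsearch` Python client with a reachable cluster.

```python
from itertools import islice


def make_actions(rows, index, doc_type):
    """Turn row dicts (e.g. fetched from PostgreSQL) into bulk index actions."""
    for row in rows:
        yield {
            "_op_type": "index",
            "_index": index,
            "_type": doc_type,
            "_id": row["id"],
            "_source": row,
        }


def chunked(iterable, size):
    """Yield lists of at most `size` items, one batch at a time."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical rows standing in for the PostgreSQL backlog.
rows = ({"id": i, "value": i * 2} for i in range(600))

# Here we only record batch sizes; a real run would send each batch instead.
sizes = [len(batch) for batch in chunked(make_actions(rows, "user_1", "events"), 250)]
print(sizes)  # [250, 250, 100]

# With a real cluster, each batch would be sent with something like:
#   from elasticsearch import Elasticsearch, helpers
#   client = Elasticsearch(["localhost:9200"])
#   for batch in chunked(actions, 250):
#       helpers.bulk(client, batch)
```

Because both helpers are generators, only one batch of actions is held in memory on the client side at a time.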

About the data model: we create 1 index per user, and each index contains 3 types. At the moment we have around 50 users on our platform and the total data size is around 15 GB. The data will be updated in 1 batch process daily, so it's bound to increase.

What are we using Elasticsearch for?
Mainly for aggregating data points (group-by and filters); there is no NLP or textual search involved yet.

  • Am I using Elasticsearch for the wrong purpose?
  • Is my data model wrong for this kind of machine?
  • Do I need better hardware?
  • Or can I improve this with some configuration changes?

Thanks in Advance :smile:

P.S. Apologies, I know there is a lot of information in the guide. I tried to go through it and make a few choices, but so far those are definitely not working for us.


(Mark Walkom) #2

How much data is being stored when you get the OOM?


(Heaven00) #3

around 200MB


(Mark Walkom) #4

That's pretty odd then.

What are your ES settings?


(Heaven00) #5

Zen multicast discovery is disabled, and the other settings are at their defaults so far.

Just 2 nodes, with 1 as master/data and 1 as data only.

Master/Data Node
cluster.name: elasticsearch
node.master: true
node.data: true
bootstrap.mlockall: true
http.port: 9200
discovery.zen.ping.multicast.enabled: false

Data Node
cluster.name: elasticsearch
node.master: false
node.data: true
bootstrap.mlockall: true
http.port: 9200
discovery.zen.ping.multicast.enabled: false


(Magnus Bäck) #6

How big is the JVM heap? With 8 GB RAM a 4 GB heap size would be in order, but unless you've changed the default I believe you're stuck with 1 GB.
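
One way to answer the heap question is to read the maximum heap size from the nodes stats API. The sketch below only parses a response of that shape. The sample dict and node names are invented for illustration, and in practice the data would come from something like `client.nodes.stats(metric="jvm")` in the Python client, which is assumed here and not actually called.

```python
def heap_max_by_node(stats):
    """Map node name -> max JVM heap in GB from a nodes-stats style response."""
    result = {}
    for node in stats["nodes"].values():
        heap_bytes = node["jvm"]["mem"]["heap_max_in_bytes"]
        result[node["name"]] = round(heap_bytes / 1024 ** 3, 2)
    return result


# Invented sample in the shape of a nodes stats response (not real output).
sample = {
    "nodes": {
        "a1": {"name": "master-data", "jvm": {"mem": {"heap_max_in_bytes": 1024 ** 3}}},
        "b2": {"name": "data-only", "jvm": {"mem": {"heap_max_in_bytes": 4 * 1024 ** 3}}},
    }
}

heaps = heap_max_by_node(sample)
print(heaps)
```

A value close to 1 GB on an 8 GB machine would suggest the default heap size is still in effect.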


(Mark Walkom) #7

Adding to that, if you have multicast disabled then you need to set a unicast host list.


(Heaven00) #8

The unicast list containing the master's address is already there.


(Heaven00) #9

Yes, this seems to be the problem. I will update it and get back to you guys :smiley:

Thank you very much


(Heaven00) #10

Thanks guys

The bulk processing seems to be working now :smile:

