(Adwait Joshi) #1

We are currently sending data to Elasticsearch using a Python plugin and suddenly started getting read timeout errors. So I increased the read timeout to 60 seconds. The process ran for 6 hours and then failed with the same error again. We are writing about 300 million records to the index, and indexing performance has been very slow as well. Can anyone help?
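Raising the read timeout only delays the failure if the node is overloaded. A common complement is to retry timed-out bulk requests with exponential backoff so the node gets time to catch up. The sketch below assumes a hypothetical zero-argument `send` callable wrapping one bulk request; it catches Python's built-in `TimeoutError`, whereas a real client raises its own timeout exception class, so the `except` clause would need adapting to whatever the plugin actually uses.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def send_with_backoff(
    send: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `send`, retrying on TimeoutError with exponential backoff.

    `send` is a hypothetical callable that performs one bulk request.
    """
    attempt = 0
    while True:
        try:
            return send()
        except TimeoutError:
            if attempt >= max_retries:
                raise  # give up and surface the original timeout
            # Delay doubles each attempt: base, 2*base, 4*base, ... capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay)
            attempt += 1
```

The `sleep` parameter is injectable only so the backoff schedule can be checked without actually waiting.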

(Christian Dahlqvist) #2

What is the specification of your cluster?

What is the average size of the document?

Are you assigning document IDs to documents in the application or letting Elasticsearch assign them?

Have you followed these recommendations?

(Adwait Joshi) #3

I have an 8-core / 64 GB RAM / 4.6 TB SSD machine. It's just a single node.

The documents are fairly large (I will have to check the exact size).

The document IDs are assigned in the application.

(Christian Dahlqvist) #4

If you are assigning document IDs in the application, each indexing operation will also require a read, as Elasticsearch needs to check whether the document already exists. This tends to get slower the larger your shards are. Do you have monitoring in place so you can see if indexing performance is dropping as shard size increases?
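To make the difference concrete, here is a sketch of the NDJSON lines sent to the `_bulk` API with and without an explicit `_id`. When the action line omits `_id`, Elasticsearch generates the ID itself and can skip the existence check described above. The index name `my-index` and the `id_field` parameter are hypothetical; whether dropping application-assigned IDs is acceptable depends on whether you need idempotent re-imports.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional


def bulk_lines(
    index: str, docs: Iterable[Dict[str, Any]], id_field: Optional[str] = None
) -> Iterator[str]:
    """Yield _bulk NDJSON lines: one action line, then one source line, per doc.

    id_field=None -> no `_id` in the action, so Elasticsearch assigns the ID.
    id_field="x"  -> the doc's "x" value is sent as `_id` (requires a lookup).
    """
    for doc in docs:
        meta: Dict[str, Any] = {"_index": index}
        if id_field is not None:
            meta["_id"] = str(doc[id_field])
        yield json.dumps({"index": meta})
        yield json.dumps(doc)
```

A request body is these lines joined with `"\n"`, plus a trailing newline, which the bulk API requires.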

(Adwait Joshi) #5

I'm not sure what kind of monitoring is available out of the box, but I don't have X-Pack. This is the community edition.

(Christian Dahlqvist) #6

I would recommend adding monitoring, as that would make it easier to troubleshoot something like this and see patterns/trends.

(Adwait Joshi) #7

But that requires an X-Pack license, and Elastic doesn't make it easy! They want $6,600 per node with a minimum of 3 nodes, even if you have fewer than 3.

(Christian Dahlqvist) #8

Monitoring is part of the free Basic license, which can be used in production. If you want to monitor it some other way I am sure there are other tools available as well. See here for more details.

Which version of Elasticsearch are you using?

(Adwait Joshi) #9

Oh! I did not know that. Let me install the monitoring piece then. Thanks for the clarification; I'll report back.

(Adwait Joshi) #10

OK, I have monitoring installed via X-Pack. What should I check for here?

(Christian Dahlqvist) #11

It would be interesting to see if the indexing rate drops with the size of the shards, but that will require data to be gathered for some time.

(Adwait Joshi) #12

Do you want me to send the data and watch the shards and performance in general?

(Christian Dahlqvist) #13

Watch the indexing rate and data volume as data is being indexed, then share a screenshot if you see something interesting.
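If you prefer to track this outside the Monitoring UI, a rough alternative is to poll the index stats API (`GET /<index>/_stats`) periodically and compare consecutive samples. The sketch assumes the field paths `_all.primaries.indexing.index_total` and `_all.primaries.store.size_in_bytes` from that response (worth checking against your version); how the JSON is fetched, and the `Sample` type, are hypothetical. Plotting rate against store size shows whether indexing slows as the shards grow.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Sample:
    """One parsed response of GET /<index>/_stats, taken at time `t` (seconds)."""
    t: float
    stats: Dict[str, Any]


def primaries(sample: Sample) -> Dict[str, Any]:
    return sample.stats["_all"]["primaries"]


def indexing_rate(prev: Sample, curr: Sample) -> float:
    """Documents indexed per second into primary shards between two samples."""
    before = primaries(prev)["indexing"]["index_total"]
    after = primaries(curr)["indexing"]["index_total"]
    return (after - before) / (curr.t - prev.t)


def store_gib(sample: Sample) -> float:
    """Primary store size in GiB at the time of the sample."""
    return primaries(sample)["store"]["size_in_bytes"] / 1024 ** 3
```

`index_total` is a cumulative counter that can reset when the node restarts, so a negative delta signals a restart rather than a real rate.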

(Adwait Joshi) #14

I am populating the index with a smaller set (48 million records) right now. One thing I did notice is that the heap is only 1 GB! I changed the heap to 16 GB, so I should hopefully see some performance increase the next time I run something. However, I am currently indexing, so I can't restart the service for the new heap size to take effect just yet.
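A 1 GB heap is what many older Elasticsearch releases ship with in `jvm.options` by default. The usual guidance is to set minimum and maximum heap (`-Xms`/`-Xmx`) to the same value, use no more than half of physical RAM (the rest serves the filesystem cache that Lucene relies on), and stay below the JVM's compressed-oops cutoff, which is near 32 GB. The sketch below encodes those rules; the 31 GB cutoff is an assumed safe value, not a guaranteed one. On this 64 GB machine, 16 GB is comfortably within the limits.

```python
def heap_size_gb(ram_gb: int, oops_cutoff_gb: int = 31) -> int:
    """At most half of RAM, and below the (assumed) compressed-oops cutoff."""
    return min(ram_gb // 2, oops_cutoff_gb)


def jvm_options(heap_gb: int) -> str:
    """Render jvm.options lines with minimum and maximum heap set to the same value."""
    return f"-Xms{heap_gb}g\n-Xmx{heap_gb}g\n"
```

The new heap size only takes effect after the node is restarted.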

(Adwait Joshi) #15

During the import, the Elastic plugin went red. The log file shows a Java out-of-memory error, so the node ran out of memory.
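The thread does not establish the cause of the out-of-memory error, but oversized bulk requests are one thing worth ruling out, since the node has to buffer each request while processing it. The sketch below caps each `_bulk` body by both bytes and document count; the 10 MB and 5,000-document defaults are assumptions to tune, not recommendations from this thread. If the plugin uses elasticsearch-py's `helpers.bulk`, its `chunk_size` and `max_chunk_bytes` parameters play the same role.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def bulk_batches(
    index: str,
    docs: Iterable[Dict[str, Any]],
    max_bytes: int = 10 * 1024 * 1024,
    max_docs: int = 5000,
) -> Iterator[str]:
    """Yield _bulk request bodies that stay under max_bytes and max_docs.

    A single document larger than max_bytes is still sent, alone, in its own batch.
    """
    lines: List[str] = []
    size = 0
    count = 0
    for doc in docs:
        pair = json.dumps({"index": {"_index": index}}) + "\n" + json.dumps(doc) + "\n"
        pair_size = len(pair.encode("utf-8"))
        # Flush the current batch if adding this doc would exceed either limit.
        if lines and (size + pair_size > max_bytes or count >= max_docs):
            yield "".join(lines)
            lines, size, count = [], 0, 0
        lines.append(pair)
        size += pair_size
        count += 1
    if lines:
        yield "".join(lines)
```

Each yielded string already ends with the trailing newline the bulk API expects, and `my-index` in the usage below is a hypothetical index name.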

(Adwait Joshi) #16

I tried increasing the memory and I am still seeing the same issue! What else can I try?

(Adwait Joshi) #17

After I increased the memory, CPU usage goes to 100%, then the Elasticsearch index goes red, and the CPU stays at 100%.

(Christian Dahlqvist) #18

How much heap did you configure? Is there anything in the logs? What does the hot threads API show when the node is busy? What is the output of the cluster stats API? Which version of Elasticsearch are you using?
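The diagnostics asked for here are plain REST calls, so they can be captured from Python with only the standard library. This sketch assumes the node listens on `http://localhost:9200` without authentication or TLS; `diagnostic_urls` and `fetch` are hypothetical helper names. The hot threads endpoint returns plain text, while the stats endpoints return JSON.

```python
import urllib.request
from typing import Dict


def diagnostic_urls(base_url: str = "http://localhost:9200") -> Dict[str, str]:
    """Endpoints for hot threads, cluster stats and JVM stats (base_url is assumed)."""
    base = base_url.rstrip("/")
    return {
        "hot_threads": f"{base}/_nodes/hot_threads",
        "cluster_stats": f"{base}/_cluster/stats?human&pretty",
        "jvm_stats": f"{base}/_nodes/stats/jvm?human&pretty",
    }


def fetch(url: str, timeout: float = 30.0) -> str:
    """GET a URL and return the body as text (no auth or TLS handling in this sketch)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Hot threads is most informative when captured several times while the CPU is actually pegged.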

(Adwait Joshi) #19

I gave 16 GB to the heap. I did not check hot threads or the cluster stats API, but I can. After increasing the heap to 16 GB, I see that data moves to Elasticsearch much faster. The whole process still takes an hour or so and then fails. When I run top on the Linux box, Elasticsearch shows high CPU usage.

(Adwait Joshi) #20

Actually, after changing the heap from 1 GB to 16 GB, I see it going down more often than before. How do I get to the bottom of this? I am sure it's the JVM process or something, because when it goes down I see CPU usage above 100% for Elasticsearch.
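One way to test the "it's the JVM" hypothesis is to check whether the heap stays near full while the node spends a large share of its time in old-generation garbage collection, which would show up as sustained high CPU. (On an 8-core box, `top` can report up to 800% for one process, so readings above 100% just mean more than one core is busy.) The sketch compares two parsed `GET /_nodes/stats/jvm` responses, assuming the fields `jvm.mem.heap_used_percent` and `jvm.gc.collectors.old.collection_time_in_millis`; no particular threshold is implied.

```python
from typing import Any, Dict, List


def heap_report(prev: Dict[str, Any], curr: Dict[str, Any], interval_s: float) -> List[str]:
    """Summarise heap usage and old-gen GC time between two _nodes/stats/jvm samples."""
    lines = []
    for node_id, node in curr["nodes"].items():
        jvm = node["jvm"]
        old_now = jvm["gc"]["collectors"]["old"]["collection_time_in_millis"]
        old_before = prev["nodes"][node_id]["jvm"]["gc"]["collectors"]["old"]["collection_time_in_millis"]
        # Fraction of wall-clock time spent in old-gen collections during the interval.
        gc_share = (old_now - old_before) / (interval_s * 1000.0)
        lines.append(
            f"{node.get('name', node_id)}: heap {jvm['mem']['heap_used_percent']}%, "
            f"old GC {gc_share:.0%} of interval"
        )
    return lines
```

A heap pinned near 100% alongside a large old-GC share would point at memory pressure (for example, oversized bulk requests) rather than indexing work itself.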