ReadTimeout

We are currently sending data to Elasticsearch using a Python plugin and suddenly started getting read timeout errors. So I increased the read timeout to 60 seconds. The process ran for 6 hours and then failed with the same error again. We are writing about 300 million records to the index, and index write performance has been very slow as well. Can anyone help?
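
For context, this is roughly how I raised the timeout; a minimal sketch, assuming the official elasticsearch-py client and a local node (the keyword is `timeout` on 7.x clients and `request_timeout` on 8.x):

```python
# Minimal sketch, assuming elasticsearch-py; the endpoint is a placeholder.
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "http://localhost:9200",  # hypothetical endpoint
    timeout=60,               # raised from the client's 10-second default
)
```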

What is the specification of your cluster?

What is the average size of the document?

Are you assigning document IDs to documents in the application or letting Elasticsearch assign them?

Have you followed these recommendations?

I have an 8C/64GB/4.6TB SSD machine. It's just a single node.

The documents are fairly large. (I will have to check the exact size.)

The document IDs are assigned in the application.

If you are assigning document IDs in the application, each indexing operation will also require a read, as Elasticsearch needs to check whether the document already exists. This tends to get slower the larger your shards are. Do you have monitoring in place so you can see whether indexing performance is dropping as shard size increases?
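
As an illustration, here is a minimal bulk-indexing sketch (the endpoint and index name are placeholders): if you leave out `_id`, Elasticsearch generates one, so each write is a pure append instead of an update that first has to check for an existing document.

```python
# Minimal sketch, assuming the elasticsearch-py bulk helpers; the
# endpoint and index name are placeholders.
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

records = [{"field": "value1"}, {"field": "value2"}]  # stand-in data

actions = (
    {
        "_index": "my-index",
        # No "_id" here: Elasticsearch auto-generates one, which skips
        # the existence check that application-assigned IDs force.
        "_source": rec,
    }
    for rec in records
)

helpers.bulk(es, actions)
```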

Not sure what kind of monitoring is available out of the box, but I don't have X-Pack. This is the community edition.

I would recommend adding monitoring, as that would make it easier to troubleshoot something like this and see patterns/trends.

But that requires me to add a license to X-Pack; Elastic doesn't make it easy! They want $6,600 per node for 3 nodes, even if you have fewer than 3 nodes.

Monitoring is part of the free Basic license, which can be used in production. If you want to monitor it some other way I am sure there are other tools available as well. See here for more details.

Which version of Elasticsearch are you using?
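
If your version turns out to be 6.3 or later, the free Basic features ship with the default distribution and monitoring collection can be switched on through the cluster settings API; a sketch, assuming a 6.3+ node and the Python client (endpoint is a placeholder):

```python
# Sketch, assuming Elasticsearch 6.3+ where the Basic X-Pack features
# are included in the default distribution; endpoint is a placeholder.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Enable collection of monitoring data on the cluster itself.
es.cluster.put_settings(
    body={"persistent": {"xpack.monitoring.collection.enabled": True}}
)
```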

Oh! I did not know that. Let me install the monitoring piece then. Thanks for the clarification, I'll report back.

OK, I have monitoring installed via X-Pack. What should I check for here?

It would be interesting to see if the indexing rate drops with the size of the shards, but that will require data to be gathered for some time.
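
One way to correlate the two while the load runs is to poll the cat shards API, which reports per-shard document counts and on-disk size; a sketch, with the endpoint and index name as placeholders:

```python
# Sketch: snapshot per-shard doc counts and store size during indexing,
# so the indexing rate can later be compared against shard growth.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# _cat/shards lists each shard with its document count and on-disk size.
print(es.cat.shards(index="my-index", v=True))
```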

Do you want me to send the data and watch the shards and performance in general?

Watch the indexing rate and data volume as data is being indexed, then share a screenshot if you see something interesting.

I am populating the index with a small set (48MM records) right now. One thing I did notice is that the heap is only 1GB! I changed it to 16GB, so I should hopefully see some performance increase the next time I run something. However, I am currently indexing, so I can't restart the service for the new memory setting to take effect just yet.
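
For reference, the heap change goes in jvm.options; a sketch of the edit, assuming a package install (the path varies). Xms and Xmx should be set to the same value, and 16GB out of 64GB RAM leaves the rest for the filesystem cache:

```
# config/jvm.options (e.g. /etc/elasticsearch/jvm.options on package installs)
# Xms and Xmx should match.
-Xms16g
-Xmx16g
```

The node has to be restarted for this to take effect, which is why I am waiting for the current indexing run to finish.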

During the import the Elasticsearch index went red. The log file shows a Java OutOfMemoryError, so the node ran out of memory.

I tried increasing the memory and I am still seeing the same issue! What else can I try?

After I increased the memory, the CPU usage goes to 100%, then the Elasticsearch index goes red and the CPU stays at 100%.

How much heap did you configure? Is there anything in the logs? What does the hot threads API show when the node is busy? What is the output of the cluster stats API? Which version of Elasticsearch are you using?
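
If it helps, both can be pulled with the Python client while the node is busy (endpoint is a placeholder); the same data is available over HTTP as GET _nodes/hot_threads and GET _cluster/stats:

```python
# Sketch: capture what the node is doing during the high-CPU phase.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hot threads: a text dump of the busiest threads on each node.
print(es.nodes.hot_threads())

# Cluster stats: shard counts, heap usage, segment counts, versions.
print(es.cluster.stats())
```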

I gave 16GB of memory to the heap. I did not check hot threads, but I can, as well as the cluster stats API. After increasing the heap to 16GB I see that the data moves to Elasticsearch much faster. The whole process still takes about an hour and then fails. When I do a top on the Linux box, Elasticsearch shows high CPU usage.

Actually, after making the change from 1GB to 16GB I see it going down more often than before. How do I get to the bottom of this? I am sure it's the JVM process or something, because when it goes down I see CPU usage above 100% on the Elasticsearch process.