Performance Help

@tgdesrochers But that does not align well with your previous observation:

If you have a large amount of logs, it could be handled by more nodes. Nodes do not time out just because of mapping or indexing; reporting a mapping timeout of 30s is just a coincidence. Check whether your file system / disk subsystem is slowing down the indexing.

 "status"=>500, "error"=>{"type"=>"timeout_exception", "reason"=>"Failed to acknowledge mapping update within [30s]"}}}, :level=>:warn

Can I revisit this error?

Searching /var/log/logstash/logstash.log on all of my Logstash nodes, the only error I am seeing is the one above. Is there a way to increase the mapping timeout? I realize this may just be a band-aid, but it may help.
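For what it's worth, the 30s in that message looks like the dynamic mapping update timeout (indices.mapping.dynamic_timeout in 2.x, if I remember right). Raising it really is just a band-aid, but here is a minimal sketch of how it could be tried, assuming the setting is dynamically updatable in your version (otherwise the same key would go into elasticsearch.yml and need a restart):

```python
# Sketch: raise the dynamic mapping update timeout via the cluster settings API.
# Assumes indices.mapping.dynamic_timeout is accepted as a transient cluster
# setting on your version; if not, put the same key in elasticsearch.yml instead.
import requests

ES_URL = "http://localhost:9200"  # placeholder: any node in the cluster

resp = requests.put(
    f"{ES_URL}/_cluster/settings",
    json={"transient": {"indices.mapping.dynamic_timeout": "90s"}},
)
resp.raise_for_status()
print(resp.json())
```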

I don't know why I keep getting this error, but it is causing loss of logs and loss of data, which isn't acceptable in my environment. I am happy to check anything to make sure there isn't some other issue with the ES nodes.
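Since a mapping update is a cluster state change, one thing worth checking is whether cluster state tasks are queuing up on the master. A sketch (ES_URL is a placeholder for any node in the cluster):

```python
# Sketch: check whether cluster state tasks are backing up, which would explain
# slow mapping-update acknowledgements.
import requests

ES_URL = "http://localhost:9200"

health = requests.get(f"{ES_URL}/_cluster/health").json()
print("status:", health["status"],
      "| pending tasks:", health["number_of_pending_tasks"])

# Mapping updates wait in the cluster state queue; long time_in_queue is a red flag.
pending = requests.get(f"{ES_URL}/_cluster/pending_tasks").json()
for task in pending.get("tasks", [])[:20]:
    print(task["time_in_queue"], task["priority"], task["source"])
```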

I have 8 Logstash nodes pulling from Kafka and pushing to 12 ES data nodes. At peak I am indexing 14,000 records per second. I don't see any problem with the I/O speeds of my data nodes; they all have 2 TB SSD-backed drives with very fast read/write. The logs are small but constant, coming from a Bro IDS.
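To put a number behind "I/O looks fine", bulk thread pool rejections on the data nodes are a quick proxy for indexing back-pressure. A sketch using the _cat API (column names follow the 2.x defaults and may differ on other versions):

```python
# Sketch: look for bulk thread pool rejections on the data nodes.
import requests

ES_URL = "http://localhost:9200"

resp = requests.get(
    f"{ES_URL}/_cat/thread_pool",
    params={"v": "true", "h": "host,bulk.active,bulk.queue,bulk.rejected"},
)
# A growing "rejected" count means ES is pushing back on the Logstash bulk requests.
print(resp.text)
```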

Thanks in advance

Have you tried feeding Logstash data through a file input instead of reading from Kafka in order to see if that makes a difference?
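If wiring a file input into one of the Logstash nodes is awkward, a rougher stand-in (not the same test, but it takes both Kafka and Logstash out of the picture) is to replay a few thousand lines straight at the _bulk API and see whether the mapping timeouts still show up. Paths, index name, and type below are placeholders:

```python
# Sketch: replay a slice of a log file directly into ES via _bulk, bypassing
# Kafka and Logstash entirely.
import json
from itertools import islice

import requests

ES_URL = "http://localhost:9200"
INDEX = "replay-test"
DOC_TYPE = "logs"
LOG_FILE = "/path/to/sample.log"

def bulk_body(lines):
    # One action/metadata line plus one source line per document.
    out = []
    for line in lines:
        out.append(json.dumps({"index": {"_index": INDEX, "_type": DOC_TYPE}}))
        out.append(json.dumps({"message": line.rstrip("\n")}))
    return "\n".join(out) + "\n"

with open(LOG_FILE) as fh:
    batch = list(islice(fh, 1000))

resp = requests.post(f"{ES_URL}/_bulk", data=bulk_body(batch))
resp.raise_for_status()
print("bulk errors:", resp.json()["errors"])
```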

FYI, I have 33 Logstash nodes ingesting from Kafka, feeding 31 ES nodes. All ES nodes are on spinning 2x1TB drives, and at peak I get 120K records/s. Records are around 1 KB or larger and come from ATS. So it is certainly doable on spinning drives.

I do see occasional errors like yours, around the time when new indices are created.

Mine also appear around the time new indices are created, or when the first record of a given type is seen in a new index. But it causes data loss, and I really need to make sure I get all records.

Plus I'd like to know the root cause of the issue and fix it.
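One common mitigation for timeouts that cluster around index creation (admittedly a workaround, not the root-cause fix) is to pre-define mappings in an index template so each new daily index doesn't trigger a burst of dynamic mapping updates for every Bro log type. A sketch with placeholder names and fields, not the actual Bro schema:

```python
# Sketch: register an index template so new indices start out with mappings in
# place, cutting down on dynamic mapping updates at index rollover.
# Template name, index pattern, type name, and fields are placeholders.
import requests

ES_URL = "http://localhost:9200"

template = {
    "template": "logstash-bro-*",   # 2.x-style index pattern for the template
    "mappings": {
        "conn": {                   # one mapping per Bro log type you index
            "properties": {
                "ts":        {"type": "date"},
                "id_orig_h": {"type": "ip"},
                "id_resp_h": {"type": "ip"},
                "proto":     {"type": "string", "index": "not_analyzed"},
            }
        }
    },
}

resp = requests.put(f"{ES_URL}/_template/bro-logs", json=template)
resp.raise_for_status()
print(resp.json())
```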

I have not yet tried copying a file to a Kafka node and feeding it directly into the ES cluster. I can't do it from the sensor because the sensor can't talk to the ES cluster for a variety of reasons. I will try the file input from a Kafka node when I can.

The errors seem related to the cluster state taking a long time to update and/or propagate. As you are on Elasticsearch 2.2, which supports delta cluster state updates, I would expect cluster state updates to be reasonably quick. What is the specification of your dedicated master nodes? Do you have any client nodes? Do you see evidence of long garbage collection in the logs?
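On the GC question, one quick way to check without trawling every node's log is to pull old-gen collection counts and times from the nodes stats API. A sketch (field names as I recall them from the 2.x JVM stats; worth double-checking against your version):

```python
# Sketch: print old-gen GC totals per node. Frequent or long old-gen collections
# on the master or data nodes would line up with slow cluster state acks.
import requests

ES_URL = "http://localhost:9200"

stats = requests.get(f"{ES_URL}/_nodes/stats/jvm").json()
for node_id, node in stats["nodes"].items():
    old_gen = node["jvm"]["gc"]["collectors"]["old"]
    print(node["name"],
          "| old-gen collections:", old_gen["collection_count"],
          "| total time (ms):", old_gen["collection_time_in_millis"])
```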

Sorry for the HUGE delay in responding, but other duties pulled me away.

I am still seeing the issue.

My data node specs are:
12 data nodes
16 cores
32 GB RAM
6 TB of disk on SSD-backed storage

I have extra RAM and cores I can throw at this if needed.

I have 2 client nodes that Kibana points to. Should I build more and have my Logstash nodes point at the client nodes instead of directly at the data nodes?

These are all VMs, not bare-metal servers. This is all being done in a corporate "cloud" environment, and the bare metal isn't available to me.