I'm trying to improve the indexing performance of my cluster, and I have already checked everything listed in the "Tune for indexing speed" documentation.
Currently the indexing is done by Logstash with the elasticsearch output, plus a couple of Filebeat instances sending directly to Elasticsearch. The indexing happens on 4 hot nodes, each with 10 vCPUs, 64 GB of RAM (30 GB of Java heap) and SSD-backed disks.
Normally I do not have any indexing problems and the data is near real time, but sometimes a couple of pipelines fall behind. Using the hot_threads endpoint, I can see that on one or more nodes the [write] action is among the hot threads. When this happens, the load on the node is also high.
I have not yet been able to track down which index or pipeline is causing this, as the hot_threads response does not include this information.
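For reference, this is roughly how the hot threads check can be scripted. It is only a minimal sketch: the address http://localhost:9200 and the lack of authentication are assumptions, the helper names are made up for illustration, and counting lines that contain [write] relies on the thread pool name appearing in the thread name of the plain-text report.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: an unsecured node reachable at this address; adjust for your cluster.
ES_URL = "http://localhost:9200"


def hot_threads_url(base_url=ES_URL, node_id=None, threads=5, interval="500ms"):
    """Build the URL for the nodes hot threads API, optionally for a single node."""
    if node_id is None:
        path = "/_nodes/hot_threads"
    else:
        path = f"/_nodes/{node_id}/hot_threads"
    query = urlencode({"threads": threads, "interval": interval})
    return f"{base_url}{path}?{query}"


def fetch_hot_threads(**kwargs):
    """Return the plain-text hot threads report (needs a reachable cluster)."""
    with urlopen(hot_threads_url(**kwargs)) as response:
        return response.read().decode("utf-8")


def count_write_threads(report):
    """Count report lines that mention the write thread pool."""
    return sum(1 for line in report.splitlines() if "[write]" in line)
```

Calling count_write_threads(fetch_hot_threads()) periodically shows when write threads dominate, but as noted above it still does not say which index or pipeline is responsible.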
Since I'm in the process of optimizing the index mappings, I was wondering whether there is something I can change at the mapping level to improve indexing speed.
Almost all my logs are security logs from network devices, applications, SaaS services and the like. I search them using Discover in Kibana and the SIEM/Detections interface, plus a couple of Python scripts that trigger alerts and actions.
I do not need any kind of scoring in my searches. Looking at the mapping parameters, I found two things I could change that I think would help my indexing speed.
One is to set index_options to docs, as I just need to know whether a string is in a text field; the position and frequency do not matter.
The other is to set norms to false, as the score also does not matter.
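To make the two changes concrete, here is a minimal sketch of what the mapping could look like, assuming the settings meant are index_options set to docs and norms set to false on text fields. The field name message and the helper build_mapping are hypothetical, not taken from my real mappings.

```python
import json


def build_mapping(text_fields):
    """Build an index body where each listed text field stores only doc IDs
    in the postings (no frequencies or positions) and has norms disabled."""
    properties = {
        name: {"type": "text", "index_options": "docs", "norms": False}
        for name in text_fields
    }
    return {"mappings": {"properties": properties}}


# Hypothetical field name, used only for illustration.
body = build_mapping(["message"])
print(json.dumps(body, indent=2))
```

The printed JSON is the kind of body that would go into an index template or a create index request. One trade-off to keep in mind: without positions, phrase queries on those fields no longer work.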
Does anyone have experience with whether changing these settings helps indexing speed? Are there any other settings I could change to improve it?