Hello @DavidTurner and @Christian_Dahlqvist
I was finally able to track down the cause using the following request:
GET _tasks?nodes=node-name&actions=*write*&detailed
Then I wrote a small Python script to parse the JSON response and extract only the task description (which contains the index name) and the task's running_time_in_nanos. This made it easy to see which index was spending the most time in the indices:data/write/bulk action.
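For reference, this is roughly what the script does. A minimal sketch only, not the exact script: the cluster URL and node name are placeholders, and it assumes an unauthenticated local cluster.

```python
# Sketch: list the longest-running write tasks on one node, with their
# descriptions (which include the target index name for bulk tasks).
import requests

ES_URL = "http://localhost:9200"  # placeholder: adjust URL/auth for your cluster
NODE = "node-name"                # placeholder: the hot node being investigated

resp = requests.get(
    f"{ES_URL}/_tasks",
    params={"nodes": NODE, "actions": "*write*", "detailed": "true"},
)
resp.raise_for_status()

rows = []
for node in resp.json()["nodes"].values():
    for task in node["tasks"].values():
        rows.append((task["running_time_in_nanos"], task.get("description", "")))

# Longest-running tasks first
for nanos, description in sorted(rows, reverse=True):
    print(f"{nanos / 1_000_000:.1f} ms  {description}")
```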
It was the only pipeline that sends data directly from Filebeat to Elasticsearch. This pipeline uses a third-party module with its own ingest pipeline (Wazuh), and we also created a final_pipeline to do some extra processing. Recently we added another ingest pipeline to do some enrichment; it is called from the final_pipeline using the pipeline processor.
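For context, the chaining looks roughly like this. A minimal sketch under assumptions: the pipeline IDs (my-final-pipeline, my-enrich-pipeline) are placeholders, and in our setup the final pipeline is attached to the index via the index.final_pipeline setting.

```python
# Sketch: create a final pipeline that hands documents off to the enrich
# pipeline via the `pipeline` processor.
import requests

ES_URL = "http://localhost:9200"  # placeholder: adjust URL/auth for your cluster

final_pipeline = {
    "description": "final pipeline that calls the enrich pipeline",
    "processors": [
        # ... other extra-processing processors ...
        {
            "pipeline": {
                "name": "my-enrich-pipeline",  # placeholder pipeline ID
                "ignore_failure": True,
            }
        }
    ],
}

resp = requests.put(f"{ES_URL}/_ingest/pipeline/my-final-pipeline", json=final_pipeline)
print(resp.json())
```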
The enrich pipeline is composed of around a hundred set processors with the following format:
{
  "set": {
    "field": "event.metadata",
    "value": "authentication;start;logged-in",
    "override": false,
    "if": "ctx.event?.code == '4624'",
    "ignore_failure": true
  }
},
{
  "set": {
    "field": "event.metadata",
    "value": "authentication;start;logon-failed",
    "override": false,
    "if": "ctx.event?.code == '4625'",
    "ignore_failure": true
  }
}
We created this enrich pipeline with set processors to replace an enrich processor we had tried before, whose performance was even worse: in that case the load on all hot nodes doubled as soon as the enrich processor was enabled. I opened a topic about it if you want to read more.
After I removed this pipeline with its hundred or so set processors, the load on the node returned to a normal value, similar to the load of the other hot nodes.
What I still do not understand is why only one node had performance issues. Shouldn't ingest be balanced across all four nodes? Would dedicated ingest nodes help with this, and is it possible to make just one pipeline use a specific ingest node?
Also, what improvements have ingest pipelines received in newer versions? We have an upgrade planned for next month, and I'm wondering whether I should give another shot to doing the enrichment directly in Elasticsearch, or move the ingestion to Logstash, which can do what I need pretty quickly.
Thanks, you can close the topic.