Enrich Processor missing documents

FKarraz · March 29, 2023, 2:14pm

Hi, I have several ingest pipelines, each with a fairly large set of processors configured. There is one pipeline per group of data streams. For example, pipeline "2g_names" works with the "raw_kpi_2g_*" data stream (raw_kpi_2g_1), "3g_names" works with "raw_kpi_3g_*" (raw_kpi_3g_1, raw_kpi_3g_2, raw_kpi_3g_3 and raw_kpi_3g_4), and "4g_names" works with "raw_kpi_4g_*" (raw_kpi_4g_1, raw_kpi_4g_2, raw_kpi_4g_3, raw_kpi_4g_4 and raw_kpi_4g_5).

The Elasticsearch version is 7.12.1

This is one of the pipelines ("3g_names"):

{
  "3g_names" : {
    "description" : "Enrich 3G data with names",
    "processors" : [
      {
        "enrich" : {
          "field" : "RNC.dn",
          "policy_name" : "mo_name",
          "target_field" : "RNC",
          "ignore_missing" : true,
          "ignore_failure" : true
        }
      },
      {
        "enrich" : {
          "field" : "WBTS.dn",
          "policy_name" : "mo_name",
          "target_field" : "WBTS",
          "ignore_missing" : true,
          "ignore_failure" : true
        }
      },
      {
        "enrich" : {
          "field" : "WCEL.dn",
          "policy_name" : "mo_name",
          "target_field" : "WCEL",
          "ignore_missing" : true,
          "ignore_failure" : true
        }
      },
      {
        "enrich" : {
          "field" : "WBTS.name",
          "policy_name" : "location_data",
          "target_field" : "location",
          "ignore_missing" : true,
          "ignore_failure" : true
        }
      },
      {
        "set" : {
          "field" : "location.name",
          "value" : "{{location.shortName}}",
          "ignore_failure" : true
        }
      },
      {
        "remove" : {
          "field" : "location.shortName",
          "ignore_missing" : true,
          "ignore_failure" : true
        }
      },
      {
        "enrich" : {
          "field" : "location.name",
          "policy_name" : "tech_correlation",
          "target_field" : "general",
          "ignore_missing" : true,
          "ignore_failure" : true
        }
      },
      {
        "enrich" : {
          "field" : "location.name",
          "policy_name" : "jefaturas_data_v2",
          "target_field" : "jefaturas",
          "ignore_missing" : true,
          "ignore_failure" : true
        }
      },
      {
        "enrich" : {
          "field" : "WBTS.name",
          "policy_name" : "site_name",
          "target_field" : "temp",
          "ignore_missing" : true,
          "ignore_failure" : true
        }
      },
      {
        "set" : {
          "field" : "location.siteName",
          "value" : "{{temp.siteName}}",
          "ignore_empty_value" : true,
          "ignore_failure" : true
        }
      },
      {
        "remove" : {
          "field" : "temp",
          "ignore_missing" : true,
          "ignore_failure" : true
        }
      }
    ]
  }
}

The pipeline works when I simulate it, but when I send a large bulk request to index data, it fails with the following error:

{
    "processor_results":[
        {
            "processor_type":"enrich",
            "status":"error_ignored",
            "ignored_error":{
                "error":{
                    "root_cause":[
                        {
                            "type":"es_rejected_execution_exception",
                            "reason":"Could not perform enrichment, enrich coordination queue at capacity [1024/1024]"
                        }
                    ],
                    "type":"es_rejected_execution_exception",
                    "reason":"Could not perform enrichment, enrich coordination queue at capacity [1024/1024]"
                }
            }
        }
    ]
}
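
The rejection above suggests that a single large bulk request can queue more enrich lookups than the coordination queue holds (1024 according to the error message). One client-side mitigation, offered only as a sketch, is to split the documents into smaller bulk requests. The helper below is plain Python with no Elasticsearch client; `chunk_documents`, `max_docs_per_bulk` and the sample documents are hypothetical, and a suitable batch size would have to be found by experiment. Because every enrich processor in the pipeline sets "ignore_failure" to true, a rejected lookup is likely to leave the document indexed without its enriched fields rather than fail the request.

```python
from typing import Dict, Iterable, Iterator, List


def chunk_documents(docs: Iterable[Dict], max_docs_per_bulk: int) -> Iterator[List[Dict]]:
    """Yield lists of at most max_docs_per_bulk documents, preserving order."""
    if max_docs_per_bulk < 1:
        raise ValueError("max_docs_per_bulk must be at least 1")
    batch: List[Dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == max_docs_per_bulk:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


# Hypothetical sample: 2500 documents split into bulks of at most 1000 (assumed size).
sample_docs = [{"WBTS": {"dn": f"dn-{i}"}} for i in range(2500)]
batch_sizes = [len(batch) for batch in chunk_documents(sample_docs, 1000)]
print(batch_sizes)  # [1000, 1000, 500]
```

Each yielded batch would then be sent as its own bulk request, ideally waiting for one response before sending the next so that lookups do not pile up.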

Here are some statistics for the node (we have only one) over the last 7 days: [image not preserved]

Here are some statistics for the backing index (the write index) of one of the data streams: [image not preserved]

I couldn't get the indexing rate of each data stream. If someone can tell me how to query it, I will be happy to provide that information.
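
One way to estimate a per-data-stream indexing rate is to sample the index stats API twice (for example `GET /raw_kpi_3g_*/_stats/indexing`) and divide the change in `index_total` by the elapsed time. The sketch below only does the arithmetic on two responses that have already been fetched. The sample payloads and numbers are made up, the function names are hypothetical, and it assumes the counter is read from `_all.primaries.indexing.index_total`.

```python
from typing import Any, Dict


def index_total(stats: Dict[str, Any]) -> int:
    """Read the cumulative count of indexed documents from an index stats response."""
    return stats["_all"]["primaries"]["indexing"]["index_total"]


def indexing_rate(first: Dict[str, Any], second: Dict[str, Any], elapsed_seconds: float) -> float:
    """Return documents indexed per second between two stats samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (index_total(second) - index_total(first)) / elapsed_seconds


# Made-up samples taken 60 seconds apart.
first_sample = {"_all": {"primaries": {"indexing": {"index_total": 120000}}}}
second_sample = {"_all": {"primaries": {"indexing": {"index_total": 150000}}}}
rate = indexing_rate(first_sample, second_sample, 60.0)
print(rate)  # 500.0
```

The counter is cumulative, so a backing index deleted between the two samples would distort the result; short sampling intervals reduce that risk.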

What approach could I take to solve this problem?

Regards.

Keith_Massey · March 30, 2023, 7:48pm

You might benefit from the enrich cache added in 7.16.0 (see "Add enrich node cache" by martijnvg, Pull Request #76800 in elastic/elasticsearch on GitHub). Are you able to upgrade? I see some of the settings documented under "Enrich settings" on the "Edit Elasticsearch user settings" page of the Elasticsearch Service documentation (I'm not sure why they are there and not in the Elasticsearch documentation, but they are Elasticsearch settings).
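
To illustrate why a cache could relieve the coordination queue: many documents in a bulk request presumably share the same match values (for example the same `WBTS.name`), so repeated lookups can be answered without a new search. The following is a toy LRU cache in Python and not Elasticsearch's implementation; the class name, the `fake_lookup` function, the key values and the cache size are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Dict


class LruLookupCache:
    """Toy least-recently-used cache placed in front of an expensive lookup."""

    def __init__(self, lookup: Callable[[str], Dict[str, str]], max_size: int) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._lookup = lookup
        self._max_size = max_size
        self._entries: "OrderedDict[str, Dict[str, str]]" = OrderedDict()
        self.misses = 0  # number of times the real lookup had to run

    def get(self, key: str) -> Dict[str, str]:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = self._lookup(key)
        self._entries[key] = value
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value


def fake_lookup(key: str) -> Dict[str, str]:
    """Stand-in for a search against an enrich index (hypothetical)."""
    return {"name": key.upper()}


cache = LruLookupCache(fake_lookup, max_size=2)
keys = ["site_a", "site_b", "site_a", "site_a", "site_c", "site_b"]
results = [cache.get(key) for key in keys]
print(len(keys), cache.misses)  # 6 4
```

In this made-up run, six requests trigger only four real lookups; with more repetition among keys and a larger cache the saving would grow.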

warkolm · March 30, 2023, 10:20pm

I want to second Keith's comments. That version is EOL and no longer supported, so you should be looking to upgrade as a matter of urgency.

FKarraz · April 3, 2023, 12:24pm

Thanks @Keith_Massey & @warkolm for the reply. Do you have a guide or recommended approach for upgrading Elasticsearch (which is running in a Docker container) from 7.12.1 to 7.17.9? Is there anything I need to be aware of, or will simply changing the tag version of the Docker image work? Thanks

system · May 1, 2023, 12:25pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
