Usually data are not enriched

Hi,

Are there situations when data are not enriched? (I'm excluding the scenario where there is no matching data.)
Example: can it be skipped for performance reasons?

Is it possible for the enrich processor in a pipeline to be skipped?
By adding tags, can I rule out the scenario where the enrich processor is skipped?

Flow :
Logs -> Logstash -> Elasticsearch pipeline (where enrich processor exists) -> Elasticsearch Index

Indexing speed (enriched index): approx. 80k in 45 minutes

Elasticsearch 7.17

You can configure your processor to use a conditional, and it will only run if the conditional is true.

Check this part of the documentation.
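As a sketch (the field names, policy name, and conditional here are assumptions, not your actual configuration), an enrich processor with an `if` conditional could look like this; the processor only runs when the conditional script returns true:

        {
          "enrich" : {
            "if" : "ctx.product?.id != null",
            "policy_name" : "pricing-policy",
            "field" : "product.id",
            "target_field" : "pricing",
            "max_matches" : "1"
          }
        }

Here the null-safe `?.` operator makes the conditional false (and skips the processor) when the field is missing.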

Also, since you are using Logstash, and depending on what you are enriching, it may be much easier and faster to enrich the data in Logstash.
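For example (a sketch only; the hosts, index, and field names are assumptions), the Logstash elasticsearch filter can look up the matching document and copy a field onto the event while it is still in Logstash:

    filter {
      elasticsearch {
        hosts  => ["localhost:9200"]
        index  => "pricing"
        query  => "ID:%{[product][id]}"
        fields => { "Price" => "price" }
      }
    }

This avoids the ingest-pipeline round trip, at the cost of one query per event against the lookup index.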


Thank you for reply.

OK, I understand that using if statements can skip the enrich processor in an ingest pipeline.

What do I want to enrich?
Example:

Index "products" contains: name of product, type of product, ID, and color
Index "pricing" contains: price, tax, ID

I want to enrich "products" with price, using the enrich processor in an Elasticsearch ingest pipeline.
Additionally, I understand that I can do it in Logstash.
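As a sketch of what I have in mind (the policy name is just a placeholder), the match enrich policy would look roughly like:

    PUT /_enrich/policy/pricing-policy
    {
      "match" : {
        "indices" : "pricing",
        "match_field" : "ID",
        "enrich_fields" : [ "Price", "tax" ]
      }
    }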

When can the enrich processor in an Elasticsearch pipeline be skipped? (Excluding: if statements and cases where there is no matching data; for enrichment I want to use an Elasticsearch pipeline with an enrich processor.)

The ingest pipeline will run the processors in the order they are configured. If you have an enrich processor in the ingest pipeline for your products index, it will be executed for every event.

If you want to skip a processor, you need a conditional based on some kind of data; there is no other way to skip the execution of a processor.

Also, the enrich processor is recommended for static data. If you need to constantly update your source index, you will need to manually call the _execute API to update the data of the enrich index every time the source index is updated.
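As a sketch, assuming a policy named pricing-policy, re-running the policy after the source index changes is a single call:

    POST /_enrich/policy/pricing-policy/_execute

Until this runs, the enrich processor keeps reading from the enrich index built by the previous execution.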

OK,

so... what will happen if I don't update the data of the enrich index after adding some data (to the source index)?
Will the enrich processor run but produce no results?
By updating the data of the enrich index, do you mean calling /_enrich/policy/name_of_policy/_execute?

so... if I exclude if statements and the enrich processor is defined in the pipeline, will it ALWAYS execute? Can it never be skipped?

or...

When can data end up not enriched? (Excluding not updating the enrich index after adding some data.)

If you add new data to the source index of your enrich policy, this new data will only be available after you run a new _execute on the policy; this will create a new enrich index.

If you do not have an if conditional on your enrich processor, it will always be executed.

As I said in the previous answer, the processors are executed in the order they are configured in the ingest pipeline, and they are always executed. If you want to skip some processors, you need a conditional on each of those processors to check whether it should run; in that case the conditional itself will always be evaluated.

If there is no match in the enrich processor, the data will not be enriched, but the processor will still be executed.
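One way to verify this behaviour (the pipeline and field names here are assumptions) is the simulate API, which runs the pipeline against a test document without indexing anything, so you can see whether the target field gets populated:

    POST /_ingest/pipeline/my-pipeline/_simulate
    {
      "docs" : [
        { "_source" : { "product" : { "id" : "123" } } }
      ]
    }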

How many replicas should the .enrich-INDEXNAME index have?

Should it be the sum of hot + warm + ingest nodes? Or maybe only the ingest nodes?
What roles should the hot/warm/ingest nodes have? (3x hot (drt), 3x warm (drt), 6x ingest (di))

I'm trying to deal with a situation where, usually, data are not enriched, but not because there is no matching data...

You should leave it using the default configuration. If I'm not wrong, it will auto-expand to every data node, or at least every node with the data_content role.
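If you want to confirm what the enrich indices are actually configured with (for example the auto_expand_replicas setting), you can inspect their settings directly:

    GET /.enrich-*/_settings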

What situation? Please provide more context.

There are two independent sources gathered by Logstash (two indices: index1 and index2).
Each source has its own Logstash pipeline.

Index1 looks simple.
Logstash crawls CSV files; there are some filter rules like csv, translate, dissect, and fingerprint.
The fingerprint is calculated from two fields, with:

    concatenate_sources => true

In its output, Logstash has some interesting options:

    doc_as_upsert => true
    document_id => "%{fingerprint}"

The output also defines the ingest pipeline the document will be sent to.

The ingest pipeline has a few processors, like date, grok, and date.
The _enrich/policy is executed 3x per day. An excerpt from the policy:

          "match_field" : "index1.number",
          "enrich_fields" : [
            "geo.region_iso_code_old",
            "index1.commune.id",
            "index1.company.id",
            "index1.company.name",
            "index1.service"
          ]

The CSV file is created once per day (around 00:00).


The image above shows that ~98% of records are updated.

Index2 is more complex.
Logstash crawls CSV files; there are some filter rules like csv, dissect, and mutate, and the data is sent to an ingest pipeline.

That pipeline has many sub-pipelines, but one of them contains an enrich processor which depends on index1:

        "enrich" : {
          "tag" : "index1 b",
          "ignore_missing" : true,
          "policy_name" : "index1",
          "field" : "tmp.enrich_pl.value",
          "target_field" : "tmp.enrich_pl.b",
          "max_matches" : "1"
        }

Final effect

I'm sorry, but I don't get what the issue is from just that information.

You didn't share what your data looks like or what the data in your enrich index looks like. For example, in your last screenshot you shared a field named b.company.name; where does this come from? It is not possible to know what the issue may be from what you shared.

You need to share some sample data from both your index and your enrich index, so it is possible for someone to try to replicate your issue.

Can you share an example of a document that should've been enriched but wasn't, and also the data from the enrich index?
