I believe you've already read about the Enrich pipeline and I think that's one option. Another would be to have an isolated data enrichment service, in which your application requests enrichment before indexing the data.
The second option is how we do it here at our company.
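For completeness, the enrich pipeline option (the first one) roughly looks like this. This is only a minimal sketch using the plain Java HTTP client against an unsecured local cluster; the index, policy, pipeline, and field names (`session-lookup`, `session-country-policy`, `session-country-enrich`, `sessionId`, `country`, `geo`) are made up for illustration:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EnrichSetup {

    // Assumes an unsecured local cluster; adjust the URL and add auth for a real deployment.
    private static final String ES = "http://localhost:9200";

    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        // 1. Define the enrich policy: match on sessionId and copy the country field.
        send(http, "PUT", "/_enrich/policy/session-country-policy", """
            { "match": { "indices": "session-lookup",
                         "match_field": "sessionId",
                         "enrich_fields": ["country"] } }
            """);

        // 2. Execute the policy so Elasticsearch builds the hidden enrich index from session-lookup.
        send(http, "POST", "/_enrich/policy/session-country-policy/_execute", null);

        // 3. Create an ingest pipeline whose enrich processor adds the country to incoming events.
        send(http, "PUT", "/_ingest/pipeline/session-country-enrich", """
            { "processors": [
                { "enrich": { "policy_name": "session-country-policy",
                              "field": "sessionId",
                              "target_field": "geo" } } ] }
            """);
    }

    private static void send(HttpClient http, String method, String path, String body) throws Exception {
        HttpRequest.BodyPublisher payload = body == null
                ? HttpRequest.BodyPublishers.noBody()
                : HttpRequest.BodyPublishers.ofString(body);
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(ES + path))
                .header("Content-Type", "application/json")
                .method(method, payload)
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(method + " " + path + " -> " + response.statusCode());
    }
}
```

Indexing with `?pipeline=session-country-enrich` (or setting it as the index's default pipeline) then adds the country to every new event.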
Thanks @RabBit_BR,

I read that using the Enrich pipeline will create an enrich index, and it was a bit unclear to me how that index is maintained. In this use case the "enrich" documents only need to exist for a short time, and I was concerned that the index might grow quite large since we will have a lot of them coming in (many millions per month). Can one set a time-to-live on the "enrich index"?
If you need to update the data in the enrich index you need to use the execute API every time; enrich indices are best suited for static data, or data that is not updated frequently.
In your case it seems that you would need to frequently update it.
The best way is to do that enrichment before indexing the data in Elasticsearch, but sometimes this is hard to do, or not possible in a practical way.
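In other words, every time the lookup data changes you have to run the execute API again so the hidden enrich index is rebuilt. A minimal standalone sketch, reusing the hypothetical policy name from the earlier example:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ReExecuteEnrichPolicy {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        // Re-running the execute API rebuilds the hidden enrich index
        // from the current contents of the source index.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/_enrich/policy/session-country-policy/_execute"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```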
Is the session id unique? Could the events, like created and action, happen too close to each other, like within milliseconds?
The sessionId is unique for that session, which normally lasts seconds to minutes (1-10 documents). Events would not be milliseconds apart but more like seconds; it is manual user input.
We haven't done any index mapping for now, if that is what you mean, just the default.
This is what the mapping looks like:
Yeah, but how are you indexing the data? Are you using Logstash, Filebeat, or a custom script?
Do you have any control over the source of your data? For example, how is it created?
The enrichment depends on how you are indexing your data; you would need to store the key-value pair of sessionId and country somewhere, and read it while indexing.
Yes, we have control; we are saving through org.springframework.data.elasticsearch.repository.ElasticsearchRepository.
The problem for us is that we can scale our services, and thus we would have to first do a lookup against ES before enriching. If we had just one entry point I guess it would be easy to cache the first entry, unless we had a shared cache/store mechanism outside ES, but that complicates things.
Yeah, but this is the right approach in this case.
You would need to store it somewhere else, maybe in a cache database like Redis/Memcached, and query it every time you receive a new document, before indexing it in Elasticsearch.
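As a rough sketch of that approach, assuming Spring Boot with Spring Data Redis plus the Spring Data Elasticsearch repository you already use, and assuming (as suggested above) that the first event of a session carries the country; the entity, repository, index, and key names are made up for illustration:

```java
import java.time.Duration;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;

// Hypothetical event document; field names are made up for illustration.
@Document(indexName = "session-events")
class SessionEvent {
    @Id String id;
    String sessionId;
    String action;
    String country;   // present on the first event of a session, missing on the rest
}

interface SessionEventRepository extends ElasticsearchRepository<SessionEvent, String> {
}

@Service
class SessionEventIndexer {

    private final StringRedisTemplate redis;          // shared cache, visible to every scaled instance
    private final SessionEventRepository repository;  // the existing Spring Data Elasticsearch repository

    SessionEventIndexer(StringRedisTemplate redis, SessionEventRepository repository) {
        this.redis = redis;
        this.repository = repository;
    }

    void index(SessionEvent event) {
        String key = "session:country:" + event.sessionId;

        if (event.country != null) {
            // First event of the session: remember the country for the follow-up events.
            // The TTL keeps the cache small, since sessions only last seconds to minutes.
            redis.opsForValue().set(key, event.country, Duration.ofMinutes(30));
        } else {
            // Later events: enrich from the shared cache before indexing.
            event.country = redis.opsForValue().get(key);
        }

        repository.save(event);
    }
}
```

This keeps the lookup out of Elasticsearch and behaves the same no matter how many instances of the service are running.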