Using an ingest pipeline to split documents between two indexes

I have documents containing the field "status", which can have one of three values: "Draft", "In Progress", or "Approved". I am passing these documents through an ingest pipeline. If the status is "Approved", the document should also be added to index B. By default, every document should be indexed into index A regardless of its status value.
For example:

{
"id":"123",
"status":"Draft"
}
{
"id":"1234",
"status":"InProgress"
}
{
"id":"12345",
"status":"Approved"
}

Documents 1, 2, and 3 should all go to index A, and only document 3 should also go to index B.
Is it possible to do this with an ingest pipeline?

You should be able to do this with the Pipeline processor (Elasticsearch Guide [8.2]), sending each document to one of two separate pipelines depending on its status.
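A minimal sketch of that approach, written as the JSON request bodies built in Python. The pipeline names (`route-by-status`, `route-approved`, `route-default`) and index names (`index-a`, `index-b`) are hypothetical. Each body would be sent with `PUT _ingest/pipeline/<name>`. Note that every document is still handed to exactly one sub-pipeline and ends up in exactly one index.

```python
import json

# Hypothetical pipeline names, for illustration only.
APPROVED_PIPELINE = "route-approved"
DEFAULT_PIPELINE = "route-default"

# Entry pipeline: exactly one of the two conditions matches,
# so each document is passed to exactly one sub-pipeline.
route_by_status = {
    "description": "Dispatch documents on the status field",
    "processors": [
        {"pipeline": {"name": APPROVED_PIPELINE, "if": "ctx.status == 'Approved'"}},
        {"pipeline": {"name": DEFAULT_PIPELINE, "if": "ctx.status != 'Approved'"}},
    ],
}

# Sub-pipelines: each one rewrites the target index through the _index metadata field.
route_approved = {"processors": [{"set": {"field": "_index", "value": "index-b"}}]}
route_default = {"processors": [{"set": {"field": "_index", "value": "index-a"}}]}

if __name__ == "__main__":
    for name, body in [
        ("route-by-status", route_by_status),
        (APPROVED_PIPELINE, route_approved),
        (DEFAULT_PIPELINE, route_default),
    ]:
        print(f"PUT _ingest/pipeline/{name}")
        print(json.dumps(body, indent=2))
```

Splitting the routing into sub-pipelines keeps each branch's processing separate, but it does not let one document reach both indexes.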


There is actually an example of this here as well.


Ahh that's what I was looking for as well!

You can override the index name through an ingest pipeline, but I do not believe you can make a document go to more than a single index.
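To make that limitation concrete, here is a local Python model (not Elasticsearch code) of a single conditional `set` processor on `_index`: documents are sent to `index-a`, and Approved ones are redirected to `index-b`. The function name `target_index` and both index names are hypothetical. Because `_index` holds a single value, the model can only ever return one index per document, so the "write to both" case cannot be expressed this way.

```python
# Processor that would sit in an ingest pipeline (index name is hypothetical):
# documents target index-a, and the condition redirects Approved ones.
SET_INDEX_PROCESSOR = {
    "set": {
        "field": "_index",
        "value": "index-b",
        "if": "ctx.status == 'Approved'",
    }
}


def target_index(doc: dict, requested_index: str = "index-a") -> str:
    """Local model of the processor above: returns the single index
    the document would be written to."""
    if doc.get("status") == "Approved":
        return SET_INDEX_PROCESSOR["set"]["value"]
    return requested_index


docs = [
    {"id": "123", "status": "Draft"},
    {"id": "1234", "status": "InProgress"},
    {"id": "12345", "status": "Approved"},
]

for doc in docs:
    print(doc["id"], "->", target_index(doc))
```

Document 3 is moved to `index-b` instead of being copied there, which is exactly why the requirement in the question is not met.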


@stephenb @Christian_Dahlqvist @warkolm
Yes, I am able to route the document to one index or the other based on a condition.
But as @Christian_Dahlqvist mentioned, one of my cases requires writing the document to both indexes. I tried, but it looks like that is not possible this way.
Is there any other way to write to two indexes at the same time? Would an alias over these two indexes write to both?

An alias will not write to more than one index at a time, so it would not help you.
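For reference, a sketch of the `POST _aliases` body that puts both indexes behind one alias (the alias name `docs` and the index names are hypothetical). Searches through the alias cover both indexes, but when an alias points at several indexes, writes go only to the one flagged with `is_write_index`; if none is flagged, writes through the alias are rejected.

```python
import json

# Hypothetical alias spanning both indexes.
alias_actions = {
    "actions": [
        # Writes through "docs" land only in index-a.
        {"add": {"index": "index-a", "alias": "docs", "is_write_index": True}},
        # index-b is searchable through "docs" but never receives writes from it.
        {"add": {"index": "index-b", "alias": "docs", "is_write_index": False}},
    ]
}

# At most one index can be the write index for an alias.
write_indexes = [
    action["add"]["index"]
    for action in alias_actions["actions"]
    if action["add"].get("is_write_index")
]

print("POST _aliases")
print(json.dumps(alias_actions, indent=2))
print("write index:", write_indexes)
```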

You will need to use Logstash to do that, or perhaps a Transform.
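Outside an ingest pipeline, the fan-out has to happen before the data reaches Elasticsearch, the kind of thing Logstash could do with two conditional `elasticsearch` outputs. As a swapped-in illustration of the same idea (not Logstash itself), this Python sketch builds a `_bulk` request body in which every document is indexed into `index-a` and Approved documents get a second action for `index-b`. The index names are hypothetical, and reusing the `id` field as `_id` is an assumption, made so that re-sending a document overwrites it instead of duplicating it.

```python
import json


def bulk_body(docs):
    """Build an NDJSON _bulk body: every document goes to index-a,
    and Approved documents are also written to index-b."""
    lines = []
    for doc in docs:
        targets = ["index-a"]
        if doc.get("status") == "Approved":
            targets.append("index-b")
        for index in targets:
            # Action line, then source line; reusing the id keeps re-sends idempotent.
            lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
            lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"id": "123", "status": "Draft"},
    {"id": "1234", "status": "InProgress"},
    {"id": "12345", "status": "Approved"},
]

body = bulk_body(docs)
print(body, end="")
```

The body would be sent as `POST _bulk` with `Content-Type: application/x-ndjson`. Documents 1, 2, and 3 land in `index-a`, and document 3 also lands in `index-b`.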

Why are you looking to duplicate some of the data?

Our documents go through three stages: Draft, In Progress, and then Approved (ready to be viewed by the end consumer). We think that if we put all approved data in a separate index, it will contain far fewer documents, so response times will be faster for our end consumers. Approved documents will make up about 10 percent of the total (Draft and In Progress combined).

Documents in the Draft, In Progress, and Approved states will only be searched internally, so we can tolerate slower responses there by keeping them all in one index.

One suggestion is to query both indexes for internal searches. That would work, but the same document would then show up as a duplicate across the two indexes, since only one field differs.

A couple of thoughts:

How many documents in total are we talking about? Term queries are very fast. If "Approved" is a value in a keyword field, it can be used in a filter context in a query, which should be very fast.
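A sketch of that filter-context query as a `GET index-a/_search` body, assuming `status` is mapped as a `keyword` field (with default dynamic mapping it would be `status.keyword` instead). The `title` field and the `approved_query` helper are hypothetical. Filter clauses do not compute relevance scores and their results can be cached, which is why this stays cheap on a single larger index.

```python
import json


def approved_query(text=None):
    """Search body for consumer-facing queries against the single index.
    The status restriction runs in filter context (no scoring)."""
    bool_query = {"filter": [{"term": {"status": "Approved"}}]}
    if text:
        # Hypothetical full-text field; scored normally in query context.
        bool_query["must"] = [{"match": {"title": text}}]
    return {"query": {"bool": bool_query}}


print("GET index-a/_search")
print(json.dumps(approved_query("pricing"), indent=2))
```

Internal searches would simply drop the filter, so no second index (and no duplicates) would be needed.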

I think you could also use a Latest Transform with a filter on Approved to create an index containing just the last approved state of each document.
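A sketch of such a transform, as the body for `PUT _transform/<id>`. It assumes every document carries a timestamp field, called `updated_at` here (hypothetical), that `id` and `status` are keyword fields, and that the transform id and index names are placeholders. The `latest` configuration keeps one document per `unique_key`, choosing the one with the greatest `sort` value; the `sync` block makes it a continuous transform.

```python
import json

# Hypothetical transform id and index names.
TRANSFORM_ID = "approved-latest"

latest_transform = {
    "source": {
        "index": "index-a",
        # Only Approved versions are considered.
        "query": {"term": {"status": "Approved"}},
    },
    "dest": {"index": "index-b"},
    "latest": {
        # One output document per id ...
        "unique_key": ["id"],
        # ... the one with the most recent (assumed) timestamp.
        "sort": "updated_at",
    },
    # Continuous mode: pick up new or changed documents as they arrive.
    "sync": {"time": {"field": "updated_at", "delay": "60s"}},
    "frequency": "1m",
}

print(f"PUT _transform/{TRANSFORM_ID}")
print(json.dumps(latest_transform, indent=2))
```

After creation, the transform would be started with `POST _transform/approved-latest/_start`.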

