Enrich documents by copying fields from another index

Hi,

I have two indices, say index1 and index2.

When I index a new document in index2, I want some fields to be copied from the document in index1 that has the same _id.

I previously achieved this using Logstash with the elasticsearch filter.

But this time I'm not using Logstash; I index documents in index2 using the bulk API.

What is the best way to achieve that? An ingest plugin?

Thanks.

What is the best way to achieve that?

I think that what you did in the past is a good way to solve this problem. I mean:

I previously achieved this using Logstash with the elasticsearch filter.

That's probably what I'd do.

Thanks @dadoonet, but I want to get rid of Logstash, exactly as you describe here:

I would like to include in the ingest node pipeline some plugin that is able to enrich data from existing documents in the cluster.

Yeah. I'm not doing that talk anymore. Using ingest to do lookups in a third-party system ended up being a bad idea after all. I was mostly using it for demo purposes, to introduce what ingest is and how easily you can build your own plugin, but the use case was not the best one. :frowning:

You can see my recommendation today (using Logstash):

If you really want to go that way (which I do not recommend), you can look at this PR, which was a WIP but was never merged, for the reasons I explained above.

@dadoonet Thanks for sharing.

I will also reconsider using logstash.


Guys,

Actually, the feature you're asking for is coming in 7.5 via the new enrich processor, which provides a kind of index-time JOIN capability.

The main idea is to set up an enrich policy that sources data from your related indices into a new "enrich index"; you can then leverage that enrich index in your ingest pipeline, using an enrich processor, to enrich your documents with related fields.

So, without going into too much detail, here is how it works in practice:

  1. You have an index A with fields (a, b, c, d) that you'd like to use to enrich your incoming documents.
  2. You define an enrich policy based on index A, with a as the "join" field.
  3. You define an ingest pipeline with an enrich processor that tries to match field z of the incoming document against field a of the enrich index.
  4. If a match is found, your incoming document gets fields b, c, and d from index A. Note that it will also get the match field a, which you can drop with a remove processor if needed.
That should pretty much work the way you expect. You can find a complete example here. Initially it will work for exact matches (i.e. term queries) and geo matches (i.e. geo_shape queries), but new kinds of matches (such as range matches) will probably be added in the near future.
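The steps above can be sketched roughly as follows. Note that the index, field, policy, and pipeline names here are illustrative, not from your setup; the enrich processor nests the matched document's fields under target_field:

```
# 1-2. Define an enrich policy on index A, joining on field "a"
PUT /_enrich/policy/my-policy
{
  "match": {
    "indices": "index-a",
    "match_field": "a",
    "enrich_fields": ["b", "c", "d"]
  }
}

# Build the enrich index from the policy (must be re-run when source data changes)
POST /_enrich/policy/my-policy/_execute

# 3-4. Pipeline: match incoming field "z" against "a", then drop the copied match field
PUT /_ingest/pipeline/my-enrich-pipeline
{
  "processors": [
    {
      "enrich": {
        "policy_name": "my-policy",
        "field": "z",
        "target_field": "enriched"
      }
    },
    {
      "remove": {
        "field": "enriched.a",
        "ignore_missing": true
      }
    }
  ]
}
```

After this, a matched document gets the enrich fields under the enriched object (e.g. enriched.b, enriched.c, enriched.d).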


Elastic Stack 7.5 has been released today and enrich processors are now available.
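Since the original question was about the bulk API: you can apply an enrich pipeline at index time by passing the pipeline query parameter on the bulk request (pipeline, index, and field names here are illustrative):

```
POST /index2/_bulk?pipeline=my-enrich-pipeline
{ "index": { "_id": "1" } }
{ "z": "some-join-value", "other_field": "..." }
```

Each document in the bulk request then runs through the enrich processor before being indexed, so the enriched fields are added without any extra lookup step on your side.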

Will it be possible to use an analyzed match rather than a term query to join the indices? I have a use case where I want to enrich a document with the best match (based on the _score value), as opposed to using a term query.

I don't think that's planned at the moment. Matching currently only works with yes/no filters, not with scoring. But I see you've created a feature request, so we'll see where that leads us :wink: