Yeah. I'm not doing that talk anymore. Using ingest to do lookups in a 3rd party system ended up being a bad idea after all. I mostly used it for demo purposes, to introduce what ingest is and how easily you can build your own plugin, but the use case was not the best one.
You can see what my recommendation is today (using Logstash):
If you really want to go that way (which I do not recommend), you can look at this PR, which was a WIP but was never merged for the reasons I explained above.
Actually, the feature you're asking for is coming in 7.5 via the new enrich processor, which kind of provides index-time JOIN capability.
The main idea is to set up an enrich policy that will source data from your related indexes into a new "enrich index" and then you can leverage that "enrich index" in your ingest pipeline using an enrich processor in order to enrich your documents with related fields.
So, without going into too much detail, here is how it works in practice:
You have an index A with fields (a, b, c, d) that you'd like to use for enriching your incoming documents
You define an enrich policy based on that index A and the "join" field a
You define an ingest pipeline with an enrich processor that will try to match field z of the incoming document against field A.a of the enrich index
If a match is found, your incoming document will get fields b, c and d from the index A. Note that it will also get the match field a that you can remove using a remove processor if needed.
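To make those steps a bit more concrete, here is a rough Python sketch that only builds the request bodies for the enrich policy and the ingest pipeline, using the field letters from above. The index, policy, pipeline and target field names (`index-a`, `a-policy`, `a-enrich-pipeline`, `enriched`) are made up for illustration. One thing to be aware of: the enrich processor requires a `target_field`, so the matched fields land under that field rather than at the top level of the document. The `simulate_enrich` helper is just a local stand-in to show the expected effect; it is not how Elasticsearch implements it.

```python
import json

# Hypothetical names, chosen only for this sketch.
SOURCE_INDEX = "index-a"
POLICY_NAME = "a-policy"
PIPELINE_NAME = "a-enrich-pipeline"
TARGET_FIELD = "enriched"


def enrich_policy_body():
    """Body for PUT /_enrich/policy/<POLICY_NAME> (exact-match policy)."""
    return {
        "match": {
            "indices": SOURCE_INDEX,
            "match_field": "a",               # the "join" field A.a
            "enrich_fields": ["b", "c", "d"],  # fields copied into incoming docs
        }
    }


def pipeline_body():
    """Body for PUT /_ingest/pipeline/<PIPELINE_NAME>."""
    return {
        "processors": [
            {
                "enrich": {
                    "policy_name": POLICY_NAME,
                    "field": "z",                  # field of the incoming document
                    "target_field": TARGET_FIELD,  # where the matched fields go
                }
            },
            # Optional: drop the match field a that comes along with b, c, d.
            {"remove": {"field": TARGET_FIELD + ".a", "ignore_missing": True}},
        ]
    }


def simulate_enrich(doc, source_docs):
    """Local illustration of the combined enrich + remove effect.

    Not Elasticsearch code: it mimics an exact match of doc["z"]
    against the "a" field of the source documents.
    """
    result = dict(doc)
    for src in source_docs:
        if src.get("a") == doc.get("z"):
            result[TARGET_FIELD] = {k: src[k] for k in ("b", "c", "d") if k in src}
            break
    return result


if __name__ == "__main__":
    # The policy has to be executed (POST /_enrich/policy/<POLICY_NAME>/_execute)
    # to build the enrich index before the pipeline can use it.
    print(json.dumps(enrich_policy_body(), indent=2))
    print(json.dumps(pipeline_body(), indent=2))
    source = [{"a": "k1", "b": 1, "c": 2, "d": 3}]
    print(simulate_enrich({"z": "k1", "msg": "hello"}, source))
```

You would send the two bodies with whatever client you already use (Kibana Dev Tools, curl, a language client), then index your documents through the pipeline.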
That should pretty much work the way you expect. You can find a complete example here. At the beginning, it will work for exact matches (i.e. term query) and geo matches (i.e. geo_shape query), but they will probably add new kinds of matches (like range matches) in the near future.
Will it be possible to use an analysed match rather than a term query to do the join at all? I have a use case where I want to enrich a document with the best match (based on the '_score' value), as opposed to using a term query.
I don't think it's planned at the moment. The matching currently only works with yes/no filters and is not based on scoring. But I see you've created a feature request, so we'll see where that leads us.