Transform with two input indices with different unique ids

Phil_McLachlan · October 8, 2024, 6:01pm

Hi, we have a transform with two input indices with different unique ids. One input index has a unique id of product_pk, and another has product_pk combined with catalog_type. There are two possible catalog_types: catalog and teyos.

If I create a transform pivoting on just product_pk, the destination index is missing documents for products that have both catalog types (catalog and teyos). Most products only have the catalog type, so most documents do end up in the final index. I need every document with any catalog type to be in the final index.

Consequently, I tried pivoting on the product_pk and the catalog_type. This only gave me information from the second input index. I'm trying to combine data from both indices.

Does anyone know what I have to pivot on, and what can be done to get all documents with all info? We want a final index containing only the product_pks that are in both input indices. One solution I can think of is to generate the first input index with both product_pk and catalog_type, thereby duplicating the data. This input index is already 1 GB, and I don't want to double its size. This approach also won't scale if we decide to add more catalog_types in the future. Any help would be appreciated.
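The two pivot attempts described above can be mimicked with a small in-memory sketch. This is not how the transform runs internally; it only illustrates the grouping logic under two assumptions: documents in the first index have no catalog_type field, and a document that lacks a group-by field does not land in any bucket. The sample documents and the `name` and `price` fields are made up for illustration.

```python
from collections import defaultdict


def pivot(docs, group_fields):
    """Group docs by group_fields and merge each group into one output doc.

    Assumption: a doc missing any group field is skipped, i.e. it does not
    contribute to any bucket.
    """
    groups = defaultdict(dict)
    for doc in docs:
        if any(field not in doc for field in group_fields):
            continue  # no bucket for docs without all group-by fields
        key = tuple(doc[field] for field in group_fields)
        groups[key].update(doc)  # later docs overwrite earlier field values
    return dict(groups)


# Hypothetical sample data: index_a is keyed on product_pk only,
# index_b is keyed on product_pk + catalog_type.
index_a = [
    {"product_pk": 1, "name": "widget"},
    {"product_pk": 2, "name": "gadget"},
]
index_b = [
    {"product_pk": 1, "catalog_type": "catalog", "price": 10},
    {"product_pk": 1, "catalog_type": "teyos", "price": 12},
    {"product_pk": 2, "catalog_type": "catalog", "price": 20},
]

all_docs = index_a + index_b

# One output doc per product_pk: the two catalog types of product 1 collapse.
by_pk = pivot(all_docs, ["product_pk"])

# One output doc per (product_pk, catalog_type): index_a docs never match.
by_pk_and_type = pivot(all_docs, ["product_pk", "catalog_type"])

print(len(by_pk), len(by_pk_and_type))
```

In this sketch, grouping on product_pk alone yields one document per product, so a product with both catalog types cannot keep both. Grouping on both fields yields one document per pair, but none of them carries the `name` field from the first index.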

Phil_McLachlan · October 11, 2024, 7:09pm

I tried a proof of concept for another idea. Unfortunately, it didn't work either; I am still missing the same documents. The idea is this: I moved the catalog_types into an array in the second input index, so that index can be keyed on product_pk only. Then, in the transform, I wrote a combine_script that pulled the catalog_types out of the document and put each element into a separate document, copying the rest of the document along with it. This produced the same output as before, with the same documents missing. I am now puzzled.
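The expansion step described above amounts to the following logic, shown here as a plain Python sketch rather than the actual combine_script. The function name `expand_catalog_types` and the `price` field are hypothetical; only product_pk and catalog_types come from the post.

```python
def expand_catalog_types(doc):
    """Return one copy of doc per entry in its catalog_types array.

    Each copy drops the array and carries a single catalog_type value.
    """
    base = {key: value for key, value in doc.items() if key != "catalog_types"}
    return [{**base, "catalog_type": ctype} for ctype in doc.get("catalog_types", [])]


# Hypothetical document from the reshaped second index.
doc = {"product_pk": 1, "catalog_types": ["catalog", "teyos"], "price": 10}
expanded = expand_catalog_types(doc)
print(expanded)
```

One possible reason the output did not change, offered as a guess rather than a diagnosis: a pivot transform writes one destination document per group-by bucket, so a list built inside an aggregation script would still end up inside that single document.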

Patrick_Whelan · October 15, 2024, 2:13pm

What do the data structures look like for the two different indices?

It might be possible to attach a custom analyzer for the second index to split up the unique id from the category? See "Create a custom analyzer" in the Elasticsearch Guide [8.15]. Then both indices would have the same unique id field to pivot on.

Another option would be a split processor if the fields are combined in some way. See "Split processor" in the Elasticsearch Guide [8.15].

Alternatively, depending on the data structure, a query parameter attached in the Transform configuration may help. See "Create transform API" in the Elasticsearch Guide [8.15].
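As a rough sketch of the query option, the snippet below builds a create-transform request body with a `source.query` that keeps one catalog type. It assumes documents from the first index have no catalog_type field, so the query also lets those through. The index names, the destination name, and the `source_docs` aggregation are placeholders, not taken from the thread.

```python
import json


def build_transform_body(catalog_type, source_indices, dest_index):
    """Build a create-transform request body limited to one catalog type."""
    query = {
        "bool": {
            "should": [
                # Docs from the second index with the wanted catalog type.
                {"term": {"catalog_type": catalog_type}},
                # Assumption: first-index docs carry no catalog_type at all.
                {"bool": {"must_not": {"exists": {"field": "catalog_type"}}}},
            ],
            "minimum_should_match": 1,
        }
    }
    return {
        "source": {"index": list(source_indices), "query": query},
        "pivot": {
            "group_by": {"product_pk": {"terms": {"field": "product_pk"}}},
            # Placeholder aggregation: counts source docs merged per product_pk.
            "aggregations": {"source_docs": {"value_count": {"field": "product_pk"}}},
        },
        "dest": {"index": dest_index},
    }


# Hypothetical index names.
body = build_transform_body("catalog", ["products", "product_catalogs"], "products_catalog")
print(json.dumps(body, indent=2))
```

The printed JSON is the kind of body that would be sent with `PUT _transform/<transform_id>`. Calling the function once per catalog type gives one transform per type, each pivoting on product_pk only.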

Phil_McLachlan · October 16, 2024, 9:11pm

Thanks, Patrick, for the suggestions. I don't think the solutions you provided would work for our situation. We want all the data that is in both indices with either catalog type, which is a separate field from product_pk. Although it is not implemented yet, we have decided to create a separate output index for each catalog type. This means two transforms, one per catalog type, and we would prepare a separate version of the second input index for each. In our first pass, we will handle only one catalog type and leave the other for another time.
