Elasticsearch Transforms

Hello, so I recently started using the ES pivot transform. At every stage I receive some incomplete data; using a scripted metric I complete it and send it to the new index. Currently, every time a new document with the same group-by id comes in, the transform updates the existing document in the new index. My question is: instead of updating the document in the new index, is there a way to create a new document every time?
For example:

doc1:
    "state" : "create",
    "id" : "01",
    "time" : "xyz",
    "started" : true

doc2:
    "state" : "update",
    "id" : "01",
    "time" : "abc",
    "data" : "added new data"
So, currently the new index will contain only one document, like this:

doc:
    "state" : "update",
    "id" : "01",
    "time" : "abc",
    "started" : true,
    "data" : "added new data"

As you can see, the data gets updated after the second doc comes in. What I would like to see in the new index instead is this:
doc1:
    "state" : "create",
    "id" : "01",
    "time" : "xyz",
    "started" : true

doc2:
    "state" : "update",
    "id" : "01",
    "time" : "abc",
    "started" : true,
    "data" : "added new data"

I would like to know if there is a way to do this with transforms.

Hi,
It is by design that there is exactly one document per group-by key in the destination index. So if you use the id field in the group_by section of the transform pivot config, the transform will do just that: take all the documents with the given id and summarize them into one document in the destination index.
If you'd like a 1:1 relation between the documents in the source and destination indices, you should group by a field that is unique among documents, something like an event id or maybe even a timestamp.
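As a rough sketch, a pivot config grouped on a per-document unique field could look something like the following. The field names `event_id` and `@timestamp`, the index names, and the transform id are all assumptions for illustration; substitute your own:

```json
PUT _transform/per-event-transform
{
  "source": { "index": "source-index" },
  "dest": { "index": "dest-index" },
  "pivot": {
    "group_by": {
      "event_id": { "terms": { "field": "event_id" } }
    },
    "aggregations": {
      "latest_time": { "max": { "field": "@timestamp" } }
    }
  }
}
```

Because `event_id` is assumed unique per source document, every group contains exactly one document, so the destination index ends up with one document per source document rather than one per id.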

If you do not need the continuous-updates functionality, you could also try reindex + script. There is another ticket in which this is discussed.
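To illustrate the reindex + script idea, a minimal sketch could look like this (index names and the added field are hypothetical). Reindex copies each source document, including its `_id`, into the destination index one-to-one, and the script can fill in or transform fields along the way:

```json
POST _reindex
{
  "source": { "index": "source-index" },
  "dest": { "index": "dest-index" },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.started == null) { ctx._source.started = false; }"
  }
}
```

Since there is no grouping here, no documents get collapsed or overwritten; the trade-off is that this is a one-off operation rather than a continuously running transform.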

[Update]
I'm not sure about your particular use case, but I can suggest one more option you might be interested in: ingest pipeline + script processor.
The script defined in the pipeline is executed on each document individually, without any grouping (so it won't overwrite existing docs).

See documentation: Script processor | Elasticsearch Reference [7.11] | Elastic
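For reference, a minimal ingest pipeline with a script processor might be sketched like this (the pipeline name and the field logic are assumptions; note that in an ingest context fields are accessed directly on `ctx`, without `_source`):

```json
PUT _ingest/pipeline/complete-event-fields
{
  "description": "Runs a script on every incoming document, no grouping",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "if (ctx.started == null) { ctx.started = false; }"
      }
    }
  ]
}
```

You would then index documents with `?pipeline=complete-event-fields` (or set the pipeline as the index default), and each document is processed and stored on its own.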

If you are still unsure which of these options (reindex + script, ingest + script, transform) is suitable for you, please share more about your use case.

@przemekwitek Thank you for the reply. Let me elaborate a bit more on my use case. In my case, a single event can have around 5 or more states (I'm unsure about the total number of events, but it will be very high). The first state arrives with, say, 10 fields. Every state after that arrives with fields that are either new or updated; say the 2nd, 3rd, and 4th states each come with 4 fields and the 5th with 6 fields. My task is that every time a state comes in, I have to create a document containing all the fields that have arrived up to that point, with any updated fields reflecting their latest values. So after every state there will be one more document in the new index.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.