Is it possible to partially update a destination document with transforms?

Hello all,

I'm trying to figure out whether it's possible to update only some fields of a document with data frame transforms, similar to the way the document update API works (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html).
I have used an existing transform to create the destination documents, and I want to add some fields to those same documents. Is there an option for that? From my attempts, it seems that a new transform overwrites the old documents.

Thanks in advance

Hi,

Yes, that's correct. There is no way to partially update a document; a transform always overwrites the whole destination document.

Can you explain your use case in more detail, ideally with an example: what's the input, and what's the expected output? Maybe there is another way to achieve your end goal. It sounds to me like you are trying to join documents from different indices?

Hi Hendrik,

thank you for your reply. Yes, you are correct: the end result is similar to joining documents from different indices.

The use case is calculating group aggregates of customers' past purchases and of their future purchases as predicted by an ML model. I would like to aggregate each customer's purchases over the past month, aggregate those results again into customer groups (based on the amount they spent and the number of days they have been shopping), and also aggregate the model's predicted purchases for each customer group (the predictions are stored in a different index).

What would be the best way for this?

Thanks again

Hi,

a transform can have multiple sources, and they do not need a common schema as long as the field used in the group_by is compatible across them. So if, for example, you have a field "customer_id" in both the "past_purchases" and "predicted_purchases" indices, you can create a transform that joins over both indices:

PUT _transform/prediction_accuracy
{
  "source": {
    "index": ["past_purchases", "predicted_purchases"]
  },
  "pivot": {
    "group_by": {
      "id": {
        "terms": {
          "field": "customer_id"
        }
      }
    },
    "aggregations": {
      ...
    }
  }
}
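For a fuller picture, here is a minimal Python sketch that builds a complete request body for the same transform and prints it as JSON. The destination index name "prediction_accuracy" and the fields "amount" and "predicted_amount" are hypothetical placeholders, not names from this thread; a real transform does need a "dest" index, and "avg" and "value_count" are standard Elasticsearch aggregations.

```python
import json

# Hypothetical field names: "amount" is assumed to exist only in
# past_purchases, and "predicted_amount" only in predicted_purchases.
transform_body = {
    "source": {"index": ["past_purchases", "predicted_purchases"]},
    # Assumed destination index name; a transform requires a dest index.
    "dest": {"index": "prediction_accuracy"},
    "pivot": {
        # One destination document per customer_id, across both indices.
        "group_by": {"id": {"terms": {"field": "customer_id"}}},
        "aggregations": {
            "avg_past_amount": {"avg": {"field": "amount"}},
            "avg_predicted_amount": {"avg": {"field": "predicted_amount"}},
            "past_purchase_count": {"value_count": {"field": "amount"}},
        },
    },
}

# Print the body so it can be pasted after "PUT _transform/prediction_accuracy".
print(json.dumps(transform_body, indent=2))
```

Each aggregation names only one index's field; the other index simply contributes no values to it.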

Note that aggregations are robust with respect to missing fields, so if a field exists in your "past_purchases" index but not in "predicted_purchases", you can still aggregate on it, e.g. to calculate an average (with the correct count under the hood).
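To make the "correct count" point concrete, here is a plain-Python illustration of the semantics only (not how transforms are implemented): documents from two sources are grouped by "customer_id", and each average counts only the documents that actually contain the field. The sample documents and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample documents standing in for the two indices.
past_purchases = [
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": "c1", "amount": 30.0},
    {"customer_id": "c2", "amount": 5.0},
]
predicted_purchases = [
    {"customer_id": "c1", "predicted_amount": 25.0},
    {"customer_id": "c2", "predicted_amount": 7.0},
    {"customer_id": "c3", "predicted_amount": 12.0},
]


def pivot(indices, group_field, avg_fields):
    """Group docs from several sources by group_field and average each
    field in avg_fields, counting only docs that contain that field."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    keys = {}  # insertion-ordered set of group keys
    for docs in indices:
        for doc in docs:
            key = doc[group_field]
            keys[key] = None
            for field in avg_fields:
                if field in doc:  # docs without the field are skipped
                    sums[key][field] += doc[field]
                    counts[key][field] += 1
    # A group with no values for a field gets None (like a null avg).
    return {
        key: {
            f"avg_{field}": (sums[key][field] / counts[key][field]
                             if counts[key][field] else None)
            for field in avg_fields
        }
        for key in keys
    }


result = pivot([past_purchases, predicted_purchases],
               "customer_id", ["amount", "predicted_amount"])
print(result)
```

Customer "c1" has three documents in total, but its average amount is computed over the two documents that carry "amount" (giving 20.0), not over all three.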

I hope this helps!

Thank you very much Hendrik, this might actually be what I was looking for!

I didn't know that transforms can have multiple sources with a common grouping field. It makes life so much easier now :)