Transform updates fields from existing data even when a filter condition is not met

I have an index (client_index) that has three fields: @timestamp, user, ip. I have a transform like this:

{
  "id": "my_transform",
  "source": {
    "index": [
      "client_index"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "client_index_transformed"
  },
  "sync": {
    "time": {
      "field": "@timestamp",
      "delay": "60s"
    }
  },
  "pivot": {
    "group_by": {
      "user": {
        "terms": {
          "field": "user"
        }
      }
    },
    "aggregations": {
      "@timestamp.max": {
        "max": {
          "field": "@timestamp"
        }
      },
      "srcip.filter": {
        "filter": {
          "range": {
            "srcip": {
              "gt": "192.168.140.1",
              "lt": "192.168.143.254"
            }
          }
        }
      }
    }
  },
  "description": "hi",
  "settings": {},
  "version": "7.9.2",
  "create_time": 1604933401081
}

I've noticed a problem.
If a user at some point (say, one month ago) had the IP 192.168.140.2, the transform puts that user into the client_index_transformed index, which is what I want.

The problem is that when the same user shows up today with IP 192.168.240.3, which is outside the range, the transform should not update the timestamp in the client_index_transformed index, but it does.

So if I query the client_index_transformed index for that user, the max @timestamp is today's and not the one from a month ago. Why is that, and how can I avoid it?

Thanks!

Hi,
In the transform above you specified two aggregations ("@timestamp.max" and "srcip.filter"), but they are not related to each other: the latter ("srcip.filter") does not restrict the set of documents the former ("@timestamp.max") is computed over. The two aggregations are evaluated independently over all documents of each user, which is why you see the row for that user updated in the destination index.
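To make the independence concrete, here is a simplified in-memory model of what the pivot computes for one `user` bucket when the two aggregations are siblings. It is only a sketch, not how Elasticsearch evaluates aggregations internally. The sample documents and dates are hypothetical, the filter aggregation's output is modelled as a plain document count, and `srcip` is assumed to be mapped as an `ip` field (on a `keyword` field a range query compares strings lexicographically instead).

```python
from datetime import datetime
from ipaddress import ip_address

# Hypothetical documents for one user: one in range a month ago, one out of range today.
docs = [
    {"user": "alice", "@timestamp": datetime(2020, 10, 9, 12, 0), "srcip": "192.168.140.2"},
    {"user": "alice", "@timestamp": datetime(2020, 11, 9, 12, 0), "srcip": "192.168.240.3"},
]

# Bounds of the range query; "gt" and "lt" are both exclusive.
LOW, HIGH = ip_address("192.168.140.1"), ip_address("192.168.143.254")


def in_range(doc):
    """Model of the range query on srcip, assuming an ip-typed field."""
    return LOW < ip_address(doc["srcip"]) < HIGH


def flat_pivot(bucket):
    """Sibling aggregations: each one sees every document in the user bucket."""
    return {
        # Max over ALL documents of the user; the filter plays no part here.
        "@timestamp.max": max(d["@timestamp"] for d in bucket),
        # The filter aggregation only yields its own result (modelled as a doc count).
        "srcip.filter": sum(1 for d in bucket if in_range(d)),
    }


row = flat_pivot(docs)
print(row["@timestamp.max"])  # 2020-11-09 12:00:00: today's out-of-range document wins
print(row["srcip.filter"])    # 1
```

The out-of-range document from today still feeds the max, which matches the behaviour described in the question.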

You can try nesting the "@timestamp.max" aggregation inside "srcip.filter" this way:

    "aggregations": {
      "srcip.filter": {
        "filter": {
          "range": {
            "srcip": {
              "gt": "192.168.140.1",
              "lt": "192.168.143.254"
            }
          }
        },
        "aggs": {
          "@timestamp.max": {
            "max": {
              "field": "@timestamp"
            }
          }
        }
      }
    }  

and see if this produces the result you expect.
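The effect of the nesting can be sketched with the same kind of in-memory model: the sub-aggregation only sees the documents that passed the filter. This is a simplified illustration, not Elasticsearch's implementation; the documents and dates are hypothetical, `srcip` is assumed to be an `ip` field, and only the nested `@timestamp.max` result is modelled, not any other output of the filter aggregation.

```python
from datetime import datetime
from ipaddress import ip_address

# Hypothetical documents for one user: one in range a month ago, one out of range today.
docs = [
    {"user": "alice", "@timestamp": datetime(2020, 10, 9, 12, 0), "srcip": "192.168.140.2"},
    {"user": "alice", "@timestamp": datetime(2020, 11, 9, 12, 0), "srcip": "192.168.240.3"},
]

# Bounds of the range query; "gt" and "lt" are both exclusive.
LOW, HIGH = ip_address("192.168.140.1"), ip_address("192.168.143.254")


def in_range(doc):
    """Model of the range query on srcip, assuming an ip-typed field."""
    return LOW < ip_address(doc["srcip"]) < HIGH


def nested_pivot(bucket):
    """Nested aggregation: the max only sees documents that passed the filter."""
    matching = [d for d in bucket if in_range(d)]
    return {
        "srcip.filter": {
            # default=None models "no matching documents" instead of raising ValueError.
            "@timestamp.max": max((d["@timestamp"] for d in matching), default=None),
        }
    }


row = nested_pivot(docs)
print(row["srcip.filter"]["@timestamp.max"])  # 2020-10-09 12:00:00
```

Here the month-old in-range document determines the max. If no document of a user matches the filter, the sketch returns `None`; how the real transform represents an empty max in the destination document is not modelled.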


It worked. Thanks!
So, basically, what I had was an "OR" instead of an "AND", right?
I think the only way to do this is by editing the JSON config in Kibana 7.9.2, right?

I'm glad it worked!

> So, basically, what I had was an "OR" instead of an "AND", right?

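You can think of the aggregations specified in a transform as separate, independent fields in the destination index: each top-level aggregation is computed over all documents in the group_by bucket, regardless of the other aggregations. In that sense it is definitely closer to "OR" than to "AND".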

> I think the only way to do this is by editing the JSON config in Kibana 7.9.2, right?

I think so. There is an "Edit JSON config" toggle next to the Aggregations section.
