Elasticsearch Saved Search with Group By

index_name: my_data-2020-12-01
ticket_number: T123 
ticket_status: OPEN 
ticket_updated_time: 2020-12-01 12:22:12   

index_name: my_data-2020-12-01 
ticket_number: T124 
ticket_status: OPEN 
ticket_updated_time: 2020-12-01 12:32:11   

index_name: my_data-2020-12-02 
ticket_number: T123 
ticket_status: INPROGRESS 
ticket_updated_time: 2020-12-02 12:33:12   

index_name: my_data-2020-12-02 
ticket_number: T125 
ticket_status: OPEN 
ticket_updated_time: 2020-12-02 14:11:45

I want to create a saved search that groups by the ticket_number field and returns one unique document per ticket with the latest ticket status (ticket_status). Is it possible?

You can use a top_hits aggregation:

GET my_data-*/_search
{
  "size": 0,
  "aggs": {
    "ticket_number": {
      "terms": {
        "field": "ticket_number.keyword",
        "size": 10
      },
      "aggs": {
        "latest": {
          "top_hits": {
            "sort": [
              {
                "ticket_updated_time": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [ "ticket_updated_time", "ticket_number", "ticket_status" ]
            },
            "size": 1
          }
        }
      }
    }
  }
}
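
With the sample documents above, the response buckets come back roughly like this (trimmed to the relevant parts; the exact metadata fields vary by Elasticsearch version), one bucket per ticket_number with its single latest hit:

{
  "aggregations": {
    "ticket_number": {
      "buckets": [
        {
          "key": "T123",
          "doc_count": 2,
          "latest": {
            "hits": {
              "hits": [
                {
                  "_source": {
                    "ticket_updated_time": "2020-12-02 12:33:12",
                    "ticket_number": "T123",
                    "ticket_status": "INPROGRESS"
                  }
                }
              ]
            }
          }
        },
        ...
      ]
    }
  }
}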

How do I apply this in a saved search?

This is an aggregation.
If you want to apply the same logic in a search, you can use collapse; with a descending sort on ticket_updated_time, each ticket_number appears only once, with its latest document:

GET my_data-*/_search
{
  "query": {
    "match_all": {}
  },
  "collapse": {
    "field": "ticket_number.keyword"
  },
  "sort": [
    {
      "ticket_updated_time": {
        "order": "desc"
      }
    }
  ]
}

How can I create visualizations using this?

Check my first answer.
To create such a visualisation, use the Top Hits aggregation as the metric.

Yes, I tried. I tried to create a pie chart, but it seems Top Hits allows only numeric fields.

Any workaround?

That will not be possible with a pie chart :slight_smile:
I suggest having a separate index where you maintain the latest status and update date.
How is your data ingested into Elasticsearch?

Through a Kafka topic

That means you are using a Logstash pipeline to sync data into Elasticsearch?
If yes, then you can send the data into 2 Elasticsearch indexes:

  • One as you are doing, to keep track of each change on your tickets
  • A secondary index with document_id => "%{ticket_number}", so you overwrite the existing document with the latest update (see the sketch after this list)
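
As a sketch of the overwrite behaviour (the index name tickets_latest is only an assumption for the example), indexing with the ticket number as the document _id means each update replaces the previous version of that ticket:

PUT tickets_latest/_doc/T123
{
  "ticket_number": "T123",
  "ticket_status": "OPEN",
  "ticket_updated_time": "2020-12-01 12:22:12"
}

PUT tickets_latest/_doc/T123
{
  "ticket_number": "T123",
  "ticket_status": "INPROGRESS",
  "ticket_updated_time": "2020-12-02 12:33:12"
}

After both calls, tickets_latest holds a single document for T123, carrying the INPROGRESS status.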

Yes, that's exactly what we are doing, but the issue is: since we are creating indices daily, the document is not getting updated in the second index after its creation date.

Either it should update the existing document or create a new document in the new date's index, but neither is happening.

Oops, I see the point :slight_smile:
I would suggest using a transform, but it does not yet support the top_hits aggregation.

How big is your dataset, that you need a daily index for tickets?

The size of a daily index is between 1 GB and 1.5 GB.

Unfortunately, we don't have much control over the index creation policy (I mean the daily index creation) as we are working in a client environment.

Also, I tried to create indices based on the ticket created date, but we don't get the created time in every payload.

Thank you very much for your support, highly appreciated!

Hi @Kapila
Here is an example of how to use a scripted metric aggregation to get the top hits inside a transform; this will definitely help you with your use case:

PUT _transform/tickets_transform
{
  "source": {
    "index": [
      "my_data-*"
    ]
  },
  "pivot": {
    "group_by": {
      "ticket_number": {
        "terms": {
          "field": "ticket_number.keyword"
        }
      }
    },
    "aggregations": {
      "latest_doc": {
        "scripted_metric": {
          "init_script": "state.timestamp_latest = 0L; state.last_doc = ''",
          "map_script": """
            def current_date = doc['ticket_updated_time'].getValue().toInstant().toEpochMilli();
            if (current_date > state.timestamp_latest) {
              state.timestamp_latest = current_date;
              state.last_doc = new HashMap(params['_source']);
            }
          """,
          "combine_script": "return state",
          "reduce_script": """
            def last_doc = '';
            def timestamp_latest = 0L;
            for (s in states) {
              if (s.timestamp_latest > timestamp_latest) {
                timestamp_latest = s.timestamp_latest;
                last_doc = s.last_doc;
              }
            }
            return last_doc
          """
        }
      }
    }
  },
  "description": "Get the latest update per ticket_number",
  "dest": {
    "index": "tickets_transform"
  },
  "frequency": "5m",
  "sync": {
    "time": {
      "field": "ticket_updated_time",
      "delay": "60s"
    }
  }
}
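
To try it out (a minimal sketch using the names from the example above), start the transform and then query the destination index, which holds one document per ticket with its latest status:

POST _transform/tickets_transform/_start

GET tickets_transform/_search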
