How can I use previously calculated Data Table columns to calculate a new column?

Hello!

I am trying to create a Data Table visualization that shows the total time between the minimum and maximum dates, grouping by another column. Here's an example of what I'm trying to do, where I have manually edited the values of each "Duration" cell.
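
ReferenceId    Min Time                 Max Time                 Duration
A              06Jul2018 12:41:40 PM    06Jul2018 12:41:50 PM    10,000 ms
B              06Jul2018 12:41:42 PM    06Jul2018 12:41:47 PM    5,000 ms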

The data shown here is generated with this test CSV. The Payload column isn't used, but is there to demonstrate that a request can appear many times. I am only interested in the minimum and maximum dates for each request.

Direction,ReferenceId,Time,Payload
Send,A,06Jul2018 12:41:40 PM,A Request
Send,A,06Jul2018 12:41:41 PM,A Intermediate
Send,B,06Jul2018 12:41:42 PM,B Request
Send,B,06Jul2018 12:41:44 PM,B Intermediate
Send,B,06Jul2018 12:41:45 PM,B Intermediate 2
Receive,B,06Jul2018 12:41:47 PM,B Response
Receive,A,06Jul2018 12:41:50 PM,A Response

Is there any way I can dynamically populate this column? Ideally I would like to sort by this value to see which requests have the longest total duration (in this case, request A).

So far I have tried writing a small script in JSON Input to be the value of this column (total milliseconds). It compiles and runs, but every cell comes out as 0, presumably because the script is evaluated once per document and therefore only ever sees a single timestamp. Here's the script, please pardon my poor programming skills:

// Track the earliest and latest timestamps seen, in epoch milliseconds.
long minDate = 0;
long maxDate = 0;

// doc['time'] holds the current document's values for the time field.
for (int i = 0; i < doc['time'].length; ++i) {
  if (minDate == 0 || doc['time'][i].getMillis() < minDate) minDate = doc['time'][i].getMillis();
  if (maxDate == 0 || doc['time'][i].getMillis() > maxDate) maxDate = doc['time'][i].getMillis();
}

// Total duration in milliseconds.
return maxDate - minDate;

This search query returns the correct results, but I can't figure out how to get it into a visualization:

GET /duration_test/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "1": {
      "terms": {
        "field": "referenceId.keyword",
        "size": 10
      },
      "aggs": {
        "min_date": {
          "min": {
            "field": "time"
          }
        },
        "max_date": {
          "max": {
            "field": "time"
          }
        },
        "duration": {
          "bucket_script": {
            "buckets_path": {
              "minDate": "min_date",
              "maxDate": "max_date"
            },
            "script": "params.maxDate - params.minDate"
          }
        }
      }
    }
  }
}
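
For reference, here is the aggregations portion of the response, abbreviated to the fields that matter (the "..." entries stand for the raw epoch values I've trimmed out):

{
  "aggregations": {
    "1": {
      "buckets": [
        {
          "key": "B",
          "doc_count": 4,
          "min_date": { "value_as_string": "06Jul2018 12:41:42 PM", ... },
          "max_date": { "value_as_string": "06Jul2018 12:41:47 PM", ... },
          "duration": { "value": 5000.0 }
        },
        {
          "key": "A",
          "doc_count": 3,
          "min_date": { "value_as_string": "06Jul2018 12:41:40 PM", ... },
          "max_date": { "value_as_string": "06Jul2018 12:41:50 PM", ... },
          "duration": { "value": 10000.0 }
        }
      ]
    }
  }
}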

Is what I'm trying to do not possible, or do I need to precalculate this field before I index it in Elasticsearch?

Hey @Kredit,

First off, thank you for your detailed write-up! Having the queries, scripts, and the expected outcome is very helpful.

That being said, unfortunately, what you are trying to do isn't really possible today: Kibana visualizations can't reference one column's result when computing another, and the Data Table doesn't expose the bucket_script pipeline aggregation your query relies on. I do have a couple of ideas, though:

  1. If you are using Logstash (or if you can use it), then you might give the Elapsed Filter Plugin a try when indexing your documents; see the pipeline sketch after this list.
  2. If you don't need "near real time" results, you might be able to write a small program which runs your aggregation on a schedule and puts the results into a different index that you can use for your visualizations; see the second sketch below.
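
For the first idea, here is a minimal pipeline sketch based on your CSV layout. The tag names ("request_started"/"request_ended") and the timeout are just placeholders I made up; the elapsed filter pairs a start-tagged event with an end-tagged event sharing the same unique_id_field:

filter {
  csv {
    columns => ["Direction", "ReferenceId", "Time", "Payload"]
  }

  # Tag the event that opens a request and the one that closes it,
  # so the elapsed filter knows which pair of events to measure between.
  if [Direction] == "Send" and [Payload] =~ /Request$/ {
    mutate { add_tag => ["request_started"] }
  }
  if [Direction] == "Receive" {
    mutate { add_tag => ["request_ended"] }
  }

  # Matches start/end events that share the same ReferenceId and adds
  # an "elapsed_time" field (in seconds) to the end event.
  elapsed {
    start_tag => "request_started"
    end_tag => "request_ended"
    unique_id_field => "ReferenceId"
    timeout => 600  # give up on unmatched requests after 10 minutes
  }
}

Once elapsed_time is indexed, your Data Table can sort on it directly, with no pipeline aggregation needed.

For the second idea, the program could run the exact aggregation from your post on a schedule and bulk-index one summary document per bucket. A sketch of the resulting bulk request (the index name and field names are just placeholders; using the ReferenceId as the _id means reruns overwrite rather than duplicate):

POST /duration_summary/_bulk
{ "index": { "_id": "A" } }
{ "referenceId": "A", "min_date": "06Jul2018 12:41:40 PM", "max_date": "06Jul2018 12:41:50 PM", "duration_ms": 10000 }
{ "index": { "_id": "B" } }
{ "referenceId": "B", "min_date": "06Jul2018 12:41:42 PM", "max_date": "06Jul2018 12:41:47 PM", "duration_ms": 5000 }

A plain Data Table over duration_summary can then sort by duration_ms out of the box.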

Thanks! Currently I'm just querying gigabytes of flat files indexed in Elasticsearch, which is working extremely well. I will definitely see about getting Logstash up and running for the future, rather than periodically running imports.

I never thought about adding a new index with that precalculated data - that would be the simplest approach for now! Thank you 🙂

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.