Trying to run the 'Get duration by using bucket script' example, but it isn't working

Hello,

I'm attempting to calculate Session Duration by using the provided example here:

I even followed this user's successful implementation: Computing session durations from timestamps scattered over two documents - #3 by seanowl

But I'm running into an issue: I can't get the field for the actual session duration to populate. Here's what I have in the edit JSON config:

  "group_by": {
    "IP": {
      "terms": {
        "field": "client_ip"
      }
    }
  },
  "aggregations": {
    "@timestamp_max": {
      "max": {
        "field": "@timestamp"
      }
    },
    "@timestamp_min": {
      "min": {
        "field": "@timestamp"
      }
    },
    "calculated_duration": {
      "bucket_script": {
        "buckets_path": {
          "min": "@timestamp_min",
          "max": "@timestamp_max"
        },
        "script": "(params.max - params.min)/1000"
      }
    }
  }
  },
  "description": "calculated duration",
  "dest": {
    "index": "test_duration_index"
  },
  "frequency": "1m",
  "sync": {
    "time": {
      "field": "@timestamp",
      "delay": "10s"
    }
  }
}

The result is that I get the max and min, but no field is created for calculated_duration.

Is there something I'm missing here?

Thanks

Hi,
It looks like this might be an issue with the UI rather than the actual transform preview.
If you run the preview manually, the _preview API returns the correct fields.
For example, in Kibana's Dev Tools:

POST _transform/_preview
{
  "source": {
    "index": [
      "farequote-*"
    ]
  },
  "pivot": {
    "group_by": {
      "airline": {
        "terms": {
          "field": "airline"
        }
      }
    },
    "aggregations": {
      "@timestamp_max": {
        "max": {
          "field": "@timestamp"
        }
      },
      "@timestamp_min": {
        "min": {
          "field": "@timestamp"
        }
      },
      "calculated_duration": {
        "bucket_script": {
          "buckets_path": {
            "min": "@timestamp_min",
            "max": "@timestamp_max"
          },
          "script": "(params.max - params.min)/1000"
        }
      }
    }
  }
}

Gives the response:

{
  "preview" : [
    {
      "@timestamp_min" : "2019-02-07T00:00:00.000Z",
      "@timestamp_max" : "2019-02-11T23:59:02.000Z",
      "calculated_duration" : 431942.0,
      "airline" : "AAL"
    },
    {
      "@timestamp_min" : "2019-02-07T00:00:00.000Z",
      "@timestamp_max" : "2019-02-11T23:59:45.000Z",
      "calculated_duration" : 431985.0,
      "airline" : "ACA"
    },
    ......
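As a sanity check, the calculated_duration for the first bucket matches the raw timestamp difference in seconds: the bucket_script operates on epoch milliseconds, so dividing by 1000 gives seconds. The same arithmetic in plain Python:

```python
from datetime import datetime

fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
t_min = datetime.strptime("2019-02-07T00:00:00.000Z", fmt)
t_max = datetime.strptime("2019-02-11T23:59:02.000Z", fmt)

# (params.max - params.min) / 1000 on epoch millis is just the
# difference between the two timestamps in seconds
duration_seconds = (t_max - t_min).total_seconds()
print(duration_seconds)  # 431942.0
```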

I've created a new issue to cover this.

This UI issue should not affect the actual creation of the transform.

Cheers,
James

Hello,

This worked, thank you very much. I'm testing this and realize I'm close to my goal, but it's not exactly what I'm looking for.

I'm trying to calculate session duration from logs that do not have a listen duration. This is getting close, but of course if I just group by IP, my "session duration" will be really high.

Is there any way to calculate session duration from log events, similar to Google Analytics? I know they use a timeout interval to track a session. This could be more involved than I first thought: it would require a way to track only incoming event data and add it to a previous session.
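To illustrate the timeout-interval idea I have in mind (nothing built into Elasticsearch; the 30-minute timeout and the events below are made up), a session splitter in plain Python might look like:

```python
from datetime import datetime, timedelta

# Hypothetical events for one client IP, sorted by time
events = [
    datetime(2021, 3, 1, 10, 0, 0),
    datetime(2021, 3, 1, 10, 5, 0),
    datetime(2021, 3, 1, 10, 20, 0),
    # gap > 30 minutes, so a new session starts here
    datetime(2021, 3, 1, 12, 0, 0),
    datetime(2021, 3, 1, 12, 10, 0),
]

TIMEOUT = timedelta(minutes=30)

def session_durations(events):
    """Split a sorted event stream into sessions whenever the gap
    between consecutive events exceeds TIMEOUT, and return each
    session's duration in seconds."""
    durations = []
    start = prev = events[0]
    for t in events[1:]:
        if t - prev > TIMEOUT:
            durations.append((prev - start).total_seconds())
            start = t
        prev = t
    durations.append((prev - start).total_seconds())
    return durations

print(session_durations(events))  # [1200.0, 600.0]
```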

Hi,
I'm not aware of a capability in Elasticsearch to automatically calculate session duration from a log file.
If the data also contains a session ID or something similar, I would group by that rather than by IP.
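For example, assuming a session_id field exists (a hypothetical name), only the group_by in your pivot would change; the min/max/bucket_script aggregations stay the same:

```json
"group_by": {
  "session": {
    "terms": {
      "field": "session_id"
    }
  }
},
```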

Cheers,
James

I've watched this video on entity-centric indexing by Mark Harwood and see the benefit in this approach. I'm trying to run the attached example code: Entity-Centric Indexing - Mark Harwood | Elastic Videos, but I'm running into some issues with the Python script. It throws this error:

ElasticsearchWarning: [types removal] Specifying types in bulk requests is deprecated.
  warnings.warn(message, category=ElasticsearchWarning)
('Unexpected error:', <class 'elasticsearch.helpers.errors.BulkIndexError'>)
5001
('Unexpected error:', <class 'elasticsearch.helpers.errors.BulkIndexError'>)
10001
('Unexpected error:', <class 'elasticsearch.helpers.errors.BulkIndexError'>)

The script then continues running and eventually fails with the following:

raise BulkIndexError("%i document(s) failed to index." % len(errors), errors)
elasticsearch.helpers.errors.BulkIndexError: ('500 document(s) failed to index.', [{u'index': {u'status': 400, u'_type': u'review', u'_index': u'anonreviews', u'error': {u'reason': u'mapper [reviewerId] cannot be changed from type [keyword] to [text]'
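My guess is that the anonreviews index already exists from an earlier run with reviewerId mapped as keyword, and the script is now trying to index it as text, which an existing mapping won't allow. Deleting the index before re-running (this wipes its data; the index name is taken from the error above) should clear the conflict, e.g. in Dev Tools:

```
DELETE anonreviews
```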

This appears to be the section in question; are you familiar with this setup, or could you provide any insight into getting this example ported to 7.11?

if len(actions) >= actionsPerBulk:
    try:
        helpers.bulk(es, actions)
    except:
        print("Unexpected error:", sys.exc_info()[0])
    del actions[0:len(actions)]
    print(numLines)

if len(actions) > 0:
    helpers.bulk(es, actions)

Thanks for your help if you can provide it.

Hello,

I appear to be making some progress, but the scripts in the demo contain some deprecated code, since the last update was in 2018. I'm running into the following errors, and it appears changes need to be made to some of the queries, but I'm not sure where. Here are the error messages:

ElasticsearchWarning: [bool][1:94] Deprecated field [mustNot] used, expected [must_not] instead
ElasticsearchWarning: [types removal] Specifying types in search requests is deprecated.
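For the first warning, the fix appears to be renaming the camel-case key in whichever bool query the script builds (the term clause below is just a placeholder, not the script's actual query):

```json
"bool": {
  "must_not": [
    { "term": { "status": "closed" } }
  ]
}
```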

Has anyone successfully updated this script for 7.11?