Trying to run the 'Get duration by using bucket script' example, but it is not working

Hello,

I'm attempting to calculate Session Duration by using the provided example here:

I even followed this user's successful implementation: Computing session durations from timestamps scattered over two documents - #3 by seanowl

But I seem to be running into an issue: I can't get the field for the actual session duration to populate. Here's what I have in the Edit JSON config:

  "group_by": {
    "IP": {
      "terms": {
        "field": "client_ip"
      }
    }
  },
  "aggregations": {
    "@timestamp_max": {
      "max": {
        "field": "@timestamp"
      }
    },
    "@timestamp_min": {
      "min": {
        "field": "@timestamp"
      }
    },
    "calculated_duration": {
      "bucket_script": {
        "buckets_path": {
          "min": "@timestamp_min",
          "max": "@timestamp_max"
        },
        "script": "(params.max - params.min)/1000"
      }
    }
  }
},
  "description": "calculated duration",
  "dest": {
    "index": "test_duration_index"
  },
  "frequency": "1m",
  "sync": {
    "time": {
      "field": "@timestamp",
      "delay": "10s"
    }
  }
}

The result is that I get the max and min, but no column is created for calculated_duration:

[Screenshot: missing_column - Copy (2)]

Is there something I'm missing here?

Thanks

Hi,
It looks like this might be an issue with the UI rather than with the actual transform preview.
If you run the preview manually, the _preview API returns the correct fields.
For example, in Kibana's Dev Tools:

POST _transform/_preview
{
  "source": {
    "index": [
      "farequote-*"
    ]
  },
  "pivot": {
    "group_by": {
      "airline": {
        "terms": {
          "field": "airline"
        }
      }
    },
    "aggregations": {
      "@timestamp_max": {
        "max": {
          "field": "@timestamp"
        }
      },
      "@timestamp_min": {
        "min": {
          "field": "@timestamp"
        }
      },
      "calculated_duration": {
        "bucket_script": {
          "buckets_path": {
            "min": "@timestamp_min",
            "max": "@timestamp_max"
          },
          "script": "(params.max - params.min)/1000"
        }
      }
    }
  }
}

Gives the response:

{
  "preview" : [
    {
      "@timestamp_min" : "2019-02-07T00:00:00.000Z",
      "@timestamp_max" : "2019-02-11T23:59:02.000Z",
      "calculated_duration" : 431942.0,
      "airline" : "AAL"
    },
    {
      "@timestamp_min" : "2019-02-07T00:00:00.000Z",
      "@timestamp_max" : "2019-02-11T23:59:45.000Z",
      "calculated_duration" : 431985.0,
      "airline" : "ACA"
    },
    ......
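The calculated_duration values in this response can be checked by hand. The sketch below assumes that the min and max aggregations on a date field hand epoch milliseconds to the bucket script, which is why the script divides by 1000 to get seconds. The helper names (to_epoch_millis, calculated_duration) are hypothetical and exist only for this illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(ts: str) -> int:
    """Convert an ISO timestamp such as 2019-02-07T00:00:00.000Z to epoch milliseconds."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    # Integer division by a 1 ms timedelta avoids float rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def calculated_duration(min_ts: str, max_ts: str) -> float:
    """Mirror the bucket script: (params.max - params.min) / 1000, in seconds."""
    return (to_epoch_millis(max_ts) - to_epoch_millis(min_ts)) / 1000


# Values taken from the AAL and ACA rows of the preview response.
aal = calculated_duration("2019-02-07T00:00:00.000Z", "2019-02-11T23:59:02.000Z")
aca = calculated_duration("2019-02-07T00:00:00.000Z", "2019-02-11T23:59:45.000Z")
print(aal, aca)
```

Both results match the preview (431942.0 and 431985.0 seconds, a little under five days), so the bucket script itself is doing what it should.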

I've created a new issue to cover this.

This UI issue should not affect the actual creation of the transform.

Cheers,
James

Hello,

This worked, thank you very much. I am testing this and I realize I am close to my goal but this is not exactly what I'm looking for.

I'm trying to calculate session duration from logs that do not have a listen duration. This is getting close, but of course if I group by IP alone, my "session duration" will be really high.

Is there any way to calculate session duration from log events, similar to Google Analytics? I know they use a timeout interval to track a session. Thinking about it more, this could be more involved: it would require a way to track incoming event data and add it to a previous session.
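To make the timeout idea concrete, here is a minimal sketch of it in plain Python, outside Elasticsearch. It is not a transform feature. It assumes events arrive as (client_ip, epoch_millis) pairs, and the 30-minute timeout is an illustrative value, not something taken from this thread. The names sessionize and SESSION_TIMEOUT_MS are hypothetical.

```python
from collections import defaultdict

# Assumption: 30 minutes of inactivity ends a session (illustrative value only).
SESSION_TIMEOUT_MS = 30 * 60 * 1000


def sessionize(events, timeout_ms=SESSION_TIMEOUT_MS):
    """Split each IP's events into sessions separated by gaps longer than timeout_ms.

    events: iterable of (client_ip, epoch_millis) pairs, in any order.
    Returns {client_ip: [(session_start_ms, session_end_ms), ...]}.
    """
    by_ip = defaultdict(list)
    for ip, ts in events:
        by_ip[ip].append(ts)

    sessions = {}
    for ip, stamps in by_ip.items():
        stamps.sort()
        spans = []
        start = prev = stamps[0]
        for ts in stamps[1:]:
            if ts - prev > timeout_ms:
                # Gap exceeds the timeout: close the current session, open a new one.
                spans.append((start, prev))
                start = ts
            prev = ts
        spans.append((start, prev))
        sessions[ip] = spans
    return sessions


events = [
    ("10.0.0.1", 0),
    ("10.0.0.1", 60_000),
    ("10.0.0.1", 4_000_000),  # more than 30 minutes after the previous event
    ("10.0.0.1", 4_030_000),
    ("10.0.0.2", 5_000),
]
result = sessionize(events)
durations = {ip: [(end - start) / 1000 for start, end in spans]
             for ip, spans in result.items()}
print(durations)
```

With this grouping, 10.0.0.1 yields two short sessions instead of one long span from its first to its last event, which is the difference from grouping by IP alone.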

Hi,
I'm not aware of a capability in Elasticsearch to automatically calculate session duration from a log file.
If the data also contains a session ID or something similar, I would group by that rather than IP.

Cheers,
James

I've watched this video on entity-centric indexing by Mark Harwood and see the benefit in it. I'm trying to run the example code provided with it (Entity-Centric Indexing - Mark Harwood | Elastic Videos), but I'm running into some issues with the Python script. It throws this error:

ElasticsearchWarning: [types removal] Specifying types in bulk requests is deprecated.
  warnings.warn(message, category=ElasticsearchWarning)
('Unexpected error:', <class 'elasticsearch.helpers.errors.BulkIndexError'>)
5001
('Unexpected error:', <class 'elasticsearch.helpers.errors.BulkIndexError'>)
10001
('Unexpected error:', <class 'elasticsearch.helpers.errors.BulkIndexError'>)

The script then runs and errors with the following:

raise BulkIndexError("%i document(s) failed to index." % len(errors), errors)
elasticsearch.helpers.errors.BulkIndexError: ('500 document(s) failed to index.', [{u'index': {u'status': 400, u'_type': u'review', u'_index': u'anonreviews', u'error': {u'reason': u'mapper [reviewerId] cannot be changed from type [keyword] to [text]'

This section appears to be the section in question. Are you familiar with this setup, or could you provide any insight into porting this example to 7.11?

if len(actions) >= actionsPerBulk:
    try:
        helpers.bulk(es, actions)
    except:
        print ("Unexpected error:", sys.exc_info()[0])
    del actions[0:len(actions)]
    print (numLines)

if len(actions) > 0:
    helpers.bulk(es, actions)
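As a hedged sketch for the "[types removal]" warning only: in 7.x, bulk actions no longer need a `_type` entry, so one option is to drop that key before calling helpers.bulk. The shape of the action dicts below (`_index`, `_type`, `_source`) is an assumption based on the error output above, and strip_type is a hypothetical helper. The mapping conflict on reviewerId is a separate problem that this does not address.

```python
def strip_type(action):
    """Return a copy of a bulk action dict without the deprecated _type key."""
    return {key: value for key, value in action.items() if key != "_type"}


# Hypothetical action, shaped like the failing documents in the error output.
actions = [
    {"_index": "anonreviews", "_type": "review", "_source": {"reviewerId": "abc"}},
]
cleaned = [strip_type(action) for action in actions]
print(cleaned)

# With a live cluster, the call would then be helpers.bulk(es, cleaned).
# Catching helpers.BulkIndexError instead of using a bare except and printing
# its .errors list shows the per-document reason rather than only the class name.
```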

Thanks for your help if you can provide it.

Hello,

I appear to be making some progress, but the scripts in the demo seem to contain deprecated code, as they were last updated in 2018. I'm running into the following messages, and it appears that changes need to be made to some of the queries, but I'm not sure where. Here are the messages:

ElasticsearchWarning: [bool][1:94] Deprecated field [mustNot] used, expected [must_not] instead
ElasticsearchWarning: [types removal] Specifying types in search requests is deprecated.

Has anyone successfully updated this script for 7.11?
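For the first warning, the fix the message itself suggests is to rename `mustNot` to `must_not` wherever it appears in a query body. A minimal sketch of doing that recursively on a query dict follows. The rename_key helper and the sample query are hypothetical, not taken from the demo scripts. The second warning would presumably be handled by no longer passing a document type to the search call, but that depends on how the script builds its requests.

```python
def rename_key(node, old="mustNot", new="must_not"):
    """Recursively rename a key throughout a nested query body."""
    if isinstance(node, dict):
        return {(new if key == old else key): rename_key(value, old, new)
                for key, value in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


# Hypothetical query using the deprecated camelCase field name.
query = {
    "query": {
        "bool": {
            "must": [{"match_all": {}}],
            "mustNot": [{"term": {"reviewerId": "abc"}}],
        }
    }
}
fixed = rename_key(query)
print(fixed)
```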
