Conditionally aggregating documents in an index?

propel · March 8, 2019, 9:43pm

I'm trying to conditionally aggregate documents in my index based on the existence of another document in the same set. Imagine the following series:

{"action": "pressed_submit", "user": 1, "timestamp": "2016-01-01 00:00:00", "meta": ...}
{"action": "answered_question", "user": 1, "timestamp": "2016-01-01 00:00:00", "meta": ...}
{"action": "answered_question", "user": 1, "timestamp": "2016-01-01 00:00:00", "meta": ...}
{"action": "filled_textbox", "user": 1, "timestamp": "2016-01-01 00:00:00", "meta": ...}
{"action": "pressed_submit", "user": 1, "timestamp": "2012-01-01 00:00:00", "meta": ...}

And then the following:

{"action": "answered_question", "user": 2, "timestamp": "2016-01-01 00:00:00", "meta": ...}
{"action": "answered_question", "user": 2, "timestamp": "2016-01-01 00:00:00", "meta": ...}
{"action": "filled_textbox", "user": 2, "timestamp": "2016-01-01 00:00:00", "meta": ...}
{"action": "pressed_submit", "user": 1, "timestamp": "2012-01-01 00:00:00", "meta": ...}

How can I find the last event for each user who has not "pressed_submit"? In other words, for every user without a pressed_submit event, I want that user's most recent action. Note the timestamps.

I've been racking my brain over this for a while. I've managed to solve the problem by querying ALL events and then filtering in Python code, but it's very slow. Is there any way to use Elasticsearch's query engine to get results like that?

Mark_Harwood · March 8, 2019, 10:02pm

Behavioural analysis at scale typically requires an entity-centric index - see https://twitter.com/elasticmark/status/1009380268409610240?s=21

propel · March 8, 2019, 10:26pm

I'm sorry, but I didn't understand your answer. I've watched the linked video, but I still don't understand what the entity-centric index would be in this scenario.

Mark_Harwood · March 8, 2019, 10:32pm

Users and their last events?

propel · March 8, 2019, 11:11pm

That's already there. The JSON sample I sent is a single document type, in a single index.

Mark_Harwood · March 8, 2019, 11:17pm

Those documents are events.
An entity-centric document would use the user id as its id field (think primary key), so there is one document per user rather than one per event.
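To make the distinction concrete, here is a minimal sketch in plain Python of folding event documents into one summary document per user. It is not the example code from the video: the function names `build_entities` and `never_submitted`, the summary fields (`last_action`, `last_seen`, `last_submit`), and the sample events are assumptions for illustration. Only the `action`, `user`, and `timestamp` fields come from the question.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # timestamp format used in the question's samples


def build_entities(events):
    """Fold event documents into one summary document per user id."""
    entities = {}
    for ev in events:
        ts = datetime.strptime(ev["timestamp"], FMT)
        # The user id plays the role of the primary key.
        ent = entities.setdefault(
            ev["user"],
            {"user": ev["user"], "last_action": None,
             "last_seen": None, "last_submit": None},
        )
        # Track the most recent event (on ties, the later one in input order wins).
        if ent["last_seen"] is None or ts >= ent["last_seen"]:
            ent["last_seen"] = ts
            ent["last_action"] = ev["action"]
        # Track the most recent submit separately.
        if ev["action"] == "pressed_submit":
            if ent["last_submit"] is None or ts > ent["last_submit"]:
                ent["last_submit"] = ts
    return entities


def never_submitted(entities):
    """Return the summaries of users with no pressed_submit event at all."""
    return [e for e in entities.values() if e["last_submit"] is None]


# Hypothetical sample events, with distinct timestamps for clarity.
events = [
    {"action": "answered_question", "user": 1, "timestamp": "2016-01-01 00:00:00"},
    {"action": "pressed_submit", "user": 1, "timestamp": "2016-01-01 00:05:00"},
    {"action": "answered_question", "user": 2, "timestamp": "2016-01-01 00:01:00"},
    {"action": "filled_textbox", "user": 2, "timestamp": "2016-01-01 00:02:00"},
]

entities = build_entities(events)
pending = never_submitted(entities)
```

Once the summaries exist, "last action of users who have not submitted" becomes a simple filter on one document per user. If only submits after a certain date should count, the filter could compare `last_submit` against that date instead of checking for `None`.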

propel · March 8, 2019, 11:18pm

Is there any tutorial on how to create entity-centric indexes?

Mark_Harwood · March 8, 2019, 11:19pm

The video I shared along with the example code.

propel · March 8, 2019, 11:20pm

Yes, but the example implies having to re-index data. Unfortunately I do not control the application that writes to the ES index: it is proprietary third-party software.

Mark_Harwood · March 8, 2019, 11:25pm

The example advocates keeping your existing event-centric index and building a secondary entity-centric index from it.

propel · March 8, 2019, 11:27pm

But it revolves around the idea of having to recreate or update the "reviewers" index, either with a cron job or every time I need a report. So, whenever I need to run my query, I have to make sure that buildEntities.sh was run recently. Correct?

That is, effectively, slower than processing the data in my programming language of choice.

Mark_Harwood · March 8, 2019, 11:32pm

At 31:30 in the video I talk about incremental updates.
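As a rough illustration of what an incremental update could look like (this is not taken from the video), the sketch below folds only events newer than a stored checkpoint into existing per-user summaries, so each run touches new data rather than rebuilding everything. The function `apply_new_events`, the checkpoint handling, the summary fields, and the sample batches are all assumptions for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # timestamp format used in the question's samples


def apply_new_events(entities, new_events, checkpoint=None):
    """Update per-user summaries in place and return the new checkpoint.

    Events at or before `checkpoint` are skipped as already processed.
    Caveat: an event that arrives late with a timestamp equal to the
    checkpoint would be missed by this simple scheme.
    """
    newest = checkpoint
    for ev in new_events:
        ts = datetime.strptime(ev["timestamp"], FMT)
        if checkpoint is not None and ts <= checkpoint:
            continue  # folded in by an earlier run
        ent = entities.setdefault(
            ev["user"],
            {"user": ev["user"], "last_action": None, "last_seen": None,
             "last_submit": None, "event_count": 0},
        )
        ent["event_count"] += 1
        if ent["last_seen"] is None or ts >= ent["last_seen"]:
            ent["last_seen"] = ts
            ent["last_action"] = ev["action"]
        if ev["action"] == "pressed_submit":
            if ent["last_submit"] is None or ts > ent["last_submit"]:
                ent["last_submit"] = ts
        if newest is None or ts > newest:
            newest = ts
    return newest


# First run: nothing processed yet (hypothetical data).
entities = {}
batch1 = [
    {"action": "answered_question", "user": 2, "timestamp": "2016-01-01 00:01:00"},
]
checkpoint = apply_new_events(entities, batch1)

# Later run: the batch overlaps with the first one; the old event is skipped.
batch2 = batch1 + [
    {"action": "pressed_submit", "user": 2, "timestamp": "2016-01-01 00:10:00"},
]
checkpoint = apply_new_events(entities, batch2, checkpoint)
```

In practice the batch for each run would presumably come from a query restricted to timestamps after the checkpoint, so the cost of a run scales with the number of new events rather than the size of the whole event index.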

