Scripting with multiple documents

oravecz · August 7, 2015, 12:42pm

All of the scripting examples (whether sorting, filtering or computing new values) take place on a single document. Is it possible to remove filtered results prior to aggregation using custom scripts or a plugin?

Use case: My documents are idempotent, and represent changes to a user's account. Each account update indexes a new document. I want to aggregate records on an account field, but I only want to include the user's latest update in the aggregation.

nik9000 · August 7, 2015, 12:55pm

I'd have an index that just contains the latest data per user in that case. If you still need the history then maybe two indexes? It'd be a pain to make sure they stay in sync but it'd be simpler than trying to squash the documents on the fly during an aggregation.

oravecz · August 7, 2015, 1:57pm

Are you suggesting that before I index the latest change into one index, I copy the existing record to an "archived" index and delete it from the "live" index? I don't get it.

More to the point, are you suggesting this approach because there is no way to filter an aggregated collection of records to include only the latest record?

What I'm trying to do is similar to this pseudo-SQL statement:

SELECT * FROM T t1 WHERE t1.submitted = ( SELECT MAX(t2.submitted) FROM T t2 WHERE t2.user = t1.user )

Christian_Dahlqvist · August 7, 2015, 2:17pm

I think he is suggesting keeping all the raw records in one index and having a separate index where you store the latest status, using the user ID as a key so that any update coming in for a user will overwrite the previous status. That way you have fast access to the last change made for every user as well as the entire history.
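
A minimal sketch of that two-index write path, in Python. The index names `account-history` and `account-latest`, the `user` field, and the function name are all assumptions for illustration. The dicts are shaped like the actions the Python client's bulk helper accepts; older Elasticsearch versions would also need a `_type` on each action.

```python
from typing import Any, Dict, List

# Hypothetical index names, not taken from the thread.
HISTORY_INDEX = "account-history"
LATEST_INDEX = "account-latest"


def build_bulk_actions(update: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build the two index actions for a single account update.

    Assumes each update carries the user ID in a "user" field.
    """
    # History index: no explicit _id, so every update becomes a new document.
    history_action = {
        "_op_type": "index",
        "_index": HISTORY_INDEX,
        "_source": update,
    }
    # Latest index: the user ID is the document ID, so a later update
    # for the same user overwrites the earlier one.
    latest_action = {
        "_op_type": "index",
        "_index": LATEST_INDEX,
        "_id": update["user"],
        "_source": update,
    }
    return [history_action, latest_action]
```

This only holds if updates for a given user arrive in order; an update that arrives late would overwrite a newer status in the latest index.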

oravecz · August 7, 2015, 3:27pm

@Christian_Dahlqvist Thanks, that makes more sense.

However, that is not an option for me: given the size of these indices, our rollover strategy would get really complicated. I do see how it is a workaround, though.

So, no ability to create an aggregation that limits the records to only include the latest record? Something like max(_timestamp)? No way to do it with a custom plugin?

Christian_Dahlqvist · August 7, 2015, 4:08pm

Although I am not sure how you would do it, it may be possible to do all that processing at query time. This could, however, be slow and may also be difficult to scale. Using a separate index to hold the most recent state will make these queries much more efficient and will scale well. If you perform this type of query rarely, doing all this work for each query may be fine, but if you need this information on a regular basis, you will most likely benefit from doing the work up front by having a separate index.
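
One possible shape for the query-time approach, sketched as a Python function that builds a request body: a `terms` aggregation on the user field with a `top_hits` sub-aggregation sorted by timestamp and limited to one hit. The field names `user` and `submitted`, the `max_users` default, and the function name are assumptions, and this is not something proposed in the thread.

```python
from typing import Any, Dict


def latest_per_user_query(
    user_field: str = "user",        # assumed field holding the user ID
    time_field: str = "submitted",   # assumed timestamp field
    max_users: int = 1000,           # assumed cap on the number of buckets
) -> Dict[str, Any]:
    """Build a search body that returns the newest document per user."""
    return {
        "size": 0,  # no top-level hits, only aggregation results
        "aggs": {
            "by_user": {
                # One bucket per distinct user.
                "terms": {"field": user_field, "size": max_users},
                "aggs": {
                    "latest": {
                        # Keep only the most recent document in each bucket.
                        "top_hits": {
                            "size": 1,
                            "sort": [{time_field: {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }
```

This returns one bucket per user with that user's newest document, which is where the cost comes from when there are many users. It does not aggregate across those newest documents, so a roll-up on an account field would still have to happen on the client.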

oravecz · August 12, 2015, 12:45pm

Thanks. If I cannot filter a group of records in a custom plugin, then I will have to post-process my results programmatically.
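
A rough sketch of what that post-processing could look like in Python: keep only the newest record per user, then count by an account field. The field names `user`, `submitted`, and `plan`, and both function names, are assumptions for illustration.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def latest_per_user(
    records: Iterable[Dict[str, Any]],
    user_field: str = "user",        # assumed field holding the user ID
    time_field: str = "submitted",   # assumed timestamp field
) -> Dict[Any, Dict[str, Any]]:
    """Return the newest record for each user, keyed by user ID.

    Timestamps must be mutually comparable, for example ISO 8601
    strings in the same format, or datetime objects.
    """
    latest: Dict[Any, Dict[str, Any]] = {}
    for rec in records:
        user = rec[user_field]
        # Keep the record if it is the first for this user or newer
        # than the one seen so far.
        if user not in latest or rec[time_field] > latest[user][time_field]:
            latest[user] = rec
    return latest


def count_by_field(records: Iterable[Dict[str, Any]], field: str) -> Counter:
    """Count records by the value of one field."""
    return Counter(rec[field] for rec in records)
```

Calling `count_by_field(latest_per_user(hits).values(), "plan")` on the search hits would then give the per-plan counts based only on each user's latest update. All matching records have to be pulled to the client first, so this only suits result sets that fit comfortably in memory.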
