Write Preprocessor to Sum records


(Paul Ainslie) #1

Would it be possible to write a preprocessor in ES 5.0 that: a) counts our incoming nginx access records; and b) sums the byte count on each record, accumulating these two values for each unique session id over 5-minute intervals and writing them out every 5 minutes for each session id? The interval could later be lengthened, but the principle is the same.
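
To make the requirement concrete, here is a minimal Python sketch of the per-session, per-window roll-up described above. It is only an illustration of the logic, not an Elasticsearch or Logstash feature: the field names `timestamp` (epoch seconds), `session_id` and `bytes` are assumptions, and it processes a batch in memory rather than a live stream.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # interval length; could later be lengthened


def summarize(records, window_seconds=WINDOW_SECONDS):
    """Return {(window_start, session_id): {"count": n, "bytes": total}}.

    Each record is assumed (hypothetically) to be a dict with
    "timestamp" (epoch seconds), "session_id" and "bytes" keys.
    """
    totals = defaultdict(lambda: {"count": 0, "bytes": 0})
    for rec in records:
        # Align the timestamp to the start of its window.
        window_start = rec["timestamp"] - rec["timestamp"] % window_seconds
        bucket = totals[(window_start, rec["session_id"])]
        bucket["count"] += 1             # a) number of access records
        bucket["bytes"] += rec["bytes"]  # b) accumulated byte count
    return dict(totals)


# Made-up sample records, purely for illustration.
records = [
    {"timestamp": 0, "session_id": "a", "bytes": 100},
    {"timestamp": 299, "session_id": "a", "bytes": 50},
    {"timestamp": 300, "session_id": "a", "bytes": 10},
    {"timestamp": 10, "session_id": "b", "bytes": 7},
]
summary = summarize(records)
```

Each key in `summary` would correspond to one summary document written out per session id per interval.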

I know everyone's going to say, 'just use aggregations in ES to report on the same thing'. Well, we currently do, but we need more summarized data and faster queries when summarizing over a period of, say, 2 months or 1 year.

Paul


(Mark Harwood) #2

See the Logstash aggregate filter https://www.elastic.co/guide/en/logstash/current/plugins-filters-aggregate.html or entity-centric indexing https://m.youtube.com/watch?v=yBf7oeJKH2Y


(Paul Ainslie) #3

Thanks Mark - so if I'm correct, that's given me two options.
a) Aggregate real time, at source using the aggregate Logstash plugin.
b) Create a script on Elasticsearch doing something similar to your entity-centric indexing example. In this case, would you suggest I use Groovy or the new Painless language?

Paul


(Mark Harwood) #4

Yep.

Good question :) I've not tried converting my example Groovy script into Painless yet.
At this point in the video I show the outline of an update script. The things it needs to do typically are:

  1. Load saved JSON arrays into a Map or a Set to represent info you want to maintain (e.g. a unique list of products a person has bought over time)
  2. A "for" loop to iterate over the new events and update the data in 1)
  3. Serialize the collections back to plain old JSON arrays ready for storage.
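
The three steps above can be sketched roughly as follows. This is plain Python written only to show the shape of the logic; it is not the Groovy script from the video, nor a Painless translation of it, and the `products` and `product` field names are hypothetical.

```python
def update_entity(stored_doc, new_events):
    """Merge new events into a stored entity document.

    stored_doc: dict holding a JSON array under "products" (assumed name).
    new_events: list of dicts, each with a "product" key (assumed name).
    """
    # 1. Load the saved JSON array into a set of unique values.
    products = set(stored_doc.get("products", []))

    # 2. Iterate over the new events and update the collection.
    for event in new_events:
        products.add(event["product"])

    # 3. Serialize the collection back to a plain list for storage.
    updated = dict(stored_doc)
    updated["products"] = sorted(products)
    return updated


# Made-up example data.
doc = {"products": ["book", "pen"]}
events = [{"product": "pen"}, {"product": "lamp"}]
result = update_entity(doc, events)
```

Sorting in step 3 is a choice made here so the stored array has a stable order; a real update script may not need it.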

I'd be interested to hear how easy it is to do in Painless if you attempt this translation.
My example Groovy script is at http://bit.ly/entcent

Cheers
Mark

