Group by RegExp

emanuelef · October 26, 2016, 3:25pm

Hi,
I need to run a query and group the results according to a regexp.
For example, I have a field with these possible values:
/sdc/user?id=4039&dc=4
/sdc/user?id=4039&dc=2
/sdc/user?id=2222&dc=7

should give me:
/sdc/user?id=4039 2
/sdc/user?id=2222 1
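For clarity, here is the grouping being asked for, reproduced client-side in a few lines of Python. It only illustrates the expected output and is not an Elasticsearch query. It assumes the part to ignore is always the `dc` query parameter.

```python
import re
from collections import Counter

values = [
    "/sdc/user?id=4039&dc=4",
    "/sdc/user?id=4039&dc=2",
    "/sdc/user?id=2222&dc=7",
]

# Assumption: only the "&dc=..." parameter differs between values that belong together,
# so stripping it yields the group key.
DC_PARAM = re.compile(r"&dc=[^&]*")

group_counts = Counter(DC_PARAM.sub("", v) for v in values)
for key, count in group_counts.items():
    print(key, count)
```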

Is this possible?

Thanks in advance

polyfractal · October 26, 2016, 3:42pm

You could do it with a script (or value_script) on the terms aggregation, using the regex functionality that Groovy scripting provides. It won't be very efficient, since scripting is considerably slower, but it will work.
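A rough sketch of what the scripted terms aggregation could look like, written as a Python dict for the request body. Groovy was removed in Elasticsearch 6.0, so this assumes a newer version with Painless instead. The keyword field name `path` and the aggregation name `by_user` are hypothetical. Depending on the version, Painless regexes may also have to be enabled with the `script.painless.regex.enabled` setting.

```python
import json

# Hypothetical keyword field holding values such as "/sdc/user?id=4039&dc=4".
FIELD = "path"

# Painless script (a sketch): keep everything before "&dc=", or the whole value if absent.
GROUP_KEY_SCRIPT = (
    "def v = doc['" + FIELD + "'].value; "
    "def m = /^(.*?)&dc=.*$/.matcher(v); "
    "return m.matches() ? m.group(1) : v;"
)

search_body = {
    "size": 0,  # only the aggregation buckets are needed, not the hits themselves
    "aggs": {
        "by_user": {  # hypothetical aggregation name
            "terms": {
                "script": {"source": GROUP_KEY_SCRIPT, "lang": "painless"},
                "size": 100,
            }
        }
    },
}

print(json.dumps(search_body, indent=2))
```

The script runs once per matching document at query time, which is why this route is slower than aggregating on a field that already holds the group key.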

A better approach is to extract some of that structure ahead of time. Either use an analyzer that breaks those strings into smaller components (then run a terms aggregation to count the number of ?id=<num> tokens), or extract the id, dc, etc. query params into their own fields, which would let you run a terms agg on them directly.
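A minimal sketch of the "extract the query params into their own fields" step, done in Python before indexing. The field names `path`, `id` and `dc` are hypothetical. Counting on the extracted `path` and `id` fields stands in for the terms aggregation you would then run in Elasticsearch.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def extract_fields(raw: str) -> dict:
    """Split a request string into its path plus one field per query parameter."""
    parts = urlsplit(raw)
    # parse_qs returns a list per key because a parameter may repeat; keep the first value.
    params = {key: vals[0] for key, vals in parse_qs(parts.query).items()}
    return {"path": parts.path, **params}


raw_values = [
    "/sdc/user?id=4039&dc=4",
    "/sdc/user?id=4039&dc=2",
    "/sdc/user?id=2222&dc=7",
]

docs = [extract_fields(v) for v in raw_values]

# Client-side analogue of a terms aggregation on the extracted fields.
id_counts = Counter((doc["path"], doc["id"]) for doc in docs)
for (path, doc_id), count in id_counts.most_common():
    print(f"{path}?id={doc_id} {count}")
```

Once `id` is its own (keyword) field, the grouping no longer needs a regex at all.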

emanuelef · October 26, 2016, 4:01pm

Thanks for the suggestions.
I think the best way would be to extract the query params into fields and then use a terms agg.
I guess that should be the most efficient.

polyfractal · October 26, 2016, 4:26pm

++ If you can extract them ahead of time, it'll definitely be a lot more efficient. And once they're extracted, you can use them for all kinds of unrelated analysis (top IDs, IDs over time, etc.).
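As an illustration of that follow-up analysis, here is a sketch of a request body (as a Python dict) for "top IDs, broken down per day". It assumes the documents already carry an extracted keyword field `id` and a date field `@timestamp`; both names, and the aggregation names, are hypothetical. `calendar_interval` applies to Elasticsearch 7.2 and later; older versions used `interval`.

```python
import json

top_ids_over_time = {
    "size": 0,  # aggregation results only
    "aggs": {
        "top_ids": {  # hypothetical aggregation name
            "terms": {"field": "id", "size": 10},  # the ten most frequent IDs
            "aggs": {
                "per_day": {  # document counts per day for each of those IDs
                    "date_histogram": {
                        "field": "@timestamp",
                        "calendar_interval": "day",
                    }
                }
            },
        }
    },
}

print(json.dumps(top_ids_over_time, indent=2))
```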

Related topics

Topic	Category	Replies	Views	Activity
Filter by a regexp inside a terms aggregation	Elasticsearch	3	575	May 9, 2020
Elasticsearch aggregation with regexp	Elasticsearch	3	515	March 13, 2017
SQL like Group by in Elasticsearch	Elasticsearch	6	1378	May 22, 2018
Regexp for integer aggregation	Elasticsearch	1	502	March 15, 2017
Elasticsearch group data	Elasticsearch	4	1077	April 25, 2019
