What puzzles me is the idea of flipping an aggregation. It seems counter-intuitive to me, and I would love to understand why it's the advisable way to go.
It’s about working more easily with the natural sort order of ‘terms’ aggs: they return the most popular terms first. With users at the top level, you’ll find the most prolific user first and, for each user, the most common status code for that user. With the status codes up top, you’ll get the most common code first (likely 200) and, for each status code, the most common users. So for 404s you’ll naturally get the users with the most 404s.
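To make the ordering concrete, here is a minimal sketch, assuming hypothetical fields named `status` and `user`. The `status_first_request` dict is what a "status codes up top" request body might look like in the Elasticsearch query DSL. The `nested_terms` helper is a hypothetical pure-Python stand-in that mimics how nested ‘terms’ buckets come back sorted by document count; it does not call Elasticsearch, and the sample events are made up.

```python
from collections import Counter

# Hypothetical field names "status" and "user"; adjust to your own mapping.
# Status codes at the top level, users nested beneath each status bucket.
status_first_request = {
    "size": 0,
    "aggs": {
        "by_status": {
            "terms": {"field": "status", "size": 5},
            "aggs": {"by_user": {"terms": {"field": "user", "size": 3}}},
        }
    },
}


def nested_terms(events, outer, inner, outer_size=5, inner_size=3):
    """Hypothetical helper mimicking two nested terms aggs in memory.

    Outer buckets and inner buckets are both sorted by doc count, descending,
    which is the default ordering of a terms agg.
    """
    outer_counts = Counter(e[outer] for e in events)
    result = []
    for key, count in outer_counts.most_common(outer_size):
        inner_counts = Counter(e[inner] for e in events if e[outer] == key)
        result.append((key, count, inner_counts.most_common(inner_size)))
    return result


# Made-up sample traffic: alice is prolific, bob is mostly 404s.
events = (
    [{"user": "alice", "status": 200}] * 90
    + [{"user": "alice", "status": 404}] * 10
    + [{"user": "bob", "status": 200}] * 5
    + [{"user": "bob", "status": 404}] * 8
    + [{"user": "carol", "status": 200}] * 20
    + [{"user": "carol", "status": 404}] * 2
)

status_first = nested_terms(events, "status", "user")
user_first = nested_terms(events, "user", "status")
for bucket in status_first:
    print(bucket)
```

With statuses up top, the 404 bucket lists alice first (10 of her 100 requests) ahead of bob (8 of his 13), which is exactly the "most 404s, not most unusual" effect described next.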
This might highlight a user who has a lot of 404s but also many more 200s to make up for it (i.e. 404s are only 1% of their total traffic). To find users with an unusually high share of a status code (e.g. 90% of their traffic is 404s), simply use a ‘significant_terms’ agg instead of a ‘terms’ agg for the users.
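A minimal sketch of that swap, again assuming hypothetical `status` and `user` fields: the request body only changes the inner agg to ‘significant_terms’. The scoring function below is a simplified, hypothetical re-implementation of a JLH-style score (absolute change in a user's share multiplied by the relative change), which is assumed here to approximate the default heuristic; it is not Elasticsearch's code. The foreground is the docs in one status bucket and the background is all docs.

```python
from collections import Counter

# Hypothetical field names; only the inner agg changes to significant_terms.
significant_request = {
    "size": 0,
    "aggs": {
        "by_status": {
            "terms": {"field": "status", "size": 5},
            "aggs": {"unusual_users": {"significant_terms": {"field": "user"}}},
        }
    },
}


def jlh_like_scores(events, status):
    """Rank users by how over-represented they are in one status bucket.

    Simplified JLH-style score (an assumption, not the real implementation):
    (fg_share - bg_share) * (fg_share / bg_share), or 0 if not over-represented.
    """
    foreground = [e["user"] for e in events if e["status"] == status]
    fg_counts = Counter(foreground)
    bg_counts = Counter(e["user"] for e in events)
    scores = {}
    for user, fg in fg_counts.items():
        fg_share = fg / len(foreground)           # user's share of this status
        bg_share = bg_counts[user] / len(events)  # user's share of all traffic
        change = fg_share - bg_share
        scores[user] = change * (fg_share / bg_share) if change > 0 else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Same made-up traffic as before: bob's requests are mostly 404s.
events = (
    [{"user": "alice", "status": 200}] * 90
    + [{"user": "alice", "status": 404}] * 10
    + [{"user": "bob", "status": 200}] * 5
    + [{"user": "bob", "status": 404}] * 8
    + [{"user": "carol", "status": 200}] * 20
    + [{"user": "carol", "status": 404}] * 2
)

ranking = jlh_like_scores(events, 404)
print(ranking)
```

Here bob rises to the top of the 404 bucket even though alice has more 404s in absolute terms, because 404s make up a far larger share of bob's traffic than of anyone else's.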