Displaying an "id" only if it appears N times in the period

Hello,

I have a log of events containing API access data like [ timestamp, id, other_informations ]. We would like to extract some information from it.

We know that we can learn things by looking at the number of requests we receive from one id during a specified period of time. The problem is that a Top X lists thousands of ids with normal access patterns, and an inverted Top X doesn't help either (it lists thousands of ids with a single connection attempt). The id cardinality is in the tens of millions, so a full histogram can't be built.

On the other hand, we know that if we could specify N and M in something like "show me 1000 ids that appear between N and M times during the observation period", we would have the info we need.

Any ideas?

If you're trying to put these IDs on a visualization, you could get part of the way there by using a terms aggregation on ID and then specifying min_doc_count in the advanced JSON config. This limits the terms on that axis to IDs that meet or exceed the threshold.
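In Kibana the advanced JSON input would only need something like `{"min_doc_count": 50}`. To show where that setting ends up, here is a minimal sketch of an equivalent raw Elasticsearch request body, built as a Python dict. It assumes the fields are literally named `timestamp` and `id` (with `id` mapped as a keyword), and the aggregation name `ids`, the helper name and the example dates are all hypothetical.

```python
import json


def build_id_terms_request(start, end, min_count, size=1000):
    """Hypothetical helper: count events per id in [start, end) and keep
    only ids that appear at least `min_count` times (the N threshold)."""
    return {
        "size": 0,  # only the aggregation is needed, not the raw hits
        "query": {
            # restrict to the observation period (assumed field name "timestamp")
            "range": {"timestamp": {"gte": start, "lt": end}}
        },
        "aggs": {
            "ids": {  # arbitrary aggregation name
                "terms": {
                    "field": "id",               # assumed keyword field
                    "size": size,                # number of id buckets to return
                    "min_doc_count": min_count,  # drop ids seen fewer than N times
                }
            }
        },
    }


if __name__ == "__main__":
    body = build_id_terms_request("2018-03-01", "2018-03-08", min_count=50)
    print(json.dumps(body, indent=2))
```

Note that this only gives a lower bound; nothing in the terms aggregation itself caps the count at M.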

Once you do that, you can build up a list of IDs you are interested in and create a filter that limits all data to those IDs.
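As a sketch of that manual step, the filter can be expressed as a Query DSL `terms` query listing the hand-picked ids, which could then be pasted into a filter edited as Query DSL. The field name `id` and the helper name are assumptions for illustration.

```python
def build_id_filter(ids):
    """Hypothetical helper: return a Query DSL `terms` query that matches
    only documents whose (assumed) `id` field is in the given list."""
    # Deduplicate while preserving order so the filter stays readable.
    unique_ids = list(dict.fromkeys(ids))
    return {"terms": {"id": unique_ids}}


if __name__ == "__main__":
    print(build_id_filter(["user-42", "user-7", "user-42"]))
```

A `terms` query matches a document if its `id` equals any value in the list, so one filter covers the whole set of interesting ids.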

It is a somewhat manual process, but I think it is as far as Kibana will be able to take you until filters can be based on the results of queries or aggregations (maybe something like https://github.com/elastic/kibana/issues/16702).

Thanks for your reply. I just learned something :slight_smile:

Since there is no "max_doc_count", I have to combine it with an ascending sort to get the information I need, and Kibana says this is deprecated.
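The workaround described here can be sketched outside Kibana as well: order the buckets by `_count` ascending so the smallest counts at or above N come first, then drop anything above M on the client side. This is a minimal sketch assuming the same hypothetical `ids` aggregation name and `id` field; the sample response is made up. The Elasticsearch terms aggregation documentation discourages ascending `_count` ordering because it increases the error on document counts, so treat the results as approximate.

```python
def build_ascending_request(min_count, size=1000):
    """Hypothetical helper: terms aggregation with a lower bound (N) and
    buckets ordered from the smallest qualifying count upwards."""
    return {
        "size": 0,
        "aggs": {
            "ids": {
                "terms": {
                    "field": "id",
                    "size": size,
                    "min_doc_count": min_count,
                    "order": {"_count": "asc"},  # smallest counts >= N first
                }
            }
        },
    }


def ids_in_range(response, min_count, max_count, agg_name="ids"):
    """Keep only bucket keys whose doc_count lies in [min_count, max_count]."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [b["key"] for b in buckets if min_count <= b["doc_count"] <= max_count]


if __name__ == "__main__":
    # Made-up response in the shape Elasticsearch returns for a terms aggregation.
    sample = {"aggregations": {"ids": {"buckets": [
        {"key": "a", "doc_count": 5},
        {"key": "b", "doc_count": 9},
        {"key": "c", "doc_count": 14},
    ]}}}
    print(ids_in_range(sample, 5, 10))
```

Because only `size` buckets come back, if the last returned bucket still has a count of at most M there may be more qualifying ids beyond it.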

My next step is entity centric indexing, but it requires scripting skills which I don't have in the context of Elasticsearch. I'll open a separate topic when needed.

Thank you again.
