I would solve this with aggregations rather than queries. Start with a composite aggregation to get a list of all unique IPs across the two indexes.
Nested inside that, use a terms aggregation on the _index meta field to get one bucket for each index in which an IP exists. IPs that occur in both indexes get two buckets; IPs that occur in only one index get one bucket.
Finally, use a bucket_selector pipeline aggregation to filter out the IP addresses that have two buckets. In other words, keep only the IPs that occur in just one index.
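A minimal sketch of the request body described above, built as a Python dict so it can be serialized with `json.dumps` and sent to a search across both indexes (for example `index_1,index_2/_search`). The helper name `build_unique_ip_request`, the aggregation names `ips`, `indices` and `only_one_index`, and the page size are hypothetical. It assumes `dstIP` is a keyword or ip field, and that your Elasticsearch version accepts a `bucket_selector` as a sub-aggregation of `composite`. The selector reads the number of `_index` buckets through the special `_bucket_count` buckets path.

```python
import json


def build_unique_ip_request(ip_field="dstIP", page_size=1000, after=None):
    """Hypothetical helper: search body that keeps IPs present in only one index."""
    composite = {
        "size": page_size,
        # One composite source: every distinct value of the IP field.
        "sources": [{"ip": {"terms": {"field": ip_field}}}],
    }
    if after is not None:
        # Resume paging from the after_key of a previous response.
        composite["after"] = after
    return {
        "size": 0,  # only aggregations are needed, no hits
        "aggs": {
            "ips": {
                "composite": composite,
                "aggs": {
                    # One bucket per index the IP appears in (at most 2 here).
                    "indices": {"terms": {"field": "_index", "size": 2}},
                    # Drop IPs that have buckets for both indexes.
                    "only_one_index": {
                        "bucket_selector": {
                            "buckets_path": {"index_count": "indices._bucket_count"},
                            "script": "params.index_count < 2",
                        }
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_unique_ip_request(), indent=2))
```

Each returned bucket still carries the `indices` sub-aggregation, so you can read off which of the two indexes a given IP is unique to.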
The first dstIP mentioned, 5.189.x.x, appears in both indexes; the second one is unique to index_1.
Is this how this was meant to work?
I'm surprised that index_2 does not have more IP addresses unique to it, since my unique-IPs-per-index graphs show different numbers (see below). It's also interesting that, despite setting the time range from 7pm EST to 6:59pm EST the next day, index_2, which contains network data and likely has some delay, seems to spill over into the next day's index.
The composite aggregation does not return all results at once; instead, it lets you page through the buckets. To get the next page, pass the after_key returned in each response as the after parameter of a subsequent request.
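A minimal sketch of that paging loop, assuming a hypothetical `search` callable that takes a request body dict and returns the parsed JSON response (wrap whichever Elasticsearch client you use). The function name and the aggregation name `ips` are also hypothetical. The loop always reuses the returned after_key rather than the last bucket's key: with a bucket_selector in place, a page can come back with few or even zero buckets while more pages remain. It stops once a response contains no after_key.

```python
import copy


def iter_composite_keys(search, body, agg_name="ips"):
    """Hypothetical helper: yield every composite bucket key, page by page."""
    after = None
    while True:
        page = copy.deepcopy(body)  # leave the caller's body untouched
        if after is not None:
            page["aggs"][agg_name]["composite"]["after"] = after
        agg = search(page)["aggregations"][agg_name]
        for bucket in agg["buckets"]:
            yield bucket["key"]
        # A missing after_key means there are no more pages.
        after = agg.get("after_key")
        if after is None:
            break
```

Keeping the transport behind a plain callable makes the loop easy to test with canned responses and independent of any particular client library.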
As far as I know, you can't visualize an aggregation like this with regular Kibana visualizations. You can try building a Vega visualization, but those have a bit of a learning curve. This tutorial is a good way to get started.