Top 10 IPs visiting a website


#1

I have indexed some network logs. How do I search for the top 10 IP addresses that have visited a given website within a time range?

Do I use aggregations? Do I need to enable "doc_values", and also set "index_options" to at least "freqs"?

Thanks!


(Ranjith M) #2

If you can share more details about the problem you are facing, it will be easier to answer.
But as I understand it, you can use aggregations and limit the size to whatever you need.

Here is a reference link from the documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html


#3

Hi Ranjith,

Thank you for replying.

For example, I have the following log lines:

[17/May/2015:09:40:34] 83.149.9.216 GET www.site1.com
[17/May/2015:09:45:16] 83.149.9.216 GET www.site1.com
[17/May/2015:10:05:04] 83.149.9.216 GET www.site1.com
[17/May/2015:10:06:02] 83.149.9.216 GET www.site222.com
[17/May/2015:10:07:35] 123.125.71.35 GET www.site1.com
[17/May/2015:10:08:08] 123.125.71.35 GET www.site1.com
[17/May/2015:10:09:14] 123.125.71.35 GET www.site1.com
[17/May/2015:10:09:24] 123.125.71.35 GET www.site333.com
[17/May/2015:10:10:01] 200.49.190.101 GET www.site1.com
[17/May/2015:10:10:26] 200.49.190.101 GET www.site1.com
[17/May/2015:11:10:43] 200.49.190.101 GET www.site1.com
[17/May/2015:11:12:23] 200.49.190.101 GET www.site1.com

Given www.site1.com and a time range [17/May/2015][10am-11am], 83.149.9.216 visited it once, 123.125.71.35 visited it 3 times, 200.49.190.101 visited it twice.

If I want to know the top 2 IP addresses that visited www.site1.com within the time range [17/May/2015][10am-11am], it should return something like this:

123.125.71.35 --- 3
200.49.190.101 --- 2

How do I achieve this?

Thanks!


(Adrien Grand) #4

Yes, use a terms aggregation.

doc_values should be on by default, so rather make sure not to disable them.

Setting index_options to freqs is not necessary.


#5

Hi Adrien,

Thank you for replying.

For my example above, do I enable doc_values on the field holding the IP address (83.149.9.216) or on the field holding the site name (www.site1.com)? Our disk space is very limited, so we want to keep the index size to a minimum.

Given a website name and a time range, we first extract all matching log lines, then sum by IP address, then sort on the sums. Would something like this work: a terms aggregation plus a sort aggregation? But I imagine the search query would need to be nested...
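For reference, the filter, count, and sort steps described above can be simulated locally on the sample lines from post #3. This is only a sketch of *what* Elasticsearch computes, not *how*: in Elasticsearch the query does the filtering, and a single terms aggregation on the IP field does both the counting and the ordering (by document count, descending, by default), so no separate sort step is needed. The function name `top_ips` is hypothetical, and the timestamp format assumes English month abbreviations.

```python
from collections import Counter
from datetime import datetime

# Sample log lines copied from post #3.
LOG_LINES = """\
[17/May/2015:09:40:34] 83.149.9.216 GET www.site1.com
[17/May/2015:09:45:16] 83.149.9.216 GET www.site1.com
[17/May/2015:10:05:04] 83.149.9.216 GET www.site1.com
[17/May/2015:10:06:02] 83.149.9.216 GET www.site222.com
[17/May/2015:10:07:35] 123.125.71.35 GET www.site1.com
[17/May/2015:10:08:08] 123.125.71.35 GET www.site1.com
[17/May/2015:10:09:14] 123.125.71.35 GET www.site1.com
[17/May/2015:10:09:24] 123.125.71.35 GET www.site333.com
[17/May/2015:10:10:01] 200.49.190.101 GET www.site1.com
[17/May/2015:10:10:26] 200.49.190.101 GET www.site1.com
[17/May/2015:11:10:43] 200.49.190.101 GET www.site1.com
[17/May/2015:11:12:23] 200.49.190.101 GET www.site1.com
"""

# Matches timestamps such as 17/May/2015:10:05:04 (%b assumes English month names).
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S"


def top_ips(lines, site, start, end, n):
    """Keep lines for `site` with start <= time <= end, count per IP, return the n most frequent."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        stamp, ip, _method, host = line.split()
        ts = datetime.strptime(stamp.strip("[]"), TIME_FORMAT)
        if host == site and start <= ts <= end:
            counts[ip] += 1
    # most_common sorts by count, highest first (like a terms aggregation's default order)
    return counts.most_common(n)


start = datetime(2015, 5, 17, 10, 0, 0)
end = datetime(2015, 5, 17, 11, 0, 0)
for ip, count in top_ips(LOG_LINES.splitlines(), "www.site1.com", start, end, 2):
    print(f"{ip} --- {count}")
# 123.125.71.35 --- 3
# 200.49.190.101 --- 2
```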


#6

Will something like this work?

POST /_search
{
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "filter": [
                        {
                            "range": {
                                "log_time": {
                                    "gte": "17/May/2015:10:00:00",
                                    "lte": "17/May/2015:11:00:00",
                                    "format": "dd/MMM/yyyy:HH:mm:ss"
                                }
                            }
                        },
                        {
                            "term": {
                                "site_name": "www.site1.com"
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs": {
        "top_ips_visiting_a_site": {
            "terms": {
                "field": "ip_address"
            },
            "aggs": {
                "top_ips_hits": {
                    "sum_hits" : { 
                        "sum" : { 
                            "field" : "ip_address" 
                        } 
                    },
                    "sort_hits": {
                        "sort": {
                            "date": {
                                "order": "desc"
                            }
                        }
                    }
                }
            }
        }
    }
}

(Adrien Grand) #7

I would recommend buying more disk space. Disabling doc values does not save that much disk space, and will prevent you from aggregating this field in the future.

Your aggregation looks wrong; I believe you want something like this:

POST /_search
{
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "filter": [
                        {
                            "range": {
                                "log_time": {
                                    "gte": "17/May/2015:10:00:00",
                                    "lte": "17/May/2015:11:00:00",
                                    "format": "dd/MMM/yyyy:HH:mm:ss"
                                }
                            }
                        },
                        {
                            "term": {
                                "site_name": "www.site1.com"
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs": {
        "top_ips_visiting_a_site": {
            "terms": {
                "field": "ip_address",
                "size": 10
            },
            "aggs": {
                "top_ips_hits": {
                    "top_hits": {
                        "sort": [
                            { "log_time": { "order": "desc" } }
                        ]
                    }
                }
            }
        }
    }
}
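The same kind of request can also be assembled programmatically, and the terms aggregation's buckets (each carries a `key` and a `doc_count`) turned into the `ip --- count` listing asked for in post #3. Below is a minimal standard-library Python sketch. The field names `log_time`, `site_name` and `ip_address` come from the thread; the helper names `build_top_ips_request` and `format_buckets` and the sample response are illustrative assumptions, and actually sending the request (with an HTTP client or the official Python client) is left out.

```python
import json


def build_top_ips_request(site, start, end, n):
    """Hypothetical helper: search body that filters by site/time and buckets by IP."""
    return {
        "size": 0,  # only the aggregation result is needed, not the matching hits
        "query": {
            "bool": {
                # filter context: matching only, no scoring
                "filter": [
                    {"range": {"log_time": {"gte": start, "lte": end,
                                            "format": "dd/MMM/yyyy:HH:mm:ss"}}},
                    {"term": {"site_name": site}},
                ]
            }
        },
        "aggs": {
            "top_ips_visiting_a_site": {
                # buckets come back ordered by doc_count, descending, by default
                "terms": {"field": "ip_address", "size": n}
            }
        },
    }


def format_buckets(response, agg_name="top_ips_visiting_a_site"):
    """Hypothetical helper: turn terms-aggregation buckets into 'key --- doc_count' lines."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [f"{b['key']} --- {b['doc_count']}" for b in buckets]


body = build_top_ips_request("www.site1.com", "17/May/2015:10:00:00", "17/May/2015:11:00:00", 2)
print(json.dumps(body, indent=2))

# Illustrative response shape, filled with the counts expected in post #3.
sample_response = {
    "aggregations": {
        "top_ips_visiting_a_site": {
            "buckets": [
                {"key": "123.125.71.35", "doc_count": 3},
                {"key": "200.49.190.101", "doc_count": 2},
            ]
        }
    }
}
print("\n".join(format_buckets(sample_response)))
```

This sketch drops the `constant_score` wrapper, since a `bool` query's `filter` clauses already run in non-scoring filter context, and omits the `top_hits` sub-aggregation, because the per-IP counts alone answer the "top N IPs" question.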

(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.