I have indexed some network logs. How do I search for the top 10 IP addresses that have visited a given website within a time range?
Do I use aggregations? Do I need to enable "doc_values", and also set "index_options" to at least "freqs"?
Thanks!
If you can add more information about the challenges you are facing, it will be easier to answer.
As far as I understand, you can use aggregations and limit the size to suit your requirement.
Here is a reference link from the documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html
Hi Ranjith,
Thank you for replying.
For example, I have the following log lines:
[17/May/2015:09:40:34] 83.149.9.216 GET www.site1.com
[17/May/2015:09:45:16] 83.149.9.216 GET www.site1.com
[17/May/2015:10:05:04] 83.149.9.216 GET www.site1.com
[17/May/2015:10:06:02] 83.149.9.216 GET www.site222.com
[17/May/2015:10:07:35] 123.125.71.35 GET www.site1.com
[17/May/2015:10:08:08] 123.125.71.35 GET www.site1.com
[17/May/2015:10:09:14] 123.125.71.35 GET www.site1.com
[17/May/2015:10:09:24] 123.125.71.35 GET www.site333.com
[17/May/2015:10:10:01] 200.49.190.101 GET www.site1.com
[17/May/2015:10:10:26] 200.49.190.101 GET www.site1.com
[17/May/2015:11:10:43] 200.49.190.101 GET www.site1.com
[17/May/2015:11:12:23] 200.49.190.101 GET www.site1.com
Given www.site1.com and a time range [17/May/2015][10am-11am], 83.149.9.216 visited it once, 123.125.71.35 visited it 3 times, 200.49.190.101 visited it twice.
If I want to know what are the top 2 IP addresses that visited www.site1.com within time range [17/May/2015][10am-11am], it should return something like this:
123.125.71.35 --- 3
200.49.190.101 --- 2
How do I achieve this?
Thanks!
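To make the expected result concrete, here is a minimal Python sketch that reproduces the counts from the sample log lines above without Elasticsearch. The function name `top_ips` and the constants are hypothetical, chosen only for illustration; it assumes an English locale for parsing the month name.

```python
from collections import Counter
from datetime import datetime

# The sample log lines from the post above.
LOG_LINES = """\
[17/May/2015:09:40:34] 83.149.9.216 GET www.site1.com
[17/May/2015:09:45:16] 83.149.9.216 GET www.site1.com
[17/May/2015:10:05:04] 83.149.9.216 GET www.site1.com
[17/May/2015:10:06:02] 83.149.9.216 GET www.site222.com
[17/May/2015:10:07:35] 123.125.71.35 GET www.site1.com
[17/May/2015:10:08:08] 123.125.71.35 GET www.site1.com
[17/May/2015:10:09:14] 123.125.71.35 GET www.site1.com
[17/May/2015:10:09:24] 123.125.71.35 GET www.site333.com
[17/May/2015:10:10:01] 200.49.190.101 GET www.site1.com
[17/May/2015:10:10:26] 200.49.190.101 GET www.site1.com
[17/May/2015:11:10:43] 200.49.190.101 GET www.site1.com
[17/May/2015:11:12:23] 200.49.190.101 GET www.site1.com
""".splitlines()

TIME_FORMAT = "[%d/%b/%Y:%H:%M:%S]"  # assumes an English locale for "May"


def top_ips(lines, site, start, end, n):
    """Count visits per IP to `site` with start <= time <= end; return the n largest."""
    counts = Counter()
    for line in lines:
        stamp, ip, _method, host = line.split()
        when = datetime.strptime(stamp, TIME_FORMAT)
        if host == site and start <= when <= end:
            counts[ip] += 1
    return counts.most_common(n)


start = datetime(2015, 5, 17, 10, 0, 0)
end = datetime(2015, 5, 17, 11, 0, 0)
result = top_ips(LOG_LINES, "www.site1.com", start, end, 2)
print(result)  # [('123.125.71.35', 3), ('200.49.190.101', 2)]
```

Filtering first and then counting per IP is the same shape of work I would like Elasticsearch to do for me.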
Yes, use a terms aggregation.
Doc values should be on by default, so you should rather make sure not to disable them.
Setting "index_options" to "freqs" is not necessary.
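As a rough illustration of "leave the defaults alone", the sketch below builds an index mapping as a Python dict. It is an assumption on my part, not something confirmed for your setup: it uses the typeless mapping layout of Elasticsearch 7.x or later, the field names `log_time`, `ip_address` and `site_name` from the query in this thread, and a date format guessed from the sample log lines. Nothing in it mentions `doc_values` or `index_options`, because the `date`, `ip` and `keyword` field types have doc values enabled by default.

```python
import json

# Hypothetical mapping; field names come from the query in this thread,
# the date format is an assumption based on the sample log lines.
mapping = {
    "mappings": {
        "properties": {
            "log_time": {"type": "date", "format": "dd/MMM/yyyy:HH:mm:ss"},
            "ip_address": {"type": "ip"},
            "site_name": {"type": "keyword"},
        }
    }
}

# Body that could be sent when creating the index.
print(json.dumps(mapping, indent=2))
```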
Hi Adrien,
Thank you for replying.
For my example above, do I enable doc_values for the field of 83.149.9.216, or the field of www.site1.com? Our disk space is very limited, so we want to keep the index data size to its minimum.
Given a website name and time range, we first extract all matching log lines, then sum by IP address, then sort the sums. Does something like this work: terms aggs + sort aggs? But I imagine that the search query should be nested...
Will something like this work?
POST /_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "filter": [
            {
              "range": {
                "log_time": {
                  "gte": "17/May/2015:10:00:00",
                  "lte": "17/May/2015:11:00:00"
                }
              }
            },
            {
              "term": {
                "site_name": "www.site1.com"
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "top_ips_visiting_a_site": {
      "terms": {
        "field": "ip_address"
      },
      "aggs": {
        "top_ips_hits": {
          "sum_hits": {
            "sum": {
              "field": "ip_address"
            }
          },
          "sort_hits": {
            "sort": {
              "date": {
                "order": "desc"
              }
            }
          }
        }
      }
    }
  }
}
I would recommend buying more disk space. Disabling doc values does not save that much disk space, and will prevent you from aggregating this field in the future.
Your aggregation looks wrong, I believe you want to do something like this:
POST /_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "filter": [
            {
              "range": {
                "log_time": {
                  "gte": "17/May/2015:10:00:00",
                  "lte": "17/May/2015:11:00:00"
                }
              }
            },
            {
              "term": {
                "site_name": "www.site1.com"
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "top_ips_visiting_a_site": {
      "terms": {
        "field": "ip_address"
      },
      "aggs": {
        "top_ips_hits": {
          "top_hits": {
            "sort": {
              "log_time": {
                "order": "desc"
              }
            }
          }
        }
      }
    }
  }
}
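If only the per-IP counts are needed, a slimmer variation may be enough: the terms aggregation already returns a `doc_count` for each bucket and orders buckets by that count, descending, by default. The Python sketch below builds such a request body and reads the buckets back. It is a sketch under assumptions rather than a tested query: the helper names `build_top_ips_request` and `buckets_to_pairs` and the aggregation name `top_ips` are made up for illustration, and `sample_response` is hand-written to show the response shape, not real output from a cluster.

```python
import json


def build_top_ips_request(site, start, end, top_n):
    """Build a search body that counts hits per IP for one site and time range."""
    return {
        "size": 0,  # skip the matching documents, only aggregations are wanted
        "query": {
            "bool": {
                "filter": [
                    {"range": {"log_time": {"gte": start, "lte": end}}},
                    {"term": {"site_name": site}},
                ]
            }
        },
        "aggs": {
            # "size" limits the number of buckets, i.e. the N in "top N".
            "top_ips": {"terms": {"field": "ip_address", "size": top_n}}
        },
    }


def buckets_to_pairs(response):
    """Turn the terms buckets of a search response into (ip, count) pairs."""
    buckets = response["aggregations"]["top_ips"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


body = build_top_ips_request(
    "www.site1.com", "17/May/2015:10:00:00", "17/May/2015:11:00:00", 2
)
print(json.dumps(body, indent=2))

# Hand-written illustration of the response shape, NOT real cluster output.
sample_response = {
    "aggregations": {
        "top_ips": {
            "buckets": [
                {"key": "123.125.71.35", "doc_count": 3},
                {"key": "200.49.190.101", "doc_count": 2},
            ]
        }
    }
}
pairs = buckets_to_pairs(sample_response)
print(pairs)
```

The `top_hits` sub-aggregation in the request above is still useful when the most recent log entries per IP are wanted alongside the counts.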