I am running an aggregation search on my index (6 nodes). There are about 200 million documents of port-scanning data. Each document looks like this: {"ip":"85.18.68.5", "banner":"cisco-IOS", "country":"IT", "_type":"port-80"}.
So you can see the data is sorted into different types by the port being scanned. Now I want to know which IPs have many ports open at the same time. So I ran an aggregation on the ip field and got an OOM error, which may be reasonable: since most IPs have only one open port, there are too many buckets, I guess.
Then I tried a filter aggregation:
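For context, the first attempt was a plain terms aggregation on the ip field, roughly like this (a sketch, with an assumed aggregation name):

```json
{
  "size": 0,
  "aggs": {
    "ips": {
      "terms": {
        "field": "ip"
      }
    }
  }
}
```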
{
  "aggs": {
    "just_name1": {
      "filter": {
        "prefix": {
          "ip": "100.1"
        }
      },
      "aggs": {
        "just_name2": {
          "terms": {
            "field": "ip",
            "execution_hint": "map"
          }
        }
      }
    }
  }
}
(Yes, my ip field is mapped as a string.)
I thought this would make ES narrow down the set before aggregating, but I still get an OOM error, while the same query works on a smaller index (another cluster, one node). Why does this happen? After filtering, the two clusters should be aggregating over sets of roughly equal volume. Why does the bigger one fail?
Given your description of the problem, I think the issue is that your Elasticsearch cluster doesn't have enough memory to load field data for the ip field (which needs to be done for all documents, not only those that match your query). So you either need to add more nodes to your cluster, give more memory to your nodes, or use doc values for your ip field [1] (the latter option requires reindexing).
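For reference, enabling doc values on a string field in a pre-2.0 mapping looks roughly like this (a sketch; the type and field names are taken from your example, and note that doc values require the field to be not_analyzed):

```json
{
  "mappings": {
    "port-80": {
      "properties": {
        "ip": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        }
      }
    }
  }
}
```

With doc values, the per-document values for ip are stored on disk at index time instead of being uninverted into JVM heap, which is what avoids the fielddata OOM.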