Aggregation error (Java heap space)

vir_candy · April 2, 2014, 8:04am

I am running an aggregation search on my index (6 nodes). There are about 200
million lines of data (port-scanning results). Each line looks like this:
{"ip":"85.18.68.5", "banner":"cisco-IOS", "country":"IT", "_type":"port-80"}.
So, as you can imagine, the data is split into different types according to the
port that was scanned. Now I want to know which IPs have a lot of ports open at
the same time. So I chose to run an aggregation on the ip field, and I got an
OOM error. I guess that may be expected: most IPs have only one port open, so
there are too many buckets?
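To make the bucket intuition concrete, here is a small pure-Python sketch (not Elasticsearch code) of what a terms aggregation on ip computes over documents shaped like the one above. The sample documents are hypothetical. Each distinct IP gets its own bucket whose count is the number of open ports seen for it, so if most IPs have only one open port, the number of buckets approaches the number of documents.

```python
from collections import Counter

# Hypothetical sample documents shaped like the ones in the post:
# one document per (ip, port) pair, with the port encoded in "_type".
docs = [
    {"ip": "85.18.68.5", "_type": "port-80"},
    {"ip": "85.18.68.5", "_type": "port-23"},
    {"ip": "100.1.2.3", "_type": "port-80"},
    {"ip": "100.1.9.9", "_type": "port-22"},
]

def ports_per_ip(documents):
    """Mimic a terms aggregation on "ip": one bucket per distinct IP,
    where the bucket count is the number of documents (open ports) for it."""
    return Counter(doc["ip"] for doc in documents)

def multi_port_hosts(documents, min_ports=2):
    """Keep only the buckets with at least `min_ports` documents."""
    return {ip: n for ip, n in ports_per_ip(documents).items() if n >= min_ports}

buckets = ports_per_ip(docs)
print(len(buckets))  # number of buckets == number of distinct IPs
print(multi_port_hosts(docs))
```

The `min_ports` threshold plays roughly the role of the terms aggregation's `min_doc_count` parameter, but note that every distinct IP still gets a bucket before the threshold is applied.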

Then I tried a filter aggregation:

{
  "aggs": {
    "just_name1": {
      "filter": {
        "prefix": {
          "ip": "100.1"
        }
      },
      "aggs": {
        "just_name2": {
          "terms": {
            "field": "ip",
            "execution_hint": "map"
          }
        }
      }
    }
  }
}

(Yes, my ip field is mapped as a string.)

I thought that this time ES would narrow down the set of documents before aggregating. But I still get an OOM error, while the same query works on a smaller index (on another, single-node cluster). Why does this happen? After filtering, the two clusters should be aggregating the same number of documents. Why does the bigger one fail?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d66bef21-b1e9-4538-b621-e93949b389cc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

vir_candy · April 2, 2014, 8:09am

The smaller index has 1 million lines of data: the lines from the bigger index
that match "prefix":{"ip":"100.1"}.

jpountz · April 2, 2014, 8:27am

Given your description of the problem, I think the issue is that your
Elasticsearch cluster doesn't have enough memory to load field data for the
ip field (field data has to be loaded for all documents, not only those that
match your query). So you either need to add more nodes to your cluster,
give more memory to your nodes, or use doc values for your ip field[1] (the
latter option requires reindexing).

[1]
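For the doc values option, a minimal Python sketch of the create-index body is shown here, assuming the Elasticsearch 1.x syntax current at the time, where a not_analyzed string field could keep its field data on disk via `"fielddata": {"format": "doc_values"}`. The one-type-per-port layout follows the post; the particular list of port types is hypothetical, and the sketch only builds the JSON (it would be sent with `PUT /<new-index>` before reindexing the data). Later releases use a `"doc_values": true` flag instead, and from 5.0 on `keyword` fields have doc values by default.

```python
import json

# Hypothetical list of the per-port mapping types described in the post.
PORT_TYPES = ["port-80", "port-22", "port-23"]

def ip_field_mapping():
    """Mapping for the ip field, assuming Elasticsearch 1.x syntax:
    a not_analyzed string whose field data uses the doc_values format,
    so it is read from disk instead of being built on the JVM heap."""
    return {
        "type": "string",
        "index": "not_analyzed",
        "fielddata": {"format": "doc_values"},
    }

def create_index_body(port_types):
    """Create-index body that defines the same ip mapping for every
    per-port type (1.x still allowed several types per index)."""
    return {
        "mappings": {
            t: {"properties": {"ip": ip_field_mapping()}} for t in port_types
        }
    }

body = create_index_body(PORT_TYPES)
print(json.dumps(body, indent=2))
```

Because doc values are written at index time, existing documents only benefit after they are reindexed into the new index, which is why Adrien notes that this option requires reindexing.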

--
Adrien Grand

vir_candy · April 2, 2014, 8:52am

But I can run an aggregation on the 'banner' field on both clusters. Is that
because the values of 'banner' are much less unique than those of the 'ip' field?

jpountz · April 2, 2014, 9:10am

On Wed, Apr 2, 2014 at 10:52 AM, 张阳 vir.candy@gmail.com wrote:

But I can run an aggregation on the 'banner' field on both clusters. Is that
because the values of 'banner' are much less unique than those of the 'ip' field?

Very likely, yes. Memory usage of field data is higher on high-cardinality
fields.
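To see why cardinality matters, here is a deliberately simplified Python model (not Elasticsearch's actual accounting) of on-heap field data for a single-valued string field: each unique term is stored once, and each document holds an ordinal pointing at its term, sized by the number of unique terms. The 200 million documents come from the thread; the unique-term counts and average term lengths are hypothetical. Note that the model covers every document in the index, whatever the query filters, which matches the point that field data is loaded for all documents.

```python
import math

def estimate_fielddata_bytes(num_docs, num_unique_terms, avg_term_bytes):
    """Very rough model of on-heap field data for one single-valued string
    field: the unique terms themselves plus one packed ordinal per document.
    Ignores per-segment duplication, object overhead and compression."""
    term_bytes = num_unique_terms * avg_term_bytes
    # Ordinal 0 is reserved for "no value", hence the +1.
    bits_per_ord = max(1, math.ceil(math.log2(num_unique_terms + 1)))
    ordinal_bytes = math.ceil(num_docs * bits_per_ord / 8)
    return term_bytes + ordinal_bytes

DOCS = 200_000_000  # from the thread
# Hypothetical cardinalities and average term lengths:
ip_bytes = estimate_fielddata_bytes(DOCS, 150_000_000, 12)
banner_bytes = estimate_fielddata_bytes(DOCS, 1_000, 20)

print(f"ip:     {ip_bytes / 2**20:.0f} MiB")
print(f"banner: {banner_bytes / 2**20:.0f} MiB")
```

Under these made-up numbers the ip field needs roughly ten times the heap of banner, most of it for the unique IP strings. The real ratio depends on the data, the number of segments and the Elasticsearch version.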

--
Adrien Grand

Related Topics

Topic	Category	Replies	Views	Activity
Aggregate query: Elasticsearch:java.lang.OutOfMemoryError: Java heap space	Elasticsearch	8	1433	July 25, 2019
Aggregation on big data	Elasticsearch	3	2315	July 6, 2017
Searching from the big index - Java heap space exception	Elasticsearch	3	464	July 6, 2017
Searching the big index - java.lang.OutOfMemoryError: Java heap space	Elasticsearch	8	495	July 6, 2017
Error while indexing -java heap space	Elasticsearch	17	887	July 6, 2017