I have 200 million lines of data (about port scanning). I want ES to return
the "ip" values that have more than one port open at the same time, ordered
by count. But given the volume of data, and because very few docs share the
same value in the "ip" field, I get an out of memory error. Is there any way
to complete this query?
This kind of use-case requires memory for two main reasons:
- field data,
- counting values (aggregations).
Field data memory usage can be reduced by using doc values[1] which will
effectively store data on disk instead of memory and rely on the filesystem
cache.
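As a rough illustration of the doc values suggestion, the sketch below builds a mapping body in Python. The type name "scan", the field name "ip", and the string/not_analyzed settings are assumptions chosen to match the question, and the "doc_values" flag is written in the 1.x mapping style; please check the mapping documentation for your version before relying on it.

```python
import json


def build_ip_mapping(doc_type="scan", field="ip"):
    """Return a put-mapping body that stores `field` using doc values.

    `doc_type` and `field` are hypothetical names used for illustration.
    """
    return {
        doc_type: {
            "properties": {
                field: {
                    "type": "string",
                    # Keep each IP as a single, unanalyzed term.
                    "index": "not_analyzed",
                    # Assumption: 1.x-style flag asking for on-disk doc values
                    # instead of in-memory field data.
                    "doc_values": True,
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the put-mapping API.
    print(json.dumps(build_ip_mapping(), indent=2))
```

Doc values are written at index time, so existing documents would presumably need to be reindexed after changing the mapping.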
Aggregations memory usage is harder to reduce. If you are storing your IPs
as string fields, you might want to use the map execution hint, which
requires less memory than the ordinals execution hint on a field like this
one, where very few docs share the same value (note, however, that we are
working on improving the efficiency of ordinals on high-cardinality fields,
so this might improve in future versions).
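A minimal sketch of what such a request body could look like, built in Python. It assumes one document per (ip, port) pair, so that "more than one open port" translates to a bucket with at least two documents; the field name "ip", the aggregation name, and the top_n value are hypothetical.

```python
import json


def build_multi_port_query(field="ip", top_n=100):
    """Return a search body listing values of `field` seen in 2+ documents.

    Assumes each document describes one open port on one IP.
    """
    return {
        # Only the aggregation result is needed, not the matching hits.
        "size": 0,
        "aggs": {
            "ips_with_many_ports": {
                "terms": {
                    "field": field,
                    # Drop IPs that appear in a single document only.
                    "min_doc_count": 2,
                    # Most frequent IPs first.
                    "order": {"_count": "desc"},
                    # Number of buckets to return.
                    "size": top_n,
                    # Use the map execution mode rather than ordinals.
                    "execution_hint": "map",
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the search API.
    print(json.dumps(build_multi_port_query(), indent=2))
```

The hint only changes how the terms are collected; the aggregation still has to count every distinct IP, so memory use will keep growing with the number of unique values.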
On Mon, Mar 31, 2014 at 11:01 AM, <vir....@gmail.com> wrote:
I have 200 million lines of data(about port scanning). I want ES to
return those "ip" who open not only one port at the same time(order by
count). But, considering the volume of data and very little docs have same
value on "ip" field, obviously I get an out of memory error. Is there any
way to finish my query mission.