Hi everyone,
I am having a performance issue trying to query an index of ~280 million
documents with a geo_distance or bounding box filter.
The data I'm trying to query is imported from OpenStreetMap. I used the
elasticsearch-osmosis-plugin (https://github.com/ncolomer/elasticsearch-osmosis-plugin)
to import the data.
Our configuration:
- Nodes: 2 nodes (Windows Azure XL virtual machines running Ubuntu: 8 cores /
14 GB RAM and 4 cores / 7 GB RAM)
- Shards: I've played around with this, it doesn't change much (usually set
to 1-5 shards and 0-1 replicas)
- JVM 7
- ES_HEAP_SIZE set to 7 GB / 4 GB
- Data is stored on Windows Azure drives (probably not on the same machine)
- The index in question is roughly 80 GB for 280 million documents
- We have several other small indices (one with ~10M documents and 3 others
with ~50k documents)
My mapping: https://gist.github.com/sebhomengo/5136400
My query: https://gist.github.com/sebhomengo/5136451
The problem we've been facing is with performance and RAM. The query either
fails with a java.lang.OutOfMemoryError: Java heap space, or takes
between 20 seconds and several minutes. We are currently upgrading our second
server to add more RAM and try to avoid the OutOfMemory errors. With fewer
documents (up to 3 or 4 million) we don't really have performance issues.
From what I understand, the geo_distance and geo_bounding_box filters have to
load every point into RAM before doing the distance calculation, and with so
"many" documents in the index, our current nodes can't cope. I saw that
geo_shape doesn't work the same way, but we can't easily change the
indexing since we import the data through an external plugin.
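
As a rough sanity check on the RAM theory, here is a back-of-the-envelope estimate of what holding every point in memory would cost. It is only a sketch: it assumes each geo_point is cached as two 8-byte doubles (lat and lon), which is an assumption about the internals and may not match the exact version in use. The function name is made up for illustration.

```python
# Back-of-the-envelope estimate of the heap needed to hold every geo_point.
# ASSUMPTION: each point is cached as two 8-byte doubles (lat and lon);
# the real per-document cost depends on the Elasticsearch version.
BYTES_PER_DOUBLE = 8


def geo_point_cache_bytes(doc_count, values_per_point=2):
    """Return the estimated bytes needed to cache one geo_point per document."""
    return doc_count * values_per_point * BYTES_PER_DOUBLE


docs = 280_000_000  # document count of the large index
estimate = geo_point_cache_bytes(docs)
print(f"{estimate / 1e9:.2f} GB for {docs:,} documents")
# prints: 4.48 GB for 280,000,000 documents
```

Under that assumption the points alone would need about 4.5 GB, which is more than the smaller of the two heap sizes listed above (4 GB) and would be consistent with the OutOfMemoryError.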
So I guess my questions are:
- Is there a way to complete our query in less than 1 second with our
current configuration?
- Do we have to add more nodes to balance the load and RAM usage?
- Can using a geo_shape type instead of geo_point solve this problem
(since I think it doesn't load the points into RAM)? In that case we would fork
the plugin and add geo_shape support.
Several things I've already tried, with no real success:
- Setting lat_lon to true on the geo_point mapping, and setting optimize_bbox to
indexed in the geo_distance filter or type to indexed in the bounding box filter
- Setting distance_type to plane
- Compressing _source and setting store compress/tv to true
- Optimizing the index with max_num_segments set to 4
- Reading everything I could find and understand in this group
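
To make the filter options in the list above concrete, here is a minimal sketch of a request body that combines them, built as a plain Python dict. The field name "location", the coordinates and the distance are placeholders, not the values from the gists; the option names (distance_type, optimize_bbox) are written as I understand the geo_distance filter, and optimize_bbox set to indexed only makes sense if the mapping has lat_lon enabled.

```python
import json


def geo_distance_query(lat, lon, distance="1km", field="location"):
    """Build a filtered query with a geo_distance filter.

    ASSUMPTIONS: the field name, coordinates and distance are placeholders;
    the real mapping and query are in the gists linked earlier.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        # cheaper planar distance instead of the default arc
                        "distance_type": "plane",
                        # bounding-box pre-check using the indexed lat/lon values
                        "optimize_bbox": "indexed",
                        field: {"lat": lat, "lon": lon},
                    }
                },
            }
        }
    }


body = geo_distance_query(48.8566, 2.3522)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the search request body.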
Looking forward to your input!
Cheers,
Sébastien Zerah