Geo_distance filter performance issues

Hi everyone,

I am having a performance issue trying to query an index of ~280 million
documents with a geo_distance or bounding box filter.
The data I'm trying to query is imported from OpenStreetMap. I used the
elasticsearch-osmosis-plugin (https://github.com/ncolomer/elasticsearch-osmosis-plugin)
to import the data.

Our configuration:

  • Nodes: 2 nodes (Windows Azure XL virtual machines running Ubuntu: one with
    8 cores and 14 GB RAM, one with 4 cores and 7 GB RAM)
  • Shards: I've played around with this, it doesn't change much (usually set
    to 1-5 shards and 0-1 replica)
  • JVM 7
  • ES_HEAP_SIZE set to 7 GB and 4 GB respectively
  • Data is stored on Windows Azure drives (probably not on the same machine)
  • The index in question is roughly 80 GB for 280 million documents
  • We have several other small indices (one with ~10M documents and 3 others
    with ~50k documents)

My mapping: https://gist.github.com/sebhomengo/5136400
My query: https://gist.github.com/sebhomengo/5136451

The problem we've been facing is with performance and RAM. The query either
fails with a java.lang.OutOfMemoryError: Java heap space, or takes between
20 seconds and several minutes. We are currently upgrading our second
server to add more RAM and try to avoid OutOfMemory errors. With fewer
documents (up to 3 or 4 million) we don't really have performance issues.

From what I understand, the geo_distance and geo_bounding_box filters have to
load every point into RAM before computing distances, and with so "many"
documents in the index, our current nodes can't cope. I saw that geo_shape
doesn't work the same way, but we can't easily change how the data is
indexed since we import it through an external plugin.
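
To put a rough number on that, here is a back-of-the-envelope estimate in Python. It is only a sketch: the 16 bytes per point (two 8-byte doubles for latitude and longitude) is my assumption about how the points are held in memory, not something I have confirmed, and it ignores any per-document overhead.

```python
# Rough heap estimate for holding every geo_point of the index in memory.
# ASSUMPTION (not confirmed): each point costs two 8-byte doubles (lat + lon).
DOC_COUNT = 280_000_000        # documents in the index, from the figures above
BYTES_PER_POINT = 2 * 8        # assumed: latitude + longitude stored as doubles


def geo_field_bytes(doc_count: int, bytes_per_point: int = BYTES_PER_POINT) -> int:
    """Return the raw number of bytes needed for all points, without overhead."""
    return doc_count * bytes_per_point


total = geo_field_bytes(DOC_COUNT)
total_gib = total / 2**30      # convert bytes to GiB
print(f"{total} bytes, about {total_gib:.2f} GiB")
```

Under that assumption the points alone would need about 4.5 GB across the cluster, which is already more than the heap of our smaller node before anything else is loaded.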

So I guess my questions are:

  • Is there a way to complete our query in less than 1 second with our
    current configuration?
  • Do we have to add more nodes to balance the load and RAM usage?
  • Can using a geo_shape type instead of geo_point solve this problem
    (since I think it doesn't load points into RAM)? In that case we will fork
    the plugin and add geo_shape support.

Several things I've already tried with no real success:

  • Setting lat_lon indexing on the geo_point field and setting optimize_bbox
    to indexed in the geo_distance filter, or type to indexed in the bounding
    box filter
  • Setting distance_type to plane
  • Compressing _source and setting store compress/tv to true
  • Optimizing the index with max_num_segments set to 4
  • Reading everything I could find and understand in this group
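
For reference, the first two settings in the list above look roughly like this when written out as request bodies. This is a minimal sketch in Python rather than a copy of the gists: the field name `location`, the coordinates and the 1km distance are placeholders, and the `filtered` query wrapper is assumed from the query DSL of the Elasticsearch version we run.

```python
import json

# Hypothetical field name; the real one is in the mapping gist linked above.
GEO_FIELD = "location"


def geo_point_mapping() -> dict:
    """Field mapping that also indexes lat and lon as separate numeric fields."""
    return {"type": "geo_point", "lat_lon": True}


def geo_distance_query(lat: float, lon: float, distance: str = "1km") -> dict:
    """Build a filtered query using a geo_distance filter with the tuned options."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        "distance_type": "plane",     # cheaper than arc
                        "optimize_bbox": "indexed",   # needs lat_lon in the mapping
                        GEO_FIELD: {"lat": lat, "lon": lon},
                    }
                },
            }
        }
    }


# Placeholder coordinates, for illustration only.
body = geo_distance_query(48.8566, 2.3522)
print(json.dumps(body, indent=2))
```

The `indexed` bounding-box optimization relies on the separately indexed lat/lon fields, which is why the two settings go together.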

Looking forward to your input!

Cheers,
Sébastien Zerah
