Geo_distance filter performance issues

Hi everyone,

I am having a performance issue querying an index of ~280 million
documents with a geo_distance or geo_bounding_box filter.
The data is imported from OpenStreetMap using the
elasticsearch-osmosis-plugin.

Our configuration :

  • Nodes : 2 nodes (Windows Azure XL virtual machines running Ubuntu:
    8 cores / 14 GB RAM + 4 cores / 7 GB RAM)
  • Shards : I've played around with this, doesn't change much (usually set
    to 1-5 shards and 0-1 replica)
  • JVM 7
  • ES_HEAP_SIZE set to 7 GB and 4 GB respectively
  • Data is stored on Windows Azure drives (probably not on the same machine)
  • The index in question is roughly 80 GB for 280 million documents
  • We have several other small indices (One with ~10M documents and 3 others
    with ~50k documents)

My mapping :
My query :

The problem we've been facing is with performance and RAM. The query either
never completes and fails with a java.lang.OutOfMemoryError: Java heap space,
or takes between 20 seconds and several minutes. We are currently upgrading
our second server to add more RAM and try to avoid the OutOfMemory errors.
With fewer documents (up to 3 or 4 million) we don't really have performance
issues.

From what I understand, the geo_distance and geo_bounding_box filters have to
load every point into RAM before doing the geolocation calculation, and with
so "many" documents in the index, our current nodes can't cope. I saw that
geo_shape doesn't work the same way, but we can't easily change how the data
is indexed since we import it with an external plugin.
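
To put a rough number on that, here is a back-of-the-envelope sketch in Python. It assumes each geo_point is held in memory as two 8-byte doubles (lat and lon) and that only about half of the heap is realistically available for this data. Both figures are assumptions for illustration, not measured values, and the real cost depends on the Elasticsearch version.

```python
# Rough estimate of the heap needed to hold every geo_point in memory.
# ASSUMPTION: one point = two 8-byte doubles (lat, lon). Not measured.

DOC_COUNT = 280_000_000   # document count from this post
BYTES_PER_COORD = 8       # assumed size of one coordinate (a double)
COORDS_PER_POINT = 2      # lat + lon


def geo_points_in_memory_gib(doc_count, bytes_per_coord=BYTES_PER_COORD):
    """Return the estimated size in GiB of all points loaded at once."""
    total_bytes = doc_count * COORDS_PER_POINT * bytes_per_coord
    return total_bytes / 2**30


def fits_in_heap(doc_count, heap_gib, usable_fraction=0.5):
    """Check the estimate against a heap size.

    usable_fraction is a hypothetical safety margin: the share of the
    heap assumed to be free for this data.
    """
    return geo_points_in_memory_gib(doc_count) <= heap_gib * usable_fraction


if __name__ == "__main__":
    needed = geo_points_in_memory_gib(DOC_COUNT)
    print(f"estimated size of all points: {needed:.1f} GiB")
    for heap in (7, 4):  # the two ES_HEAP_SIZE values from this post
        print(f"heap {heap} GiB -> fits: {fits_in_heap(DOC_COUNT, heap)}")
```

Under these assumptions the whole index does not fit on either node if a single node has to hold all of it, while a few million documents fit easily, which matches what we observe.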

So I guess my questions are :

  • Is there a way to complete our query in less than 1 second with our
    current configuration ?
  • Do we have to add more nodes to balance the load and RAM usage ?
  • Can using the geo_shape type instead of geo_point solve this problem
    (since I think it doesn't load the points in RAM) ? In that case we would
    fork the plugin and add geo_shape support.

Several things I've already tried with no real success :

  • Setting lat_lon to true on the geo_point mapping and setting optimize_bbox
    to indexed in the geo_distance filter, or type to indexed in the
    geo_bounding_box filter
  • Setting distance_type to plane
  • Compressing _source and setting store compress/tv to true
  • Optimizing the index with max_num_segments set to 4
  • Reading everything I could find and understand in this group :P
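
For reference, the first two items correspond roughly to the request bodies below, built here as Python dicts. This is a sketch only: the field name "location", the type name "node", and the coordinates are hypothetical placeholders, not the actual mapping or query.

```python
import json

FIELD = "location"  # hypothetical geo_point field name


def geo_point_mapping(doc_type="node"):
    """Mapping that also indexes lat and lon as separate numeric fields."""
    return {
        doc_type: {
            "properties": {FIELD: {"type": "geo_point", "lat_lon": True}}
        }
    }


def geo_distance_query(lat, lon, distance="1km"):
    """geo_distance filter using the indexed bbox and plane distance."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        "distance_type": "plane",
                        "optimize_bbox": "indexed",
                        FIELD: {"lat": lat, "lon": lon},
                    }
                },
            }
        }
    }


def geo_bbox_query(top, left, bottom, right):
    """geo_bounding_box filter evaluated against the indexed lat/lon."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "geo_bounding_box": {
                        "type": "indexed",
                        FIELD: {
                            "top_left": {"lat": top, "lon": left},
                            "bottom_right": {"lat": bottom, "lon": right},
                        },
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Placeholder coordinates, for illustration only.
    print(json.dumps(geo_point_mapping(), indent=2))
    print(json.dumps(geo_distance_query(48.85, 2.35), indent=2))
    print(json.dumps(geo_bbox_query(48.9, 2.2, 48.8, 2.5), indent=2))
```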

Looking forward to your inputs !

Sébastien Zerah
