Hi all -
I have a document set of ~100,000 items. Each document has a
geometry.location (a single mapped GeoPoint) and a geometry.coordinates (an
array of mapped GeoPoints). On average there may be 100 GeoPoints in the
geometry.coordinates array (so 10,000,000 GeoPoints in total). I
understand that executing a match_all query with a GeoDistanceFilter
against geometry.coordinates may require loading all 10,000,000
GeoPoints into memory.
To avoid this, I'm attempting to execute a filtered query, which I
understand should execute the query first and the filter second. Thus, if
the query returns, say, 10 documents, the GeoDistance filter should only
have to run across 1,000 GeoPoints (10 documents x 100 GeoPoints per
document). However, every time I do this I get an OOM exception, even when
the query returns 0 documents.
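For a sense of scale, here is a rough sketch of the counts above. The 16 bytes per point (two 8-byte doubles for lat and lon) is purely an assumption for illustration, not a measured figure for how ES stores GeoPoints, and it ignores any per-document or per-array overhead.

```python
# Back-of-the-envelope estimate of how many GeoPoints each scenario touches.
# BYTES_PER_POINT is an assumption (two 8-byte doubles), not a measured ES figure.
DOCS_TOTAL = 100_000
POINTS_PER_DOC = 100
BYTES_PER_POINT = 16  # assumed: lat + lon as 8-byte doubles


def points_loaded(doc_count, points_per_doc=POINTS_PER_DOC):
    """Number of GeoPoints belonging to doc_count documents."""
    return doc_count * points_per_doc


def estimated_megabytes(point_count, bytes_per_point=BYTES_PER_POINT):
    """Raw size of the coordinates alone, in MiB, under the assumed point size."""
    return point_count * bytes_per_point / (1024 * 1024)


all_points = points_loaded(DOCS_TOTAL)  # every document in the index
matched_points = points_loaded(10)      # only the 10 documents the query matched

print(all_points, round(estimated_megabytes(all_points), 1))
print(matched_points, round(estimated_megabytes(matched_points), 3))
```

If that per-point assumption is anywhere near right, the raw coordinate values alone would be far smaller than the heap, so the OOM presumably comes from something beyond the raw values.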
Speculation: It seems that, regardless of the query, ES attempts to load
all the GeoPoints in geometry.coordinates into memory.
Example I'm running in ES head (note that the bool query for user:kimchy
returns 0 documents):
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": {
            "term": {
              "user": "kimchy"
            }
          }
        }
      },
      "filter": {
        "geo_distance": {
          "distance": "200km",
          "geometry.coordinates": {
            "lat": 43,
            "lon": -79
          }
        }
      }
    }
  }
}
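To reproduce this outside ES head, the same request body can be built programmatically. This is only a minimal sketch: the helper name build_filtered_geo_query is hypothetical and does nothing more than reproduce the JSON in this post.

```python
import json


def build_filtered_geo_query(user, lat, lon, distance, field="geometry.coordinates"):
    """Build a filtered query body: a bool/term query plus a geo_distance filter.

    Hypothetical helper for illustration; it only assembles the request body.
    """
    return {
        "query": {
            "filtered": {
                "query": {"bool": {"must": {"term": {"user": user}}}},
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        field: {"lat": lat, "lon": lon},
                    }
                },
            }
        }
    }


body = build_filtered_geo_query("kimchy", 43, -79, "200km")
print(json.dumps(body, indent=2))
```

Changing the user argument makes it easy to compare a query that matches nothing against one that matches a handful of documents.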
Other tidbits:
- ES 0.19.9
- Java 7 Update 9
- Windows 7, 64 bit
- 8GB memory allocated to ES
- 16GB memory total
Any help or suggestions are much appreciated!