We have the following scenario:
We have a 5-node cluster with 4 GB of heap space on each node, and we want to perform geo distance queries with the percolator. Currently we get a throughput of 5.2 requests/sec, which is not enough for our use case.
What we have done so far to improve performance:
- Disabled swapping
- Raised the number of shards of the index to 5 (one shard per node)
- Initially we started with geo_shape queries; we have now changed them to geo_distance filters
- Reduced the number of percolated queries by filtering on the type
- Used the plane distance type for geo_distance
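For reference, a geo_distance filter with the plane distance type mentioned in the list above would look roughly like this. It is only a sketch: the coordinates and radius are the illustrative values from the query further down, the distance is written in the combined "10km" form, and `distance_type` does not appear in that query as posted.

```json
{
  "geo_distance": {
    "distance": "10km",
    "distance_type": "plane",
    "location": {
      "lat": 40,
      "lon": -70
    }
  }
}
```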
We have indexed about 3 million queries. What options do we have to increase the performance further?
The queries that we index look like this:
{
  "query": {
    "filtered": {
      "query": {"match_all": {}},
      "filter": {
        "and": {
          "filters": [
            {
              "range": {
                "price": {
                  "from": 0,
                  "to": 400
                }
              }
            },
            {
              "range": {
                "rooms": {
                  "from": 4,
                  "to": 6
                }
              }
            },
            {
              "range": {
                "area": {
                  "from": 500,
                  "to": 720
                }
              }
            },
            {
              "geo_distance": {
                "location" : {
                  "lat" : 40,
                  "lon" : -70
                },
                "distance": 10,
                "distance_unit": "km"
              }
            }
          ]
        }
      }
    }
  }
}
The documents that we are percolating look like this:
{
  "doc": {
    "price": 400,
    "area": 30,
    "rooms": 1,
    "location": {
      "lat": 52.517801,
      "lon": 13.400000
    }
  },
  "filter": {
    "term" : {
      "type": "xyz"
    }
  }
}
Mappings:
"mappings" : {
  ".percolator" : {
    "_id" : {
      "index" : "not_analyzed"
    },
    "properties" : {
      "query" : {
        "type" : "object",
        "enabled" : false
      },
      "realEstateType" : {
        "type" : "string",
        "index" : "not_analyzed"
      }
    }
  },
  "anonyme_gesuche" : {
    "_source" : {
      "enabled" : false
    },
    "properties" : {
      "area" : {
        "type" : "double"
      },
      "geoKey" : {
        "type" : "string",
        "index" : "not_analyzed"
      },
      "location" : {
        "type" : "geo_point"
      },
      "price" : {
        "type" : "double"
      },
      "realEstateType" : {
        "type" : "string"
      },
      "rooms" : {
        "type" : "double"
      }
    }
  }
}
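For completeness, the shard layout described at the top (5 shards, one per node) corresponds to index settings roughly like the following. This is a sketch: only `number_of_shards` is taken from the description above. The replica count is not stated, so it is left out here.

```json
{
  "settings": {
    "index": {
      "number_of_shards": 5
    }
  }
}
```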