Leaving hardware out of the equation and looking strictly at the tools available through the Elasticsearch API, I'm looking for any advice or suggestions on how I can improve performance. I have an index of roughly 220 million documents that already responds fairly quickly, and I'm trying to confirm there is nothing else I can do within the bounds of ES to improve on that speed.
I have an index mapped as follows:
    ...
    "_routing" : { "required" : true, "path" : "categoryId" },
    "properties" : {
        "personId" : { "type" : "integer", "doc_values" : true, "include_in_all" : false },
        "statusId" : { "type" : "byte", "doc_values" : true, "include_in_all" : false },
        "city" : { "type" : "string", "index" : "not_analyzed" },
        "state" : { "type" : "string", "index" : "not_analyzed" },
        "categoryId" : { "type" : "integer", "doc_values" : "true", "include_in_all" : false },
        ...
The goal here is to perform a search for "personId" and other associated information based on the combination of particular status codes, city, state, and categoryId.
(categoryId is distributed such that using custom routing made a measurable positive impact on performance.)
My query looks like this:
    {
        "query": {
            "filtered": {
                "query": { "match_all": {} },
                "filter": {
                    "bool": {
                        "must": [
                            { "terms": { "statusId": [ 2, 3 ] } },
                            { "term": { "city": "Atlanta" } },
                            { "term": { "state": "GA", "_cache" : true } },
                            { "term": { "categoryId": 12345, "_cache" : true } }
                        ]
                    }
                }
            }
        },
        "sort" : [ { "statusId" : { "order" : "asc" } } ]
    }
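Assuming the search is sent with a `routing` parameter equal to the categoryId being filtered on (which is how custom routing would normally narrow the search to the relevant shard), the request could be assembled roughly as in the Python sketch below. The index name `people` and the helper `build_search_request` are hypothetical and only there for illustration; nothing is sent over the network.

```python
import json
from urllib.parse import urlencode


def build_search_request(index, category_id, city, state, status_ids):
    """Return (path, body) for a routed search request. Illustrative helper only."""
    body = {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "bool": {
                        "must": [
                            {"terms": {"statusId": status_ids}},
                            {"term": {"city": city}},
                            {"term": {"state": state, "_cache": True}},
                            {"term": {"categoryId": category_id, "_cache": True}},
                        ]
                    }
                },
            }
        },
        "sort": [{"statusId": {"order": "asc"}}],
    }
    # The routing value matches the categoryId filter, so the search is
    # limited to the shard that this routing value maps to.
    path = "/{}/_search?{}".format(index, urlencode({"routing": category_id}))
    return path, json.dumps(body)


# "people" is an assumed index name, not taken from the actual setup.
path, body = build_search_request("people", 12345, "Atlanta", "GA", [2, 3])
print(path)
```

The body is the same filtered query as above; only the `routing` query-string parameter is added on top of it.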
Is there anything glaringly wrong with my approach?