Hi everyone,
I have a 10 million document index and multiple high-memory machines
available (e.g., 250 GB RAM, 32 cores). I'd like to do everything possible
to keep search latency as low as possible (< 50ms ideally), especially in a
high-throughput setting. I know it depends a lot on the query, but to start
with I'm asking about general index/cluster settings.
Here's a list of things I'm doing so far:
- ES_HEAP_SIZE=100g
- machine has no swap
- 20 shards
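For concreteness, here's roughly how I'm applying the shard setting at index-creation time, sketched as a Python dict (the index name and replica count below are placeholders, not part of my actual setup):

```python
# Sketch of the index settings matching the list above. The replica count
# is a placeholder I made up; replicas also affect search throughput.
index_settings = {
    "settings": {
        "number_of_shards": 20,   # as listed above
        "number_of_replicas": 1,  # placeholder
    }
}

# With the official elasticsearch-py client, this body would be passed to
# indices.create, e.g.:
#   es.indices.create(index="docs", body=index_settings)
print(index_settings["settings"]["number_of_shards"])
```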
Are there any other settings or search parameters I should be aware of?
Also, I'm wondering how many shards are recommended in my case. Having more
shards helps reduce latency by parallelizing the work, but at some point
the overhead of fanning out the requests and collecting the partial results
will take over and latency will get worse. Is there a rule of thumb for the
sweet spot that others have found?
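To make the trade-off I'm describing concrete, here's a toy model of it. All the constants are made up purely for illustration; real numbers would depend entirely on the workload:

```python
# Toy model of the shard trade-off: per-shard work parallelizes (shrinks as
# 1/shards), while fan-out/merge overhead grows roughly linearly with shard
# count. Constants are invented for illustration only.
def modeled_latency_ms(shards, total_work_ms=200.0, fanout_cost_ms=0.5):
    return total_work_ms / shards + fanout_cost_ms * shards

# Find the sweet spot for this (made-up) parameterization.
best = min(range(1, 65), key=modeled_latency_ms)
print(best, modeled_latency_ms(best))  # -> 20 20.0
```

With these invented constants the minimum lands at 20 shards, but the point is only the shape of the curve: latency falls steeply at first, then climbs again once fan-out overhead dominates.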
The volume of updates to the index is relatively small (500K/day), but
bursty. From initial testing, it seems that issuing updates can increase
search latency on the same machine. Is there a good way to "isolate"
searches and updates, either via some setting, or by splitting up the
cluster to have dedicated update nodes and dedicated search nodes? (I'm not
sure how you'd deploy a setup like this, or how you'd control where the
search/update calls go.)
The query I'm optimizing for will have a text search component and a
geo-restrict component, maybe something like this:
{
  "query": {
    // query may get more complex in the future
    "match": { "_all": "my search terms" }
  },
  "filter": {
    "geo_distance": {
      "distance": "100km",
      "location": {
        "lat": 34.04,
        "lon": -118.49
      }
    }
  }
}
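One thing I wasn't sure about: my understanding from the docs is that a top-level "filter" is applied as a post-filter (after scoring), and that the usual way to combine a query with a filter in ES 1.x is a filtered query. So I've also been testing that shape, sketched here as a Python dict (corrections welcome if I've misread the docs):

```python
# The same query rewritten as a "filtered" query (ES 1.x style), where the
# geo filter restricts the document set during query execution rather than
# being applied as a post-filter. This is my reading of the docs, not
# something I've confirmed is faster.
filtered_query = {
    "query": {
        "filtered": {
            "query": {"match": {"_all": "my search terms"}},
            "filter": {
                "geo_distance": {
                    "distance": "100km",
                    "location": {"lat": 34.04, "lon": -118.49},
                }
            },
        }
    }
}
print(sorted(filtered_query["query"]["filtered"]))
```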
For the geo filter, I've tried the optimize_bbox option, and surprisingly
the default of "memory" seemed to work best. I haven't tried geohashes yet,
and I can't tell from the docs how one would use them, but maybe they're
inherently faster since they work against the index?
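For reference, here are the optimize_bbox variants I've been comparing, sketched as Python dicts. My understanding (possibly wrong) is that "indexed" only works if the geo_point field is mapped with "lat_lon": true, so that lat and lon are also indexed as separate numeric fields:

```python
# The geo_distance filter with the three optimize_bbox settings I know of:
# "memory" (the default), "indexed", and "none".
def geo_filter(optimize_bbox):
    return {
        "geo_distance": {
            "distance": "100km",
            "location": {"lat": 34.04, "lon": -118.49},
            "optimize_bbox": optimize_bbox,
        }
    }

# Mapping sketch that I believe is required for optimize_bbox="indexed":
mapping = {"properties": {"location": {"type": "geo_point", "lat_lon": True}}}
print(geo_filter("indexed")["geo_distance"]["optimize_bbox"])
```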
Unfortunately, there are a lot of unique locations in my query stream, so I
don't know whether caching this filter will help. (Each cached filter
consumes about 1 bit of memory per document, is that right? That's about
1.25 MB per filter in my case, and storing the most frequent 10,000 of
these would take about 12.5 GB of RAM. So maybe that's doable...)
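Sanity-checking my arithmetic there, assuming the ~1-bit-per-document figure is right:

```python
# Filter cache sizing: roughly 1 bit per document per cached filter.
docs = 10_000_000
bytes_per_filter = docs / 8                        # 1 bit per doc -> bytes
mb_per_filter = bytes_per_filter / 1_000_000       # per-filter cost in MB
gb_for_10k = 10_000 * bytes_per_filter / 1_000_000_000  # 10,000 cached filters
print(mb_per_filter, gb_for_10k)  # -> 1.25 12.5
```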
Sorry if that's a lot of questions, but I figured other people may benefit
from this thread too.
Thanks for any help.
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e8e878ac-76d6-4233-a8e5-21908bd33e84%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.