Hi there. I'm sure you've heard this question a million times, so apologies in advance.
I've been reading day and night about indexes, shards, nodes, custom routing, ES metrics to monitor, etc. I'm still a bit stumped on the path forward. I've been testing the default configuration, but I'm looking for some guidance before I go down paths that don't make sense.
Use case: listings search. Think Trulia, Zillow, etc. I'm using ES for the 'reads'. So searching for listings in various locations, with various filters, and doing faceted nav.
Numbers: approx 5 million listings. Only around 250k are 'available' (e.g. on market). The rest get moved to 'sold', 'leased', etc. Index size is about 5 GB.
Expectation: could have around 50 requests per second for searches. Obviously we can put caching in the application layer, but I'm trying to avoid that unless needed.
Current infra: Elastic cloud default configuration for 'high i/o'. 4 GB, 2 nodes across 2 fault zones. 1 master node.
Current index setup: everything in one index ("Listings"). Default sharding (5 primary, 1 replica).
Results: avg response time for 5 concurrent users = ~100ms. Was hoping for closer to ~50.
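For context, this is roughly how I'm measuring, as a minimal sketch rather than my exact harness. `search_fn` is a placeholder for whatever sends one search request to the cluster, and the helper names are my own. The second function is just arithmetic: assuming no think time between requests, 5 concurrent users at ~100 ms each works out to about 50 requests per second, which is right at my target.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def measure_latency(search_fn, concurrent_users=5, requests_per_user=20):
    """Call search_fn from several threads and collect per-request latency in ms."""

    def one_user(_):
        timings = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            search_fn()  # placeholder: one search request against the cluster
            timings.append((time.perf_counter() - start) * 1000.0)
        return timings

    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        per_user = list(pool.map(one_user, range(concurrent_users)))

    latencies = sorted(t for timings in per_user for t in timings)
    return {
        "count": len(latencies),
        "mean_ms": statistics.mean(latencies),
        "median_ms": statistics.median(latencies),
        # simple nearest-rank style estimate of the 95th percentile
        "p95_ms": latencies[int(0.95 * (len(latencies) - 1))],
    }


def implied_throughput(concurrent_users, mean_latency_ms):
    """Requests per second if every user sends the next request immediately."""
    return concurrent_users * 1000.0 / mean_latency_ms
```

Looking at the median and p95 alongside the average should show whether the ~100 ms is typical or pulled up by a few slow requests.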
Cool, so now my questions.
Questions & Ideas I've got so far:
- Split into 'available' and 'completed' indexes. That way I can control the growth of 'available'.
- Use custom routing based on neighborhood. Might be overkill so far?
- Add more nodes. Not sure how many I need?
- Tweak primary/replica shards. I still don't understand the calcs. I don't think 5 GB is considered 'big', so maybe I should go to 1 primary shard? But how many replicas?
- Don't store the full 'listing'. Just store what's being searched upon, then put the data required for the result in a different index. But I've got these non-searchable fields as index=false, so I'm not sure if splitting will give any additional gains.
- In Elastic Cloud, if I've got 2x nodes across two zones, I assume they are in different data centres, but will this impact latency, compared to having 2x nodes in the same data centre? Is it negligible?
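To make the split-index, shard and routing ideas concrete, here is a rough sketch of the request bodies I have in mind. It only builds plain dicts, so nothing is sent to a cluster. The index names `listings_available` and `listings_completed`, the `neighborhood` field and the helper functions are all placeholders I made up; `number_of_shards`, `number_of_replicas` and the `routing` request parameter are the standard Elasticsearch settings as I understand them.

```python
def index_settings(primary_shards, replicas):
    """Body for a create-index request (PUT /<index-name>)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    }


def total_shard_copies(primary_shards, replicas):
    """Each primary has `replicas` extra copies, so total = primaries * (1 + replicas)."""
    return primary_shards * (1 + replicas)


def routed_search(neighborhood, extra_filters=None):
    """Return (query-string params, request body) for a search routed by neighborhood."""
    params = {"routing": neighborhood}  # only helps if docs were indexed with the same routing value
    filters = [{"term": {"neighborhood": neighborhood}}] + list(extra_filters or [])
    body = {"query": {"bool": {"filter": filters}}}
    return params, body


# Hypothetical plan: two small indexes, one primary shard each, one replica each.
PLAN = {
    "listings_available": index_settings(primary_shards=1, replicas=1),
    "listings_completed": index_settings(primary_shards=1, replicas=1),
}
```

One thing I noticed while writing this out: with a single primary shard, custom routing has nothing to choose between, so those two ideas seem to be alternatives rather than things to combine. With 1 primary and 1 replica on my 2 data nodes, each node would hold a full copy of the index, compared with 10 shard copies spread over the same 2 nodes in the current 5 primary / 1 replica setup.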
That's a lot of ideas, and lots of things to test. So I'm looking for some general guidance on my use case, before I waste lots of time. Thanks muchly!