(Originally asked on SO, but I thought this place was probably a better fit.)
Unguided beginners in any field often find themselves barking up the wrong tree when trying to solve a problem. I'm asking this question in the hope that it will steer me onto a more direct path towards a solution.
On to the question:
I'm about a month into working with ES and so far it's been awesome. I've been incrementally indexing to ES from a set of data I've got in CSV, and I'm beginning to encounter slow response times. I want to bring the response time down, but don't know what's a good way / the best way to approach it.
My research thus far tells me that it really depends on a number of variables. So, listed below are details on the ES variables which might help you with writing an answer:
-
Shards & Stuff
- I say "& Stuff" because I don't know enough to know what's significant here.
- Running the default ES settings, 5 shards, 1 node.
- Running index-time search-as-you-type, exactly as-is from the ES guide. There's a bit in there which PUTs settings for the indices: "number_of_shards": 1. I'm not sure how that affects things.
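For reference, here is a minimal sketch of the kind of settings body that bit refers to, built as a Python dict so the shape is easy to see. The index name my_index and the helper name build_index_settings are placeholders for illustration, not my actual values, and the real request in the guide also carries analysis settings that are left out here.

```python
import json

# Hypothetical index name, used only for this illustration.
INDEX_NAME = "my_index"


def build_index_settings(number_of_shards: int = 1) -> dict:
    """Return a create-index body that sets the primary shard count."""
    if number_of_shards < 1:
        raise ValueError("number_of_shards must be at least 1")
    return {"settings": {"number_of_shards": number_of_shards}}


body = build_index_settings(1)

# Print the request roughly as it would be sent to the cluster.
print(f"PUT /{INDEX_NAME}")
print(json.dumps(body, indent=2))
```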
-
Index
- 2 indices with similar mappings (they mirror a DB, so I don't want to combine them).
- Multi-language, but at the moment I only care about English.
- As mentioned above, configured for index-time-search-as-you-type (min: 3, max: 20).
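To make the (min: 3, max: 20) setting concrete, below is a rough pure-Python sketch of the prefixes an edge n-gram step with those bounds would produce for each token. It is only an approximation for illustration, not Elasticsearch's implementation: it assumes lowercasing and whitespace tokenization, and the function names edge_ngrams and index_terms are made up.

```python
from typing import List


def edge_ngrams(token: str, min_gram: int = 3, max_gram: int = 20) -> List[str]:
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    longest = min(len(token), max_gram)
    # Tokens shorter than min_gram yield an empty list in this sketch.
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_terms(text: str, min_gram: int = 3, max_gram: int = 20) -> List[str]:
    """Lowercase, split on whitespace, and expand each token into prefixes."""
    terms: List[str] = []
    for token in text.lower().split():
        terms.extend(edge_ngrams(token, min_gram, max_gram))
    return terms


print(edge_ngrams("elastic"))
# ['ela', 'elas', 'elast', 'elasti', 'elastic']
print(index_terms("Quick search"))
# ['qui', 'quic', 'quick', 'sea', 'sear', 'searc', 'search']
```

Each word is stored as several terms rather than one, which is why this approach trades a larger index for cheaper prefix lookups at query time.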
-
Documents
- Have currently indexed ~1 million documents.
- Have a total of ~4 million documents to index.
- Very short documents, roughly 5 fields of 10 English words per doc.
- Total CSV file size of all ~4 million rows is only ~400 MB.
-
Queries
- Main query is run as a bool (should) query.
- Heavy on score scripting.
- Heavy on script-sorted aggregations.
- Fuzzy search (fuzziness: 1).
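As a rough sketch of the general shape of the main query, the snippet below builds a bool query with one fuzzy match clause per field under should. The field names title and description and the helper name build_search_body are placeholders, not my real mapping. The score scripts and script-sorted aggregations are deliberately left out, since their syntax depends on the ES version and on the scripts themselves.

```python
import json
from typing import Dict, List, Union


def build_search_body(
    text: str, fields: List[str], fuzziness: Union[int, str] = 1
) -> Dict:
    """Build a bool/should query with one fuzzy match clause per field."""
    should = [
        {"match": {field: {"query": text, "fuzziness": fuzziness}}}
        for field in fields
    ]
    return {"query": {"bool": {"should": should}}}


# Hypothetical field names, used only for illustration.
body = build_search_body("coffee", ["title", "description"])
print(json.dumps(body, indent=2))
```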
-
Hardware
- Running Linode's $20/mo VPS.
-
Response Time
- Queries for terms that occur very frequently in the index (typically a single English word) are taking forever (~7,000-9,000 ms) to return results.
- More specific queries (>= 2 English words) return in more acceptable times (~2,000-3,000 ms).
- Ideally, all response times should be <2s.
If I've missed any other important variables, let me know and I'll edit them in.
Thank you!