We are using Elasticsearch almost as a cache, storing documents found in a
time window. We continuously insert a lot of documents of different sizes
and then search ES using text queries combined with a date filter, so that
the current thread does not get documents it has already seen. Something like
"((word1 AND word2) OR (word3 AND word4)) AND insertedDate > 1389000".
We keep the data in Elasticsearch for 30 minutes, using the TTL
feature. Today we have at least 3 machines, each inserting new documents in
bulk requests every minute and searching with queries like the one above
practically continuously.
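
To make the workload concrete, here is a minimal Python sketch of the two request bodies described above. It is only an illustration, not our actual code: the function names are hypothetical, the text query is assumed to run against the default search field, and the bulk body assumes the index and type are given in the request URL rather than on each action line.

```python
import json


def build_search_body(last_seen):
    """Text query combined with a date filter on insertedDate.

    last_seen is the insertedDate of the newest document the current
    thread has already processed (assumed to be a numeric timestamp).
    """
    return {
        "query": {
            "bool": {
                "must": [
                    # Boolean text query, parsed by query_string.
                    {"query_string": {
                        "query": "(word1 AND word2) OR (word3 AND word4)"
                    }},
                    # Only documents inserted after the last one seen.
                    {"range": {"insertedDate": {"gt": last_seen}}},
                ]
            }
        }
    }


def build_bulk_body(docs):
    """Newline-delimited bulk body: one action line, then one source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(json.dumps(build_search_body(1389000), indent=2))
    print(build_bulk_body([{"text": "word1 word2", "insertedDate": 1389001}]))
```

Each machine would send one such bulk body per minute, while the search body is rebuilt for every query with the latest insertedDate the thread has seen.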
We are having a lot of trouble indexing and retrieving these documents: we
are not getting good throughput, either for documents being indexed or for
documents returned by ES. We can't even get 200 documents indexed per second.
We believe the problem lies in the simultaneous queries, inserts and TTL
deletes. We don't need to keep old data in Elasticsearch; we just need a
small time window of documents indexed at any given time.
What should we do to improve our performance?
Thanks in advance
- An Amazon EC2 medium instance (3.7 GB of RAM)
The code used to build the index is something like this:
Our elasticsearch.json configuration file: