We are looking for a replacement for our existing Sphinx search cluster and are evaluating Solr and Elasticsearch. We need real-time indexing and fast responses for selects. For our POC we are using two CentOS (64-bit) boxes with 24 GB RAM and 16-core and 8-core processors respectively, holding around 2.4 million documents (216 fields) with an index size of approximately 30 GB.
Elasticsearch version: 5.6.0 & 6.1.1
elasticsearch.yml file has the following configurations
Can you try omitting refresh=true on every request? It carries a performance penalty, because a new segment gets created for every document.
You can try using refresh=wait_for on the index/update side instead, so that the default 1-second refresh interval still applies and the write operation waits for that background refresh to happen.
I am not yet sure where your issue with updates not being reflected stems from. Can you do a GET operation on the document and check that it is updated when running those tests?
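To make the suggestion above concrete, here is a minimal sketch of a partial update sent with refresh=wait_for, followed by a GET on the same document. It assumes the 5.x/6.x URL layout (`/{index}/{type}/{id}/_update`); the host, index, type, and field names are placeholders, not taken from the original setup.

```python
import json
import urllib.parse
import urllib.request


def _doc_path(index, doc_type, doc_id):
    # URL-encode each path segment so ids with special characters stay valid.
    return "/".join(urllib.parse.quote(str(p), safe="") for p in (index, doc_type, doc_id))


def build_update_request(base_url, index, doc_type, doc_id, partial_doc, refresh="wait_for"):
    """Build (but do not send) a partial-update request."""
    url = f"{base_url.rstrip('/')}/{_doc_path(index, doc_type, doc_id)}/_update"
    if refresh:
        url += "?" + urllib.parse.urlencode({"refresh": refresh})
    body = json.dumps({"doc": partial_doc}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_get_request(base_url, index, doc_type, doc_id):
    """Build a GET-by-id request to check the stored document."""
    url = f"{base_url.rstrip('/')}/{_doc_path(index, doc_type, doc_id)}"
    return urllib.request.Request(url, method="GET")


def send(request, timeout=30):
    """Send a request to a running cluster and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical usage against a local node:
# send(build_update_request("http://localhost:9200", "products", "doc", 42, {"price": 10}))
# send(build_get_request("http://localhost:9200", "products", "doc", 42))
```

Note that GET by id is realtime by default, so it can show the new version before a search does. A search-based check is still needed to measure visibility to search.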
@spinscale thanks for the response. I already tried refresh=wait_for, but no luck. I am also making GET requests to measure how long an update takes to be reflected.
As I mentioned, we are using close to 216 fields, of which 7 are text fields with large content, say 300 to 500 KB per field. My suspicion is the way Elasticsearch handles partial document updates, i.e. reading the full document, applying the changes, and then re-indexing it.
With select requests running in parallel, these update operations take more time to complete. Please correct me if I am wrong.
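The read, merge, and re-index cycle described above can be modelled roughly as follows. This is a simplified illustration in plain Python, not Elasticsearch's own code; the document layout and field names are made up. It shows why a tiny partial update still causes the whole document, including the large text fields, to be indexed again.

```python
import copy
import json


def merge_partial(source, partial):
    """Recursively merge `partial` into a copy of `source`.

    Nested objects are merged key by key; scalars and arrays are replaced.
    """
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def simulate_partial_update(stored, partial):
    """Model of an update: read full source, merge, write a new full version."""
    new_source = merge_partial(stored["_source"], partial)
    return {"_version": stored["_version"] + 1, "_source": new_source}


# Hypothetical document with one large text field (about 300 KB).
stored = {
    "_version": 1,
    "_source": {"title": "old", "body": "x" * 300_000, "meta": {"views": 1, "lang": "en"}},
}
partial = {"meta": {"views": 2}}
updated = simulate_partial_update(stored, partial)

payload_bytes = len(json.dumps(partial))               # what the client sends
reindexed_bytes = len(json.dumps(updated["_source"]))  # what gets indexed again
print(payload_bytes, reindexed_bytes)
```

Under this model, the cost of an update scales with the full document size rather than with the size of the change.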
@spinscale for the update requests I get the response from the server immediately, but the changes I made to a document are not visible until the load returns to normal or there are no select requests. FYI, I am using a separate script to verify when the changes become visible to search.
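A verification script of the kind mentioned above could look roughly like this. It is only a sketch: `is_visible` is a hypothetical callable that would run the search and return True once the updated value appears in the hits, and the timeout and interval values are arbitrary.

```python
import time


def wait_until_visible(is_visible, timeout=30.0, interval=0.1):
    """Poll `is_visible()` until it returns True.

    Returns the elapsed seconds until the change was seen, or None on timeout.
    """
    start = time.monotonic()
    while True:
        if is_visible():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```

Starting the timer right after the update response arrives gives the delay between acknowledgement and search visibility, which can then be compared with and without concurrent select load.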
Does the same happen when you refresh manually after a couple of updates? Also, what is your refresh interval set to?
You can also check the nodes stats to find out how much time is spent on refreshes. Given the size of your documents, this could also take some time.
Still, when you configure a refresh as part of your index operation, it should be visible immediately. Does that happen when you only index a single document?
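As a rough sketch of the nodes stats check mentioned above: the response of `GET _nodes/stats/indices/refresh` reports, per node, a refresh count (`total`) and the cumulative time (`total_time_in_millis`). The helper below summarises those two numbers. The sample response is invented for illustration and is not output from the cluster in this thread.

```python
def refresh_time_per_node(nodes_stats):
    """Summarise refresh count and time per node from a nodes stats response."""
    summary = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        refresh = node.get("indices", {}).get("refresh", {})
        total = refresh.get("total", 0)
        millis = refresh.get("total_time_in_millis", 0)
        summary[node.get("name", node_id)] = {
            "refreshes": total,
            "total_millis": millis,
            # Average cost of one refresh; 0.0 when no refresh has run yet.
            "avg_millis": millis / total if total else 0.0,
        }
    return summary


# Made-up sample shaped like a nodes stats response.
sample = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "indices": {"refresh": {"total": 200, "total_time_in_millis": 5000}},
        }
    }
}
print(refresh_time_per_node(sample))
```

Taking two snapshots, one before and one after a test run, and comparing the values shows how much refresh time the run itself added.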