Select Operations Hold Back Visibility of Inserts/Updates

We are looking for a replacement for our existing Sphinx search cluster and are evaluating Solr and Elasticsearch. We need real-time indexing and fast responses for selects. For our POC we are using two CentOS (64-bit) boxes with 24 GB RAM and 16-core and 8-core processors respectively, holding around 2.4 million documents (216 fields each) with an index size of approximately 30 GB.

Elasticsearch versions: 5.6.0 and 6.1.1

The elasticsearch.yml file has the following configuration:

cluster.name: clsindex
node.name: proidx1
path.data: /home/poc/elasticsearch560/datadir
path.logs: /home/poc/elasticsearch560/logs
network.host: 192.168.1.20
network.bind_host: 192.168.1.20
network.publish_host: 192.168.1.20
http.port: 9200
discovery.zen.ping.unicast.hosts: ["192.168.1.20","192.168.1.21"]
discovery.zen.minimum_master_nodes: 1

Refresh interval: default (1s); inserts/updates are sent with ?refresh=true; replication factor: 1
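For reference, an insert with a forced refresh, as used in this test, corresponds to a request like the following minimal Python sketch (standard library only). The host and port come from the config above; the index name `products` and mapping type `doc` are hypothetical placeholders. The `Content-Type` header is set because 6.x rejects JSON bodies sent without it.

```python
import json
import urllib.parse
import urllib.request

ES_HOST = "http://192.168.1.20:9200"  # from network.host / http.port above
INDEX = "products"                    # hypothetical index name
DOC_TYPE = "doc"                      # hypothetical mapping type (types still exist in 5.x/6.x)


def index_request(doc_id, document, refresh="true", host=ES_HOST):
    """Build PUT /{index}/{type}/{id}?refresh=... for a single document.

    refresh="true" forces a refresh of the affected shards right after the
    write, which is what the load test does on every insert/update.
    """
    query = urllib.parse.urlencode({"refresh": refresh})
    url = f"{host}/{INDEX}/{DOC_TYPE}/{urllib.parse.quote(str(doc_id))}?{query}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

# Sending it against a live node (not executed here):
#   with urllib.request.urlopen(index_request(42, {"title": "example"})) as resp:
#       print(json.load(resp))
```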

In the load test (JMeter + PHP), I send 1 insert request, 2 update requests and 1 select request.
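To reproduce the problem outside JMeter, the same 1:2:1 request mix could be driven by a small Python sketch like this one. The action callables are hypothetical placeholders for whatever actually sends the insert, update and select HTTP requests; only the ratio comes from the test described above.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

# The request mix from the load test: 1 insert, 2 updates, 1 select.
MIX = ("insert", "update", "update", "select")


def operation_schedule(n):
    """Return the first n operations, repeating the 1:2:1 mix."""
    return list(itertools.islice(itertools.cycle(MIX), n))


def run_load(actions, n, workers=8):
    """Dispatch n operations concurrently on a thread pool.

    `actions` maps each operation name to a zero-argument callable that
    performs the real request (hypothetical here). Results come back in
    schedule order because ThreadPoolExecutor.map preserves input order.
    """
    schedule = operation_schedule(n)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda op: actions[op](), schedule))
```

Running the selects on the same pool as the writes keeps them truly concurrent, which is the condition under which the delayed visibility was observed.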

The problem is that the inserts/updates are not reflected in search results even after the 1-second refresh interval has passed.

As soon as I stop the load, all the inserts/updates become visible within 1 or 2 seconds.

To narrow down the problem, I disabled the select component in the JMeter test case, and the updates were then reflected immediately.

I searched for this issue but found no one discussing it, so it looks like I am missing some configuration parameter.

Can somebody please point out the gaps?

Can you try omitting refresh=true on every request? It incurs a performance penalty, because a new segment gets created for every document.

You can try using refresh=wait_for on the index/update requests instead. The default 1-second refresh still happens in the background, and each write operation waits for that scheduled refresh before returning.
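A partial update using refresh=wait_for might be built as in this minimal Python sketch. The index, type and document ID are caller-supplied (any names used with it are hypothetical); the endpoint shape `/{index}/{type}/{id}/_update` with a `doc` body is the 5.x/6.x partial-update API.

```python
import json
import urllib.parse
import urllib.request


def update_request(host, index, doc_type, doc_id, partial_doc, refresh="wait_for"):
    """Build POST /{index}/{type}/{id}/_update?refresh=wait_for.

    wait_for does not force a refresh: the call returns only once a
    scheduled refresh (index.refresh_interval, 1s by default) has made the
    change visible to search.
    """
    path = "/".join(urllib.parse.quote(str(p)) for p in (index, doc_type, doc_id))
    query = urllib.parse.urlencode({"refresh": refresh})
    body = json.dumps({"doc": partial_doc}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/{path}/_update?{query}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```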

I am not yet sure where the problem with updates not being reflected comes from. Can you do a GET on the document while running those tests and check whether it shows the update?
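The GET API is real-time by default: it returns the latest indexed version of a document even before a refresh, whereas search only sees what the last refresh exposed. Comparing the two separates "the write was not applied" from "the write is not searchable yet". A minimal Python sketch, where the field name and expected value are up to the caller:

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def get_document(host, index, doc_type, doc_id):
    """GET /{index}/{type}/{id} and return the parsed JSON response."""
    path = "/".join(urllib.parse.quote(str(p)) for p in (index, doc_type, doc_id))
    try:
        with urllib.request.urlopen(f"{host}/{path}") as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # missing document: the body still says {"found": false, ...}
            return json.load(err)
        raise


def shows_update(get_response, field, expected):
    """True if the GET response found the document and field == expected."""
    return bool(get_response.get("found")) and \
        get_response.get("_source", {}).get(field) == expected
```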

@spinscale Thanks for the response. I already tried refresh=wait_for, with no luck. I am also making GET requests to measure how long updates take to become visible.

As I mentioned, we are using close to 216 fields, 7 of which are text fields holding large content (roughly 300 to 500 KB per field). I suspect the way Elasticsearch handles partial document updates, i.e., reading the full document, applying the changes and then re-indexing it.

With select requests running in parallel, these update operations take longer to complete. Please correct me if I am wrong.
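The read-merge-reindex behaviour described above can be modelled in a few lines of Python. This is a simplified model, not Elasticsearch's code: objects in the `doc` body are merged recursively into `_source`, any other value (including arrays) replaces the old one, and if nothing changes the built-in noop detection (`detect_noop`, on by default) skips the re-index. Otherwise the whole merged document, large text fields included, is indexed again.

```python
import copy


def apply_partial_update(source, partial):
    """Model of a partial `_update` with a `doc` body.

    Returns (new_source, changed). Dicts are merged recursively; strings,
    numbers and lists in `partial` overwrite the existing values.
    """
    merged = copy.deepcopy(source)  # leave the caller's document untouched

    def merge(target, patch):
        for key, value in patch.items():
            if isinstance(value, dict) and isinstance(target.get(key), dict):
                merge(target[key], value)
            else:
                target[key] = copy.deepcopy(value)

    merge(merged, partial)
    return merged, merged != source
```

Even a one-field change therefore produces a full re-index of the document, which is why large untouched text fields still add cost to every update.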

Yes, they take longer to execute, but by the time the server sends the update response back to the client, all of this work is done.

Are you waiting until you receive the response?

@spinscale For the update requests I get the response from the server immediately, but the changes I made to a document are not visible until the load returns to normal or there are no select requests. FYI, I am using a separate script to verify when the changes become visible to search.
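A verification script of that kind could look like the following Python sketch: after the update response arrives, poll a search for the document by `_id` until its `_source` shows the new value, and report the elapsed time. `search_fn` is a hypothetical callable that POSTs `ids_query(...)` to `/{index}/_search` and returns the parsed JSON; the clock and sleep functions are injectable so the logic can be tested without a cluster.

```python
import time


def ids_query(doc_id):
    """Search body matching a single document by _id."""
    return {"query": {"ids": {"values": [str(doc_id)]}}}


def hit_has_value(search_response, field, expected):
    """True if the first hit's _source has field == expected.

    Search hits reflect the index as of the last refresh (unlike GET).
    """
    hits = search_response.get("hits", {}).get("hits", [])
    return bool(hits) and hits[0].get("_source", {}).get(field) == expected


def time_until_visible(search_fn, field, expected, timeout=30.0, interval=0.1,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll search_fn() until the change is searchable.

    Returns the seconds elapsed, or None if the timeout is reached first.
    """
    start = clock()
    while clock() - start <= timeout:
        if hit_has_value(search_fn(), field, expected):
            return clock() - start
        sleep(interval)
    return None
```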

Does the same happen when you refresh manually after a couple of updates? Also, what is your refresh interval set to?
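Both checks can be scripted. The Python sketch below builds a manual `POST /{index}/_refresh` request and reads `index.refresh_interval` from a `GET /{index}/_settings` response. That setting is only listed there when it was set explicitly, so its absence means the 1s default applies. Index names passed in are up to the caller.

```python
import urllib.request


def refresh_request(host, index):
    """POST /{index}/_refresh: make all operations since the last refresh searchable."""
    return urllib.request.Request(f"{host}/{index}/_refresh", method="POST")


def effective_refresh_interval(settings_response, index):
    """Return index.refresh_interval from a GET /{index}/_settings response.

    Falls back to the default of "1s" when the setting was never set explicitly.
    """
    index_settings = settings_response[index]["settings"]["index"]
    return index_settings.get("refresh_interval", "1s")
```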

You can also check the node stats to find out how much time is spent on refreshes. Given the size of your documents, this could be significant.
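The refresh counters live under `indices.refresh` in the node stats (`total` and `total_time_in_millis`). A minimal Python sketch that fetches `GET /_nodes/stats/indices/refresh` and summarises them per node:

```python
import json
import urllib.request


def fetch_refresh_stats(host):
    """GET /_nodes/stats/indices/refresh and return the parsed JSON."""
    with urllib.request.urlopen(f"{host}/_nodes/stats/indices/refresh") as resp:
        return json.load(resp)


def refresh_summary(stats):
    """Per node name: (refresh count, total ms refreshing, average ms per refresh)."""
    summary = {}
    for node in stats["nodes"].values():
        refresh = node["indices"]["refresh"]
        total = refresh["total"]
        millis = refresh["total_time_in_millis"]
        summary[node["name"]] = (total, millis, millis / total if total else 0.0)
    return summary
```

The counters are cumulative since node start, so comparing two snapshots taken before and during the load test shows how refresh cost changes under the mixed workload.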

Still, when you request a refresh as part of your index operation, the document should be visible immediately. Does that happen when you index only a single document?
