JavaScript client returning inconsistent results

(Nwood888) #1

I like the new forum!

I have a 2 node cluster that I'm using as a primary data store for a web application. Think of a blog with comments.

When a comment is added in the web application, it gets sent to Elastic for indexing. Then to ensure it's available for query, the "save" method runs a search query every 100ms that only matches the updated record (it uses the 'date updated' field). As soon as the search query gets a hit, I assume the record is indexed and I return from the 'save' method.
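
For readers who want something concrete, the polling described above might look roughly like the sketch below. The names `waitUntilSearchable` and `searchFn` are made up for illustration and are not part of the Elasticsearch client. `searchFn` is assumed to wrap whatever search call the application makes and to resolve to the number of hits; the timeout is an added assumption so the loop cannot spin forever.

```javascript
// Hypothetical helper: polls `searchFn` until it reports at least one hit.
// `searchFn` is assumed to return a promise for the number of matching documents.
async function waitUntilSearchable(searchFn, { intervalMs = 100, timeoutMs = 5000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  while (Date.now() < deadline) {
    attempts += 1;
    const hitCount = await searchFn();
    if (hitCount > 0) {
      // The document was visible to this particular search request.
      return attempts;
    }
    // Wait before asking again (100 ms by default, as in the post).
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`document not searchable after ${timeoutMs} ms`);
}
```

A hit here only shows that the shard copy which served that one request can see the document; it says nothing about other copies.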

But I believe I've made an incorrect assumption. What I'm seeing is that as soon as the 'save' method returns, I issue a new search request to "refresh" the comment listing on the blog post, which should include the comment that was just added. 9 times out of 10, it includes the new comment, but occasionally it doesn't. If I wait an additional second and then refresh, the comment shows up.

Is this happening because the search inside the 'save' method is hitting one node and the search that follows it is hitting a different node?

I realize this isn't necessarily what Elastic was "built" for, but is there a recommended way to accomplish what I'm trying to do?


(Emptyemail) #2

There is a delay before data sent to Elasticsearch is available for searching. It is controlled by the index.refresh_interval setting (1 second by default).

However, for your use case I wouldn't recommend that approach. No matter how low you set the interval, it will never be real-time. And forcing an index refresh from your application is a bad idea.

We have a similar use case where we cache the data in Redis for 5 minutes, giving Elasticsearch enough time to refresh the index. When the client requests the data, we take the response from Elasticsearch and augment it with the data from Redis to provide a real-time result.
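
The augmenting step could be sketched like this. It is only an illustration, not the actual code from that setup: `mergeComments` and the `id` and `updatedAt` fields are hypothetical, and both inputs are assumed to be plain arrays already fetched from Elasticsearch and from the cache. When the same comment appears in both, the cached copy is assumed to be the fresher one.

```javascript
// Hypothetical merge of search results with recently written comments.
// esComments: comments returned by the search (may lag behind recent writes).
// recentComments: comments written in the last few minutes, read from the cache.
function mergeComments(esComments, recentComments) {
  const byId = new Map();
  for (const comment of esComments) {
    byId.set(comment.id, comment);
  }
  // Assumption: the cached copy wins because it reflects the latest write.
  for (const comment of recentComments) {
    byId.set(comment.id, comment);
  }
  // Oldest first, using a numeric `updatedAt` timestamp.
  return [...byId.values()].sort((a, b) => a.updatedAt - b.updatedAt);
}
```

With this shape, a comment that the search has not caught up with yet still appears in the listing, and it is not duplicated once the index refreshes.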

(Nwood888) #3

Thanks for the reply. Yes, that's why I run a 'search' query in a loop that only responds with a hit once the new record has actually been indexed (in theory). But then sometimes a subsequent 'search' doesn't include the new record. So it looks like this:

Index record
Search record - 0 hits
Search record - 0 hits
Search record - 0 hits
Search record - 1 hit
// At this point I assume the record is indexed and my 'save' method returns.

Then I run another search request and the record isn't included in the results.
Then I wait 1 second, search again, and the record is included.

So in theory, I've waited for the 'refresh' interval to take place. It seems though that each search request might be going to a different node and one node is getting refreshed sooner than the other. Does that sound right?


(Nwood888) #4

For others with a similar use case, I've confirmed that what I suspected in my last post is what's happening. When I check to see if the document has been indexed, I get a hit, but when the app does a subsequent search, it goes to a different node whose copy of the shard hasn't been refreshed yet.

To solve this, I have my "hasBeenIndexed" method use the "nodes info" API to get a list of all nodes, and I check each of them using the "preference" search parameter before moving on.
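
A rough sketch of that check, for anyone who wants a starting point. Several things here are assumptions rather than my exact code: the function name `hasBeenIndexedOnAllNodes` is made up, `client` is assumed to expose `nodes.info()` and `search()` and to resolve to the response body directly, and the `_only_nodes:<id>` preference value is the form used by newer Elasticsearch versions (older releases spell it differently, so check the docs for yours). It also assumes every node holds a copy of the index, as in a 2 node cluster with one replica.

```javascript
// Hypothetical check: true only if every node returns a hit for the query.
// `client` is assumed to look like the Elasticsearch JavaScript client and to
// resolve to the response body; `searchParams` is the query that matches only
// the updated record.
async function hasBeenIndexedOnAllNodes(client, searchParams) {
  const info = await client.nodes.info();
  // The nodes info response keys the `nodes` object by node ID.
  const nodeIds = Object.keys(info.nodes);
  for (const nodeId of nodeIds) {
    const response = await client.search({
      ...searchParams,
      // Assumption: preference syntax that restricts the search to one node.
      preference: `_only_nodes:${nodeId}`,
    });
    if (response.hits.hits.length === 0) {
      return false; // this node cannot see the document yet
    }
  }
  return nodeIds.length > 0;
}
```

The save method would keep calling this in its polling loop and return only once it resolves to true.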

For write-heavy use cases this may not be a good solution, but we have few writes compared to reads so this seems like it will work great.
