Realtime GETs not working

We are testing our ES cluster prior to going live and are having an issue
where realtime GETs are not returning realtime results. Using the test
logic below, we see that after the update completes, the "version" of the
document always comes back incremented by 1, but in ~1% of our tests the
"old" value is returned instead of the "new" value. However, if we wait
1 to 2 seconds longer after step 4 below, the "new" value is returned
correctly. We have verified that the "realtime" setting is set to "true",
but we are still having this issue.

Our test involves the following:

  1. Use the Get API and retrieve a document by an id
  2. Change a field in the returned document from an "old" to a "new" value.
  3. Use the Index API and update the document in the index.
  4. Wait 500 milliseconds after step 3 completes.
  5. Use the Get API and retrieve the same document again

Cluster Configuration:

  • 4 nodes
  • 4 shards with 1 replica
  • ~40K documents in the index
  • Default index refresh of 1 second.

Does anyone have an idea why this is happening?
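
For readers who want to reproduce the five steps above, here is a minimal sketch of one round trip. It is not the poster's code: `DocStore`, `VersionedDoc`, `InMemoryDocStore`, and `RealtimeGetCheck` are hypothetical names, and the in-memory store only stands in for a cluster so the sketch runs on its own. Against a real cluster, `get` and `index` would wrap blocking client calls.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the real client; not an Elasticsearch API. */
interface DocStore {
    VersionedDoc get(String id);
    long index(String id, Map<String, Object> source);
}

/** A document's source plus the version assigned by the store. */
final class VersionedDoc {
    final long version;
    final Map<String, Object> source;

    VersionedDoc(long version, Map<String, Object> source) {
        this.version = version;
        this.source = source;
    }
}

/** In-memory store (an assumption for illustration) so the sketch runs without a cluster. */
final class InMemoryDocStore implements DocStore {
    private final Map<String, VersionedDoc> docs = new ConcurrentHashMap<>();

    public VersionedDoc get(String id) {
        return docs.get(id);
    }

    public long index(String id, Map<String, Object> source) {
        // First write gets version 1; every later write bumps the version by 1.
        VersionedDoc updated = docs.merge(
            id,
            new VersionedDoc(1, new HashMap<>(source)),
            (oldDoc, newDoc) -> new VersionedDoc(oldDoc.version + 1, newDoc.source));
        return updated.version;
    }
}

final class RealtimeGetCheck {
    /** Runs steps 1-5 once; true if the second GET shows version + 1 AND the new value. */
    static boolean roundTrip(DocStore store, String id, String field,
                             Object newValue, long waitMillis) {
        VersionedDoc before = store.get(id);                         // step 1
        Map<String, Object> changed = new HashMap<>(before.source);
        changed.put(field, newValue);                                // step 2
        store.index(id, changed);                                    // step 3 (blocking)
        try {
            Thread.sleep(waitMillis);                                // step 4
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        VersionedDoc after = store.get(id);                          // step 5
        return after.version == before.version + 1
            && newValue.equals(after.source.get(field));
    }
}
```

Checking the version and the field value together matters here, because the reported symptom is that the version is incremented while the field still holds the old value.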

--

Note that Lucene does not promise real-time updates, but near-real-time.

The default refresh_interval is set to 1 second.

You can attempt to provide a lower value, but embracing near-real-time
would be an easier approach.

Cheers,

Ivan

In reply to the message from amos.wood (amos.wood@lifeway.com) of
Tue, Aug 14, 2012 at 12:28 PM.

--

--

On Tue, 2012-08-14 at 14:26 -0700, Ivan Brusic wrote:

> Note that Lucene does not promise real-time updates, but near-real-time.

But Elasticsearch does offer real-time GET (not search).

> The default refresh_interval is set to 1 second.

...for search.

> You can attempt to provide a lower value, but embracing near-real-time
> would be an easier approach.

If realtime GET isn't working, then it's a bug.

The question is: are you using document GET, or are you searching?

It's difficult to say without a gist recreating what you are doing.

clint

--

Thanks Ivan & Clinton,

I'm a colleague of Amos and as Amos pointed out he is using the Get API to
retrieve the record by id. I'm not sure I know what you would need gisted,
but we would be glad to provide something if it would make things clearer.
Just let us know what you would need.

I will also add that the process outlined is run under some light stress
in order to determine what our baseline throughput would be. And I'm pretty
sure we didn't see it until we added some stress. Amos, can you confirm
this?

Thanks
Wes


--

On Tue, Aug 14, 2012 at 3:36 PM, Clinton Gormley clint@traveljury.com wrote:

> On Tue, 2012-08-14 at 14:26 -0700, Ivan Brusic wrote:
>
>> Note that Lucene does not promise real-time updates, but near-real-time.
>
> But Elasticsearch does offer real-time GET (not search).

Mea culpa. I forgot about GET. I tend to forget about GET because it
doesn't support wildcard fields, which are nice if source is not
enabled (the reason is understandable).

Elasticsearch only has quorums on writes, correct?

--
Ivan

--

The documentation states that the options are "one", "quorum" and "all",
but our attempt at "all" never succeeds. We're not exactly sure why, since
all our shards are up and available, but the error indicates otherwise.

In reply to Ivan Brusic's message of Tuesday, August 14, 2012 6:24:52 PM
UTC-5.

--

Hi Wes

> I'm a colleague of Amos and as Amos pointed out he is using the Get
> API to retrieve the record by id. I'm not sure I know what you would
> need gisted, but we would be glad to provide something if it would
> make things clearer. Just let us know what you would need.

Well, code is clearer than a description. That way we're not making
assumptions about what you're doing, which may prove to be incorrect.
Also, if it's something reproducible, then it becomes an easy test for
whether there is a bug and when it is fixed.

> I will also add that this process outlined is done under some light
> stress in order to determine what our baseline throughput would be.
> And I'm pretty sure we didn't see it until we added some stress. Amos,
> can you confirm this?

The more moving parts you can remove from the test, the better. So if
you're using 4 nodes, check whether it still happens on 2. What does the
cluster health look like? And so on.

ta

clint


--

--

ok, we'll work on getting that done, thanks

In reply to Clinton Gormley's message of Wednesday, August 15, 2012
2:58:19 AM UTC-5.

--

--

We were able to get the write consistency of "all" to work in our testing,
which greatly reduced the errors, down to either none or very few on a
heavily loaded system. However, this outcome doesn't make sense to us: we
don't see why the write_consistency argument would relate to our error or
(mostly) resolve it. Does anyone have a comment as to why the
write_consistency option of "all" is helping?

--

Can you share the code that does the index and get? The Get API is fully realtime as of the point in time you issue it. What I have seen, for example, is users indexing a doc as an async event (in their code) and then doing a realtime get; obviously they have a chance of not "seeing" the latest index, since it's not in ES yet.

Regarding quorum: when you index a doc, it's replicated synchronously to all replicas. Write consistency only makes sure there are enough replicas to index to before the index operation is carried out.
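
The async-indexing scenario described above can be illustrated with a small, self-contained sketch. It does not use the Elasticsearch client: `AsyncIndexRace` and `SlowStore` are hypothetical names, and a latch stands in for whatever delay keeps the write from reaching the store before the read. It only shows why a read issued before the write has completed returns the old value.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

final class AsyncIndexRace {
    /** Hypothetical single-value "index"; the latch simulates the write still being in flight. */
    static final class SlowStore {
        private final AtomicReference<String> value = new AtomicReference<>("old");
        private final CountDownLatch delay = new CountDownLatch(1);

        /** Submits the write on another thread and returns immediately. */
        CompletableFuture<Void> indexAsync(String newValue) {
            return CompletableFuture.runAsync(() -> {
                try {
                    delay.await();               // the write has not reached the store yet
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                value.set(newValue);
            });
        }

        String get() {
            return value.get();
        }

        void releaseWrite() {
            delay.countDown();
        }
    }

    /** Returns { value read before the write completed, value read after waiting for it }. */
    static String[] demo() {
        SlowStore store = new SlowStore();
        CompletableFuture<Void> pending = store.indexAsync("new");
        String tooEarly = store.get();   // caller did not wait for the write
        store.releaseWrite();
        pending.join();                  // block until the write has completed
        String afterWait = store.get();
        return new String[] { tooEarly, afterWait };
    }
}
```

The point of the sketch is only the ordering: a get is realtime relative to writes that have already completed, so the caller has to block on the index response before issuing it.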


--

--

> Can you share the code that does the index and get? Get API is full
> realtime to the point in time you issued it. What I have seen for example
> is users indexing a doc as an async event (on their code), and then doing
> realtime get and obviously they have a chance of not "seeing" the latest
> index, since it's not in ES yet.

We wrap all activity to ES in another service class, but I can share what
our service class calls end up becoming in ES calls.

GET call

Client client = getClient(); // Using TransportClient
GetRequestBuilder requestBuilder = client.prepareGet();

requestBuilder.setIndex(index);
requestBuilder.setType(Elasticsearch.ALL_DOC_TYPES);
requestBuilder.setId(StringUtils.upperCase(id));

// actionGet() blocks until the GET response is available
ListenableActionFuture<GetResponse> resp = requestBuilder.execute();
response = resp.actionGet();

INDEX call

Client client = getClient(); // Using TransportClient
IndexRequestBuilder doc = client.prepareIndex();
doc.setIndex(index);
doc.setType(Elasticsearch.DEFAULT_DOC_TYPE);
doc.setId(StringUtils.upperCase(id));
doc.setConsistencyLevel(WriteConsistencyLevel.ALL);
doc.setReplicationType(ReplicationType.DEFAULT); // SYNC Default
doc.setSource(fields);

// actionGet() blocks until the index response is available
ListenableActionFuture<IndexResponse> lir = doc.execute();
IndexResponse ir = lir.actionGet();

P.S. Just as a reminder, we end up making a GET, INDEX, and GET again to
validate the real-time values that we changed in the INDEX operation. If
we immediately poll ES for that document, we see the updated version of the
document but not the updated values.

In reply to kimchy's message of Wednesday, August 22, 2012 7:45:44 AM
UTC-5.