Entries in ES getting dropped


(kgodbole) #1

Hi,

In the following piece of code, createNodeFromEvent() is a custom function that creates an XContentBuilder node object; it can be ignored.

doAccept()
{
    ...
    XContentBuilder node = createNodeFromEvent(context, event);
    String id = "" + System.currentTimeMillis();
    client.prepareIndex(index, type, id)
          .setSource(node)
          .execute()
          .actionGet();
    ...
}

The function doAccept() is invoked repeatedly from my application, and each invocation inserts a single entry into ES. However, some entries end up dropped (more specifically, deleted from ES). For example, if I call this function 1000 times (i.e. 1000 entries in total), only about 800 (the exact number varies from run to run) are finally present in ES.

However, adding a Thread.sleep(100) (basically introducing a delay before adding an entry) at the beginning of the function solves this issue.

doAccept()
{
    ...
    Thread.sleep(100);
    ...
    XContentBuilder node = createNodeFromEvent(context, event);
    String id = "" + System.currentTimeMillis();
    client.prepareIndex(index, type, id)
          .setSource(node)
          .execute()
          .actionGet();
    ...
}

Is there a timing-related issue? I tried with ES 0.16.3 and ES 0.16.5.

Thanks


(Shay Banon) #2

This should not happen assuming you have unique ids for the different
documents you index. Can you gist a test case that recreates it?


(James Cook) #3

I think the problem is right in the code.

You are using System.currentTimeMillis() to generate the id value.

ES is so fast at executing your index request that you are getting duplicate
ids. :^)

I say this slightly tongue in cheek because I would expect at least a milli
to tick off the clock before the id is generated again, especially since you
are using the actionGet() method which basically turns this into a
synchronous call.

-- jim
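
The collision described above can be reproduced without ES at all. The following is a minimal sketch, assuming a tight single-threaded loop; the class and method names are hypothetical, and the only thing taken from the original post is the way the id is built.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of the original application.
class TimestampIdCollisions {

    // Builds ids exactly as in the original post ("" + currentTimeMillis())
    // and returns how many of them are actually distinct.
    static int countDistinctTimestampIds(int calls) {
        Set<String> ids = new HashSet<>();
        for (int i = 0; i < calls; i++) {
            ids.add("" + System.currentTimeMillis());
        }
        return ids.size();
    }

    public static void main(String[] args) {
        int calls = 100_000;
        int distinct = countDistinctTimestampIds(calls);
        // Every id that is not distinct would overwrite an earlier document.
        System.out.println(calls + " calls produced " + distinct + " distinct ids");
    }
}
```

The gap between the number of calls and the number of distinct ids corresponds to the documents that would silently replace one another when indexed under the same id.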


(Ivan Brusic) #4

If the request is getting a duplicate id, then the version field
should have been incremented. One question I have always been meaning
to ask: is it possible to query based on the version field? For example,
a simple query such as "give me all documents with a version greater than
or equal to 2".

--
Ivan


(Shay Banon) #5

Ha! I missed that part :), good catch James. Regarding millisecond
resolution, ES can definitely index data faster than 1 millisecond
(depending on the complexity of the indexed doc), but also note that
System.currentTimeMillis() does not guarantee 1 millisecond resolution.
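
One possible way around this, sketched here under the assumption that all ids are generated inside a single JVM, is to append a counter to the timestamp so that two calls in the same millisecond still differ. The class name UniqueIds and the method nextId() are hypothetical. Elasticsearch can also generate an id itself when none is supplied, which may be the simpler option.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper; unique only within one JVM process.
class UniqueIds {

    // Monotonic counter shared by all threads in this JVM.
    private static final AtomicLong SEQUENCE = new AtomicLong();

    // The timestamp keeps ids roughly time-ordered; the counter makes them unique.
    static String nextId() {
        return System.currentTimeMillis() + "-" + SEQUENCE.incrementAndGet();
    }

    public static void main(String[] args) {
        Set<String> ids = new HashSet<>();
        int calls = 100_000;
        for (int i = 0; i < calls; i++) {
            ids.add(nextId());
        }
        System.out.println(calls + " calls produced " + ids.size() + " distinct ids");
    }
}
```

If several processes index into the same index, this scheme is not sufficient on its own; java.util.UUID.randomUUID().toString() would be a reasonable alternative in that case.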


(Shay Banon) #6

If one does not provide a version when indexing data, then the doc is
simply updated. A version check is performed on an index operation only when
a version is provided. If, on the other hand, a create was used (and not
index), then there would have been failures because some documents would
have already existed.
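
The difference between the two operations described in the paragraph above can be illustrated with a toy in-memory model. This is only a sketch of the semantics, not ES code: ToyIndex and its methods are hypothetical names, and the exception type is an arbitrary choice.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of "index" versus "create" semantics; not the ES API.
class ToyIndex {

    private static final class Doc {
        final String source;
        final long version;

        Doc(String source, long version) {
            this.source = source;
            this.version = version;
        }
    }

    private final Map<String, Doc> docs = new HashMap<>();

    // "index": silently replaces an existing doc and bumps its version.
    long index(String id, String source) {
        Doc old = docs.get(id);
        long version = (old == null) ? 1 : old.version + 1;
        docs.put(id, new Doc(source, version));
        return version;
    }

    // "create": fails if a doc with this id already exists.
    long create(String id, String source) {
        if (docs.containsKey(id)) {
            throw new IllegalStateException("document already exists: " + id);
        }
        docs.put(id, new Doc(source, 1));
        return 1;
    }

    int size() {
        return docs.size();
    }
}
```

In this model, calling index twice with the same id leaves one document at version 2, which matches the symptom in the original post: fewer documents than calls and no error. Calling create with a duplicate id raises an error instead, which would have exposed the duplicate ids immediately.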

There isn't an option to ask for all documents with a version greater than
some value. I guess that it's meaningful mainly when providing an external
version, in which case that value can simply also be stored as an additional
field in the doc. Though it could possibly be done automatically in ES.

