Bulk indexing and count mismatch

phoenix · June 14, 2012, 4:40pm

Hi all,

I'm facing a strange situation.
I'm indexing some documents using the bulk Java API.
My problem is in my tests. First, my test uses the bulk API to index 100
randomly created documents (with randomly generated strings). Then it
executes a count query to check that I now have 100 documents in my index.
But I actually have 0, or maybe one. And sometimes, while debugging, I
actually have 100 docs.
Is this a synchronization problem? I thought that the actionGet() method
waited for the job to be done before returning.

My code is as follows:

BulkResponse response = bulkRequest.execute().actionGet();

Any ideas?

Frederic

Ivan · June 14, 2012, 5:15pm

Are you indexing to a new id for each document? That might account for
seeing only one document, but zero documents is a different issue.

--
Ivan

phoenix · June 15, 2012, 7:16am

We are not giving any id to our documents. According to the API,
Elasticsearch is supposed to generate one by itself.
And actually, if we add a Thread.sleep(...) before asking for the document
count, we get the expected result. So it seems it just takes time.
But I was thinking (and reading the javadoc seems to confirm it) that the
execute().actionGet() call waited for the task to complete before
returning.
Is it different for bulk requests?
Is there any way to actually wait for the indexing to be done? (It's not
really important at runtime, but it is for testing purposes.)

Frederic

Ivan · June 15, 2012, 4:25pm

The synchronous call ensures that the operation is committed at the
server level, but there can still be delays at the Lucene level: newly
indexed documents only become visible to searches and counts after the
next index refresh. The default index refresh interval is 1 second. How
many BulkItemResponses do you have in your BulkResponse?
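
Because of that refresh delay, a test that sleeps for a fixed time is either slow or flaky. As a rough sketch (not part of the Elasticsearch API), a test could instead poll the count until it reaches the expected value or a timeout expires. The names `AwaitCount`, `awaitCount` and `fakeCount` are hypothetical, the example assumes Java 8 or later, and the supplier here is a stand-in for the real count query:

```java
import java.util.function.LongSupplier;

public class AwaitCount {

    /**
     * Polls countSource until it returns the expected value or the timeout
     * elapses. Returns the last observed count so the caller can assert on it.
     */
    public static long awaitCount(LongSupplier countSource, long expected,
                                  long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long seen = countSource.getAsLong();
        while (seen != expected && System.nanoTime() < deadline) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return seen;
            }
            seen = countSource.getAsLong();
        }
        return seen;
    }

    public static void main(String[] args) {
        // Stand-in for the real count query (an assumption for this sketch):
        // it reports 0 for the first 200 ms and 100 afterwards.
        final long start = System.nanoTime();
        LongSupplier fakeCount =
                () -> (System.nanoTime() - start) < 200_000_000L ? 0L : 100L;

        long seen = awaitCount(fakeCount, 100L, 5_000L, 50L);
        System.out.println("count = " + seen); // prints "count = 100"
    }
}
```

In a real test the supplier would wrap whatever count query the test already runs. Polling only bounds the wait; it does not force the documents to become visible.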

--
Ivan

Igor_Motov · June 15, 2012, 6:07pm

Frederic,

Just add explicit refresh before checking document count:

client.admin().indices().prepareRefresh().execute().actionGet();

This command will ensure that all indexed records are refreshed and
available in your searches.
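
To see why the count changes, here is a toy model of the behaviour. It is only an illustration and not Elasticsearch code: the class `RefreshVisibilityDemo` and its methods are hypothetical, and the two lists are a simplifying assumption standing in for "accepted but not yet searchable" and "searchable" documents.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: NOT how Elasticsearch is implemented. */
public class RefreshVisibilityDemo {

    // Documents that were accepted but are not yet visible to searches.
    private final List<String> buffered = new ArrayList<>();
    // Documents that searches and counts can see.
    private final List<String> searchable = new ArrayList<>();

    /** Accepts the documents and returns how many were stored. */
    public int bulkIndex(List<String> docs) {
        buffered.addAll(docs);
        return docs.size();
    }

    /** Makes everything accepted so far visible to count(). */
    public void refresh() {
        searchable.addAll(buffered);
        buffered.clear();
    }

    /** Counts only the documents that are currently searchable. */
    public long count() {
        return searchable.size();
    }

    public static void main(String[] args) {
        RefreshVisibilityDemo index = new RefreshVisibilityDemo();
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            docs.add("doc-" + i);
        }

        int accepted = index.bulkIndex(docs);
        // prints "accepted=100 count=0"
        System.out.println("accepted=" + accepted + " count=" + index.count());

        index.refresh();
        // prints "after refresh count=100"
        System.out.println("after refresh count=" + index.count());
    }
}
```

In the real test the order is the same: run the bulk request with execute().actionGet(), then the prepareRefresh() call above, and only then the count query.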

phoenix · June 16, 2012, 5:20am

@Ivan:
We have 1 or 100, depending on whether we sleep the thread first.

@Igor:
Thanks for the tip. We'll try refreshing first and let you know the result.

Frederic

phoenix · June 19, 2012, 7:46pm

Thanks Igor, it works perfectly!
