How to properly bulk index while defining a custom id?


(Shane Witbeck) #1

I've noticed different behavior when bulk indexing using a custom
document id versus not defining a document id. By not defining an id,
I get the desired behavior, which is that all documents are indexed. If I
attempt to define an id, only one document gets indexed as opposed to
all the documents defined in a bulk iteration.

How do you properly index all documents in a bulk request while
defining a custom document id?

Thanks,
Shane


(Shay Banon) #2

What is the failure that you get? You should also see it in the logs.

On Mon, Oct 24, 2011 at 7:57 PM, Shane Witbeck shane@digitalsanctum.com wrote:

I've noticed different behavior when bulk indexing using a custom
document id versus not defining a document id. By not defining an id,
I get the desired behavior, which is that all documents are indexed. If I
attempt to define an id, only one document gets indexed as opposed to
all the documents defined in a bulk iteration.

https://gist.github.com/1309649

How do you properly index all documents in a bulk request while
defining a custom document id?

Thanks,
Shane


(Shane Witbeck) #3

Thanks for the reply. I see no exceptions or errors in the logs. One
other thing I noticed is that the counts (in elasticsearch-head) are
something like:

docs: {
  num_docs: 3
  max_doc: 2196
  deleted_docs: 2193
}

which seems to indicate that all but one of the docs are getting
deleted for each bulk iteration.
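
These counters are also consistent with documents being overwritten rather than lost. In Lucene, max_doc is num_docs plus deleted_docs (3 + 2193 = 2196), and indexing under an id that already exists marks the old copy as deleted. The toy model below is plain Python, not Elasticsearch: `ToyIndex`, `simulate` and the ids are made up for illustration, and it ignores segment merges, which would eventually purge deleted docs. It shows only one scenario that produces these numbers, not a confirmed cause.

```python
class ToyIndex:
    """Toy stand-in for an index: counts live ids and overwritten copies."""

    def __init__(self):
        self.versions = {}      # id -> number of times that id was written
        self.deleted_docs = 0   # old copies superseded by a later write

    def index(self, doc_id):
        if doc_id in self.versions:
            # Overwrite: the previous copy is marked deleted.
            self.deleted_docs += 1
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1

    def stats(self):
        num_docs = len(self.versions)
        return {
            "num_docs": num_docs,
            "max_doc": num_docs + self.deleted_docs,
            "deleted_docs": self.deleted_docs,
        }


def simulate(total_writes, distinct_ids):
    """Write total_writes documents while cycling over distinct_ids ids."""
    idx = ToyIndex()
    for i in range(total_writes):
        idx.index("id-%d" % (i % distinct_ids))
    return idx.stats()
```

In this model, `simulate(2196, 3)` reproduces the three counters above: 2196 writes spread over only 3 distinct ids.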

Any additional guidance is appreciated.

Shane



(Clinton Gormley) #4

You don't provide the code for getPosts or getPostID, but I suspect
that you are reusing the same ID over and over again.
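
One way to test that suspicion before sending anything is to collect the ids and refuse to build the request if any id repeats. The sketch below is not the code from the gist: `build_bulk_body`, `find_duplicate_ids` and the `id_of` callback are hypothetical names, and it only assembles the newline-delimited body of a bulk request (one action line carrying `_id`, then one source line per document). The `_type` field reflects the API of that era and is not accepted by recent Elasticsearch versions.

```python
import json
from collections import Counter


def find_duplicate_ids(ids):
    """Return the ids that occur more than once, sorted."""
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n > 1)


def build_bulk_body(index, doc_type, docs, id_of):
    """Build a bulk request body, one action line plus one source line per doc.

    id_of is a caller-supplied function mapping a document to its custom id.
    """
    ids = [str(id_of(doc)) for doc in docs]
    dupes = find_duplicate_ids(ids)
    if dupes:
        # Repeated ids would overwrite each other instead of adding documents.
        raise ValueError("duplicate ids in bulk request: %r" % dupes)

    lines = []
    for doc_id, doc in zip(ids, docs):
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"
```

If the id function returns the same value for every post, the `ValueError` fires immediately instead of the documents silently replacing one another.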

Check the _version of the single doc that you manage to index. I bet it
is high (when it should be 1).
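
The same check can be done on the bulk response itself. A minimal sketch, assuming the response has already been parsed into a dict and that each entry under `items` reports `_id` and `_version` for its action (the exact response shape varies between Elasticsearch versions). `overwritten_items` is a hypothetical helper and the sample response is invented for illustration.

```python
def overwritten_items(bulk_response):
    """Return (id, version) pairs for items whose _version is above 1."""
    found = []
    for item in bulk_response.get("items", []):
        # Each item is keyed by its action name, e.g. "index" or "create".
        for result in item.values():
            version = result.get("_version")
            if version is not None and version > 1:
                found.append((result.get("_id"), version))
    return found


# Invented response, trimmed to the fields used here.
sample = {
    "items": [
        {"index": {"_index": "blog", "_id": "42", "_version": 732}},
        {"index": {"_index": "blog", "_id": "43", "_version": 1}},
    ]
}
```

Here `overwritten_items(sample)` flags only id "42", since a freshly created document starts at version 1.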

clint

