Bulk indexing size?

Hi, everyone.

Yet another day playing with that awesome product.
I have a question regarding bulk indexing.
Right now, we have three nodes, each with 22 GB of RAM devoted to ES.
Our docs are big, with between 300 and 500 fields (let's say an average of
400), several nested structures, and many analyzed strings.
We are storing the _source, but are not indexing the _all field.
We have a batch indexing job in Java, in which we use, of course, the bulk
API to improve performance.
Right now, our bulks, due to factors outside of our control, vary in
size between 2,000 and 5,000 of these docs per bulk.
Of course, the refresh_interval is disabled (-1).
Our performance lies somewhere between 2 and 4 minutes per bulk.
I have read a lot about indexing speeds of several thousand docs per second,
and we are pretty far from there.

So, is there something we are doing wrong? Are those times due to the
complexity of our docs?
Thanks in advance!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Are you fetching your docs from an external database?
Are you sure that in your process, the bulk is taking all the duration?

Can you share a bit of your code?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 7 May 2013 at 11:47, DH ciddp195@gmail.com wrote:



Hi, David, and thanks for your answer.

Well, the code is pretty straightforward.
It's a loop that reads a JSON doc from a NoSQL database, fiddles with it a
bit, and then adds it to a bulk.
After the loop, the bulk is executed - rinse and repeat.

The number of docs in the bulk varies because each doc has to be indexed
2 to 5 times, each time with altered fields, and into 2 different
indices.

Those 2,000-5,000 docs always come from 1,000 original docs.


So? Where is the cost?
When building the bulk?
When executing the bulk?

Can you share your numbers (time spent on building vs. executing)?

Maybe you are running out of memory on the client side?
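One way to get that build-vs-execute split from outside the Java code is to time a comparable bulk against the REST endpoint directly. A rough sketch, assuming the 0.90-era `_bulk` API on localhost; `bulk.json` is a placeholder file you would generate from a sample of your docs:

```shell
# The _bulk endpoint expects newline-delimited JSON: one action line
# followed by one source line per index operation.
# Timing the raw HTTP call isolates Elasticsearch's share of the work.
time curl -s -XPOST 'http://localhost:9200/_bulk' \
     --data-binary @bulk.json > /dev/null
```

If this round trip is much faster than the 2-4 minutes you see per batch, the time is going into building the requests on the client side, not into Elasticsearch itself.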

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 7 May 2013 at 13:36, DH ciddp195@gmail.com wrote:



Are you bulk indexing to a new index? If so, did you remove all replicas?
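For reference, replicas (like the refresh interval already mentioned) can be dropped for the duration of the load and restored afterwards. A sketch against the update-settings REST API, where `myindex` is a placeholder index name:

```shell
# Before the bulk load: no replicas, no periodic refresh.
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index": { "number_of_replicas": 0, "refresh_interval": "-1" }
}'

# ... run the bulk indexing job ...

# Afterwards: restore replication and the default refresh interval.
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index": { "number_of_replicas": 1, "refresh_interval": "1s" }
}'
```

With replicas enabled, every document is indexed again on each replica shard during the load; adding them back afterwards lets the copies be made by segment recovery instead.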

--
Ivan

On Tue, May 7, 2013 at 6:09 AM, David Pilato david@pilato.fr wrote:

