Bulk indexing more than 10 documents at a time causes deleted documents with the Python library

Hello

In my environment, I found that if I bulk index more than 10 documents at a
time, some documents end up deleted for some reason. When I changed it to 5
per request, no documents were deleted. Is this a limitation of the bulk API?

Details are written in the following gist.
https://gist.githubusercontent.com/yuecong/67237a5aae2dc2ea6f2d/raw/gistfile1.txt

Thanks,
Cong

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/adff9def-4ec9-478c-bb69-67c13c6d425b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

On Monday, February 23, 2015 at 11:00:12 AM UTC-8, Honza Král wrote:

This is definitely not a limitation of the bulk API, nor of the Python
library. Have you seen the Python bulk helper? It might solve some of
the issues for you.

http://elasticsearch-py.readthedocs.org/en/latest/helpers.html#elasticsearch.helpers.streaming_bulk
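
For readers who have not used the helper, here is a minimal sketch of how streaming_bulk could be wired up. It is not the code from the gist: the function names make_actions and index_docs are hypothetical, the index name "ats" is taken from the _cat/indices output later in this thread, and the document shape is an assumption. Older clusters (such as the 1.x releases current when this thread was written) also expect a "_type" key in each action.

```python
def make_actions(docs, index="ats"):
    """Yield one bulk action per document (hypothetical helper, not from the gist)."""
    for doc in docs:
        yield {"_index": index, "_source": doc}


def index_docs(es, docs, chunk_size=500, streaming_bulk=None):
    """Send docs in chunks; return (number indexed, list of failed items)."""
    if streaming_bulk is None:
        # Imported lazily so the action-building part works without the client installed.
        from elasticsearch.helpers import streaming_bulk
    indexed, failed = 0, []
    # streaming_bulk yields one (ok, item) tuple per action it sent.
    for ok, item in streaming_bulk(
        es, make_actions(docs), chunk_size=chunk_size, raise_on_error=False
    ):
        if ok:
            indexed += 1
        else:
            failed.append(item)  # keep the server's response for inspection
    return indexed, failed
```

With raise_on_error=False the helper reports failures per item instead of raising, so the caller can see exactly which documents were rejected.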

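Thanks for the information. I changed it to use the helper, but more documents are deleted now.

The modified source is here:

https://gist.github.com/yuecong/7a678517064667aa294f#file-gistfile1-py

From the results, when I set the number of documents per bulk request to 5, docs.deleted is 1655.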

cyue@cyue-OptiPlex-790:~/traffic_test/tools$ curl '10.0.0.158:9200/_cat/indices?v';
health status index   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   .kibana   1   1          2            2     10.5kb         10.5kb
green  open   ats       1   0      23351         1655      1.8mb          1.8mb

When I set the number of documents per bulk request to 500, docs.deleted is 7820.

cyue@cyue-OptiPlex-790:~/traffic_test/tools$ curl '10.0.0.158:9200/_cat/indices?v';
health status index   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   .kibana   1   1          2            2     10.5kb         10.5kb
green  open   ats       1   0      23180         7820      2.1mb          2.1mb

When I set it to 5000, docs.deleted is 7571.

Did we do something wrong?

thanks,
Cong

On Monday, February 23, 2015 at 12:55:54 PM UTC-8, Honza Král wrote:

At least in your code, you are not calling refresh, so the search
immediately following the bulk will not have all the documents
available.

Call es.indices.refresh() after the bulk call and all should be fine.
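
The advice above could look roughly like the following sketch. The wrapper name bulk_and_refresh and its bulk parameter are hypothetical and exist only to keep the example self-contained; helpers.bulk and es.indices.refresh are the calls named in this thread.

```python
def bulk_and_refresh(es, actions, index, bulk=None):
    """Run a bulk request, then refresh the index so searches see the new docs."""
    if bulk is None:
        # Imported lazily so the sketch can be read and tested without the client.
        from elasticsearch.helpers import bulk
    # With raise_on_error=False, helpers.bulk returns (success_count, errors).
    success, errors = bulk(es, actions, raise_on_error=False)
    # Without a refresh, a search issued right away may miss recent documents.
    es.indices.refresh(index=index)
    return success, errors
```

Refreshing once after the whole batch is usually enough for a test script; refreshing after every small request would add overhead without changing the final result.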

Thanks.
I added es.indices.refresh() just after helpers.bulk(). Now docs.deleted is
sometimes 0 when each bulk request contains 5000 documents, but at other
times some documents are still shown as docs.deleted.
Is there some way to make sure all the documents are indexed successfully?
I also tested waiting for some time, but some documents are still shown as
docs.deleted.

Please kindly advise.

Cong
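
One thing that may be worth checking, offered as a possibility rather than a diagnosis: in Elasticsearch, indexing a document whose _id already exists replaces the old version, and the replaced copy is counted in docs.deleted until the segments are merged. A non-zero docs.deleted therefore does not by itself mean documents were lost. Whether the script in the gist sets explicit _id values is an assumption here; the helper name repeated_ids and the sample actions are hypothetical.

```python
from collections import Counter


def repeated_ids(actions):
    """Return {_id: count} for every explicit _id that occurs more than once."""
    counts = Counter(a["_id"] for a in actions if "_id" in a)
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


# Made-up actions for illustration only.
actions = [
    {"_index": "ats", "_id": "a", "_source": {"v": 1}},
    {"_index": "ats", "_id": "b", "_source": {"v": 2}},
    {"_index": "ats", "_id": "a", "_source": {"v": 3}},  # would overwrite the first "a"
]
print(repeated_ids(actions))  # {'a': 2}
```

If this reports repeated ids, the deleted count is probably just the overwritten versions. If it reports none, the per-item errors returned by the bulk helper are the next place to look.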
