How to bulk index documents all at once?


(Daniel Guo) #1

I have some documents that all belong to the same index and type and
have the same fields, for example:
{"nation" : "China", "city" : "Tianjin", "year" : ["2011", "2012", "2013"]}
{"nation" : "USA", "city" : "California", "year" : ["2012", "2014", "2015"]}
{"nation" : "China", "city" : "Beijing", "year" : ["2012", "2014", "2015"]}

If I want to index them to my ES server all at once, I use the bulk
interface like this:
# curl -s -XPOST localhost:9200/_bulk --data-binary @data_file

The data_file looks like:
{"index" : {"_index" : "country", "_type" : "city"}}
{"nation" : "China", "city" : "Tianjin", "year" : ["2011", "2012", "2013"]}
{"nation" : "USA", "city" : "California", "year" : ["2012", "2014", "2015"]}
{"nation" : "China", "city" : "Beijing", "year" : ["2012", "2014", "2015"]}

But it only indexes the first document. If I change the data_file to the
following:
{"index" : {"_index" : "country", "_type" : "city"}}
{"nation" : "China", "city" : "Tianjin", "year" : ["2011", "2012", "2013"]}
{"index" : {"_index" : "country", "_type" : "city"}}
{"nation" : "USA", "city" : "California", "year" : ["2012", "2014", "2015"]}
{"index" : {"_index" : "country", "_type" : "city"}}
{"nation" : "China", "city" : "Beijing", "year" : ["2012", "2014", "2015"]}

It works, but the data_file becomes much bigger.

Is there a better way to import JSON documents into ES? Thanks very much.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/adf0a1f7-bfab-4065-9ce6-c49da1286500%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

I guess that the smallest data_file you can have is:
{"index" : { }}
{"nation" : "China", "city" : "Tianjin", "year" : ["2011", "2012", "2013"]}
{"index" : { }}
{"nation" : "USA", "city" : "California", "year" : ["2012", "2014", "2015"]}
{"index" : { }}
{"nation" : "China", "city" : "Beijing", "year" : ["2012", "2014", "2015"]}

curl -s -XPOST localhost:9200/country/city/_bulk --data-binary @data_file
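
If the documents start out as plain JSON objects, a file in this shape can be generated rather than written by hand. The following is only a minimal sketch in Python: the function name build_bulk_body is made up for illustration, and it assumes the index and type are supplied in the URL as in the curl command above, so each action line can stay empty.

```python
import json


def build_bulk_body(docs):
    """Return a bulk body with one empty index action line before each document.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # action line; index/type come from the URL
        lines.append(json.dumps(doc))            # source line; json.dumps keeps it on one line
    # The body must end with a newline after the last source line.
    return "\n".join(lines) + "\n"


docs = [
    {"nation": "China", "city": "Tianjin", "year": ["2011", "2012", "2013"]},
    {"nation": "USA", "city": "California", "year": ["2012", "2014", "2015"]},
    {"nation": "China", "city": "Beijing", "year": ["2012", "2014", "2015"]},
]

body = build_bulk_body(docs)
print(body, end="")
```

Saving the printed output as data_file should give something usable with the curl command above.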

HTH

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

On 27 November 2013 at 14:10:41, Daniel Guo (daniel5hbs@gmail.com) wrote:

Is there a better way to import JSON documents into ES? Thanks very much.


(Daniel Guo) #3

David, that is a great improvement. Thank you!
Are there other ways to load data into the ES server?

On Wednesday, November 27, 2013 10:29:35 PM UTC+8, David Pilato wrote:

I guess that the smallest data_file you can have is:

{"index" : { }}
{"nation" : "China", "city" : "Tianjin", "year" : ["2011", "2012", "2013"]}
{"index" : { }}
{"nation" : "USA", "city" : "California", "year" : ["2012", "2014", "2015"]}
{"index" : { }}
{"nation" : "China", "city" : "Beijing", "year" : ["2012", "2014", "2015"]}

curl -s -XPOST localhost:9200/country/city/_bulk --data-binary @data_file


(David Pilato) #4

I think Bulk is the best practice.
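
For larger data sets, one common approach is to split the documents into batches and send each batch as its own bulk request. The sketch below (Python, standard library only) shows that idea under some assumptions: the helper names iter_batches and make_bulk_request are made up, the sample documents and the batch size of 2 are arbitrary, and the Content-Type header is an assumption that may only matter on newer server versions. It builds the requests without sending them.

```python
import json
import urllib.request

# Endpoint taken from the curl command earlier in the thread.
BULK_URL = "http://localhost:9200/country/city/_bulk"


def iter_batches(docs, batch_size):
    """Yield successive lists of at most batch_size documents."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]


def make_bulk_request(docs, url=BULK_URL):
    """Build (but do not send) a POST request carrying one bulk body."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # empty action: index/type come from the URL
        lines.append(json.dumps(doc))
    body = ("\n".join(lines) + "\n").encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-ndjson"},  # assumption, see note above
        method="POST",
    )


# Made-up sample data, for illustration only.
docs = [{"nation": "China", "city": "city-%d" % i} for i in range(5)]

bulk_requests = [make_bulk_request(batch) for batch in iter_batches(docs, batch_size=2)]
for req in bulk_requests:
    print(req.get_method(), req.full_url, len(req.data), "bytes")
    # To actually send a batch: urllib.request.urlopen(req).read()
```

A suitable batch size depends on document size and cluster resources, so it is probably best found by experiment.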

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 28 Nov 2013 at 02:46, Daniel Guo (daniel5hbs@gmail.com) wrote:

David, that is a great improvement. Thank you!
Are there other ways to load data into the ES server?

