I am trying to bulk load a large number of documents (15 billion+) into an
ES cluster via the HTTP bulk API. Per the docs, I generate batches of
documents in the following format:
{action.....}
{document......}
{action.....}
{document......}
{action.....}
{document......}
{action.....}
{document......}
{action.....}
{document......}
{action.....}
{document......}
{action.....}
{document......}
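For illustration, a batch in this format could be assembled roughly as follows. This is only a minimal sketch: the helper name build_bulk_body and the index and type names are hypothetical, and it assumes the action metadata uses the "_index" and "_type" fields.

```python
import json


def build_bulk_body(docs, index_name, doc_type):
    """Return a newline-delimited bulk payload with one action line per document."""
    # Hypothetical helper: the same "index" action line is repeated before every document.
    action_line = json.dumps({"index": {"_index": index_name, "_type": doc_type}})
    lines = []
    for doc in docs:
        lines.append(action_line)
        lines.append(json.dumps(doc))
    # Every line, including the last one, is terminated by a newline.
    return "".join(line + "\n" for line in lines)


# Example with two made-up documents and made-up index/type names.
body = build_bulk_body([{"user": "a"}, {"user": "b"}], "myindex", "mytype")
```

The resulting string would be sent as the body of a POST to the _bulk endpoint.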
Is it possible to specify the {action.....} line once for an entire group
of {document......} lines, i.e. generate a payload that looks like this:
{action.....}
{document......}
{document......}
{document......}
{document......}
{document......}
{document......}
{document......}
{document......}
If this is not supported:
1 - Are there alternatives that would let me index my data without having
to repeat the same action, which in my case is "index"?
2 - I am currently using urllib2 from Python to insert the documents via
the HTTP API. Would pyes be better or more efficient?
3 - In addition to disabling or increasing index.refresh_interval, dropping
the replicas, and evenly distributing the posts across all nodes of the
cluster, are there any other optimizations I should consider to improve
performance?
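To make item 3 concrete, the settings and the distribution of posts could look roughly like this. It is a sketch under assumptions, not my exact setup: the node addresses, function names, and the restore values ("1s", one replica) are hypothetical, and the JSON bodies are meant for a PUT to the index's _settings endpoint.

```python
import itertools
import json

# Hypothetical node addresses; replace with the real cluster nodes.
NODES = ["http://node1:9200", "http://node2:9200", "http://node3:9200"]


def bulk_load_settings():
    # Settings body to apply before the load: refresh disabled, no replicas.
    return json.dumps({"index": {"refresh_interval": "-1", "number_of_replicas": 0}})


def restore_settings(refresh_interval="1s", replicas=1):
    # Settings body to apply after the load; the default values are assumptions.
    return json.dumps(
        {"index": {"refresh_interval": refresh_interval, "number_of_replicas": replicas}}
    )


def assign_batches(batches, nodes=NODES):
    # Pair each batch with a node's _bulk URL in round-robin order,
    # so that posts are spread evenly across the cluster.
    return [(node + "/_bulk", batch) for node, batch in zip(itertools.cycle(nodes), batches)]
```

Each (url, batch) pair from assign_batches would then be posted with whatever HTTP client is in use.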
-Adi