Just to be sure I understand -- are you suggesting the following syntax:
curl -XPOST 'http://localhost:9200/test/type1/_bulk?replication=async&timeout=5m' -d '
{ "index" : { "_id" : "i1", "version": 3, "version_type": "external" } }
{ "fields": "values etc." }
'
For my use case, version_type is always "external" for all documents in the
request. But I get the motivation for specifying it per-doc.
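
For a large upload, a request of this shape could also be generated rather
than written by hand. The following is only a minimal sketch under stated
assumptions: the helper name build_bulk_request and its argument layout are
hypothetical, and the URL, parameters, and field names are the ones from the
curl example above. It puts "replication" and "timeout" in the query string
once and keeps only "_id", "version", and "version_type" on each action line.

```python
import json
from urllib.parse import urlencode


def build_bulk_request(docs,
                       base_url="http://localhost:9200/test/type1/_bulk",
                       replication="async",
                       timeout="5m"):
    """Return (url, body) for a bulk index request.

    docs is an iterable of (doc_id, version, source) tuples.
    build_bulk_request is a hypothetical helper, not part of any client.
    """
    # Request-level parameters appear once, in the query string.
    query = urlencode({"replication": replication, "timeout": timeout})
    url = base_url + "?" + query

    lines = []
    for doc_id, version, source in docs:
        # Per-document metadata stays on the action line.
        action = {"index": {"_id": doc_id,
                            "version": version,
                            "version_type": "external"}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))

    # The bulk body is newline-delimited JSON and ends with a newline.
    body = "\n".join(lines) + "\n"
    return url, body


url, body = build_bulk_request([
    ("i1", 3, {"fields": "values etc."}),
    ("i2", 1, {"fields": "values etc."}),
])
print(url)
print(body, end="")
```

The body can then be saved to a file and posted with curl's --data-binary
option, which sends the newlines unchanged.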
You said that a per-doc timeout is ignored in bulk mode. In that case, the
Elasticsearch default timeout, i.e. [1m], should have been applied to my
original requests.
Do you know why a [0s] timeout was applied instead? This was from a
response:
{"index":"test","_type":"type1","_id":"123","status":503,"error":"UnavailableShardsException[[test][98]
[3] shardIt, [3] active : Timeout waiting for [0s], request:
org.elasticsearch.action.bulk.BulkShardRequest@36d185a1]"}
On Tuesday, July 29, 2014 12:21:21 AM UTC-7, Jörg Prante wrote:
You can use "version" and "version_type" per doc, of course.
The parameters "replication" and "timeout" per doc are ignored when using
bulk mode. They must be set at bulk request level.
Each bulk request is split and forwarded to the relevant shards. This
splitting is very fast: it searches for delimiters in the request chunk,
sorts the actions that belong to each shard, and forwards them as new
packets. For these packets, the bulk request level parameters "replication"
and "timeout" should work.
Although the request format looks heavy, it is most appropriate for
distributed processing.
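
To illustrate why a line-oriented format suits this kind of splitting, here
is a rough sketch of grouping the action/source pairs of a bulk body by
target shard. It is not Elasticsearch's actual code. The shard count, the
function names, and the CRC32 routing hash are all assumptions made for the
example, and it handles only "index" actions, where every action line is
followed by a source line.

```python
import json
import zlib
from collections import defaultdict

NUM_SHARDS = 5  # hypothetical shard count, for illustration only


def shard_for(doc_id, num_shards=NUM_SHARDS):
    # Stand-in routing hash (CRC32). Elasticsearch uses its own hash
    # function; this only demonstrates the idea of id -> shard.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def split_by_shard(bulk_body, num_shards=NUM_SHARDS):
    """Group (action, source) line pairs of an index-only bulk body by shard."""
    # Newlines are the delimiters; blank lines are skipped.
    lines = [ln for ln in bulk_body.split("\n") if ln.strip()]
    groups = defaultdict(list)
    # Index actions come in pairs: an action line, then a source line.
    for action_line, source_line in zip(lines[0::2], lines[1::2]):
        doc_id = json.loads(action_line)["index"]["_id"]
        # In this sketch only the short action line is parsed;
        # the source line is passed along untouched.
        shard = shard_for(doc_id, num_shards)
        groups[shard].append((action_line, source_line))
    return dict(groups)


body = (
    '{ "index" : { "_id" : "i1", "version": 3, "version_type": "external" } }\n'
    '{ "fields": "values etc." }\n'
    '{ "index" : { "_id" : "i2", "version": 1, "version_type": "external" } }\n'
    '{ "fields": "values etc." }\n'
)

for shard, pairs in sorted(split_by_shard(body).items()):
    ids = [json.loads(action)["index"]["_id"] for action, _ in pairs]
    print(shard, ids)
```

Each group corresponds to one per-shard packet, which is where the request
level "replication" and "timeout" settings would then apply.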
Jörg
On Tue, Jul 29, 2014 at 1:02 AM, Ashish Mishra <laughin...@gmail.com> wrote:
I'm uploading documents using syntax like the following.
curl -XPOST 'http://localhost:9200/test/type1/_bulk' -d '
{ "index" : { "_id" : "i1", "version": 3, "version_type": "external",
"replication": "async", "timeout": "5m" } }
{ "fields": "values etc." }
{ "index" : { "_id" : "i2", "version": 1, "version_type": "external",
"replication": "async", "timeout": "5m" } }
{ "fields": "values etc." }
'
A couple of questions: First, there's a fair bit of redundancy in the
action line. It feels wasteful when sending tens of MB / thousands of
requests per API call.
Can I roll default version_type / replication / timeout parameters into
the top-level _bulk URL? I've seen a few resolved issues suggesting this,
but it's not mentioned in the documentation.
Second, in the response I occasionally see errors like
{"index":"test","_type":"type1","_id":"123","status":503,"error":"UnavailableShardsException[[test][98]
[3] shardIt, [3] active : Timeout waiting for [0s], request:
org.elasticsearch.action.bulk.BulkShardRequest@36d185a1]"}
The "[0s]" part is surprising. The available-shard-timeout is 1m by
default, and I explicitly requested 5m. Does this get overridden somewhere?
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/ba4bde17-2668-42c4-9d14-0923571044d5%40googlegroups.com
For more options, visit https://groups.google.com/d/optout.