I can definitely see the point of using the bulk API when indexing via
HTTP.
But is there an advantage of using bulk instead of individual index requests
when using the client node? Since the node parses the bulk and routes each
request to its proper destination (and it's basically doing the same when
you submit individual requests), what is the benefit of doing a bulk
request in this case?
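For context, a bulk request body is newline-delimited JSON: each index operation is an action/metadata line followed by the document source line, and the body must end with a newline. The sketch below builds such a body in Python. The index name "logs", the document fields and the helper name build_bulk_body are hypothetical, and the exact metadata keys depend on the Elasticsearch version (1.x-era clusters, for example, also expect a "_type" key).

```python
import json


def build_bulk_body(index, docs):
    """Build an NDJSON bulk body: one action/metadata line plus one source line per doc.

    `docs` is a list of (doc_id, source_dict) pairs. Hypothetical helper;
    older clusters may additionally need "_type" in the metadata.
    """
    lines = []
    for doc_id, source in docs:
        # Action/metadata line: what to do and where (index + id).
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body("logs", [("1", {"msg": "hello"}), ("2", {"msg": "world"})])
    print(body, end="")
```

The same two documents sent individually would be two separate requests, each carrying its own metadata in the URL; here they travel in one request.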
Ah, thanks, that's helpful to know!
But if that's the case, why does the node need to parse the
document for an individual request but not for a bulk? It could also
stream the doc to the right shard based on the metadata (id and
routing) without parsing the doc, just as it does with the bulk API,
no?
On 11/10/2014 8:53, David Pilato wrote:
Definitely!
Try with and without and you will see the difference.
The node does not parse the full docs, only the headers, and
streams your docs to the right shards.
I noticed a huge difference between the two myself.
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 10 Nov 2014 at 07:34, Rotem <rotem.hermon@gmail.com> wrote:
I can definitely see the point of using the
bulk API when indexing via HTTP.
But is there an advantage of using bulk instead of
individual index requests when using the client node? Since
the node parses the bulk and routes each request to its
proper destination (and it's basically doing the same when
you submit individual requests), what is the benefit of
doing a bulk request in this case?
--
You received this message because you are subscribed to the
Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from
it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/28d54f90-f6b8-449a-806f-e873600dfdd5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to a topic in
the Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/elasticsearch/rnusTTvNTfg/unsubscribe.
To unsubscribe from this group and all its topics, send an email
to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1C44D026-16C2-42BD-A531-5B79DF49D91B%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.
The node does not parse the whole bulk, only part of it (the metadata
lines used for hashing and routing).
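To make "only the metadata lines" concrete, here is a rough Python model of what a coordinating node does with a bulk body (alternating action/metadata and source lines): it decodes each metadata line to get the id and routing value, picks a shard, and keeps the following source line as raw bytes without parsing it. This is an illustration, not Elasticsearch code. Elasticsearch computes the shard as hash(routing) modulo the number of primary shards, with routing defaulting to the id, but its hash function is not crc32; zlib.crc32 is only a stand-in here. The shard count of 5, the "_routing" key name (it differs between versions) and the helper names are assumptions, and every action is assumed to be "index".

```python
import json
import zlib
from collections import defaultdict

NUM_PRIMARY_SHARDS = 5  # assumption: an index with 5 primary shards


def shard_for(routing, num_shards=NUM_PRIMARY_SHARDS):
    # Stand-in hash: Elasticsearch uses its own routing hash, not crc32.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def split_bulk_by_shard(body):
    """Group bulk items by target shard, decoding only the metadata lines.

    Assumes every action is "index", i.e. always followed by a source line.
    """
    lines = body.encode("utf-8").split(b"\n")
    per_shard = defaultdict(list)
    i = 0
    while i + 1 < len(lines):
        meta_line, source_line = lines[i], lines[i + 1]
        meta = json.loads(meta_line)["index"]  # only this line is decoded
        routing = meta.get("_routing", meta["_id"])  # routing defaults to the id
        # The source line stays as raw bytes and is forwarded untouched.
        per_shard[shard_for(routing)].append((meta_line, source_line))
        i += 2
    return dict(per_shard)
```

Each per-shard group could then be shipped to the node holding that shard as a single message, which is what keeps the number of round trips low.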
The benefit of bulk requests is easy to see at the network layer.
Assume 1000 docs:
Without bulk, you send one request per doc and wait for a response to
each: the client must put 1000 packets on the wire and the server must
send 1000 responses back. In addition, each doc goes through an internal
shard-level send/receive cycle, which adds another 1000 sends and 1000
receives. That makes around 4000 packets on the wire (worst case: the
node you are connected to does not hold a shard of the index), with all
the associated delays.
With bulk, the client submits 1 request, the server sends a sub-request
to each node that holds a shard of the index and gets a reply from each,
then submits 1 response back. That makes around 1 + (2 * n) + 1 packets,
where n is the number of nodes. With 3 nodes, you have 8 packets instead
of 4000.
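The arithmetic above can be written down as a tiny cost model. This Python sketch only reproduces the rough counts from this message (4 packets per doc without bulk in the worst case, 1 + 2n + 1 per bulk request); it ignores TCP segmentation, acknowledgements and replicas, so treat the numbers as orders of magnitude rather than measurements. The third function, which splits the docs into several bulk requests of a fixed size, is an extrapolation of the same model, not something stated in the thread.

```python
import math


def packets_without_bulk(num_docs):
    """Worst case: the node you talk to holds no shard of the index.

    Per doc: client request + client response + internal send + internal receive.
    """
    return 4 * num_docs


def packets_with_bulk(num_nodes):
    """One bulk request in, one sub-request and one reply per node, one response out."""
    return 1 + 2 * num_nodes + 1


def packets_with_batches(num_docs, batch_size, num_nodes):
    """Extrapolation: docs sent as several bulk requests of `batch_size` each."""
    return math.ceil(num_docs / batch_size) * packets_with_bulk(num_nodes)


if __name__ == "__main__":
    print(packets_without_bulk(1000))          # 4000
    print(packets_with_bulk(3))                # 8
    print(packets_with_batches(1000, 100, 3))  # 80
```

Even with batches of 100, the model gives 80 packets instead of 4000, which is why batching pays off regardless of the exact batch size.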
The same holds for both the HTTP and the transport protocol; HTTP is only
used for accepting client requests.