Hi,
The master branch has been enhanced so that all writes are now concurrent, based on the number of shards (not nodes)
available for the target index. You don't need to specify multiple hosts - discovery happens automatically at runtime
and it works with all frameworks - M&R, Cascading, Pig and Hive.
The concurrent/parallel writes occur for both map-only and map-reduce jobs, as the map or reduce tasks are
'spread/partitioned' across the shards of the target index.
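To make the 'spread/partitioned' idea concrete, here is a minimal sketch in Python. It is not the actual
elasticsearch-hadoop code: it assumes a simple round-robin rule (task id modulo number of shards), and the names
shard_for_task and spread_tasks are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def shard_for_task(task_id: int, num_shards: int) -> int:
    """Pick a target shard for a task.

    Assumption: a plain round-robin (modulo) rule; the real connector
    may choose shards differently.
    """
    if num_shards <= 0:
        raise ValueError("the target index must have at least one shard")
    return task_id % num_shards


def spread_tasks(num_tasks: int, num_shards: int) -> Dict[int, List[int]]:
    """Group task ids by the shard each one would write to."""
    assignment: Dict[int, List[int]] = defaultdict(list)
    for task_id in range(num_tasks):
        assignment[shard_for_task(task_id, num_shards)].append(task_id)
    return dict(assignment)


if __name__ == "__main__":
    # 8 map (or reduce) tasks writing to an index with 5 shards
    for shard, tasks in sorted(spread_tasks(8, 5).items()):
        print(f"shard {shard}: tasks {tasks}")
```

Under this assumed rule, a job with a single task only ever reaches one shard, so the number of tasks matters as
much as the number of shards.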
A note on performance (until the docs are updated) - in most cases, the jobs only need mappers (no reducers). Pig
and Hive are smart enough to figure this out and eliminate the reduce phase; M&R isn't, so consider disabling the reduce
phase yourself (by setting the number of reduce tasks to 0) for improved performance.
This means that the write parallelism is driven by the number of input splits for your job - that is, if your input is
not splittable (for example, the source is a single compressed file), you will still end up with only one task writing
to ES. As you have a lot of small files, this shouldn't be the case.
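As a rough illustration of why input splits drive the write parallelism, the Python sketch below estimates how many
tasks a map-only job would get. It is a simplification, not Hadoop's real split logic: it assumes one split per
non-splittable file and ceil(size / split size) splits otherwise. The function name estimate_splits, the ".gz"
suffix check, the 64 MB split size and the file names are all assumptions made for the example.

```python
import math
from typing import Iterable, Tuple

MB = 1024 * 1024

# Assumption for illustration: files with these suffixes cannot be split
# (gzip is the usual example of a non-splittable format).
NON_SPLITTABLE_SUFFIXES = (".gz",)


def estimate_splits(files: Iterable[Tuple[str, int]], split_size: int) -> int:
    """Roughly estimate the number of input splits (one map task each)."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    total = 0
    for name, size in files:
        if name.endswith(NON_SPLITTABLE_SUFFIXES):
            total += 1  # the whole file goes to a single task
        else:
            total += max(1, math.ceil(size / split_size))
    return total


if __name__ == "__main__":
    one_big_gzip = [("all-docs.gz", 20 * 1024 * MB)]
    many_small = [(f"part-{i:05d}", 20 * MB) for i in range(1000)]
    print("single compressed file:", estimate_splits(one_big_gzip, 64 * MB))
    print("many small files:", estimate_splits(many_small, 64 * MB))
```

With these made-up inputs, the single compressed file yields one writing task, while the small files yield one task
each, which is the situation described above.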
I've pushed the nightly build (elasticsearch-hadoop-1.3.0.BUILD-20131102.205934-207) to Maven Central, so the snapshot
should be downloaded automatically by any Maven-compatible tool.
Let me know how it goes.
Cheers!
On 31/10/2013 2:41 PM, Niels Basjes wrote:
Hi,
On Thursday, 31 October 2013 10:29:29 UTC+1, Costin Leau wrote:
> That's because the node receives all the write traffic at the moment.
I thought as much
> There are plans in the current release cycle to:
> a. add multiple nodes (mainly for discovery purposes)
> b. do concurrent writes on the target nodes
Any indication when this is planned to become available?
> Do you use certain shard/partitioning settings or just the defaults?
Nothing special, just getting started.
> Also, how big is your data set in Pig?
129M very small documents (approx 20GB in size). Main goal is a simple graph in kibana.
Niels
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
--
Costin