Hi,
Apologies if this is a double post. I also asked this question in reply
to another question, but I'm posting it here because I think it's actually
a new topic:
https://groups.google.com/forum/?fromgroups#!starred/elasticsearch/l-YJGJAGJLQ
I'm wondering whether the parallel scan feature mentioned in that document
is currently under development.
We have an index with 20 shards and approximately 125 million docs, using
default routing.
We run quite a lot of large scans to process customer data, and we're
looking for a way to speed them up, since each scan currently runs
sequentially. A parallel scan would be great in this situation.
Another idea was to implement this ourselves by running multiple workers
(one per shard) and enforcing that each worker only hits one shard. Each
worker would run the same scan, but against its own shard only, so combined
we should get the same results, just faster because the workers run in
parallel (and, since each query only hits one shard, each individual query
may be faster too?).
If there were a way to determine a routing value for each worker that
forces it to the correct shard, I think this would be possible just by
including routing=[some_key] with the query.
But it would mean working out some inverse of the default routing algorithm
to find a routing value that maps to each shard, and I'm not sure how
simple that is.
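
To make the idea concrete, here is a rough sketch of a brute-force
"inverse": try candidate routing strings until one has been found for every
shard. Please treat it as an illustration only. The function names are made
up, and djb2_hash is a stand-in that I'm assuming resembles the default
routing hash; whether it matches what a given Elasticsearch version really
does (or how the shard number is derived from the hash) would need checking.

```python
from typing import Callable, Dict


def djb2_hash(value: str) -> int:
    """Stand-in DJB2 string hash, wrapped to a signed 32-bit integer.

    ASSUMPTION: this is only a placeholder for whatever hash the cluster
    actually uses for routing; swap in the real one before relying on it.
    """
    h = 5381
    for ch in value:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as signed.
    return h - 0x100000000 if h >= 0x80000000 else h


def find_routing_keys(
    num_shards: int,
    hash_fn: Callable[[str], int] = djb2_hash,
    max_candidates: int = 100_000,
) -> Dict[int, str]:
    """Brute-force one routing key per shard (hypothetical helper).

    ASSUMPTION: shard = hash(routing) mod num_shards, non-negative.
    """
    keys: Dict[int, str] = {}
    for i in range(max_candidates):
        candidate = str(i)
        # Python's % is already non-negative for a positive modulus.
        shard = hash_fn(candidate) % num_shards
        keys.setdefault(shard, candidate)  # keep the first key found
        if len(keys) == num_shards:
            return keys
    raise RuntimeError("could not find a routing key for every shard")


if __name__ == "__main__":
    # 20 shards, as in our index; each worker would get one of these keys.
    for shard, key in sorted(find_routing_keys(20).items()):
        print(f"shard {shard}: routing={key}")
```

Each worker would then pass its own key as the routing value on the scan
request. The mapping would have to be recomputed if the number of shards
changed, and verified against the real cluster before trusting it.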
Does this sound like a feasible solution? Or does anyone have suggestions
for other ways to achieve this?
If I could get some pointers on how to approach it, I'll have a look at
implementing something.
Thanks in advance for any assistance.
Bob.