"users" data flow question

james_lewis · January 24, 2013, 1:12pm

Hi guys,

I'm currently working on a "users" data flow configuration of elasticsearch
and I've got a quick question (the users data flow is referenced in
these talks:
http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-searchengine-berlinbuzzwords.html
and
https://speakerdeck.com/kimchy/elasticsearch-big-data-search-analytics
).

Let's say I have 1 index split over 5 shards, with 1 replica of each
shard, all distributed across 2 nodes.

So in this example node 1 looks like:

index shard 1
index shard 2
index shard 3
replica index shard 4
replica index shard 5

And node 2 looks like this:

index shard 4
index shard 5
replica index shard 1
replica index shard 2
replica index shard 3

As per the "users" data flow I will be routing all index requests for a
particular user to 1 shard. In this example, let's say that "user 1" is
routed to "index shard 1" so all user 1's documents are stored on shard 1.
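
For illustration, a minimal sketch of what such a routed index request
could look like at the REST level. The `routing` query parameter is the
one the index API accepts; the `_doc` path segment is the form used by
recent Elasticsearch versions (older versions put a mapping type there),
and the index name, document id and helper function are hypothetical.

```python
from urllib.parse import quote, urlencode


def index_request_path(index: str, doc_id: str, routing: str) -> str:
    """Build the path and query string of an index request with custom routing.

    Hypothetical helper: it only formats the URL, it does not talk to a cluster.
    """
    path = f"/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"
    # Every request for the same user carries the same routing value,
    # so all of that user's documents end up on the same shard.
    return path + "?" + urlencode({"routing": routing})


if __name__ == "__main__":
    print(index_request_path("users", "doc-42", "user1"))
    # prints /users/_doc/doc-42?routing=user1
```

The same routing value would also need to be passed on get and search
requests for that user so they go to the shard holding the documents.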

My two questions are:

1. what happens if node 1 goes down?

I understand that user 1 will still be able to read the contents of his
data store - I'm assuming all read requests for user 1 will be directed to
"replica index shard 1". But what will happen to any new documents that
user 1 wants to add? The replica is read only (as I understand it) so will
all new index requests fail or will they be added to another shard? If
they're added to another shard then will the routing for user 1 now point
to 2 shards?

2. what happens if there's not enough space on 1 particular shard for a
new user but there is enough across 2 shards?

This is purely hypothetical; I'm asking out of interest. I understand
I'd have bigger problems if I came across this situation in production
and that I should be over-allocating shards in this scenario anyway. As
I said, just curious.

If I haven't been clear on anything, let me know and I'll update.

Thanks!

James

--

drewr · January 24, 2013, 4:14pm

james.lewis@7digital.com wrote:

Let's say I have 1 index split over 5 shards with 1 replica of that index
all distributed across 2 nodes.

[...]

1. what happens if node 1 goes down?

I understand that user 1 will still be able to read the contents of
his data store - I'm assuming all read requests for user 1 will be
directed to "replica index shard 1". But what will happen to any
new documents that user 1 wants to add? The replica is read only
(as I understand it) so will all new index requests fail or will
they be added to another shard? If they're added to another shard
then will the routing for user 1 now point to 2 shards?

When a primary goes away, one of its replicas is promoted to
primary. So you're right that a replica is read-only, but it
doesn't have to remain a replica forever.
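
A toy model of that promotion, using the two-node layout from the
question. This is only a sketch of the idea, not how Elasticsearch
implements it; the `ShardCopy` class and both functions are made up for
the illustration.

```python
from dataclasses import dataclass


@dataclass
class ShardCopy:
    shard: int      # shard number, 1..5
    node: str       # node holding this copy
    primary: bool   # True for the primary, False for a replica


def initial_layout() -> list:
    """Layout from the question: primaries 1-3 on node1, primaries 4-5 on node2."""
    copies = []
    for shard in (1, 2, 3):
        copies.append(ShardCopy(shard, "node1", True))
        copies.append(ShardCopy(shard, "node2", False))
    for shard in (4, 5):
        copies.append(ShardCopy(shard, "node2", True))
        copies.append(ShardCopy(shard, "node1", False))
    return copies


def fail_node(copies: list, dead_node: str) -> list:
    """Drop all copies on dead_node and promote a surviving replica where needed."""
    survivors = [c for c in copies if c.node != dead_node]
    has_primary = {c.shard for c in survivors if c.primary}
    for copy in survivors:
        if copy.shard not in has_primary:
            copy.primary = True          # replica becomes the new primary
            has_primary.add(copy.shard)
    return survivors


if __name__ == "__main__":
    for copy in sorted(fail_node(initial_layout(), "node1"), key=lambda c: c.shard):
        print(copy.shard, copy.node, "primary" if copy.primary else "replica")
```

After node 1 fails in this model, node 2 holds a primary for all five
shards, so user 1's routing still points at shard 1 and new documents
go to the promoted copy.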

2. what happens if there's not enough space on 1 particular shard
for a new user but there is enough across 2 shards?

Once ES targets data for a shard, whether by hashing your routing
value or the doc id, that's where it's going. If there isn't disk
space available, the Lucene index (just for that shard) will likely
be corrupted.
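
To make the targeting concrete, here is a rough sketch of hash-based
shard selection. It assumes the general "hash the routing value, take
it modulo the number of primary shards" scheme; `zlib.crc32` is a
stand-in and not the hash function Elasticsearch actually uses, and the
function names are hypothetical.

```python
import zlib
from typing import Optional


def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Map a routing value to a shard number in [0, num_primary_shards)."""
    # Stand-in hash for illustration only; the real hash function differs.
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def shard_for(doc_id: str, routing: Optional[str] = None,
              num_primary_shards: int = 5) -> int:
    """Use the explicit routing value if given, otherwise fall back to the doc id."""
    key = routing if routing is not None else doc_id
    return pick_shard(key, num_primary_shards)


if __name__ == "__main__":
    # Different documents, same routing value: always the same shard.
    print(shard_for("doc-1", routing="user1") == shard_for("doc-2", routing="user1"))
    # prints True
```

Nothing in the calculation looks at free disk space, which is why a
full shard does not cause documents to spill over to another one.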

In the relatively near future, we will support disk calculations for
more intelligent shard allocation, but that won't help the indexing
issue. The latter is quite a tricky one to solve, mostly because ES
and Lucene both try to touch the disk as seldom as possible.

-Drew

--

james_lewis · January 25, 2013, 12:01pm

Cool - thanks for clearing that up Drew!

