Overloaded shard


(Atharva Patel) #1

Through the API we can set a custom routing value to index a document
into a particular shard. I am curious to know what happens when we route
a large number of documents to the same routing value. Does that single
shard become overloaded at some point? Will the node that holds the shard
report a disk-full error, or will ES automatically split the shard across
multiple machines, even if the index API was called with a custom routing
value, say "xyz"? How does ES handle excessive routing to the same value?

--


(Rafał Kuć) #2

Hello!

ElasticSearch won't split a shard; you can't do that in ES right now. So if you end up running low on disk space, or with a single overloaded shard, there is nothing you can do about that shard. You may try a different approach: route not by a single parameter, but by several. For example, if you route by customer name and a few customers have a large number of documents, you can also add routing based on the document id, and this way try to distribute the documents more evenly.
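
To make the idea concrete, here is a minimal sketch of why one routing value always lands on one shard and how a composite value spreads the load. It assumes the shard is picked as hash(routing) modulo the number of primary shards. The hash used here (zlib.crc32) is only a stand-in for whatever ES uses internally, and NUM_SHARDS, composite_routing and the bucket count are hypothetical names and values chosen for illustration.

```python
import zlib
from collections import Counter

NUM_SHARDS = 5  # assumed number of primary shards in the index


def shard_for(routing: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from a routing value (stand-in hash, not ES's own)."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def composite_routing(customer: str, doc_id: int, buckets: int = 4) -> str:
    """Split one customer into a fixed number of sub-routes by document id."""
    return f"{customer}-{doc_id % buckets}"


# Routing only by customer: every document hits the same shard.
single = Counter(shard_for("bigcustomer") for _ in range(1000))

# Routing by customer plus a bucket derived from the document id:
# the documents are spread over at most `buckets` routing values.
spread = Counter(
    shard_for(composite_routing("bigcustomer", doc_id))
    for doc_id in range(1000)
)

print("customer only:", dict(single))
print("customer + id bucket:", dict(spread))
```

The trade-off is on the search side: a query for that customer then has to be sent with every sub-route value (or without routing at all), instead of a single routing value.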

--

Regards,

Rafał Kuć

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch


--


(Atharva Patel) #3

Hey, thanks for the quick reply!

I was wondering how to give more than one value so that, as you said, we
can route on multiple parameters. I looked into the mapping section, at
the _routing field, where they give the following example:

path

The routing value can be provided as an external value when indexing (and
still stored as part of the document, in much the same way _source is
stored). But, it can also be automatically extracted from the index doc
based on a path. For example, having the following mapping:

    {
        "comment" : {
            "_routing" : {
                "required" : true,
                "path" : "blog.post_id"
            }
        }
    }

Will cause the following doc to be routed based on the 111222 value:

    {
        "text" : "the comment text",
        "blog" : {
            "post_id" : "111222"
        }
    }

Note, using path without explicit routing value provided requires an
additional (though quite fast) parsing phase.
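
As a rough illustration of what "extracted from the doc based on a path" means, the sketch below walks a dotted path through a parsed document. This is only a model of the idea, not ES's implementation; extract_routing is a hypothetical helper name.

```python
from typing import Any, Optional


def extract_routing(doc: dict, path: str) -> Optional[str]:
    """Follow a dotted path such as "blog.post_id" through nested dicts.

    Returns the value as a string, or None if any step of the path is missing.
    """
    node: Any = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return str(node)


doc = {
    "text": "the comment text",
    "blog": {"post_id": "111222"},
}

print(extract_routing(doc, "blog.post_id"))  # prints 111222
```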

So can you explain to me how to achieve multi-parameter routing by
extending this example?

On Monday, 3 September 2012 16:41:26 UTC+5:30, Rafał Kuć wrote:

Hello!

ElasticSearch won't split a shard; you can't do that in ES right now. So
if you end up running low on disk space, or with a single overloaded
shard, there is nothing you can do about that shard. You may try a
different approach: route not by a single parameter, but by several. For
example, if you route by customer name and a few customers have a large
number of documents, you can also add routing based on the document id,
and this way try to distribute the documents more evenly.


--


(Rafał Kuć) #4

Hello!

I don't know whether you can specify multiple fields in the _routing mapping (I haven't tried that), but you can explicitly pass the routing parameter yourself, with the field values separated by a comma.
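
A minimal sketch of that client-side approach: build the routing string from several fields of the document and pass it as the routing query parameter of the index request. The host, index name, type name and field names are made up for illustration, and the code only builds the URL without sending anything. It also assumes the whole string is treated as a single routing value at index time, which is worth verifying, because a comma-separated routing on a search request is read as several routing values. A different separator may be the safer choice.

```python
from typing import Sequence
from urllib.parse import quote, urlencode


def routing_value(doc: dict, fields: Sequence[str], sep: str = ",") -> str:
    """Join the chosen field values into one routing string."""
    return sep.join(str(doc[field]) for field in fields)


def index_url(base: str, index: str, doc_type: str, doc_id, routing: str) -> str:
    """Build an index URL of the form /index/type/id?routing=..."""
    path = "/".join(quote(str(part), safe="") for part in (index, doc_type, doc_id))
    return f"{base}/{path}?{urlencode({'routing': routing})}"


# Hypothetical document and names, for illustration only.
doc = {"customer": "acme", "id": 42, "text": "hello"}

r = routing_value(doc, ["customer", "id"])
url = index_url("http://localhost:9200", "docs", "doc", doc["id"], r)

print(r)    # acme,42
print(url)  # http://localhost:9200/docs/doc/42?routing=acme%2C42
```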

--



--

--

