Tiering storage / Curator


(Patrick Proniewski) #1

Hello,

Curator makes it possible to migrate an index to another storage programmatically, and that's very nice for keeping old indices on cheap storage. But if I understand correctly, a single ES cluster cannot handle two different kinds of storage. Hence, having small but fast storage for recent files and cheap but slow storage for old files requires building two clusters.
Am I right?

thanks,
Patrick

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/487D9125-FC9B-43F9-B714-9C4EA2556A47%40patpro.net.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

Nope, you can use allocation awareness to have indexes on different
machines -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-allocation.html
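
As a rough illustration of the allocation settings on that page, here is a minimal sketch. It assumes each node is started with a custom attribute (a hypothetical `node.tag` set to `fast` or `slow`) and that an index is pinned to one group with the index-level setting `index.routing.allocation.require.<attribute>`. The index name, attribute name and values are made up for the example, and the exact setting names may differ between ES versions. The snippet only builds the request and sends nothing.

```python
import json


def allocation_request(index, attribute, value):
    """Build (method, path, body) for an index settings update that requires
    the shards of `index` to sit on nodes whose custom attribute
    `attribute` equals `value`."""
    path = "/{}/_settings".format(index)
    # Setting name follows the shard allocation filtering documentation;
    # the attribute itself ("tag") is an assumption for this example.
    body = {"index.routing.allocation.require.{}".format(attribute): value}
    return "PUT", path, json.dumps(body, sort_keys=True)


# Hypothetical example: pin an old daily index to the nodes tagged "slow".
method, path, body = allocation_request("logstash-2014.06.01", "tag", "slow")
print(method, path)
print(body)
```

Sending that body to the cluster (with curl or any HTTP client) should make the shards of that index relocate to the matching nodes, provided such nodes exist.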

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


(Patrick Proniewski) #3

Ok, so if I understand correctly I can have a single cluster with:

  • machine A: fast storage (recent data)
  • machine B & C: slow storage (old data)

In that case, I cannot have a homogeneous cluster with both fast and slow storage on each node and I'm losing the benefit of having multiple machines when I index new data and when I search recent data. Is that correct?

Regards,
Patrick


(Mark Walkom) #4

You cannot have multiple data.paths on a single node/instance. You could
try running multiple instances of ES on a single physical machine, each
pointing to one of your tiered pools.
But you aren't losing the benefit of multiple nodes, just the optimal use
of your storage on those physical nodes.

You could look at something like L2ARC or similar though.

Regards,
Mark Walkom


(Patrick Proniewski) #5

It seems I can have multiple path.data on a single node, but it does not allow for storage tiering:

Can optionally include more than one location, causing data to be striped across the locations (a la RAID 0) on a file level, favouring locations with most free space on creation. For example:

path.data: /path/to/data1,/path/to/data2
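
To show why this behaviour is not tiering, here is a toy model of the "most free space" rule quoted above. It is only a sketch of the documented description, not Elasticsearch's actual code. The free-space figures and file sizes are invented, and the paths are the ones from the example line.

```python
def pick_location(free_bytes):
    """Return the data path with the most free space.
    `free_bytes` maps each path to its free space in bytes."""
    return max(free_bytes, key=free_bytes.get)


GIB = 1024 ** 3
# Hypothetical state: a small fast disk and a large slow disk.
free = {"/path/to/data1": 50 * GIB, "/path/to/data2": 400 * GIB}

placements = []
for size in [10 * GIB] * 3:  # three new files of 10 GiB each
    target = pick_location(free)
    free[target] -= size  # the chosen location fills up a little
    placements.append(target)

print(placements)
```

In this model every new file lands on the emptier disk, whatever the age of the data, so recent indices cannot be steered to the fast location this way.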

ES seems to be quite close to being able to provide storage tiering... Maybe in 1.4? ;)


(Mark Walkom) #6

There you go, I didn't know it did that!

Regards,
Mark Walkom


(Otis Gospodnetić) #7

Hi,

On Tuesday, July 15, 2014 1:20:39 AM UTC-4, Patrick Proniewski wrote:

Hello,

Curator makes it possible to migrate an index to another storage
programmatically, and that's very nice for keeping old indices on cheap
storage. But if I understand correctly, a single ES cluster cannot handle
two different kinds of storage. Hence, having small but fast storage for
recent files and cheap but slow storage for old files requires building two
clusters.
Am I right?

Not necessarily. We used the tiered storage approach in Logsene
http://sematext.com/logsene/, for example, but we explicitly move older
indexes from the more expensive nodes that deal with fresh data to cheaper
nodes that host all data. It's automated, but it's not 100% done within
ES. But it's done with a single ES cluster.
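
The post does not say how the move is automated, so the following is only one possible sketch and not Logsene's implementation. It assumes daily indices named `logstash-YYYY.MM.DD`, nodes carrying a hypothetical custom attribute `tag` with values `fast` and `slow`, and the index setting `index.routing.allocation.require.tag` to pin an index to one group. The helper name, the age threshold and the index list are all illustrative.

```python
from datetime import date, timedelta


def indices_to_move(index_names, today, max_age_days, prefix="logstash-"):
    """Return the daily indices older than `max_age_days`, judged by the
    date embedded in their name. Names that do not match are ignored."""
    cutoff = today - timedelta(days=max_age_days)
    old = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            year, month, day = (int(p) for p in name[len(prefix):].split("."))
            index_day = date(year, month, day)
        except ValueError:
            continue  # suffix is not a valid YYYY.MM.DD date
        if index_day < cutoff:
            old.append(name)
    return sorted(old)


# Hypothetical index list; in practice it would come from the cluster.
names = ["logstash-2014.07.01", "logstash-2014.07.14",
         "logstash-2014.07.15", "kibana-int"]
old = indices_to_move(names, today=date(2014, 7, 15), max_age_days=7)

# One settings update per old index; "tag" and "slow" are assumptions.
requests = [("PUT", "/{}/_settings".format(name),
             {"index.routing.allocation.require.tag": "slow"})
            for name in old]
for request in requests:
    print(request)
```

Run from a scheduler once a day, something like this would keep fresh indices on the fast nodes and send the rest to the cheap ones, all within a single cluster.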

Otis

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

