1 PiB with one year data retention policy

Hello, I'm a newcomer to Elasticsearch and I have a few conceptual
questions. I hope someone can help me.

  1. I know Elasticsearch is basically a search engine, and that it is often
    thought of as a time series data store with a limited data retention
    policy. I read this in a thread on this list:

"Set expiration dates on the documents you are indexing in ElasticSearch:
This is generally around 90 to 120 days in certain production environments
and 15 to 30 days in development and lower environments."

My question is: can I use ES as a time series data store with a one-year
data retention policy? The stored data would probably be about 1 PiB of logs.

  2. Is there any site similar to https://wiki.apache.org/hadoop/PoweredBy
    with real use cases of ES (with usage numbers, number of servers, and so
    forth)?

Many thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4a555464-492f-4faa-88b8-a12dff6bac19%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Firstly, ES is not a time series data store. It can definitely be leveraged
as one, but it does a lot, lot more!

1 - If that is referring to document-level TTLs, then you may want to
avoid them for 1 PB of data. TTLs can be resource intensive, and at that
scale you might find them very difficult to manage. Instead, it is better to
use time-based indexes, e.g. daily as per the ELK stack; then you can use
Elasticsearch Curator to handle retention.
2 - No, not at this stage unfortunately.
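
To make the time-based index idea in point 1 concrete, here is a minimal Python sketch of the selection step a retention job performs. It assumes daily indexes named in the Logstash style (logstash-YYYY.MM.dd) and a 365-day retention window; the function name expired_indices and its parameters are made up for illustration, and this is not Curator's actual code. Deleting the selected indexes (with Curator or the delete index API) is left out.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, retention_days=365, prefix="logstash-"):
    """Return the daily indexes whose date is older than the retention window.

    Hypothetical helper: assumes index names look like 'logstash-2014.12.22'.
    Names that do not match the prefix or the date pattern are ignored.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of our time-based indexes
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = ["logstash-2013.12.21", "logstash-2013.12.22", "logstash-2014.12.22"]
    for name in expired_indices(names, today=date(2014, 12, 22)):
        print("would delete", name)
```

Dropping a whole index removes a day of data in one operation, which is why this approach tends to be much cheaper than expiring documents one by one with TTLs.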

On 22 December 2014 at 07:43, Javier Roman jroman.espinar@gmail.com wrote:
