How about Elasticsearch for a heavy-size load?

Hi group,

I will be starting some testing with Elasticsearch on a really heavy
amount of data (at least, heavy for me for now).
I will need a cluster handling about 5000 GB of text for searching.
Would it be better to have several smaller clusters, organized according
to my search needs?
Should I handle scheduled indexing in some way?
I need searches to finish in an acceptable time for an end user
performing queries, but I don't expect a heavy query load.
Still, I'm worried about the size of that data: it will be millions of
small entries. I guess each entry will be less than 1 KB, with an average
around 200 bytes.
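(Rough math: 5000 GB at ~200 bytes per entry would be about
5 × 10^12 / 200 ≈ 25 billion entries.)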
I would be very grateful for any comments on experiences like this.

Thanks in advance,

Hernán

--

Hello Hernán,

Plain indexing is not a problem as such. The challenge is making
queries performant over your index. And advice here will differ
depending on your queries: will you be using facets? Will your queries
only hit a subset of all data (such as period-based queries)? How
often will you be indexing new documents? In batches?
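For example, if most queries only touch one period, a date filter narrows
the work before any scoring happens. A minimal sketch, assuming a recent
Elasticsearch and its official Python client (the index name, field names,
and dates are made-up placeholders):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Full-text match, restricted to one month via a range filter,
    # so only a subset of the data is candidate for scoring.
    resp = es.search(
        index="entries",  # hypothetical index name
        body={
            "query": {
                "bool": {
                    "must": {"match": {"text": "some words"}},
                    "filter": {
                        "range": {
                            "timestamp": {"gte": "2012-09-01", "lt": "2012-10-01"}
                        }
                    },
                }
            }
        },
    )
    print(resp["hits"]["total"])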

Depending on that, you can optimize the structure of your indexes, the
number of shards, refresh interval, etc. See kimchy's recent talk about
different scenarios.
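To make those knobs concrete, here is a sketch of creating an index and
doing a batch load with the official Python client; the shard count and
refresh intervals are illustrative guesses, not recommendations:

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch("http://localhost:9200")

    # Shard count is fixed at index creation, so plan it first.
    es.indices.create(
        index="entries",  # hypothetical index name
        body={
            "settings": {
                "number_of_shards": 10,    # size this from your own tests
                "number_of_replicas": 1,
                "refresh_interval": "30s"  # relax near-real-time refresh
            }
        },
    )

    # For big batches: disable refresh, bulk-index, then restore it.
    es.indices.put_settings(index="entries",
                            body={"index": {"refresh_interval": "-1"}})
    docs = ({"_index": "entries",
             "_source": {"text": "...", "timestamp": "2012-10-04"}}
            for _ in range(1000))
    helpers.bulk(es, docs)
    es.indices.put_settings(index="entries",
                            body={"index": {"refresh_interval": "30s"}})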

Also be aware of compression options.
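Which compression options exist depends on your version; as one example
(an assumption on my side, check the docs for your release), newer
releases let you trade CPU for disk via the stored-fields codec:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Smaller stored fields on disk, at some CPU cost at index/fetch time.
    es.indices.create(
        index="entries_compressed",  # hypothetical index name
        body={"settings": {"index": {"codec": "best_compression"}}},
    )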

Best,
Radim

On Oct 4, 6:05 am, Hernán Leoni leoni.her...@gmail.com wrote:

--

Hi Hernán,

It reeeeally depends. :)

5 TB is not small, but it's doable, depending on a number of factors:
hardware, ES configuration, query complexity, query concurrency, query
latency requirements, etc.
Unfortunately, nobody can give you precise advice without knowing a lot
more about the above... you'll want to look at sharding, oversharding,
replication, cache sizes, compression, routing and filtering, and so on.
Again, I can't give exact guidance or answers without knowing a lot more.
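To give a taste of the routing idea: if each query targets a single
source's data, routing by that key hits one shard instead of all of them.
A sketch with the official Python client (the field and key names are
invented for illustration):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Route all of one source's entries to the same shard at index time.
    es.index(
        index="entries",  # hypothetical index name
        body={"source": "feed-42", "text": "a small entry"},
        routing="feed-42",
    )

    # Query with the same routing key: only that one shard is searched,
    # instead of fanning out across the whole cluster.
    resp = es.search(
        index="entries",
        routing="feed-42",
        body={"query": {"match": {"text": "entry"}}},
    )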

Otis

Search Analytics & Performance Monitoring | Sematext

On Thursday, October 4, 2012 12:05:40 AM UTC-4, Hernán Leoni wrote:

--