Splitting index into smaller ones

Hello,
Currently I have one index per day, which contains logs from several applications. Its size is ~50-80 GB/day.
We often search and aggregate documents by application.
So would it be better to split this index into smaller indices (from 1 index to 10-15 indices of about 2-10 GB each)?
Would the response time improve a lot (considering that I can cache 64 GB in memory)?
And what would be the penalty for cross-application searches?

Another concern: it seems that Elasticsearch doesn't like having many indices.
With splitting, I could end up with 400 indices per month.
Is that too much?

Jean

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d7c5971a-b32b-48e9-b8aa-b6f5e0afd7b9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

The most important figure is the total number of shards per node.
Having too many shards on a node can lead to problems (file descriptors, memory...). So at some point, if you need to manage more shards, you should think about adding more nodes.

1 index with 5 shards is exactly the same as 5 indices with 1 shard each.
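
To put rough numbers on the shards-per-node point, here is a minimal back-of-the-envelope sketch. The helper `shards_per_node`, the 30-day retention, the 3 nodes and the replica counts are all assumptions made for illustration; only the 400 indices per month comes from the question.

```python
import math

def shards_per_node(indices, primaries_per_index, replicas, nodes):
    """Return (total shard copies, shard copies per node rounded up)."""
    total = indices * primaries_per_index * (1 + replicas)
    return total, math.ceil(total / nodes)

# One daily index with 5 primaries vs. 5 per-application daily indices with
# 1 primary each, both kept for 30 days on 3 nodes without replicas.
single = shards_per_node(indices=30, primaries_per_index=5, replicas=0, nodes=3)
split = shards_per_node(indices=30 * 5, primaries_per_index=1, replicas=0, nodes=3)
print(single, split)  # (150, 50) (150, 50)

# The 400 indices per month from the question, assuming 1 primary and 1 replica each.
monthly = shards_per_node(indices=400, primaries_per_index=1, replicas=1, nodes=3)
print(monthly)  # (800, 267)
```

Both layouts in the first comparison put the same number of shards on each node, which is why the shard count matters more than the index count.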

Hope this helps.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


If you have 50-80 GB/day, you need quite a number of machines. Smaller indices and a higher shard count on the same number of machines do not help; search performance will be worse.

ES is fine with many indices. Use big indices that span many machines, and once an index is complete, run an optimize on it for faster searches.
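
As a rough sketch of what "execute an optimize" can look like for a finished daily index: the snippet below only builds the URL of the call and sends nothing. The helper `optimize_url`, the host and the index name `logs-2014.12.26` are assumptions for illustration. The `_optimize` endpoint and its `max_num_segments` parameter are those of the Elasticsearch 1.x releases current at the time of this thread; later releases renamed the endpoint to `_forcemerge`.

```python
from urllib.parse import urlencode

def optimize_url(base_url, index, max_num_segments=1):
    """Build the URL of an optimize call that merges an index into few segments."""
    query = urlencode({"max_num_segments": max_num_segments})
    return f"{base_url.rstrip('/')}/{index}/_optimize?{query}"

# "logs-2014.12.26" is a hypothetical name for yesterday's, now complete, index.
url = optimize_url("http://localhost:9200", "logs-2014.12.26")
print(url)  # http://localhost:9200/logs-2014.12.26/_optimize?max_num_segments=1
```

The resulting URL would be sent as a POST request, and only once nothing writes to that index any more.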

Jörg


Well, thank you.
Based on the answers, this is what I understand: put everything in one big index with one shard per server.
When the shards get too big, add another server.

Coming from the DBMS world, this is "strange" to me.
For example, in MySQL we create one table per application, so the tables are faster to scan and query, and so are their indexes (full-text or not).
That way, tables that are never accessed are not cached in memory.
If I put everything in one big table, the indexes are bigger and more expensive to query.
Admin operations take longer and impact all the logging applications.
So I thought that by isolating independent data into multiple indices, the local queries would be faster.

If I have 3 servers with 64 GB of memory each, 1 shard per day per server with 1 month of history, and the majority of queries hitting the last 7 days:
Must the last 7 shards on each server be fully cached in memory for a good response time?
Even if in reality I query only 10% of the data from the last 7 days, and rarely the rest?


Some answers inlined.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 27 Dec 2014 at 15:17, lagarutte via elasticsearch elasticsearch@googlegroups.com wrote:

> Well, thank you.
> Based on the answers, this is what I understand: put everything in one big index with one shard per server.
> When the shards get too big, add another server.

If you have one shard per node, adding a new node will have no effect...

If you create more shards, so that you have more than one shard per node, adding new nodes will help.
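
A small sketch of why that is, assuming shards are spread as evenly as possible and ignoring replicas. The helper `shards_on_each_node` is hypothetical and does not model Elasticsearch's real allocation logic; it only does the arithmetic.

```python
def shards_on_each_node(shards, nodes):
    """Spread `shards` as evenly as possible over `nodes` (replicas ignored)."""
    base, extra = divmod(shards, nodes)
    return [base + 1 if i < extra else base for i in range(nodes)]

print(shards_on_each_node(3, 3))  # [1, 1, 1]
print(shards_on_each_node(3, 4))  # [1, 1, 1, 0]  the new node holds nothing for this index
print(shards_on_each_node(6, 3))  # [2, 2, 2]
print(shards_on_each_node(6, 4))  # [2, 2, 1, 1]  the new node takes over part of the load
```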

> If I have 3 servers with 64 GB of memory each, 1 shard per day per server with 1 month of history, and the majority of queries hitting the last 7 days:
> Must the last 7 shards on each server be fully cached in memory for a good response time?

Actually, Elasticsearch doesn't cache the full data, only things like filter bitsets and fielddata. The OS cache will efficiently cache the Lucene files.

> Even if in reality I query only 10% of the data from the last 7 days, and rarely the rest?

If you don't query all the shards (only the last day's index/shards, for example), you might end up using less memory.
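
One way to query only some of the shards with daily indices is to name just the recent indices in the search request. Here is a minimal sketch, assuming a hypothetical `logs-YYYY.MM.DD` naming scheme (the thread does not say how the daily indices are named); the helper `recent_indices` is made up for illustration.

```python
from datetime import date, timedelta

def recent_indices(today, days, prefix="logs-"):
    """Names of the daily indices covering the last `days` days, newest first."""
    return [
        f"{prefix}{(today - timedelta(days=n)).strftime('%Y.%m.%d')}"
        for n in range(days)
    ]

names = recent_indices(date(2014, 12, 27), 7)
print(names[0], names[-1])  # logs-2014.12.27 logs-2014.12.21

# Elasticsearch accepts a comma-separated list of index names in the search path.
search_path = "/" + ",".join(names) + "/_search"
```

A search sent to `search_path` touches the shards of those 7 indices only, so the older indices need not be kept warm for it.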

HTH.
