It's really up to the number of machines you plan to have and how the load
(in this case, shards) will be distributed among them. Each shard is a
Lucene index, which comes with its own overhead (though I reduced it in
0.9). For time-based indexing, it makes sense to use different indices (which
give you another dimension for scaling out, since you can't change the number
of shards per index).
One option you can implement is to use (for example) week-based indices
for the past 4 weeks, and then month-based indices for older data. This
requires reindexing a week that has passed the one-month marker into the
new index that represents that month.
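To make that rollover scheme concrete, here is a small sketch of the routing logic (the index-naming convention and the helper name are my own illustration, not something from this thread): entries newer than the four-week horizon go to a week-based index, older ones to a month-based index.

```python
from datetime import date, timedelta

def index_for(log_date, today, weekly_horizon_weeks=4):
    """Pick the index a log entry should live in: a week-based
    index while it is recent, a month-based index once it has
    passed the horizon (and would be reindexed there)."""
    age = today - log_date
    if age <= timedelta(weeks=weekly_horizon_weeks):
        year, week, _ = log_date.isocalendar()
        return "logs-%04d-w%02d" % (year, week)
    return "logs-%04d-%02d" % (log_date.year, log_date.month)

# A recent entry lands in a weekly index, an old one in a monthly index.
print(index_for(date(2010, 7, 12), today=date(2010, 7, 12)))
print(index_for(date(2010, 5, 1), today=date(2010, 7, 12)))
```

The reindexing step itself would be client-side in this era of ES (scan the old weekly index and bulk-index into the monthly one); the sketch above only decides the target index name.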
I would say hundreds of indices per log is a bit much on a small cluster of
nodes (< 5), so either increase the time span, use filters, or do something
similar to what I suggested above. In any case, the amount of data per index
you can store in ES is quite large (but it requires testing on your end to
see if it fits what you're trying to do).
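On the "use filters" option: instead of one index per day, you can keep coarser (say, monthly) indices and restrict each query to a date window with a range filter. A sketch of building such a query body follows; the field name `timestamp` is my assumption, and the exact query DSL shape varies between ES versions, so treat this as illustrative.

```python
import json

def logs_in_window(start, end):
    """Build a query body that restricts results to a date window,
    so a coarse monthly index can still serve day-level queries."""
    return {
        "query": {"match_all": {}},
        "filter": {
            "range": {
                "timestamp": {"from": start, "to": end}
            }
        },
    }

body = logs_in_window("2010-07-01", "2010-07-12")
print(json.dumps(body, indent=2))
```

The trade-off is the one discussed above: fewer indices means less per-index overhead, at the cost of filtering inside larger indices.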
On Mon, Jul 12, 2010 at 10:29 PM, Berkay Mollamustafaoglu <email@example.com> wrote:
Can you shed some more light on the inner workings, and the potential
overhead due to a high number of indices?
There seem to be some pros to using many indices, but it sounds like the
benefits need to be balanced against the associated overhead.
For example, we're planning to shard (log data) by date. This means that when
an index is updated, it's not updating a very large index, and indices for
older days can be better cached. However, this would mean hundreds of indices
per log (as time goes on), hence I'm trying to understand whether we need to
consider sharding by week/month instead, etc.
mberkay on yahoo, google and skype
On Mon, Jul 12, 2010 at 2:08 PM, Shay Banon <firstname.lastname@example.org> wrote:
Actually, the 100 shards per node is just an arbitrary limit and, to be
honest, not properly designed in terms of configuration (let me see what I
can do about that for 0.9). Second, creating so many indices / shards is
going to be pretty heavy on a single node. In 0.9, that overhead is going to
be lower, but still... And yes, you can certainly start more than one node
on a single machine.
On Mon, Jul 12, 2010 at 8:40 PM, Kenneth Loafman wrote:
Thanks! One more learning hurdle down.
Is there a list somewhere of all the configuration items that I could
adjust? I need to increase the number of indices that I can build on a
node. I'm getting 500 errors every time I add a new index or mapping.
As I understand it, given the limit of 100 shards per node, and the
default of 5 shards per index, each node can support only 20 indices.
Am I missing something?
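The arithmetic here is just the per-node shard cap divided by the shards each index allocates, which also shows the lever for raising the ceiling: create indices with fewer primary shards (via the `index.number_of_shards` setting at index-creation time; check your version's docs for the exact syntax). A quick sketch:

```python
def max_indices_per_node(shard_limit=100, shards_per_index=5):
    """How many indices fit on one node, given a per-node shard
    cap and the number of shards each index allocates."""
    return shard_limit // shards_per_index

# Default of 5 shards per index: 20 indices per node.
print(max_indices_per_node())
# Creating indices with 1 shard each raises that to 100 per node.
print(max_indices_per_node(shards_per_index=1))
```

For low-traffic indices, one shard (plus replicas as needed) is usually plenty, since a shard is a full Lucene index with its own overhead.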
Or, is it possible to run more than one node per machine for testing?
I need to be able to run 100+ indices with 10+ mappings per index, all
fairly low use. One machine should be able to handle the load for
Shay Banon wrote:
Have you configured a gateway for elasticsearch? The configuration is
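For context on the gateway question: in the 0.x releases the gateway is what lets cluster state and index data survive a full restart, and a non-persistent gateway would explain exactly this kind of data loss. A sketch of what a persistent filesystem gateway configuration looked like in that era follows; the exact key names changed between releases, so verify against the docs for your version.

```yaml
# elasticsearch.yml (0.x-era sketch; key names varied by release)
gateway:
  type: fs            # persist cluster state and indices to a filesystem
  fs:
    location: /var/data/elasticsearch/gateway
```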
On Thu, Jul 8, 2010 at 2:16 AM, Kenneth Loafman <email@example.com> wrote:
I installed elasticsearch-0.8.0 on a 64-bit Ubuntu 9.10 system. It starts
up, runs fine, creates the indices I want, etc., but has a problem I can't
trace down. If I do a 'bin/service/elasticsearch restart' command (change
config, restart), the entire database disappears. I'm fairly certain this
is not the desired action. Here's a console log showing what happens:
drwxr-xr-x 22 ken ken 4096 2010-07-07 17:48
<go off and run ES for a while, then restart>
Starting ElasticSearch...Waiting for ElasticSearch......
ls: cannot access ...: No such file or directory
I've read through the docs many times (very terse), changed the config,
blown it away and unzipped again; nothing helps. Perhaps you can tell
me what I can do to correct the problem. I'm pretty sure it should
persist the data.