How many indexes is too many indexes?

I understand this may depend on a lot of factors, but I am curious what
an efficient number of indexes is for a large data set.

I would like to break up indexes by user and by date (I think), mostly
because it will make data management easier on my end.

I am wondering at what point Elasticsearch will have issues with the number
of indexes. For example, is 10 a good number? 100? 1,000? 10,000?

I would like to break up the indexes as much as possible and make use of
aliases for searching the data of interest, but I don't want to create so
many indexes that it has an adverse effect on performance.
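
To make that concrete, here is roughly the layout I have in mind, sketched
with the Python client (the index and alias names are just placeholders):

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    # One index per user per day, e.g. "user123-2014.10.08".
    es.indices.create(index="user123-2014.10.08", ignore=400)  # 400 = already exists

    # A per-user alias spanning all of that user's daily indexes, so
    # searches don't need to know the naming scheme.
    es.indices.put_alias(index="user123-*", name="user123-all")

    # Search the alias rather than the individual indexes.
    results = es.search(index="user123-all", body={"query": {"match_all": {}}})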

I would appreciate any insight into what is recommended and what others
have experienced.

Thanks in advance.

-Kevin


This depends entirely on your data structure, volume, and cluster sizing.
Hundreds of indexes work; thousands should be OK if you have a lot of nodes;
tens of thousands will need even more nodes.

Aliases will also affect your requirements.
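
If you want to keep an eye on how index and shard counts grow as you scale,
the cat APIs will show you; a quick sketch with the Python client, assuming
a locally reachable node:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    # One line per index: health, name, shard counts, doc count, store size.
    print(es.cat.indices(v=True))

    # Shard-level view, useful for seeing how shards spread across nodes.
    print(es.cat.shards(v=True))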

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



FYR, one issue that I ran into with daily indices is the per-node limit
on file descriptors.

I wanted to maximize write performance, so the number of shards was set
to 24, matching the number of CPU cores. That does not seem like much,
but after a few weeks the number of FDs hit the 65k limit before the
disk space ran out, and the setting had to be changed.

At this point the cluster has 6 nodes, 4 new indices per day, and we
keep about 80 days of data. But we do not do cross-day queries.
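
Back-of-the-envelope, the numbers line up with the limit we hit. The
files-per-shard figure below is a rough assumption (it depends on segment
counts), and this ignores replicas, which would add more:

    # Rough FD arithmetic for our setup.
    indices_per_day = 4
    days_kept = 80
    shards_per_index = 24
    nodes = 6
    files_per_shard = 50   # assumption: segments per shard * files per segment

    total_shards = indices_per_day * days_kept * shards_per_index   # 7680
    shards_per_node = total_shards // nodes                         # 1280
    fds_per_node = shards_per_node * files_per_shard                # 64000
    print(fds_per_node)   # right around the default 65k limit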

--
Cheers,
Kang-min Liu


Did you get better writes?
What sort of storage are you on? Did you measure before and after, and
are you reaching I/O limits?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


Mark Walkom writes:

Did you get better writes?
What sort of storage are you on? Did you measure before and after, and
are you reaching I/O limits?

We pump realtime log data and only measured the overall processing
throughput, not low-level I/O throughput (we had the data, but we did
not correlate it with the setting change). The disks are plain hard
drives, not SSDs or hybrids. We did not hit disk or network I/O limits
before or after; the FD limit was the only one we ran into.
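
If anyone wants to watch for this, the nodes stats API reports open file
descriptors per node; a small sketch with the Python client:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    # Process stats include open FDs per node; the maximum may not be
    # reported on older versions, hence the .get().
    stats = es.nodes.stats(metric="process")
    for node_id, node in stats["nodes"].items():
        proc = node["process"]
        print(node["name"], proc["open_file_descriptors"],
              proc.get("max_file_descriptors"))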

--
Cheers,
Kang-min Liu


Thank you for the feedback, guys; it is greatly appreciated.
I had not thought about file descriptors, so that gives me another thing
to think about.

Our daily volume will be pretty high across all of our users. We don't
have a great estimate yet, but right now we are at about 50 million
documents a day across ~30 users.
Our cluster is in EC2, so we can adjust node size and count basically
whenever we need to; I don't think that is a huge issue, assuming we get
the index layout correct.
At present we are using one large index, which is causing some
performance issues, as you would expect. We also did not get our
sharding correct originally, so now we have really large shards.

We have a requirement to keep 90 days of data per user, with an upper
bound of roughly 1,000 users (though that is indeterminate). That would
be 90,000 indexes if we did it by day.
I am wondering whether that is a crazy thing to attempt, or whether it
makes more sense to break it up weekly or monthly instead to keep the
index count down.
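
Here is the rough math I am doing when comparing daily, weekly, and
monthly indexes (the shards-per-index number is just an assumed setting):

    # Rough index/shard counts for 1,000 users with 90 days of retention.
    users = 1000
    retention_days = 90
    shards_per_index = 5   # assumption: whatever setting we end up with

    for label, period_days in [("daily", 1), ("weekly", 7), ("monthly", 30)]:
        indices = users * -(-retention_days // period_days)   # ceiling division
        shards = indices * shards_per_index
        print("%s: %d indexes, %d primary shards" % (label, indices, shards))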

Our documents are usually pretty small (or what I would consider small)
at <= 1 KB, but we will receive them basically constantly.
So I am looking for tips on how we can lay out and break up indexes to
get the best performance benefit as we grow.
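
For what it is worth, one thing I am planning to try is an index
template, so every new per-user index picks up the same settings
automatically; a sketch with placeholder shard/replica numbers:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    # Any new index whose name matches the pattern picks up these settings,
    # so the indexer can create daily/weekly indexes without extra setup.
    es.indices.put_template(
        name="user-data",
        body={
            "template": "user*",            # 1.x-style name pattern
            "settings": {
                "number_of_shards": 2,      # placeholder; size to data volume
                "number_of_replicas": 1,    # placeholder
            },
            "aliases": {"all-users": {}},   # every new index joins this alias
        },
    )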

Again, thank you for the feedback, and I appreciate any more in advance!

Thanks,
