Greetings,
We're currently testing out Elasticsearch using the opening/closing of
indexes as a way to control what data is actually in memory at any given
time. Our hope is that we can avoid buying boxes with tons of memory: we
have several hundred gigs of data that need to be searchable, but only
about a gig's worth needs to be available at any given time. We are
currently creating an index for every user of our app, and our users number
in the tens of thousands.
Initially, we're just trying to get the indexes created and populated with
data. We're using workers that will take care of creating an index and
populating it. So if we have 50 workers going, we only have 50 indexes
open. In theory, even though we plan on having tens of thousands of
indexes, we can scale down our hardware because we'll only be dealing with
a limited set of open indexes at any given time.
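For reference, the worker lifecycle described above (create an index, populate it, close it) could be sketched roughly as follows. This is only an illustration, not our actual worker code: the `user_<id>` naming, the `doc` type, the single-shard settings and the `localhost:9200` address are all assumptions made for the example. It uses only the create-index, `_bulk` and `_close` REST endpoints.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a node reachable on the default HTTP port


def es_request(method, path, body=None, base_url=ES_URL):
    """Send one HTTP request to the cluster and return the parsed JSON reply."""
    data = None if body is None else body.encode("utf-8")
    req = urllib.request.Request(base_url + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def bulk_body(index, doc_type, docs):
    """Build a newline-delimited _bulk payload: one action line per document."""
    lines = []
    for doc in docs:
        # Mapping types ("_type") existed in 2013-era releases; later versions dropped them.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline


def process_user(user_id, docs, send=es_request):
    """Create the user's index, populate it, and close it again."""
    index = "user_%s" % user_id  # hypothetical naming scheme
    # Hypothetical settings: one shard, no replicas, to keep shard creation cheap.
    settings = {"settings": {"number_of_shards": 1, "number_of_replicas": 0}}
    send("PUT", "/" + index, json.dumps(settings))
    try:
        send("POST", "/_bulk", bulk_body(index, "doc", docs))
    finally:
        # Close even if indexing fails, so the number of open indexes stays bounded.
        send("POST", "/" + index + "/_close")
    return index
```

Each worker would call something like `process_user(user_id, docs)` once per user. The `send` parameter exists only so the HTTP layer can be swapped out when trying the sketch without a cluster.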
What we're finding is when we let the workers loose on the cluster, they
zip through creating indexes and populating them for the first 30-45
minutes, and then performance just degrades. Creating indexes goes from
taking seconds to minutes to even over 10 minutes. If we restart the
cluster, we'll find that performance zips along again for 30-45 minutes and
starts degrading again. During the first 30-45 minutes, we go through
creating a few thousand indexes but then the throughput just drops hour
after hour.
Looking at system metrics on our nodes, it looks like index creation is
mostly IO intensive due to the shard creation, but it seems like there is
some other overhead that just gets larger and larger as we create indexes,
even though we close them after we're done.
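To put numbers on that growing overhead, something like the sketch below could record how long each index creation takes alongside the total number of indexes created so far. It also records the size of the `GET /_cluster/state` response, on the assumption (ours, not yet verified) that cluster metadata keeps growing even for closed indexes. The helper names and the `localhost:9200` address are hypothetical.

```python
import time
import urllib.request


def timed(action):
    """Run a zero-argument callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


def cluster_state_bytes(base_url="http://localhost:9200"):
    """Return the size in bytes of the cluster state as served over HTTP."""
    with urllib.request.urlopen(base_url + "/_cluster/state") as resp:
        return len(resp.read())


def record_sample(samples, index_count, create_action, state_size=cluster_state_bytes):
    """Time one index creation and append a measurement to samples."""
    _, seconds = timed(create_action)
    sample = {
        "indexes": index_count,        # total indexes created so far, open or closed
        "create_seconds": seconds,     # wall-clock time of this creation
        "state_bytes": state_size(),   # cluster state size right after creation
    }
    samples.append(sample)
    return sample
```

Plotting `create_seconds` and `state_bytes` against `indexes` should show whether the slowdown tracks the total index count rather than the number of open indexes.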
Does anyone have insight they can share on this? Again, our main reason
for the opening and closing is so we can have more control over what goes
into memory. We're open to cons on this approach as well.
Thanks!
-Chris
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Check out this talk by Shay Banon: http://vimeo.com/44716955
An index per user sounds like it may not be the right approach.
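One commonly used alternative is a single shared index with a filtered, routed alias per user, so each user still gets their own "index name" without their own shards. A minimal sketch of the alias request body, assuming a shared index called `users` and a `user_id` field on every document (both names are made up for this example):

```python
import json


def user_alias_actions(shared_index, user_id, user_field="user_id"):
    """Build the body for POST /_aliases that adds one per-user alias."""
    alias = "user_%s" % user_id  # hypothetical alias naming scheme
    return {
        "actions": [
            {
                "add": {
                    "index": shared_index,
                    "alias": alias,
                    # Searches through the alias only see this user's documents.
                    "filter": {"term": {user_field: user_id}},
                    # Routing keeps one user's documents together on one shard.
                    "routing": str(user_id),
                }
            }
        ]
    }


body = json.dumps(user_alias_actions("users", 42))
```

The resulting `body` would be sent to `POST /_aliases`; after that the application can index and search against `user_42` as if it were a dedicated index.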
On Friday, September 27, 2013 2:54:39 AM UTC-4, Chris Wornell wrote: