Scalability and creating 1 index per user


(Neera Vats) #1

Hi All,
I am exploring Elasticsearch and am considering creating one index per user
instead of one big index for all users. Each index would be about 6 GB.
Has anyone tried this, and how well does it scale?

I couldn't find any documented limit on the maximum number of indices in
Elasticsearch. Is it safe or recommended to have, say, 20K indices for 20K
users? Would this architecture scale well?

Also, if I start with, say, a 5-node cluster now and add more nodes as I need
them, does ES redistribute its shards every time I add new nodes? How
are newly added nodes utilized in a cluster?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/846e3f85-97b6-494b-9ffb-67c9ec3f66ab%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Nik Everett) #2

On Tue, Feb 25, 2014 at 4:46 PM, ESUser neeravats@gmail.com wrote:

> Hi All,
> I am exploring Elasticsearch and am considering creating one index per user
> instead of one big index for all users. Each index would be about 6 GB.
> Has anyone tried this, and how well does it scale?
>
> I couldn't find any documented limit on the maximum number of indices in
> Elasticsearch. Is it safe or recommended to have, say, 20K indices for 20K
> users? Would this architecture scale well?

I'm running 1107 indexes right now. Some of the cluster actions are a bit
slower than I'd like, but I think that is better in 1.0. I don't think it'd
work well at an order of magnitude larger, but I could be wrong.
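
To put the numbers from the question in perspective, here is a rough back-of-the-envelope sketch. The user count, index size, and node count come from the original post; the 5 primary shards and 1 replica per index are assumptions made for illustration, so substitute whatever your index settings actually use.

```python
# Back-of-the-envelope sizing for a one-index-per-user layout.
# From the thread: 20K users, about 6 GB per index, a 5-node starting cluster.
# ASSUMPTION (not from the thread): 5 primary shards and 1 replica per index.

USERS = 20_000
GB_PER_INDEX = 6
NODES = 5
PRIMARIES_PER_INDEX = 5  # assumed
REPLICAS = 1             # assumed


def sizing(users, gb_per_index, nodes, primaries, replicas):
    """Return total primary data and shard counts for the given layout."""
    primary_shards = users * primaries
    total_shards = primary_shards * (1 + replicas)
    return {
        "primary_data_tb": users * gb_per_index / 1000,
        "primary_shards": primary_shards,
        "total_shards": total_shards,
        "shards_per_node": total_shards // nodes,
    }


plan = sizing(USERS, GB_PER_INDEX, NODES, PRIMARIES_PER_INDEX, REPLICAS)
for key, value in plan.items():
    print(f"{key}: {value}")
```

Under those assumptions the layout means about 120 TB of primary data and 200,000 shards, or 40,000 shards per node on five nodes, which is why the shard count tends to become the limiting factor before the index count does.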

> Also, if I start with, say, a 5-node cluster now and add more nodes as I need
> them, does ES redistribute its shards every time I add new nodes? How
> are newly added nodes utilized in a cluster?

It'll even the shards out across the new nodes. There are settings for
how many concurrent shard moves can take place and how much bandwidth each
move may use. The defaults are a bit slow, especially if you have a fast
network and fast disks.
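
As a rough illustration of the tuning mentioned above, the sketch below builds a request body for the cluster update settings API (`PUT /_cluster/settings`). The two setting names are the ones I understand to control concurrent rebalancing and recovery bandwidth; the values are made up for the example, and both should be checked against the documentation for your Elasticsearch version.

```python
import json


def rebalance_settings(concurrent_rebalance, max_bytes_per_sec):
    """Build a body for PUT /_cluster/settings.

    "transient" settings do not survive a full cluster restart;
    use "persistent" instead if they should.
    """
    return {
        "transient": {
            # How many shard relocations may run at once across the cluster.
            "cluster.routing.allocation.cluster_concurrent_rebalance": concurrent_rebalance,
            # Bandwidth cap for each shard recovery/relocation stream.
            "indices.recovery.max_bytes_per_sec": max_bytes_per_sec,
        }
    }


# Illustrative values only; pick numbers that suit your network and disks.
body = rebalance_settings(4, "100mb")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent with any HTTP client; raising both values speeds up rebalancing at the cost of more load on the nodes while shards move.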

Nik



(Mark Walkom) #3

20K is a lot of indexes, probably too many, as ES needs to maintain
state about each of them in memory, which could leave you with nothing
for caching indexed data!
You might want to look at
http://www.elasticsearch.org/blog/customizing-your-document-routing/ instead;
that way you can reduce your index count but still get the same
usability outcome.
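
One common way to apply the routing approach is a filtered alias per user on top of a single shared index, so each user still appears to have a private index. The sketch below only builds the body for the aliases API (`POST /_aliases`); the index name `users_shared`, the `user_<id>` alias naming, and the `user_id` field are hypothetical and chosen for illustration.

```python
import json


def user_alias_action(shared_index, user_id):
    """Build a POST /_aliases body that adds one per-user alias.

    The alias routes all of the user's documents to a single shard and
    filters searches so that only that user's documents are visible.
    """
    return {
        "actions": [
            {
                "add": {
                    "index": shared_index,
                    "alias": f"user_{user_id}",                 # hypothetical naming scheme
                    "routing": str(user_id),                    # same value for indexing and search
                    "filter": {"term": {"user_id": user_id}},  # hypothetical field name
                }
            }
        ]
    }


body = user_alias_action("users_shared", 42)
print(json.dumps(body, indent=2))
```

Applications then index and search against `user_42` as if it were its own index, while the cluster only has to track one index and its shards.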

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Otis Gospodnetić) #4

Hi,

I know some users of SPM for Elasticsearch have clusters with many
thousands of indexes (and growing), each with 5+ shards. They've been using
SPM for many months now, plus they are our clients, so I've had the chance
to see their servers and metrics and can tell you that I don't see any
other metric growing to a dangerous level just because the number of their
indexes is growing - CPU is low, JVM GC is low, etc. But, of course, there
are a ton of factors involved, such as whether and how much these indexes
are actually being searched, and I don't recall that info off the top of my
head. ~10 years ago I had a single server with >300K non-trivial Lucene
indexes that bots were crawling like crazy... but I implemented something
that closed unused indices and opened them on demand, and that's what made
it possible for this system to survive. Maybe this could work for you, too.
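
The close-when-idle, open-on-demand idea can be sketched as a small bookkeeping class. This is not the system described above, only a minimal illustration: `IdleIndexTracker`, its method names, and the 600-second idle limit are all hypothetical. In Elasticsearch, the actual closing and opening would go through the open/close index API (`POST /<index>/_close` and `POST /<index>/_open`), which this sketch leaves to the caller.

```python
import time


class IdleIndexTracker:
    """Track when each open index was last used and report idle ones."""

    def __init__(self, max_idle_seconds, clock=time.monotonic):
        self.max_idle = max_idle_seconds
        self.clock = clock
        self.last_used = {}  # index name -> last access time (open indices only)

    def touch(self, index):
        """Record an access; return True if the caller must open the index first."""
        needs_open = index not in self.last_used
        self.last_used[index] = self.clock()
        return needs_open

    def sweep(self):
        """Return the indices idle for longer than max_idle and forget them.

        The caller is expected to close each returned index.
        """
        now = self.clock()
        idle = sorted(
            name for name, used in self.last_used.items()
            if now - used > self.max_idle
        )
        for name in idle:
            del self.last_used[name]
        return idle


# Usage with a fake clock so the example is deterministic.
now = [0.0]
tracker = IdleIndexTracker(max_idle_seconds=600, clock=lambda: now[0])
tracker.touch("user_1")
now[0] = 500.0
tracker.touch("user_2")
now[0] = 700.0
to_close = tracker.sweep()
print(to_close)
```

Running `sweep()` on a timer and calling `touch()` before every request keeps only recently used indices open, at the cost of extra latency on the first request to an index that has been closed.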

Otis

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


