Index per company - any alternatives?


(MoD) #1

Hi,

We are running a SaaS CRM service. We set up Elasticsearch to create an
index per company (for example, abc-company has its own index and xyz-company
has its own index).

But now that we have 1000+ companies, we suspect this may not be the right setup.

Especially when Elasticsearch restarts (due to a failure), it goes into
recovery, and recovering 1000+ indices with 5 shards each takes forever (at
100% CPU).

Any ideas for a correct setup?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

You can over-allocate shards.

https://groups.google.com/forum/#!msg/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ

Here are the docs for indices aliases with routing:
http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html

Jörg
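
A minimal sketch of what aliases with routing could look like for this case; it is not taken from the thread. It builds a request body for the `_aliases` endpoint with one filtered, routed alias per company on a shared index. The index name `crm`, the field name `company_id`, and both helper names are assumptions made for illustration, and the `term` filter assumes `company_id` is indexed as an exact (not analyzed) value.

```python
import json


def company_alias_action(index, company_id):
    """Build one `add` action: an alias that filters and routes by company.

    `company_id` as a document field is a hypothetical name; use whatever
    field identifies the owning company in your own mapping.
    """
    return {
        "add": {
            "index": index,
            # Alias named after the company, so callers can keep addressing
            # "abc-company" as if it were still its own index.
            "alias": company_id,
            # Routing sends this company's reads and writes to one shard.
            "routing": company_id,
            # The filter hides other companies that share the same shard.
            "filter": {"term": {"company_id": company_id}},
        }
    }


def aliases_request_body(index, company_ids):
    """Collect one action per company into a single _aliases request body."""
    return {"actions": [company_alias_action(index, c) for c in company_ids]}


if __name__ == "__main__":
    body = aliases_request_body("crm", ["abc-company", "xyz-company"])
    # This JSON would be sent with POST to the cluster's /_aliases endpoint.
    print(json.dumps(body, indent=2))
```

An alias cannot share a name with an existing index, so this naming only works once the old per-company indices have been reindexed into the shared index and deleted.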



(Michael Sick) #3

Can you explain more about the nature of the data? If it's not time-based,
then, as Jörg suggests, using a single index with sharding and routing could
be the answer.
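
To put rough numbers on the difference, here is a small back-of-the-envelope sketch. The 1000 indices and 5 shards per index come from the original post; the shard count of 50 for the single over-allocated index and the replica count of 0 are assumptions picked only for illustration.

```python
def total_shards(indices, primaries_per_index, replicas=0):
    """Number of shard copies the cluster has to recover after a restart.

    Each index contributes its primaries plus `replicas` copies of each one.
    """
    return indices * primaries_per_index * (1 + replicas)


if __name__ == "__main__":
    # Layout from the thread: one index per company, 5 shards each.
    per_company = total_shards(indices=1000, primaries_per_index=5)
    # Hypothetical alternative: one shared index, over-allocated to 50 shards.
    shared = total_shards(indices=1, primaries_per_index=50)
    print(per_company, shared)  # prints "5000 50"
```

Every one of those shards is a separate Lucene index that has to be opened and recovered, so the count of shards drives recovery time far more than the total amount of data does.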



(MoD) #4

The data is the contacts/companies/notes of each company (domain, user). We
use Elasticsearch to index the data and to provide full-text search over it
on the site.

It is not time-based and will remain searchable for as long as the company
wishes to use the product.

At first we thought that, in order to limit searches to a single company, we
should use an index per company.

But as the company/user base grew, we found that recovery of the indexes
takes too long. By the way, the recovery phase is the sole reason we
wish to change the setup.



(ppearcy) #5

Are you able to figure out what is eating up the most time during recovery?
If you set index.gateway to the DEBUG log level, you should be able to get
those details.

One alternative is to tweak index.translog.flush_threshold in
the config file. I deal with a decent number of indexes (fewer than you do,
though), and lowering this from the default of 5000 to 1000 helped our
recovery times. It's a tradeoff, since you will have more merges, but if
your indexing volume is small it won't make a difference.

This should only require a cluster restart rather than a full data rebuild.

That being said, one index with a routing key on company will definitely
help.

Best Regards,
Paul
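
For readers unfamiliar with routing, the idea can be sketched as "hash the routing key, take it modulo the number of shards". The snippet below only illustrates that principle under stated assumptions: `zlib.crc32` is a stand-in and not the hash Elasticsearch uses internally, and `shard_for` and the shard count of 50 are hypothetical.

```python
import zlib


def shard_for(routing_key, number_of_shards):
    """Map a routing key to a shard number in [0, number_of_shards).

    crc32 is a stand-in hash for illustration only; Elasticsearch applies
    its own hash function to the routing value.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_shards


if __name__ == "__main__":
    shards = 50  # hypothetical shard count for the shared index
    for company in ("abc-company", "xyz-company"):
        # The same key always yields the same shard, so all documents of one
        # company live together and a routed search touches a single shard.
        print(company, "->", shard_for(company, shards))
```

Because the mapping depends on the shard count, the number of primary shards has to be chosen up front, which is why over-allocating shards is usually suggested together with routing.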



(MoD) #6

We changed the setup to a single index (i.e. master) with multiple aliases, as described in
https://www.elastic.co/guide/en/elasticsearch/guide/current/faking-it.html

Index creation was taking too long, and so was restarting. This way all our problems are solved.

Thanks for the help.
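
For anyone landing here later, a hedged sketch of what the search side of such a setup might look like. It assumes each company has an alias named after it (for example `abc-company`) that already carries the company filter and routing; the helper name `company_search` and the use of a `query_string` query are illustrative choices, not details from this thread.

```python
import json


def company_search(alias, text):
    """Return the request path and body for a full-text search via an alias.

    The body carries no company filter: under the assumption above, the
    alias itself restricts results to that company's documents.
    """
    path = "/%s/_search" % alias
    body = {"query": {"query_string": {"query": text}}}
    return path, body


if __name__ == "__main__":
    path, body = company_search("abc-company", "john")
    # The body would be sent with POST to `path` on the cluster.
    print(path)
    print(json.dumps(body))
```

Application code that used to address one index per company can keep using the same names, since an alias is addressed exactly like an index in search and index requests.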

