Can't get Documents Deleted below 40% - performance issues - help needed

Hi,

We've been using Elasticsearch (through qBox) for almost 3 months now, but
something's very wrong and we need help on this one. I'll try to give as
much context as possible.

Last month we had a major downtime (almost 24hr). We were using a single
node with 2GB and after this downtime we upgraded to a 3 node 4GB setup. We
are online again, but our Documents Deleted values are still way too high
(40-50% and more) and they never go down, which leaves us uncomfortable.
We've contacted qBox, who are very helpful, but they always say we need a
bigger node because of that. That's really not an option, as the hosting
costs of our product have skyrocketed with this.

Our product is an invoicing platform. Our text search, filters and all
invoice listings now get all their data from Elasticsearch instead of
MySQL, so we can't rely (we think) on bulk indexing, as the invoices,
items, clients, etc. need to be indexed ASAP to be shown in listings. Maybe
we have a design issue and shouldn't be using Elasticsearch for so many
things; it's something we need to understand better.

At the moment, our main concern is making sure we're OK in the long run and
avoiding another huge downtime at all costs.

Can someone help in some way? We'd be very grateful.

Some more info:

  • We're still not using round robin. Could this help?
  • Version 1.2.4. It's on our roadmap to upgrade but we've heard stories
    of worse performance in some cases (Groovy-related).
  • I've added some screenshots with node info from qBox.

Thanks,
Ricardo

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/eca7aa05-fda6-4c21-9fff-dd418d6ae2cb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

If you're updating single documents often then you should expect high
delete rates. Your heap and CPU use seem to be OK though, so it's not
anything to be super concerned about (for now). You do have the option of
forcing an optimise (which does a merge and removes deleted docs), but this
is resource intensive, so be careful about when you run it, i.e. it'd be
best to schedule it to run when your users are asleep.
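
To make the above a bit more concrete, here is a rough Python sketch of how
one might check the deleted-docs share and prepare an expunge-only optimise
on a 1.x cluster. The base URL and the index name "invoices" are made up for
illustration, and the ratio formula (deleted / (count + deleted)) is an
assumption about how a dashboard might compute the figure, not necessarily
what qBox shows. The request is only built here, not sent.

```python
import urllib.request


def deleted_ratio(stats):
    """Share of deleted docs among the primaries.

    `stats` is assumed to be the parsed JSON of GET /<index>/_stats/docs.
    """
    docs = stats["_all"]["primaries"]["docs"]
    total = docs["count"] + docs["deleted"]
    return docs["deleted"] / total if total else 0.0


def optimize_request(base_url, index):
    """Build (but do not send) an optimise call that only expunges deletes.

    ES 1.x exposes this as POST /<index>/_optimize; later major versions
    renamed the endpoint, so check the docs for your version.
    """
    url = "{}/{}/_optimize?only_expunge_deletes=true".format(
        base_url.rstrip("/"), index
    )
    return urllib.request.Request(url, data=b"", method="POST")


# Hypothetical numbers: 600 live docs, 400 deleted-but-not-yet-merged docs.
sample_stats = {"_all": {"primaries": {"docs": {"count": 600, "deleted": 400}}}}
ratio = deleted_ratio(sample_stats)

# Hypothetical host and index name.
request = optimize_request("http://localhost:9200", "invoices")
print(ratio, request.get_method(), request.full_url)
# urllib.request.urlopen(request) would actually trigger the merge,
# so only do that off-peak.
```

Building the request separately from sending it makes it easy to wire into
a scheduled job that only fires during quiet hours.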

Your main indices seem to only have 2 shards, with no replicas. You should
really have one primary shard per node, with one replica each to give you
some redundancy, for a total of 6 shards: 3 primary, 3 replica. Currently
you will have an imbalance of data, which means some nodes are more
overloaded than others, plus without replicas you cannot survive the loss
of a node.
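
The shard arithmetic above can be sketched as follows. This is only an
illustration: the helper names are made up, and the settings body is what
one would typically send when creating a new index. Note that the number of
primary shards is fixed at index creation, so going from 2 to 3 primaries
means reindexing into a new index, whereas the replica count can be changed
on a live index.

```python
import json


def total_shards(primaries, replicas_per_primary):
    """Total shard copies in the cluster: each primary plus its replicas."""
    return primaries * (1 + replicas_per_primary)


def index_settings(primaries, replicas_per_primary):
    """JSON body for creating an index with the given shard layout."""
    body = {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas_per_primary,
        }
    }
    return json.dumps(body, sort_keys=True)


# Layout suggested above: 3 primaries, 1 replica each.
print(total_shards(3, 1))
print(index_settings(3, 1))

# Layout described as the current one: 2 primaries, no replicas.
print(total_shards(2, 0))
```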

Without knowing your use, it also looks like something is creating indices
that shouldn't be there - e.g. phppath, cgi-bin, cfg, phpmyadmin - which
doesn't look like a problem at the moment, but is something you should
check (if they shouldn't be there, that is).

Finally, if you want better performance, move to Oracle JDK and then
upgrade to the latest release (1.4.3). There are a lot of improvements in
the latest versions, though I cannot comment on the Groovy side of your
question.


It's normal to see 40-60% deleted docs if you frequently update existing
documents. See this recent blog post I wrote for some details:

Mike McCandless

http://blog.mikemccandless.com


Hi Mark, hi Mike,

Thanks both for helping on this. The article's great, I didn't know about
the 40-60 sawtooth pattern.

So, I guess this means we should be OK for now as long as we monitor CPU
usage, correct?

We don't have access to the server OS (that's on qBox's side), so moving to
Oracle JDK is not an option.

I thought the 3 node setup was the recommended one, so that when a node
goes down the cluster can recover automatically, but I was scratching my
head over why there were no replicas. I guess your answer made it clear. I
need to ask qBox about this.

Next step is changing the app to use round robin, which may distribute the
load a bit better.
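
For anyone reading along, the round-robin idea is roughly the following. It
is a minimal sketch, not what our app does today: the class name and the
node addresses are hypothetical, and many Elasticsearch client libraries
already ship their own connection pooling that may be a better fit.

```python
import itertools
import threading


class RoundRobinHosts:
    """Hands out node addresses in rotation so requests are spread evenly."""

    def __init__(self, hosts):
        hosts = list(hosts)
        if not hosts:
            raise ValueError("at least one host is required")
        self._cycle = itertools.cycle(hosts)
        # The lock keeps the rotation consistent across threads.
        self._lock = threading.Lock()

    def next_host(self):
        with self._lock:
            return next(self._cycle)


# Hypothetical addresses for a 3 node cluster.
pool = RoundRobinHosts(
    ["http://node1:9200", "http://node2:9200", "http://node3:9200"]
)
for _ in range(4):
    print(pool.next_host())
```

Each search or index request would then ask the pool for the next host
instead of always hitting the same node.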

Thanks,
Ricardo
