Stress testing queries

In our case, we're only interested in stress testing queries. We've got a web
app that queries our indexes, which are organized by week of the year, with
a bunch of aliases so that specific portions of the data can be reached
easily. Questions about scaling the app have come up. For us, that means
testing through the app itself, which so far only makes queries. I figure we
should load test our cluster directly too, so we can see whether any
eventual bottleneck is somewhere in the app or on the cluster itself.

So far, as far as I can tell, I haven't been able to max out the indexing
rate on a system that is adequately equipped with resources. I've had 32
Python worker subprocesses happily sending, I think, roughly 5+ million
records an hour to our cluster with no problems in indexing speed or other
response times while backloading some data.

My current strategy is to get the ugliest heavy queries the application
runs and simply use ABS or something similar to run queries over http with
variables that are in a reasonable range. If I can make my cluster crash by
doing that, I know that'll be my upper limit!
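
For what it's worth, here is a rough sketch of that strategy in Python (standard library only) instead of a benchmarking tool. Everything specific in it is an assumption for illustration: the URL and alias name `my_alias`, the field names `message` and `timestamp`, the search terms, and the 1 to 90 day range are placeholders, and the query body should be swapped for the real heavy queries the application runs.

```python
import json
import random
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical endpoint: an alias in front of some of the weekly indexes.
SEARCH_URL = "http://localhost:9200/my_alias/_search"


def build_query(rng):
    """Build one query body with variables drawn from a plausible range.

    The fields, terms, and ranges here are placeholders, not the app's
    real queries.
    """
    days_back = rng.randint(1, 90)
    term = rng.choice(["error", "timeout", "login"])
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"message": term}},
                    {"range": {"timestamp": {"gte": "now-%dd" % days_back}}},
                ]
            }
        },
        "size": 50,
    }


def send_query(body, url=SEARCH_URL, timeout=30):
    """POST one query over HTTP and return the elapsed time in seconds."""
    data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()  # drain the response so the timing covers the full reply
    return time.perf_counter() - start


def run_load(send, total_requests, concurrency, seed=0):
    """Fire total_requests queries using `concurrency` threads.

    `send` is any callable that takes a query body and returns a latency
    in seconds; a call that raises is counted as an error.
    """
    rng = random.Random(seed)
    bodies = [build_query(rng) for _ in range(total_requests)]

    def timed(body):
        try:
            return send(body)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed, bodies))

    latencies = sorted(r for r in results if r is not None)
    return {
        "requests": total_requests,
        "errors": total_requests - len(latencies),
        "median_s": statistics.median(latencies) if latencies else None,
        "max_s": latencies[-1] if latencies else None,
    }
```

Calling run_load(send_query, total_requests=1000, concurrency=16) and then stepping the concurrency up between runs should show roughly where latencies and error counts start to climb. Passing the sender in as an argument means the same loop can be pointed at the app's own HTTP endpoint, which would help separate app bottlenecks from cluster ones.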

On Thursday, January 30, 2014 3:59:19 PM UTC-8, Jörg Prante wrote:

Just a few questions, because I'm also interested in load testing.

What kind of stress do you think of? Random data? Wikipedia? Logfiles?
Just query? What about indexing? And what client? Java? Other script
languages? How should the cluster be configured, one node? two or more
nodes? Index shards? Replica? etc. etc.

There are so many variants and options out there, I believe this is one of
the reasons why a compelling load testing tool is still missing.

It would be nice to have a tool to upload ES performance profiles to a
public web site, for checking how well an ES cluster is tuned in comparison
to others. A unit of measure for comparing performance would need to be
defined, e.g. "this cluster performs with a power factor of 1.0, this
cluster has power factor 1.5, 2.0, ..."

That's only possible when all software and hardware characteristics are
properly taken into account, plus "application profiles" for typical
workloads, so it can be decided which configuration is best for what
purpose.

Jörg
