Expected Elasticsearch performance

On Thursday, March 1, 2012 at 1:17 PM, SquareDot wrote:

I am porting a search engine from an SQL database to Elasticsearch.
The main reason for doing so is to be able to compute facets easily.

Currently we get facets from SQL by generating precalculated tables.
It works well, but it's a pain to maintain and facets are supported
only on a subset of the data.

Now that the ES prototype is working, I am benchmarking the two
solutions, and it appears that the ES version is slightly behind the
SQL version in terms of performance (in terms of maintainability it's
far better).

I used the exact same machine configuration to compare SQL and ES: a
64-bit platform, 32 GB of RAM, an SSD, and a quad-core Intel Xeon at
3 GHz.

The documents are not small: there are around 200 fields. Depending on
the request, script-based sorting is used, and facets are always
computed on 8 fields of the document.

The index contains 3 million docs; if I'm not mistaken, that is
relatively small compared to what ES can handle.

For the queries, I use a filtered query and, for some requests, a
custom_filters_score query to compute the score and use it for
sorting.

Some of the filters are global because of the facets, but there are
always some filters in the filtered query, so the number of docs
scanned should be reduced (the whole index is not scanned).
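
To make the request shape concrete, here is a minimal Python sketch of a body in the legacy query DSL of that era (filtered query, custom_filters_score, terms facets). The real mapping and query are not shown in the thread, so every field name (`category`, `in_stock`, `promoted`, `brand`, `price`), the boost value, and the helper name `build_search_body` are assumptions. The reading of "global" as a facet marked `global` with its own `facet_filter` is also only one possible interpretation.

```python
import json


def build_search_body(category, facet_fields, global_facets=()):
    """Build a legacy-style search body: filtered query plus terms facets.

    All field names are hypothetical placeholders.
    """
    # Filters inside the filtered query restrict both the hits and the
    # non-global facets, so fewer documents need to be visited.
    restricting_filter = {
        "and": [
            {"term": {"category": category}},
            {"term": {"in_stock": True}},
        ]
    }
    # custom_filters_score: documents matching a listed filter get its boost.
    scored_query = {
        "custom_filters_score": {
            "query": {"match_all": {}},
            "filters": [
                {"filter": {"term": {"promoted": True}}, "boost": 2.0},
            ],
            "score_mode": "first",
        }
    }
    body = {
        # No explicit sort: results are ordered by _score by default.
        "query": {"filtered": {"query": scored_query, "filter": restricting_filter}},
        "facets": {},
    }
    for field in facet_fields:
        facet = {"terms": {"field": field, "size": 10}}
        if field in global_facets:
            # A global facet ignores the main query, so it carries its own filter.
            facet["global"] = True
            facet["facet_filter"] = {"term": {"in_stock": True}}
        body["facets"][field] = facet
    return body


body = build_search_body("books", ["brand", "price"], global_facets={"brand"})
print(json.dumps(body, indent=2))
```

Each global facet is computed over the whole index (minus its own `facet_filter`), so it does not benefit from the restricting filters in the filtered query.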

I use two measures in my tests: the time spent on the server to
execute the search, and the number of queries per second executed by
a client running 100 threads in parallel.
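
A load test of this kind could be sketched as follows. This is not the harness used in the thread, which is not shown. It assumes a hypothetical `search` callable that sends one request and returns the parsed response, and it takes the server-side time from the `took` value (milliseconds) that Elasticsearch reports in search responses. `fake_search` is a stand-in so the sketch runs without a cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(search, requests, threads=100):
    """Run every request through `search` on a pool of worker threads.

    Returns (average server time in ms, queries per second seen by the client).
    """
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        responses = list(pool.map(search, requests))
    elapsed = time.perf_counter() - started
    avg_took_ms = sum(r["took"] for r in responses) / len(responses)
    return avg_took_ms, len(responses) / elapsed


def fake_search(request):
    """Stand-in for a real client call (hypothetical)."""
    time.sleep(0.01)  # pretend network and server latency
    return {"took": 5, "hits": {"total": 0}}


avg_ms, qps = run_load_test(fake_search, [{}] * 200, threads=100)
print(f"avg server time: {avg_ms:.0f} ms, throughput: {qps:.0f} queries/s")
```

Keeping the two numbers separate matters: the server time excludes query building, network transfer, and response parsing, while the client-side throughput includes all of them.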

For Elasticsearch, the average time spent on the server is around 500
ms per query (with 100 queries in parallel), and the average
throughput on the client is around 160 queries per second (some ms
are lost in building the query, sending it, receiving the results, and
parsing them). This is with an index that has 1 shard and 0 replicas;
when I increase the number of shards or replicas, performance drops
significantly.

For SQL, the average time spent executing a query is around 360 ms
(likewise, with 100 queries running in parallel), and the average
throughput on the client is around 200 queries per second.
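
As a sanity check on these figures, Little's law (concurrency = throughput × latency) gives the round-trip time each client thread must be seeing. The sketch below assumes a closed loop in which all 100 threads are always busy and issue the next query as soon as the previous one returns. That assumption is not stated in the post.

```python
def implied_round_trip_ms(concurrency, queries_per_second):
    """Little's law: average latency = concurrency / throughput."""
    return concurrency / queries_per_second * 1000.0


es_round_trip = implied_round_trip_ms(100, 160)   # 625.0 ms
sql_round_trip = implied_round_trip_ms(100, 200)  # 500.0 ms

# Difference between the client-side round trip and the reported server time.
es_overhead = es_round_trip - 500    # 125.0 ms
sql_overhead = sql_round_trip - 360  # 140.0 ms

print(es_round_trip, sql_round_trip, es_overhead, sql_overhead)
```

Under that assumption, both setups spend a similar 125 to 140 ms per query outside the server, so most of the gap between 160 and 200 queries per second comes from the server-side time (500 ms versus 360 ms).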

I know it's hard to compare, but as I have no idea what results I can
expect, I wonder if someone can comment on these measurements.

Maybe I missed something and it should be an order of magnitude
faster, or maybe these are typical results for environments similar
to mine; I don't know.

What can I expect in my case? What did you observe under similar
circumstances with ES? Does it support concurrent requests well?
Should the time to execute a query be in the range of 500 ms when
making 100 queries at the same time? Are there ways to improve
search performance?

Any information or comments are welcome; this is an important input
for the decision on whether to put the prototype into production.

Thank you.

On 1 March 2012 at 13:44, Shay Banon wrote:

You haven't provided enough data for us to even start trying to help. Let's start with you gisting a sample document and some of the search requests you execute.

On Thursday, March 1, 2012 at 4:20 PM, SquareDot wrote:

OK, I gisted a document and a query here: doc+query sample · GitHub

The query shown in this example is the most complex case (custom
filters score, script-based sorting, and facets, although sometimes we
have more facets).
The search requests in the load test are not always as complex as this
one; we generate them from a database that records our clients' search
criteria.

Thank you. If you need any other information, do not hesitate to ask.

On 3 March 2012 at 22:32, Shay Banon wrote:

The search request is quite complex. One thing to try is to combine some of the and/or filter combinations into a single bool filter if you can (it can have multiple should/must/must_not clauses). Also, if you can, create a dummy field in the doc that already holds the sorting logic, so that script sorting is not required. In any case, it's strange that you can map this to a SQL test case :)
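
The and/or to bool rewrite could look roughly like the sketch below. It is only an illustration in the legacy filter DSL: the gisted query is not reproduced in the thread, so the example filter and its field names (`category`, `brand`, `price`) are hypothetical, and the helper `to_bool_filter` is not part of any client library. It handles only the list form of `and`/`or` and keeps nested groups as nested bool filters rather than merging them.

```python
def to_bool_filter(f):
    """Rewrite and/or filters (list form) into bool filters.

    and -> must, or -> should; any other filter is returned unchanged.
    """
    if "and" in f:
        return {"bool": {"must": [to_bool_filter(child) for child in f["and"]]}}
    if "or" in f:
        return {"bool": {"should": [to_bool_filter(child) for child in f["or"]]}}
    return f


# Hypothetical filter tree using and/or filters.
original = {
    "and": [
        {"term": {"category": "books"}},
        {"or": [{"term": {"brand": "acme"}}, {"term": {"brand": "globex"}}]},
        {"range": {"price": {"lte": 50}}},
    ]
}

rewritten = to_bool_filter(original)
print(rewritten)
```

Whether this is faster depends on the filters involved and on how they are cached, so it is worth measuring both forms on real requests.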

SquareDot wrote:

Thanks.
I tried switching to a bool filter (I moved the filters previously
combined by an and filter into the must clause of the bool filter),
but performance was actually worse.
Do you have any idea why?

One simple thing that improved performance was replacing the JVM that
ships with Debian with the one from Oracle; my guess is that it handles
the 100 threads better.

PS: What do you mean by "it's strange that you can map this to a SQL test
case"? Should ES performance be a lot better than SQL? Do you think I'm
doing something wrong?
