PHP client benchmarks: which library is faster?


(Zelfapp) #1

The only PHP client benchmark I can find for Elasticsearch seems badly
outdated: http://jolicode.com/blog/elasticsearch-php-clients-test-drive

We are currently developing with the official elasticsearch-php library:
https://github.com/elastic/elasticsearch-php

The benchmark on the jolicode.com site makes it appear that Elastica has
the official PHP library beat, but this is an outdated benchmark.
Nervetattoo, for example, is no longer being developed. We don't have the
time to create our own benchmark.

I'm hoping someone out there can tell me how the official library currently
benchmarks against Elastica, which is being actively developed.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5858b1c7-fb5b-45fe-8feb-86e857c66cf3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(zachary.tong) #2

Heya, author of the official PHP client here. The only benchmark that I
know of is the one you posted, and it is indeed a bit out of date. I'm
afraid I don't have any hard numbers to give you either, since I've never
written a benchmark to compare the two clients myself.

There are several considerations when it comes to performance and
benchmarking. What is your end-goal? Do you want high throughput? Low
latency? Both? For throughput, both Elastica and Elasticsearch-PHP are
largely equivalent, once the clients have both been "warmed". Both clients
use PHP's curl_multi interface for talking to the server, so the
difference in throughput boils down to relatively minor differences in code
execution paths. The majority of the time is dominated by network
transfer, so they are roughly equivalent.

Under default settings, Elastica tends to beat Elasticsearch-PHP for
latency on one-off queries. ES-PHP 1.0 uses Guzzle3, which is a bit
"bulky". The first time ES-PHP fires off a query, it needs to autoload a
lot of Guzzle classes from disk, which can slow it down considerably
compared to Elastica. If you have a bytecode cache installed, or if you
use HHVM, this difference largely disappears.
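To see the first-request effect described above for yourself, a small timing harness can separate the first ("cold") call from later ("warm") calls. This is only a minimal sketch: the function name `measureColdVsWarm` is hypothetical, it assumes PHP 7.3 or newer for `hrtime()`, and the `usleep()` closure is a stand-in for a real client call such as a search request.

```php
<?php
declare(strict_types=1);

/**
 * Time $iterations calls of $request and report the first ("cold") call
 * separately from the remaining ("warm") calls. All times are milliseconds.
 * Hypothetical helper for illustration; not part of either client library.
 */
function measureColdVsWarm(callable $request, int $iterations): array
{
    if ($iterations < 2) {
        throw new InvalidArgumentException('Need at least 2 iterations.');
    }

    $timings = [];
    for ($i = 0; $i < $iterations; $i++) {
        $start = hrtime(true);                        // monotonic clock, nanoseconds
        $request();
        $timings[] = (hrtime(true) - $start) / 1e6;   // nanoseconds -> milliseconds
    }

    // Everything after the first call counts as "warm".
    $warm = array_slice($timings, 1);
    sort($warm);

    return [
        'cold_ms'        => $timings[0],
        // Upper median when the number of warm samples is even.
        'warm_median_ms' => $warm[intdiv(count($warm), 2)],
        'warm_max_ms'    => $warm[count($warm) - 1],
    ];
}

// Stand-in workload (assumption): replace the closure body with a real query.
$stats = measureColdVsWarm(function (): void {
    usleep(1000);
}, 5);

printf("cold: %.2f ms, warm median: %.2f ms\n", $stats['cold_ms'], $stats['warm_median_ms']);
```

Run it as a fresh CLI process each time, once with and once without a bytecode cache enabled. Class loading only happens on the first call within a process, so only the cold number should move much.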

If you switch ES-PHP over to the CurlMultiConnection class, the
difference largely disappears too, since this effectively replaces Guzzle
with a single class that performs the same workload.

Finally, ES-PHP 2.0.beta1 was tagged a month or so ago, and offers a new
"async" mode that better leverages curl_multi. It allows you to send
requests in parallel to nodes in batches. This crudely approximates
threading in other languages, and can give a huge boost in throughput.
From my (informal, unscientific) testing, it's about 90-170% faster than
ES-PHP 1.0 (note this was on a single node; the boost is theoretically
greater on a multi-node cluster). Even in synchronous, non-batch mode, it
was quite a bit faster:

https://lh3.googleusercontent.com/-eeGDxH49zBY/VRxumlUgqeI/AAAAAAAAAAM/j1wfunxM2t8/s1600/B9g-yvqIgAA0bat.png
I don't have a direct comparison against Elastica, but I imagine it will be
faster for the same reason it is faster than ES-PHP 1.0 ... it is simply
executing in parallel.
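For anyone curious what "executing in parallel" looks like at the curl_multi level, here is a rough sketch using only PHP's curl extension. It is not how ES-PHP 2.0 implements its async mode; the function name `fetchInParallel` and the timeout default are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Fetch several URLs concurrently with curl_multi.
 * Returns the response bodies keyed the same way as the input array.
 * Hypothetical helper for illustration; not taken from either client.
 */
function fetchInParallel(array $urls, int $timeoutSeconds = 5): array
{
    $multi = curl_multi_init();
    $handles = [];

    // Register one easy handle per URL on the shared multi handle.
    foreach ($urls as $key => $url) {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($handle, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $handle);
        $handles[$key] = $handle;
    }

    // Drive all transfers together until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            // Wait for socket activity instead of spinning the CPU.
            curl_multi_select($multi, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the bodies; handles are released when they go out of scope.
    $bodies = [];
    foreach ($handles as $key => $handle) {
        $bodies[$key] = curl_multi_getcontent($handle);
        curl_multi_remove_handle($multi, $handle);
    }

    return $bodies;
}
```

Calling it with a handful of search URLs against a local node (for example, several `_search` requests on `http://localhost:9200`, which is an assumed address) lets the requests overlap on the wire. The total time then tends toward that of the slowest request rather than the sum of all of them.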

Other benefits of the official library include someone (me) who is paid to
maintain it, and develop new features like the async functionality in 2.0.
ES-PHP also undergoes continuous integration testing against multiple ES
versions, using our extensive test suite (which all the clients run). That
means we support the entire set of Elasticsearch APIs, and support new APIs
shortly after they are added. It supports connection pooling, configurable
selection strategies, etc. It's highly pluggable, so you can write your own
behavior if you want. And finally, it has similar syntax to other language
clients, in case you work in a multi-language environment.

Now, with all that said, Elastica is a great project. Ruflin has done a
great job maintaining it over the years, and a lot of people prefer the API
to the more generic / verbose ES-PHP API. I've told people before that if
they are happy with Elastica, there isn't a suuuper compelling reason to
switch. The new 2.0 async behavior may be a reason, but the jury is still
out on that.

Cheers,
-Zach


(Zelfapp) #3

Really appreciate you reaching out. We are planning on continuing to use
ES-PHP for all of the reasons you laid out. Elastica is a great project and
we started with that, but ultimately decided to work with the officially
supported project. Our main use for Elasticsearch will be searching an
index of 25,000 products on an ecommerce site. We'll have to check out 2.0.
Thank you.


(ruflin) #4

Unsurprisingly, this is also a question I get asked very frequently. The answer from @zachary_tong describes the basic differences very well. Based on it, I also decided to publish a page that I started some time ago but never finished. It's a comparison between the two clients with some explanations: http://elastica.io/elastica-vs-elasticsearch-php/ By publishing it, I hope others will contribute to it.

One idea I have had in my head for quite some time, but never found the time to try out, is that Elastica could use elasticsearch-php as its transport layer. That way, Elastica would be an extension of the client and would profit from all speed improvements made to it. This could be possible because Elastica is, for the most part, only a mapper of objects and functions to arrays. Currently Elastica supports several different transport layers, but most of them (Thrift, Memcache) will be disabled with Elasticsearch 2.0 anyway, which will reduce the complexity here and make this idea more feasible.

If someone is interested in taking this idea further, let me know.

