Some elasticsearch thrift performance tests - python


(Alberto Paro-2) #1

I'm writing an elasticsearch "driver" in python.
I started the project after looking at pyelasticsearch, and I created a new project
because I didn't like the previous API and wanted to experiment with a new approach.

The project is in a very alpha state, but I have implemented a connection first using the standard library via HTTP,
then using the Thrift interface.

This is a small dump of results.

(pyes)MBPAlbertoParo:tests alberto$ python generate_dataset.py 10000
samples.shelve generated with 10000 samples

Urllib

(pyes)MBPAlbertoParo:tests alberto$ python performance.py
time: 0:00:08.652321
(pyes)MBPAlbertoParo:tests alberto$ python performance.py
time: 0:00:08.282428
(pyes)MBPAlbertoParo:tests alberto$ python performance.py
time: 0:00:08.889818
(pyes)MBPAlbertoParo:tests alberto$

Thrift

(pyes)MBPAlbertoParo:tests alberto$ python performance.py
time: 0:00:04.448639
(pyes)MBPAlbertoParo:tests alberto$ python performance.py
time: 0:00:04.812295
(pyes)MBPAlbertoParo:tests alberto$ python performance.py
time: 0:00:04.301892
(pyes)MBPAlbertoParo:tests alberto$
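For context, a minimal sketch of what a timing harness like the `performance.py` above could look like. Everything here is an assumption: the client object, its `index()` signature, and the shelve layout are hypothetical stand-ins, not the actual pyes code.

```python
import shelve
from datetime import datetime

def run_benchmark(client, path="samples.shelve"):
    """Index every sample in the shelve file and print the elapsed time.

    The client and its index() signature are hypothetical; the real
    pyes API may differ.
    """
    samples = shelve.open(path, flag="r")
    start = datetime.now()
    for key, doc in samples.items():
        client.index(doc, "test-index", "test-type", key)  # hypothetical call
    elapsed = datetime.now() - start
    samples.close()
    print("time:", elapsed)
    return elapsed
```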

This is a first step. I'm investigating using multiprocessing and a producer/consumer pattern to increase parallelism/throughput.
I'm open to suggestions and hints.
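One way the producer/consumer idea could be sketched with the standard library's `multiprocessing` module. This is only an illustration under assumptions: the indexing call is a commented placeholder, and `run_parallel`/`worker` are names invented here, not part of pyes.

```python
import multiprocessing

def worker(queue, counter):
    """Consumer: process (key, doc) pairs until a None sentinel arrives."""
    while True:
        item = queue.get()
        if item is None:  # sentinel: no more work
            break
        key, doc = item
        # A real worker would call something like:
        # client.index(doc, "test-index", "test-type", key)
        with counter.get_lock():
            counter.value += 1  # shared tally of processed docs

def run_parallel(samples, num_workers=4):
    """Producer: feed all samples to a pool of consumer processes."""
    queue = multiprocessing.Queue()
    counter = multiprocessing.Value("i", 0)
    workers = [multiprocessing.Process(target=worker, args=(queue, counter))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    for item in samples:
        queue.put(item)
    for _ in workers:
        queue.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    return counter.value
```

Whether this beats a single process depends on where the time goes; if the bottleneck is network round-trips, a pool of workers sharing one queue should help.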

Ciao,
Alberto


(Shay Banon) #2

Looks good, Alberto! I see you already have it on github
(http://github.com/aparo/pyes); I added it to the projects page on the site.

p.s. Love the name (pyes)!

On Sat, Sep 11, 2010 at 12:14 AM, Alberto Paro alberto.paro@gmail.com wrote:



(system) #3