I'm writing an elasticsearch "driver" in Python.
I started the project by looking at pyelasticsearch, and I created a new project
because I don't like the previous API and I want to experiment with a new approach.
The project is in a very alpha state, but I have first implemented a connection over HTTP using the standard library,
then one using the Thrift interface.
This is a small dump of the results.
(pyes)MBPAlbertoParo:tests alberto$ python generate_dataset.py 10000
samples.shelve generated with 10000 samples
This is a first step. I'm investigating multiprocessing and a producer/consumer setup to increase parallelism and throughput.
I'm open to suggestions and hints.
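
The producer/consumer idea mentioned above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not code from the project: `index_batch` is a hypothetical placeholder for whatever bulk-index call the driver ends up exposing, and `run`, `producer`, `consumer` and the default batch sizes are names made up for illustration. It also assumes a POSIX host where the `fork` start method is available.

```python
import multiprocessing as mp

SENTINEL = None  # tells a consumer that no more work is coming


def index_batch(batch):
    """Hypothetical stand-in for the real bulk-index call; returns docs handled."""
    return len(batch)


def producer(queue, docs, batch_size, n_consumers):
    """Cut docs into batches, queue them, then send one sentinel per consumer."""
    for start in range(0, len(docs), batch_size):
        queue.put(docs[start:start + batch_size])
    for _ in range(n_consumers):
        queue.put(SENTINEL)


def consumer(queue, results):
    """Pull batches until the sentinel arrives, then report the running total."""
    handled = 0
    while True:
        batch = queue.get()
        if batch is SENTINEL:
            break
        handled += index_batch(batch)
    results.put(handled)


def run(docs, batch_size=100, n_consumers=4):
    # Assumption: POSIX host where the "fork" start method is available.
    ctx = mp.get_context("fork")
    # Bounded queue, so the producer cannot run far ahead of the consumers.
    queue = ctx.Queue(maxsize=n_consumers * 2)
    results = ctx.Queue()
    workers = [
        ctx.Process(target=consumer, args=(queue, results))
        for _ in range(n_consumers)
    ]
    for worker in workers:
        worker.start()
    # The parent process acts as the producer.
    producer(queue, docs, batch_size, n_consumers)
    # Collect one total per worker before joining, so no worker blocks on exit.
    total = sum(results.get() for _ in workers)
    for worker in workers:
        worker.join()
    return total
```

Calling `run(docs)` with the list of generated samples returns the number of documents the workers handled. The bounded queue is a design choice rather than a requirement: it keeps memory use flat when the producer reads samples faster than the consumers can send them.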