Another strange behavior in terms of performance. If I time the
following code I get good performance: about 100k search results in
2 seconds.
However, if I try to retrieve the top search result using the result
object, the performance goes down by about 100 times.
conn = ES(['128.55.54.149:9200','128.55.54.149:9201'], timeout=20)
q1 = TermQuery("tax_name", query.lower().strip())
results = conn.search(query=q1)  # getting the top hit
if results:
    return results[0]
I would think that once the search is done and the results are returned
in a result object, the post-processing would take negligible overhead.
I am not seeing that. Is there anything I am messing up?
iirc, pyes fetches results lazily. That is, it won't actually execute the search until you start doing anything with 'results'. If you dig a bit deeper, you'll probably find that your search isn't actually being executed at all in your first example.
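One quick way to check is to force the result set to materialise inside the timed region. A rough sketch (the lazy-iteration behaviour is an assumption here; `search_fn` is a hypothetical stand-in for a closure over `conn.search`):

```python
import time

def timed_search(search_fn):
    """Time a search, forcing any lazy result object to actually execute."""
    start = time.time()
    results = search_fn()   # with a lazy client, no HTTP call may happen yet
    hits = list(results)    # iterating forces the search to really run
    elapsed = time.time() - start
    return hits, elapsed
```

If the time per call jumps once the `list()` is added, the cheap first measurement was only timing the construction of the lazy result object, not the search itself.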
Thanks Dan. In that case I would say that the performance I am getting is
pretty low compared to what I would expect. Currently I am getting 1000
search hits in a 2-3 second interval. Can I do anything to improve this?
Also, what would be an optimum number one could get with an ES
implementation?
With the tweaks suggested by Radu in an earlier thread I was able to index
12-15K records per second, and expected to get 100k results per second
during searching.
-Abhi
Apologies for pushing it once more. Can anyone help me figure out why my
search queries on ES are slow? I am able to get 1000 results in 2-3
seconds (details in the first post of this thread). I would expect the
performance to be at least 50-100 times faster than this. I hope that's
a realistic expectation.
best,
-Abhi
Be sure to use the latest versions of pyes and requests, or the development version of pyes from GitHub (which goes back to using urllib3).
We had trouble with a certain combination of pyes and requests a few months ago: requests was fetching data from the network one byte at a time, with abysmal performance…
Hi Anton
I have upgraded the requests module: requests version 0.13.8 and pyes
version 0.19. I am still able to make only about 1000 searches per 2-3
second period.
-Abhi
Hi guys
Sorry, I will have to push this again. I am still not able to get
optimum performance from ES for searches.
My index contains 1.5 million records and I am able to make 800-1000
searches in 2 seconds using pyes. We have been trying to optimize ES
for our production work for a while now, so any help will be
appreciated.
Best,
-Abhi
And just in case anyone is interested, this is how I am testing the
search performance:
loop_start = time.clock()
q1 = TermQuery("tax_name", "cellvibrio")
for x in xrange(1000000):
    if x % 1000 == 0 and x > 0:
        loop_check_point = time.clock()
        print 'took %s secs to search %d records' % (loop_check_point - loop_start, x)
    results = conn.search(query=q1)
    if results:
        for r in results:
            pass
        print len(results)
    else:
        pass
-Abhi
Guys, I am stuck and need some guidance in order to move forward with ES
and use it.
The main bottleneck is the number of search queries I am able to make
(1000 queries in 2-3 seconds). Can this be scaled up?
I have also asked this on Stack Overflow but didn't get any response.
Thanks!
-Abhi
Just wondering if your test is correct. I mean: what do you want to test? Whether ES can deal with 1000 search requests?
If that is your question, you should parallelize your tests. In Java, you would create more test threads; I don't know how with Python and pyes.
It seems that with your test you create 1000 calls, one by one, just as if you were running 1000 curl requests against http://www.google.com and seeing how long each takes...
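In Python, one low-effort way to try this is a thread pool: network-bound calls release the GIL while waiting on the socket, so the round-trips overlap. A sketch (assumes the client is thread-safe; `search_fn` is a hypothetical stand-in for something like `lambda: conn.search(query=q1)`):

```python
import time
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed pool

def parallel_search(search_fn, n_queries, n_threads=8):
    """Issue n_queries searches spread across n_threads concurrent threads."""
    pool = ThreadPool(n_threads)
    start = time.time()
    # each map task fires one search; threads keep several requests in flight
    results = pool.map(lambda _: search_fn(), range(n_queries))
    pool.close()
    pool.join()
    return results, time.time() - start
```

Comparing the elapsed time for, say, 1000 queries at 1, 4, and 16 threads would show how much of the 2-3 seconds is per-request latency rather than server capacity.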
My 2 cents.
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On Thu, 2012-08-23 at 14:25 -0700, Abhishek Pratap wrote:
> Guys I am stuck and need some guidance in order to move fwd with ES
> and use it.
> Mainly the bottle neck is search queries I am able to make ( 1000
> queries in 2-3 seconds). Can this be scaled up ?
You haven't provided any info about what hardware you are running this
on. You're currently getting a search result every 2ms. 500 queries per
second may be a good result.
We don't know. We don't know what queries you're running, how you're
running your queries, what your data looks like, how much RAM you have,
what CPU you have, what disks, etc
As such, it's very difficult to tell you whether you can get better
results with your current setup.
Thanks for your reply. I didn't think about parallelizing the code, as I
was interested in seeing the best I could do with one thread, and then
maybe using multiple threads.
Here are the answers to the questions Clint asked.
ES is running on a debian5-2 OS, x86_64, 16 cores with 132 GB of RAM. The
filesystem being used is IBM GPFS.
The data I indexed had the following structure:
{
    "tax_id": 45,
    "taxa_name": "Mycocosm"
}
About 1.5 million such records were indexed into ES.
The query I am making is basically:
{"taxa_name": "mycocosm"}
Once I am able to get good performance (I am not sure how good is good
in terms of ES), the plan is to insert 500 million such records and query them.
Thanks!
-Abhi
> Thanks for your reply. I dint think about parallelizing the code as I
> was interested in seeing the best I could do with one thread and then
> may be use multiple threads.
Right - running in parallel will very likely get you better throughput.
> ES is running on a debian5-2 OS, x86_64 16 cores with 132 Gb of ram.
> The filesystem being used is IBM GPFS.
You don't mention how much of that RAM is dedicated to the ES heap, or
whether anything else is running on that box.
You really want elasticsearch to be the only thing consuming resources,
so don't share it with other code. Re heap settings, you should make
the heap about 60-70% of total RAM, so that there is plenty of space
left for kernel file system caches.
Also, (you may already be doing this) you want to make sure that none of
that memory is ever being swapped out (see bootstrap.mlockall and ulimit
-l)
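For example (a sketch with illustrative values only, not a recommendation for your box): give the heap roughly 60-70% of RAM via the `ES_HEAP_SIZE` environment variable, raise the memory-lock limit, and set `bootstrap.mlockall: true` in elasticsearch.yml before starting the node.

```shell
# illustrative values only: size the heap and allow memory locking
export ES_HEAP_SIZE=80g   # ~60% of 132 GB, leaving room for FS caches
ulimit -l unlimited       # permit mlockall to pin the heap in RAM
bin/elasticsearch         # with bootstrap.mlockall: true in elasticsearch.yml
```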
> The data I indexed had the following structure
> {
>     "tax_id" : 45
>     "taxa_name" : Mycocosm
> }
> The query I am making is basically
> {"taxa_name":"mycocosm"}
Do you need full text search on 'mycocosm'? eg do you need to find the
most relevant match, or find that text in text like "Mycocosms are FUN!"
Or do you just need to use the taxa_name as a filter (the equivalent of
WHERE taxa_name = 'mycocosm')
If the latter, then consider making taxa_name 'not_analyzed' and using a
term FILTER to search for it. Filters don't have the scoring phase of
queries, and their results can be cached, which means they perform
better.
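As a sketch, the request body for such a filtered query could look like the following (field name taken from the mapping above; the exact client call to send it depends on the pyes version):

```python
# filtered query: match everything, then apply a cacheable term filter.
# Assumes "taxa_name" is mapped as not_analyzed, so the term matches exactly.
body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"term": {"taxa_name": "mycocosm"}},
        }
    }
}
```

Because the term filter skips scoring and its bitset can be cached, repeating the same lookup should be cheaper than the equivalent scored term query.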
> Once I am able to get good performance(I am not sure how good is good
> in terms of ES) the plan is to insert 500 million such records and
> query them.
Definitely try this in parallel, and from a different box - not the same
node where es is running
clint