Query results differ / Scan and scroll result in very low performance using the Python API


#1

Hi there! I am new to Elasticsearch, so these might be stupid questions.

The version in use is 2.3.5.


The first problem I encountered is that I get different results when using the Python API than when sending the HTTP request directly.

The query is as follows:
# Just a representation of my search query
POST http://server_ip/index_name/_search
{
  "query": {
    "bool": {
      "must": [{
        "term": {"lang": "en"}
      }]
    }
  },
  "size": 50
}

The result of the HTTP request is correct: every record returned has the value "en" in the "lang" field. However, the search results I obtain using the Python API are incorrect. Records in other languages are filtered out.


Another problem is that I'm using the elasticsearch.helpers.scan function to fetch 26,640,158 records from the server. However, performance drops significantly after the first 50,000 records. It now takes approximately 40 seconds per 1,000 records, which seems really slow to me.

My code is as follows:

from elasticsearch import helpers

# Set scan parameters (es is an existing Elasticsearch client)
res = helpers.scan(
    client=es,
    index="index_name",
    doc_type="doc_type",
    size=200)

for aResult in res:
    # Manipulate Data
    pass

The scroll only returns one record per iteration, and I wonder if I have set something wrong.


Sorry that I can't provide additional server details, since I am not the server administrator.
Can anyone help me with these problems or give me some advice? Thanks a lot!


(Honza Král) #2

Hi, could you please specify how you are calling the Python API to run the query in the first example?
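For comparison, here is a minimal sketch of how the same query body might be passed through the Python client. The helper names `build_lang_query` and `search_by_lang` are made up for illustration, and `es` is assumed to be an `elasticsearch.Elasticsearch` client; the point is only that the `body` should match the JSON sent over HTTP exactly.

```python
import json


def build_lang_query(lang, size=50):
    """Build the same bool/term query body that was sent over HTTP."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"lang": lang}}
                ]
            }
        },
        "size": size,
    }


def search_by_lang(es, index_name, lang):
    # `es` is assumed to be an elasticsearch.Elasticsearch client.
    # The whole query, including "size", travels in the `body` argument.
    return es.search(index=index_name, body=build_lang_query(lang))


# Print the body so it can be compared with the JSON in the HTTP request.
print(json.dumps(build_lang_query("en"), sort_keys=True))
```

If the printed body differs from what you send over HTTP (for example, a missing "bool" wrapper or a different field name), that would be the first place to look.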

For the scan issue, we sometimes see this with small size values [0]; try raising it. Also, don't worry about getting one record per iteration: the helper retrieves the documents in batches and only hands them to you one by one for convenience.
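To illustrate the batching, here is a simplified sketch (not the helper's actual implementation) of a generator that fetches documents `size` at a time but yields them one by one. `fetch_page` and the offset-based paging are hypothetical stand-ins for a scroll request; the real scroll API uses a scroll ID rather than an offset.

```python
def scan_sketch(fetch_page, size):
    """Yield hits one at a time while fetching them `size` at a time.

    `fetch_page(offset, size)` is a hypothetical stand-in for one
    round trip to the server; it returns a list of hits (empty when done).
    """
    offset = 0
    while True:
        page = fetch_page(offset, size)  # one request per batch
        if not page:
            break
        for hit in page:                 # hand hits out individually
            yield hit
        offset += len(page)


# Fake backend with 10 documents, recording every request made.
DOCS = list(range(10))
calls = []


def fake_fetch(offset, size):
    calls.append(offset)
    return DOCS[offset:offset + size]


results = list(scan_sketch(fake_fetch, size=4))
print(len(results), "documents from", len(calls), "requests")
```

With a larger `size`, the same number of documents is returned in fewer round trips, which is why raising `size` in `helpers.scan` is worth trying.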

[0] https://github.com/elastic/elasticsearch-py/issues/397

