I wrote a Python script that takes each line from a file and queries Elasticsearch for that line_item. If there are results, it should write the line_item to a new file; otherwise it should continue to the next item. I want to search across all indexes and tags.
However, the queries come back empty even though I know there are matching results in my Elasticsearch cluster. I have pasted the script below.
import json
import datetime
from elasticsearch_dsl import MultiSearch, Search

# (This part is not too important)
# block_file and hits_file are file paths defined elsewhere (not shown).
with open('file_temp') as data_file:
    data = json.load(data_file)

with open(block_file, "w") as f:
    for result in data["results"]:
        # Note: no newline is written between these groups, so the last item
        # of one list runs into the first item of the next on the same line.
        f.write("\n".join(str(x) for x in result["scan"].get("domain", [])))
        f.write("\n".join(str(x) for x in result["scan"].get("ipv4", [])))
        f.write("\n".join(str(x) for x in result["scan"].get("url", [])))

# One search per line of block_file, sent together as a single _msearch batch.
ms = MultiSearch()
with open(block_file, "r") as terms:
    for term in terms:
        ms = ms.add(Search().query("match", query=term))
responses = ms.execute()
print(responses)

with open(hits_file, "w") as hits:
    for response in responses:
        if response.hits.total:
            hits.write(response.search.query)
Any chance you could post the documents that end up being indexed and the queries that you are sending? In general, being able to recreate the problem with a few curl commands would be useful, unless it is a Python-specific problem.
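One way to get such a curl reproduction is to write the script's batch out as an _msearch body. This is a minimal sketch, assuming one term per line as in the script above. Each body mirrors what `Search().query("match", query=term)` serializes to: a match query on a field literally named `query`, with the line's trailing newline still attached. The empty `{}` header line means no index is specified.

```python
import json


def msearch_body(lines):
    """Build an _msearch NDJSON payload mirroring the script's queries."""
    parts = []
    for term in lines:
        # Empty header: no index given, so the search goes to all indices.
        parts.append(json.dumps({}))
        # Same shape as Search().query("match", query=term).to_dict():
        # a match query on a field named "query", newline included.
        parts.append(json.dumps({"query": {"match": {"query": term}}}))
    # The _msearch format requires a newline after the last line as well.
    return "\n".join(parts) + "\n"
```

You could write `msearch_body(open(block_file))` to a file (here called `body.ndjson`, a hypothetical name) and replay it with `curl -XPOST 'http://localhost:9200/_msearch' --data-binary @body.ndjson`. Use `--data-binary` rather than `-d`, because `-d` strips the newlines that the format depends on.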
I am indexing thousands of logs per minute; however, this script doesn't index anything, it only queries existing data. When I curl via this command curl -XGET 'http://localhost:9200/_search?q= I get results back, although the results are limited, which is another issue: if there are 30k results, it will not bring them all back to the command line.
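The limited output from that curl call comes from the default `size` of 10. Below is a small sketch that builds the same URI search with an explicit `size`. The query value in the usage is a hypothetical placeholder, since the original command is cut off. A larger `size` alone will not return 30k hits, though. As far as I know, from Elasticsearch 2.1 on, `from + size` is capped by the `index.max_result_window` setting (10,000 by default).

```python
from urllib.parse import urlencode


def uri_search_url(q, size=100, host="http://localhost:9200"):
    """Build a URI search URL; without size, _search returns only 10 hits."""
    # urlencode escapes the query string, e.g. spaces become "+".
    return f"{host}/_search?{urlencode({'q': q, 'size': size})}"
```

For example, `uri_search_url("192.0.2.10")` gives a URL you can pass straight to `curl -XGET`.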
I ran tcpdump on the script's request: it shows the queries going out as a batch, and the response comes back as JSON. In the hits object, total is 0. There should be results, but none come back.
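When reading a captured _msearch response, it helps to separate two cases: a sub-search that ran and matched nothing, and one that failed. The response has a top-level `responses` array with one entry per query. A failed entry carries an `error` key instead of `hits`. This is a minimal sketch assuming the 2.x response shape, where `hits.total` is a plain integer. In 7.x and later it is an object with `value` and `relation` instead.

```python
import json


def hit_totals(msearch_response):
    """Return hits.total for each entry of an _msearch response body."""
    data = json.loads(msearch_response)
    totals = []
    for resp in data["responses"]:
        if "error" in resp:
            # A failed sub-search reports an error instead of hits.
            totals.append(None)
        else:
            totals.append(resp["hits"]["total"])
    return totals
```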
Below is the original doc I am querying from. The script pulls the ipv4 values and sends them off in a query. I have verified via print statements that it is sending the correct query. When I run the script, this is what it prints as the result:
(I took the leading < out of each line.)
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Here is the doc it's pulling data from and querying:
Thanks for the info. I think it would also be useful to see the JSON query that doesn't yield any results.
Note that you can just call _search with no query. That runs a match_all query, but by default it returns only 10 results. It seems that what you want to do is go through a lot of documents one by one, and in that case I think a scroll request would be beneficial. What ES version are you on?
The query is being done in the Python script. All I get is the empty responses, and if I look at the JSON response within the packets, it shows 0 hits for each request. The script takes the IP addresses and makes a query with all the addresses.
I am not familiar with the scroll request. I am currently using Elasticsearch 2.1.1.
Like you said, I need to be able to query everything and have every hit come back as a result, not just 10.
The point is that getting all the results back in a single round trip is probably too much, and paginating through a lot of results with the search API is going to cause deep pagination problems. Scroll should be used instead, as it is optimized for scrolling over all the documents that match a query.
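In practice, the official Python client wraps this pattern as `elasticsearch.helpers.scan`, and elasticsearch_dsl exposes `Search.scan()`. To show what happens underneath, here is a hand-rolled sketch of the scroll loop. It assumes a hypothetical `send(method, path, body)` callable that performs the HTTP request and returns the decoded JSON. The request shapes follow the 2.x scroll docs as I understand them: an initial search with `?scroll=`, repeated `/_search/scroll` calls with the latest scroll id, and a final clear.

```python
def scroll_all(send, body, index="_all", keep_alive="1m", size=500):
    """Yield every hit matching `body` using the scroll API.

    `send(method, path, body)` is a hypothetical transport callable that
    performs the HTTP request and returns the decoded JSON response.
    """
    # Initial search opens a scroll context kept alive for `keep_alive`.
    search_body = dict(body, size=size)
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}", search_body)
    scroll_id = resp.get("_scroll_id")
    try:
        # An empty batch of hits means the scroll is exhausted.
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            # The scroll id can change between calls; always use the latest.
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Free the server-side search context once we are done.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": [scroll_id]})
```

If the order of hits doesn't matter, adding `"sort": ["_doc"]` to `body` is the cheapest order to scroll in.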
That being said, it is hard for me to tell why you don't get back any results. I think it would be good to figure out what your query should be in JSON format. I'm pretty sure you could also enable trace logging if you are using the official Python client, so that the query would be printed out.
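This is a minimal sketch of turning on that trace output. It assumes an elasticsearch-py version that logs to the `elasticsearch.trace` logger, which as far as I know is true of the pre-8.x clients. Those clients log each request as a curl command you can replay. Without any network traffic, `ms.to_dict()` on the elasticsearch_dsl `MultiSearch` also shows the header and body of every query in the batch.

```python
import logging


def enable_es_trace(level=logging.DEBUG):
    """Print the requests the official Python client sends.

    Assumes an elasticsearch-py version that logs to the
    "elasticsearch.trace" logger (pre-8.x clients, as far as I know).
    """
    tracer = logging.getLogger("elasticsearch.trace")
    tracer.setLevel(level)
    tracer.addHandler(logging.StreamHandler())
    # Keep trace output from also reaching the root logger's handlers.
    tracer.propagate = False
    return tracer
```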
If you want to send the exact same query that Kibana sends, you can see in Kibana itself what query gets executed. It is certainly not the match query that you have.
Your current match queries are querying a field called query; I'm also not sure whether the line feed should be part of the query. Whether the query yields results also depends on how the data was indexed and which analyzer was used. Are you sure you want to search against all fields, or maybe just the array containing IPs? What does your mapping look like?
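This is a sketch of query bodies that address the first two points: the trailing line feed is stripped, blank lines are skipped, and the field is named explicitly. It assumes Elasticsearch 2.x, where the `_all` field searches every field. `_all` was deprecated in 6.0, so newer versions need another approach. The `ipv4` field name in the usage is hypothetical. Check your mapping, and if the IPs are mapped as the `ip` type, a `term` query on that field may fit better than `match`. With elasticsearch_dsl, the equivalent search would be `Search().query("match", **{field: term})`.

```python
def build_queries(lines, field="_all"):
    """Turn raw lines into match-query bodies, one per non-blank term.

    field="_all" searches every field (Elasticsearch 2.x); pass a specific
    field name (hypothetical, e.g. "ipv4") to search only that field.
    """
    bodies = []
    for line in lines:
        term = line.strip()  # drop the trailing line feed read from the file
        if not term:
            continue  # skip blank lines instead of sending empty queries
        bodies.append({"query": {"match": {field: term}}})
    return bodies
```

For example, `build_queries(open(block_file), field="ipv4")` would restrict every query to the hypothetical `ipv4` field.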