Multisearch coming back empty

Shimon_k · July 12, 2016, 6:20pm

I wrote a Python script that takes each line from a file and queries Elasticsearch for that line item. If there are results, it should write the line item to a new file; otherwise it should continue to the next item. I want to search across all indices and tags.

However, the query comes back empty even though I know there are matching results in my Elasticsearch. I have pasted the script below.

import json
import datetime
from elasticsearch_dsl import MultiSearch, Search
from elasticsearch_dsl.connections import connections

timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

connections.create_connection(hosts=['x.x.x.x'])

block_file = "myfile_{0}".format(timestamp)
hits_file = "hits_{0}".format(timestamp)

# (This part is not too important)
with open('file_temp') as data_file:
    data = json.load(data_file)
with open(block_file, "w") as f:
    for result in data["results"]:
        f.write("\n".join(str(x) for x in result["scan"].get("domain", [])))
        f.write("\n".join(str(x) for x in result["scan"].get("ipv4", [])))
        f.write("\n".join(str(x) for x in result["scan"].get("url", [])))

ms = MultiSearch()
with open(block_file, "r") as terms:
    for term in terms:
        ms = ms.add(Search().query("match", query=term))

responses = ms.execute()
print(responses)
with open(hits_file, "w") as hits:
    for response in responses:
        if response.hits.total:
            hits.write(response.search.query)

Any help would really be appreciated!

javanna · July 14, 2016, 8:04am

Any chance you could post the documents that end up being indexed and the queries that you are sending? In general, being able to recreate the problem with a few curl commands would be useful, unless it is a python specific problem.

Thanks

Shimon_k · July 14, 2016, 12:44pm

I am indexing thousands of logs per minute; however, this script is not for indexing but for querying existing data. When I curl via this command curl -XGET 'http://localhost:9200/_search?q= I get results back, although it limits the results, which is another issue. If there are 30k results, it will not bring them all back to the command line.

I did a tcpdump on the script's request and it shows the queries going out (as a batch) and JSON coming back. Under hits, total is 0. There should be results, but it comes back with none.

Below is the original doc I am querying from. The script pulls the ipv4 values and sends them off in a query. I have verified this via print statements: it is sending the correct query out. When I run the script, this is what it prints as the result:
(I took the < out)
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>
Response: []>

Here is the doc that it's pulling data from and querying:

{
  "date": "2016-07-06T15:28:37.785274",
  "plugins": {
    "0": "iocextract"
  },
  "payloads": 1,
  "results": [
    {
      "path": "ingest/arbor.txt",
      "source_meta": {},
      "archive": "file",
      "scan": {
        "ipv4": [
          "2.58.8.170",
          "19.102.2.77"
        ]
      },
      "plugin": "iocextract",
      "payload_id": 0,
      "ouuid": "dca38a95-3b84-443f-81e6-bcdfe821fb31",
      "uuid": "dca38a95-3b84-443f-81e6-bcdfe821fb31",
      "size": 3848,
      "filename": "ar.txt"
    }
  ]
}

I have verified that there are results in my index for these queries.

Thanks for the help!

javanna · July 14, 2016, 12:48pm

Thanks for the info. I think it would also be useful to have the JSON query that doesn't yield any results.

Note that you can just do _search with no query; that is a match_all query, but it returns only 10 results by default. It seems that what you want to do is go through a lot of documents one by one. In that case I think using a scroll request would be beneficial. What ES version are you on?

Shimon_k · July 14, 2016, 12:56pm

The query is being done in the Python script. All I get is the empty responses, and if I look at the JSON response within the packets it shows 0 hits for each request. The script takes the IP addresses and makes a query with all the addresses.

I am not familiar with the scroll request. I am currently using elasticsearch-2.1.1

Like you said, I need to be able to query everything and have every hit come back as a result, not just 10.

javanna · July 14, 2016, 1:11pm

Have a look here.

The point is that getting all the results back in a single round is probably too much, and paginating through a lot of results with the search API is going to cause deep pagination problems. Scroll should be used instead, as it is optimized for scrolling over all the documents that match a query.
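A minimal sketch of the scroll pattern described above. It assumes a client that behaves like the official `elasticsearch` Python client, whose `search` and `scroll` methods accept a `scroll` keep-alive and a `body` argument. The helper name `iter_all_hits`, the `2m` keep-alive, and the page size are illustrative choices, not something taken from this thread.

```python
def iter_all_hits(client, body, index=None, keep_alive="2m", page_size=500):
    """Yield every hit matching `body`, one scroll page at a time.

    `client` is assumed to behave like elasticsearch.Elasticsearch:
    search(..., scroll=...) opens the scroll context and
    scroll(scroll_id=..., scroll=...) fetches the next page.
    """
    page = client.search(index=index, body=body, scroll=keep_alive, size=page_size)
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit
        # Each response carries the scroll id to use for the next page.
        page = client.scroll(scroll_id=page["_scroll_id"], scroll=keep_alive)
```

With a real connection this could be driven as `for hit in iter_all_hits(Elasticsearch(['x.x.x.x']), {"query": {"match_all": {}}})`. The client library also ships `elasticsearch.helpers.scan`, which wraps the same loop.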

That being said, it is hard for me to answer why you don't get back any results. I think it would be good to try and figure out what your query should be in json format. Pretty sure you could also enable trace logging if you are using the official python client so that the query would be printed out.
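One way the trace logging mentioned above might be switched on, as a sketch using only the standard `logging` module. It assumes the official Python client writes its requests to a logger named `elasticsearch.trace`; the helper name `enable_es_trace` is made up for this example.

```python
import logging
import sys


def enable_es_trace(stream=sys.stderr):
    """Attach a stream handler to the client's trace logger.

    Assumption: the official client logs each request to the logger
    named "elasticsearch.trace".
    """
    tracer = logging.getLogger("elasticsearch.trace")
    tracer.setLevel(logging.DEBUG)
    tracer.addHandler(logging.StreamHandler(stream))
    # Keep trace output out of the root logger's handlers.
    tracer.propagate = False
    return tracer
```

Calling `enable_es_trace()` before `ms.execute()` should then print each outgoing query, which makes it easier to compare against what Kibana sends.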

Cheers
Luca

Shimon_k · July 14, 2016, 2:52pm

This is what my query looked like in the tcpdump:

{

}.{
  "query": {
    "match": {
      "query": "14.12.0.22\n"
    }
  }
}.{

}.{
  "query": {
    "match": {
      "query": "5.25.43.94\n"
    }
  }
}.{

}.{
  "query": {
    "match": {
      "query": "18.16.4.94\n"
    }
  }
}.{

}.{
  "query": {
    "match": {
      "query": "67.79.16.0\n"
    }
  }
}.{

}.{
  "query": {
    "match": {
      "query": "26.58.19.170\n"
    }
  }
}.{

}.{
  "query": {
    "match": {
      "query": "5.20.12.178\n"
    }
  }
}.{

}.{
  "query": {
    "match": {
      "query": "4.28.3.94\n"
    }
  }
}.{

}.{
  "query": {
    "match": {
      "query": "148.16.4.6\n"
    }
  }
}.{

}.{
  "query": {
    "match": {
      "query": "211.10.25.30\n"
    }
  }
}.{

}.{
  "query": {
    "match": {
      "query": "211.76.23.199\n"
    }
  }
}.{

}.{
  "query": {
    "match": {
      "query": "38.1.225.40\n"
    }
  }
}.{

}.{
  "query": {
    "match": {
      "query": "115.39.11.92\n"
    }
  }
}.{

}.{
  "query": {
    "match": {
      "query": "19.102.37.77"
    }
  }
}.

javanna · July 14, 2016, 3:14pm

Have you tried running those queries against your datasets to see whether they return results?

Which field would you like to query exactly? Also, the analyzer for the field might play a role in whether or not you get results back.
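To run the captured queries by hand, they can be serialized into the newline-delimited format that `_msearch` expects: a header line followed by a body line per search, with a trailing newline at the end. The sketch below uses only the standard library; the helper name `msearch_body` is an assumption, and the single sample query is copied from the capture posted earlier in the thread.

```python
import json


def msearch_body(queries):
    """Serialize search bodies into the newline-delimited _msearch format."""
    lines = []
    for query in queries:
        lines.append(json.dumps({}))     # header: no index or type restriction
        lines.append(json.dumps(query))  # body: the search itself
    # The payload has to end with a newline.
    return "\n".join(lines) + "\n"


# One query from the capture, trailing "\n" in the term included.
captured = [{"query": {"match": {"query": "14.12.0.22\n"}}}]
print(msearch_body(captured), end="")
```

Saved to a file (say `requests`, a hypothetical name), the output could be replayed with something like `curl -XGET 'http://localhost:9200/_msearch' --data-binary @requests`, which takes the Python script out of the picture.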

Shimon_k · July 14, 2016, 3:17pm

If I enter those IP addresses in Kibana I get results. I want to query against all fields across all indices.

javanna · July 14, 2016, 3:47pm

If you want to send the exact same query that Kibana sends, you can see in Kibana itself what query gets executed. It is for sure not the match query that you have.

Your current match queries are querying a field called query. I am also not sure whether the line feed should be part of the query. Whether the query yields results also depends on how the data was indexed and which analyzer was used. Are you sure you want to search against all fields, or maybe just the array containing IPs? What does your mapping look like?
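A sketch of how the two points above might be addressed, written with plain dictionaries so it runs without a cluster. In elasticsearch_dsl the keyword arguments of `query("match", ...)` are field names, so `query=term` targets a field literally named `query`, and iterating over a file yields lines that still end in `"\n"`. The helper name `build_query`, the choice of a `query_string` query (which, with no field given, falls back to the index's default field), and the quoting of the term are all assumptions to be checked against the actual mapping.

```python
def build_query(raw_line):
    """Return a search body for one line of the terms file, or None if blank."""
    term = raw_line.strip()  # drops the trailing "\n" left by file iteration
    if not term:
        return None
    # Quote the term so characters such as ':' and '/' in URLs are not
    # parsed as query_string operators; escape embedded backslashes and quotes.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return {"query": {"query_string": {"query": '"{0}"'.format(escaped)}}}


lines = ["14.12.0.22\n", "\n", "19.102.37.77"]
bodies = [b for b in map(build_query, lines) if b is not None]
```

Each body could then be added to the multi-search with `ms.add(Search.from_dict(body))`. If only the IP array should be searched, a `match` query naming that field directly would be the narrower alternative.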

Cheers
Luca
