Searching for a phrase in _all returns fewer records than searching for it in a small number of fields


#1

Hi,
I want to search for a particular phrase with Elasticsearch, once against _all and once against only 2 fields. The phrases are taken from a file listing more than 10000 keywords. Here is the code:

    from elasticsearch import Elasticsearch
    import os, sys
    import requests
    import json

    es = Elasticsearch(['localhost:9200/'])

    # Read one keyword per line, stripping whitespace and surrounding single quotes
    keyword_array = []
    with open(r'localDrive\extract_keywords\t2.txt') as my_keywordfile:
        for keyword in my_keywordfile:
            keyword_array.append(keyword.strip().strip("'"))

    # Run one phrase query per keyword and write each response as a line of JSON
    with open(r'LocalFile\_description_Results2.txt', 'w', encoding="utf-8") as f:
        for x in keyword_array:
            doc = {
                "query": {
                    "multi_match": {
                        "query": x,
                        "type": "phrase",
                        "fields": ["title", "description"],
                    }
                }
            }
            res = es.search(index='xxx_062617', body=doc)
            json.dump(res, f, ensure_ascii=False)
            f.write("\n")

The query that searches _all instead is:

    "multi_match": {
        "query": x,
        "type": "phrase",
        "fields": "_all",
    }

Also, here is what I have for the mapping (excerpt):

    "dataset": {
        "properties": {
            "creators": { "type": "string" },
            "title": { "type": "string" },
            "description": { "type": "string" },
            "types": {
                "type": "string",
                "fields": {
                    "raw": { "type": "string", "index": "not_analyzed" }
                }
            },
            ...

What happens is this: I get 101 records if I query only title and description, but only 100 records if I query _all. And when I combine the IDs from both result sets and remove the duplicates, I see that only 86 of the records are duplicates, i.e. appear in both sets!
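
The ID comparison described above can be sketched with Python sets. This is only an illustration, not the code actually used: the helper name `hit_ids` and the two sample responses are hypothetical, and the one assumption about Elasticsearch is the usual response shape, where each entry under `hits.hits` carries an `_id`.

```python
def hit_ids(response):
    """Return the set of document _ids found in one search response."""
    return {hit["_id"] for hit in response["hits"]["hits"]}


# Hypothetical responses standing in for the results of the two queries.
fields_response = {"hits": {"hits": [{"_id": "a"}, {"_id": "b"}, {"_id": "c"}]}}
all_response = {"hits": {"hits": [{"_id": "b"}, {"_id": "c"}, {"_id": "d"}]}}

fields_ids = hit_ids(fields_response)
all_ids = hit_ids(all_response)

shared = fields_ids & all_ids        # ids returned by both queries
only_fields = fields_ids - all_ids   # ids missing from the _all results
combined = fields_ids | all_ids      # unique ids across both queries

print(len(shared), len(only_fields), len(combined))
```

A non-empty `only_fields` is the surprising case here, since every document matched on title or description would be expected to match on _all as well.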

My questions are:
1. Does type: phrase work differently when I search _all?
2. Shouldn't I get more records when I search _all?
3. If _all includes every field, including title and description, why doesn't searching _all cover all the records returned by querying title and description?

Thanks,


#2

Hi everyone!
I found the issue! I was not passing size as a parameter, so each search returned only the default 10 hits. After I removed the duplicate _ids from the retrieved records, the number of unique records from _all came out lower than from the other query. When I set size to 10000, the problem was solved!
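
For anyone hitting the same symptom, here is a minimal sketch of the fix, assuming the same client call style as in the first post (`es.search(index=..., body=...)`). The helper names `build_phrase_query` and `search_ids` are made up for illustration. Putting `size` in the request body overrides the default of 10 hits per search; 10000 is the value I used, and it is also the default upper limit for `from + size` (`index.max_result_window`).

```python
def build_phrase_query(phrase, fields, size=10000):
    """Build a multi_match phrase query body that returns up to `size` hits."""
    return {
        "size": size,  # without this, only the default 10 hits come back
        "query": {
            "multi_match": {
                "query": phrase,
                "type": "phrase",
                "fields": fields,
            }
        },
    }


def search_ids(es, index, phrase, fields, size=10000):
    """Run the query on an Elasticsearch client and return the set of hit _ids."""
    res = es.search(index=index, body=build_phrase_query(phrase, fields, size))
    return {hit["_id"] for hit in res["hits"]["hits"]}
```

With a client `es`, the results of `search_ids(es, 'xxx_062617', x, ['title', 'description'])` and `search_ids(es, 'xxx_062617', x, ['_all'])` can then be compared directly as sets. If a single phrase can match more than 10000 documents, the scroll API is probably the better route.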

