Hi,
I want to search for a phrase with Elasticsearch, once against the _all field and once against only two fields (title and description). The phrases are read from a file that lists more than 10,000 keywords. Here is the code:
from elasticsearch import Elasticsearch
import os, sys
import requests
import json
es = Elasticsearch(['localhost:9200/'])
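# Read the keywords, one per line, dropping whitespace and surrounding quotes.
keyword_array = []
# Raw strings keep the backslashes literal (otherwise "\t" becomes a tab).
with open(r'localDrive\extract_keywords\t2.txt') as my_keywordfile:
    for keyword in my_keywordfile:
        keyword_array.append(keyword.strip().strip("'"))

# Run one phrase query per keyword and write each response as a line of JSON.
with open(r'LocalFile\_description_Results2.txt', 'w', encoding="utf-8") as f:
    for x in keyword_array:
        doc = {
            "query": {
                "multi_match": {
                    "query": x,
                    "type": "phrase",
                    "fields": ["title", "description"],
                }
            }
        }
        res = es.search(index='xxx_062617', body=doc)
        json.dump(res, f, ensure_ascii=False)
        f.write("\n")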
The query that matches against the _all field is the same except for "fields":
"multi_match": {
    "query": x,
    "type": "phrase",
    "fields": "_all",
}
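To make sure the two searches differ only in the "fields" value, both bodies could be built by one helper. This is only a sketch: build_phrase_query is a hypothetical name, and the example phrase is made up.

```python
def build_phrase_query(phrase, fields):
    """Build a multi_match phrase query body for the given fields."""
    return {
        "query": {
            "multi_match": {
                "query": phrase,
                "type": "phrase",
                "fields": fields,
            }
        }
    }


# Same phrase, two different field selections (the phrase is a made-up example).
two_fields_body = build_phrase_query("gene expression", ["title", "description"])
all_body = build_phrase_query("gene expression", "_all")
```

Each body can then be passed to es.search(index=..., body=...) exactly as in the loop above.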
Also, here is what I have for the mapping:
"dataset": {
    "properties": {
        "creators": { "type": "string" },
        "title": { "type": "string" },
        "description": { "type": "string" },
        "types": {
            "type": "string",
            "fields": {
                "raw": { "type": "string", "index": "not_analyzed" }
            }
        },
        ...
What happens is this: I get 101 records back when I query only title and description, but only 100 records when I query _all. And when I combine the IDs from both result sets and remove the duplicates, I see that only 86 of them are duplicates!
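One way to compare the two result sets by ID can be sketched as follows. It assumes each response has the usual hits.hits list with an _id on every hit. The helper names and the hand-written sample responses are hypothetical and are not my real data.

```python
def hit_ids(response):
    """Collect the _id of every hit in one search response."""
    return {hit["_id"] for hit in response["hits"]["hits"]}


def compare_hits(fields_response, all_response):
    """Return (shared, only_in_fields, only_in_all) as sets of IDs."""
    fields_ids = hit_ids(fields_response)
    all_ids = hit_ids(all_response)
    return fields_ids & all_ids, fields_ids - all_ids, all_ids - fields_ids


# Hypothetical responses standing in for real es.search() output.
fields_res = {"hits": {"hits": [{"_id": "1"}, {"_id": "2"}, {"_id": "3"}]}}
all_res = {"hits": {"hits": [{"_id": "2"}, {"_id": "3"}, {"_id": "4"}]}}

shared, only_fields, only_all = compare_hits(fields_res, all_res)
print(sorted(shared), sorted(only_fields), sorted(only_all))
```

The IDs that end up in only one of the two sets are the ones I cannot explain.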
My questions are:
1- Does "type": "phrase" work differently when I search the _all field?
2- Shouldn't I get more records when I search _all?
3- If _all includes every field, including title and description, why doesn't the _all query return all the records that were returned by querying title and description?
Thanks,