I have this data:
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T13:16", "price": 59900, "sellerName": "Lelles MC AB", "description": "KTM 690 Duke (Abs) M\u00e4tarst\u00e4llning: 450 mil F\u00e4rg: Vit Typ: Touring/Landsv\u00e4g Info: Mycket fin Duke 690 med rensad bakdel och handskydd.", "location": "Uppsala", "id": 345, "title": "KTM 690 Duke (Abs)", "modelYear": 2016, "url": "https://www.blocket.se/uppsala/KTM_690_Duke__Abs__79079911.htm?ca=11&w=3", "vehicleType": "Touring"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T14:00", "price": 12900, "sellerName": "Hondo", "description": "Hej! D\u00e5 va det dags att s\u00e4lja p\u00e4rlan. Det som \u00e4r gjort med crossen \u00e4r Nytt bakd\u00e4ck. Nya bromsbel\u00e4gg bak. Nytt sadel\u00f6verdrag. Kolvbytet gjord f\u00f6r 25 timmar sen. Inga l\u00e4ckage. Extra k\u00e5pset ing\u00e5r. Vid en smidig aff\u00e4r s\u00e5 ing\u00e5r en haspl\u00e5t. Crossen startar alltid p\u00e5 f\u00f6rsta eller andra kicken. Vid mer info f\u00e5r ni g\u00e4rna ringa p\u00e5 telefon mvh", "location": "Uddevalla", "id": 319, "title": "Honda Cr 125", "modelYear": 2001, "url": "https://www.blocket.se/goteborg/Honda_Cr_125_79080992.htm?ca=11&w=3", "vehicleType": "Cross/enduro"}}
{"_index": "simple", "_type": "motorcycle", "_source": {"date": "2018-04-28T14:15", "price": 22000, "sellerName": "Martin", "description": "G\u00e5tt - 2284mil.Haft sedan 2008 \u00e4r servad regelbunden p\u00e5 mc-firma. V\u00e4lsk\u00f6tt. Startar och g\u00e5r fint. Allt original. Vinterf\u00f6rvaring i garage. Besiktad senast maj -17. Ring eller maila", "location": "Norrk\u00f6ping", "id": 314, "title": "Honda VT 600C", "modelYear": 1999, "url": "https://www.blocket.se/ostergotland/Honda_VT_600C_79081306.htm?ca=11&w=3", "vehicleType": "Custom"}}
I'm indexing it with this Python code:
import json

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

INDEX_NAME = "simple"


# Method of my indexing class (the class body is omitted here)
def create_index(self, file_path):
    """
    Takes path to file containing JSON-formatted data
    and indexes it into an Elasticsearch index.
    """
    self.es = Elasticsearch()
    print('Creating index "{}"'.format(INDEX_NAME))
    request_body = {
        "settings": {
            "index": {
                "number_of_shards": 1,
                "number_of_replicas": 0
            }
        },
        "mappings": {
            "motorcycle": {
                "properties": {
                    "location": {
                        "type": "text",
                        "analyzer": "swedish"
                    },
                    "description": {
                        "type": "text",
                        "analyzer": "swedish"
                    }
                }
            }
        }
    }
    self.es.indices.create(index=INDEX_NAME, body=request_body)
    # One JSON action per line; blank lines are skipped because json.loads rejects them
    with open(file_path, "r", encoding="utf-8") as f_in:
        actions = (json.loads(line) for line in f_in if line.strip())
        print("Performed bulk index: {}".format(bulk(self.es, actions)))
    self.es.indices.refresh(index=INDEX_NAME)
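Because bulk() indexes whatever the generator yields, one way to confirm that every line of the file turns into an action is to build the actions separately and inspect them before anything is sent to Elasticsearch. This is a minimal sketch, assuming the file holds one JSON object per line in the format shown above; load_actions is a hypothetical helper name, and the sample input is a shortened excerpt of the data, not the real file.

```python
import io
import json


def load_actions(f_in):
    """Parse one bulk action per non-blank line (hypothetical helper)."""
    actions = []
    for line_no, line in enumerate(f_in, start=1):
        if not line.strip():
            continue  # json.loads would raise on an empty line
        action = json.loads(line)
        # Every action in the data above carries these three keys
        missing = {"_index", "_type", "_source"} - set(action)
        if missing:
            raise ValueError("line {}: missing {}".format(line_no, sorted(missing)))
        actions.append(action)
    return actions


# Shortened excerpt of the data file, with a blank line in between
sample = io.StringIO(
    '{"_index": "simple", "_type": "motorcycle", "_source": {"id": 345, "location": "Uppsala"}}\n'
    '\n'
    '{"_index": "simple", "_type": "motorcycle", "_source": {"id": 314, "location": "Norrk\\u00f6ping"}}\n'
)
actions = load_actions(sample)
print([a["_source"]["id"] for a in actions])  # [345, 314]
```

Passing the returned list to bulk(self.es, actions) should then index exactly len(actions) documents; comparing that count with the first element of the tuple bulk() returns is a quick sanity check.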
Now I'm trying to query the index with Postman for all documents with location: Uppsala, the location of the first document (running the same query from Python gives the same result):
POST to localhost:9200/simple/_search:
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "location": "uppsala"
          }
        }
      ]
    }
  }
}
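Since the same request was also sent from Python, here is a sketch of that body built as a plain dict. location_filter_query is a hypothetical helper; the dict it returns would be passed as body= to es.search(index="simple", ...), which is left out so the snippet runs without a cluster.

```python
import json


def location_filter_query(location):
    """Build the bool/filter/term body used in the Postman request (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # A term query looks up the value exactly as given; the query text is not analyzed
                    {"term": {"location": location}}
                ]
            }
        }
    }


body = location_filter_query("uppsala")
print(json.dumps(body))
```

Changing the argument to "uddevalla" or "norrköping" reproduces the other two requests.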
It returns nothing. The same happens if I change the location to uddevalla, which is also in the original data (second document). However, if I change the location to norrköping, it returns the third document, as it should.
What is the reason behind this erratic behaviour?
UPDATE:
The documents that are missing from the location filter results seem not to show up for any query at all. For example, this query:
{
  "query": {
    "bool": {
      "filter": [],
      "must": {
        "multi_match": {
          "fields": [
            "title^1.0",
            "description"
          ],
          "operator": "or",
          "query": "honda",
          "type": "cross_fields"
        }
      }
    }
  }
}
only returns one result (the one with location: Norrköping), while it should return two (the one with location: Uddevalla should also be returned).
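To make the expected result explicit, here is a plain-Python sketch that checks the three documents for the word "honda" in title or description, case-insensitively. It is only a rough stand-in for an OR multi_match (no analyzers, stemming, or scoring), and the docs list is a hand-copied excerpt of the data above with descriptions cut to their first words; naive_match is a hypothetical helper name.

```python
# Excerpt of the three documents above: id, title and the start of each description
docs = [
    {"id": 345, "title": "KTM 690 Duke (Abs)", "description": "KTM 690 Duke (Abs)"},
    {"id": 319, "title": "Honda Cr 125", "description": "Hej!"},
    {"id": 314, "title": "Honda VT 600C", "description": "G\u00e5tt - 2284mil."},
]


def naive_match(doc, word, fields=("title", "description")):
    """Rough stand-in for an OR multi_match: does any field contain the word as a token?"""
    word = word.lower()
    return any(word in doc[field].lower().split() for field in fields)


expected = [doc["id"] for doc in docs if naive_match(doc, "honda")]
print(expected)  # [319, 314]
```

Comparing this list with the ids in the search hits shows exactly which document is missing.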