Elasticsearch not returning hits for multi-valued field


(Pat Ferrel) #1

I am using Elasticsearch with no modifications whatsoever. This means the mappings, norms, and analyzed/not_analyzed are all default config. I have a very small data set of two items for experimentation purposes (trying to migrate from Solr). The items have several fields but I query only on one, which is a multi-valued/array of strings field. The doc looks like this:

{
   "_index": "index_profile",
   "_type": "items",
   "_id": "ega",
   "_version": 1,
   "found": true,
   "_source": {
      "clicked": [
         "ega"
      ],
      "profile_topics": [
         "Twitter",
         "Entertainment",
         "ESPN",
         "Comedy",
         "University of Rhode Island",
         "Humor",
         "Basketball",
         "Sports",
         "Movies",
         "SnapChat",
         "Celebrities",
         "Rite Aid",
         "Education",
         "Television",
         "Country Music",
         "Seattle",
         "Beer",
         "Hip Hop",
         "Actors",
         "David Cameron",
         ... // other topics
      ],
      "id": "ega"
   }
}

A sample query is:

GET /index_profile/items/_search
{
    "size": 10,
    "query": {
        "bool": {
            "should": [{
                "terms": {
                    "profile_topics": [
                        "Basketball"
                    ]
                }
            }]
        }
    }
}

Again there are only two items and the one listed should match the query because the profile_topics field matches with the "Basketball" term. The other item does not match. I only get a result if I ask for clicked = ega in the should.

With Solr I would probably specify that the fields are multi-valued string arrays and are to have no norms and no analyzer so profile_topics are not stemmed or tokenized since all values should be treated as tokens (even the spaces). Not sure this would solve the problem but it is how I treat similar data on Solr.

I assume I have run afoul of some norm/analyzer/TF-IDF issue. If so, how do I solve this so that even with two items the query will return ega? If possible I'd like to solve this index- or type-wide rather than per field.


(Dan Tuffery) #2

You're not getting a match because the terms query doesn't analyze the search term. At index time the value was analyzed with the standard analyzer, so the term stored in the index is basketball (lowercased), and your unanalyzed search term Basketball doesn't match it. Change the terms query to a match query.
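
To make the mismatch concrete, here is a toy Python model of it. This is not Elasticsearch code: the helper names (analyze, build_index, terms_query, match_query) are made up for illustration, and analyze is only a rough stand-in for the standard analyzer (whitespace split plus lowercasing), assuming that is close enough for values like these.

```python
from collections import defaultdict

def analyze(text):
    # Rough stand-in for the standard analyzer: split on whitespace, lowercase.
    return [tok.lower() for tok in text.split()]

def build_index(docs, field):
    # Map each analyzed token to the set of doc ids that contain it.
    index = defaultdict(set)
    for doc_id, source in docs.items():
        for value in source[field]:
            for token in analyze(value):
                index[token].add(doc_id)
    return index

def terms_query(index, terms):
    # Like a terms query: search terms are looked up verbatim, with no analysis.
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits

def match_query(index, text):
    # Like a match query: analyze the search text first, then look up each token.
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits

# A shortened version of the "ega" document from the question.
docs = {"ega": {"profile_topics": ["Basketball", "Rite Aid", "Hip Hop"]}}
index = build_index(docs, "profile_topics")

print(terms_query(index, ["Basketball"]))  # set(): the index only holds "basketball"
print(terms_query(index, ["basketball"]))  # {'ega'}
print(match_query(index, "Basketball"))    # {'ega'}
```

The only difference between the two lookups is whether the search text goes through the same analysis as the indexed values.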


(Pat Ferrel) #3

Awesome. That's it. But I need to keep the array of terms and match doesn't seem to allow that. Actually the method is probably to disable all analysis everywhere: no lowercasing, no stemming, no tokenizing, no breaking by spaces in query or index. The query data will be an exact match to some of the tokens in the named field of the doc. I think Solr lets you set an analyzer to none and the same analyzer is used for query data as is used in the index.

Any advice on how to turn all that off, preferably for an index and all queries?


(Dan Tuffery) #4

To do exact matches, use the not_analyzed index attribute in your custom mapping.

https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping-intro.html#custom-field-mappings


(Pat Ferrel) #5

Thanks, I've read this but it seems to apply to only a single field and I don't know what fields will be called in the docs when the index is created. So I think I need to have the entire index not_analyzed and so may need some form of put mapping (though that also seems tied to fields)? I'm obviously missing something, sorry.


(Dan Tuffery) #6

You can set a dynamic template on the default mapping when you create the index. The template will then apply to all string fields in every mapping in your index:

POST /index_profile
{
    "mappings": {
        "_default_": {
            "dynamic_templates": [
                {
                    "all_fields": {
                        "match": "*",
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                }
            ]
        }
    }
}
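
As a rough picture of what not_analyzed changes, here is a toy Python model (again not Elasticsearch code; build_exact_index and terms_query are hypothetical helpers, and the "other" document is invented since the second item was never shown). Each array value is stored whole as a single term, so a terms lookup with an array of exact values matches, while lowercased or partial values do not.

```python
from collections import defaultdict

def build_exact_index(docs, field):
    # Model of a not_analyzed field: every array value is one term, unchanged.
    index = defaultdict(set)
    for doc_id, source in docs.items():
        for value in source[field]:
            index[value].add(doc_id)
    return index

def terms_query(index, terms):
    # Like a terms query: a doc matches if it holds any of the exact terms.
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits

docs = {
    "ega": {"profile_topics": ["Basketball", "University of Rhode Island"]},
    "other": {"profile_topics": ["Beer"]},  # invented second item
}
index = build_exact_index(docs, "profile_topics")

print(terms_query(index, ["Basketball", "University of Rhode Island"]))  # {'ega'}
print(terms_query(index, ["basketball"]))  # set(): case must match exactly
print(terms_query(index, ["Rhode"]))       # set(): values are not split on spaces
```

Note that in the real index the template only affects fields mapped after it is in place, so documents indexed earlier would need to be indexed again.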
