Elasticsearch missing results and incorrect sorting

I have Elasticsearch set up for searching book content. It works for the most part, but it frequently misses single search terms, especially in the 'entryTitle' field. For example, one book has a section titled "Parks and Recreation", yet a search for 'parks' does not find it.

For my use case, it should also ideally prioritize results from 'entryTitle' over results from 'entryContent'.

The current query is:

            $params = [
                'index' => 'toc_entries',
                'type' => '_doc',
                'body' => [
                    'min_score' => '0.1',
                    'sort' => [
                        '_score'
                    ],
                    '_source' => [
                        'entryTitle',
                        'entryId'
                    ],
                    'query' => [
                        'bool' => [
                            'filter' => [
                                'term' => [
                                    'bookSku.raw' => $sku
                                ]
                            ],
                            'should' => [
                                [
                                    'match' => [
                                        'entryTitle' => [
                                            'query' => $search_term,
                                            '_name' => 'title'
                                        ]
                                    ]
                                ],
                                [
                                    'match' => [
                                        'entryContent' => [
                                            'query' => $search_term,
                                            '_name' => 'body'
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ],
                    'highlight' => [
                        'pre_tags' => [''],
                        'post_tags' => [''],
                        'fields' => [
                            'entryContent' => new \stdClass(),
                            'entryTitle' => new \stdClass()
                        ]
                    ]
                ]
            ];

Can any Elasticsearch experts see where I have gone wrong here?

Thank you in advance

Edit:

Here is the structure of the setup:

{
   "analysis":{
      "analyzer":{
         "default":{
            "type":"english"
         }
      }
   },
   "settings":{
      "number_of_shards":1,
      "number_of_replicas":1
   },
   "mappings":{
      "_default_":{
         "dynamic":"strict"
      },
      "_doc":{
         "properties":{
            "bookSku":{
               "type":"text",
               "fields":{
                  "raw":{
                     "type":"keyword"
                  }
               }
            },
            "entryId":{
               "type":"text",
               "fields":{
                  "raw":{
                     "type":"keyword"
                  }
               }
            },
            "entryTitle":{
               "type":"text"
            },
            "entryContent":{
               "type":"text"
            }
         }
      }
   }
}

To prioritize one field over another, you'll want to use score boosting. As far as I can tell from the documentation, you can't apply boosting to match queries directly, but you can apply it to bool queries. So you could wrap each of your match queries (title and content) in its own bool query, with the match in the must clause, and give the bool wrapping the title a higher boost than the one wrapping the content. You'll need to experiment with the boost values to get exactly the ranking you want, since it's easy to boost too little or too much.
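
A minimal sketch of that wrapping, using the field names and '_name' labels from the query in the question. The helper names boostedMatch and buildBoostedQuery, the boost values 3.0 and 1.0, and the sample SKU are assumptions made up for illustration, not tuned recommendations:

```php
<?php
// Hypothetical helper: wraps a single match query in its own bool query
// so that a boost can be set on the wrapper.
function boostedMatch(string $field, string $searchTerm, string $name, float $boost): array
{
    return [
        'bool' => [
            'must' => [
                [
                    'match' => [
                        $field => [
                            'query' => $searchTerm,
                            '_name' => $name,
                        ],
                    ],
                ],
            ],
            'boost' => $boost,
        ],
    ];
}

// Hypothetical helper: rebuilds the 'query' part of the original request,
// keeping the SKU filter and replacing the two plain match clauses.
// The default boosts are placeholder values to experiment with.
function buildBoostedQuery(
    string $sku,
    string $searchTerm,
    float $titleBoost = 3.0,
    float $contentBoost = 1.0
): array {
    return [
        'bool' => [
            'filter' => [
                'term' => ['bookSku.raw' => $sku],
            ],
            'should' => [
                boostedMatch('entryTitle', $searchTerm, 'title', $titleBoost),
                boostedMatch('entryContent', $searchTerm, 'body', $contentBoost),
            ],
        ],
    ];
}

// Example values only; in the original code these come from $sku and $search_term.
$query = buildBoostedQuery('SKU-123', 'parks');
```

The result would replace the 'query' entry inside 'body' in the original $params. In practice the match query itself also accepts a 'boost' key next to 'query', which would avoid the extra wrapper, so that is worth trying against your version as well.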

Without seeing how you've indexed the title and content fields (i.e. what field types they are and which analyzers are applied at index time and query time), I can't tell you much about why a single word isn't matching. If you can add that info, I may be able to work through it and help.
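
One way to gather that information is the _analyze API, which shows the tokens a field's analyzer produces for a given text. Below is a rough sketch that only builds the request parameters; the helper name buildAnalyzeParams is hypothetical, and the commented-out call assumes the official elasticsearch-php client is available as $client:

```php
<?php
// Hypothetical helper: builds parameters for the _analyze API so the
// analyzer configured for a field can be checked against a sample text.
function buildAnalyzeParams(string $index, string $field, string $text): array
{
    return [
        'index' => $index,
        'body' => [
            'field' => $field, // use whatever analyzer this field is mapped with
            'text' => $text,
        ],
    ];
}

// Tokens for the stored title and for the search term, to compare side by side.
$indexedSide = buildAnalyzeParams('toc_entries', 'entryTitle', 'Parks and Recreation');
$querySide = buildAnalyzeParams('toc_entries', 'entryTitle', 'parks');

// Assumed usage with the official client (not executed here):
// $response = $client->indices()->analyze($indexedSide);
// foreach ($response['tokens'] as $token) {
//     echo $token['token'], PHP_EOL;
// }
```

If the token produced for the search term does not appear among the tokens produced for the title, the analyzers are the likely cause; if it does appear, the problem is more likely elsewhere in the query.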

I have edited the original post with detail on the structure.
