Search results returned are incomplete

I use ES7 to search PDF files.

When I search for a phrase with match_phrase, I get only 2 word occurrences back, even though the document contains 59 occurrences.

What can be the reason for this? How can I optimize the query?

{
    "from": 0,
    "size": 2500,
    "_source": [
        "filename",
        "folder"
    ],
    "highlight": {
        "fields": {
            "attachment.content": {
                "number_of_fragments": 1000,
                "force_source": true,
                "type": "fvh"
            }
        }
    },
    "query": {
        "bool": {
            "must": [
                { "match_phrase": { "attachment.content": "lorem ipsum dolor" } }
            ]
        }
    }
}

Hi Philipp.

"hits" is a count of documents, not word occurrences.

You're right, I misunderstood. I am, of course, referring to word occurrences in a document. I updated the question.

Not sure where that figure comes from because we don't report word occurrences.

Hello, Mark,

When I search for a phrase with ES7, the search engine should return the word occurrences in the specified index. Since I use the highlighter, I wanted to highlight exactly these text passages.

I wouldn't rely on highlighting for accurate counting of in-doc matches.
That said, there are some settings for highlighters where you can increase the number of fragments that are returned.
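
If an exact per-document count is what you need, one option is to count on the client instead of reading it off the highlights. The sketch below is only an illustration, not something Elasticsearch provides: count_phrase and count_in_hits are hypothetical helper names, and it assumes you add "attachment.content" to the "_source" list of the query so the extracted text comes back with each hit. The regex only approximates match_phrase (case-insensitive, any punctuation or whitespace between words), so the numbers can differ from what the analyzer actually matches, for example with stemming or synonyms.

```python
import re


def count_phrase(text: str, phrase: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of phrase in text.

    Words of the phrase may be separated by any run of non-word characters,
    which roughly mimics an analyzer that drops punctuation and whitespace.
    This is an approximation, not the real match_phrase logic.
    """
    words = re.findall(r"\w+", phrase)
    if not words:
        return 0
    pattern = r"\b" + r"\W+".join(re.escape(w) for w in words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def count_in_hits(response: dict, phrase: str) -> dict:
    """Map each hit's _id to its phrase count.

    Assumes (hypothetically) that the search response carries the extracted
    text under _source.attachment.content; hits without it count as 0.
    """
    counts = {}
    for hit in response.get("hits", {}).get("hits", []):
        content = hit.get("_source", {}).get("attachment", {}).get("content", "")
        counts[hit["_id"]] = count_phrase(content, phrase)
    return counts
```

You would call count_in_hits(response, "lorem ipsum dolor") on the parsed JSON body of the search response. Returning the full content of large PDFs for 2500 hits can be heavy, so it may make sense to lower "size" or fetch the content only for the documents you want to count.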
