I use ES7 to search PDF files.

When I search for a phrase with match_phrase, only 2 occurrences are reported, even though the document contains 59.

What can be the reason for this? How can I optimize the query?

    {
        "from": 0,
        "size": 2500,
        "_source": [ ... ],
        "highlight": {
            "fields": {
                "attachment.content": {
                    "number_of_fragments": 1000,
                    "force_source": true,
                    "type": "fvh"
                }
            }
        },
        "query": {
            "bool": {
                "must": [
                    { "match_phrase": { "attachment.content": "lorem ipsum dolor" } }
                ]
            }
        }
    }

Hi Philipp.

"hits" is a count of documents, not word occurrences.
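
To illustrate the distinction, here is a minimal Python sketch over a hand-written dict shaped like an Elasticsearch 7 search response. The sample values and the helper name `summarize` are made up for illustration. It assumes the default 7.x response shape, where `hits.total` is an object with a `value` field and highlight fragments are listed per hit under `highlight`.

```python
# Hypothetical, hand-written response in the shape of an ES 7.x search reply.
sample_response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {
                "_id": "doc-1",
                "highlight": {
                    "attachment.content": [
                        "... <em>lorem ipsum dolor</em> sit amet ...",
                        "... again <em>lorem ipsum dolor</em> ...",
                    ]
                },
            }
        ],
    }
}


def summarize(response, field):
    """Return (matching document count, {doc id: number of highlight fragments})."""
    doc_count = response["hits"]["total"]["value"]
    fragments = {
        hit["_id"]: len(hit.get("highlight", {}).get(field, []))
        for hit in response["hits"]["hits"]
    }
    return doc_count, fragments


doc_count, fragments = summarize(sample_response, "attachment.content")
print(doc_count)   # number of matching documents
print(fragments)   # highlight fragments per document, not phrase occurrences
```

Neither number is a phrase count: the first counts documents, and a single fragment may contain more than one highlighted match.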

You're right, I misunderstood. I am, of course, referring to word occurrences in a document. I updated the question.

Not sure where that figure comes from because we don't report word occurrences.

Hello, Mark,

when I search for a phrase with ES7, the search engine should return word occurrences in the specified index. Since I use the highlighter, I wanted to highlight exactly these text passages.

I wouldn't rely on highlighting for accurate counting of in-doc matches.
That said, there are some settings for highlighters where you can increase the number of fragments that are returned.
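
If an exact per-document count is what you need, one option is to count the phrase on the client in the returned `attachment.content` text instead of counting highlight fragments. A minimal sketch, assuming the field is included in `_source` and that case-insensitive matching with any whitespace between words is close enough to what the index analyzer does (it is only an approximation). `count_phrase` is a hypothetical helper, not an Elasticsearch API.

```python
import re


def count_phrase(text, phrase):
    """Count non-overlapping, case-insensitive occurrences of phrase in text.

    Words in the phrase may be separated by any run of whitespace,
    which tolerates line breaks in text extracted from PDFs.
    """
    words = [re.escape(w) for w in phrase.split()]
    if not words:
        return 0
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Made-up sample text standing in for hit["_source"]["attachment"]["content"].
sample = "Lorem ipsum dolor sit amet. lorem\nipsum dolor again; lorem ipsum."
print(count_phrase(sample, "lorem ipsum dolor"))
```

Because this does not apply the same tokenization, stemming, or stop-word handling as the index, the count can differ from what the query actually matched.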

