How can I retrieve the information about where the highlight was found?

Hi everyone,

I have a field named 'name_of_company' and a field named 'anexos' (a list) whose items contain three fields: name, url, and pages. Each entry in pages (also a list) has two fields: num_page and text.

I am searching for information in anexos > pages > text, which works fine. However, in the highlight section, I would like to know the num_page, name, and url (of the anexo) associated with each highlight that is found.

In the highlight I can see 'name_of_company', but I can't see the name and url of the "anexos".

This is my mapping:

"pagina_h": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "anexos": {
          "properties": {
            "link": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "paginas": {
              "properties": {
                "num_pagina": {
                  "type": "long"
                },
                "texto": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                }
              }
            },
            "titulo": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "url": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        "ano": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },

Here is my query in the console:

GET pagina_h/_search
{
  "query": {
    "match": {
      "anexos.paginas.texto": "key_word"
    }
  },
  "highlight": {
    "fields": {
      "anexos.paginas.texto": {
        "fragment_size": 500
      }
    }
  }
}

And here is my result:

   {
    "took": 260,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 12,
            "relation": "eq"
        },
        "max_score": 4.536918,
        "hits": [
            {
                "_index": "pagina_h",
                "_id": "1",
                "_score": 4.536918,
                "_ignored": [
                    "anexos.paginas.texto.keyword"
                ],



                "highlight": {
                    "anexos.paginas.texto": [
                        """ example """
                    ]
                }
            },
            {
                "_index": "pagina_h",
                "_id": "760",
                "_score": 1.6609094,
                "_ignored": [
                    "anexos.paginas.texto.keyword"
                ],
                "highlight": {
                    "anexos.paginas.texto": [
                        """ example """
                    ]
                }
            },
            {
                "_index": "pagina_h",
                "_id": "761",
                "_score": 1.6334696,
                "_ignored": [
                    "anexos.paginas.texto.keyword"
                ],
                "highlight": {
                    "anexos.paginas.texto": [
                        """ example """
                    ]
                }
            },
            {
                "_index": "pagina_h",
                "_id": "764",
                "_score": 1.3660059,
                "_ignored": [
                    "anexos.paginas.texto.keyword"
                ],     

Here is another example in JSON. Note that the highlight comes from 'anexos > paginas > texto', and that I can even identify which document it came from through source fields such as:

"type" : "unsercover",
"num": "756",
etc.

However, in the highlight I can't identify the 'title' and 'url' (in anexos) that it came from.

What I would like to see: the 'title' and 'url' (from anexos) and the 'num_page' where the highlight was found.

Welcome, @Talles.

If I understand correctly, you want highlights not only on the "anexos.paginas.texto" field but also on the "anexos.titulo" and "anexos.url" fields.

If that's the case, the query could be like this:

{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {
          "match": {
            "anexos.paginas.texto": "Title page document1"
          }
        },
        {
          "match": {
            "anexos.titulo": "Title page document1"
          }
        },
        {
          "match": {
            "anexos.url": "Title page document1"
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "anexos.paginas.texto": {
      },
      "anexos.titulo": {
      },
      "anexos.url": {
      }
    }
  }
}

Output:

"hits": [
      {
        "_index": "livros",
        "_id": "rQA36pIB3NhMlEBWmyNR",
        "_score": 0.970927,
        "_source": {
          .....
        "highlight": {
          "anexos.url": [
            "https://example.com/<em>document1</em>"
          ],
          "anexos.titulo": [
            "Document 1 <em>Title</em>"
          ],
          "anexos.paginas.texto": [
            "This is the first <em>page</em> of the document.",
            "This is the second <em>page</em> of the document."
          ]
        }
      }
    ]
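
Note that the query above only highlights anexos.titulo and anexos.url when the search terms also match those fields. It does not say which anexo or page a given texto fragment belongs to. With the default object mapping, Elasticsearch flattens the arrays, so that association is not kept. One possible approach, sketched here under assumptions, is to map both anexos and anexos.paginas as nested and request the highlight inside inner_hits. The index name pagina_h_nested is hypothetical and the mapping is trimmed to the relevant fields. Existing documents would need to be reindexed into the new index, because an existing object field cannot be switched to nested in place.

```
# Hypothetical index name; only the fields relevant to the question are mapped.
PUT pagina_h_nested
{
  "mappings": {
    "properties": {
      "anexos": {
        "type": "nested",
        "properties": {
          "titulo": { "type": "text" },
          "url": { "type": "text" },
          "paginas": {
            "type": "nested",
            "properties": {
              "num_pagina": { "type": "long" },
              "texto": { "type": "text" }
            }
          }
        }
      }
    }
  }
}

# Search the page text and highlight inside each matching nested page.
GET pagina_h_nested/_search
{
  "query": {
    "nested": {
      "path": "anexos.paginas",
      "query": {
        "match": {
          "anexos.paginas.texto": "key_word"
        }
      },
      "inner_hits": {
        "highlight": {
          "fields": {
            "anexos.paginas.texto": {
              "fragment_size": 500
            }
          }
        }
      }
    }
  }
}
```

Each inner hit should then carry its own highlight, the matching page's _source (including num_pagina), and a _nested section whose offsets give the position of the anexo within anexos and of the page within paginas. The titulo and url can be read from the top-level _source at that anexo offset. Only three inner hits per document are returned by default; the size option inside inner_hits raises that limit.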