Returning field names in highlight

neesha · March 21, 2019, 3:55pm

I have a query that runs against a field called content. Content contains many subfields, for example message, name, description, story, caption. When ES finds matches, the highlight result looks like this:

highlight: {content: ["[bold]Test[/bold] [bold]Test[/bold]"]}

That result found two matches, one in message and one in name. However, it's difficult to connect that to the result returned.

Is there a way to get the highlighter to return a result that ideally would return the result with the field names, like:
highlight: {message: ["[bold]Test[/bold]"], name:["[bold]Test[/bold]"]}

And if not that, at least split each match and return a list, like below?
content: ["[bold]Test[/bold]", "[bold]Test[/bold]"]

dadoonet · March 21, 2019, 4:29pm

I don't understand.

For example:

DELETE test 
PUT test/_doc/1
{
  "content": {
    "foo": "bar",
    "message": "bar baz"
  }
}
GET test/_search
{
  "query": {
    "multi_match": {
      "query": "bar",
      "fields": ["content.foo","content.message"]
    }
  },
  "highlight": {
    "fields": {
      "content.foo": {},
      "content.message": {}
    }
  }
}

is giving:

{
  "took" : 123,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "content" : {
            "foo" : "bar",
            "message" : "bar baz"
          }
        },
        "highlight" : {
          "content.foo" : [
            "<em>bar</em>"
          ],
          "content.message" : [
            "<em>bar</em> baz"
          ]
        }
      }
    ]
  }
}

So you know exactly from which field you are getting the response, no?
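
If the goal is the shape asked for in the first post (highlights keyed by the bare subfield name), the response above already carries enough information to build it on the client side. A minimal Python sketch, assuming the hit is the parsed JSON shown above; the helper name `highlights_by_subfield` and the prefix handling are illustrative and not part of Elasticsearch:

```python
def highlights_by_subfield(hit, prefix="content."):
    """Return the hit's highlight fragments keyed by subfield name.

    Hypothetical helper: strips `prefix` from each highlighted field path,
    so "content.message" becomes "message". Keys without the prefix are
    kept unchanged.
    """
    result = {}
    for field, fragments in hit.get("highlight", {}).items():
        name = field[len(prefix):] if field.startswith(prefix) else field
        result[name] = list(fragments)
    return result


# The single hit from the response above, trimmed to the relevant keys.
hit = {
    "_id": "1",
    "_source": {"content": {"foo": "bar", "message": "bar baz"}},
    "highlight": {
        "content.foo": ["<em>bar</em>"],
        "content.message": ["<em>bar</em> baz"],
    },
}

print(highlights_by_subfield(hit))
# {'foo': ['<em>bar</em>'], 'message': ['<em>bar</em> baz']}
```

This only renames keys that the highlighter already returned per field, so it still relies on the highlight section naming the subfields (explicitly or by pattern).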

neesha · March 21, 2019, 5:27pm

What I'm hoping is to not have to list all the possibilities (content.foo, content.message, etc).

What I'd like to be able to do is run it as follows:

{
  "query": {
    "multi_match": {
      "query": "bar",
      "fields": ["content"]
    }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  }
}

However, when I try that, the query above doesn't return a hit. So now I'm wondering how the data I'm working with is set up (I wasn't involved in the original setup), such that we're able to get hits across all the fields.

The results look like separate fields, but that's being pulled from _source. I think maybe it's a case of it all being grouped into a single text field for indexing/search, which would prevent the return that I want.

Going to dig deeper.

Thanks. This was helpful. I'm still very new to ES and trying to wrap my brain around how it works and how it's used at this company.

dadoonet · March 25, 2019, 2:21pm

But do we agree that your query does not work?

{
  "query": {
    "multi_match": {
      "query": "bar",
      "fields": ["content"]
    }
  }
}

Because content is not a field; it is an object that only contains fields.
You can use the copy_to feature if you wish to flatten all the content you want into one single field at index time.
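
A rough sketch of what such a mapping could look like, built as a Python dict so it can be printed and pasted into an index-creation request. The target field name `content_all` is made up for illustration, and the body uses the typeless layout; on versions that still use mapping types, "properties" would sit under the type name instead:

```python
import json


def copy_to_mapping(subfields, target="content_all"):
    """Build a mapping body where each content.<subfield> is copied into `target`.

    Assumptions: `target` is a hypothetical field name, and every subfield
    is mapped as plain text.
    """
    subfield_props = {
        name: {"type": "text", "copy_to": target} for name in subfields
    }
    return {
        "mappings": {
            "properties": {
                "content": {"properties": subfield_props},
                target: {"type": "text"},
            }
        }
    }


body = copy_to_mapping(["foo", "message"])
print(json.dumps(body, indent=2))  # candidate body for creating the test index
```

A query against `content_all` would then match text that came from either subfield.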

neesha · March 25, 2019, 3:48pm

Yes, agreed it doesn't work as written. What I need to do is add the .* wildcard. The following works almost as I want it to.

curl -X GET "localhost:9200/test/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match": {
      "query": "bar",
      "fields": ["content.*"]
    }
  },
  "highlight": {
    "fields": {
      "content.*": {}
    }
  }
}'

Except, of course, that it includes all subfields. So would need to limit it.
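
One way to limit it without hand-writing every request is to generate the body from a short list of subfield names. A minimal Python sketch under that assumption; `build_search` is a hypothetical helper, and it only assembles the same multi_match and highlight sections used earlier in this thread:

```python
import json


def build_search(query, subfields, parent="content"):
    """Build a multi_match search body limited to the named subfields.

    Hypothetical helper: the same list drives both the query and the
    highlight section, so no other field is requested in either place.
    """
    paths = [f"{parent}.{name}" for name in subfields]
    return {
        "query": {"multi_match": {"query": query, "fields": paths}},
        "highlight": {"fields": {path: {} for path in paths}},
    }


body = build_search("bar", ["foo", "message"])
print(json.dumps(body, indent=2))  # send as the body of GET test/_search
```

The list of allowed subfields then lives in one place in the application instead of being repeated in each query.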

But I think with a few other things we've encountered in our indexing setup, there's some bigger changes we need to make, and all of this is just helping to support the urgency of the changes.

I looked up copy_to and have a question. Let's say in the first example way above, you use copy_to to combine message and foo into m_foo. If you try to do highlighting, can you specify you want it on message and foo? Or do you have to highlight on m_foo and end up back with the same problem I started with, where you don't know the specific fields the highlight matched on?

dadoonet · March 25, 2019, 4:24pm

Yes. This is correct.

> Except, of course, that it includes all subfields. So would need to limit it.

Yeah. Then you would need to be explicit for each single field I think.
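
On the copy_to question, one option that may be worth testing: the highlighter has a `require_field_match` setting that defaults to true, and setting it to false lets it mark query terms in fields other than the one the query ran against. A sketch of a request body that searches a combined field and highlights the original subfields. The combined field name `m_foo` is taken from the question above, the helper name is made up, and this has not been run against the index in this thread:

```python
import json


def build_combined_search(query, combined_field, source_fields):
    """Search `combined_field`, but ask for highlights on `source_fields`.

    Hypothetical helper. `require_field_match: false` is what allows the
    highlighted fields to differ from the queried field.
    """
    return {
        "query": {"match": {combined_field: query}},
        "highlight": {
            "require_field_match": False,
            "fields": {field: {} for field in source_fields},
        },
    }


body = build_combined_search("bar", "m_foo", ["content.foo", "content.message"])
print(json.dumps(body, indent=2))
```

The source fields still have to be listed in the highlight section, so this does not remove the need to be explicit about them.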

neesha · March 25, 2019, 7:27pm

Okay. That's what I thought. Hmm... Going to have to think through this system carefully.

It feels like ES is fastest when everything can be simply split into tokens, only direct matches are searched for, and only one field is searched at a time. Each addition (another field, partial matching, ANDs or ORs) increases complexity and slows it down. We need to figure out how to make as few of those additions as possible and still get what we want out of it. It's a fine balance.

system · April 22, 2019, 7:27pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
