Returning field names in highlight

I have a query that runs against a field called content. Content contains many subfields, for example message, name, description, story, caption. When ES finds matches, the highlight result looks like this:

highlight: {content: ["[bold]Test[/bold] [bold]Test[/bold]"]}

That result found two matches, one in message and one in name. However, it's difficult to tell from the highlight which subfield each match came from.

Is there a way to get the highlighter to return the matches keyed by field name, ideally like this:
highlight: {message: ["[bold]Test[/bold]"], name:["[bold]Test[/bold]"]}

And if not that, at least split each match and return a list, like below?
content: ["[bold]Test[/bold]", "[bold]Test[/bold]"]

I don't understand.

For example:

DELETE test 
PUT test/_doc/1
{
  "content": {
    "foo": "bar",
    "message": "bar baz"
  }
}
GET test/_search
{
  "query": {
    "multi_match": {
      "query": "bar",
      "fields": ["content.foo","content.message"]
    }
  },
  "highlight": {
    "fields": {
      "content.foo": {},
      "content.message": {}
    }
  }
}

gives:

{
  "took" : 123,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "content" : {
            "foo" : "bar",
            "message" : "bar baz"
          }
        },
        "highlight" : {
          "content.foo" : [
            "<em>bar</em>"
          ],
          "content.message" : [
            "<em>bar</em> baz"
          ]
        }
      }
    ]
  }
}

So you know exactly from which field you are getting the response, no?
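
If the goal is the exact shape asked for at the top of the thread (keys like message and name rather than content.message), the prefix can be stripped on the client side. This is only a sketch: the helper name highlights_by_subfield is made up for illustration, and the hit is a trimmed copy of the response above.

```python
def highlights_by_subfield(hit, prefix="content."):
    """Return the hit's highlight dict with the parent prefix removed from keys.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    highlight = hit.get("highlight", {})
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): fragments
        for key, fragments in highlight.items()
    }


# Trimmed copy of the hit from the response shown above.
hit = {
    "_id": "1",
    "highlight": {
        "content.foo": ["<em>bar</em>"],
        "content.message": ["<em>bar</em> baz"],
    },
}

print(highlights_by_subfield(hit))
```

Each key in the result is then the bare subfield name, with its own list of fragments.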

What I'm hoping is to not have to list all the possibilities (content.foo, content.message, etc).

What I'd like to be able to do is run it as follows:

{
  "query": {
    "multi_match": {
      "query": "bar",
      "fields": ["content"]
    }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  }
}

However, when I try that, the query above doesn't return a hit. So now I'm wondering how the data I'm working with is set up (I wasn't involved in the original setup) such that we're able to get hits across all the fields.

The results look like separate fields, but that's being pulled from _source. I think it may all be grouped into a single text field for indexing and search, which would prevent the output I want.

Going to dig deeper.

Thanks. This was helpful. I'm still very new to ES and trying to wrap my brain around how it works and how it's used at this company.

But do we agree that your query does not work?

{
  "query": {
    "multi_match": {
      "query": "bar",
      "fields": ["content"]
    }
  }
}

Because content is not a field. It is an object, and only its subfields (content.foo, content.message) are indexed.
You can use the copy_to feature if you wish to flatten all the content you want into one single field at index time.
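
A minimal sketch of what such a copy_to mapping could look like, written as a Python dict so it can be generated for any list of subfields. The target name content_all and the helper copy_to_mapping are made up for illustration, and the subfields are assumed to be plain text fields. On the 6.x cluster shown above, this body would sit under the type name inside "mappings"; on 7.x and later it goes directly under "mappings".

```python
import json


def copy_to_mapping(subfields, target="content_all"):
    """Build a mapping body where each content.<subfield> is copied to `target`.

    Hypothetical helper; `target` is an assumed field name, not one from the thread.
    """
    return {
        "properties": {
            "content": {
                "properties": {
                    name: {"type": "text", "copy_to": target}
                    for name in subfields
                }
            },
            # The combined field that receives the copied values at index time.
            target: {"type": "text"},
        }
    }


mapping = copy_to_mapping(["foo", "message"])
print(json.dumps(mapping, indent=2))
```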

Yes, agreed it doesn't work as written. What I need to do is add the .*. The following works almost as I want it to.

curl -X GET "localhost:9200/test/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match": {
      "query": "bar",
      "fields": ["content.*"]
    }
  },
  "highlight": {
    "fields": {
      "content.*": {}
    }
  }
}'

Except, of course, that it includes all subfields. So would need to limit it.

But I think, given a few other things we've encountered in our indexing setup, there are some bigger changes we need to make, and all of this is just helping to support the urgency of those changes.

I looked up copy_to and have a question. Let's say in the first example way above, you use copy_to to combine message and foo into m_foo. If you try to do highlighting, can you specify you want it on message and foo? Or do you have to highlight on m_foo and end up back with the same problem I started with, where you don't know the specific fields the highlight matched on?

Yes. This is correct.

Except, of course, that it includes all subfields. So would need to limit it.

Yeah. Then you would need to list each field explicitly, I think.
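
Listing the fields explicitly does not have to mean typing them out by hand in every request. One option, sketched here under the assumption that the wanted subfields are known on the client side, is to generate both the multi_match field list and the highlight section from a single list. The helper name build_search_body is made up for illustration.

```python
import json


def build_search_body(text, subfields, parent="content"):
    """Build a search body that queries and highlights only the given subfields.

    Hypothetical helper; it only assembles the request JSON and sends nothing.
    """
    paths = [f"{parent}.{name}" for name in subfields]
    return {
        "query": {"multi_match": {"query": text, "fields": paths}},
        # One empty options object per field, as in the working example earlier.
        "highlight": {"fields": {path: {} for path in paths}},
    }


body = build_search_body("bar", ["foo", "message"])
print(json.dumps(body, indent=2))
```

The printed body is the same request as the first working example in this thread, so it can be sent with GET test/_search as before.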

Okay. That's what I thought. Hmm... Going to have to think through this system carefully.

It feels like the fastest use of ES is when everything can be simply split into tokens and then only direct matches are searched for, and only in one field at a time. Each addition (another field, partial matches, ANDs or ORs) just increases complexity and slows it down. We need to figure out how to do the fewest of those additions and still get what we want out of it. It's a fine balance.
