I have a query that runs against a field called content. Content contains many subfields, for example message, name, description, story, caption. When ES finds matches, the highlight result looks like this:
That result found two matches, one in message and one in name. However, it's difficult to connect that to the result returned.
Is there a way to get the highlighter to return its result keyed by field name, ideally like this? highlight: {message: ["[bold]Test[/bold]"], name: ["[bold]Test[/bold]"]}
And if not that, at least split each match and return a list, like below? content: ["[bold]Test[/bold]", "[bold]Test[/bold]"]
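To make the first shape concrete, here is a minimal sketch of the request body I have in mind, written as a Python dict. It assumes the subfields are addressable as `content.<name>` (I'm not sure that's how our index is actually mapped), and the helper names `build_highlight_query` and `matched_fields` are just made up for illustration.

```python
import json

# Assumption: each subfield is indexed separately and reachable as "content.<name>".
SUBFIELDS = ["message", "name", "description", "story", "caption"]


def build_highlight_query(text, subfields=SUBFIELDS):
    """Build a search body that queries and highlights each subfield by name."""
    paths = [f"content.{name}" for name in subfields]
    return {
        "query": {"multi_match": {"query": text, "fields": paths}},
        "highlight": {
            "pre_tags": ["[bold]"],
            "post_tags": ["[/bold]"],
            # One entry per field, so fragments come back keyed by field name.
            "fields": {path: {} for path in paths},
        },
    }


def matched_fields(hit):
    """Return {field_name: [fragments]} from a single search hit."""
    return dict(hit.get("highlight", {}))


body = build_highlight_query("Test")
print(json.dumps(body, indent=2))
```

If the fields really are indexed separately, each hit's `highlight` section should be keyed by field name, and `matched_fields(hit)` would give the field-to-fragments mapping directly.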
However, when I try that, I find the above doesn't actually return a hit. So now I'm wondering how the data I'm working with is set up (I wasn't involved in the original setup), such that we're able to get hits across all the fields.
The results look like separate fields, but that's being pulled from _source. I think maybe it's a case of everything being grouped into a single text field for indexing/search, which would prevent the result that I want.
Going to dig deeper.
Thanks. This was helpful. I'm still very new to ES and trying to wrap my brain around how it works and how it's used at this company.
Except, of course, that it includes all subfields. So I would need to limit it.
But I think with a few other things we've encountered in our indexing setup, there are some bigger changes we need to make, and all of this is just helping to support the urgency of the changes.
I looked up copy_to and have a question. Let's say in the first example way above, you use copy_to to combine message and foo into m_foo. If you try to do highlighting, can you specify you want it on message and foo? Or do you have to highlight on m_foo and end up back with the same problem I started with, where you don't know the specific fields the highlight matched on?
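To be concrete about the setup I'm asking about, here is a rough sketch as Python dicts. The field names `message`, `foo`, and `m_foo` come from the example above; the typeless `mappings.properties` layout is an assumption about the ES version, and I haven't verified how this behaves on our cluster. The `require_field_match` highlight option is documented as allowing highlighting on fields other than the one the query matched, which is what the second dict tries to use.

```python
# Mapping sketch: message and foo are both copied into the combined field m_foo.
mapping = {
    "mappings": {
        "properties": {
            "message": {"type": "text", "copy_to": "m_foo"},
            "foo": {"type": "text", "copy_to": "m_foo"},
            "m_foo": {"type": "text"},
        }
    }
}

# Search the combined field, but ask for highlights on the original fields.
search_body = {
    "query": {"match": {"m_foo": "Test"}},
    "highlight": {
        # Highlight the listed fields even though the query targets m_foo.
        "require_field_match": False,
        "fields": {"message": {}, "foo": {}},
    },
}
```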
Okay. That's what I thought. Hmm... Going to have to think through this system carefully.
It feels like the fastest use of ES is when everything can be simply split into tokens and then only direct matches are searched for, and only in one field at a time. Each addition (another field, partial matches, ANDs or ORs) just increases complexity and slows it down. We need to figure out how to make as few of those additions as possible while still getting what we want out of it. It's a fine balance.