Best way to return which field matched

Hi,

I have two current ways that I've been returning which field matched
from a query:

  • Going through the results and doing an in-memory search of each
    result with a string "contains" check;
  • Creating a named filter for each field.

This seems like a FAQ, so I apologise if the answer is already sitting
on a page one click away from the homepage.

The first solution seems inefficient, since it's plausible (though it
may not be true) that Lucene already knows which field matched when
it finds a match.
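For reference, the first approach amounts to something like the following sketch. Python is used only for illustration; `fields_containing` is a hypothetical helper, and the hits are hand-written samples shaped like `_source` documents, not real output.

```python
def fields_containing(source: dict, needle: str) -> list:
    """Return the names of top-level fields whose value contains `needle`.

    Client-side sketch of the first approach: each hit's _source is scanned
    after the search returns. Only string and numeric values are checked.
    """
    matched = []
    for name, value in source.items():
        if isinstance(value, (str, int, float)) and needle in str(value):
            matched.append(name)
    return matched


# Hand-written sample hits, not real search output.
hits = [
    {"_id": "1", "_source": {"field1": "100", "field2": "abc"}},
    {"_id": "2", "_source": {"field1": "7", "field2": "x100y"}},
]
for hit in hits:
    print(hit["_id"], fields_containing(hit["_source"], "100"))
```

A plain substring check ignores analysis (tokenizing, lowercasing, stemming), so it can disagree with what Lucene actually matched; for example, it reports "1100" as containing "100".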

The second has the drawback that the queries get large very quickly.
It starts off okay: with an index of two fields, you construct two
named filters to find out which field matched. But the queries get
quite large once you search across lots of documents that may have
different fields.
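Writing the query by hand is most of the pain; the body can be generated for any list of fields. A minimal Python sketch, assuming the 2011-era query DSL used in this thread (`filtered`, `or`, `fquery` with `_name`, all of which were removed in later Elasticsearch releases); `named_filter_query` is a hypothetical helper.

```python
import json


def named_filter_query(fields: list, value: str) -> dict:
    """Build a match_all query with one named fquery filter per field.

    Assumes the old filtered/or/fquery DSL; each filter is named after
    its field so the hit can report which one matched.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "or": [
                        {
                            "fquery": {
                                "query": {"query_string": {"query": f"{field}:{value}"}},
                                "_name": field,
                            }
                        }
                        for field in fields
                    ]
                },
            }
        }
    }


body = named_filter_query(["field1", "field2", "field3"], "100")
print(json.dumps(body, indent=2))
```

This keeps the client code small, though the request sent to the server still grows by one clause per field. Values containing spaces or query_string syntax would need quoting.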

Some ideas I've come across:

  • Explain (as hinted at here [1])?
  • SpanQuery [2] gives you back a field; could you use that?
  • Highlighting? It doesn't seem like the right answer (am I wrong? If
    so, is there a simple example?).
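A minimal sketch of the highlighting idea, assuming a search body with a `highlight.fields` section and that fields without a highlighted fragment are left out of each hit's `highlight` object. Both helpers are hypothetical and the hit is a hand-written sample. Highlighting re-finds the query terms in each field's text after the search, so it approximates, rather than reports, which field Lucene matched.

```python
def highlight_body(fields: list, query_string: str) -> dict:
    """Search body that asks for highlighting on each candidate field."""
    return {
        "query": {"query_string": {"query": query_string}},
        "highlight": {"fields": {name: {} for name in fields}},
    }


def highlighted_fields(hit: dict) -> list:
    """Names of the fields that received at least one highlighted fragment."""
    return sorted(hit.get("highlight", {}).keys())


body = highlight_body(["field1", "field2"], "100")
# Hand-written sample hit, not real search output.
sample_hit = {"_id": "1", "highlight": {"field2": ["abc <em>100</em>"]}}
print(highlighted_fields(sample_hit))  # ['field2']
```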

I've been looking through the mailing list for an answer without much
joy. I'm hopeful that there's a better way to return the field that
matched. Below is an example of the second solution.

curl -XGET 'http://localhost:9200/index/type/_search?pretty=true' -d '
{
  "query" : {
    "filtered" : {
      "query" : {
        "match_all" : {}
      },
      "filter" : {
        "or" : [
          {
            "fquery" : {
              "query" : {
                "query_string" : {
                  "query" : "field1:100"
                }
              },
              "_name" : "field1"
            }
          },
          {
            "fquery" : {
              "query" : {
                "query_string" : {
                  "query" : "field2:100"
                }
              },
              "_name" : "field2"
            }
          }
        ]
      }
    }
  }
}'
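With named filters, each hit in the response should list the names of the filters it matched. A minimal Python sketch for collecting them, assuming the per-hit key is `matched_filters` (as in Elasticsearch releases of this era; later versions call it `matched_queries`, so both are checked). `matched_fields` is a hypothetical helper and the response is a hand-written sample, not real output.

```python
import json


def matched_fields(response: dict) -> dict:
    """Map each hit's _id to the names of the named filters it matched."""
    return {
        hit["_id"]: hit.get("matched_filters", hit.get("matched_queries", []))
        for hit in response.get("hits", {}).get("hits", [])
    }


# Hand-written sample shaped like a search response, not real output.
sample = json.loads("""
{"hits": {"total": 2, "hits": [
  {"_id": "1", "_source": {"field1": 100}, "matched_filters": ["field1"]},
  {"_id": "2", "_source": {"field1": 100, "field2": 100},
   "matched_filters": ["field1", "field2"]}
]}}
""")
print(matched_fields(sample))  # {'1': ['field1'], '2': ['field1', 'field2']}
```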

[1] http://www.gossamer-threads.com/lists/lucene/java-user/66959?do=post_view_threaded#66959
[2] http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/search/spans/SpanQuery.html

There is no way in Lucene to tell which field a query matched on. Each query is constructed against a single field, but a search on _all will not tell you which of the original fields the match came from (and supporting that would make _all useless performance-wise).
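A toy Python model of the point about _all. This is not how Lucene stores `_all` (it is an inverted index), and the helper names are hypothetical: once tokens from every field are merged into one bag, a match can no longer be traced to a field, whereas keeping per-field token sets preserves that information at the cost of checking each field.

```python
def build_all_field(doc: dict) -> set:
    # Toy _all: tokens from every field are merged into one set,
    # so the field each token came from is not recorded.
    tokens = set()
    for value in doc.values():
        tokens.update(str(value).lower().split())
    return tokens


def build_per_field(doc: dict) -> dict:
    # Per-field token sets keep provenance, but a lookup has to
    # check every field instead of a single combined one.
    return {name: set(str(value).lower().split()) for name, value in doc.items()}


doc = {"field1": "100 apples", "field2": "pears"}
print("100" in build_all_field(doc))  # True, but no way to say which field
print([f for f, toks in build_per_field(doc).items() if "100" in toks])  # ['field1']
```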

One could hack something together that walks the whole query AST, records which parts matched, and then checks that the document actually ended up among the hits, but again, that would make the whole execution stack much slower.
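The AST-tracking idea can be sketched on a toy query tree (hypothetical `Term`/`And`/`Or` classes, not Lucene's query classes). Recording contributing fields means evaluating every clause instead of short-circuiting, and discarding fields from branches that did not match overall, which is the kind of extra per-document work described above.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Term:
    field: str
    value: str


@dataclass
class And:
    clauses: list


@dataclass
class Or:
    clauses: list


def contributing_fields(query, doc: dict) -> Optional[Set[str]]:
    """Return the fields whose terms made `doc` match, or None if it does not match."""
    if isinstance(query, Term):
        tokens = str(doc.get(query.field, "")).split()
        return {query.field} if query.value in tokens else None
    # Evaluate every clause (no short-circuit) so all contributing fields are seen.
    results = [contributing_fields(c, doc) for c in query.clauses]
    if isinstance(query, And) and any(r is None for r in results):
        return None
    if isinstance(query, Or) and all(r is None for r in results):
        return None
    return set().union(*(r for r in results if r is not None))


q = Or([Term("field1", "100"), Term("field2", "100")])
print(contributing_fields(q, {"field1": "100", "field2": "7"}))  # {'field1'}
```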

The SpanQuery javadoc is a bit misleading. It returns the field the query was executed on, and in the case of a span query you end up executing the span on a single field anyway.

-shay.banon
(Replying to Andrew's message of Tuesday, March 22, 2011 at 3:18 AM.)

I wonder if you can use "explain" somehow. Explain returns details of
how the score was computed for each hit, including which fields
matched. The details are cryptic and can be interpreted only by a
very knowledgeable human. I think it would be possible (albeit not
very easy) to parse the data and boil it down to something more
human-readable, or something your code could make decisions on. That
said, I'm not certain exactly how explain deals with match_all.
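A rough Python sketch of that parsing, assuming explain is enabled with `"explain": true` in the search body and that each hit's `_explanation` is a tree of `description`/`details` nodes. The regex assumes Lucene 3.x-style descriptions such as `weight(field1:100 in 0)` and `fieldWeight(field1:100 in 0)`; the exact text differs between versions and query types. `explained_fields` is a hypothetical helper and the sample tree is hand-written, not real output.

```python
import re

# Search body with explain enabled (the query itself is an assumed example).
search_body = {"explain": True, "query": {"query_string": {"query": "100"}}}

# Captures the field name in descriptions like "weight(field1:100 in 0)".
FIELD_RE = re.compile(r"\b(?:weight|fieldWeight)\(([^:()\s]+):")


def explained_fields(explanation: dict) -> set:
    """Collect field names mentioned anywhere in an explanation tree."""
    found = set(FIELD_RE.findall(explanation.get("description", "")))
    for child in explanation.get("details") or []:
        found |= explained_fields(child)
    return found


# Hand-written sample shaped like a hit's "_explanation", not real output.
sample = {
    "value": 0.61,
    "description": "sum of:",
    "details": [
        {"value": 0.61, "description": "weight(field1:100 in 0), product of:",
         "details": [
             {"value": 0.7, "description": "queryWeight(field1:100), product of:"},
             {"value": 0.87, "description": "fieldWeight(field1:100 in 0), product of:"},
         ]},
    ],
}
print(explained_fields(sample))  # {'field1'}
```

Because the descriptions are free text, this is fragile, and a query that scores on the fields of interest (rather than match_all plus filters) is assumed.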

(Replying to Shay Banon, shay.ba...@elasticsearch.com, Mar 22, 3:18 am.)

Right, explain does not have a specific element for the field; it's just text. Also, using explain will make searches slower (how much slower really depends on the type of query).
(Replying to Tim Scott's message of Tuesday, March 22, 2011 at 5:53 PM.)