I'd like to support queries such as "one two" near "three four", which
the query string syntax doesn't support, by the looks of it. The query
would be run against a number of fields with differing analyzers. Currently
we query using query_string and specify the fields we want to match against,
and this works well (the correct analysis is used for each field).
However, trying to recreate this behaviour using span_near queries raised some issues:
- span terms are not analyzed correctly
- the query dict had to be bool ORed together at the very top level to
  accommodate querying against the various fields (there's no way to specify
  multiple fields for spans)
We could do the analysis client side, but that sounds
unsustainable. How would others tackle this? Would it be a simple patch
to Elasticsearch to support the above (perhaps extending the query syntax
would be easy)? Where would I look to do this?

There are various ways to achieve what you want, but none of them are easy.
Extending the query syntax is difficult because Elasticsearch is built on
Lucene and uses the same query syntax. Creating a new query syntax means
breaking the dependency on Lucene. Solr does have a SurroundQueryParser
which supports span queries, but the terms are still not analyzed. Have you
tried using the phrase support in the match queries?
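For reference, the phrase suggestion could look roughly like the sketch below. It only builds the request body as a Python dict. The field names and the helper name phrase_query are made up for illustration, and whether match_phrase with slop is available depends on your Elasticsearch version. Note that this only loosens a single phrase; it does not express "phrase A near phrase B". The upside is that each field is analyzed on the server with its own analyzer.

```python
def phrase_query(fields, phrase, slop=0):
    """Build a bool query that ORs one match_phrase clause per field.

    `fields` and this helper are hypothetical; only the query DSL keys
    (bool, should, match_phrase, query, slop) come from Elasticsearch.
    """
    clauses = [
        {"match_phrase": {field: {"query": phrase, "slop": slop}}}
        for field in fields
    ]
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


# Example: the same phrase against two assumed fields, allowing 3 positions of slack.
body = phrase_query(["title", "body"], "one two three four", slop=3)
```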

Span terms are not analyzed at all, so it is not that they are analyzed
incorrectly. I bite the bullet and analyze the terms on the client side. I
also combine the various span queries with a boolean query to support
multiple fields (and DisMax queries as well). Not an ideal solution, but
better than hacking a new query syntax IMHO.
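A minimal sketch of that approach, in case it helps. The per-field "analyzers" here are plain Python functions standing in for whatever each field's real analyzer does; they, the field names, and the helper names are all assumptions for illustration. In practice the tokens would have to match what the index-time analyzer produced for that field. Each phrase becomes an in-order span_near with slop 0, the two phrases are wrapped in an outer span_near, and the per-field queries are ORed in a bool.

```python
def phrase_span(field, tokens):
    """Turn already-analyzed tokens into an exact-phrase span query."""
    if not tokens:
        raise ValueError("analysis produced no tokens for field %r" % field)
    terms = [{"span_term": {field: token}} for token in tokens]
    if len(terms) == 1:
        return terms[0]
    # slop 0 + in_order means the tokens must be adjacent, like a phrase.
    return {"span_near": {"clauses": terms, "slop": 0, "in_order": True}}


def near_query(analyzers, left, right, slop):
    """Build '"left" near "right"' across several fields.

    `analyzers` maps field name -> function(text) -> list of tokens.
    These functions are hypothetical client-side stand-ins for the
    analyzers configured on each field.
    """
    per_field = []
    for field, analyze in analyzers.items():
        per_field.append({
            "span_near": {
                "clauses": [
                    phrase_span(field, analyze(left)),
                    phrase_span(field, analyze(right)),
                ],
                "slop": slop,        # max positions between the two phrases
                "in_order": False,   # either phrase may come first
            }
        })
    # Spans are single-field, so the fields are combined at the top level.
    return {"query": {"bool": {"should": per_field, "minimum_should_match": 1}}}


# Assumed analyzers: one lowercases, the other keeps case.
analyzers = {
    "title": lambda text: text.lower().split(),
    "title.raw": lambda text: text.split(),
}
body = near_query(analyzers, "One Two", "Three Four", slop=5)
```

Swapping the bool for a dis_max over the same per-field clauses should work the same way if you prefer the best-field score over a summed one.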