Hi all,
I am using span queries to get phrase matching with fuzziness. This works on a single field, but because I wrap a fuzzy query in a span_multi query, I can't get it to search across multiple fields. Is there a way to overcome this?
Hi, thanks for your time and response.
I have about 150 fields and thousands of documents, so I don't think an index-time approach would be feasible because of the resulting index size.
Can you please provide a link or an example showing a bool query with a should array of span queries, or at least the format to use?
The bool query provides the building blocks for assembling combinations of all queries (span included) using Boolean logic. You want an OR (span match on field X OR field Y) so you need boolean logic to assemble that expression.
From what I understand, I have to put the span queries inside a bool query. Where do I specify all the fields, since span queries don't support a "fields" parameter?
You have to repeat yourself: use multiple span query objects, each with a different field name (to provide the relevant context) but containing the same search terms.
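A minimal Python sketch of that approach, assuming the request body is built as plain dicts and sent with whatever client you already use: one `span_near` of `span_multi`-wrapped `fuzzy` clauses per field, all ORed together in a `bool` `should` array. The helper names, the field names and the naive lowercase/whitespace tokenization are illustrative assumptions (fuzzy queries are not analyzed, so the terms must already resemble how the fields were indexed). Generating the clauses in a loop avoids writing 150 objects by hand, although Elasticsearch still executes one span clause per field.

```python
import json


def fuzzy_span_phrase(field, words, fuzziness="AUTO", slop=0):
    """One span_near "phrase" on a single field; each word may match fuzzily."""
    clauses = [
        # span_multi lets a multi-term query (here: fuzzy) take part in span queries.
        {"span_multi": {"match": {"fuzzy": {field: {"value": w, "fuzziness": fuzziness}}}}}
        for w in words
    ]
    # slop=0 with in_order=True approximates an exact phrase.
    return {"span_near": {"clauses": clauses, "slop": slop, "in_order": True}}


def multi_field_fuzzy_phrase(fields, text, fuzziness="AUTO", slop=0):
    """OR the per-field span phrases together in a bool/should array."""
    # Assumption: fuzzy terms are not analyzed, so this naive lowercase/whitespace
    # split stands in for whatever analyzer the fields were actually indexed with.
    words = text.lower().split()
    return {
        "query": {
            "bool": {
                "should": [fuzzy_span_phrase(f, words, fuzziness, slop) for f in fields],
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical field names; in practice these could come from the index mapping.
    body = multi_field_fuzzy_phrase(["title", "description"], "quick brwn fox")
    print(json.dumps(body, indent=2))
```

Each field gets its own `span_near`, because all clauses inside one span query must target the same field.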
I am looking for a span-query equivalent of this. Since I have about 150 fields, writing 150 span query objects doesn't feel right. Is there a way to specify a "fields" parameter, as in the example below?
No, sorry. Span queries are meant for text fields, and an index typically has many structured fields but only one or two unstructured text fields that capture what can't be expressed as structured data.
Perhaps you don't need a phrase/span query that supports fuzziness.
A common strategy is to have a big bool query with a should array filled with different forms of running the same user input, ranging from the sloppy "any word plus fuzzy" to the very strict, e.g. ANDed terms or exact phrase matches. Docs which satisfy more of the given clauses will naturally rank higher.
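One way to sketch that layered strategy in Python, assuming `multi_match` is acceptable for your use case (unlike span queries, it does accept a list of fields): the same input is run three ways, from loose and fuzzy to exact phrase, inside one `bool` `should` array. The helper name and field names are hypothetical.

```python
def layered_query(fields, text):
    """Run the same user input loosely and strictly; more matching clauses = higher score."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Loosest: any single word may match, typos allowed.
                    {"multi_match": {"query": text, "fields": fields,
                                     "operator": "or", "fuzziness": "AUTO"}},
                    # Stricter: every word must be present (no fuzziness).
                    {"multi_match": {"query": text, "fields": fields,
                                     "operator": "and"}},
                    # Strictest: the words as an exact phrase.
                    {"multi_match": {"query": text, "fields": fields,
                                     "type": "phrase"}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


# Hypothetical field names.
query = layered_query(["title", "body"], "quick brown fox")
```

Since `multi_match` also accepts wildcard field patterns (for example `"*_text"`), this style can cover many fields without listing each one. Note that fuzziness is not supported for the `phrase` type, which is why only the loosest clause uses it.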
Try taking away request elements until you find the culprit. Wildcards and highlighting are both potential performance hogs. You could also try the unified highlighter type. "Fast Vector Highlighter" isn't always as fast as the name suggests.
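A small Python sketch of that trial-and-error approach, assuming the search request is held as a dict: derive variants with the highlighter switched to `unified` or with the `highlight` section removed, then time each variant against your cluster with whatever client you use (not shown). The example request, field name and helper names are hypothetical.

```python
import copy


def with_unified_highlighter(request):
    """Copy of the request whose highlight section uses the unified highlighter."""
    variant = copy.deepcopy(request)
    if "highlight" in variant:
        variant["highlight"]["type"] = "unified"
    return variant


def without(request, key):
    """Copy of the request with one top-level element (e.g. "highlight") removed."""
    variant = copy.deepcopy(request)
    variant.pop(key, None)
    return variant


# Hypothetical baseline: a wildcard query plus fast vector highlighting.
base = {
    "query": {"wildcard": {"title": {"value": "qui*"}}},
    "highlight": {"type": "fvh", "fields": {"title": {}}},
}

variants = {
    "baseline": base,
    "unified_highlighter": with_unified_highlighter(base),
    "no_highlight": without(base, "highlight"),
}
```

Deep copies keep the baseline unchanged, so each variant differs from it by exactly one element.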
My project requires fuzziness, wildcards and highlighting. Unified is indeed faster than fvh. Will the unified highlighter be deprecated in the near future? It takes around 4571 ms for a single document. Is that because of the edge n-gram analyzer I am using?
This is straying from the original topic. In the interests of keeping things focused, I suggest opening another issue to concentrate on highlighter performance. It would help to do some investigation with your data and settings first, e.g. the effects of n-gram sizes and the number of fields; otherwise you'll just be waiting for someone to ask you for that additional information before they can offer a diagnosis.