On Mon, Sep 26, 2011 at 5:49 PM, Trym trym@sigmat.dk wrote:
Hi
As I understand it, it is important to use the same analyzer type when
indexing and searching,
but I have a hard time figuring out which index analyzer matches the
"analyzer" of a given query.
For example, if I want to search on word prefixes, I suspect that I should use
QueryBuilders.prefixQuery(...), but which index analyzer fits this (an NGram
analyzer)?
Is there a general rule of thumb or documentation, or how can I figure this
out myself?
Furthermore, I have a vague idea that searching can be done using both queries
and filters, and that filters are faster (see http://www.elasticsearch.org/guide/reference/query-dsl/),
but I cannot figure out how to run a filter search from Java.
You can run a prefix query on terms produced by any analyzer. The standard
analyzer breaks "brown fox" into two terms, "brown" and "fox", so you
can run a prefix query on "bro" and find that document.
You can use ngrams to analyze the text differently, in which case you
usually won't need a prefix query, because the ngrams themselves are indexed
as terms.
Regarding using filters with the Java API, for example:
On Tue, Sep 27, 2011 at 9:08 AM, Trym trym@sigmat.dk wrote:
Hi
Thanks for your kind reply.
I hoped there would be a performance gain when searching for prefix queries
if I had used the ngram analyzer when indexing, but this seems not to be the
case?
Can you describe a use case that could not be solved without the nGram
index analyzer?
With ngrams, you usually don't need to use prefix queries, so that's your
performance gain.
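To make the ngram point concrete, here is a minimal plain-Java sketch (not the Elasticsearch API). It assumes edge n-grams, meaning only the leading substrings of each token are indexed; a full n-gram analyzer would also emit inner substrings. The class, method, and parameter names are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: mimics what an edge n-gram analyzer does at index time.
final class EdgeNGramSketch {

    // Emit the leading substrings of one token, from minGram up to maxGram characters.
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Index-time analysis: lowercase, split on whitespace, expand each token into edge n-grams.
    static Set<String> indexTerms(String text, int minGram, int maxGram) {
        Set<String> terms = new TreeSet<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.addAll(edgeNGrams(token, minGram, maxGram));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> terms = indexTerms("brown fox", 1, 5);
        System.out.println(terms);                 // [b, br, bro, brow, brown, f, fo, fox]
        // "bro" is itself an indexed term, so an exact term lookup finds the document.
        System.out.println(terms.contains("bro")); // true
    }
}
```

The trade-off in this sketch is a larger term dictionary at index time in exchange for a single exact-term lookup at search time.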
The Java documentation on QueryBuilders.filteredQuery says that it
applies a filter on the result of another query. Does that mean that the
matchAllQuery returns all results to the callee and then the callee filters
these (and can I think of the callee as a node, a shard or a Lucene
instance)? And does Lucene have some similar concepts and can you point me
to a description of these?
Yes, Lucene has those concepts: FilteredQuery, Filters, and Queries.
On Wed, Sep 28, 2011 at 9:23 AM, Trym trym@sigmat.dk wrote:
Hi Shay
Thanks again for your reply.
What performance penalty do I get when searching using a PrefixQuery if
I have used the standard index analyzer?
The analyzer is not really relevant here. A prefix query causes all indexed
terms that start with the prefix to be enumerated, so its cost grows with the
number of matching terms.
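As a rough picture of that enumeration, the sketch below uses a sorted set as a stand-in for the term dictionary. This is plain Java, not Lucene code, and the class and method names are hypothetical; it only shows why the work is proportional to the number of terms sharing the prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Illustrative only: a sorted set stands in for the index's term dictionary.
final class PrefixEnumerationSketch {

    // Return every term in the sorted dictionary that starts with the prefix.
    static List<String> expandPrefix(TreeSet<String> dictionary, String prefix) {
        List<String> matches = new ArrayList<>();
        // Seek to the first term >= prefix, then walk forward while the prefix still matches.
        for (String term : dictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order guarantees no later term can match
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> dictionary = new TreeSet<>(
                Arrays.asList("box", "bread", "broom", "brother", "brown", "fox"));
        // Each enumerated term becomes one more term the query has to look up.
        System.out.println(expandPrefix(dictionary, "bro")); // [broom, brother, brown]
    }
}
```

A short prefix over a large dictionary expands to many terms, which is where the cost of a prefix query comes from in this model.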
The Java documentation on QueryBuilders.filteredQuery says that it
applies a filter on the result of another query. Does that mean that the
matchAllQuery returns all results to the callee and then the callee filters
these (and can I think of the callee as a node, a shard or a Lucene
instance)?
No, query execution is done per shard. A query can be a filtered query,
in which case it applies a filter to another query, but it still executes at
the "shard" level.
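The per-shard execution of a filtered query described in this thread can be pictured with the following plain-Java sketch. It is a deliberate simplification and an assumption on my part about how to model it: documents are plain strings, there is no scoring, and the class and method names are hypothetical rather than taken from Elasticsearch or Lucene. The point is only that each shard applies the filter and the query locally, and the caller merges hits that are already filtered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Illustrative only: models shards as lists of documents.
final class PerShardFilterSketch {

    // One shard's local execution: a document is a hit only if it passes both filter and query.
    static List<String> searchShard(List<String> shardDocs,
                                    Predicate<String> query,
                                    Predicate<String> filter) {
        List<String> hits = new ArrayList<>();
        for (String doc : shardDocs) {
            if (filter.test(doc) && query.test(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // The coordinating step never sees unfiltered documents; it only merges per-shard hits.
    static List<String> search(List<List<String>> shards,
                               Predicate<String> query,
                               Predicate<String> filter) {
        List<String> merged = new ArrayList<>();
        for (List<String> shard : shards) {
            merged.addAll(searchShard(shard, query, filter));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> shards = Arrays.asList(
                Arrays.asList("brown fox", "red fox"),
                Arrays.asList("brown bear", "blue bird"));
        Predicate<String> matchAll = doc -> true;                    // stand-in for a match-all query
        Predicate<String> brownOnly = doc -> doc.startsWith("brown"); // stand-in for a filter
        System.out.println(search(shards, matchAll, brownOnly));     // [brown fox, brown bear]
    }
}
```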