Mapping between index and search analyzer


(Trym) #1

Hi

As I understand it, it is important to use the same analyzer type when
indexing and searching,
but I have a hard time figuring out which index analyzer matches which
query "analyzer".
If I want to search on word prefixes, for example, I suspect that I should use
QueryBuilders.prefixQuery(...), but which index analyzer fits this (an NGram
analyzer)?
Is there a general rule of thumb or documentation, or how can I figure this
out myself?

Furthermore, I have a vague idea that searching can be done using queries and
filters, and that filters are faster (see
http://www.elasticsearch.org/guide/reference/query-dsl/),
but I cannot figure out how to run a search with a filter in Java.

Any help is great.

Best regards Trym


(Shay Banon) #2

You can run a prefix query on the terms produced by any analyzer. The standard
analyzer, for example, breaks "brown fox" into two terms, "brown" and "fox",
and a prefix query on "bro" will then find that document.
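
To make the mechanics concrete, here is a minimal plain-Java sketch of the idea, not Elasticsearch or Lucene code. The class name, the simplified analyze() method, and the TreeMap-based term dictionary are all assumptions made for illustration; a real analyzer and index are considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class PrefixLookupSketch {
    // Hypothetical stand-in for an analyzer: lowercase, then split on
    // anything that is not a letter or digit.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Sorted term dictionary: term -> ids of the documents containing it.
    private final TreeMap<String, Set<Integer>> index = new TreeMap<>();

    public void add(int docId, String text) {
        for (String term : analyze(text)) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Walks every indexed term that starts with the prefix and unions
    // the documents of those terms.
    public Set<Integer> prefixSearch(String prefix) {
        Set<Integer> hits = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            hits.addAll(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        PrefixLookupSketch idx = new PrefixLookupSketch();
        idx.add(1, "brown fox");
        idx.add(2, "red fox");
        idx.add(3, "Broken bridge");
        System.out.println(idx.prefixSearch("bro")); // prints [1, 3]
    }
}
```

The point of the sketch is that the prefix is matched against individual terms, not against the original text, which is why any analyzer that produces whole-word terms works with a prefix query.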

You can use ngrams to analyze the text differently, in which case you usually
won't need a prefix query because of the way ngrams work.
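
As a rough illustration of why ngrams can replace a prefix query, the plain-Java sketch below indexes edge n-grams (the leading substrings of each term), which is the n-gram variant most directly suited to prefix matching. This is an assumption on my part about which variant is meant; the class name, the gram sizes 1 to 10, and the whitespace tokenization are likewise hypothetical and not taken from Elasticsearch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class EdgeNGramSketch {
    // Emits the leading substrings of a term,
    // e.g. "brown" -> "b", "br", "bro", "brow", "brown".
    public static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, term.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    // Every gram is stored as a term of its own at index time.
    private final Map<String, Set<Integer>> index = new HashMap<>();

    public void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            for (String gram : edgeNGrams(term, 1, 10)) {
                index.computeIfAbsent(gram, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // A single exact lookup: no terms are enumerated at search time.
    public Set<Integer> termSearch(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        EdgeNGramSketch idx = new EdgeNGramSketch();
        idx.add(1, "brown fox");
        idx.add(2, "red fox");
        System.out.println(idx.termSearch("bro")); // prints [1]
    }
}
```

The trade-off in this sketch is a larger index (five entries for "brown" instead of one) in exchange for a cheaper search-time lookup.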

Regarding using filters with the Java API, for example:

client.prepareSearch("my_index")
    .setQuery(QueryBuilders.filteredQuery(
        QueryBuilders.matchAllQuery(),
        FilterBuilders.termFilter("field", "prefix")
    ))
    .execute().actionGet();


(Trym) #3

Hi

Thanks for your kind reply.

I hoped there would be a performance gain when searching for prefix queries
if I had used the ngram analyzer when indexing, but this seems not to be the
case?
Can you describe a use case that could not be solved without the nGram index
analyzer?

The Java documentation on QueryBuilders.filteredQuery says that it applies a
filter on the result of another query. Does that mean that the matchAllQuery
returns all results to the caller and the caller then filters them (and can
I think of the caller as a node, a shard or a Lucene instance)? Does Lucene
have similar concepts, and can you point me to a description of them?

Thanks in advance.

Best regards Trym


(Shay Banon) #4

On Tue, Sep 27, 2011 at 9:08 AM, Trym trym@sigmat.dk wrote:

> Hi
>
> Thanks for your kind reply.
>
> I hoped there would be a performance gain when searching for prefix queries
> if I had used the ngram analyzer when indexing, but this seems not to be the
> case?
> Can you describe a use case that could not be solved without the nGram
> index analyzer?

With ngrams, you usually don't need to use prefix queries, so that's your
perf gain.

> The Java documentation on QueryBuilders.filteredQuery says that it
> applies a filter on the result of another query. Does that mean that the
> matchAllQuery returns all results to the caller and the caller then filters
> them (and can I think of the caller as a node, a shard or a Lucene
> instance)? Does Lucene have similar concepts, and can you point me
> to a description of them?

Yes, Lucene has those concepts: FilteredQuery, Filters, and Queries.



(Trym) #5

Hi Shay

Thanks again for your reply.

  1. What performance penalty do I get when searching using a PrefixQuery if I
    have used the standard index analyzer?
  2. The Java documentation on QueryBuilders.filteredQuery says that it
    applies a filter on the result of another query. Does that mean that the
    matchAllQuery returns all results to the caller and the caller then filters
    them (and can I think of the caller as a node, a shard or a Lucene
    instance)?
  3. I will read further about Lucene

Thanks for any further comments.

Best regards Trym


(Shay Banon) #6

On Wed, Sep 28, 2011 at 9:23 AM, Trym trym@sigmat.dk wrote:

> Hi Shay
>
> Thanks again for your reply.
>
>   1. What performance penalty do I get when searching using a PrefixQuery if
>     I have used the standard index analyzer?

The analyzer is not really relevant here. A prefix query causes all terms
that start with the prefix to be enumerated.

>   2. The Java documentation on QueryBuilders.filteredQuery says that it
>     applies a filter on the result of another query. Does that mean that the
>     matchAllQuery returns all results to the caller and the caller then
>     filters them (and can I think of the caller as a node, a shard or a
>     Lucene instance)?

No, the query execution is done per shard. A query can be a filtered query,
and then it will apply a filter to a query, but it will still execute on the
"shard" level.
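
A simplified way to picture this is the plain-Java model below. It is only a conceptual sketch: the Doc class, the in-memory lists standing in for shards, and the merge step are assumptions made for illustration, and none of it reflects how Elasticsearch or Lucene actually implement filtering internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ShardFilterSketch {
    // Hypothetical document with an id and a single field value.
    public static final class Doc {
        public final int id;
        public final String field;

        public Doc(int id, String field) {
            this.id = id;
            this.field = field;
        }
    }

    // Query and filter are evaluated together inside one shard;
    // only the ids of matching documents leave the shard.
    public static List<Integer> searchShard(List<Doc> shard,
                                            Predicate<Doc> query,
                                            Predicate<Doc> filter) {
        return shard.stream()
                .filter(query.and(filter))
                .map(d -> d.id)
                .collect(Collectors.toList());
    }

    // Simplified merge step: collects the already-filtered hits of every shard.
    public static List<Integer> search(List<List<Doc>> shards,
                                       Predicate<Doc> query,
                                       Predicate<Doc> filter) {
        List<Integer> merged = new ArrayList<>();
        for (List<Doc> shard : shards) {
            merged.addAll(searchShard(shard, query, filter));
        }
        Collections.sort(merged);
        return merged;
    }

    public static void main(String[] args) {
        List<Doc> shard0 = Arrays.asList(new Doc(1, "prefix"), new Doc(2, "other"));
        List<Doc> shard1 = Arrays.asList(new Doc(3, "prefix"));
        Predicate<Doc> matchAll = d -> true;                      // stands in for matchAllQuery()
        Predicate<Doc> termFilter = d -> d.field.equals("prefix"); // stands in for termFilter(...)
        System.out.println(search(Arrays.asList(shard0, shard1), matchAll, termFilter)); // prints [1, 3]
    }
}
```

In this model the unfiltered result of the match-all query is never shipped anywhere: each shard returns only documents that pass both the query and the filter.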


