I have an index with about a million entries with a rather dynamic
structure. The queries I want to perform are of the kind: "give me exactly
the entries that contain 'abc' and 'def' in any two fields". The
results can be returned in an arbitrary order.
Since I still need the analyzer for accent and case folding and the like,
the query I end up with is:
{
  "query": {
    "match": {
      "_all": {
        "query": "abc def",
        "operator": "and"
      }
    }
  }
}
I have the exact same dataset in Sphinx with min_infix_length = 3 (the
Sphinx equivalent of "min_gram: 3") and on my development machine no query
takes longer than 100 ms (in Sphinx).
I haven't even activated the ngrams in Elasticsearch yet and it already
takes more than 500 ms to search for a new set of terms.
Now I wonder whether this rather bad performance is related to the scoring
that is performed, and what I can do to make it (much) better. Can I turn
off the scoring (since I don't care about the order at all) and get better
performance that way?
If you use a filter rather than a query, scoring is out of the
picture.
Thanks
Vineeth
(In reply to André Hänsel's post of Wed, Sep 3, 2014, 9:34 PM.)
I thought I needed a query (instead of a filter) to use the analyzer. How
can I do this with a filter? Sorry, I'm rather new to Elasticsearch.
Also, is that the best way to go for this kind of query?
(In reply to vineeth mohan's message of Wednesday, September 3, 2014, 6:13:15 PM UTC+2.)
Analyzers are used either way. The only difference is that no score is
computed for filters. Filters also get some performance optimizations
through caching. Try that too.
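As a rough sketch of what this could look like, assuming Elasticsearch 1.x (current at the time of this thread): the same match query can be wrapped in a "query" filter inside a "filtered" query. The match query still analyzes "abc def", so accent and case folding keep working, but no score is computed for the hits. The field name and terms are taken from the question; whether and how the result is cached depends on the filter type and version, so treat this as a starting point rather than a tuned solution.

```json
{
  "query": {
    "filtered": {
      "filter": {
        "query": {
          "match": {
            "_all": {
              "query": "abc def",
              "operator": "and"
            }
          }
        }
      }
    }
  }
}
```

Note that a plain "term" filter would not analyze its input, which is why the match query is kept here. In Elasticsearch 2.0 and later the "filtered" query is deprecated; the equivalent there is a "bool" query with the match query placed in its "filter" clause.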
Thanks
Vineeth
(In reply to André Hänsel's message of Wed, Sep 3, 2014, 9:48 PM.)