Should I disable scoring somehow for performance reasons?


(André Hänsel) #1

I have an index with about a million entries with a rather dynamic
structure. The queries I want to perform are of the kind "give me exactly
the entries that contain both 'abc' and 'def', each in any field". The
results can be returned in arbitrary order.

Since I still need the analyzer for accent and case folding and the like,
the query I end up with is:
{
  "query": {
    "match": {
      "_all": {
        "query": "abc def",
        "operator": "and"
      }
    }
  }
}
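To make the intended semantics concrete, here is a toy Python model (not Elasticsearch's implementation) of an analyzer that lowercases and strips accents, roughly like a `lowercase` plus `asciifolding` filter chain, and of a `match` query with `"operator": "and"` on a catch-all field such as `_all`: a document matches only if every analyzed query term occurs in at least one of its fields. The function names and sample documents are hypothetical.

```python
import re
import unicodedata


def analyze(text):
    """Toy analyzer: lowercase, strip accents, split on non-alphanumerics.

    Characters that do not fold to ASCII letters or digits act as
    separators, a simplification compared to real analysis chains.
    """
    # NFKD splits "é" into "e" plus a combining accent, which is dropped.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return [tok for tok in re.split(r"[^0-9a-z]+", folded) if tok]


def matches_all_terms(doc, query):
    """True if every analyzed query term occurs in some field of doc.

    All field values are pooled into one set of tokens (like a catch-all
    field), so the terms may come from different fields.
    """
    all_tokens = set()
    for value in doc.values():
        all_tokens.update(analyze(str(value)))
    return all(term in all_tokens for term in analyze(query))


docs = [
    {"title": "ABC report", "body": "Résumé of DEF"},
    {"title": "abc only", "body": "nothing else"},
]
print([matches_all_terms(d, "abc def") for d in docs])  # [True, False]
```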

I have the exact same dataset in Sphinx with min_infix_length = 3 (the
Sphinx equivalent of "min_gram: 3") and on my development machine no query
takes longer than 100 ms (in Sphinx).
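As a rough illustration of what `min_infix_length = 3` (or `min_gram: 3`) means for the index, the hypothetical helper below lists every substring of a word that is at least three characters long, assuming no upper limit other than the word's own length (in an Elasticsearch n-gram filter the upper bound is set separately with `max_gram`). Each such substring becomes an indexed term, so a word of length n contributes (n - 2)(n - 1)/2 of them.

```python
def infixes(word, min_len=3, max_len=None):
    """All substrings of word with min_len <= length <= max_len.

    max_len=None means "up to the whole word" (an assumption of this
    sketch, not a Sphinx or Elasticsearch default).
    """
    if max_len is None:
        max_len = len(word)
    grams = []
    for length in range(min_len, min(max_len, len(word)) + 1):
        for start in range(len(word) - length + 1):
            grams.append(word[start:start + length])
    return grams


print(infixes("search"))
# a word of length n yields (n - 2) * (n - 1) // 2 infixes of length >= 3
print(len(infixes("elasticsearch")))  # 66
```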

I haven't even activated the ngrams in Elasticsearch yet, and it already
takes more than 500 ms to search for a new set of terms.

Now I wonder if this rather bad performance is related to the scoring that
is performed and what I can do to make it (much) better. Can I maybe turn
off the scoring (since I don't care about the order at all) and get better
performance by that?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ed6690b3-0116-46ef-bdb2-649c2d3e50e2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(vineeth mohan-2) #2

Hello,

If you use filters rather than a query, scoring is taken out of the
picture.
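A toy model of the distinction in plain Python (not how Lucene implements it): in filter context each document is only checked for a match, whereas in query context every hit also gets a score and the hits are sorted by it. The score here is a naive term count standing in for Lucene's real similarity; the matching work is identical in both functions, and the filter only skips the scoring and sorting.

```python
def filter_docs(docs, terms):
    """Filter context: a yes/no decision per document, no scores, no order."""
    return {doc_id for doc_id, tokens in docs.items()
            if all(t in tokens for t in terms)}


def query_docs(docs, terms):
    """Query context: the same matching, plus a score per hit and a sort.

    The score is a naive term-frequency sum, a stand-in for a real
    relevance function, but it shows the extra per-hit work.
    """
    hits = []
    for doc_id, tokens in docs.items():
        if all(t in tokens for t in terms):
            score = sum(tokens.count(t) for t in terms)
            hits.append((doc_id, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


token_docs = {1: ["abc", "def", "abc"], 2: ["abc"], 3: ["def", "abc"]}
print(filter_docs(token_docs, ["abc", "def"]))  # {1, 3}
print(query_docs(token_docs, ["abc", "def"]))   # [(1, 3), (3, 2)]
```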

Thanks
Vineeth

(In reply to André Hänsel, Wed, Sep 3, 2014 at 9:34 PM)

(André Hänsel) #3

I thought I needed a query (instead of a filter) to use the analyzer. How can
I do this? Sorry, I'm rather new to Elasticsearch.

Also, is that the best way to go for this kind of query?

(In reply to vineeth mohan, Wednesday, September 3, 2014 6:13:15 PM UTC+2)

(vineeth mohan-2) #4

Hi,

Analyzers are used either way; the only difference is that no score is
computed for filters. Filters also allow some performance optimizations
through caching:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/filter-caching.html
Try that too.
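For the Elasticsearch 1.x versions discussed here, my understanding of the 1.x docs is that a `query` filter is not cached by default, and that caching it requires the `fquery` form with `"_cache": true`. The sketch below builds such a request body in Python; treat the exact syntax as an assumption to verify against your version, since later releases dropped both the `filtered` query and the `_cache` flag. The helper name is hypothetical.

```python
import json


def cached_match_filter(text, field="_all"):
    """Build an ES 1.x-style search body: match `text` with operator "and"
    in filter context, asking for the filter result to be cached via
    fquery/_cache (syntax assumed from the 1.x documentation).
    """
    return {
        "query": {
            "filtered": {
                "filter": {
                    "fquery": {
                        "query": {
                            "match": {
                                field: {"query": text, "operator": "and"}
                            }
                        },
                        "_cache": True,
                    }
                }
            }
        }
    }


print(json.dumps(cached_match_filter("abc def"), indent=2))
```

Caching only pays off when the same filter runs repeatedly; the first search for a new set of terms, which is the case timed in the original question, still does the full matching work.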

Thanks
Vineeth

(In reply to André Hänsel, Wed, Sep 3, 2014 at 9:48 PM)

(André Hänsel) #5

Indeed, I had forgotten about the query filter.

Here's my new query:

{
  "query": {
    "filtered": {
      "filter": {
        "query": {
          "match": {
            "_all": {
              "query": "abc def",
              "operator": "and"
            }
          }
        }
      }
    }
  }
}

Now all the documents have a score of 1, but the performance is still
really bad. Any further suggestions?
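For readers on later Elasticsearch versions: the `filtered` query was removed in 5.0 and the `_all` field was deprecated in 6.0 (and removed in 7.0), so the usual way to run a match without scoring is a `bool` query with a `filter` clause, whose hits all get a score of 0. The sketch below builds such a body; `all_text` is a hypothetical field you would populate yourself (for example with `copy_to` in the mapping) as a stand-in for `_all`.

```python
import json


def match_in_filter_context(text, field="all_text"):
    """Build a bool/filter search body (Elasticsearch 2.x and later).

    Clauses under "filter" do not contribute to scoring. "all_text" is a
    hypothetical catch-all field, e.g. filled via copy_to.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"match": {field: {"query": text, "operator": "and"}}}
                ]
            }
        }
    }


print(json.dumps(match_in_filter_context("abc def"), indent=2))
```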

(In reply to vineeth mohan, Wednesday, September 3, 2014 6:26:51 PM UTC+2)
