Query response times

shift · January 23, 2013, 7:21pm

Searches can be very slow, especially when I need to search a large field
like @message; a query can take up to 45 seconds. Without asterisks
(wildcards), the time drops from 45 seconds to 9 seconds. If I select
which index to search, it drops to 0.51 seconds (no asterisks) or 12.9
seconds (with asterisks), though times vary. Unfortunately, some users
search for generic strings that require us to append asterisks to find
results.

I am using hourly indexes, keeping 24 hours total (but hope to increase
this to 7 days eventually). At peak load, an index can contain 69,308,904
documents, with a size of 33GB (or 66GB replicated).

What can I do to improve these queries? I need to address the need for
using asterisks and route the user to the appropriate index if possible.
Should I try index routing? Are there any good example templates?

Here is an example @message:
A|aBCdef|Jan 22 08:32:26 2013|log.sample.app.call.SampleSvr|12345|node|
123456|bar |CodeName.cpp|123|***** START OF A LONG MESSAGE *****|12345.0123

Here is an example query:

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "default_field": "@message",
            "query": "SampleSvr"
          }
        }
      ],
      "must_not": [ ],
      "should": [ ]
    }
  },
  "from": 0,
  "size": 50,
  "sort": [ ],
  "facets": { }
}
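
On restricting a search to specific indexes, as mentioned above: a minimal Python sketch that lists the hourly indexes covering a time window, so that only those are queried. The index name pattern (`logs-` plus `YYYY.MM.DD.HH`) and the `hourly_indexes` helper are assumptions for illustration; the actual naming scheme is not given here.

```python
from datetime import datetime, timedelta

def hourly_indexes(start, end, prefix="logs-"):
    """List one index name per hour touched by the window [start, end].

    The prefix and date format are hypothetical; adjust to the real scheme.
    """
    hour = start.replace(minute=0, second=0, microsecond=0)
    names = []
    while hour <= end:
        names.append(prefix + hour.strftime("%Y.%m.%d.%H"))
        hour += timedelta(hours=1)
    return names

names = hourly_indexes(datetime(2013, 1, 22, 8, 10),
                       datetime(2013, 1, 22, 10, 5))
# Elasticsearch accepts a comma-separated list of index names in the
# search URL, so the joined string can be used as the index part of the path.
print(",".join(names))
```

Searching only the listed indexes avoids touching the hours that cannot contain matching documents.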

--

egaumer · January 23, 2013, 7:47pm

Leading wildcards are expensive.

In your example query above, you shouldn't need to use wildcards if you've
properly tokenized the input.

log.sample.app.call.SampleSvr|12345|node|

Would be tokenized as...

[log, sample, app, call, samplesvr, 12345, node]

With proper tokenization, punctuation would cause the token to be split,
allowing you to simply search for "samplesvr" without the wildcards. I
would focus on this before you descend into routing semantics.
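
To illustrate the tokenization described above, here is a minimal Python sketch that splits on runs of non-word characters and lowercases each token. It only approximates what an analyzer does at index time; the `tokenize` helper is hypothetical and not part of Elasticsearch.

```python
import re

def tokenize(text):
    """Split on runs of non-word characters and lowercase each token."""
    return [t.lower() for t in re.split(r"[^\w]+", text) if t]

tokens = tokenize("log.sample.app.call.SampleSvr|12345|node|")
print(tokens)

# Once "samplesvr" is a token of its own, an exact term lookup finds it
# and no wildcard is needed.
print("samplesvr" in tokens)
```

Note that `\w` also matches the underscore, so names joined by `_` would stay together as one token under this pattern.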

--

shift · January 23, 2013, 10:04pm

Thanks, I tokenized it with a nonword pattern and it's working great
without wildcards. I will continue to research tokenizing, but this is a
vast improvement already.

For example:

"settings" : {
  "index.analysis.analyzer.nonword.type" : "pattern",
  "index.analysis.analyzer.nonword.pattern" : "[^\\w]+"
},
"mappings" : {
  "tcp" : {
    "properties" : {
      "@message" : {
        "type" : "string",
        "analyzer" : "nonword"
      }
    }
  }
}
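
One detail worth checking when these settings are sent as JSON: a backslash inside a JSON string has to be escaped, so the regex `[^\w]+` needs to appear as `[^\\w]+` in the request body. A minimal Python sketch, assuming the body is built client-side and serialized with the standard `json` module (the variable names are illustrative, and actually sending the request is left out):

```python
import json

# The regex exactly as the analyzer should receive it (one backslash).
NONWORD = r"[^\w]+"

body = {
    "settings": {
        "index.analysis.analyzer.nonword.type": "pattern",
        "index.analysis.analyzer.nonword.pattern": NONWORD,
    },
    "mappings": {
        "tcp": {
            "properties": {
                "@message": {"type": "string", "analyzer": "nonword"}
            }
        }
    },
}

# json.dumps escapes the backslash, producing "[^\\w]+" in the output text.
payload = json.dumps(body, indent=2)
print(payload)
```

Letting a JSON library do the serialization avoids having to hand-escape the pattern.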

--

Justin_Treher · January 24, 2013, 1:37am

Shift, leading wildcards are index killers, period, not just in ES. Try
running a leading-wildcard query against a regular database table with a
few million rows.

--

egaumer · January 24, 2013, 2:34am

If you really need leading wildcards, the trick is to index each token in
reverse order (i.e., backwards). The result is that leading wildcard
searches become trailing wildcard searches, which are more efficient.
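
The reversal trick can be sketched in a few lines of Python. This is only an in-memory model, assuming a sorted list stands in for the index's term dictionary; the helper names are hypothetical. In Elasticsearch itself the reversal would be done at index time, for instance with the `reverse` token filter.

```python
import bisect

def build_reversed_index(tokens):
    """Return the sorted list of unique tokens, each stored backwards."""
    return sorted({t[::-1] for t in tokens})

def suffix_search(reversed_terms, suffix):
    """Find tokens ending in `suffix`, i.e. the leading-wildcard query *suffix.

    The suffix is reversed into a prefix, so a binary search locates the
    first candidate and only the matching run of terms is scanned.
    """
    prefix = suffix[::-1]
    start = bisect.bisect_left(reversed_terms, prefix)
    matches = []
    for term in reversed_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        matches.append(term[::-1])
    return matches

terms = build_reversed_index(
    ["log", "sample", "app", "call", "samplesvr", "12345", "node"]
)
print(suffix_search(terms, "svr"))  # tokens matching *svr
```

Without the reversal, a query like `*svr` has no usable prefix, so every term in the dictionary has to be checked.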

--
