Query response times

Searches can be very slow, especially when I need to search a large field
like @message. A search can take up to 45 seconds. If I do not need to use
asterisks, the time drops from 45 seconds to 9 seconds. If I also select
which index to search, it drops to 0.51 seconds (no asterisks) or
12.9 seconds (with asterisks), though times vary. Unfortunately, some users
search for generic strings that require us to append asterisks to find
results.

I am using hourly indexes, keeping 24 hours total (but hope to increase
this to 7 days eventually). At peak load, an index can contain 69,308,904
documents, with a size of 33GB (or 66GB replicated).

What can I do to improve these queries? I need to remove the need for
asterisks and route the user to the appropriate index if possible.
Should I try index routing? Are there any good example templates?

Here is an example @message:
A|aBCdef|Jan 22 08:32:26 2013|log.sample.app.call.SampleSvr|12345|node|
123456|bar |CodeName.cpp|123|***** START OF A LONG MESSAGE *****|12345.0123

Here is an example query:

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "default_field": "@message",
            "query": "SampleSvr"
          }
        }
      ],
      "must_not": [ ],
      "should": [ ]
    }
  },
  "from": 0,
  "size": 50,
  "sort": [ ],
  "facets": { }
}

--

Leading wildcards are expensive.

In your example query above, you shouldn't need to use wildcards if you've
properly tokenized the input.

log.sample.app.call.SampleSvr|12345|node|

Would be tokenized as...

[log, sample, app, call, samplesvr, 12345, node]

With proper tokenization, punctuation splits the text into separate tokens,
so you can simply search for "samplesvr" without the wildcards. I
would focus on this before you descend into routing semantics.
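
To illustrate the point above, here is a minimal Python sketch of what
tokenization buys you. It only approximates an analyzer (split on non-word
characters, then lowercase). The function names, the toy inverted index, and
the second document ("OtherSvr") are made up for illustration and are not
Elasticsearch APIs.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split on runs of non-word characters and lowercase each token."""
    return [tok.lower() for tok in re.split(r"\W+", text) if tok]

# Toy inverted index: token -> set of document ids.
# Document 2 is a hypothetical second log line, added only for contrast.
docs = {
    1: "log.sample.app.call.SampleSvr|12345|node|",
    2: "log.sample.app.call.OtherSvr|67890|node|",
}
index = defaultdict(set)
for doc_id, text in docs.items():
    for token in tokenize(text):
        index[token].add(doc_id)

def search(term):
    """Exact term lookup; the query term is lowercased like the indexed text."""
    return sorted(index.get(term.lower(), set()))

print(tokenize(docs[1]))    # ['log', 'sample', 'app', 'call', 'samplesvr', '12345', 'node']
print(search("SampleSvr"))  # [1]
```

Because "samplesvr" is its own token, the lookup is a single dictionary hit
and no wildcard expansion is needed.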

(Reply to shift's original post of Wednesday, January 23, 2013 2:21:49 PM UTC-5.)

--

Thanks, I tokenized it with a nonword pattern and it's working great
without wildcards. I will continue to research tokenizing, but this is a
vast improvement already.

Here are the settings and mapping I used:

{
  "settings" : {
    "index.analysis.analyzer.nonword.type" : "pattern",
    "index.analysis.analyzer.nonword.pattern" : "[^\\w]+"
  },
  "mappings" : {
    "tcp" : {
      "properties" : {
        "@message" : {
          "type" : "string",
          "analyzer" : "nonword"
        }
      }
    }
  }
}
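
One detail worth checking when these settings are sent as JSON: a backslash
inside a JSON string has to be doubled. Below is a minimal Python sketch,
assuming the body is built as a dict and serialized with the standard json
module (the variable names are illustrative). It also applies the same pattern
to the example @message to preview the tokens. Python's re module stands in
for the Java regex engine here, and the lowercasing step is an assumption that
mirrors the lowercase tokens shown earlier in the thread.

```python
import json
import re

# The regex as Python sees it: a single backslash followed by "w".
pattern = r"[^\w]+"

body = {
    "settings": {
        "index.analysis.analyzer.nonword.type": "pattern",
        "index.analysis.analyzer.nonword.pattern": pattern,
    },
    "mappings": {
        "tcp": {
            "properties": {
                "@message": {"type": "string", "analyzer": "nonword"}
            }
        }
    },
}

# json.dumps escapes the backslash, so the serialized text contains "[^\\w]+".
payload = json.dumps(body, indent=2)
print(payload)

# Preview the tokens the pattern produces for the example message.
message = (
    "A|aBCdef|Jan 22 08:32:26 2013|log.sample.app.call.SampleSvr|12345|node|"
    "123456|bar |CodeName.cpp|123|***** START OF A LONG MESSAGE *****|12345.0123"
)
tokens = [t.lower() for t in re.split(pattern, message) if t]
print(tokens)
```

The preview is only a quick sanity check; the index itself remains the
authority on which tokens were actually produced.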


--

Shift, leading wildcards are index killers, period, and not just in ES. Try
running a leading-wildcard query against a regular database table with a few
million rows.
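
A rough way to see why, sketched in Python: index terms are typically kept in
sorted order, so a trailing wildcard (a prefix) can jump straight to the
matching range with a binary search, while a leading wildcard has to test
every term. This is a toy model of a sorted term dictionary, not how Lucene or
any database is actually implemented. The term list and function names are
made up.

```python
from bisect import bisect_left

# Toy sorted term dictionary (hypothetical terms).
terms = sorted(["app", "call", "log", "node", "othersvr", "sample", "samplesvr"])

def prefix_search(prefix):
    """Trailing wildcard (prefix*): binary search to the start, then walk the matching run."""
    i = bisect_left(terms, prefix)
    matches = []
    while i < len(terms) and terms[i].startswith(prefix):
        matches.append(terms[i])
        i += 1  # sorted order: the first non-match ends the run
    return matches

def suffix_search(suffix):
    """Leading wildcard (*suffix): no ordering to exploit, so every term is checked."""
    return [term for term in terms if term.endswith(suffix)]

print(prefix_search("sample"))  # ['sample', 'samplesvr']
print(suffix_search("svr"))     # ['othersvr', 'samplesvr']
```

With seven terms the difference is invisible. With the millions of distinct
terms a large log index can hold, the full scan in suffix_search is where the
time goes.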

(Reply to shift's post of Wednesday, January 23, 2013 5:04:08 PM UTC-5.)

--

If you really need leading wildcards, the trick is to index each token in
reverse order (i.e., backwards). Leading wildcard searches then become
trailing wildcard searches, which are more efficient.
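
A minimal Python sketch of that idea, using a toy sorted list of reversed
terms (the terms and the function name are illustrative): the leading-wildcard
query *svr is rewritten as the prefix query rvs* against the reversed terms.
In Elasticsearch the reversing would normally happen at index time in the
analysis chain (there is a "reverse" token filter), but the sketch does not
depend on that.

```python
from bisect import bisect_left

# Hypothetical terms taken from tokenized log lines.
terms = ["app", "call", "log", "node", "othersvr", "sample", "samplesvr"]

# Index time: store every term reversed, in sorted order.
reversed_terms = sorted(term[::-1] for term in terms)

def leading_wildcard_search(suffix):
    """Answer *suffix by running a prefix search on the reversed terms."""
    prefix = suffix[::-1]
    i = bisect_left(reversed_terms, prefix)
    matches = []
    while i < len(reversed_terms) and reversed_terms[i].startswith(prefix):
        matches.append(reversed_terms[i][::-1])  # un-reverse for display
        i += 1
    return matches

print(leading_wildcard_search("svr"))  # ['samplesvr', 'othersvr']
```

The trade-off is storage: if normal (forward) searches are also needed, each
token ends up indexed twice, once forward and once reversed.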

(Reply to jtr...@gmail.com's post of Wednesday, January 23, 2013 8:37:11 PM UTC-5.)

--