Poor performance of "full text" searches

Hey guys,

we are experiencing poor performance when we run "full text" searches
(searches without specifying a field name).

If we search a 200 GB index that lives on SSDs for something like '
program:apache ', the search takes about 10-15 seconds; if we search for '
apache ', the whole search takes 1.5-2 minutes.

What could cause this? Our mapping? The large heap of the SSD ES node (80 GB,
plus ~90 GB of system memory available to Lucene)? Is this normal?

If you need any more information, please let me know.

Thanks for any response.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a3a41710-93b4-4e81-93a7-6cde43e5c387%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Some comments:

Searching for program:apache actually searches only in the field program.

Searching with wildcards is something you really should not do! When you do a full text search on Google, I guess you don’t use wildcards, right? It’s not really user friendly.
Wildcards are extremely slow, and even slower when the pattern starts with *! See the reference: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html#query-dsl-wildcard-query
Searching in the _all field is also something you should avoid, though it depends on your use case. I’d prefer using the copy_to feature in the mapping instead of the _all field.

Now, I think it also depends on your infrastructure: how many nodes, how much RAM, and so on.

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs
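
As an illustration of the copy_to suggestion above, here is a minimal sketch of an index body that copies selected fields into a single searchable field and disables _all. It assumes the mapping syntax of the Elasticsearch 1.x releases current at the time of this thread (`string` type, a named mapping type); newer versions use different syntax. The mapping type `event`, the target field `search_text`, and the helper `build_mapping` are hypothetical names chosen for the example.

```python
import json


def build_mapping(source_fields, target_field="search_text"):
    """Build an index body that copies selected fields into one search field.

    Assumes Elasticsearch 1.x mapping syntax. All field names are hypothetical.
    """
    properties = {
        # Each source field is indexed normally and also copied to target_field.
        name: {"type": "string", "copy_to": target_field}
        for name in source_fields
    }
    # The combined field that un-fielded searches could be pointed at.
    properties[target_field] = {"type": "string"}
    return {
        "mappings": {
            "event": {  # hypothetical mapping type name
                "_all": {"enabled": False},  # do not index every field into _all
                "properties": properties,
            }
        }
    }


mapping = build_mapping(["program", "message"])
print(json.dumps(mapping, indent=2))
```

Only the fields that are worth searching are copied, so the combined field can stay much smaller than _all, which receives every field by default.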

Thanks for the response.

The need to search with wildcards actually comes from our mapping. Our
events contain a lot of URLs, which the default analyzer splits into many
terms. That leads to pretty awkward output when you run a "top 10" search
in the Kibana table panel. Thanks to a post by someone else here in the
forum, we adjusted the mapping so that every field is indexed as a single
token. URLs are now displayed correctly, but the disadvantage is that we
have to use wildcards, because " apache " on its own won't return any events.

We are currently using one big server with 256 GB of RAM and 64 CPUs,
divided into 3 Elasticsearch instances.

Based on your post, it seems to make sense to change the mapping back so
that everything is tokenized the default way again, which eliminates the
need for wildcards, and to use not_analyzed fields for the URLs.

Maybe a stupid question: if you don't specify a field in a Kibana search,
the query goes to the _all field by default, right? Is there a way to
change that behaviour so that " apache " is directed not to _all but to
the "message" field? That would eliminate the need for the _all field,
which would reduce disk space and the CPU/RAM load for indexing.
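
For context on the default-field question: the `query_string` query accepts a `default_field` parameter, and in the 1.x releases of that era this fell back to the `index.query.default_field` index setting, which itself defaulted to _all. The sketch below only shows what such a request body could look like; whether and how Kibana exposes this is not covered here, and `build_search_body` is a hypothetical helper.

```python
import json


def build_search_body(user_input, default_field="message"):
    """Wrap raw search-box input in a query_string query.

    Terms without a "field:" prefix are searched in default_field,
    so "apache" is looked up in `message` rather than in `_all`.
    """
    return {
        "query": {
            "query_string": {
                "query": user_input,
                "default_field": default_field,
            }
        }
    }


body = build_search_body("apache")
print(json.dumps(body, indent=2))
```

A query such as program:apache still targets the program field, since an explicit field prefix takes precedence over the default field.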

Indeed. Running a terms aggregation on an analyzed field might not make sense.
In that case you should use a multi field: analyze the field for search, and keep a non-analyzed version of it for aggregations.

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#_multi_fields_3

--
David Pilato | Technical Advocate | Elasticsearch.com
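
A rough sketch of the multi-field idea described above, again assuming Elasticsearch 1.x syntax: the main field is analyzed for search, while a `raw` sub-field keeps the whole value as one token for a "top 10" terms aggregation. The field name `url`, the sub-field name `raw`, and both helper functions are assumptions made for this example.

```python
import json


def url_field_mapping():
    """Mapping for one field that is both searchable and aggregatable."""
    return {
        "type": "string",  # analyzed: individual terms are searchable
        "fields": {
            # Sub-field holding the complete, untokenized value.
            "raw": {"type": "string", "index": "not_analyzed"}
        },
    }


def top_urls_request(size=10):
    """Search body asking for the most frequent complete URLs."""
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            "top_urls": {"terms": {"field": "url.raw", "size": size}}
        },
    }


field_mapping = url_field_mapping()
request = top_urls_request()
print(json.dumps({"url": field_mapping}, indent=2))
print(json.dumps(request, indent=2))
```

With this layout, a plain term search runs against `url` without wildcards, and the table panel would aggregate on `url.raw`.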

One note: Elasticsearch is not designed to run on one big host, but on many
small servers. It scales out, not up.

If you run 3 nodes on 1 server, this will slow down overall system
performance.

Avoid left truncation in the form *word. Always use right truncation, word*.

Jörg
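
To illustrate why right truncation is cheaper than left truncation, here is a simplified model in plain Python. It treats the term dictionary as a sorted list, which is an assumption made for the example and not how Lucene is actually implemented. A prefix lookup can jump to the first candidate and stop at the first non-match, while a leading wildcard has to inspect every term. The sample terms and both function names are hypothetical.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """Right truncation (prefix*): binary search, then read until no match."""
    i = bisect_left(sorted_terms, prefix)
    matches = []
    inspected = 0
    while i < len(sorted_terms):
        inspected += 1
        if not sorted_terms[i].startswith(prefix):
            break  # sorted order guarantees no later term can match
        matches.append(sorted_terms[i])
        i += 1
    return matches, inspected


def suffix_matches(sorted_terms, suffix):
    """Left truncation (*suffix): sort order does not help, scan everything."""
    matches = [t for t in sorted_terms if t.endswith(suffix)]
    return matches, len(sorted_terms)


terms = sorted(["apache", "apache2", "cron", "kernel",
                "mod_apache", "nginx", "sshd"])

print(prefix_matches(terms, "apache"))  # (['apache', 'apache2'], 3)
print(suffix_matches(terms, "apache"))  # (['apache', 'mod_apache'], 7)
```

On seven terms the difference is trivial, but the scan for the leading wildcard grows with the number of unique terms in the field, which can be very large in a 200 GB index.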

On Fri, Nov 28, 2014 at 11:50 AM, horst knete baduncle23@hotmail.de wrote:

We are actually using 1 big server with 256 RAM + 64 CPU, divided into 3
elasticsearch instances.