Trying to figure out the default analyzer for _all


(Jeffrey 'jf' Lim) #1

Hi folks, I've been trying to figure out the default analyzer for
'_all'. At first I assumed it would be the standard analyzer, but my
testing suggests otherwise: stop words are kept. From what I can
tell, it appears to use the standard tokenizer with a lowercase filter.

I would have tried to find out more myself, but you can't query the
mapping to check, since the analyzer doesn't show up there for '_all'.
Does anybody have any more info? Is it using one of the provided
analyzers?
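
One way to compare observed tokens against a known analyzer is the analyze API. The snippet below is only a sketch: it builds the request URL for running the standard analyzer on a sample string, assuming a 1.x-style `_analyze` endpoint that accepts `analyzer` and `text` as query parameters. The host, the sample text, and the helper name `build_analyze_url` are hypothetical.

```python
from urllib.parse import urlencode


def build_analyze_url(base_url, analyzer, text):
    """Build an _analyze request URL (assumes 1.x-style query parameters)."""
    # urlencode escapes the text so it is safe to put in a query string.
    query = urlencode({"analyzer": analyzer, "text": text})
    return f"{base_url}/_analyze?{query}"


if __name__ == "__main__":
    # Hypothetical local node; adjust host and port for your cluster.
    print(build_analyze_url("http://localhost:9200", "standard", "The Quick fox"))
```

Fetching that URL and comparing the returned tokens with what a search on '_all' matches should show whether the two behave the same on your version.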

thanks,
-jf

--
He who settles on the idea of the intelligent man as a static entity
only shows himself to be a fool.

Mensan / Full-Stack Technical Polymath / System Administrator
12 years over the entire web stack: Performance, Sysadmin, Ruby and Frontend

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAE4WMGhFq%3D0e8ABxu1oVJFELAL1tpz%3Di9wWX9qZsUU%2Ba1FrHUw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(David Pilato) #2

I guess it depends on your Elasticsearch version: as of 1.0, the standard analyzer no longer removes stop words.
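
To illustrate the difference, here is a rough Python simulation, not Elasticsearch's actual implementation. It assumes a simplified tokenizer (a `\w+` regex stands in for the standard tokenizer) and a hypothetical, abbreviated stop-word list; the function name `analyze` is also made up for the example.

```python
import re

# Hypothetical, abbreviated stop-word list, for illustration only.
ENGLISH_STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "is", "to", "in"})


def analyze(text, stop_words=frozenset()):
    """Lowercase the text, split it into word tokens, then drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stop_words]


if __name__ == "__main__":
    sample = "The Quick Fox and the Dog"
    # No stop-word removal: every token is kept, only lowercased.
    print(analyze(sample))
    # With stop-word removal: "the" and "and" are filtered out.
    print(analyze(sample, ENGLISH_STOP_WORDS))
```

With an empty stop-word set the output looks like "standard tokenizer plus lowercase filter", which matches what was observed for '_all'.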

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr



(Jeffrey 'jf' Lim) #3

I see! I guess the docs had me confused (or rather, I read the
description but didn't go on to read the details of the settings,
argh).

So the standard analyzer is indeed used for '_all'?

thanks,
-jf



(Umutcan Onal) #4

Hi,

I have a question about memory usage.

My cluster has 1 combined master/data node and 3 data nodes. Each has
a 6 GB heap (nearly half of the machine's memory). I have 250 shards
with replicas and 600 GB of data in total. After I start the cluster,
I can use it for 1 week without any problem. After a week, the cluster
begins to fail due to low memory (free memory below 10%). When I
restart all the nodes, everything is fine again and free memory goes
up to 40%. Then it fails again 1 week after the restart.

I think some data stays in memory for a long time even when it is not
used. Is there any configuration to optimize this? Do I need to
flush indices or clear the cache periodically?

Thanks,
Umutcan



(system) #5