Disabling Default Analyzer for Most Fields

dkullmann · May 15, 2012, 4:45pm

Hello,

I want to disable the default analyzer for most of the fields in my
document. The document represents a piece of real estate and most of the
fields are integers or are keywords like you would find in a select
drop-down:

Integers: Bedrooms, Bathrooms, Rooms, etc - need to be looked up with exact
matches or ranges

Location: Lat/Long

Keywords: Property Type ('Single Family', 'Condo', 'Townhouse'), City ('Los
Angeles', 'New York') - I can perform exact matches on these

So my question is this: how can I disable the default analyzer for most of
my fields, but use it for 2-3 fields that I need to be able to look up using
query_string queries and have analyzed contents?

I found this thread
(http://elasticsearch-users.115913.n3.nabble.com/Disabling-default-analyzer-td2932819.html)
explaining how to disable it, and I know I can add an analyzer for a specific
field, but I don't know what the default analyzer is or how to add it back.

Thanks,
DK

Igor_Motov · May 15, 2012, 6:26pm

By default, the default analyzer is standard
(http://www.elasticsearch.org/guide/reference/index-modules/analysis/standard-analyzer.html).
So all you have to do is replace the default analyzer with your custom
analyzer, as described in the thread that you mentioned, and then set the
"standard" analyzer in the mappings for the 2-3 fields that you want to exclude.

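As a rough sketch of the approach described above, the index body below makes "keyword" the default analyzer and opts a few fields back into "standard". It assumes the legacy (2012-era) settings and mapping syntax used in this thread; the type name "property" and the field names "description" and "remarks" are hypothetical.

```python
import json

# Assumption: legacy (2012-era) Elasticsearch syntax, matching the thread.
# The type name "property" and the field names are hypothetical and only
# illustrate the shape of the request body.
def build_index_body(analyzed_fields):
    """Return an index-creation body: keyword by default, standard where listed."""
    properties = {
        name: {"type": "string", "analyzer": "standard"}
        for name in analyzed_fields
    }
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Replaces the implicit default (standard) analyzer.
                    "default": {"type": "keyword"}
                }
            }
        },
        "mappings": {"property": {"properties": properties}},
    }

body = build_index_body(["description", "remarks"])
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body when creating the index; fields not listed fall back to the keyword default and are indexed as a single token.
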

dkullmann · May 15, 2012, 7:38pm

Igor:

Perfect, thank you!

There are actually only about 10-15 out of 500 fields that I want to be
analyzed and searchable (but I want the remaining fields stored in the
document). Am I solving this problem correctly by using the "keyword"
analyzer, or is there a more intelligent way?

-DK

Igor_Motov · May 15, 2012, 7:53pm

By setting "keyword" as the default analyzer, you don't turn off indexing; you
just index every string value as a single token. It sounds like this is not
what you want. If you want a field to be stored but not indexed, you need
to define it in the mapping as {..."store":"yes", "index":"no" }. You can do
this for many fields at once using dynamic templates
(http://www.elasticsearch.org/guide/reference/mapping/root-object-type.html).
You should probably also disable the _all field, since the content of all
fields is indexed in _all by default.
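
A minimal sketch of what such a mapping could look like, assuming the legacy mapping syntax quoted above ("store":"yes", "index":"no"). The type name "property" and the template name "store_only_strings" are hypothetical, and the template only covers dynamically detected string fields.

```python
import json

# Assumption: legacy (2012-era) mapping syntax, as quoted in the thread.
# "store_only_strings" is a made-up template name.
def build_store_only_mapping(type_name):
    """Mapping where new string fields are stored but not indexed, and _all is off."""
    return {
        type_name: {
            "_all": {"enabled": False},
            "dynamic_templates": [
                {
                    "store_only_strings": {
                        "match": "*",
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "string",
                            "store": "yes",
                            "index": "no",
                        },
                    }
                }
            ],
        }
    }

mapping = build_store_only_mapping("property")
print(json.dumps(mapping, indent=2))
```

Fields that should stay searchable would be declared explicitly under "properties"; dynamic templates only apply to fields that are not already mapped.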

dkullmann · May 15, 2012, 8:23pm

Igor:

Perfect, I read the "store" documentation but I misinterpreted it - thanks
for clearing that up and it's exactly what I'm looking for.

It's hard to tell from the documentation - can I add dynamic mapping in a
config file for a specific index + type combination?

DK

Ivan · May 15, 2012, 11:31pm

Config mappings should be able to handle dynamic mappings. I never
tried since I switched to index templates when I started using dynamic
mappings.

If not, you can use dynamic mappings for a specific index + type
combination with index templates. Just do not supply a wildcard for the
template name.

--
Ivan
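
One way to read the suggestion above, sketched under assumptions: in the legacy index template body, the "template" key holds the index name pattern, so giving it an exact index name (no wildcard) limits the template to that one index. The index name "listings", the type name "property", and the helper itself are hypothetical.

```python
import json

# Assumption: legacy index template body, where "template" is the index
# name pattern. The names used below are hypothetical.
def build_index_template(index_name, type_name, type_mapping):
    """Return a template body that targets exactly one index and one type."""
    if "*" in index_name:
        raise ValueError("use an exact index name, not a wildcard pattern")
    return {
        "template": index_name,
        "mappings": {type_name: type_mapping},
    }

template = build_index_template(
    "listings",
    "property",
    {"_all": {"enabled": False}},
)
print(json.dumps(template, indent=2))
```

The resulting body would be registered through the index templates API before the index is created, so the mapping is applied when the index first appears.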

dkullmann · June 19, 2012, 6:40pm

Ivan and Igor:

Great, thank you for your help! I have a follow up question:

My goal is to reduce the system resource requirements for my Elasticsearch
server. I have many documents with 500 fields each; however, I only need
about 15 fields to be "searchable", but I still need the _source to be
retrieved. I was wondering if this is the correct way to think about how the
server manages memory (RAM):

1. If I store all but 15 fields and index the 15 fields I need searchable,
   the size of the "index" will be smaller.
2. In the scenario in #1 above, the full document will still be retrieved
   as _source even though the fields are set to "store".
3. In the scenario in #1 above, I won't lose any performance when using ES
   as a data storage system, because _source will still be stored the same
   way regardless of how the individual fields are stored.

Is that correct? If so - please let me know, if not, please help me
understand what I'm missing. Thanks!

-DK

Igor_Motov · June 20, 2012, 1:23pm

Indexing 15 fields instead of 500 will definitely reduce index size and
memory footprint.

You can control how documents are retrieved by specifying the list of
fields on your search/get request.

If you store fields as well as source, you will essentially store the
fields twice. If you are storing source, you don't have to store individual
fields; ES can pull them from source. However, this comes at an obvious
run-time cost: ES will have to pull and parse the entire source even when you
want to retrieve only one field. Depending on your traffic and use cases,
this can be perfectly fine or completely unacceptable.
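
To make the trade-off concrete, here is a small sketch. `search_body` builds a request that lists the fields to return, assuming the legacy "fields" key in the search body. `fields_from_source` is only an illustration of the cost described above, not Elasticsearch's actual code: the whole stored document is parsed even when a single field is wanted. All names and sample values are hypothetical.

```python
import json

# Assumption: legacy search body where "fields" lists what to return.
def search_body(query_text, fields):
    """Build a query_string search that returns only the listed fields."""
    return {
        "query": {"query_string": {"query": query_text}},
        "fields": list(fields),
    }

# Illustration only: pulling fields out of _source means parsing all of it.
def fields_from_source(source_json, fields):
    """Parse the whole stored document, then keep only the requested fields."""
    document = json.loads(source_json)  # the entire source is parsed here
    return {name: document[name] for name in fields if name in document}

source = json.dumps({"city": "Los Angeles", "bedrooms": 3, "bathrooms": 2})
print(json.dumps(search_body("pool", ["city", "bedrooms"])))
print(fields_from_source(source, ["city"]))  # {'city': 'Los Angeles'}
```

With 500 fields per document the parse step grows with the document, which is why the answer depends on traffic and use case.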

dkullmann · June 20, 2012, 1:25pm

So if I want to avoid double-storing fields that will never be retrieved individually (only as part of source) then I should do index: no and store: no?

David Kullmann

Clinton_Gormley · June 20, 2012, 1:33pm

On Wed, 2012-06-20 at 06:23 -0700, Igor Motov wrote:

Indexing 15 fields instead of 500 will definitely reduce index size
and memory footprint.

You can control how documents are retrieved by specifying the list
of fields on your search/get request.

If you will store fields as well as source, you will essentially
store fields twice. If you are storing source, you don't have to store
individual fields, es can pull them from source. However, it comes at
obvious run time cost - es will have to pull and parse entire source
even when you want to retrieve only one field. Depending on your
traffic and use cases this can be perfectly fine or completely
unacceptable.

But see this thread before you assume that storing fields is the better
answer:
https://groups.google.com/d/topic/elasticsearch/j8cfbv-j73g/discussion

clint

Igor_Motov · June 20, 2012, 1:51pm

Yes, but store: no is default, you don't have to specify it.

dkullmann · June 20, 2012, 1:53pm

Clint/Igor:

So most of the time people will be searching on the 15 indexed fields and the 15 indexed fields are what I need to display. Then if they ask for more details on one item I'll pull the entire source.

When they are searching the 15 fields it sounds like it would make sense to pull those fields from _source to display (as opposed to store them).

When they want details on an item I can pull all fields from source.

So in order to make the best use of memory and minimize disk seeks I should:

1. In my mapping, index only the 15 fields people are searching on
2. Not store any fields, and pull them all from source

Does that sound right?

David Kullmann

kimchy · June 21, 2012, 7:28am

Yea, also, if you don't need to search on _all, disable it.
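
Putting the conclusions of this thread together, a hedged sketch of a mapping generator: searchable fields are indexed, everything else gets "index": "no", nothing is stored individually (the default), and _all is disabled. It assumes legacy type names and mapping syntax; the helper, the type name "property", and the sample fields are hypothetical.

```python
import json

# Assumption: legacy type names. Only these scalar Python types are handled;
# any other value type raises KeyError.
LEGACY_TYPES = {bool: "boolean", int: "long", float: "double", str: "string"}

def build_mapping(type_name, sample_doc, searchable):
    """Index only the searchable fields; rely on _source for retrieval."""
    properties = {}
    for name, value in sample_doc.items():
        field = {"type": LEGACY_TYPES[type(value)]}
        if name not in searchable:
            field["index"] = "no"  # kept in _source, but not searchable
        properties[name] = field
    # "store" is left at its default (not stored), so nothing is stored twice.
    return {type_name: {"_all": {"enabled": False}, "properties": properties}}

sample = {
    "bedrooms": 3,
    "city": "Los Angeles",
    "remarks": "Sunny lot",
    "parcel_id": "A-17",
}
mapping = build_mapping("property", sample, {"bedrooms", "city", "remarks"})
print(json.dumps(mapping, indent=2))
```

Here "parcel_id" stands in for the roughly 485 fields that only need to come back as part of _source.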
