Disabling Default Analyzer for Most Fields


(dkullmann) #1

Hello,

I want to disable the default analyzer for most of the fields in my
document. The document represents a piece of real estate and most of the
fields are integers or are keywords like you would find in a select
drop-down:

Integers: Bedrooms, Bathrooms, Rooms, etc - need to be looked up with exact
matches or ranges

Location: Lat/Long

Keywords: Property Type ('Single Family', 'Condo', 'Townhouse'), City ('Los
Angeles', 'New York') - I can perform exact matches on these

So my question is this: how can I disable the default analyzer for most of
my fields, but use it for 2-3 fields that I need to be able to look up using
query_string queries and have analyzed contents?

I found this thread explaining how to disable it:
http://elasticsearch-users.115913.n3.nabble.com/Disabling-default-analyzer-td2932819.html
I know I can add an analyzer for a specific field, but I don't know what the
default analyzer is or how to add it back.

Thanks,
DK


(Igor Motov) #2

By default, the default analyzer is "standard":
http://www.elasticsearch.org/guide/reference/index-modules/analysis/standard-analyzer.html
So all you have to do is replace the default analyzer with your custom
analyzer, as described in the thread that you mentioned, and then set the
"standard" analyzer in the mappings for the 2-3 fields that you want to exclude.

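A minimal sketch of that setup, written as a Python dict that could be serialized as an index-creation body. It assumes the settings and mapping syntax of the Elasticsearch versions current in this thread ("string" fields, an analyzer named "default" under the analysis settings). The type name "listing" and the field names are made up for illustration.

```python
import json

# Hypothetical names: the "listing" type and these field names are illustrative only.
ANALYZED_FIELDS = ["description", "remarks"]


def build_index_body(analyzed_fields):
    """Index body: "keyword" as the default analyzer, "standard" on chosen fields."""
    properties = {
        name: {"type": "string", "analyzer": "standard"}
        for name in analyzed_fields
    }
    return {
        "settings": {
            "analysis": {
                # Overriding "default" means unmapped strings are indexed as one token.
                "analyzer": {"default": {"type": "keyword"}}
            }
        },
        "mappings": {"listing": {"properties": properties}},
    }


index_body = build_index_body(ANALYZED_FIELDS)
print(json.dumps(index_body, indent=2))
```

Only the fields listed in ANALYZED_FIELDS get tokenized; every other string falls back to the overridden default.
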
On Tuesday, May 15, 2012 12:45:39 PM UTC-4, David Kullmann wrote:

So my question is this: how can I disable the default analyzer for most
of my fields, but use it for 2-3 fields that I need to be able to look up
using query_string queries and have analyzed contents?


(dkullmann) #3

Igor:

Perfect, thank you!

There are actually only about 10-15 out of 500 fields that I want to be
analyzed and searchable (but I want the remaining fields stored in the
document). Am I solving this problem correctly by using the "keyword"
analyzer, or is there a more intelligent way?

-DK

On Tue, May 15, 2012 at 2:26 PM, Igor Motov imotov@gmail.com wrote:

By default, the default analyzer is "standard":
http://www.elasticsearch.org/guide/reference/index-modules/analysis/standard-analyzer.html
So all you have to do is replace the default analyzer with your custom
analyzer, as described in the thread that you mentioned, and then set the
"standard" analyzer in the mappings for the 2-3 fields that you want to exclude.


(Igor Motov) #4

By setting "keyword" as the default analyzer, you don't turn off indexing; you
just index every string value as a single token. It sounds like this is not
what you want. If you want a field to be stored but not indexed, you need
to define it in the mapping as {..."store":"yes", "index":"no" }. You can do
this for many fields at once using dynamic templates:
http://www.elasticsearch.org/guide/reference/mapping/root-object-type.html
You should probably also disable the _all field, since the content of all
fields is indexed in _all by default.
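
As a rough illustration of the mapping described above, the sketch below builds it as a Python dict. It assumes the dynamic-template syntax of that era ("match", "mapping", and the "{dynamic_type}" placeholder) and relies on explicitly mapped fields taking precedence over dynamic templates. The type name "listing" and the field names are hypothetical.

```python
import json


def build_mapping(searchable_fields):
    """Mapping for a hypothetical "listing" type: a few analyzed fields, the rest stored only."""
    return {
        "listing": {
            # _all would otherwise index the content of every field a second time.
            "_all": {"enabled": False},
            # Explicitly mapped fields stay analyzed and searchable.
            "properties": {
                name: {"type": "string", "analyzer": "standard"}
                for name in searchable_fields
            },
            # Any field not listed above is stored but not indexed.
            "dynamic_templates": [
                {
                    "stored_not_indexed": {
                        "match": "*",
                        "mapping": {
                            "type": "{dynamic_type}",
                            "store": "yes",
                            "index": "no",
                        },
                    }
                }
            ],
        }
    }


mapping = build_mapping(["description", "remarks"])
print(json.dumps(mapping, indent=2))
```

The template name "stored_not_indexed" is arbitrary; only the "match" pattern and the "mapping" body matter.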

On Tuesday, May 15, 2012 3:38:02 PM UTC-4, David Kullmann wrote:

Igor:

Perfect, thank you!

There are actually only about 10-15 out of 500 fields that I want to be
analyzed and searchable (but I want the remaining fields stored in the
document). Am I solving this problem correctly by using the "keyword"
analyzer, or is there a more intelligent way?

-DK


(dkullmann) #5

Igor:

Perfect. I read the "store" documentation but misinterpreted it. Thanks
for clearing that up; it's exactly what I'm looking for.

It's hard to tell from the documentation - can I add dynamic mapping in a
config file for a specific index + type combination?

DK

On Tue, May 15, 2012 at 3:53 PM, Igor Motov imotov@gmail.com wrote:

By setting "keyword" as the default analyzer, you don't turn off indexing; you
just index every string value as a single token. It sounds like this is not
what you want. If you want a field to be stored but not indexed, you need
to define it in the mapping as {..."store":"yes", "index":"no" }. You can do
this for many fields at once using dynamic templates:
http://www.elasticsearch.org/guide/reference/mapping/root-object-type.html
You should probably also disable the _all field, since the content of all
fields is indexed in _all by default.


(Ivan Brusic) #6

Config mappings should be able to handle dynamic mappings. I never
tried since I switched to index templates when I started using dynamic
mappings.

http://www.elasticsearch.org/guide/reference/mapping/conf-mappings.html

If not, you can use dynamic mappings for a specific index + type
combination with index templates:

http://www.elasticsearch.org/guide/reference/api/admin-indices-templates.html

Just do not supply a wildcard for the template name.
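
A rough sketch of the index-template route, not tested against any particular server. It assumes the template format of that era, where the "template" key holds the index-name pattern and "mappings" is keyed by type. The names "real_estate" and "listing" are hypothetical, and using an exact index name as the pattern is one reading of the "no wildcard" advice.

```python
import json


def build_index_template(index_name, type_name, type_mapping):
    """Template body that applies a type mapping to a single, exactly named index."""
    return {
        # An exact name (no "*") limits the template to this one index.
        "template": index_name,
        "mappings": {type_name: type_mapping},
    }


# Dynamic mapping for the type: fields without an explicit mapping are not indexed.
type_mapping = {
    "dynamic_templates": [
        {
            "not_indexed": {
                "match": "*",
                "mapping": {"type": "{dynamic_type}", "index": "no"},
            }
        }
    ]
}

template_body = build_index_template("real_estate", "listing", type_mapping)
print(json.dumps(template_body, indent=2))
```

The printed JSON would be the body sent when registering the template; the registration call itself is left out here.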

--
Ivan

On Tue, May 15, 2012 at 1:23 PM, David Kullmann
kullmann.david@gmail.com wrote:

It's hard to tell from the documentation - can I add dynamic mapping in a
config file for a specific index + type combination?


(dkullmann) #7

Ivan and Igor:

Great, thank you for your help! I have a follow up question:

My goal is to reduce the system resource requirements for my ElasticSearch
server. I have many documents with 500 fields each; however, I only need
about 15 fields to be "searchable", and I still need the _source to be
retrievable. I was wondering if this is the correct way to think about how
the server manages memory (RAM):

  1. If I store all but 15 fields and index the 15 fields I need searchable,
    the size of the "index" will be smaller

  2. In the scenario in #1 above, the full document will still be retrieved
    as _source even though the fields are set to "store"

  3. In the scenario in #1 above, I lose any performance when using ES as a
    data storage system because _source will still be stored the same way
    regardless of how the individual fields are stored

Is that correct? If so - please let me know, if not, please help me
understand what I'm missing. Thanks!

-DK

On Tuesday, May 15, 2012 7:31:32 PM UTC-4, Ivan Brusic wrote:

Config mappings should be able to handle dynamic mappings. I never
tried since I switched to index templates when I started using dynamic
mappings.


(Igor Motov) #8

  1. Indexing 15 fields instead of 500 will definitely reduce index size and
    memory footprint.
  2. You can control how documents are retrieved by specifying the list of
    fields on your search/get request.
  3. If you store fields as well as the source, you essentially store the
    fields twice. If you are storing the source, you don't have to store
    individual fields; ES can pull them from the source. However, this comes
    at an obvious run-time cost: ES has to pull and parse the entire source
    even when you want to retrieve only one field. Depending on your traffic
    and use cases, this can be perfectly fine or completely unacceptable.
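
Point 2 can be sketched as a search request body. This assumes the request-body "fields" parameter of the Elasticsearch versions discussed in this thread and a query_string query restricted to the analyzed fields. All field names and the query text are hypothetical.

```python
import json


def build_search_body(query_text, search_fields, return_fields):
    """Search body that queries the analyzed fields and returns only the listed fields."""
    return {
        "query": {
            "query_string": {"query": query_text, "fields": search_fields}
        },
        # Only these fields come back with each hit instead of the full document.
        "fields": return_fields,
    }


search_body = build_search_body(
    "pool AND garage",
    ["description", "remarks"],
    ["bedrooms", "bathrooms", "city"],
)
print(json.dumps(search_body, indent=2))
```

Leaving out the top-level "fields" key would return the whole _source for each hit, which suits a detail view better than a result list.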

On Tuesday, June 19, 2012 2:40:17 PM UTC-4, David Kullmann wrote:

  1. If I store all but 15 fields and index the 15 fields I need searchable,
    the size of the "index" will be smaller

  2. In the scenario in #1 above, the full document will still be retrieved
    as _source even though the fields are set to "store"

  3. In the scenario in #1 above, I lose any performance when using ES as a
    data storage system because _source will still be stored the same way
    regardless of how the individual fields are stored


(dkullmann) #9

So if I want to avoid double-storing fields that will never be retrieved individually (only as part of source) then I should do index: no and store: no?

David Kullmann

On Jun 20, 2012, at 9:23 AM, Igor Motov imotov@gmail.com wrote:

  1. Indexing 15 fields instead of 500 will definitely reduce index size and memory footprint.
  2. You can control how documents are retrieved by specifying the list of fields on your search/get request.
  3. If you store fields as well as the source, you essentially store the fields twice. If you are storing the source, you don't have to store individual fields; ES can pull them from the source. However, this comes at an obvious run-time cost: ES has to pull and parse the entire source even when you want to retrieve only one field. Depending on your traffic and use cases, this can be perfectly fine or completely unacceptable.


(Clinton Gormley) #10

On Wed, 2012-06-20 at 06:23 -0700, Igor Motov wrote:

  3. If you store fields as well as the source, you essentially store the
    fields twice. If you are storing the source, you don't have to store
    individual fields; ES can pull them from the source. However, this
    comes at an obvious run-time cost: ES has to pull and parse the entire
    source even when you want to retrieve only one field. Depending on your
    traffic and use cases, this can be perfectly fine or completely
    unacceptable.

But see this thread before you assume that storing fields is the better
answer:
https://groups.google.com/d/topic/elasticsearch/j8cfbv-j73g/discussion

clint


(Igor Motov) #11

Yes, but store: no is the default, so you don't have to specify it.

On Wednesday, June 20, 2012 9:25:16 AM UTC-4, David Kullmann wrote:

So if I want to avoid double-storing fields that will never be retrieved
individually (only as part of source) then I should do index: no and store:
no?

David Kullmann


(dkullmann) #12

Clint/Igor:

So most of the time people will be searching on the 15 indexed fields and the 15 indexed fields are what I need to display. Then if they ask for more details on one item I'll pull the entire source.

When they are searching the 15 fields it sounds like it would make sense to pull those fields from _source to display (as opposed to store them).

When they want details on an item I can pull all fields from source.

So in order to make the best use of memory and minimize disk seeks I should:

  1. In my mapping index only the 15 fields people are searching on

  2. Do not store any fields and pull them all from source

Does that sound right?

David Kullmann

On Jun 20, 2012, at 9:33 AM, Clinton Gormley clint@traveljury.com wrote:

But see this thread before you assume that storing fields is the better
answer:
https://groups.google.com/d/topic/elasticsearch/j8cfbv-j73g/discussion

clint


(Shay Banon) #13

Yea, also, if you don't need to search on _all, disable it.
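
Putting the thread's conclusion together, a mapping along these lines might look like the sketch below. It assumes the same era-specific mapping syntax used elsewhere in this thread; the type name "listing" and the field names are hypothetical. Nothing is stored per field, so retrieval relies entirely on _source.

```python
import json


def build_lean_mapping(searchable_fields):
    """Index only the searchable fields, store nothing separately, and disable _all."""
    return {
        "listing": {
            "_all": {"enabled": False},
            # _source is enabled by default; it is spelled out here for clarity.
            "_source": {"enabled": True},
            "properties": {
                name: {"type": "string", "analyzer": "standard"}
                for name in searchable_fields
            },
            # No "store" key anywhere: store defaults to "no".
            "dynamic_templates": [
                {
                    "source_only": {
                        "match": "*",
                        "mapping": {"type": "{dynamic_type}", "index": "no"},
                    }
                }
            ],
        }
    }


lean_mapping = build_lean_mapping(["description", "remarks"])
print(json.dumps(lean_mapping, indent=2))
```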

On Wed, Jun 20, 2012 at 3:53 PM, David Kullmann kullmann.david@gmail.com wrote:

So in order to make the best use of memory and minimize disk seeks I should:

  1. In my mapping index only the 15 fields people are searching on

  2. Do not store any fields and pull them all from source

Does that sound right?

