On Tuesday, May 15, 2012 12:45:39 PM UTC-4, David Kullmann wrote:
Hello,
I want to disable the default analyzer for most of the fields in my document. The document represents a piece of real estate, and most of the fields are integers or keywords like you would find in a select drop-down:
Integers: Bedrooms, Bathrooms, Rooms, etc. These need to be looked up with exact matches or ranges.
Location: Lat/Lon
Keywords: Property Type ('Single Family', 'Condo', 'Townhouse'), City ('Los Angeles', 'New York'). I can perform exact matches on these.
So my question is this: how can I disable the default analyzer for most of my fields, but still use it for the 2-3 fields that I need to look up with query_string queries and that need analyzed contents?
I found this thread (http://elasticsearch-users.115913.n3.nabble.com/Disabling-default-analyzer-td2932819.html) explaining how to disable it, and I know I can add an analyzer for a specific field, but I don't know what the default analyzer is or how to add it back.
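For reference, the built-in default analyzer is `standard`. The approach in the linked thread overrides the index-wide default; a minimal sketch of a create-index body that does this and then opts a free-text field back in to `standard` follows. It assumes the 0.19-era syntax used in this thread (`"type": "string"`); the type name `listing` and field names such as `description` are hypothetical, and newer Elasticsearch versions use `text`/`keyword` field types instead.

```python
import json


def create_index_body() -> dict:
    """Create-index body: keyword as the index default, standard opted back in per field."""
    return {
        "settings": {
            "analysis": {
                # An analyzer named "default" replaces the built-in default for
                # this index, so every string is indexed as one exact-match token.
                "analyzer": {"default": {"type": "keyword"}}
            }
        },
        "mappings": {
            "listing": {  # hypothetical type name
                "properties": {
                    # Numbers and geo points are not run through text analyzers.
                    "bedrooms": {"type": "integer"},
                    "bathrooms": {"type": "integer"},
                    "rooms": {"type": "integer"},
                    "location": {"type": "geo_point"},
                    # Drop-down values: these pick up the keyword default above.
                    "property_type": {"type": "string"},
                    "city": {"type": "string"},
                    # Hypothetical free-text field for query_string: opt back in.
                    "description": {"type": "string", "analyzer": "standard"},
                }
            }
        },
    }


print(json.dumps(create_index_body(), indent=2))
```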
On Tuesday, May 15, 2012 3:38:02 PM UTC-4, David Kullmann wrote:
Igor:
Perfect, thank you!
There are actually only about 10-15 out of 500 fields that I want to be analyzed and searchable (but I want the remaining fields stored in the document). Am I solving this problem correctly by using the "keyword" analyzer, or is there a more intelligent way?
-DK
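To see how the `keyword` default differs from `standard`, here is a rough pure-Python approximation of the two. `approx_standard` is a simplification (lowercase, split on non-alphanumeric characters); Lucene's real StandardTokenizer follows Unicode word-break rules and, depending on the Elasticsearch version, the `standard` analyzer may also remove stop words. All function names are hypothetical.

```python
import re


def approx_standard(text: str) -> list[str]:
    # Crude stand-in for the `standard` analyzer: split on runs of
    # non-alphanumeric characters and lowercase each piece.
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def keyword(text: str) -> list[str]:
    # The `keyword` analyzer emits the entire input as a single token.
    return [text]


def exact_term_hits(term: str, tokens: list[str]) -> bool:
    # A term lookup compares against the indexed tokens verbatim (no analysis).
    return term in tokens


for value in ["Single Family", "Los Angeles", "Condo"]:
    std, kw = approx_standard(value), keyword(value)
    print(f"{value!r}: standard~{std} keyword={kw} "
          f"exact-match(standard)={exact_term_hits(value, std)} "
          f"exact-match(keyword)={exact_term_hits(value, kw)}")
```

Either way every value is still indexed; `keyword` only changes how the value is tokenized.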
On Tue, May 15, 2012 at 3:53 PM, Igor Motov imotov@gmail.com wrote:
By setting "keyword" as the default analyzer, you don't turn off indexing; you just index every string value as a single token. It sounds like this is not what you want. If you want a field to be stored but not indexed, you need to define it in the mapping as {..."store":"yes", "index":"no" }. You can do this for many fields using dynamic templates (http://www.elasticsearch.org/guide/reference/mapping/root-object-type.html). You should probably also disable the _all field, since the content of all fields is indexed in _all by default.
Perfect, I read the "store" documentation but I misinterpreted it. Thanks for clearing that up; it's exactly what I'm looking for.
It's hard to tell from the documentation: can I add dynamic mapping in a config file for a specific index + type combination?
DK
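The store-but-don't-index setup Igor describes could look roughly like the mapping sketch below, again assuming the 0.19-era syntax from this thread (`"index": "no"`, `"store": "yes"`). The type name, template name and field names are hypothetical. Explicitly mapped properties are not affected by the dynamic template, which only applies to string fields that show up unmapped; numeric fields would need their own template if they should also be left unindexed.

```python
import json


def store_only_mapping(type_name: str, searchable: dict) -> dict:
    """Mapping where unmapped string fields are stored but not indexed."""
    return {
        type_name: {
            # Don't copy every field into the catch-all _all field.
            "_all": {"enabled": False},
            "dynamic_templates": [
                {
                    "store_only_strings": {  # hypothetical template name
                        "match": "*",
                        "match_mapping_type": "string",
                        "mapping": {"type": "string", "store": "yes", "index": "no"},
                    }
                }
            ],
            # Fields listed here are mapped explicitly, so the template above
            # never applies to them.
            "properties": searchable,
        }
    }


searchable_fields = {
    "city": {"type": "string", "index": "not_analyzed"},
    "description": {"type": "string", "analyzer": "standard"},  # hypothetical
}
print(json.dumps(store_only_mapping("listing", searchable_fields), indent=2))
```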
On Tuesday, May 15, 2012 7:31:32 PM UTC-4, Ivan Brusic wrote:
Config mappings should be able to handle dynamic mappings. I never
tried since I switched to index templates when I started using dynamic
mappings.
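Ivan's index-template route can be sketched as a request body that attaches a mapping to every new index whose name matches a pattern; it would be sent with something like `PUT /_template/<name>`. This assumes the pre-6.0 `"template"` key for the pattern (later versions use `"index_patterns"`); the pattern `listings-*` and the mapping content are hypothetical. For the config-file route, releases of that era could also read mapping files from the node's config directory (for example `config/mappings/<index>/<type>.json`), but check the documentation for your version.

```python
import json


def index_template_body(pattern: str, type_name: str, type_mapping: dict) -> dict:
    # Any index created later whose name matches `pattern` starts with
    # these mappings; existing indices are not changed.
    return {
        "template": pattern,
        "mappings": {type_name: type_mapping},
    }


listing_mapping = {
    "_all": {"enabled": False},
    "properties": {"city": {"type": "string", "index": "not_analyzed"}},
}
body = index_template_body("listings-*", "listing", listing_mapping)
print(json.dumps(body, indent=2))
```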
On Tuesday, June 19, 2012 2:40:17 PM UTC-4, David Kullmann wrote:
Ivan and Igor:
Great, thank you for your help! I have a follow-up question:
My goal is to reduce the system resource requirements for my Elasticsearch server. I have many documents with 500 fields each; however, I only need about 15 fields to be "searchable", but I still need the _source to be retrieved. I was wondering if this is the correct way to think about how the server manages memory (RAM):
1. If I store all but 15 fields and index the 15 fields I need searchable, the size of the "index" will be smaller.
2. In scenario #1 above, the full document will still be retrieved as _source even though the fields are set to "store".
3. In scenario #1 above, I don't lose any performance when using ES as a data storage system, because _source will still be stored the same way regardless of how the individual fields are stored.
Is that correct? If so, please let me know; if not, please help me understand what I'm missing. Thanks!
-DK
On Wed, 2012-06-20 at 06:23 -0700, Igor Motov wrote:
Indexing 15 fields instead of 500 will definitely reduce index size and memory footprint.
You can control how documents are retrieved by specifying the list of fields on your search/get request.
If you store fields as well as source, you will essentially store those fields twice. If you are storing source, you don't have to store individual fields; ES can pull them from source. However, this comes at an obvious runtime cost: ES will have to load and parse the entire source even when you want to retrieve only one field. Depending on your traffic and use cases, this can be perfectly fine or completely unacceptable.
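The runtime cost of pulling fields out of _source can be illustrated with a small local simulation: to return even one non-stored field, the whole source document has to be parsed first. The request helper assumes the 0.19-era `"fields"` option on search requests (newer versions use `_source` filtering or `stored_fields` instead); the function names, query text and 500-field document are hypothetical.

```python
import json


def search_body(query_text: str, fields: list[str]) -> dict:
    # Ask for only a few fields in the hits instead of the full document.
    return {"query": {"query_string": {"query": query_text}}, "fields": fields}


def fields_from_source(source_json: str, wanted: list[str]) -> dict:
    # Without stored fields, even one value means parsing the entire _source.
    source = json.loads(source_json)
    return {name: source[name] for name in wanted if name in source}


# A hypothetical 500-field document, serialized to JSON as _source would be.
doc = {f"field_{i}": i for i in range(500)}
raw_source = json.dumps(doc)
print(len(raw_source), fields_from_source(raw_source, ["field_7"]))
print(json.dumps(search_body("pool", ["city", "bedrooms"])))
```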
On Wednesday, June 20, 2012 9:25:16 AM UTC-4, David Kullmann wrote:
So if I want to avoid double-storing fields that will never be retrieved individually (only as part of source), then I should set index: no and store: no?
Yes, but store: no is the default, so you don't have to specify it.
So most of the time people will be searching on the 15 indexed fields, and those 15 indexed fields are what I need to display. Then, if they ask for more details on one item, I'll pull the entire source.
When they are searching the 15 fields, it sounds like it would make sense to pull those fields from _source for display (as opposed to storing them).
When they want details on an item, I can pull all fields from source.
So in order to minimize memory usage and disk seeks, I should:
1. In my mapping, index only the 15 fields people are searching on.
2. Not store any fields, and pull them all from source.
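The plan above (index only the fields people search on, store nothing extra, keep _source) might translate into a mapping like this sketch. `"store"` is simply left out, since not storing is the default. As before, this assumes the 0.19-era syntax from this thread, and the field names are hypothetical placeholders for the roughly 15 searchable fields.

```python
import json


def lean_mapping(type_name: str, searchable: dict) -> dict:
    """Index only `searchable`; everything else lives only in _source."""
    return {
        type_name: {
            "_all": {"enabled": False},
            "_source": {"enabled": True},  # the default, shown for clarity
            "dynamic_templates": [
                {
                    "unindexed_strings": {  # hypothetical template name
                        "match": "*",
                        "match_mapping_type": "string",
                        # No "store" key: not storing is already the default.
                        "mapping": {"type": "string", "index": "no"},
                    }
                }
            ],
            "properties": searchable,
        }
    }


# Hypothetical subset of the ~15 searchable fields.
searchable = {
    "bedrooms": {"type": "integer"},
    "city": {"type": "string", "index": "not_analyzed"},
    "location": {"type": "geo_point"},
}
mapping = lean_mapping("listing", searchable)
print(json.dumps(mapping, indent=2))
```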