Questions about multi_field, configurations, routing control, filtered alias


(Ivan Ji) #1

Hi all,

I have recently been studying Elasticsearch and have several questions about
it. I hope someone can answer them.

(1) About multi_field: can it hold sub-fields of two different types? For
example:

"tweet" : {
  "properties" : {
    "name" : {
      "type" : "multi_field",
      "fields" : {
        "name" : { "type" : "string", "index" : "not_analyzed" },
        "value" : { "type" : "int" }
      }
    }
  }
}

(2) If it can, what is the request format when posting a new document? Can I
explicitly specify the values of these two fields, or is some type casting
done internally?

(3) Is there a default configuration file that defines the default schema
mappings for an index and type? Or are index creation and mapping
configuration only supported through the REST API?

(4) After I have configured the number of shards/replicas and posted many
documents, can I reconfigure them? If so, how? What happens when the shard
count increases? Does it cost a lot of performance?

(5) About routing: can I make sure that certain documents are sent to
different shards? I know I can use the same routing value to index/search
within the same shard. But can I force some documents to be placed on a
different shard from other documents?

(6) Assume I have only one node and one index. What is the difference
between having one shard and ten shards for that index? Does it cost extra
memory if the number of shards is ten? What is the suggested rule for
deciding this number?

(7) What is the difference between setting search_type to scroll and using
the from/size parameters?

(8) About alias filtering: what is the cost of creating a filtered alias? Is
there any caching that speeds up operations that go through the filtered
alias, or does it just append the alias's filter condition to the query?

Sorry for the newbie questions. Could you give me your thoughts on them?

Cheers,

Ivan

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a46431bd-cef8-4714-9f08-0445f376b2a1%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(binh.ly) #2

Ivan,

  1. The multi_field type lets you define different ways in which a single
    field value will be indexed. Your example will work: it indexes a single
    value as string/not_analyzed and then as an integer (use "integer"
    rather than "int" as the type).

  2. The document coming in will contain a field named "name" with a single
    value. When it goes into the index, it will be indexed 2 different ways.

  3. A mapping is not required to index data. There is an implied default
    mapping that will parse your JSON content and dynamically update the schema
    if you don't specify one up-front.

  4. You cannot change the shard count after the index is created. You can
    change the replica count anytime. The update index settings API (a PUT
    to the index's _settings endpoint) allows you to change the replica
    count.

  5. You can specify a single routing value for all documents that you want
    to go to a specific shard/location.

  6. The number of shards determines how far you can scale your content
    later. If your data volume increases, you can add more nodes later and
    distribute the shards across them. If you only have a single shard and
    you run out of space, you cannot scale out unless you increase storage
    or increase the shard count (which, per point 4, means rebuilding the
    index with more shards).

  7. Scroll is used for a snapshot-style search, i.e. the results you get
    back are not affected by updates to the index after you start
    scrolling. From/size is useful if you want to page through search
    results (or do infinite scrolling, one page at a time).

  8. Filters execute fast and, yes, can be cached.
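
For points 1 and 4, the request bodies might look like the minimal sketch below. It only builds and prints JSON and does not talk to a cluster. The index name "twitter", the replica count of 2, and the exact endpoint paths in the printed lines are assumptions for illustration; the mapping itself is the one from the question with "int" changed to "integer".

```python
import json

# Mapping for the "tweet" type: one incoming "name" value indexed two ways.
# Note "integer" rather than "int" for the numeric sub-field.
tweet_mapping = {
    "tweet": {
        "properties": {
            "name": {
                "type": "multi_field",
                "fields": {
                    "name": {"type": "string", "index": "not_analyzed"},
                    "value": {"type": "integer"},
                },
            }
        }
    }
}

# Body for the update index settings call. Only the replica count is changed,
# because the primary shard count is fixed when the index is created.
replica_settings = {"index": {"number_of_replicas": 2}}

# "twitter" is a hypothetical index name used only for illustration.
print("PUT /twitter/tweet/_mapping")
print(json.dumps(tweet_mapping, indent=2))
print("PUT /twitter/_settings")
print(json.dumps(replica_settings, indent=2))
```

With a mapping like this, an incoming document carries a single "name" value and both sub-fields are derived from it, as described in point 2.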


(Ivan Ji) #4

Hi Binh,

First, thank you very much for your reply. I have a few follow-up questions
below.

On Tuesday, January 21, 2014 at 5:10:10 AM UTC+8, Binh Ly wrote:

  5. You can specify a single routing value for all documents that you want
    to go to a specific shard/location.

Yes, but can I make sure that two sets of documents are stored in different
shards? If I use different routing values, does that mean they will be
stored in different shards? I guess not, right? Although the hash values of
the two routing values are different, I don't know which range of hash
values maps to a single shard. And I want to store these documents in
different shards.

  8. Filters execute fast and yes can be cached.

About filters, I would like to know the underlying algorithm. If I create an
alias that represents about half of the index, does it increase the index
size? I mean, when I create an alias, does it compute and store actual data
on disk? Or does it just remember the condition and act like a predefined
adapter that stores no data of its own?

Another question:
What do you suggest if I need to modify the mapping of an index, for example
changing store from "no" to "yes", or removing a field?
From what I have read over the past few days, it seems hard to change an
existing mapping, and there are many limitations.

Again, thanks for your replies.

Cheers,

Ivan


(Binh Ly) #5

Ivan,

  1. You're right, two different routing values may hash to the same shard.

  2. About ES filters, this might help:
    http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
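
To illustrate the routing point: the shard is chosen by hashing the routing value and taking the result modulo the number of primary shards, so distinct routing values can still land on the same shard. The sketch below uses zlib.crc32 as a stand-in hash, which is an assumption for illustration and not the hash function Elasticsearch uses; the helper names and the "user0", "user1", ... routing values are hypothetical too.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a shard as hash(routing) modulo the primary shard count.

    zlib.crc32 is a stand-in hash for illustration only. It is NOT the
    hash Elasticsearch uses, so real shard numbers will differ.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def find_collision(routing_values, num_primary_shards):
    """Return (first_value, second_value, shard) for the first pair of
    routing values that share a shard, or None if there is no such pair."""
    seen = {}  # shard number -> first routing value that landed there
    for value in routing_values:
        shard = shard_for(value, num_primary_shards)
        if shard in seen:
            return seen[shard], value, shard
        seen[shard] = value
    return None


shards = 5
# Six distinct routing values over five shards: at least two must collide.
values = [f"user{i}" for i in range(shards + 1)]
print(find_collision(values, shards))
```

With more distinct routing values than shards a collision is unavoidable, and with fewer it is still possible, which is why a routing value alone cannot guarantee separation.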

About your mapping question: the PUT mapping API lets you update existing
mappings. It merges the new mapping into the old one as long as the two do
not conflict; if there is a conflict, you will probably need to rebuild the
index.
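
As a rough way to think about what "conflict" means here, the sketch below compares an old and a new set of field definitions and reports existing fields whose core settings would change. It is a simplified approximation and not Elasticsearch's actual merge logic: the function name find_conflicts and the list of settings treated as unchangeable are assumptions, and defaults are not resolved (a setting missing from the old definition is compared as None).

```python
# Settings treated as unchangeable for an existing field in this sketch.
# This list is an assumption for illustration, not an exhaustive rule set.
IMMUTABLE_KEYS = ("type", "index", "store")


def find_conflicts(old_props, new_props):
    """List (field, key, old_value, new_value) for every existing field
    whose immutable setting differs in the new mapping."""
    conflicts = []
    for field, new_def in new_props.items():
        old_def = old_props.get(field)
        if old_def is None:
            continue  # brand-new field: nothing to conflict with
        for key in IMMUTABLE_KEYS:
            if key in new_def and old_def.get(key) != new_def[key]:
                conflicts.append((field, key, old_def.get(key), new_def[key]))
    return conflicts


old = {"name": {"type": "string", "store": "no"}}
new = {
    "name": {"type": "string", "store": "yes"},  # changes an existing field
    "age": {"type": "integer"},                  # adds a new field
}
print(find_conflicts(old, new))
```

In this model, adding the "age" field merges cleanly, while flipping store on the existing "name" field is reported as a conflict, which is the kind of change that would call for rebuilding the index.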


(Ivan Ji) #6

Hi, Binh

Thanks for your replies.

So it seems there is no way to force two documents to be stored in different
shards.

And I will read the article about bitsets.

Regards,

Ivan

