Index and Mapping Creation

Background

We embed elasticsearch into our web application, and then let Amazon EC2
control the startup or teardown of our server instances based on load
metrics. The exact same web app is used across all instances, therefore each
new instance brings a new elasticsearch node into existence.

I am trying to bulletproof some scenarios, and just want to ask what the
best practice might be for some of these situations.

Programmatic Index/Mapping Creation

We are currently defining our index settings in an elasticsearch.properties
file that we feed to our embedded ES node when we start it up. We store our
mapping files in a location that ES can find and the first time a document
is indexed by ES, the mapping files are read in and properly applied.

One problem is that the mapping files are not automatically loaded if the first
request to ES is a query. Is there a setting that can tell ES to load the
mapping files when the node is created?

At the moment we use the same index configuration across all of our
indices. If we wanted to specify different index configurations per index,
would we have to resort to programmatic creation of the indices via an
API call?

If we switch to programmatic creation of index and mappings, am I right to
assume that this only has to happen on the first node to join the cluster
because subsequent nodes will inherit the cluster's configuration?

In the case of a node joining an existing cluster, do the default mappings
on disk or index configuration info in elasticsearch.properties have any
bearing on how that node functions with regards to mappings and indexes?

Hey,

Node-level index settings (for example, those in the elasticsearch.yml
configuration file, or settings that you provide to the node) are applied to
an index that is instantiated on that node. If you don't provide settings
that override them when you create the index programmatically, then
those node-level settings will be applied to the index.

An index instantiated on a node is the low-level construct that represents
an index; it's not the index "concept" or the cluster-level fact that an index
has been created. Grr, I think I am confusing you... Think about it like
this: when you create an index against a cluster, that index's metadata gets
created in the cluster. Then, when a shard is allocated on a node, an
index-level construct is created on that node and configured with settings.
Those settings are a combination of the index metadata settings and the
node-level settings (those settings in the node configuration that start
with index.xxx).
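
To make the combination concrete, here is a minimal sketch in plain Python. It is only a model of the behaviour described above, not Elasticsearch code: the function name and the sample keys are made up for illustration, and the precedence (settings given at index creation win over node-level index.xxx settings) is taken from the explanation above.

```python
def effective_index_settings(node_settings, index_metadata_settings):
    """Model how a node might combine its own index.* settings with the
    settings stored in the index metadata (hypothetical helper)."""
    # Only node settings whose keys start with "index." act as index defaults.
    effective = {
        key: value
        for key, value in node_settings.items()
        if key.startswith("index.")
    }
    # Settings supplied when the index was created override node-level defaults.
    effective.update(index_metadata_settings)
    return effective


# Hypothetical node configuration and index metadata, for illustration only.
node_settings = {
    "cluster.name": "webapp",            # not an index.* key, so it is ignored
    "index.number_of_replicas": 1,
    "index.refresh_interval": "5s",
}
index_metadata_settings = {"index.number_of_replicas": 2}

print(effective_index_settings(node_settings, index_metadata_settings))
```

Any key that the index metadata does not set falls through to whatever the local node happens to define.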

So, back to your question. You can have the same node-level settings (those
that start with index.xxx) on every node, and they will apply to any index
created. I prefer that you provide the index-level settings when you create
the index. That way, you know exactly which settings it gets. When a node
joins the cluster and a shard for a specific index gets allocated on it, the
shard will use the settings provided when you created the index through the
API to initialize itself.
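
As a rough sketch of what "providing the settings when you create the index" can look like over the REST API, the snippet below builds (but does not send) a create-index request that carries both settings and mappings. It assumes a node listening on http://localhost:9200. The index name, type name, field and values are hypothetical, and the exact mapping syntax depends on your Elasticsearch version. An embedded node would more likely go through the Java client, which is not shown here.

```python
import json
import urllib.request


def build_create_index_request(base_url, index_name, settings, mappings=None):
    """Build a PUT /<index> request with explicit settings and mappings."""
    body = {"settings": settings}
    if mappings:
        body["mappings"] = mappings
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical index, type and field names, for illustration only.
request = build_create_index_request(
    "http://localhost:9200",
    "articles",
    settings={"index": {"number_of_shards": 3, "number_of_replicas": 1}},
    mappings={"article": {"properties": {"title": {"type": "string"}}}},
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

Because the settings travel with the request, each index can get its own configuration regardless of what the node's configuration file says.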

On Fri, Aug 12, 2011 at 7:27 PM, James Cook jcook@tracermedia.com wrote:

[...]

I suspect that an index, once created, is immutable, correct? This statement
has me questioning that:

Then, when a shard is allocated on a node, an index-level construct is
created and configured with settings. Those settings are a
combination of the index metadata settings and the node-level settings
(those settings in the node configuration that start with index.xxx).

Once the index is created, the allocation of a shard wouldn't be impacted by
the particular index configuration of the node local to the shard, would it?

On Sat, Aug 13, 2011 at 1:39 PM, James Cook jcook@tracermedia.com wrote:

Once the index is created, the allocation of a shard wouldn't be impacted
by the particular index configuration of the node local to the shard, would
it?

It can be, if you have different index-level settings on different nodes.
That's why I recommend providing custom index-level settings through the index
creation API.
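
To illustrate the risk, here is a small sketch in plain Python (again a model, not Elasticsearch code) that reports which index.xxx keys would end up depending on the node a shard lands on. The function and node names are hypothetical. Treating a key that is missing on some node as divergent is an assumption of this sketch, since that node would fall back to some other default.

```python
def settings_that_differ_across_nodes(nodes, index_settings):
    """Return index.* keys whose effective value depends on the node.

    nodes maps a node name to that node's settings; index_settings holds
    the settings supplied when the index was created.
    """
    values_by_key = {}
    for node_name, node_settings in nodes.items():
        for key, value in node_settings.items():
            # Keys set explicitly at index creation cannot diverge.
            if key.startswith("index.") and key not in index_settings:
                values_by_key.setdefault(key, {})[node_name] = value

    divergent = {}
    for key, per_node in values_by_key.items():
        values = list(per_node.values())
        disagree = any(value != values[0] for value in values[1:])
        missing_somewhere = len(per_node) < len(nodes)  # assumption: counts as divergent
        if disagree or missing_somewhere:
            divergent[key] = per_node
    return divergent


# Two hypothetical nodes whose configuration files have drifted apart.
nodes = {
    "node-a": {"index.refresh_interval": "1s"},
    "node-b": {"index.refresh_interval": "30s"},
}

print(settings_that_differ_across_nodes(nodes, {}))
print(settings_that_differ_across_nodes(nodes, {"index.refresh_interval": "5s"}))
```

The second call reports nothing, because the key is pinned at index creation and no longer depends on the node.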