Index "bootstrapping"

Ryan_Crumley · November 8, 2010, 11:48pm

All,

I am integrating Elastic Search into a new Java/Spring/Hibernate/Wicket
application. I want to keep this application as standalone as
possible, so I opted to embed ES in the web application itself
(using NodeBuilder.nodeBuilder().data(true).local(false) ). This has
been working great, with the exception of "bootstrapping" the index at
startup. The current bootstrap flow looks like this:

Web application is started.
Search service tries to run a count() query to determine if a full
index is needed.

I run into the first issue here when the index has not finished
loading from the local gateway. This causes the count query to fail. If I
wait and retry the count query, it eventually completes with the
expected answer and I can decide whether I need to reindex the content.
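
The "wait and retry" approach described above can be sketched as a small helper with a deadline. This is only an illustration using the JDK: the `RetryUntilReady` class and its `retry` method are hypothetical names, and the lambda in `main` is a stand-in for the real count query, not an Elasticsearch call.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

public class RetryUntilReady {

    /**
     * Calls the action until it succeeds or the timeout elapses.
     * Once the timeout has elapsed, the most recent failure is rethrown.
     */
    public static <T> T retry(Callable<T> action, Duration timeout, Duration pause)
            throws Exception {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                return action.call();
            } catch (Exception e) {
                // Give up once the deadline has passed; otherwise pause and try again.
                if (System.nanoTime() - deadline >= 0) {
                    throw e;
                }
                Thread.sleep(pause.toMillis());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the count query: fails twice, then returns 42.
        int[] calls = {0};
        long count = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("index not recovered yet");
            }
            return 42L;
        }, Duration.ofSeconds(5), Duration.ofMillis(50));
        System.out.println("count=" + count + " after " + calls[0] + " attempts");
        // prints "count=42 after 3 attempts"
    }
}
```

A bounded timeout keeps startup from hanging forever if the index never recovers; the replies below suggest waiting on the cluster health API instead of retrying the query itself.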

In the case the index does not exist I try to create a new index by
executing a "create index" call:

client.admin().indices().prepareCreate(INDEX_NAME).execute();

Then I immediately want to customize a mapping, but sometimes the "put
mapping" call fails because the index is not fully created. Again, if I
wait long enough after the "create index" call, the "put mapping" call
succeeds; however, it seems like there should be a better way. Any
suggestions?

The data set I am using is around 200-500 small documents, with 3-4
terms. I don't anticipate needing a second ES server any time soon so
ideally any solution would not involve a second ES node. I have tried
using the "local" and "file" gateways with similar results. Since the
current dataset is small, reindexing each time the application
starts (no persistence of the index) would be OK; however, I would still
have the "create index" / "put mapping" race condition.

My current ES configuration is very simple:

gateway.type=local
gateway.recover_after_nodes=0
gateway.recover_after_time=0s

Thanks in advance,

Ryan

Lukas_Vlcek1 · November 9, 2010, 12:30am

Hi,

There is a cluster health API that you can use to check whether a new index is
already allocated and ready for use:
http://www.elasticsearch.com/docs/elasticsearch/rest_api/admin/cluster/health/

Regards,
Lukas

Ryan_Crumley · November 9, 2010, 12:36am

Thanks, I will take a look. Is there no better way to structure this than
polling between API calls? How are people managing their custom type
mappings?

Ryan

Lukas_Vlcek1 · November 9, 2010, 1:25am

Hi,

You can have predefined custom types. Check
http://www.elasticsearch.com/docs/elasticsearch/mapping/builtin_mappings/

Lukas

jminard · November 9, 2010, 12:37pm

There have been conflicting suggestions about pushing mappings through the API vs. configuring them on the physical server. I think on a large cluster, ensuring file updates across the board is harder than pushing the mappings through the API, especially if you create new index names dynamically but have index-specific mappings for them.

Maybe an option to create an index synchronously? And an option for applying mappings synchronously as well?

Or back to polling to determine whether it looks up to date...

--j

kimchy · November 9, 2010, 1:15pm

Make sure to call get or actionGet once you execute an operation; otherwise,
you're not blocking on it to return (it's executed in an async manner). Based
on the code you posted, you don't call it on index creation.

Regarding knowing when the index has recovered, you can use the cluster health
API, as was suggested by Lukas.

-shay.banon
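
The race that comes from not blocking on an asynchronous call can be illustrated with a JDK-only stand-in. Everything here is an assumption made for the sketch: `BlockOnAsync`, `createIndexAsync`, and `putMapping` are hypothetical names that mimic the shape of "execute() returns a future" and are not the Elasticsearch client API. Calling `get` on the future plays the role that get or actionGet plays in the advice above.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

public class BlockOnAsync {

    // Hypothetical in-memory "cluster": the set of index names that exist.
    public static final Set<String> INDICES = ConcurrentHashMap.newKeySet();

    // Stand-in for an async "create index" call: returns at once, does the work later.
    public static CompletableFuture<Boolean> createIndexAsync(String name) {
        return CompletableFuture.supplyAsync(() -> {
            sleep(200); // simulate the time index creation takes
            return INDICES.add(name);
        });
    }

    // Stand-in for "put mapping": fails if the index has not been created yet.
    public static void putMapping(String name) {
        if (!INDICES.contains(name)) {
            throw new IllegalStateException("index missing: " + name);
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<Boolean> created = createIndexAsync("docs");

        // Calling putMapping("docs") here would race with the creation still in flight.

        // Blocking on the future first removes the race.
        created.get(5, TimeUnit.SECONDS);
        putMapping("docs");
        System.out.println("mapping applied after blocking on index creation");
    }
}
```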

Ryan_Crumley · November 10, 2010, 3:44am

Thanks for the information; this got me on the right track.

I ended up making two changes:

Before making a count query I am using the health api to wait until the
index comes up:
client.admin().cluster().prepareHealth(INDEX_NAME).setWaitForYellowStatus().execute().actionGet();

If an IndexMissingException is thrown I know the index needs to be created.

When creating the index, I had a problem with "put mapping" running before the
index was ready. Instead of waiting via the health API, you can also specify the
mapping when the index is created.

Thanks again,

Ryan
