Index "bootstrapping"


(Ryan Crumley) #1

All,

I am integrating Elasticsearch into a new Java/Spring/Hibernate/
Wicket application. I want to keep this application as standalone as
possible, so I opted to embed ES in the web application itself
(using NodeBuilder.nodeBuilder().data(true).local(false)). This has
been working great, with the exception of "bootstrapping" the index at
startup. The current bootstrap flow looks like this:

  1. Web application is started.
  2. Search service tries to run a count() query to determine if a full
    index is needed.

I run into the first issue here when the index has not finished
loading from the local gateway. This causes the count query to fail. If I
wait and retry the count query, it eventually completes with the
expected answer and I can decide whether I need to reindex the content.

If the index does not exist, I try to create a new one by
executing a "create index" call:

client.admin().indices().prepareCreate(INDEX_NAME).execute();

Immediately afterwards I want to customize a mapping, but sometimes the
"put mapping" call fails because the index is not fully created. Again, if I
wait long enough after the "create index" call, the "put mapping" will
succeed, but it seems like there should be a better way. Any
suggestions?

The data set I am using is around 200-500 small documents, with 3-4
terms. I don't anticipate needing a second ES server any time soon, so
ideally any solution would not involve a second ES node. I have tried
both the "local" and "file" gateways, with similar results. Since the
current dataset is small, reindexing each time the application
starts (no persistence of the index) would be OK; however, I would still
have the "create index" / "put mapping" race condition.

My current ES configuration is very simple:

gateway.type=local
gateway.recover_after_nodes=0
gateway.recover_after_time=0s

Thanks in advance,

Ryan


(Lukáš Vlček) #2

Hi,

There is a cluster health API that you can use to check whether a new index is
already allocated and ready for use.
http://www.elasticsearch.com/docs/elasticsearch/rest_api/admin/cluster/health/
Regards,
Lukas

(In reply to Ryan Crumley's post of Tue, Nov 9, 2010 at 12:48 AM.)


(Ryan Crumley) #3

Thanks, I will take a look. Is there no better way to structure this than
polling between API calls? How are people managing their custom type
mappings?

Ryan

(In reply to Lukáš Vlček's post of Mon, Nov 8, 2010 at 6:30 PM.)


(Lukáš Vlček) #4

Hi,

You can have predefined custom types. See
http://www.elasticsearch.com/docs/elasticsearch/mapping/builtin_mappings/
Lukas

(In reply to Ryan Crumley's post of Tue, Nov 9, 2010 at 1:36 AM.)


(jminard) #5

There have been conflicting suggestions around pushing mappings through the API vs. keeping them as configuration files on the physical server. I think that on a large cluster, ensuring file updates across the board is harder than pushing the mappings through the API, especially if you create new index names dynamically but have index-specific mappings for them.

Maybe an option to create an index synchronously? And an option to apply mappings synchronously as well?

Or back to polling to determine if it looks up to date...
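
If it does come back to polling, the loop should at least be bounded so that startup cannot hang forever. Below is a minimal sketch in plain Java, not tied to any ES client version. `PollSketch`, `pollUntil` and its parameters are names made up for illustration, and the readiness check is whatever call you trust (a count query, a health request).

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration; not part of Elasticsearch.
class PollSketch {
    /**
     * Calls ready up to maxAttempts times, sleeping sleepMillis between
     * attempts. Returns true as soon as ready reports true, and false if
     * the attempts run out or the thread is interrupted while sleeping.
     */
    static boolean pollUntil(BooleanSupplier ready, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (ready.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt for callers and give up.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

A caller would pass its own readiness check as the first argument and treat a `false` result as a startup failure rather than retrying indefinitely.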

--j

(In reply to Lukáš Vlček's post of 2010-11-08 at 11:25 PM.)


(Shay Banon) #6

Make sure to call get or actionGet once you execute an operation; otherwise,
you're not blocking on it to return (it's executed in an async manner). Based
on the code you posted, you don't call it on index creation.
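
To make the difference concrete, here is a small self-contained sketch. It does not use the ES client at all: `CompletableFuture` stands in for the future returned by execute(), `join()` plays the role of actionGet(), and `AsyncCreateSketch`, `createIndexAsync` and `indexCreated` are made-up names. A latch holds the background work back so the "not done yet" state is visible deterministically.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-in for an asynchronous admin call; NOT the Elasticsearch API.
class AsyncCreateSketch {
    // Hypothetical flag that flips once the simulated index exists.
    static final AtomicBoolean indexCreated = new AtomicBoolean(false);

    // Simulates execute(): returns at once, the work runs on another thread
    // and cannot finish until the latch is released.
    static CompletableFuture<Boolean> createIndexAsync(CountDownLatch release) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            indexCreated.set(true);
            return true;
        });
    }

    // Returns { state right after the async call, state after blocking on it }.
    static boolean[] demo() {
        indexCreated.set(false);
        CountDownLatch release = new CountDownLatch(1);
        CompletableFuture<Boolean> future = createIndexAsync(release);
        boolean before = indexCreated.get(); // still false: nothing has completed
        release.countDown();                 // let the background work proceed
        future.join();                       // analogous to actionGet(): wait for completion
        boolean after = indexCreated.get();  // true: safe to issue the next call
        return new boolean[] { before, after };
    }
}
```

With the real client, the equivalent change to the posted line is to append .actionGet() after .execute(), so that the "put mapping" call is only issued once index creation has returned.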

Regarding knowing when the index has recovered, you can use the cluster health
API, as Lukas suggested.

-shay.banon

(In reply to Ryan Crumley's post of Tue, Nov 9, 2010 at 1:48 AM.)


(Ryan Crumley) #7

Thanks for the information, this got me on the right track.

I ended up making two changes:

  • Before making a count query, I use the health API to wait until the
    index comes up:
    client.admin().cluster().prepareHealth(INDEX_NAME).setWaitForYellowStatus().execute().actionGet();
    If an IndexMissingException is thrown, I know the index needs to be
    created.

  • When creating the index, I had a problem with "put mapping" being sent
    before the index was ready. Instead of using the health API here, you
    can also specify the mapping when the index is created.
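
For anyone finding this later, the resulting startup flow can be sketched roughly as below. This is plain Java with no ES dependency: `SearchAdmin`, `FakeAdmin`, `MissingIndexException` and `Outcome` are hypothetical stand-ins for the real client calls, and the rule that an empty index means a full index run is needed is an assumption for the example, not something the client decides for you.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: SearchAdmin and FakeAdmin are hypothetical stand-ins, not ES classes.
class BootstrapSketch {
    /** Stand-in for the client's IndexMissingException. */
    static class MissingIndexException extends RuntimeException {}

    /** Hypothetical facade; each method is assumed to block until the call completes. */
    interface SearchAdmin {
        void waitForYellow(String index); // throws MissingIndexException if the index is absent
        void createIndexWithMapping(String index, String type, String mappingJson);
        long count(String index);
    }

    enum Outcome { UP_TO_DATE, NEEDS_FULL_INDEX }

    static Outcome bootstrap(SearchAdmin admin, String index, String type, String mappingJson) {
        try {
            // Real client: prepareHealth(index).setWaitForYellowStatus().execute().actionGet()
            admin.waitForYellow(index);
        } catch (MissingIndexException e) {
            // The mapping travels with the create call, so there is no separate "put mapping" to race.
            admin.createIndexWithMapping(index, type, mappingJson);
            return Outcome.NEEDS_FULL_INDEX;
        }
        // Assumption: an empty index is what triggers a full index run.
        return admin.count(index) == 0 ? Outcome.NEEDS_FULL_INDEX : Outcome.UP_TO_DATE;
    }

    /** In-memory fake so the flow can be exercised without a running node. */
    static class FakeAdmin implements SearchAdmin {
        final Map<String, Long> docCounts = new HashMap<>();
        final Map<String, String> mappings = new HashMap<>();

        public void waitForYellow(String index) {
            if (!docCounts.containsKey(index)) {
                throw new MissingIndexException();
            }
        }

        public void createIndexWithMapping(String index, String type, String mappingJson) {
            docCounts.put(index, 0L);
            mappings.put(index + "/" + type, mappingJson);
        }

        public long count(String index) {
            return docCounts.get(index);
        }
    }
}
```

On a first start the fake reports a missing index, so `bootstrap` creates it with its mapping and asks for a full index run; on later starts with documents present it reports that the index is up to date.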

Thanks again,

Ryan

(In reply to Shay Banon's post of Tue, Nov 9, 2010 at 7:15 AM.)

