Search failures on server restarts


(Diptamay) #1

Hi Shay

Please check out git@github.com:diptamay/es-issue.git and take a look
at the issue described below. I am running against the latest trunk
build of ES.

Scenario:

Most of the time, when an ES server is stopped and restarted and
search queries are fired while it is starting up, the following
issues are seen:

  1. Queries fail with:
     {
       "error" : "ClusterBlockException[blocked by: [1/not recovered from gateway];[3/index not recovered];]"
     }
  2. Once the above error has been encountered and the server has fully
    started up, the search queries that were executing successfully
    before the shutdown no longer return results.

Steps to set up and reproduce:

  1. Ensure ES is running at localhost:9200 (see the configuration
    below).
  2. Run ./automate.sh.
    a) This will create an es-test index and load the sample data.
    b) Then it fires a query, which returns results correctly. This runs
    in an infinite loop.
  3. Stop ./automate.sh.
  4. Run ./break-it.sh. This keeps running the query from 2b in an
    infinite loop. Keep it running.
  5. Stop and start ES at localhost:9200.
  6. You should see the issues listed above. If not, repeat step 5.

Let me know if you need anything else.

Thanks
Diptamay

Note: Configuration of ES:

cluster:
  name: sanyal
gateway:
  type: fs
  fs:
    location: /Users/sanyal/Documents/workspace/hb_indices
index:
  memory:
    enabled: true
  gateway:
    snapshot_interval: 30s
  store:
    type: niofs
  number_of_shards: 2
  number_of_replicas: 1
path:
  home: /Users/sanyal/Installs/elasticsearch
  logs: /Users/sanyal/Documents/workspace/logs


(Ryan Crumley) #2

Try the cluster health API:

client().admin().cluster().prepareHealth(INDEX_NAME).setWaitForYellowStatus().execute().actionGet();

Ryan

On Wed, Nov 10, 2010 at 8:20 PM, diptamay diptamay@gmail.com wrote:



(Diptamay) #3

How would this help?
client().admin().cluster().prepareHealth(INDEX_NAME).setWaitForYellowStatus().execute().actionGet();

-Diptamay

On Nov 10, 10:12 pm, Ryan Crumley crum...@gmail.com wrote:



(Diptamay) #4

Ideally, in the scenario described above, point 2 should not happen
even when queries were fired while the cluster was coming up.

-Diptamay

On Nov 10, 10:32 pm, diptamay dipta...@gmail.com wrote:



(Ryan Crumley) #5

Sorry, I should have explained in a little more detail. The cluster health
API lets you block until the index becomes available, so you won't get the
"not recovered from gateway" errors. See the documentation for more
information.
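For clients that talk HTTP rather than the Java API, a minimal sketch of the same wait, assuming the REST health endpoint accepts the index name in the path along with `wait_for_status` and `timeout` parameters (as the Java call above suggests). It uses `java.net.http` (Java 11+); the class and method names, the base URL, and the `30s` timeout are placeholders, not anything from this thread.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Sketch of a REST counterpart to the Java client call quoted above, assuming
// GET /_cluster/health/<index>?wait_for_status=yellow&timeout=<t> is available.
// Base URL, index name and timeout value are placeholders.
final class WaitForYellow {

    // Builds e.g. http://localhost:9200/_cluster/health/es-test?wait_for_status=yellow&timeout=30s
    static URI healthUri(String baseUrl, String index, String timeout) {
        return URI.create(baseUrl + "/_cluster/health/" + index
                + "?wait_for_status=yellow&timeout=" + timeout);
    }

    // Sends the request and returns the raw health JSON. The server is expected to
    // hold the response until the index is at least yellow or the timeout expires.
    // If the node is not accepting HTTP connections yet, send() throws an IOException.
    static String waitForYellow(String baseUrl, String index, String timeout)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(healthUri(baseUrl, index, timeout))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

A caller would typically run `waitForYellow("http://localhost:9200", "es-test", "30s")` once at startup, before serving searches, and still handle the connection failure that occurs while the node is completely down.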

Ryan

On Wed, Nov 10, 2010 at 9:39 PM, diptamay diptamay@gmail.com wrote:



(Diptamay) #6

Thanks, but as I mentioned earlier, point 2 of the scenario should not
happen even when queries are fired while the cluster is coming up;
i.e., searches should return results again once the cluster is back up,
rather than stop returning results altogether. The exceptions are
understandable while the cluster is coming up, though.

When users search from a website (say, an HTML page making AJAX calls
to ES through the REST API), blocking on the cluster state is not
ideal. My test case simulates an actual user hitting the ES servers
through the REST API, someone who knows nothing about clusters.
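When the caller cannot block on cluster health, one option is a small retry with backoff around each search, treating a ClusterBlockException in the response body as "still recovering". This is a minimal sketch under that assumption; `RetryingSearch`, `HttpResult`, `searchWithRetry` and the 5-second backoff cap are hypothetical choices, not an Elasticsearch API. It only smooths over the transient block errors (point 1) and does nothing for point 2.

```java
import java.util.function.Supplier;

// Minimal sketch, not an Elasticsearch API: retry a search on the caller side
// while a restarting node still answers with a ClusterBlockException.
// HttpResult, searchWithRetry and the 5 s backoff cap are hypothetical choices.
final class RetryingSearch {

    // Status code and body of one search call; only the body is inspected here.
    record HttpResult(int status, String body) {}

    // A response counts as "still recovering" if its body reports a cluster block.
    static boolean isBlocked(HttpResult r) {
        return r.body() != null && r.body().contains("ClusterBlockException");
    }

    // Calls search up to maxAttempts times, doubling the pause between blocked
    // attempts. Returns the first non-blocked result, or the last blocked one.
    static HttpResult searchWithRetry(Supplier<HttpResult> search, int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        HttpResult result = search.get();
        for (int attempt = 1; attempt < maxAttempts && isBlocked(result); attempt++) {
            try {
                Thread.sleep(delay);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return result;
            }
            delay = Math.min(delay * 2, 5_000);
            result = search.get();
        }
        return result;
    }
}
```

In a web front end the same idea applies in the AJAX layer: a blocked response is shown as "try again shortly" or retried a few times, rather than being surfaced as a hard failure.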

-Diptamay

On Nov 10, 11:33 pm, Ryan Crumley crum...@gmail.com wrote:



(ppearcy) #7

Hey,
Give the latest master a shot. I was having similar issues, as discussed
here:
http://elasticsearch-users.115913.n3.nabble.com/Recovery-issues-on-master-tp1865254p1872570.html;cid=1289464614068-787

If you're already on the latest, it must be a different issue.

Thanks

On Nov 10, 10:27 pm, diptamay dipta...@gmail.com wrote:



(Shay Banon) #8

Hi,

Regarding the recovery block exception: if you do a full shutdown and then
start the cluster, clients will get failures while the cluster is down.
Until the cluster metadata is recovered, the cluster remains "down" (in a
blocked state); it is simply in its initialization phase. Then each index
is recovered from the gateway. Until an index has recovered enough to answer
queries properly, it is blocked as well, but now at the index level (so once
an index is recovered, that index is no longer blocked).
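To make the two phases concrete, a small sketch that pulls the individual blocks out of the error string from the original report and tags each one as cluster-level or index-level, following the explanation above. The class name, the regular expression, and the description-based classification are illustrative assumptions, not Elasticsearch internals (Java 16+ for the record).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: splits the block list out of an error string shaped like
// "ClusterBlockException[blocked by: [1/not recovered from gateway];[3/index not recovered];]".
// The cluster/index classification is inferred from this thread, not from ES source.
final class BlockMessageParser {

    enum Scope { CLUSTER, INDEX, UNKNOWN }

    record Block(int id, String description, Scope scope) {}

    // Matches each "[<id>/<description>]" entry; descriptions contain no ']' or ';'.
    private static final Pattern ENTRY = Pattern.compile("\\[(\\d+)/([^\\];]+)\\]");

    static List<Block> parse(String error) {
        List<Block> blocks = new ArrayList<>();
        Matcher m = ENTRY.matcher(error);
        while (m.find()) {
            String description = m.group(2).trim();
            blocks.add(new Block(Integer.parseInt(m.group(1)), description, scopeOf(description)));
        }
        return blocks;
    }

    // "index ..." blocks clear per index; the gateway block clears once cluster metadata is back.
    private static Scope scopeOf(String description) {
        if (description.startsWith("index ")) return Scope.INDEX;
        if (description.contains("gateway")) return Scope.CLUSTER;
        return Scope.UNKNOWN;
    }
}
```

Read this way, the reported error shows both phases at once: the cluster-wide gateway block had not yet cleared, and the index-level block was still in place as well.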

I will run the test to see if I can reproduce the queries not returning
results after the restart. Which version are you running?

Two things regarding the configuration:

  1. Why do you set index.memory.enabled? It does not control anything;
    where did you find it?
  2. This is a single node with a file-system-based gateway; any reason you
    are not using the local gateway?

-shay.banon

On Thu, Nov 11, 2010 at 10:40 AM, Paul ppearcy@gmail.com wrote:



(Diptamay) #9

Sure, although my checkout was current as of yesterday. I will update
again today and check it out.

-Diptamay

On Nov 11, 3:40 am, Paul ppea...@gmail.com wrote:



(Diptamay) #10

Hi Shay

I am running against yesterday's trunk build. I will update today and
see. I understand the exceptions; I was more concerned about the search
queries not returning results.

Regarding the configuration:

  1. index.memory.enabled: I remember it existing ever since I started
    looking at ES 0.8. As I recall, shard data was cached in memory for
    faster performance, and then I think you changed that to per-node
    memory caching. Isn't that so? If not, what setting do I use to get
    in-memory caching for faster performance? And if so, how do I control
    its memory size (e.g., 20 MB or 100 MB)?
  2. I am using the fs gateway because we will use the same in our QA and
    PROD environments for centralized NFS backups.

Cheers!
Diptamay

On Nov 11, 5:02 am, Shay Banon shay.ba...@elasticsearch.com wrote:



(Shay Banon) #11

Hey,

Well, that problem was one of the more interesting ones to track down.
It was introduced in master, so first of all, big thanks to you (and Paul)
for spending the time and helping flush out potential problems.

The bug was a sneaky one. At first glance you would think that data gets
lost, but it's not; it's really there, and other queries (like match all)
do return the relevant data. The problem was actually with the specific
field you were searching on: appAccountIds. By mistake, when a serialized
mapping was parsed (for example, during recovery), the field name was being
underscore-cased, so it was renamed to app_account_ids. Because this field
is numeric, a specific query has to be built to match on it; but since it
had been renamed and there was no longer a mapping for appAccountIds, the
regular text-based query was used instead, and it failed to find hits.
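The failure mode can be illustrated with a toy model: underscore-case the field name as the buggy mapping parser did, then look the original camelCase name up again when choosing a query. Everything here is hypothetical and simplified (`MappingRenameDemo`, `toUnderscoreCase`, `queryKind`, and the assumption that appAccountIds was mapped as `long`); it is not the Elasticsearch code that was fixed.

```java
import java.util.Map;
import java.util.Set;

// Toy model of the failure mode described above; none of this is Elasticsearch code.
// A numeric field whose name is underscore-cased during mapping parsing no longer
// matches the camelCase name used in the query, so the numeric query path is skipped.
final class MappingRenameDemo {

    private static final Set<String> NUMERIC_TYPES =
            Set.of("byte", "short", "integer", "long", "float", "double");

    // camelCase -> underscore_case, e.g. appAccountIds -> app_account_ids.
    static String toUnderscoreCase(String name) {
        StringBuilder sb = new StringBuilder(name.length() + 4);
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isUpperCase(c)) {
                if (i > 0) {
                    sb.append('_');
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Numeric mapped fields get a numeric-aware query; a field with no mapping
    // falls back to a plain text query, which is what failed to find hits here.
    static String queryKind(Map<String, String> mappedTypes, String field) {
        String type = mappedTypes.get(field);
        return type != null && NUMERIC_TYPES.contains(type) ? "numeric" : "text";
    }
}
```

Before the restart the mapping is keyed by appAccountIds and the query takes the numeric path; after recovery the same lookup misses, because the key is now app_account_ids, and the query silently degrades to text matching. This is consistent with the data still being there for a match all query.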

I have just pushed a fix to this in master.

Regarding the memory aspect, you can enable it by setting
index.store.fs.memory.enabled to true, but I advise against it. I don't
think you will see much better performance from it (especially given file
system caching). Also, if you use it, you won't be able to move to the local
gateway in the future (it should not be used with the local gateway). It
does make sense in very advanced cases, where one would want to load the term
info, or other Lucene file constructs, into memory, but really, things should
be pretty fast without it.

-shay.banon


(Diptamay) #12

Awesome! I am glad that I could be of help. And thanks for the
configuration suggestions.

I just took the latest build and gave it a spin, and the issue is fixed.

Any idea when we can expect a final 0.13 release?

-Diptamay


(Shay Banon) #13

It should be released sometime next week.


(ppearcy) #14

Glad to hear we'll see a 0.13 next week. We're looking forward to it.

Thanks


(system) #15