Well, that problem was one of the more interesting ones to track down...
It was introduced in master, so first of all, big thanks to you (and Paul)
for spending the time and helping flush out any potential problems.
The bug was a sneaky one. At first glance you would think that data gets
lost, but it's not; it's really there, and other queries you run (like match
all) do return the relevant data. The problem was actually with the specific
field you were searching on: appAccountIds. When parsing a serialized
mapping (for example, during recovery), the field name was mistakenly
underscore cased, so it was renamed to app_account_ids. Now, because this
field is a numeric type, a specific query needs to be built in order to do
matching on it, but, because it was renamed and there was no mapping
for appAccountIds, the regular text based query was used, which failed
to find any hits.
I have just pushed a fix to this in master.
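For the curious, the accidental conversion amounted to camel case being turned into underscore case. A minimal sketch of that kind of transformation (illustrative only, not the actual ES parsing code):

```shell
# Underscore-case a camelCase field name, the way the mapping parser
# mistakenly did when re-reading a serialized mapping (sketch only).
echo "appAccountIds" | sed -E 's/([a-z0-9])([A-Z])/\1_\2/g' | tr 'A-Z' 'a-z'
# -> app_account_ids
```

Once the field is known under the wrong name, a query against appAccountIds finds no mapping, falls back to the text based query, and returns no hits.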
Regarding the memory aspect, you can enable it by setting
index.store.fs.memory.enabled to true, but I advise against it. I don't
think you will see much better performance out of it (especially thanks to
file system caching). And, if you use it, you won't be able to move to the
local gateway in the future (you should not use it with the local gateway).
It does make sense in very advanced cases, where one would want to load the
term info, or other Lucene file constructs, into memory, but really, things
should be pretty fast without it.
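For completeness, this is where the setting would live in elasticsearch.yml (shown only as a sketch; as noted above, leaving it off is the better choice here):

```yaml
# In-memory fs store; not recommended, and incompatible with the local gateway.
index.store.fs.memory.enabled: true
```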
On Thu, Nov 11, 2010 at 3:09 PM, diptamay firstname.lastname@example.org wrote:
I am running against a trunk build, yesterday's version. I will do an
update today and see. I understand the exceptions; I was more concerned
about the search queries not returning results.
Regarding the configuration:
- the index.memory.enabled setting... I remember it existing ever since I
started looking at ES 0.8. I remember something like shard data getting
cached in memory for faster performance, and then I think you changed
that to per node memory caching. Isn't that so? If not, what setting do
I use to get in memory caching for faster performance, and if so, how
do I also control the memory size for it, like 20 MB or 100 MB etc.?
- I am using the fs gateway since we would be using the same in our
QA and PROD environments for centralized NFS backups.
On Nov 11, 5:02 am, Shay Banon shay.ba...@elasticsearch.com wrote:
Regarding the recovery block exception: if you do a full shutdown and
then start the cluster, clients will get failures while the cluster is down.
Until the cluster metadata is recovered, the cluster remains in a blocked
state; it is just in its initialization phase. Then, each index is recovered
from the gateway; until an index is recovered to the point of being able to
properly answer queries, it is blocked as well, but on the index level
(so once an index is recovered, that index is no longer blocked).
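The two block levels can be observed from the outside with the cluster health API. A sketch (assumes ES is on localhost:9200; the `level=indices` parameter, per the REST API, expands the response down to the per index level):

```shell
# Cluster-level status, plus per-index status showing which indices
# are still recovering/blocked.
curl -XGET 'http://localhost:9200/_cluster/health?level=indices'
```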
I will run the test to see if I can verify the queries not returning
results post restart. Which version are you running on?
Two things regarding the configuration:
- Why do you set index.memory.enabled? It does not control anything;
where did you find it?
- This is a single node with a file system based gateway; any reason you
are not using the local gateway?
On Thu, Nov 11, 2010 at 10:40 AM, Paul ppea...@gmail.com wrote:
Give the latest master a shot. I was having similar issues as discussed.
If you're on the latest, it must be a different issue.
On Nov 10, 10:27 pm, diptamay dipta...@gmail.com wrote:
Thanks, but as I mentioned earlier, point 2 of the scenario should not
happen even when queries were fired while the cluster was coming up,
i.e. the search results should return once the cluster is back up, and
not stop returning results. The exceptions are understandable while
the cluster is coming up, though.
When users are hitting search from a website (say an HTML page making
Ajax calls to ES through the REST API), the blocking concept to check
the state of a cluster is not ideal. What I am simulating with my
test case is an actual user hitting the ES servers through the REST
API, who is not aware of any concepts of a cluster.
On Nov 10, 11:33 pm, Ryan Crumley crum...@gmail.com wrote:
Sorry, I should have explained in a little more detail. The cluster
health API will allow you to block until the index becomes available, so
you don't get the "not recovered from gateway" errors. See the documentation.
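As a sketch of the blocking call (parameter names per the cluster health API; assumes ES is on localhost:9200), a client can wait for the cluster to become available before querying:

```shell
# Block for up to 30s until the cluster reaches at least yellow status,
# i.e. until the recovery blocks described in this thread have lifted.
curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s'
```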
On Wed, Nov 10, 2010 at 9:39 PM, diptamay dipta...@gmail.com wrote:
Ideally, in the scenario described above, point 2 should not happen
even when queries were fired while the cluster was coming up.
On Nov 10, 10:32 pm, diptamay dipta...@gmail.com wrote:
How would this help?
On Nov 10, 10:12 pm, Ryan Crumley crum...@gmail.com wrote:
Try the cluster health api:
On Wed, Nov 10, 2010 at 8:20 PM, diptamay <
Please check out g...@github.com:diptamay/es-issue.git and look
at the issue listed below, running against the latest trunk.
Most of the time, when an ES server is stopped and restarted, and
search queries are fired while the server is starting up, the
following issues are seen:
1) "error" : "ClusterBlockException[blocked by: [1/not recovered from
gateway];[3/index not recovered];]"
2) Once the above error is encountered and the server has come up
completely, the search queries which were executing
before the stoppage are not returning results anymore.
Steps to set up and reproduce:
1) Ensure ES is running at localhost:9200 (look at
2) run ./automate.sh.
a) This will create an es-test index and load the sample data.
b) Then it fires a query, which returns results correctly,
in an infinite loop.
3) stop running ./automate.sh
4) run ./break-it.sh. This will keep running the query
in an infinite loop. Keep it running
5) stop and start ES at localhost:9200
6) You should see the issues listed above. If not, repeat
Let me know if you need anything else.
Note: Configuration of ES:
snapshot_interval : 30s
number_of_shards : 2
number_of_replicas : 1