Thanks for looking into it and for your suggestions. Actually, the
mapping_path was deliberately left unset, so I am pretty sure there is
something going on at the ES level.
When seeding happens, the mapping is set up by automate.sh from the
path data/mappings. Please refer to https://github.com/diptamay/es-issue/blob/master/data/mappings/audio.json
When reconfigure happens, a different mapping file is loaded from the
root folder es-issue. Please refer to https://github.com/diptamay/es-issue/blob/master/audio.json
As you already saw, if we use the same mappings from data/mappings in
reconfigure, then the query works fine. However, if one uses a
different mapping like above, then the search is not working as
expected.
One might think that the updated mapping is not right, but I strongly
believe that is not the case. If we use the new mapping for the
initial seeding, the query works. This raises another interesting
scenario though :). The search then returns only the audio and not
the video, which I find pretty bizarre.
How do I hot swap indices using aliases? Hot swapping is a good idea,
but I have limited hardware resources (2 servers with limited RAM) at
my disposal in my QA environment at the moment. Indexing a couple of
million docs with an uncompressed JSON size of 12 GB is already
driving one of the servers crazy under normal load, where I have
allocated ES 4 GB of heap. So I have to do more of an "in-place"
deletion, expunging and re-indexing.
By the way, what factors do I need to consider when deciding on RAM
requirements? I see that with my current data size, 6 GB of heap on
one of the servers doesn't exactly drive the server crazy under normal
load. I have yet to do load testing, so I can't say much.
Let me know if you need further info.
Thanks
Diptamay
On Nov 17, 4:34 pm, Shay Banon shay.ba...@elasticsearch.com wrote:
Your reconfigure script is not setting the mapping_path (nothing to do
with elasticsearch). Change this in the script:
mappings_path="data/mappings/"
and it works.
Regarding your question, if you are going to reindex a big portion of the
data, it's better to create a new index (no need for a new cluster) and index
the data into it. You can use aliases to hot swap indices. This is
because documents are only marked as deleted, and that optimize request you
made might be heavy...
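
As a rough illustration of the alias swap mentioned above, here is a minimal Python sketch. It assumes the application searches through an alias rather than the index name directly; the alias name "es-test-live" and the index names "es-test-v1" and "es-test-v2" are hypothetical and do not come from the es-issue repository. The request body follows the _aliases actions format (a "remove" and an "add" sent in one call); check it against the ES version you are running.

```python
import json
import urllib.request


def build_alias_swap(alias, old_index, new_index):
    """Build the body for one _aliases call that moves `alias`
    from old_index to new_index in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def swap_alias(base_url, alias, old_index, new_index):
    """POST the swap to a running node. Not called in this sketch;
    base_url would be something like http://localhost:9200."""
    body = json.dumps(build_alias_swap(alias, old_index, new_index))
    req = urllib.request.Request(
        base_url + "/_aliases",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical names: index the data into es-test-v2 first, then swap.
    swap = build_alias_swap("es-test-live", "es-test-v1", "es-test-v2")
    print(json.dumps(swap, indent=2))
```

The new index has to be fully loaded before the swap, so both indices exist side by side for a while; that is the extra disk and memory cost to weigh on small hardware.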
One more thing though: when you delete a mapping, the relevant data is
also deleted, so there is no need for the first delete data request you make.
-shay.banon
On Wed, Nov 17, 2010 at 10:32 PM, diptamay dipta...@gmail.com wrote:
Hi
I see the following issue with the current trunk build of ES.
Scenario:
Re-indexing returns invalid search results when mappings are
changed in a live system and re-applied after all the documents and
the respective mappings were deleted. Code at
https://github.com/diptamay/es-issue
Steps to set up and reproduce:
1. Ensure ES is running at localhost:9200 (see the configuration
   below).
2. Run ./automate.sh.
   a) This creates an es-test index with the seed mappings and
      loads the sample data.
   b) Then it fires a query which returns results correctly, i.e. 2 audio
      and 1 video.
3. Now run ./reconfigure.sh.
   a) This first deletes all the audio documents and the corresponding
      audio mapping.
   b) Then it refreshes the indices and does an expunge of the
      deleted audio documents.
   c) Then it puts the new mapping for audio and does a load of the
      sample data.
   d) Then it fires a query, same as step 2b above, which now returns
      results incorrectly, i.e. only 1 video is returned.
Note:
- If I use the new reconfigured audio mapping when the index is
created, then there is no problem.
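
For readers who do not want to open the repository, the reconfigure sequence described above can be sketched as an ordered list of REST calls. This is not the content of reconfigure.sh; it is an assumption based on the 2010-era endpoints as I recall them (type-level DELETE to drop a mapping together with its data, _refresh, _optimize with only_expunge_deletes, and a type-level _mapping PUT), so verify the paths against the ES version in use. The function name is hypothetical.

```python
def reconfigure_requests(index, doc_type):
    """Return the (method, path) pairs for an in-place reconfigure, in order.
    Sketch only: paths are assumed, not copied from reconfigure.sh."""
    return [
        # Deleting the mapping also removes the documents of that type.
        ("DELETE", "/%s/%s" % (index, doc_type)),
        # Make the deletions visible to search.
        ("POST", "/%s/_refresh" % index),
        # Expunge documents that are only marked as deleted.
        ("POST", "/%s/_optimize?only_expunge_deletes=true" % index),
        # Put the new mapping; the mapping JSON goes in the request body.
        ("PUT", "/%s/%s/_mapping" % (index, doc_type)),
    ]


if __name__ == "__main__":
    for method, path in reconfigure_requests("es-test", "audio"):
        print(method, path)
```

After these calls the sample data is loaded again and the query from step 2b is repeated.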
Configuration of ES:
cluster:
  name: sanyal
gateway:
  type: fs
  fs:
    location: /Users/sanyal/Documents/workspace/hb_indices
index:
  memory:
    enabled: true
  gateway:
    snapshot_interval: 30s
  store:
    type: niofs
  number_of_shards: 2
  number_of_replicas: 1
path:
  home: /Users/sanyal/Installs/elasticsearch
  logs: /Users/sanyal/Documents/workspace/logs
Maybe I shouldn't be deleting and creating mappings in a live system
and should instead bring up a new cluster with the desired changes.
Thoughts? Is this expected behavior or a bug?
Thanks
Diptamay