Reindexing returns incorrect search results when mappings are changed in a live system and reapplied after the documents and mappings were deleted


(Diptamay) #1

Hi

I see the following issue with the current trunk build of ES.

Scenario:

Reindexing produces incorrect search results when mappings are
changed in a live system and reapplied after all the documents and
their mappings were deleted. Code is at https://github.com/diptamay/es-issue

Steps to set up and reproduce:

  1. Ensure ES is running at localhost:9200 (see the configuration
    below).
  2. Run ./automate.sh.
    a) This creates an es-test index with the seed mappings and
    loads the sample data.
    b) It then fires a query, which correctly returns 2 audio
    results and 1 video result.
  3. Now run ./reconfigure.sh.
    a) This first deletes all the audio documents and the corresponding
    audio mapping.
    b) It then refreshes the indices and expunges the
    deleted audio documents.
    c) It then puts the new mapping for audio and reloads the
    sample data.
    d) Finally, it fires the same query as in step 2b, which now
    returns incorrect results: only the 1 video result.

Note:

  1. If I use the new reconfigured audio mapping when creating
    the index, there is no problem.
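
A minimal sketch of step 3 as a single self-contained script, the kind of recreation Shay asks for later in the thread. It is not the repo's reconfigure.sh: the es helper, the DRY_RUN switch and the file names audio.json, query.json and data/audio/*.json are hypothetical, and the endpoints (delete-by-query, DELETE /index/type to drop a mapping, _optimize?only_expunge_deletes=true) are assumed from the 0.13-era REST API. By default it only prints each request; set DRY_RUN=0 to send them through curl.

```bash
#!/usr/bin/env bash
# Sketch of step 3 (reconfigure) as one script. File names are hypothetical;
# endpoints assume the 0.13-era REST API. Prints requests unless DRY_RUN=0.
set -euo pipefail
shopt -s nullglob

ES="${ES:-http://localhost:9200}"
INDEX="es-test"

# Print "METHOD PATH" in dry-run mode, otherwise send the request with curl.
es() {
  local method="$1" path="$2"
  shift 2
  if [ "${DRY_RUN:-1}" != "0" ]; then
    echo "$method $path"
  else
    curl -s -X "$method" "$ES$path" "$@"
    echo
  fi
}

reconfigure() {
  # 3a: delete all audio documents, then the audio mapping
  es DELETE "/$INDEX/audio/_query?q=*:*"
  es DELETE "/$INDEX/audio"
  # 3b: refresh, then expunge the deleted documents
  es POST "/$INDEX/_refresh"
  es POST "/$INDEX/_optimize?only_expunge_deletes=true"
  # 3c: put the new audio mapping and reload the sample audio documents
  es PUT "/$INDEX/audio/_mapping" -d @audio.json
  local doc
  for doc in data/audio/*.json; do
    es PUT "/$INDEX/audio/$(basename "$doc" .json)" -d @"$doc"
  done
  # 3d: rerun the query from step 2b (body kept in a hypothetical query.json)
  es POST "/$INDEX/_search" -d @query.json
}
```

Running DRY_RUN=1 reconfigure lists the requests in order, which makes it easy to compare the sequence against what the original scripts send.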

Configuration of ES:

cluster:
  name: sanyal
gateway:
  type: fs
  fs:
    location: /Users/sanyal/Documents/workspace/hb_indices
index:
  memory:
    enabled: true
  gateway:
    snapshot_interval: 30s
  store:
    type: niofs
  number_of_shards: 2
  number_of_replicas: 1
path:
  home: /Users/sanyal/Installs/elasticsearch
  logs: /Users/sanyal/Documents/workspace/logs

Maybe I shouldn't be deleting and recreating mappings in a live system,
and should instead bring up a new cluster with the desired changes.
Thoughts? Is this expected behavior or a bug?

Thanks
Diptamay


(Shay Banon) #2

Your reconfigure script is not setting mappings_path (nothing to do
with elasticsearch). Change this in the script:

mappings_path="data/mappings/"

and it works.

Regarding your question, if you are going to reindex a big portion of the
data, it's better to create a new index (no need for a new cluster) and index
the data into it. You can use aliases to hot swap indices. This is
because deleted documents are only marked as deleted, and the optimize
request you made to expunge them might be heavy...
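
A minimal sketch of that alias-based swap, assuming the application always searches through an alias (here es-test) that currently points at an index es-test-v1, while es-test-v2 is built with the new mapping. The index and alias names, audio.json, the es helper and the DRY_RUN switch are all hypothetical; the _aliases request format (a remove and an add action in one call) is assumed from the REST API of that era. Reindexing the data is application-specific and is left as a comment.

```bash
#!/usr/bin/env bash
# Sketch: build a new index, then hot swap an alias onto it.
# Index/alias names and audio.json are hypothetical.
# Prints requests unless DRY_RUN=0.
set -euo pipefail

ES="${ES:-http://localhost:9200}"

# Print "METHOD PATH" in dry-run mode, otherwise send the request with curl.
es() {
  local method="$1" path="$2"
  shift 2
  if [ "${DRY_RUN:-1}" != "0" ]; then
    echo "$method $path"
  else
    curl -s -X "$method" "$ES$path" "$@"
    echo
  fi
}

# JSON body for one _aliases call that moves ALIAS from OLD to NEW.
alias_swap_body() {
  local alias="$1" old="$2" new="$3"
  printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}' \
    "$old" "$alias" "$new" "$alias"
}

hot_swap() {
  local alias="$1" old="$2" new="$3"
  es PUT "/$new"                                # create the new index
  es PUT "/$new/audio/_mapping" -d @audio.json  # apply the new mapping
  # ... reindex the source data into $new here (application-specific) ...
  es POST "/_aliases" -d "$(alias_swap_body "$alias" "$old" "$new")"
  es DELETE "/$old"                             # drop the old index
}
```

For example, DRY_RUN=0 hot_swap es-test es-test-v1 es-test-v2. Putting the remove and add in a single _aliases call is meant to keep searches through the alias pointed at one complete index throughout, and deleting the old index afterwards discards its deleted documents without an optimize.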

One more thing though: when you delete a mapping, the relevant data is
deleted as well, so there is no need for the first delete-data request you make.
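
Given that, the in-place reconfiguration can drop the delete-by-query step. A minimal sketch under similar hypothetical assumptions (index es-test, type audio, files audio.json and data/audio/*.json, an es helper that only prints requests unless DRY_RUN=0). It also leaves out the optimize call described above as heavy; the deleted documents then stay marked as deleted until Lucene merges their segments.

```bash
#!/usr/bin/env bash
# Sketch: in-place reconfigure without a separate delete-by-query, since
# deleting the audio mapping also deletes the audio documents.
# File names are hypothetical. Prints requests unless DRY_RUN=0.
set -euo pipefail
shopt -s nullglob

ES="${ES:-http://localhost:9200}"
INDEX="es-test"

# Print "METHOD PATH" in dry-run mode, otherwise send the request with curl.
es() {
  local method="$1" path="$2"
  shift 2
  if [ "${DRY_RUN:-1}" != "0" ]; then
    echo "$method $path"
  else
    curl -s -X "$method" "$ES$path" "$@"
    echo
  fi
}

reconfigure_in_place() {
  es DELETE "/$INDEX/audio"                       # mapping plus its documents
  es PUT "/$INDEX/audio/_mapping" -d @audio.json  # new audio mapping
  local doc
  for doc in data/audio/*.json; do                # reload sample audio docs
    es PUT "/$INDEX/audio/$(basename "$doc" .json)" -d @"$doc"
  done
  es POST "/$INDEX/_refresh"                      # make them searchable
}
```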

-shay.banon

On Wed, Nov 17, 2010 at 10:32 PM, diptamay diptamay@gmail.com wrote:


(Diptamay) #3

Thanks for looking into it and for your suggestions. Actually, mappings_path
was deliberately left unset, so I am pretty sure something is going on at
the ES level.

During seeding, automate.sh sets up the mapping from the path
data/mappings. Please refer to https://github.com/diptamay/es-issue/blob/master/data/mappings/audio.json

During reconfiguration, a different mapping file is loaded from the
root folder of es-issue. Please refer to https://github.com/diptamay/es-issue/blob/master/audio.json

As you already saw, if reconfigure uses the same mappings from
data/mappings, the query works fine. However, if it uses a different
mapping like the one above, the search does not work as expected.

One might think that the updated mapping is not right, but I strongly
doubt that. If we use the new mapping for the initial seeding, you will
see that the query works. That raises another interesting scenario,
though :): the search then returns only the audio and not the video,
which I find pretty bizarre.

How do I hot swap indices using aliases? Hot swapping is a good idea,
but I have limited hardware resources (2 servers with limited RAM) at my
disposal in my QA environment at the moment. Indexing a couple of
million docs, about 12 GB of uncompressed JSON, is already driving one
of the servers crazy under normal load with 4 GB of heap allocated to
ES. So I have to do more of an "in-place" deletion, expunge and
reindex. By the way, what factors do I need to consider when deciding
on RAM requirements? With my current data size, 6 GB of heap on one of
the servers doesn't exactly drive the server crazy under normal load. I
have yet to do load testing, so I can't say much.

Let me know if you need further info.

Thanks
Diptamay

On Nov 17, 4:34 pm, Shay Banon shay.ba...@elasticsearch.com wrote:



(Diptamay) #4

Hi Shay

Any thoughts on the above?

-Diptamay

On Nov 17, 8:34 pm, diptamay dipta...@gmail.com wrote:



(Shay Banon) #5
Hey,

Yes, I see what you mean. I will have another look. Is there a chance the recreation can be simpler next time :)? It would help in understanding what's going on; in this case, for example, a single script with the curl commands and data in it would go a long way.

-shay.banon

On Saturday, November 20, 2010 at 4:18 AM, diptamay wrote:


(Diptamay) #6

Thanks for having another look. Sure, I will make it simpler next
time, and thanks for the final 0.13 release :).

Cheers!
Diptamay

On Nov 20, 6:35 am, Shay Banon shay.ba...@elasticsearch.com wrote:



(Shay Banon) #7

Hey,

I tracked down the problem and fixed it; it is covered by two issues,
531 and 532. Note that since you query on tags without any type prefix,
the tags field from only one of the mappings will be chosen, so in your
case the video mapping will also need to be redefined, not just the
audio one.
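
One way to avoid depending on which mapping's tags field gets picked is to name the field with its type prefix in the query itself. A minimal sketch, assuming the 0.x-era support for type-prefixed field names such as audio.tags in query strings; the helper names, the DRY_RUN switch and the tag value rock are hypothetical. This complements, rather than replaces, the advice to redefine the video mapping as well.

```bash
#!/usr/bin/env bash
# Sketch: query tags with an explicit type prefix per type, so each clause is
# resolved against that type's mapping. Assumes 0.x-era type-prefixed field
# names; helper names and tag values are hypothetical.
set -euo pipefail

ES="${ES:-http://localhost:9200}"
INDEX="es-test"

# Print "METHOD PATH" in dry-run mode, otherwise send the request with curl.
es() {
  local method="$1" path="$2"
  shift 2
  if [ "${DRY_RUN:-1}" != "0" ]; then
    echo "$method $path"
  else
    curl -s -X "$method" "$ES$path" "$@"
    echo
  fi
}

# tags_query VALUE TYPE... prints "TYPE1.tags:VALUE OR TYPE2.tags:VALUE ..."
tags_query() {
  local value="$1" doc_type out=""
  shift
  for doc_type in "$@"; do
    out+="${out:+ OR }$doc_type.tags:$value"
  done
  echo "$out"
}

# search_tags VALUE TYPE... sends the query through the q= URI parameter.
search_tags() {
  local q
  q="$(tags_query "$@")"
  es GET "/$INDEX/_search?q=${q// /%20}"  # encodes spaces only; enough here
}
```

For example, DRY_RUN=0 search_tags rock audio video would search audio.tags and video.tags, each against its own type's mapping, instead of a single unprefixed tags field.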

-shay.banon

On Sat, Nov 20, 2010 at 7:02 PM, diptamay diptamay@gmail.com wrote:



(Diptamay) #8

Hey

Thanks! Will check it out. Sorry I did not reply earlier.

Cheers!
Diptamay

On Nov 23, 8:28 am, Shay Banon shay.ba...@elasticsearch.com wrote:


