Inconsistent 'fields' returned when explicitly asking for fields in 0.19.4

Hi, all.

We have an odd case where requesting explicit fields on a query sometimes
returns all the fields, and sometimes doesn't - for the same ids query
(i.e. for the same record). We haven't been able to nail it down - but it
usually works fine if we've just re-indexed and haven't yet shut down and
restarted ES, and usually doesn't work if we've shut down and restarted
ES.

We don't have any stored fields - everything is in the _source.

So - here's a CURL sample that sometimes returns the requested 'blob' field
and sometimes doesn't.

(I may mess up the formatting here - I'm using CURL on Windows, and taking
my original Windows command and morphing it into something more readable.)

curl -XGET http://localhost:9200/cust1/_search?fields=blob -d '{
  "query" : {
    "ids" : {
      "values" : ["ent://SD_ASSET/0/122"]
    }
  }
}'

When it DOES NOT return the blob field, here's the resulting JSON:

{
  "_shards" : {
    "failed" : 0,
    "successful" : 1,
    "total" : 1
  },
  "hits" : {
    "hits" : [ {
      "_id" : "ent://SD_ASSET/0/122",
      "_index" : "cust1",
      "_score" : 1.0,
      "_type" : "sd_asset"
    } ],
    "max_score" : 1.0,
    "total" : 1
  },
  "timed_out" : false,
  "took" : 1
}

When it DOES return the blob field, here's the resulting JSON:

{
  "_shards" : {
    "failed" : 0,
    "successful" : 1,
    "total" : 1
  },
  "hits" : {
    "hits" : [ {
      "_id" : "ent://SD_ASSET/0/122",
      "_index" : "cust1",
      "_score" : 1.0,
      "_type" : "sd_asset",
      "fields" : {
        "_source.blob" :
"H4sIAAAAAAAAAJ3UUWvDIBAA4L/Sx+1lLun2UJAQMVcqTTSorGQvEhrXBUwGW7Pfv3QZY82KBZ+8Qz8PzkPcvO2HzvbHBLdNEsUxRuOKu/q4f53SKcQvrXXjRl93NtlCtRMyUxh9p/izdoNN+sE5jKYYo5/zf5kEpSWjmgk+o2kKRamrNPV6QikoZXJ4gvzcR4EuDnTLQPfgdWshC6LPBVEK9JWujoUIp2CUkDO98krKNHsGrjasDHoPmgORp8qzpt6HqCAUBanVlbZcVo9exAEyo4XZcrELamYBGSNGV+WsrBF6A9L4B+40JIaTYmaVfW9r97E4DG1jFzfLu+j23z3o9wP4AlqyXlULBAAA"
      }
    } ],
    "max_score" : 1.0,
    "total" : 1
  },
  "timed_out" : false,
  "took" : 2
}

If I don't ask for any fields, here's the response (just to show you what's
in the _source)

{
  "_shards" : {
    "failed" : 0,
    "successful" : 1,
    "total" : 1
  },
  "hits" : {
    "hits" : [ {
      "_id" : "ent://SD_ASSET/0/122",
      "_index" : "cust1",
      "_score" : 1.0,
      "_source" : {
        "ACCESS_LEVEL_nfacet" : [ "1", "2", "3", "4" ],
        "ASSET_NAME_display" : [ "Serials guide (3.1)" ],
        "CITIZENSHIPS_facet" : [ "@@EMPTY@@" ],
        "CLEARANCE_nfacet" : [ "20", "0", "10", "99", "5" ],
        "DOC_TEXT" : [ "null", "ASSET", "OTHER", "Serials guide (3.1)" ],
        "DS_EC" : "SD_ASSET",
        "DS_KEY" : 7,
        "FORMAT_display" : [ "ASSET" ],
        "FORMAT_facet" : [ "ASSET" ],
        "KEYWORDS_display" : [ "null" ],
        "KEYWORDS_facet" : [ "null" ],
        "MEDIA_TYPE_display" : [ "OTHER" ],
        "MEDIA_TYPE_facet" : [ "OTHER" ],
        "NEED_TO_KNOWS_facet" : [ "@@EMPTY@@" ],
        "RELEVANCE_SORT_nsort" : "9",
        "RESTRICTIONS_facet" : [ "@@EMPTY@@" ],
        "_id" : "ent://SD_ASSET/0/122",
        "blob" :
"H4sIAAAAAAAAAJ3UUWvDIBAA4L/Sx+1lLun2UJAQMVcqTTSorGQvEhrXBUwGW7Pfv3QZY82KBZ+8Qz8PzkPcvO2HzvbHBLdNEsUxRuOKu/q4f53SKcQvrXXjRl93NtlCtRMyUxh9p/izdoNN+sE5jKYYo5/zf5kEpSWjmgk+o2kKRamrNPV6QikoZXJ4gvzcR4EuDnTLQPfgdWshC6LPBVEK9JWujoUIp2CUkDO98krKNHsGrjasDHoPmgORp8qzpt6HqCAUBanVlbZcVo9exAEyo4XZcrELamYBGSNGV+WsrBF6A9L4B+40JIaTYmaVfW9r97E4DG1jFzfLu+j23z3o9wP4AlqyXlULBAAA",
        "sequence" : 4
      },
      "_type" : "sd_asset"
    } ],
    "max_score" : 1.0,
    "total" : 1
  },
  "timed_out" : false,
  "took" : 1
}

Anyways - this is a sort of minimal example. What we're really seeing is
that, on a request for 6 fields to be returned, sometimes we get all 6 and
sometimes we get just one or two.

I know that we can modify the query to prefix all the field names with
"_source.", and that does indeed consistently return all the requested
fields. But my understanding is that when requesting fields just by
name, they should be returned from the _source without needing that
qualifier.
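
For reference, a minimal Python sketch of that workaround, assuming the
comma-separated "fields" URL parameter and the host/index from the curl
sample above. The helper name qualified_fields_url is hypothetical; it only
builds the URL and does not contact ES.

```python
from urllib.parse import urlencode


def qualified_fields_url(base_url, index, field_names):
    """Build a _search URL whose requested fields all carry the "_source." prefix.

    Hypothetical helper for illustration only; it does not send the request.
    """
    prefix = "_source."
    # Leave names that are already qualified untouched.
    qualified = [n if n.startswith(prefix) else prefix + n for n in field_names]
    # Keep commas literal so the parameter stays a readable comma-separated list.
    query = urlencode({"fields": ",".join(qualified)}, safe=",")
    return "{}/{}/_search?{}".format(base_url, index, query)


if __name__ == "__main__":
    print(qualified_fields_url("http://localhost:9200", "cust1", ["blob", "sequence"]))
```

The ids query body from the curl sample would be sent unchanged with this URL.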

Any ideas if this is a bug? I looked through the archives and found one
that's similar, but not identical:
https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/fields$20not$20returned/elasticsearch/z_Er_Q1N1yw/tWb1JfvuaHUJ

(That's where I found the idea of qualifying the field names)

I've attached the index meta data if that is of any help.

Thanks!

Bob.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I haven't used this much, but here's what I've read:

  1. ElasticSearch stores and retrieves your _source JSON without
    modification. I've read that it doesn't even need to be fully valid, though
    I have not tried that!

  2. To be returned explicitly by a request of a list of fields, the field
    must be stored. Contained in the _source doesn't count, I don't believe. At
    least, not from my experience.

  3. Also, to be sorted on, the field must be stored, and it's recommended
    that a second copy must also be stored without being broken into tokens.
    Though for now, I only sort on Geo distance, and after following all of the
    examples it does work very nicely.

Since I typically parse the returned _source and it already contains the
entire document, I don't bother trying to tell ElasticSearch what fields I
want. I pull what I want from my parsed JSON.
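
A rough Python sketch of that client-side approach, assuming the search was
run without a "fields" parameter so each hit carries its full _source. The
function name pick_from_source is made up for illustration.

```python
import json


def pick_from_source(response_text, wanted):
    """Return, per hit, only the wanted keys found in that hit's _source."""
    response = json.loads(response_text)
    picked = []
    for hit in response["hits"]["hits"]:
        source = hit.get("_source", {})
        # Names absent from _source are skipped rather than raising.
        picked.append({name: source[name] for name in wanted if name in source})
    return picked


if __name__ == "__main__":
    sample = json.dumps({
        "hits": {"hits": [{"_id": "ent://SD_ASSET/0/122",
                           "_source": {"DS_KEY": 7, "sequence": 4}}]}
    })
    print(pick_from_source(sample, ["sequence", "blob"]))
```

The trade-off is that the whole _source travels over the wire for every hit,
which matters when documents carry large values.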

So I haven't explored the added benefits of actually storing fields, nor of
storing two copies: One analyzed and one not analyzed. That may be next on
my list of cool ES features to dive into.

Thanks, but that's not the issue. The issue is that for the same query,
sometimes the results come back fine, and sometimes they don't. Identical
source data, identical indexing, identical index - just shutting down ES
and starting it up again can change the behaviour... :(

  1. Right, it does.
  2. The documentation indicates that if a field isn't stored, it will load
    the _source and extract the field from there.
  3. Adding stored = true to the fields will both store it in the index, AND
    have it in the _source. One or the other is fine for the fields I care
    about - I don't want both - just bloats the index size.

You do not need to store the field; reading from source works as well. The
default Elasticsearch behavior of not storing individual fields, but storing
the original source, works well in most cases. If you are requesting six
fields, you are better off using source instead of having to do six
different field lookups. Although the OP didn't mention sorting: to be
sorted on, a field must be indexed. It need not be stored, although it can be.

Shot in the dark here: you mentioned that it works after a reindex. Are you
perhaps indexing an incorrect document in between re-indexes? The "no
fields" example above, was that taken after a failed query?

Ivan

Thanks, Ivan, for confirming my understanding that I don't need to flag the
fields as stored - just having them in the _source is sufficient.

It's not the indexing itself. We're using the exact same input data and
code to index each time.

What I'm seeing is that (generally) once I've indexed, it works fine, until
I shut down and restart ES. After the restart, (generally) it won't work.
I.E. no reindex in between, just stopping and starting ES.

Bob.

Hi Bob

On Thu, 2013-02-28 at 14:04 -0800, Robert Sandiford wrote:

> Thanks, Ivan, for confirming my understanding that I don't need to
> flag the fields as stored - just having them in the _source is
> sufficient.
>
> It's not the indexing itself. We're using the exact same input data
> and code to index each time.
>
> What I'm seeing is that (generally) once I've indexed, it works fine,
> until I shut down and restart ES. After the restart, (generally) it
> won't work. I.E. no reindex in between, just stopping and starting
> ES.

Is it possible that you have multiple fields with the same name?

A structure like

{ foo: { 
      bar: { baz: "xxx"},
      baz: "zzz"
}} 

might choose "foo.bar.baz" or "foo.baz" on different runs, and so
confuse the results. Note: the same field name in different types can
have a similar effect.
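
One way to check a document for this kind of clash is sketched below in
Python, assuming only nested JSON objects matter (arrays are treated as leaf
values). The function name ambiguous_field_names is hypothetical and this is
not something ES provides.

```python
from collections import defaultdict


def ambiguous_field_names(doc):
    """Map each leaf name that occurs at more than one path to its full paths."""
    paths_by_name = defaultdict(list)

    def walk(node, prefix):
        for key, value in node.items():
            path = prefix + [key]
            if isinstance(value, dict):
                walk(value, path)  # descend into nested objects
            else:
                paths_by_name[key].append(".".join(path))

    walk(doc, [])
    return {name: sorted(paths)
            for name, paths in paths_by_name.items() if len(paths) > 1}


if __name__ == "__main__":
    doc = {"foo": {"bar": {"baz": "xxx"}, "baz": "zzz"}}
    print(ambiguous_field_names(doc))
```

An empty result for a sample document suggests short-name ambiguity within
that document is not the cause, though it says nothing about the same name
appearing in different types.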

clint

Nope, no duplicate fields (and, no structures - just simple, single-level
fields). There's a complete copy of the contents of an index result (i.e.
no fields specified - shows the _source contents) in the original post.
The only potential 'duplicate' is that the _id field exists both inside
and outside of the _source.

On Fri, 2013-03-01 at 05:02 -0800, Robert Sandiford wrote:

> Nope, no duplicate fields (and, no structures - just simple,
> single-level fields). There's a complete copy of the contents of an
> index result (i.e. no fields specified - shows the _source contents)
> in the original post. The only potential 'duplicate' is that the _id
> field exists both inside and outside of the _source.

Any chance you could put together a recreation?

Also, are you using the latest version of ES? 0.20.5?

clint

Hi, Clint.

More info is in the original post - but, we're running 0.19.4. We were
hoping to wait for 0.90 to be GA before updating to a more recent version -
i.e. we want to upgrade once, from 0.19.4 to 0.90. (We have some custom
Lucene code we roll in, which is NOT in the area of retrieving fields: we
mess with similarity a bit, and do some custom facet filtering in some
situations - not this situation.)

As to a recreation, I'll try to figure one out - but I haven't found a
consistent way to reproduce it - it's intermittent.
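
While hunting for a recreation, a small check like the Python sketch below
could flag the responses where requested fields went missing. It assumes the
response shape shown in the original post, where returned values sit under
"fields" and may be keyed as "_source.<name>". The function name
missing_fields is hypothetical.

```python
def missing_fields(response, requested):
    """Return {hit _id: [requested names not present]} for incomplete hits."""
    prefix = "_source."
    missing = {}
    for hit in response.get("hits", {}).get("hits", []):
        returned = hit.get("fields", {})
        # Normalise "_source.blob" and "blob" to the same bare name.
        present = {name[len(prefix):] if name.startswith(prefix) else name
                   for name in returned}
        absent = [name for name in requested if name not in present]
        if absent:
            missing[hit["_id"]] = absent
    return missing


if __name__ == "__main__":
    bad = {"hits": {"hits": [{"_id": "ent://SD_ASSET/0/122", "_type": "sd_asset"}]}}
    print(missing_fields(bad, ["blob"]))
```

Running the same query before and after a restart and comparing the results
of this check would show whether the restart is what changes the outcome.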

Be careful not to mix Lucene libraries in your code. Your existing code
would also need to be updated to Lucene 4.1. I am in the process of
converting now and I finally can no longer ignore all the deprecation
warnings! :)

--
Ivan

I figured out the bug in 0.19.4.

Basically, we have mappings like this (in part):

"mappings" : {
  "_default_" : {
    "properties" : {
      "blob" : { "type" : "binary", "store" : "false", "index" : "not_analyzed", "include_in_all" : "false" }
    }
  }
}

When starting 'from scratch' (i.e. indexing data for the first time),
internally ES has a BinaryFieldMapper element for "blob", where "store" is
"NO", which is correct.

However - when shutting down and re-starting ES, the BinaryFieldMapper
entry that gets created has "store" set to "YES". So, it tries in
FetchPhase to place the "blob" info into the fieldSelectorMapper, rather
than into the extractFieldNames. And, since there is no "blob" field other
than in _source, it fails to retrieve the "blob" value.

So - the bug appears to be that the "store" : "false" setting is not being
preserved properly when you just shut down ES and then start it up again.
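
To make the failure mode concrete, here is a toy Python model of the
decision described above. It is NOT the ES FetchPhase code: the function and
parameter names are invented, and it only assumes what this post states,
namely that a field flagged as stored is looked up among stored fields while
any other field is extracted from _source.

```python
def fetch_field(name, store_flags, stored_fields, source):
    """Toy model: choose where a requested field is read from.

    store_flags   -- what the in-memory mapper claims ("store" yes/no)
    stored_fields -- values actually stored separately in the index
    source        -- the parsed _source document
    """
    if store_flags.get(name, False):
        # Mapper says "stored": only the stored-field lookup is consulted.
        return stored_fields.get(name)
    # Mapper says "not stored": fall back to extracting from _source.
    return source.get(name)


if __name__ == "__main__":
    source = {"blob": "H4sI...", "sequence": 4}
    stored = {}  # nothing is stored separately; everything lives in _source

    # Fresh index: the flag is correct, so the value comes from _source.
    print(fetch_field("blob", {"blob": False}, stored, source))
    # After restart: the flag is wrongly True, so the lookup finds nothing.
    print(fetch_field("blob", {"blob": True}, stored, source))
```

In this model the data never changes between the two calls. Only the flag
does, which matches the symptom of identical queries behaving differently
across a restart.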

I'm going to leave this to those who know the ES internals to take a look
at this particular bug.

Thanks!

Bob.

On Thu, 2013-03-07 at 08:10 -0800, Robert Sandiford wrote:

> I figured out the bug in 0.19.4.

Nice catch.

Please open an issue for this, so that it doesn't get lost

ta

clint

Turns out it has been fixed sometime post-0.19.4. Works fine (differently,
but fine :)) in both 0.20.5 and 0.90.0 Beta 1. (Once I figured out how to
run an "ids" query - another bug, resolved but not in Beta 1).

Thanks, all!

Bob.
