NumberFormatException when sorting by numeric document ID

Hello hello! I have a bizarre error I've been trying to debug for a few
weeks with no luck, and I'm finally left to conclude that it may be a bug
in Elasticsearch.

Once every few days, I start seeing shard failures in my query results,
like this:

{
"index": "my_index",
"shard": 3,
"status": 500,
"reason": "RemoteTransportException[[HOSTNAME][inet[/10.0.123.123:9300]][search/phase/query]]; nested: QueryPhaseExecutionException[[my_index][3]: query[ConstantScore(cache(_type:my_type))],from[0],size[10],sort[<custom:"id": org.elasticsearch.index.fielddata.fieldcomparator.LongValuesComparatorSource@97c2b4f>]: Query Failed [Failed to execute main query]]; nested: ElasticSearchException[java.lang.NumberFormatException: Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)]; nested: UncheckedExecutionException[java.lang.NumberFormatException: Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)]; nested: NumberFormatException[Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)]; "
}

This query is operating against an index with about 100 different fields
(including several different nested types), but the relevant portion of the
mapping looks like this:

{
  "my_type" : {
    "_id"        : { "type" : "long", "path" : "id" },
    "properties" : {
      "id" : { "type" : "long" },
      /* ... LOTS OF OTHER FIELDS, INCLUDING MANY NESTED TYPES */
    }
  }
}

I've been able to isolate the shard failures to a minimal query of this
form:

{
  "query" : { "match_all" : { } },
  "sort" : [{
    "id" : { "order" : "asc" }
  }]
}
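
For what it's worth, I've been running that minimal query with a request
roughly like this (HOSTNAME and the index/type names are placeholders, as
above):

curl -XPOST 'http://HOSTNAME:9200/my_index/my_type/_search' -d '{
  "query" : { "match_all" : { } },
  "sort"  : [{ "id" : { "order" : "asc" } }]
}'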

Basically, sorting by the (numeric) "id" field causes shard failures: the
shards sometimes act as if the field contains non-numeric values. I've
audited the data, and it conforms to the schema; the id fields always
contain valid LONG values.

Whenever the shard failures occur, I can silence them for a few days by
optimizing the index, like this:

curl -XPOST 'http://HOSTNAME:9200/my_index/_optimize?max_num_segments=1'

And the shard failures will stop for a day or two, but inevitably, within a
few days the failures will return and I'll have to optimize the index
again. The weird thing is that the status URL always reports GREEN status
and all shards healthy, even when these queries are failing on every
request.

I experienced these failures originally on 0.90.5, but I continued seeing
the same problems after recently upgrading to 0.90.10. I even deleted the
index and rebuilt from scratch under 0.90.10, but I've kept seeing the same
failures.

Any idea what might be going on?

Thanks!

benji smith

Hey,

Is there a complete stack trace available in the Elasticsearch log files
that you could post here?
Also, is your query only spanning one index and one type? If not, can you
provide the other mappings? (I don't think this is the issue here, since
you say that optimizing down to one segment makes it work again; I just
want to exclude things.) Is it possible that there is a type in your index
with a different id mapping, and that documents of that type have since
been deleted? That would explain why it works after an optimize...

--Alex

Like Alex mentioned, I would check all the mappings to ensure the type of
the id field is the same everywhere (it doesn't matter what value is in
it; what matters is the type defined in the mapping). Your error message
means that in one type the id field is mapped as long while in another
type it is mapped as int, and you are querying across those two types,
which gives this error.
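
As a quick way to spot a mismatch (just a sketch; HOSTNAME and my_index
are placeholders), you can pull the full mapping and look at each "id"
entry together with the type declared right after it:

curl -XGET 'http://HOSTNAME:9200/my_index/_mapping?pretty' | grep -A 1 '"id"'

Any "id" occurrence that is not followed by "type" : "long" is a candidate
for the conflict.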

Yes, you guys are right. There are multiple different types in this index,
and some of them have LONG ids, while others have INTEGER or STRING ids.
I'll have to redesign a few parts of the mappings to fix that problem.

I suppose the same restriction applies across nested types as well, right?
I'm heavily using nested types, and many of them have their own inner ID
fields.

Thanks for your help!

benji

Are there any ES committers on the mailing list who wouldn't mind
commenting on this issue?

If I have two different types "user" and "session", and both of them have
an "id" field, shouldn't Elasticsearch understand that those are two
different fields, and that their fully-qualified names are actually
"user.id" and "session.id"? Using only fully-qualified names in the Lucene
internals seems like a straightforward way to fix the problem.

Incidentally, it looks like there's a bug report (submitted two YEARS ago!)
here:

If this is the desirable behavior, then why hasn't this bug been closed as
"won't fix"? Or if it's legitimately a bug, why wasn't it fixed before
releasing 1.0? It seems like a pretty fundamental flaw in the system that
the functionality of one type can be broken by the definition of another
essentially unrelated type.

I can understand why the behavior is what it is, historically, but it seems
self-evidently like a bug. In what kind of system would this be the
desirable behavior?

Thanks!

benji

This is on the plate. I'm not 100% sure exactly what the fix will be but it
could be something along the lines of a warning when a mapping is
introduced with the same field name but different types.

Thanks for your comment! Looks like the correct GitHub issue to reference
is this one:

Field resolution should be unambiguous · Issue #4081 · elastic/elasticsearch · GitHub

I've added my comments, and I'm rooting for a solution to this problem
rather than just a warning, which won't really solve the problem for us.
Fingers crossed!

benji

I doubt this issue will ever be "fixed" since the limitation exists in
Lucene. All types belong to the same index, and a field's data needs to be
uniform in Lucene's eyes. A document's type is used to indicate different
mappings for a document, but not different ways to segment the data types
within the index itself. This scenario should be documented, however, so
that others do not fall into the same trap.
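
If the types have to stay in one index, the usual workaround is to give
each type its own field name, so that no single Lucene field ends up
holding values encoded with different widths. A rough sketch (the host,
index, and field names here are made up):

curl -XPUT 'http://HOSTNAME:9200/my_index' -d '{
  "mappings" : {
    "user"    : { "properties" : { "user_id"    : { "type" : "long"   } } },
    "session" : { "properties" : { "session_id" : { "type" : "string" } } }
  }
}'

Sorting on "user_id" then only ever touches one consistently-typed field.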

--
Ivan

This can absolutely be fixed in Elasticsearch. It's not a problem
with Lucene, but with how ES data is mapped onto the Lucene data model.

The problem is that types and fields use local names instead of
fully-qualified names. If fully-qualified names were used, then as far as
Lucene is concerned there would be a field named "user.id" mapped as a
long, another field named "product.id" mapped as a string, and a nested
field named "user.address.id" mapped as an integer. Under this kind of
system, "user" and "product" can exist in the same index without even the
possibility that their names and types would clash.
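
Purely as an illustration of that proposal (this is not how Elasticsearch
names its Lucene fields today), the translation would look something like:

  ES type "user",    field "id" (long)     ->  Lucene field "user.id"
  ES type "product", field "id" (string)   ->  Lucene field "product.id"
  ES nested "user.address", field "id" (integer)  ->  Lucene field "user.address.id"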

benji
