Garbled document with ES 0.19.x with _source compressed


(Craig Brown) #1

I'm hoping this is enough to describe the issue. If not, then I can
post a gist with more information.

We have an ES 0.19.0 cluster of 4 nodes on AWS running Amazon Linux. All
of our indexes have _source compression enabled. I've noticed that
for some operations with the Java API, the _source document is garbled
when retrieved from ES. In particular, it looks like (SearchHit)
hit.sourceAsString() returns a garbled source, while new
String((SearchHit) hit.source()) returns the document fine. Performing
a match_all query using curl and the REST API against the index also
returns the document(s) fine.

If I disable _source compression and reindex the documents, then I
don't see any problems with (SearchHit) hit.sourceAsString().
Upgrading to ES 0.19.1 did not have any effect. I did not see the
problem with several versions of 0.18.x that we used.

GARBLED
ZV{"aliases":[],"last_modified":"26 Jan 2012","source_datab 3_i
"0,"fir 5nam@E"F@ N` P

GOOD
{"aliases":[],"last_modified":"26 Jan 2012","source_database_id":
0,"first_names":"First Names","last_names":"Last Names","birth_date":
{"year":null,"month":null,"day":null},"death_date":
...
}

Thanks!

  • Craig

(Shay Banon) #2

It's a specific bug with compression enabled and using sourceAsString in the
Java API. It's fixed in the 0.19 branch (and the upcoming 0.19.2). For now, you
can do the conversion to string yourself using the source byte array data.
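A minimal sketch of that workaround, for anyone landing here later. It only shows the byte-array-to-string step. The helper name sourceToString and the class SourceDecoder are made up for illustration, and decoding as UTF-8 is an assumption (the original post used new String(hit.source()), which relies on the platform default charset). In real code the bytes would come from hit.source() as described above; here a literal stands in for them.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Elasticsearch API.
final class SourceDecoder {
    private SourceDecoder() {}

    // Decodes raw _source bytes into a String.
    // Assumption: the JSON source is UTF-8 encoded.
    static String sourceToString(byte[] sourceBytes) {
        if (sourceBytes == null) {
            return null; // no stored source for this hit
        }
        return new String(sourceBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for the byte array that hit.source() would return.
        byte[] raw = "{\"aliases\":[],\"last_modified\":\"26 Jan 2012\"}"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(sourceToString(raw));
    }
}
```

With a real SearchHit, the call would presumably look like SourceDecoder.sourceToString(hit.source()) in place of hit.sourceAsString(), until a release with the fix is available.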

(In reply to Craig Brown, Tue, Apr 3, 2012 at 11:46 PM)

(Craig Brown) #3

Thanks, Shay! Already implemented.

  • Craig

(In reply to Shay Banon, Tue, Apr 3, 2012 at 3:44 PM)
