Serialize/Deserialize SearchResponse in Java

We're trying to take some load off our shared Elasticsearch cluster by caching our SearchResponse instances in a Redis cache. I'm aware that Elasticsearch provides a caching abstraction, but we'd like to keep resources off our cluster and on Redis for our specific application.

We're using the Java API for making all our requests and I noticed that you cannot serialize/deserialize SearchResponse objects out of the box. Ideally, I'd like to be able to convert them to and from strings. I have the following code to perform the serialization:

public static String serialize(SearchResponse response) throws IOException {
  BytesStreamOutput streamOutput = new BytesStreamOutput();
  response.writeTo(streamOutput);
  return streamOutput.bytes().toUtf8();
}

public static SearchResponse deserialize(String response) throws IOException {
  SearchResponse searchResponse = new SearchResponse();
  byte[] bytes = response.getBytes(Charset.forName("UTF-8"));
  ByteBufferStreamInput bbsi = new ByteBufferStreamInput(ByteBuffer.wrap(bytes));
  searchResponse.readFrom(bbsi);
  return searchResponse;
}

Serialization seems to go fine: on inspection, I have what looks like a valid search response. However, trying to deserialize that string results in exceptions such as:

java.io.IOException: Can't read unknown type [63]
	at org.elasticsearch.common.io.stream.StreamInput.readGenericValue(StreamInput.java:422)
	at org.elasticsearch.common.io.stream.StreamInput.readMap(StreamInput.java:341)
	at org.elasticsearch.search.aggregations.InternalAggregation.readFrom(InternalAggregation.java:228)
	at org.elasticsearch.search.aggregations.bucket.terms.StringTerms$1.readResult(StringTerms.java:49)
	at org.elasticsearch.search.aggregations.bucket.terms.StringTerms$1.readResult(StringTerms.java:45)
	at org.elasticsearch.search.aggregations.InternalAggregations.readFrom(InternalAggregations.java:220)
	at org.elasticsearch.search.aggregations.InternalAggregations.readAggregations(InternalAggregations.java:202)
	at org.elasticsearch.search.internal.InternalSearchResponse.readFrom(InternalSearchResponse.java:134)
	at org.elasticsearch.search.internal.InternalSearchResponse.readInternalSearchResponse(InternalSearchResponse.java:126)
	at org.elasticsearch.action.search.SearchResponse.readFrom(SearchResponse.java:202)

and

java.io.EOFException
	at org.elasticsearch.common.io.stream.ByteBufferStreamInput.readBytes(ByteBufferStreamInput.java:76)
	at org.elasticsearch.common.io.stream.StreamInput.readBytesReference(StreamInput.java:96)
	at org.elasticsearch.common.io.stream.StreamInput.readText(StreamInput.java:232)
	at org.elasticsearch.search.SearchShardTarget.readFrom(SearchShardTarget.java:102)
	at org.elasticsearch.search.SearchShardTarget.readSearchShardTarget(SearchShardTarget.java:86)
	at org.elasticsearch.search.internal.InternalSearchHits.readFrom(InternalSearchHits.java:219)
	at org.elasticsearch.search.internal.InternalSearchHits.readFrom(InternalSearchHits.java:205)
	at org.elasticsearch.search.internal.InternalSearchHits.readSearchHits(InternalSearchHits.java:199)
	at org.elasticsearch.search.internal.InternalSearchResponse.readFrom(InternalSearchResponse.java:132)
	at org.elasticsearch.search.internal.InternalSearchResponse.readInternalSearchResponse(InternalSearchResponse.java:126)
	at org.elasticsearch.action.search.SearchResponse.readFrom(SearchResponse.java:202)

Is there something obvious I'm doing wrong here? This approach seems simple enough. Any help would be greatly appreciated!

Edit: I should add I'm using elasticsearch-2.4.2.jar

Leaving a reply for anyone who finds this and attempts the same thing in the future: we decided against this approach. There are a lot of unneeded things tacked onto the Java objects inside SearchResponse (date formatters, etc.). Since we'd pay a performance penalty for serializing all of that, we decided to cache our own data after transforming it from the raw SearchResponse object.
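For anyone wondering what "cache our own data" could look like, here is a minimal sketch using only the JDK. The class name CachedSearchResult and its two fields (a total hit count and a list of document IDs) are hypothetical, not what we actually store; the point is only that a small object you control is easy to turn into bytes and back, without going through the SearchResponse wire format at all.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical cache entry: holds only the fields the application needs,
// copied out of the SearchResponse before caching.
public class CachedSearchResult {
    private final long totalHits;
    private final List<String> documentIds;

    public CachedSearchResult(long totalHits, List<String> documentIds) {
        this.totalHits = totalHits;
        this.documentIds = new ArrayList<>(documentIds);
    }

    public long getTotalHits() { return totalHits; }

    public List<String> getDocumentIds() { return documentIds; }

    // Writes the fields in a fixed order: total, count, then each ID.
    public byte[] toBytes() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeLong(totalHits);
            out.writeInt(documentIds.size());
            for (String id : documentIds) {
                out.writeUTF(id);
            }
        }
        return buffer.toByteArray();
    }

    // Reads the fields back in the same order they were written.
    public static CachedSearchResult fromBytes(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            long total = in.readLong();
            int count = in.readInt();
            List<String> ids = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                ids.add(in.readUTF());
            }
            return new CachedSearchResult(total, ids);
        }
    }
}
```

The resulting byte[] can be stored as a binary value by whichever Redis client you use; how you populate the object from the SearchResponse depends on your queries.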

I'm not surprised it's blowing up. It's a custom binary serialization protocol, so there's no reason to expect the output to be valid UTF-8 bytes. In fact, I'm surprised it's not blowing up sooner. If you must cache a string, you could Base64-encode the bytes, but you'd be better off caching the bytes directly. The worst option would be converting the response to JSON and caching that, unless you're presenting raw JSON to your clients (I suspect and hope not).
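To illustrate the difference, here is a small self-contained sketch that uses only the JDK (Java 8+ for java.util.Base64). The class name BinaryCacheCodec and the sample bytes are made up for the demonstration; in the code from the question, the input would be the raw bytes behind streamOutput.bytes() rather than the result of toUtf8().

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper showing a lossless string encoding for binary data.
public class BinaryCacheCodec {

    // Encodes arbitrary bytes as an ASCII-only Base64 string.
    public static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Recovers the exact original bytes from the Base64 string.
    public static byte[] decode(String cached) {
        return Base64.getDecoder().decode(cached);
    }

    // The lossy path: byte sequences that are not valid UTF-8 are
    // replaced during decoding, so the original bytes cannot be recovered.
    public static byte[] utf8RoundTrip(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample binary payload containing bytes that are not valid UTF-8.
        byte[] raw = {0x01, (byte) 0xC3, 0x28, (byte) 0xFF, 0x00, 0x7F};

        System.out.println("UTF-8 round trip intact:  "
                + Arrays.equals(raw, utf8RoundTrip(raw)));
        System.out.println("Base64 round trip intact: "
                + Arrays.equals(raw, decode(encode(raw))));
    }
}
```

Base64 makes the cached value about a third larger than the raw bytes, which is one reason storing the bytes directly is preferable when your Redis client supports binary values.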
