NEST - Option to not deserialize the source of hits

Hi

I am using version 6.0.2 of NEST and Elasticsearch.Net.

For performance reasons, and because the source may have unknown fields added dynamically, I would like to not deserialize the sources when getting my search results. I am only passing the hit sources along to the client, so I don't need the added deserialization and garbage collection overhead.

Currently I have solved this by:
var result = await client.SearchAsync<object>(searchRequest);
and then ToString() on the hit's source. This, however, feels like a hack and performance could be even better had "Source" been a string already.

An alternative could be a way to get the JSON (or PostData) from a SearchRequest and then use the low level API, but then I wouldn't have my aggregations deserialized, and I would like that.

Am I missing something or is my best bet to use ?

Best regards
/dba

There are a few ways you might go about tackling this:

  1. If you don't need all fields from _source, you can use Source Filtering to return the fields of interest
  2. You can use the low level client exposed on NEST to return the response as a byte array or string
var bytesResponse = await client.LowLevel.SearchAsync<BytesResponse>(
    "index", 
    "type", 
    PostData.Serializable(searchRequest));

// do something with the response bytes
var bytes = bytesResponse.Body;
  3. Use ILazyDocument as the document type. This still performs some deserialization into a JToken (internalized Json.NET JToken type) but may perform better than deserializing to your own type.

It's not possible to control deserialization of aggregations independently of _source. The nearest built-in way would be with ILazyDocument.

Hi Russ

Thank you for the suggestions.

I guess I am looking for a combination of high-level and low-level. I want to use the high-level API for creating the query and for reading the "metadata" of the result (aggregations and so on). I just don't need or want the object (the hit's source) to be deserialized, meaning I just want it as a string (or byte array), so that I can forward it to the client.
I did notice that when using SearchAsync<object> the source object is your internal Json.NET object, and I am just in luck that I can do a ToString() on it and get the JSON.
Would it maybe make sense to provide a non generic version of the search method or a special implementation for SearchAsync<string> that will give me a source of type string?

I can see others had the same need and found different solutions to it, but these are "broken" in 6.X, so instead of "hacking" it with SearchAsync<object>, it would be nice with an "official" way of doing this.
When moving large amounts of data, the performance benefit would be noticeable.

Best regards
dba

Having a non-generic version would run counter to the design of the search API within the client, and I'm honestly not sure if the effort of implementing it within NEST would be met by its usage. From what I've typically seen, folks tend to either want the whole response deserialized, or to handle the whole response as a string or byte array to pass off to another system. Both cases are covered by the high level client and low level client, respectively. I can be convinced otherwise though :smiley:

It would be possible to implement your own IElasticsearchSerializer to do this though. The interface is pretty straightforward:

And read from the stream up until the closing } at the same depth as the opening {. Using the JsonNetSerializer in the NEST.JsonNetSerializer NuGet package might get you closer, although you'll now be working with JsonReader to read JSON tokens.
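
To illustrate the "read up until the closing } at the same depth" idea, here is a minimal sketch that copies one raw JSON object out of a stream without parsing it. It is not part of NEST: RawJsonReader and ReadRawObject are hypothetical names, and the sketch assumes the stream contains UTF-8 JSON and is positioned at, or just before, the opening { of the object you want. How you would wire something like this into a custom serializer is left open here.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not a NEST type.
public static class RawJsonReader
{
    // Copies a single JSON object from the stream as raw UTF-8 bytes.
    // Assumes UTF-8 input; bytes before the first '{' are skipped.
    public static byte[] ReadRawObject(Stream stream)
    {
        var buffer = new MemoryStream();
        int depth = 0;          // current nesting level of { }
        bool inString = false;  // inside a JSON string literal
        bool escaped = false;   // previous byte was a backslash inside a string
        int b;

        while ((b = stream.ReadByte()) != -1)
        {
            if (depth == 0)
            {
                if (b != '{') continue; // skip until the opening brace
                depth = 1;
                buffer.WriteByte((byte)b);
                continue;
            }

            buffer.WriteByte((byte)b);

            if (inString)
            {
                // Braces inside string literals must not change the depth.
                if (escaped) escaped = false;
                else if (b == '\\') escaped = true;
                else if (b == '"') inString = false;
                continue;
            }

            if (b == '"') inString = true;
            else if (b == '{') depth++;
            else if (b == '}' && --depth == 0) return buffer.ToArray();
        }

        throw new InvalidDataException("Stream ended before a complete JSON object was read.");
    }
}
```

Calling RawJsonReader.ReadRawObject on a stream holding {"a":{"b":"}"}} followed by other content returns just the bytes of that object, including the } that sits inside the string value. Scanning byte by byte is safe for UTF-8 because the structural characters are ASCII and never occur inside a multi-byte sequence. For real responses you would probably want buffered reads rather than ReadByte.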
