Streamlining Result JSON

Is there any way to transform result JSON prior to returning it? I'm trying to optimize the performance of our system, which relies on a sometimes narrow pipe between a web server and an Elasticsearch index cluster. I've found that transmitting Elasticsearch result JSON back to the web servers can hurt performance in some extreme situations, due to restricted bandwidth.

Is there any facility in Elasticsearch (besides specifying the fields I'm interested in returning) to further reduce the amount of JSON being returned? For instance, my immediate need is for the results to return merely an integer ID and a distance value for each matching document. But the result JSON also includes the index name and type in every hit. With very large result sets, that per-hit formatting overhead adds up. Below is a sample of the JSON I currently get back, followed by an example of the JSON I really need:

Current Output Format Sample:
{
  took: 95,
  timed_out: false,
  _shards: {
    total: 5,
    successful: 5,
    failed: 0
  },
  hits: {
    total: 1457,
    max_score: null,
    hits: [
      {
        _index: "myindex",
        _type: "mytype",
        _id: "1849116",
        _score: null,
        fields: {
          id: 1849116
        },
        sort: [
          0.39546441522778686
        ]
      },
      {
        _index: "myindex",
        _type: "mytype",
        _id: "723963",
        _score: null,
        fields: {
          id: 723963
        },
        sort: [
          0.47589688265048097
        ]
      },
      ...

JSON I'd Like to Receive:
{
  hits: {
    total: 1457,
    max_score: null,
    hits: [
      {id: 1849116, sort: 0.39546441522778686},
      {id: 723963, sort: 0.47589688265048097},
      ...

Any suggestions/recommendations on how to accomplish this?

Thanks,
Chris

Would it be possible to set up a Node.js proxy on the Elasticsearch
side of that narrow pipe? That way you could transform the API in
whatever way you want.
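
As a rough illustration of the transformation such a proxy would apply, here is a minimal sketch. It is written in Python rather than Node.js purely for illustration, and it assumes the response has exactly the shape shown in the question (a scalar `fields.id` and a single sort value per hit). The function names `slim_response` and `slim_json` are hypothetical, not part of Elasticsearch.

```python
import json


def slim_response(raw: dict) -> dict:
    """Reduce a search response to the compact shape from the question.

    Assumes each hit carries a one-element "sort" list, as in the sample.
    """
    hits = raw["hits"]
    slim_hits = []
    for hit in hits["hits"]:
        slim_hits.append({
            # Prefer the requested "id" field; fall back to the document _id.
            "id": hit.get("fields", {}).get("id", hit["_id"]),
            # "sort" holds one value per sort key; this sketch assumes one key.
            "sort": hit["sort"][0],
        })
    return {
        "hits": {
            "total": hits["total"],
            "max_score": hits.get("max_score"),
            "hits": slim_hits,
        }
    }


def slim_json(raw_body: str) -> str:
    """Parse a response body, slim it, and re-serialize without whitespace."""
    slim = slim_response(json.loads(raw_body))
    return json.dumps(slim, separators=(",", ":"))
```

The proxy would call `slim_json` on each response body before sending it across the narrow link. Dropping `_index`, `_type`, `_id`, and `_score` from every hit is where most of the savings come from; enabling HTTP compression on that link, if available, would likely help as well.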

On Thu, Nov 10, 2011 at 10:52 AM, Schnyder
chris.schnyder@cardinal-holdings.com wrote:

Sent from the Elasticsearch Users mailing list archive at Nabble.com.