Possible optimisations for large _source documents


(Yann Simon) #1

Hi!

For the same query, I observe a performance difference depending on whether I include the _source or not.
With "_source": true, the query takes about 250 ms to complete.
With "_source": false, the query takes about 50 ms to complete.

In the mapping, I have already disabled the _all field.
I searched for ways to optimise this use case but could not find any useful information.

I would appreciate any tips:

  • about a possible configuration (disk, kernel, JVM...)
  • about how to work around this.

For example, has anybody tried to put the JSON as a String into one field, like {"json": "<the json of the document>"}?
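
The idea in this question can be sketched roughly as below. This is only an illustration in Python: the field name "json" comes from the example above, the helper names wrap_document and unwrap_document are made up, and nothing in this thread establishes that such a layout is faster than reading _source.

```python
import json


def wrap_document(doc):
    """Serialise the whole document into one string stored under "json"."""
    # Compact separators keep the stored string as small as possible.
    return {"json": json.dumps(doc, separators=(",", ":"))}


def unwrap_document(wrapped):
    """Parse the string field back into the original structure."""
    return json.loads(wrapped["json"])


# Hypothetical sample document, for illustration only.
original = {"title": "example", "tags": ["a", "b"], "nested": {"n": 1}}
wrapped = wrap_document(original)
restored = unwrap_document(wrapped)
```

The client has to parse the string itself after each search, so any gain on the server side would come at the cost of extra work on the client.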

Thanks in advance for any pointers,
Yann


(David Pilato) #2

Note that you can still store the _source field and, at query time, ask for fields other than _source.
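
A minimal sketch of such a request body, built here in Python. The field name "title" and the match_all query are placeholders, and "docvalue_fields" is assumed to be supported by the Elasticsearch version in use (older releases used different parameter names, so check the documentation for your version).

```python
import json


def search_body(query, doc_value_fields):
    """Build a search body that skips _source and returns selected fields."""
    return {
        "query": query,
        "_source": False,  # do not load or return the stored JSON
        "docvalue_fields": list(doc_value_fields),  # assumed available
    }


# Placeholder query and field name, for illustration only.
body = search_body({"match_all": {}}, ["title"])
print(json.dumps(body, indent=2))
```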


(Colin Goodheart-Smithe) #3

The JSON of the indexed document is already stored in a single field (called _source). A couple of questions about your use case:

  • How many documents are you retrieving? (Maybe you could share your whole search request?)
  • How much memory have you allowed for the FileSystem Cache?

(Yann Simon) #4

Yes, thanks! I know that already.
In this case, I'm interested in the whole _source.


(Yann Simon) #5

Yes, I am talking about exactly this single _source field. When I disable the retrieval of _source in the query results, I observe this performance difference.
I am retrieving 20 documents.

I am not sure what exactly you are asking about the FileSystem Cache. The free command outputs:

$ free
             total       used       free     shared    buffers     cached
Mem:      15396852   12732848    2664004       4296     153104    3033332
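
For readers less familiar with this output, here is a rough way to read it, sketched in Python. It assumes that free reports KiB (its default unit) and that buffers plus cached approximates the memory the kernel currently uses as filesystem cache; the helper name parse_free is made up.

```python
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:      15396852   12732848    2664004       4296     153104    3033332
"""


def parse_free(output):
    """Map the column names of `free` to the integer values of the Mem: row."""
    lines = output.strip().splitlines()
    header = lines[0].split()
    values = lines[1].split()[1:]  # drop the leading "Mem:" label
    return dict(zip(header, (int(v) for v in values)))


mem = parse_free(FREE_OUTPUT)
# Assumption: buffers + cached is a fair estimate of the filesystem cache.
page_cache_kib = mem["buffers"] + mem["cached"]
page_cache_gib = page_cache_kib / (1024 ** 2)
share = page_cache_kib / mem["total"]
print(f"{page_cache_gib:.2f} GiB cached, {share:.1%} of total memory")
```

With the numbers above, this comes to roughly 3 GiB out of about 15 GiB of RAM.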

(Yann Simon) #6

I'll try compressing the HTTP responses to check whether it makes a difference.
If you have any other ideas, feel free to suggest them. Thanks!
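
To get a feeling for how much compression could shrink a response, here is a small offline sketch in Python. The payload is synthetic and highly repetitive (20 made-up hits), so the ratio it prints says nothing about real documents; it only shows how such a check can be done without touching the cluster.

```python
import gzip
import json

# Synthetic response with 20 hits; the field names and text are invented.
hits = [
    {"_id": str(i), "_source": {"description": "lorem ipsum " * 200, "n": i}}
    for i in range(20)
]
raw = json.dumps({"hits": {"hits": hits}}).encode("utf-8")

# gzip is what an HTTP client typically negotiates via Accept-Encoding.
compressed = gzip.compress(raw)
ratio = len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.3f}")
```

Compression reduces only the transfer size; the server still has to load and serialise the _source of every hit, so it may not explain the whole difference.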


(Chang Wang) #7

We're running into the same problem: with _source included, a request with a size of 50 takes 2+ seconds; with _source set to false, it takes 400 ms.

This is our query:

GET /case_centric/case_centric/_search?pretty
{
  "size": 50,
  "from": 0,

  "_source": {
    "includes": ["diagnoses.days_to_last_follow_up"]
  },
  
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "diagnoses",
            "query": {
              "bool": {
                "must": [
                  {
                    "exists": {
                      "field": "diagnoses.vital_status"
                    }
                  }
                ]
              }
            }
          }
        }
      ],
      "should": [
        {
          "bool": {
            "should": [
              {
                "nested": {
                  "path": "diagnoses",
                  "query": {
                    "bool": {
                      "must": [
                        {
                          "exists": {
                            "field": "diagnoses.days_to_death"
                          }
                        }
                      ]
                    }
                  }
                }
              },
              {
                "nested": {
                  "path": "diagnoses",
                  "query": {
                    "bool": {
                      "must": [
                        {
                          "exists": {
                            "field": "diagnoses.days_to_last_follow_up"
                          }
                        }
                      ]
                    }
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
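
Since the same nested "exists" clause is repeated three times, the request body above could also be generated from a small helper. This is only a sketch in Python: the names nested_exists and build_query are made up, and the result is meant to be the same body as above, not an optimised one.

```python
def nested_exists(path, field):
    """Return a nested query requiring `path.field` to exist."""
    return {
        "nested": {
            "path": path,
            "query": {
                "bool": {"must": [{"exists": {"field": f"{path}.{field}"}}]}
            },
        }
    }


def build_query():
    """Rebuild the search body shown above from the helper."""
    return {
        "size": 50,
        "from": 0,
        "_source": {"includes": ["diagnoses.days_to_last_follow_up"]},
        "query": {
            "bool": {
                "filter": [nested_exists("diagnoses", "vital_status")],
                "should": [
                    {
                        "bool": {
                            "should": [
                                nested_exists("diagnoses", "days_to_death"),
                                nested_exists("diagnoses", "days_to_last_follow_up"),
                            ]
                        }
                    }
                ],
            }
        },
    }


body = build_query()
```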
