Forcemerge into one segment - bad for sort performance on doc_value fields?

Hi All,

I have a number of fairly large time-based indices which we create every week.
Now this piece of documentation in the Guide
https://www.elastic.co/guide/en/elasticsearch/guide/current/merge-process.html#optimize-api
gave me the impression that having fewer segments would give better performance in every case.

So we always force-merge down to a single segment once the weekly indexing is done.
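For reference, the merge is roughly this call (the index name here is made up, and on older versions this was the optimize API):

curl -XPOST 'localhost:9200/logs-2017-w36/_forcemerge?max_num_segments=1'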

A weekly index holds roughly 2,000,000,000 documents, ~500 GB in total, spread across two shards (with 1 replica) on 4 nodes.

What I noticed with our large indices is that something seems to hurt sort performance a lot (the sorts always go on doc_value fields).
I also noticed that sorted queries do not benefit much from caching: subsequent executions of simple sorted filter queries still take ~1.5 - 2 seconds.
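To make that concrete, the slow queries look roughly like this (index and field names are made up here, not our real mapping):

curl -XGET 'localhost:9200/logs-2017-w36/_search' -H 'Content-Type: application/json' -d '{
  "query": { "bool": { "filter": { "term": { "status": "error" } } } },
  "sort": [ { "timestamp": { "order": "desc" } } ],
  "size": 50
}'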

Here are the details on the two segments of my primary shards:

"_38h" : {
  "generation" : 4193,
  "num_docs" : 995808448,
  "deleted_docs" : 0,
  "size_in_bytes" : 253814717556,
  "memory_in_bytes" : 752220123,
  "committed" : true,
  "search" : true,
  "version" : "5.4.1",
  "compound" : false
}
...
"_363" : {
    "generation" : 4107,
    "num_docs" : 995768798,
    "deleted_docs" : 0,
    "size_in_bytes" : 253690281166,
    "memory_in_bytes" : 749432404,
    "committed" : true,
    "search" : true,
    "version" : "5.4.1",
    "compound" : false
}
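For completeness, that output comes from the segments API, i.e. something like (index name made up again):

curl -XGET 'localhost:9200/logs-2017-w36/_segments?pretty'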

That's a segment size of ~236 GB each. Is there a point where overly large segments have a negative impact on sorts (or I/O buffering in general)?
My naive theory is that those segments are way too large for the OS to buffer I/O efficiently, and because we use doc values we are hitting the disks far too often (even on subsequent re-executions of the same query).

Does anyone have experience with, or know about, negative impacts of overly large segments?

Cheers
Robert

If your query is just a match_all sorted by a date field, then search performance should be about the same regardless of the number of segments. I don't think the OS buffering would perform worse on large files. Merging down to fewer segments mostly helps with queries that are terms-dictionary intensive, like range or prefix queries, and things that need global ordinals, like parent/child queries and terms aggregations.
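For example, something like this (index and field names made up) is the kind of query that gains from fewer segments: the range query walks the terms dictionary and the terms aggregation needs global ordinals, both of which get cheaper with fewer segments:

curl -XGET 'localhost:9200/logs-2017-w36/_search' -H 'Content-Type: application/json' -d '{
  "query": { "range": { "timestamp": { "gte": "now-7d" } } },
  "aggs": { "hosts": { "terms": { "field": "host" } } }
}'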