This is correct, but it does not change the fact that the heap size must be limited to no more than 50% of the available RAM. I think there is some confusion here.

At a high level there are three main components to Elasticsearch's memory usage: the heap, direct memory, and the filesystem cache (also known as the page cache). The heap and direct memory are attributed to the Elasticsearch process; if together they add up to more than the available RAM then the process is liable to be killed. The filesystem cache is not attributed to the process, since it can all be discarded if needed (with a performance penalty, obviously). Non-data nodes do not really need any filesystem cache, as you say, but they still need direct memory for networking.

The heap size is fixed at startup, but direct memory grows and shrinks as needed. In older versions the direct memory size is limited to be no larger than the heap, and in newer versions it has a slightly more conservative limit. Thus you need at least twice as much RAM as your heap size (plus overhead in older versions) just for the process, and anything left over is available for the filesystem cache.
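To make the arithmetic concrete, here is a rough sketch of that budget. It only encodes the rule of thumb described above; the assumption that the newer, more conservative direct-memory cap is half the heap is mine (it matches recent defaults I'm aware of, but check your version), and it ignores the JVM's own overhead:

```python
# Back-of-envelope memory budget for an Elasticsearch node.
# Assumption (hedged): direct memory is capped at the heap size in older
# versions and at half the heap in newer ones; JVM overhead is ignored.

def memory_budget(ram_gb, heap_gb, newer_version=True):
    """Return (worst-case process footprint, leftover for filesystem cache) in GB."""
    direct_cap_gb = heap_gb / 2 if newer_version else heap_gb
    process_max_gb = heap_gb + direct_cap_gb
    fs_cache_gb = ram_gb - process_max_gb  # negative => risk of the process being killed
    return process_max_gb, fs_cache_gb

# 64 GB of RAM with a 31 GB heap on an older version: the process alone may
# need up to 62 GB, leaving only about 2 GB for the filesystem cache.
print(memory_budget(64, 31, newer_version=False))
```

This is why sizing the heap at 50% of RAM is the safe ceiling rather than a target: every GB you give to the heap can cost up to two GB of process footprint in the worst case.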
The docs were recently adjusted to clarify this:
Elasticsearch requires memory for purposes other than the JVM heap and it is important to leave space for this. For instance, Elasticsearch uses off-heap buffers for efficient network communication, relies on the operating system’s filesystem cache for efficient access to files, and the JVM itself requires some memory too.
Here "off-heap buffers" is referring to direct memory, which is distinct from the filesystem cache.
It depends (of course). Since direct memory grows and shrinks as needed, it's possible that most of the spare 50% is used by the filesystem cache, but it's also possible that it's all taken up by networking. I know there have been changes within the 7.x series that affect the profile of direct memory usage, but I've not been following the details.
No, I don't have any hard figures for this, except that it's certainly no more than the heap size. I would expect it to be quite spiky: I believe the most expensive time for a master node is when a lot of nodes are all concurrently joining the cluster, e.g. after a network partition heals, since the master must send the full cluster state to each joining node. In normal running I would expect it to be much lower.
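A hedged back-of-envelope sketch of that spike, assuming (as above) that the master buffers one full copy of the cluster state per concurrently joining node; the state size and node count here are made-up illustrative figures, not measurements:

```python
# Rough worst-case buffer demand on a master during a mass rejoin.
# Assumption (hedged): one full serialized cluster state is buffered per
# concurrent join; real behaviour may batch, compress, or throttle this.

def join_spike_bytes(cluster_state_bytes, joining_nodes):
    return cluster_state_bytes * joining_nodes

# E.g. a 100 MB cluster state and 50 nodes rejoining after a partition heals
# could transiently demand on the order of 5 GB of buffer space.
spike = join_spike_bytes(100 * 1024**2, 50)
print(spike / 1024**3, "GiB")
```

The point of the sketch is just that the demand scales with (state size × joining nodes), which is why the spike after a partition heals dwarfs normal running.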