Did you try the advice in the allocation explain message first?
manually call [POST /_cluster/reroute?retry_failed&metric=none] to retry
I think that would have been enough.
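If you'd rather script that retry than paste it into a console, something like the following rough sketch should do it. The endpoint and query string are taken from the message above. The `http://localhost:9200` address and the helper names are my own assumptions, and it leaves out authentication and TLS, which a secured cluster will need.

```python
import json
import urllib.request


def build_retry_failed_request(base_url="http://localhost:9200"):
    """Build (but do not send) the POST request named in the allocation explain message.

    base_url is an assumed address; point it at one of your own nodes.
    """
    url = base_url.rstrip("/") + "/_cluster/reroute?retry_failed&metric=none"
    return urllib.request.Request(url, method="POST")


def retry_failed_allocations(base_url="http://localhost:9200"):
    """Send the request and return the parsed JSON response body."""
    request = build_retry_failed_request(base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs a reachable cluster, so it is left commented out):
# print(retry_failed_allocations("http://localhost:9200"))
```

If the shards still don't assign after that, the allocation explain output should say why.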
These logs are all from the data node. I think you need to look in the master node logs instead.
Your temp-file theory is possible, but Elasticsearch doesn't create humongous temp files as a matter of course, so are you sure it's not something else? The only way I could see this happening is if you force-merged one or more enormous shards.