Thanks again for the reply.
Following @leandrojmp's response below, I have confirmed that the source index is on the data_hot tier, while the .enrich-* index is on data_content.
@stephenb, your point about the enrich indices being on the data nodes makes sense. I was hoping, though, to get confirmation that the enrichment data would be cached in memory on the ingest nodes, to avoid having to perform a lookup against the .enrich index for each document.
I have checked the read IOPS on the data_content tier and nothing is being adversely impacted, so I suspect that the data is cached in memory there. It would still be nice to eliminate the round-trip latency and the bandwidth cost, given that this cluster is spread across multiple AZs in AWS.
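In case it helps anyone repeat the tier check, here is a minimal Python sketch of one way to do it, not necessarily the only way. It assumes the nested JSON shape returned by the get index settings API (GET /<index>/_settings) and reads index.routing.allocation.include._tier_preference from it. The function names, the base URL, and the unauthenticated HTTP call are my own assumptions for illustration; a real cluster will need TLS and credentials.

```python
import json
import urllib.request


def tier_preferences(settings_response):
    """Map each index name to its _tier_preference value (None if unset).

    Expects the nested (non-flat) response of GET /<index>/_settings.
    """
    result = {}
    for index_name, body in settings_response.items():
        include = (
            body.get("settings", {})
            .get("index", {})
            .get("routing", {})
            .get("allocation", {})
            .get("include", {})
        )
        result[index_name] = include.get("_tier_preference")
    return result


def fetch_settings(base_url, index_pattern):
    """Fetch index settings. Assumption: no auth, plain HTTP, hypothetical base_url."""
    # expand_wildcards=all so that hidden indices can match a wildcard pattern
    url = f"{base_url}/{index_pattern}/_settings?expand_wildcards=all"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Usage would be something like tier_preferences(fetch_settings("http://localhost:9200", ".enrich-*")), with the URL swapped for your own cluster.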
I currently have separate ingest and data_hot nodes, and am reluctant to merge those roles as the cluster is very ingest-heavy, and both the data_hot and ingest nodes are running about as warm as I am comfortable with.
The docs also suggest that it could be prudent to have distinct ingest-only nodes for heavy loads.
As for how I determined the document count: I have stack monitoring set up for the cluster and noticed log messages about the enrich policy execution which indicated that a suspiciously round number of documents had been processed. I went into Stack Management -> Index Management -> Indices, found the source index, and viewed the stats. I then did the same for the relevant .enrich index and found the discrepancy.
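The same comparison could be scripted instead of clicking through Kibana. The sketch below is only an illustration: it uses the count API (GET /<index>/_count, which returns a "count" field), and it assumes the enrich policy has no query filter, so that the two counts are expected to match. The helper names, the "round number" heuristic, and the unauthenticated HTTP call are my own assumptions.

```python
import json
import urllib.request


def doc_count(base_url, index):
    """Return the document count for an index via the count API.

    Assumption: no auth, plain HTTP, hypothetical base_url.
    """
    with urllib.request.urlopen(f"{base_url}/{index}/_count") as resp:
        return json.load(resp)["count"]


def count_gap(source_count, enrich_count):
    """Number of source documents missing from the enrich index.

    Zero means the counts match; a negative value means the enrich
    index holds more documents than the source.
    """
    return source_count - enrich_count


def is_suspiciously_round(n, step=1000):
    """Hypothetical heuristic: flag non-zero counts that are exact multiples of step."""
    return n > 0 and n % step == 0
```

For example, count_gap(doc_count(url, "my-source"), doc_count(url, ".enrich-my-policy")) would give the size of the discrepancy, where both index names are placeholders.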
It has occurred fairly often, to the extent that my data has been negatively affected and I have had to manually trigger a refresh of the source data, wait until the counts matched, and then disable the periodic refresh of the source index data to prevent the discrepancy from returning.
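The "wait until the counts matched" step is something I did by hand, but it could be automated with a simple polling loop. This is a rough sketch under my own assumptions: the two count getters are hypothetical callables that you would supply (for example, thin wrappers around the count API), and the timeout and poll interval are arbitrary defaults.

```python
import time


def wait_until_counts_match(
    get_source_count,
    get_enrich_count,
    timeout_s=600.0,
    poll_s=10.0,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Poll two count getters until they agree or the timeout expires.

    Returns True if the counts matched, False if the timeout was reached.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if get_source_count() == get_enrich_count():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

Only once this returns True would it make sense to go on and disable the periodic refresh.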
Information overload again, sorry!