Indexing long documents in chunks

That depends strongly on what you mean by "long documents". Is the volume per document 100k? 1,000k? 1,000m? Or do you mean the document count?

Note that a parent/child relationship requires routing, and routing efficiency depends on your shard structure. With few nodes and few shards there is not much difference, but if you can scale out to a few dozen or a few hundred nodes, or distribute the documents over several indices, you can handle a large shard count and parent/child can be distributed more comfortably. Shard size is crucial: it should not grow beyond a few GB.
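To make the routing requirement concrete, here is a minimal sketch of a parent/child setup using Elasticsearch's `join` field type. The index and field names (`doc_paragraph`, `doc-1`) are made up for illustration; the key point is that every child document must be indexed with an explicit routing value so it lands on its parent's shard.

```python
# Mapping with a join field linking "doc" (parent) to "paragraph" (child).
mapping = {
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "doc_paragraph": {
                "type": "join",
                "relations": {"doc": "paragraph"},
            },
        }
    }
}

# A child document names its parent in the join field, and the index
# request must carry the same value as routing (e.g. ?routing=doc-1),
# so parent and children end up on the same shard.
child_doc = {
    "text": "First paragraph of the report ...",
    "doc_paragraph": {"name": "paragraph", "parent": "doc-1"},
}
routing = "doc-1"
```

Because all children of a parent share one shard, a single huge parent family cannot be split across shards, which is why shard sizing matters here.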

  1. Yes, indexing your paragraphs as Elasticsearch documents with coordinates, such as a document ID and a paragraph ID, makes sense. Your queries will then return exact paragraph coordinates as results.

  2. "Most efficient" always depends on your query use case and on how much time/space you are willing to trade. For example, you can use an aggregation query to return an estimated document count for the matched paragraphs. But sometimes you want the exact document count, and then a second query may make more sense.
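Point 1 can be sketched as a small splitter that turns one long document into paragraph-level Elasticsearch documents. The field names (`doc_id`, `paragraph_id`) and the ID scheme are assumptions, not a fixed convention:

```python
def paragraph_docs(doc_id: str, text: str):
    """Yield one indexable document per paragraph, with its coordinates."""
    for n, para in enumerate(text.split("\n\n")):
        yield {
            "_id": f"{doc_id}-{n}",   # stable ID: document + paragraph number
            "doc_id": doc_id,          # coordinate 1: which document
            "paragraph_id": n,         # coordinate 2: which paragraph
            "text": para.strip(),
        }

docs = list(paragraph_docs("report-42", "First paragraph.\n\nSecond paragraph."))
```

A stable `_id` like `report-42-0` also makes re-indexing a changed document idempotent, since each paragraph overwrites its previous version.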
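For point 2, the estimated document count can come from a `cardinality` aggregation over the document-ID field. This is a sketch of the request body only; the field names follow the assumptions above, and note that `cardinality` is approximate by design, which is exactly the time/space trade-off mentioned:

```python
# Search body: match paragraphs, skip the hits themselves (size 0),
# and estimate how many distinct documents the matches belong to.
query = {
    "query": {"match": {"text": "elasticsearch"}},
    "size": 0,
    "aggs": {
        "distinct_docs": {
            "cardinality": {"field": "doc_id"}  # approximate distinct count
        }
    },
}
```

If you need the exact count instead, a second query (or a `terms` aggregation with a sufficiently large size, at higher memory cost) is the usual alternative.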

You can index document metadata in a separate "metadata document" or denormalize the metadata onto all paragraphs. The "metadata document" requires an extra "get" request, while denormalized metadata across all paragraphs takes more space and is hard to change once written. So there is always a price to pay, and no exact answer to your question.
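The denormalized variant above can be sketched as a one-liner that copies the document-level metadata onto every paragraph at index time. Field names (`author`, `published`) are made-up examples:

```python
def denormalize(paragraphs, metadata):
    """Return paragraph docs augmented with the shared document metadata."""
    return [{**p, **metadata} for p in paragraphs]

paras = [{"doc_id": "r-1", "paragraph_id": 0, "text": "..."}]
meta = {"author": "jdoe", "published": "2020-01-01"}
docs = denormalize(paras, meta)
```

The cost is visible here: changing `author` later means re-indexing every paragraph of the document, whereas the metadata-document variant updates one document but costs an extra "get" per search result.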