How to index documents in Elasticsearch in a hierarchical way (tree of unknown depth)?

I have one PDF document that contains headings, subheadings, and their paragraphs (more than one paragraph each). I don't know the depth of the tree in advance.
For example:

> heading1
>   subheading1
>      old collecting she considered discovered. So at parties he warrant oh staying. Square new horses and put better end. Sincerity collected happiness do is contented. Sigh ever way now many. Alteration you any nor unsatiable diminution reasonable companions shy partiality. Leaf by left deal mile oh if easy. Added woman first get led joy not early jokes.
>      Prepared is me marianne pleasure likewise debating. Wonder an unable except better stairs do ye admire. His and eat secure sex called esteem praise. So moreover as speedily differed branched ignorant. Tall are her knew poor now does then. Procured to contempt oh he raptures amounted occasion. One boy assure income spirit lovers set. 
>      Is education residence conveying so so. Suppose shyness say ten behaved morning had. Any unsatiable assistance compliment occasional too reasonably advantages. Unpleasing has ask acceptance partiality alteration understood two. Worth no tiled my at house added. Married he hearing am it totally removal. Remove but suffer wanted his lively length. Moonlight two applauded conveying end direction old principle but. Are expenses distance weddings perceive strongly who age domestic. 
>   subheading2
>      Certainty listening no no behaviour existence assurance situation is. Because add why not esteems amiable him. Interested the unaffected mrs law friendship add principles. Indeed on people do merits to. Court heard which up above hoped grave do. Answer living law things either sir bed length. Looked before we an on merely. These no death he at share alone. Yet outward the him compass hearted are tedious.
>      Mr do raising article general norland my hastily. Its companions say uncommonly pianoforte favourable. Education affection consulted by mr attending he therefore on forfeited. High way more far feet kind evil play led. Sometimes furnished collected add for resources attention. Norland an by minuter enquire it general on towards forming. Adapted mrs totally company two yet conduct men. 
> heading2
>   subheading1
>     ....
>     ....
>   subheading2
>     .....
>     ....

What is the use case?

I mean: what kind of object do you want to retrieve? A full document? Just a paragraph?

I just want to retrieve a paragraph here.

Elasticsearch doesn't really handle hierarchical structures. There are of course several ways to solve this. One way I have used in the past is to split the document into small pieces, each designed to be returned as a search hit. The data should also be denormalized, so you add all the data and metadata you need to each small document. So in your case, perhaps create documents containing just a single paragraph, along with its subheading, the text of all headings above it, its position in the document, the document name, etc.
When you search, you display only the data from the search hit. Once the user wants to read the document, you can create separate functions that scroll the user to the search hit, and you can use a similar data structure to search inside the document.
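
To make the idea concrete, here is a minimal sketch in plain Python of how a parsed tree of unknown depth could be flattened into one small document per paragraph. The input shape (dicts with `heading`, `paragraphs`, and `children` keys), the output field names (`doc_name`, `position`, `headings`, `depth`, `text`), and the function names are all assumptions made for illustration; how you get the tree out of the PDF is not covered. The query helper only builds a request body and assumes `doc_name` is mapped as a `keyword` field.

```python
from typing import Any, Dict, Iterator, List, Tuple

Node = Dict[str, Any]  # hypothetical shape: {"heading", "paragraphs", "children"}


def walk(node: Node, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield (heading_path, paragraph_text) pairs in reading order, at any depth."""
    here = path + (node["heading"],)
    for text in node.get("paragraphs", []):
        yield here, text
    for child in node.get("children", []):
        yield from walk(child, here)


def to_paragraph_docs(roots: List[Node], doc_name: str) -> List[Dict[str, Any]]:
    """Build one denormalized document per paragraph."""
    pairs = (pair for root in roots for pair in walk(root))
    return [
        {
            "doc_name": doc_name,     # which source document this came from
            "position": position,     # order of the paragraph in the document
            "headings": list(path),   # all headings above the paragraph
            "depth": len(path),       # how deep the paragraph sits in the tree
            "text": text,
        }
        for position, (path, text) in enumerate(pairs)
    ]


def search_within_document(doc_name: str, query_text: str) -> Dict[str, Any]:
    """Request body for a full-text search restricted to one source document."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"text": query_text}}],
                # Assumes doc_name is mapped as a keyword field.
                "filter": [{"term": {"doc_name": doc_name}}],
            }
        }
    }


tree = [
    {"heading": "heading1", "children": [
        {"heading": "subheading1", "paragraphs": ["first paragraph", "second paragraph"]},
        {"heading": "subheading2", "paragraphs": ["third paragraph"]},
    ]},
    {"heading": "heading2", "paragraphs": ["intro paragraph"]},
]
docs = to_paragraph_docs(tree, "example.pdf")
```

Each dict in `docs` could then be indexed as its own document, so a search hit is already the paragraph you want to show, and `position` gives you something to scroll to when the user opens the full document.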


Thank you so much @Babadofar for your input. It's really helpful for me.
