First of all, thanks for your time.
We want to give the user as little data as possible, so we restricted the results to pages. Once the top page comes back, we apply information retrieval algorithms on top of it to extract the relevant paragraphs and return them to the user. Though it’s highly difficult, we are doing our best.
I understood your points 2 and 3, but not point 1. Let’s say we index the last few lines of one page and the first few lines of the next page as one document. How am I going to use that? Please suggest.
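To make my question about point 1 concrete, this is roughly how I picture building those boundary documents. It is only a sketch of my understanding, not something we have running: the function name `boundary_documents`, the `overlap` parameter, and the `pages`/`text` fields are hypothetical names, and each page is assumed to be a plain list of lines.

```python
def boundary_documents(pages, overlap=3):
    """Build one small document per page break from the lines around it.

    `pages` is a list of pages, each page a list of lines (an assumption).
    """
    if overlap < 1:
        raise ValueError("overlap must be at least 1")
    docs = []
    for i in range(len(pages) - 1):
        tail = pages[i][-overlap:]      # last few lines of the current page
        head = pages[i + 1][:overlap]   # first few lines of the next page
        docs.append({
            "pages": (i + 1, i + 2),    # 1-based numbers of the two pages spanned
            "text": "\n".join(tail + head),
        })
    return docs


# Toy data, purely illustrative.
pages = [
    ["a1", "a2", "a3", "a4"],
    ["b1", "b2", "b3", "b4"],
    ["c1", "c2", "c3", "c4"],
]
docs = boundary_documents(pages, overlap=2)
```

With three pages this gives two extra documents, one per page break. What I still don't see is the query side: when one of these boundary documents matches, which of the two pages it spans should I return?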
We have tried point 2. Thanks for that anyway.
Regarding point 3, can we use a parent-child relationship for pages? For example, a page would be the parent and the paragraphs inside the page would be its children. Whenever a paragraph matches, I can return its parent, which is the page. Will this work? The point here is to make the documents smaller to increase relevancy, as our present documents have 30-40 lines per page.
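Here is a small sketch of the parent-child idea as I imagine it, in plain Python rather than any real search engine API. The names `index_paragraphs` and `search_pages` are hypothetical, paragraphs are assumed to be separated by blank lines, and the substring match is only a stand-in for real relevance scoring.

```python
def index_paragraphs(pages):
    """Split each page into paragraph 'child' records that remember their parent page."""
    children = []
    for page_id, text in pages.items():
        for para in text.split("\n\n"):   # assumption: blank line separates paragraphs
            para = para.strip()
            if para:
                children.append({"parent": page_id, "text": para})
    return children


def search_pages(children, term):
    """Return ids of parent pages with at least one paragraph containing the term."""
    term = term.lower()
    hits = []
    for child in children:
        # Naive substring match, standing in for the engine's scoring.
        if term in child["text"].lower() and child["parent"] not in hits:
            hits.append(child["parent"])
    return hits


# Toy data, purely illustrative.
pages = {
    "page-1": "Intro to indexing.\n\nShards hold data.",
    "page-2": "Relevance scoring basics.\n\nIndexing small documents helps.",
    "page-3": "Unrelated content.",
}
children = index_paragraphs(pages)
matching_pages = search_pages(children, "indexing")
```

The match happens on the small paragraph documents, but what goes back to the user is the page that owns the matching paragraph.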
Please suggest, if possible
Thanks again as always :)