Lucene Documents

Hi all,

I am not sure this is the right place to ask, but we are new to building our own full-text engine. We are planning to index files on the fly using the Lucene Java API.

That would be an easy requirement as such, but (there always is a but, right?) the raw text files we receive are paged using custom page splits.

The requirement is to find the page on which the searched text occurs, so a document should be a page, not a file. I could write custom code to split the files into pages before indexing and pass those pages to the indexer, but since the files are already being scanned during indexing, I would prefer to customize the parser so that it creates a new document each time a page break is found.
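To make the pre-splitting option concrete, here is a rough sketch of what I have in mind. It assumes a form feed as the page marker (our real split is custom, so `PAGE_BREAK` is only a placeholder), and the names `PageSplitter`, `Page`, and `lineOf` are made up for illustration. It deliberately contains no Lucene calls; the idea is only that each `Page` would be handed to the indexer as its own document, carrying the file name and page number with it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PageSplitter {

    // ASSUMPTION: a form feed marks a page break. Replace with the real custom split.
    static final String PAGE_BREAK = "\f";

    /** One page of one file; each instance is meant to become one indexed document. */
    static final class Page {
        final String fileName;
        final int pageNumber; // 1-based position of the page within the file
        final String text;

        Page(String fileName, int pageNumber, String text) {
            this.fileName = fileName;
            this.pageNumber = pageNumber;
            this.text = text;
        }
    }

    /** Splits the content of one file into pages at every PAGE_BREAK. */
    static List<Page> split(String fileName, String content) {
        List<Page> pages = new ArrayList<>();
        // Limit -1 keeps empty pages, so page numbers stay aligned with the file.
        String[] parts = content.split(Pattern.quote(PAGE_BREAK), -1);
        for (int i = 0; i < parts.length; i++) {
            pages.add(new Page(fileName, i + 1, parts[i]));
        }
        return pages;
    }

    /** Returns the 1-based line within the page that first contains term, or -1. */
    static int lineOf(Page page, String term) {
        String[] lines = page.text.split("\r?\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].contains(term)) {
                return i + 1;
            }
        }
        return -1;
    }
}
```

With this, a hit on a page document would already identify the file and the page, and something like `lineOf` could be run on the matching page afterwards to locate the line.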

The ideal scenario would be to retrieve the file, the page, or even the line in which the searched text appears.

Is that doable? And if yes, would someone have a walkthrough?

It would be highly appreciated if someone has an answer.

Welcome to our community! :smiley:

While we have expertise in Lucene, building something yourself directly on top of it is outside the scope of what we can help with here, sorry to say.

That's ok. :slight_smile:

Thanks.
