How to index (and search) large strings

I need some advice on dealing with large strings of data. Our goal is to OCR documents and make their text fully searchable. So far we've been indexing the text without any issues, but we recently started hitting Lucene errors once strings exceed 32K characters.

What is the standard approach for dealing with large pieces of text like this, e.g. 500K+ characters? Is there a setting that will allow this, or do we need to change our whole approach and index the data differently? Any guidance would be appreciated.


I was thinking of breaking large pieces of text apart into an array of smaller strings (instead of one gigantic field). Would this be a suitable solution? Are there any better options?
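
To make the array idea concrete, here is a rough Python sketch of the splitting step. It is only an illustration under some assumptions: that the limit applies to the UTF-8 byte length of each value (so it measures bytes rather than characters), that 32000 is a safe ceiling, and that collapsing runs of whitespace is acceptable for OCR text. The function name `split_text` and the `content` field name are made up for this example.

```python
def split_text(text, max_bytes=32000):
    """Split text into chunks of at most max_bytes UTF-8 bytes each,
    breaking on whitespace where possible."""
    if max_bytes < 4:
        # A single UTF-8 character can take up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")

    chunks, current, size = [], [], 0
    for word in text.split():
        n = len(word.encode("utf-8"))

        if n > max_bytes:
            # A single "word" is larger than the limit (e.g. OCR noise):
            # flush what we have, then hard-split it character by character.
            if current:
                chunks.append(" ".join(current))
            piece, piece_size = [], 0
            for ch in word:
                c = len(ch.encode("utf-8"))
                if piece_size + c > max_bytes:
                    chunks.append("".join(piece))
                    piece, piece_size = [], 0
                piece.append(ch)
                piece_size += c
            # The leftover tail starts the next chunk.
            current, size = ["".join(piece)], piece_size
            continue

        # Account for the joining space if the chunk is not empty.
        extra = n + (1 if current else 0)
        if size + extra > max_bytes:
            chunks.append(" ".join(current))
            current, size = [word], n
        else:
            current.append(word)
            size += extra

    if current:
        chunks.append(" ".join(current))
    return chunks


# Hypothetical document body: one array field instead of one giant string.
ocr_text = "lorem ipsum " * 50000  # stand-in for ~600K characters of OCR output
doc = {"content": split_text(ocr_text)}
```

Each element of `doc["content"]` then stays under the chosen byte ceiling, and the words come out in their original order. Whether this actually avoids the error depends on how the field is mapped, so treat it as a starting point rather than a confirmed fix.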
