I need some advice on dealing with large strings of data. Our goal is to OCR documents and make their text fully searchable. So far we've been indexing the text without any issues, but recently we've started running into Lucene errors once the strings exceed 32K characters.
What is the standard approach for dealing with large pieces of text like this, e.g. 500K+ characters? Is there a setting that allows this, or do we need to change our whole approach and index the data differently? Any guidance would be appreciated.
Thanks!