How to index (and search) large strings

denmcktr · June 26, 2019, 6:05pm

Need some advice on dealing with large strings of data. Our goal is to OCR documents and make their text fully searchable. So far we've been indexing the text without any issues, but recently we have run into Lucene errors once we hit strings exceeding 32K characters.

What is the standard approach for dealing with large pieces of text like this, e.g. 500K+ characters? Is there a setting that will allow this, or do we need to change our whole approach and index the data differently? Any guidance would be appreciated...

Thanks!
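For reference, here is a minimal sketch of where the limit likely comes from. It assumes the errors are Lucene's "immense term" rejections: Lucene refuses any single indexed term whose UTF-8 encoding is larger than 32766 bytes, and a `keyword` field indexes its whole value as one term. The limit is in bytes, not characters, so non-ASCII OCR output hits it sooner. The mapping is hypothetical (the field name `ocr_text` and sub-field `raw` are illustrative) and assumes Elasticsearch 7.x typeless mappings. A `text` field is run through an analyzer and indexed as many short tokens, so its total length is not capped by the per-term limit.

```python
# Lucene's maximum size for a single indexed term, in UTF-8 bytes.
LUCENE_MAX_TERM_BYTES = 32766


def exceeds_term_limit(value: str) -> bool:
    """True if `value`, indexed as one term (e.g. a keyword field), would be rejected."""
    return len(value.encode("utf-8")) > LUCENE_MAX_TERM_BYTES


# Hypothetical index body for OCR output (Elasticsearch 7.x, typeless mappings).
# "ocr_text" is analyzed into tokens, so long documents do not hit the term limit.
# The keyword sub-field "raw" uses ignore_above so over-long values are skipped
# for that sub-field instead of failing the whole document.
OCR_MAPPING = {
    "mappings": {
        "properties": {
            "ocr_text": {
                "type": "text",
                "fields": {
                    "raw": {"type": "keyword", "ignore_above": 256}
                },
            }
        }
    }
}
```

The dict could be sent as the body of a create-index request. Note that `ignore_above` counts characters, not bytes, so to stay clear of the byte limit for arbitrary UTF-8 it needs to be at most 32766 / 4 = 8191.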

denmcktr · July 8, 2019, 1:43pm

I was thinking of breaking large pieces of text apart into an array (instead of one gigantic field value). Would this be a suitable solution? Any better options?
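A minimal sketch of that chunking step, assuming a per-chunk budget of 30000 UTF-8 bytes (a hypothetical safety margin under the 32766-byte term limit) and that runs of whitespace may be collapsed to single spaces. `chunk_text` and `_split_long_word` are hypothetical helpers, not part of any Elasticsearch client.

```python
from typing import List


def _split_long_word(word: str, max_bytes: int) -> List[str]:
    """Cut one word into character runs whose UTF-8 size fits `max_bytes`."""
    pieces, start, size = [], 0, 0
    for i, ch in enumerate(word):
        ch_bytes = len(ch.encode("utf-8"))
        if size + ch_bytes > max_bytes:
            pieces.append(word[start:i])
            start, size = i, 0
        size += ch_bytes
    pieces.append(word[start:])
    return pieces


def chunk_text(text: str, max_bytes: int = 30000) -> List[str]:
    """Split `text` into chunks of at most `max_bytes` UTF-8 bytes each.

    Breaks on whitespace so words stay whole; a single word longer than the
    budget is cut by characters (never inside a multi-byte character).
    Whitespace runs, including newlines, become single spaces.
    """
    if max_bytes < 4:
        raise ValueError("max_bytes must be at least 4 (one UTF-8 code point)")
    chunks: List[str] = []
    current: List[str] = []
    size = 0  # UTF-8 bytes in `current`, counting the joining spaces
    for word in text.split():
        for piece in _split_long_word(word, max_bytes):
            piece_bytes = len(piece.encode("utf-8"))
            extra = piece_bytes + (1 if current else 0)
            if size + extra > max_bytes:
                chunks.append(" ".join(current))
                current, size = [piece], piece_bytes
            else:
                current.append(piece)
                size += extra
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk would then become one element of the array field. Elasticsearch indexes an array of strings as multiple values of the same field, so a `match` query still searches all of them. By default, though, a `match_phrase` query will not match across two elements because of the field's `position_increment_gap` (100 for `text` fields), so a phrase that straddles a chunk boundary can be missed.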

system · August 5, 2019, 1:43pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.