What is the maximum text size that can be indexed as a single term?

Hi,

I'm using 7.16.1,
When I save a large amount of text to an Elasticsearch field, it is not indexed as a single term.
What is the maximum text size for a field to be indexed as a single term?

Thanks,
Shay


I've put some pretty big pieces of text (multi-thousand-line stack traces) in a field and have not run into a problem. If your bulk insert is larger than 20MB it will be broken up into smaller pieces; however, if a single document is larger than 20MB it will just get indexed using the normal (non-bulk) API.

Thanks for your answer Andreas, but it doesn't answer my question. I was asking about something else.
I've saved a field with a few lines of text (let's say 200 words) and the field is marked as _ignored.
It is not indexed... I need to know the maximum field size that can be indexed as a single term.

Whoops. Sorry, you're right, I got mixed up with another post.
Check your ignore_above parameter... but 200 words is not a lot, and that should easily work if you have not changed any settings. There is also a limit of 100MB on the HTTP request size if I remember correctly, but that can be changed too.
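
For what it's worth, here is a rough sketch (Python client, with my-index as a placeholder index name) of how you could read the mapping back and see whether any keyword field, or keyword sub-field, has an ignore_above limit set:

```python
# Rough sketch (Python client, 7.x style): read the mapping back and check whether
# any keyword field has an ignore_above limit. "my-index" is a placeholder name.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

mapping = es.indices.get_mapping(index="my-index")
properties = mapping["my-index"]["mappings"]["properties"]

for name, spec in properties.items():
    if spec.get("type") == "keyword":
        print(name, "ignore_above =", spec.get("ignore_above", "not set"))
    # keyword sub-fields, e.g. the "field.keyword" sub-field created by dynamic mapping
    for sub_name, sub in spec.get("fields", {}).items():
        if sub.get("type") == "keyword":
            print(f"{name}.{sub_name}", "ignore_above =", sub.get("ignore_above", "not set"))
```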

Thanks again, but I will wait for an answer from an Elastic team member.

Hi @ShayWeizman

I am a bit confused as you are using some mixed terminology... let me try to clarify a bit.

First there is a source document that contains fields.

Those _source fields are then either "indexed", which makes them searchable, or not indexed, and thus not searchable.

Whether a field is indexed or not does not affect the _source unless you specifically drop the _source.

An indexed field is then searchable.

Fields that are indexed have field types, for example text (for full-text search) or keyword, which is for exact matches, aggregations, etc.

keywords are used in term searches, so many of us think of keyword and term as synonymous; that is why I am unclear what you are actually asking.

Also, when we think of full text, i.e. a sentence / paragraph etc., we think of every word as a token (I think you may be using "term" for this).
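
To make the terminology concrete, here is a minimal sketch (made-up index and field names) of one field mapped as text, which gets tokenized, next to one mapped as keyword, which is indexed as a single exact term:

```python
# Minimal sketch (placeholder index/field names): "message" is full text and gets
# tokenized, "status" is a keyword and is indexed as one exact term.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="logs-demo",
    body={
        "mappings": {
            "properties": {
                "message": {"type": "text"},    # analyzed: every word becomes a token
                "status": {"type": "keyword"},  # not analyzed: one term, exact match
            }
        }
    },
)

es.index(index="logs-demo", body={"message": "disk usage exceeded the threshold", "status": "WARN"})
```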

Perhaps you could help clarify exactly what you are trying to accomplish?

Are you asking what the longest text field is, or the longest keyword field, or the longest field in the _source?

text fields can easily be many MBs, but that may or may not be the most efficient approach.

keywords can be very long as well, but that is not efficient; typically you use ignore_above to limit the actual length.
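
As a rough sketch of that last point (placeholder names, adjust to your own data), ignore_above is just part of the keyword mapping:

```python
# Sketch only (placeholder names): a keyword field that indexes values up to
# 512 characters; longer values stay in _source but are not indexed for this field.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="keywords-demo",
    body={
        "mappings": {
            "properties": {
                "tag": {"type": "keyword", "ignore_above": 512}
            }
        }
    },
)
```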

Or are you asking what the longest string IN a text field can be?

There is also a binary / blob type that I think goes up to 2GB.

Or are you asking about an ingest strategy, i.e. you have big docs and you are unclear how to ingest them the way you want? For example, you are trying to use HTTP and it is chunking up the data?

There is a built-in limit in the HTTP (chunk handling) layer that caps requests at 100MB. You can change it using http.max_content_length (for example, set it to a bigger value).
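
If you want to check what a node is actually running with, something like this sketch should do it; the setting itself is static, so it has to be changed in elasticsearch.yml and the node restarted, this only reads it back:

```python
# Sketch: read back what each node is configured with. http.max_content_length is a
# static node setting (elasticsearch.yml); it only appears here if it was set explicitly.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

for node_id, node in es.nodes.info()["nodes"].items():
    configured = node.get("settings", {}).get("http", {}).get("max_content_length")
    print(node["name"], "http.max_content_length =", configured or "100mb (default)")
```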

So back to: what are you really asking / trying to accomplish / what is the actual issue you are trying to solve?

This blog gives full details.


Hi Mark,

In this post you have said:
"This is typically ignored because the value is too large to be indexed as a single term."

So this is what I'm asking about.

Thanks,
Shay

Quoting from the blog post:

The other big issue with the keyword field is it can’t handle very long fields. The default string mapping ignores strings longer than 256 characters, silently dropping values from the list of indexed terms. The majority of Elasticsearch’s log file messages exceed this limit.

And even if you do raise the Elasticsearch limit, you cannot exceed the hard Lucene limit of 32k for a single token, and Elasticsearch certainly logs some messages that exceed this.
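
If it helps, here is a sketch of both halves of that (placeholder index and field names): raising ignore_above on a keyword field, which still has to stay under Lucene's 32766-byte single-term limit, and then finding the documents whose values were dropped, via the _ignored metadata field:

```python
# Sketch (placeholder names): raise ignore_above on a keyword field -- it still has to
# stay under Lucene's 32766-byte single-term limit -- and then find documents whose
# values were dropped at index time via the _ignored metadata field.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="long-keywords-demo",
    body={
        "mappings": {
            "properties": {
                "message": {"type": "keyword", "ignore_above": 8191}
            }
        }
    },
)

# Any document with at least one ignored value carries the _ignored metadata field.
resp = es.search(index="long-keywords-demo", body={"query": {"exists": {"field": "_ignored"}}})
print(resp["hits"]["total"])
```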


Cool thanks!
