_id length

jbalaguero · May 22, 2019, 2:12pm

We are using ES 6.4 to index a cache whose values are XML documents. The cache key is generated by capturing values from the XML document, so it's easy to end up with IDs longer than 512 bytes.

When we perform an index request, the _id must be this key, and since it's longer than 512 bytes, the bulk request fails.

So, is it possible (by config or in any way) to remove this limitation?

whatgeorgemade · May 22, 2019, 2:56pm

I wasn't even aware of a length restriction on the _id field. There's nothing about it in the docs so I'm not sure if it can be removed/changed.

Do you have to use that value as the actual document ID? If not, you could let Elasticsearch generate an _id value for you (by not specifying one yourself), then add your own ID value to a different field that's mapped as a keyword type. It would have implications for how you access the documents, but those may not affect you.
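A minimal sketch of that approach, building only the JSON bodies and sending nothing over the network. The field names `cache_key` and `xml` are hypothetical, and where the mapping fragment sits in the index definition depends on your Elasticsearch version:

```python
# Mapping fragment: the long cache key lives in a keyword field, not in _id.
# "cache_key" and "xml" are hypothetical field names chosen for illustration.
CACHE_KEY_MAPPING = {"properties": {"cache_key": {"type": "keyword"}}}


def build_document(cache_key: str, xml_payload: str) -> dict:
    """Document body to index WITHOUT an explicit _id, so one is generated."""
    return {"cache_key": cache_key, "xml": xml_payload}


def build_lookup_query(cache_key: str) -> dict:
    """Exact-match term query that finds the document by its original key."""
    return {"query": {"term": {"cache_key": cache_key}}}


long_key = "customer/" + "x" * 600  # longer than 512 bytes
doc = build_document(long_key, "<doc/>")
query = build_lookup_query(long_key)
```

The same term query body could be used for a search or for a delete-by-query request, which is one of the access implications mentioned above: lookups go through a query rather than a direct get by _id.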

warkolm · May 22, 2019, 3:20pm

There's a 512-byte limit; you may want to hash the key down to fit within that.

I'll raise an issue to get that documented.
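One way to do the hashing, as a sketch only: SHA-256 is my choice here, not something the thread prescribes, and `hashed_id` is a hypothetical helper name. The hex digest is always 64 ASCII characters, well under 512 bytes regardless of the key's length.

```python
import hashlib


def hashed_id(cache_key: str) -> str:
    """Return a fixed-length hex digest of the key, suitable for use as _id."""
    return hashlib.sha256(cache_key.encode("utf-8")).hexdigest()


long_key = "account/" + "x" * 2000  # far longer than 512 bytes
doc_id = hashed_id(long_key)
```

Because the digest is derived only from the key, any code path that has the original key (for example a remove call) can recompute the same _id. If the original key must be recoverable from the index, it would need to be stored in a separate field.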

jbalaguero · May 22, 2019, 3:53pm

Our development is external; I simplified the question here, but the situation is a bit more complicated. This cache belongs to a bank... The hash you suggest implies work not only on our side but also on theirs. I don't think they are going to change the cache design because I tell them ES does not support IDs longer than 512 bytes.

Instead, why not add a new config parameter, something like "index.max_id_length=xxxx", and set it to 512 by default?

Thanks.

And when you validate the ID length, instead of "id_length > 512 then error", write "id_length > index.max_id_length then error".
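To make the proposal concrete, here is an illustrative sketch of such a check. It is not Elasticsearch's actual code, `index.max_id_length` is only the setting name suggested above and does not exist, and the sketch assumes the limit is measured in UTF-8 encoded bytes rather than characters:

```python
DEFAULT_MAX_ID_LENGTH = 512  # the current hard-coded limit, in bytes


def id_too_long(doc_id: str, max_id_length: int = DEFAULT_MAX_ID_LENGTH) -> bool:
    """True if the ID's UTF-8 encoding exceeds the (hypothetical) configured limit."""
    return len(doc_id.encode("utf-8")) > max_id_length


at_limit = id_too_long("a" * 512)               # exactly 512 bytes
over_limit = id_too_long("a" * 513)             # one byte over the default
multibyte = id_too_long("é" * 300)              # 300 characters, 600 bytes
raised_limit = id_too_long("a" * 513, 1024)     # allowed under a larger limit
```

The same function can serve as a client-side pre-check before sending a bulk request, so oversized keys are caught before the request fails.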

warkolm · May 22, 2019, 10:51pm

Can you potentially look at moving that value out of _id?

jbalaguero · May 23, 2019, 7:32am

We need to remove documents using the _id. In the remove method we only receive the cacheId and the cache object. If we use an autogenerated ID when inserting documents, we would need to store it in the cache object, which is not possible because we don't control this object. So my only option is to add an intermediate concurrent map where the key is the cacheId and the value is the autogenerated ID. Yes, I can do it.
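A rough sketch of that intermediate map, using a lock-protected dictionary as the Python analogue of a concurrent map. `IdRegistry` and its method names are hypothetical, and an in-memory map like this would not survive a process restart:

```python
import threading
from typing import Dict, Optional


class IdRegistry:
    """Thread-safe mapping from cacheId to the autogenerated document _id."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ids: Dict[str, str] = {}

    def remember(self, cache_id: str, es_id: str) -> None:
        """Record the _id returned by the index request for this cacheId."""
        with self._lock:
            self._ids[cache_id] = es_id

    def forget(self, cache_id: str) -> Optional[str]:
        """Remove and return the _id to delete, or None if it is unknown."""
        with self._lock:
            return self._ids.pop(cache_id, None)


registry = IdRegistry()
registry.remember("very-long-cache-id", "generated-id-1")  # placeholder values
first = registry.forget("very-long-cache-id")
second = registry.forget("very-long-cache-id")
```

In the remove method, `forget(cacheId)` would yield the _id to pass to the delete request.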

But my question is: is this 512-byte limit a technical limitation? If you say it is technically impossible to have IDs longer than 512 bytes, then OK, that's the end of the story.

But if this limit exists because at some point in the past someone thought that having such long IDs would be awful for performance, maybe it would be better to leave this decision to the end user. If someone needs longer IDs and that causes a performance problem, it will be their job to request more cores, memory, or whatever else they need.

I don't know what implications allowing these longer IDs (via config) would have for your code. It may be easy to do if this limit is only used at validation time when indexing documents, but I don't know whether it's used for anything else.

Anyway, thanks for your time.

warkolm · May 23, 2019, 10:13pm

Doc PR: "Update id-field.asciidoc" by markwalkom (elastic/elasticsearch pull request #42482).

It's a technical limitation in that it's a coded one. I don't know why it's there, though; you may want to raise an issue to seek further clarification.

