Re: Why isn't Elasticsearch using Sha1 for id?


(Dan Pilone) #1

Do you mean SHA1 of the document itself? I'm pretty sure that would be a
problem for us as we can have nested documents with identical values. For
example, you could have:

<some_doc>

Dan
Pilone

...
</some_doc>

If "author" is indexed as a nested document it will need an id which can't
just be the SHA1 of the content we provided. Now I suppose if it has the
_parentId as part of the "content" then the SHA1 would be different, but I
don't know when/how the parent id is associated with nested docs. -- Dan
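
A minimal sketch of the concern above, not how Elasticsearch actually assigns ids: two nested documents with identical content hash to the same SHA1, while mixing a parent id into the hashed content separates them. The field names ("first", "last"), the "_parentId" key, and both helper functions are hypothetical and exist only for illustration.

```python
import hashlib
import json


def content_id(doc: dict) -> str:
    """Return the 40-character hex SHA1 of a canonical JSON form of doc."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def content_id_with_parent(doc: dict, parent_id: str) -> str:
    """Hash the content together with a (hypothetical) parent id."""
    return content_id({"_parentId": parent_id, "content": doc})


# The same author nested inside two different parent documents.
author = {"first": "Dan", "last": "Pilone"}  # hypothetical field names

# Content-only hashing gives both nested docs the same id.
same = content_id(author) == content_id(dict(author))

# Including the parent id makes the two ids differ.
differ = (content_id_with_parent(author, "doc-1")
          != content_id_with_parent(author, "doc-2"))

print(same, differ)
```
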

--
Dan Pilone
Managing Partner, Element 84 LLC
www.element84.com / dan@element84.com / 703-622-7370

On Tue, Jul 26, 2011 at 6:25 PM, ajsie johnny.weng.luu@gmail.com wrote:

CouchDB uses a 40-character SHA1 id, and they say that the
risk of a collision is very minimal.

I wonder if there is a risk that the id Elasticsearch auto-generates
will collide with another one, since it's only 22 characters long.


(Paul Loy) #2

You can supply your own ids...


--

Paul Loy
paul@keteracel.com
http://uk.linkedin.com/in/paulloy


(Shay Banon) #3

The generated id is a type 4 UUID (128-bit) that is then base64-encoded to
save space.
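
As a rough illustration of that encoding, the sketch below base64-encodes the 16 bytes of a random UUID and drops the padding, which gives 22 characters. The URL-safe alphabet and the padding removal are assumptions chosen to match the 22-character length mentioned in the question; the function names are made up for this example. The second function is the usual birthday-bound approximation, using the 122 random bits that a version 4 UUID carries.

```python
import base64
import uuid


def short_id() -> str:
    """Encode a random (version 4) UUID as unpadded URL-safe base64."""
    raw = uuid.uuid4().bytes  # 16 bytes = 128 bits
    # 16 bytes encode to 24 base64 characters, 2 of which are '=' padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def collision_probability(n_ids: int, random_bits: int = 122) -> float:
    """Birthday-bound approximation: p is roughly n^2 / 2^(bits + 1)."""
    return n_ids * n_ids / 2 ** (random_bits + 1)


print(short_id())
print(len(short_id()))                  # 22 characters
print(collision_probability(10 ** 9))   # chance of any collision among a billion ids
```

The shorter string is a denser encoding of the random bits (6 bits per character versus 4 for hex), so the 22-character length on its own says little about collision risk.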


