Re: Why isn't Elasticsearch using Sha1 for id?

Do you mean SHA1 of the document itself? I'm pretty sure that would be a
problem for us as we can have nested documents with identical values. For
example, you could have:

<some_doc>
  <author>
    Dan
    Pilone
  </author>
  ...
</some_doc>

If "author" is indexed as a nested document, it will need an id, and that id
can't just be the SHA1 of the content we provided: two identical authors would
hash to the same id. Now I suppose if it has the _parentId as part of the
"content" then the SHA1 would be different, but I don't know when/how the
parent id is associated with nested docs. -- Dan
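
To make the concern concrete, here is a minimal Python sketch of content-based
ids. It is only an illustration, not how Elasticsearch assigns ids: the field
names "first" and "last", the parent ids "book-1" and "book-2", and both helper
functions are hypothetical. It shows that two identical nested values hash to
the same SHA1, and that mixing a parent id into the hashed content separates
them.

```python
import hashlib
import json


def content_id(doc: dict) -> str:
    """Return the SHA1 hex digest of a canonical JSON form of the document."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def content_id_with_parent(doc: dict, parent_id: str) -> str:
    """Hash the document together with its (hypothetical) parent id."""
    return content_id({"_parentId": parent_id, "doc": doc})


# The same author nested under two different parent documents.
author = {"first": "Dan", "last": "Pilone"}  # hypothetical field names

plain_a = content_id(author)
plain_b = content_id(dict(author))
with_parent_a = content_id_with_parent(author, "book-1")
with_parent_b = content_id_with_parent(author, "book-2")

print(plain_a == plain_b)              # identical content, identical id
print(with_parent_a == with_parent_b)  # parent id makes the ids differ
```

The catch is the one raised above: this only works if the parent id is already
known at the moment the nested document's id is computed.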

--
Dan Pilone
Managing Partner, Element 84 LLC
www.element84.com / dan@element84.com / 703-622-7370

On Tue, Jul 26, 2011 at 6:25 PM, ajsie johnny.weng.luu@gmail.com wrote:

CouchDB uses a 40-character SHA1 id, and they say that the risk of a
collision is minimal.

I wonder if there is a risk that the id Elasticsearch auto-generates
will collide with another one, since it's only 22 characters long.

You can supply your own ids...

On Tue, Jul 26, 2011 at 11:44 PM, Dan Pilone dan@element84.com wrote:

Do you mean SHA1 of the document itself? [...]

--

Paul Loy
paul@keteracel.com
http://uk.linkedin.com/in/paulloy

The generated id is a type 4 (random) UUID (128 bits) that is then
base64-encoded to save space.
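
The length difference comes mostly from the alphabet rather than the amount of
data: 40 hex characters carry 160 bits, while 22 base64 characters are enough
for 128 bits. A rough Python sketch of the encoding follows. The URL-safe
alphabet and the stripped padding are assumptions made for illustration, not
details confirmed in this thread, and both function names are hypothetical. The
second function is the usual birthday-bound approximation, using the 122 bits
of a version 4 UUID that are actually random.

```python
import base64
import uuid


def short_id() -> str:
    """Random (version 4) UUID, base64-encoded with the padding stripped."""
    raw = uuid.uuid4().bytes  # 16 bytes = 128 bits
    # Assumption: URL-safe alphabet; 16 bytes encode to 24 chars incl. "==".
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def collision_probability(n: int, random_bits: int = 122) -> float:
    """Birthday-bound approximation p ~= n^2 / 2^(random_bits + 1)."""
    return n * n / 2 ** (random_bits + 1)


example = short_id()
print(example, len(example))
print(collision_probability(10**9))
```

Under these assumptions, even a billion generated ids leave the chance of any
collision far below one in a billion billion, so the shorter string does not
by itself imply a meaningful collision risk.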
