COMB uuid or string equivalents to speed up inserts


(Dennis) #1

Anyone ever looked at the indexing (i.e. INSERTing in database parlance)
speed of randomly generated UUIDs vs. COMB UUIDs in Elasticsearch?

There are also string equivalents and base64 equivalents. I may try a
simple bash script and see what happens.
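For reference, here is a Python sketch (standard library only; `uuid_forms` and `uuid_from_base64` are hypothetical helper names) of the usual textual forms of one 128-bit UUID: the 36-character hyphenated string, 32 hex digits, and URL-safe base64 with the padding stripped (22 characters).

```python
import base64
import uuid


def uuid_forms(u: uuid.UUID) -> dict:
    """Return several textual encodings of the same 128-bit UUID."""
    return {
        # Canonical 8-4-4-4-12 hex form with hyphens: 36 chars.
        "canonical": str(u),
        # Plain lowercase hex, no hyphens: 32 chars.
        "hex": u.hex,
        # URL-safe base64 of the 16 raw bytes, '==' padding stripped: 22 chars.
        "base64": base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii"),
    }


def uuid_from_base64(s: str) -> uuid.UUID:
    """Inverse of the base64 form above: restore padding, then decode."""
    padded = s + "=" * (-len(s) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


if __name__ == "__main__":
    u = uuid.uuid4()
    for name, text in uuid_forms(u).items():
        print(f"{name:9} {len(text):2} {text}")
```

One thing to keep in mind when picking a form for time-prefixed ids: lowercase hex (with or without the fixed-position hyphens) sorts in the same order as the underlying bytes, but base64 does not, because its alphabet is not in ASCII order.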


(Dennis) #2

OK, I finally got to this point in design and production.

I generate a COMB_GUID where the upper 32 bits are based on bits 33
through 1 of the Unix time in milliseconds. So there are 93 bits of randomness
every 2 milliseconds, and the rollover on the upper bits happens every 106
years.
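A minimal Python sketch of a COMB-style generator, assuming one particular layout: the top 32 bits hold the Unix time in milliseconds shifted right by one (so the prefix advances every 2 ms), and the low 96 bits are random apart from the RFC 4122 version and variant bits, which leaves 90 random bits. This is not necessarily the exact bit layout described above; `comb_uuid` and its `now_ms`/`rand` parameters are hypothetical, and the parameters exist only so the output can be reproduced in tests.

```python
import os
import time
import uuid


def comb_uuid(now_ms=None, rand=None) -> uuid.UUID:
    """Hypothetical COMB-style UUID: 32-bit time prefix + random low bits.

    Assumed layout (a sketch, not necessarily the one described in the post):
      bits 127..96  (Unix time in ms >> 1) & 0xFFFFFFFF, ticks every 2 ms
      bits  95..0   random, except the RFC 4122 version/variant bits
    """
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    if rand is None:
        rand = os.urandom(12)
    if len(rand) != 12:
        raise ValueError("rand must be exactly 12 bytes (96 bits)")
    prefix = (now_ms >> 1) & 0xFFFFFFFF  # drop the lowest ms bit, keep 32 bits
    value = (prefix << 96) | int.from_bytes(rand, "big")
    # version=4 overwrites the version and variant bits; both sit inside the
    # low 96 bits, so the time prefix (and therefore sort order) is unchanged.
    return uuid.UUID(int=value, version=4)


if __name__ == "__main__":
    print(comb_uuid())
```

Because the prefix occupies the most significant bits, an id generated in a later 2 ms tick compares greater than one from an earlier tick (as a UUID, as hex, or in the hyphenated form) until the prefix rolls over; ids within the same tick are ordered randomly.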

When inserting into Postgres, the speed advantage of a COMB_GUID over a fully
random UUID holds up: the COMB_GUID is 2X faster.

In Elasticsearch, there is NO discernible difference between the two for
indexing. I'm still going to use COMB_GUIDs in case the content goes into
B-tree indexes anywhere in the chain: if the content is fed in time order, or
can be presorted on the id field so that it IS time-related and partially
sequential, inserts will speed up.
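To make the B-tree argument a little more concrete, here is a rough Python proxy (not a benchmark): it counts how many keys in an insert stream are smaller than the largest key already inserted. In a B-tree such as Postgres's default index, a key larger than every existing key goes to the right-most leaf, while any other key lands somewhere in the interior of the tree. The function names, the one-id-per-tick COMB stream, and the seed are illustrative assumptions, and the proxy does not model Lucene's indexing at all.

```python
import random
import uuid


def out_of_order_fraction(keys) -> float:
    """Fraction of keys that are smaller than the largest key seen before them.

    Rough proxy only: such keys would land in the interior of a B-tree rather
    than being appended at its right-most leaf.
    """
    largest = None
    late = 0
    total = 0
    for key in keys:
        total += 1
        if largest is not None and key < largest:
            late += 1
        else:
            largest = key
    return late / total if total else 0.0


def compare_streams(n: int = 1000, seed: int = 42) -> dict:
    rng = random.Random(seed)  # seeded so the result is reproducible
    random_ids = [uuid.UUID(int=rng.getrandbits(128), version=4) for _ in range(n)]
    # Hypothetical COMB stream: one id per 2 ms tick, prefix = tick number.
    comb_ids = [uuid.UUID(int=(tick << 96) | rng.getrandbits(96), version=4)
                for tick in range(n)]
    return {
        "random": out_of_order_fraction(random_ids),
        "comb": out_of_order_fraction(comb_ids),
        "random, presorted": out_of_order_fraction(sorted(random_ids)),
    }


if __name__ == "__main__":
    for name, frac in compare_streams().items():
        print(f"{name:18} {frac:.3f}")
```

The random stream comes out close to 1, while the COMB stream and the presorted random batch both come out at 0. Presorting gets the same effect as the time prefix, but only within the batch that was sorted.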

Pretty interesting.

On Monday, May 14, 2012 7:00:42 PM UTC-7, Dennis wrote:

Anyone ever looked at the indexing (i.e. INSERTing in database parlance)
speed of randomly generated UUIDs vs. COMB UUIDs in Elasticsearch?

http://stackoverflow.com/questions/170346/what-are-the-performance-improvement-of-sequential-guid-over-standard-guid/170363#170363

There are also string equivalents and base64 equivalents. I may try a
simple bash script and see what happens.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Dennis) #3

Just to be sure there wasn't some 'endian' effect in how Elasticsearch and
Postgres index the ids, I flipped the 32 time bits end for end and put them
on the other end of the string. Exact same result. Lucene's indexing of the
'_id' field is apparently quite different from the indexing in Postgres.
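A minimal Python sketch of that flipped layout, assuming the time field is the Unix time in milliseconds shifted right by one and truncated to 32 bits, and that "flipped end for end" means reversing its bit order and placing it in the lowest 32 bits, with the 96 random bits moved to the top. `reverse_bits32` and `flipped_comb_uuid` are hypothetical names. With this layout consecutive ids no longer sort by time, so a B-tree on the id sees effectively random insert positions.

```python
import os
import time
import uuid


def reverse_bits32(x: int) -> int:
    """Reverse the bit order of a 32-bit value (bit 0 <-> bit 31, and so on)."""
    return int(format(x & 0xFFFFFFFF, "032b")[::-1], 2)


def flipped_comb_uuid(now_ms=None, rand=None) -> uuid.UUID:
    """Hypothetical variant: 96 random bits on top, bit-reversed time at the bottom."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    if rand is None:
        rand = os.urandom(12)
    if len(rand) != 12:
        raise ValueError("rand must be exactly 12 bytes (96 bits)")
    prefix = (now_ms >> 1) & 0xFFFFFFFF  # 32 time bits: Unix ms >> 1 (assumed layout)
    value = (int.from_bytes(rand, "big") << 32) | reverse_bits32(prefix)
    # The version/variant bits sit above bit 31, so the time bits are untouched.
    return uuid.UUID(int=value, version=4)


if __name__ == "__main__":
    print(flipped_comb_uuid())
```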

Sometime in the next 6 months, I'll run this same test on CouchDB and maybe
MongoDB.

On Thursday, September 26, 2013 2:45:21 PM UTC-7, Dennis wrote:

