Uniqueness constraint


(Greg-3) #1

Hi all

Is it possible to enforce a uniqueness constraint for a particular field in
Elasticsearch? I would like to prevent multiple documents from being
indexed with the same email address.

Thanks
Greg


(Igor Motov) #2

Elasticsearch doesn't enforce any uniqueness constraints unless the field
is used as the ID. If your documents already have IDs that cannot be
changed, you could index the email addresses as child records. Just make
sure that you use op_type=create so that a new record doesn't overwrite
an existing one.
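
As a rough illustration of the ID approach, here is a minimal Python sketch
using only the standard library. The node address, the "users" index name,
the /{index}/_doc/{id} URL layout (the layout used by recent Elasticsearch
versions; older versions put a type name in the path) and the lowercasing of
the address are all assumptions made for the example. With op_type=create,
Elasticsearch answers 409 Conflict when a document with that ID already
exists.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a node running locally
INDEX = "users"                   # hypothetical index name


def build_create_request(email, doc, base_url=ES_URL, index=INDEX):
    """Build a PUT request that uses the email address as the document ID."""
    # Assumption: addresses are compared case-insensitively, so normalize first.
    doc_id = urllib.parse.quote(email.strip().lower(), safe="")
    url = f"{base_url}/{index}/_doc/{doc_id}?op_type=create"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_user(email, doc):
    """Return True if the document was created, False if the email is taken."""
    try:
        with urllib.request.urlopen(build_create_request(email, doc)):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 409:  # conflict: a document with this ID already exists
            return False
        raise
```

Calling create_user("greg@example.com", {"name": "Greg"}) twice against a
running node should succeed the first time and return False the second time.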


(Greg-3) #3

Thanks Igor, that's really useful to know.

Are there any restrictions on the length of the ID field, or on the
characters that can be used in it?

Greg


(Igor Motov) #4

I cannot think of any length restrictions besides the URL size that your
HTTP client can handle and the available memory. However, considering that
the ID is used in almost all record-related operations, it would be
beneficial to keep it as small as possible.

For character set restrictions, see
https://groups.google.com/d/topic/elasticsearch/zLpG_a2cPz4/discussion
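
One possible way to keep the ID short and limited to plain ASCII is to hash
the email address and use the hex digest as the ID. This is only a sketch of
that idea, not something Elasticsearch requires: the email_to_id helper is a
hypothetical name, and the choice of SHA-256 and of lowercasing the address
are assumptions.

```python
import hashlib


def email_to_id(email):
    """Derive a fixed-length, ASCII-only document ID from an email address."""
    # Assumption: addresses are compared case-insensitively.
    normalized = email.strip().lower()
    # The hex digest of SHA-256 is always 64 characters from [0-9a-f].
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

The trade-off is that the ID no longer shows the address itself, so the
email should still be stored as a normal field in the document.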


(Clinton Gormley) #5

On Thu, 2012-08-02 at 14:33 -0700, Igor Motov wrote:

Elasticsearch doesn't enforce any uniqueness constraints unless the
field is used as the ID. If your documents already have IDs that
cannot be changed, you could index the email addresses as child records.
Just make sure that you use op_type=create so that a new record
doesn't overwrite an existing one.

I wrote a Perl module to handle just this situation:
https://metacpan.org/module/ElasticSearchX::UniqueKey

The downside of the module is that you have an extra index dedicated to
unique keys. The upside is that you can ensure that (e.g.) the email
address is unique, but still allow the user to update their email
address without having to reindex any docs that refer to the user's ID.

The concept used in the module is pretty simple, so it should be easy to
reimplement in the language of your choice.
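
For readers who do not use Perl, the idea of a dedicated unique-key index
can be sketched in Python roughly as follows. This is not the module's API:
the class and method names are hypothetical, and KeyStore is an in-memory
stand-in for the extra index, where create() mimics indexing with
op_type=create (it fails if the ID already exists).

```python
class KeyStore:
    """In-memory stand-in for an index holding one document per claimed key."""

    def __init__(self):
        self._ids = set()

    def create(self, doc_id):
        """Mimic op_type=create: refuse to overwrite an existing ID."""
        if doc_id in self._ids:
            return False
        self._ids.add(doc_id)
        return True

    def delete(self, doc_id):
        self._ids.discard(doc_id)


class UniqueKeys:
    """Claim, release and change unique values such as email addresses."""

    def __init__(self, store):
        self._store = store

    def claim(self, key_name, value):
        """Return True if the value was free and is now reserved."""
        return self._store.create(f"{key_name}:{value}")

    def release(self, key_name, value):
        self._store.delete(f"{key_name}:{value}")

    def change(self, key_name, old, new):
        """Reserve the new value first, then free the old one."""
        if old == new:
            return True
        if not self.claim(key_name, new):
            return False  # the new value is already taken; keep the old one
        self.release(key_name, old)
        return True
```

The user document itself keeps its own ID in the main index, so changing
the email only touches the key index. Note that the two steps in change()
are not atomic: a crash between them would leave the old value reserved.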

clint


(Clinton Gormley) #6

For Perl users, UniqueKey is nicely integrated into Elastic::Model:

https://metacpan.org/module/Elastic::Manual::Attributes::Unique

clint


(Greg-3) #7

Thanks Clint, but I am not using Perl.


(Greg-3) #8

Should I raise this as a feature request, or is there a technical reason
why a field in Elasticsearch can't be made unique?

Greg


(Clinton Gormley) #9

On Fri, 2012-08-03 at 03:17 -0700, Greg wrote:

Thanks Clint, but I am not using Perl.

As I said:

The concept used in the module is pretty simple, so it should be easy
to reimplement in the language of your choice

Even if you don't know Perl, the code together with the docs should be
fairly easy to understand and reimplement:

https://metacpan.org/module/ElasticSearchX::UniqueKey#SYNOPSIS
https://metacpan.org/source/DRTECH/ElasticSearchX-UniqueKey-0.03/lib/ElasticSearchX/UniqueKey.pm


(Greg-3) #10

Sorry, I missed the last part of your message. I will look at doing this,
but I'm really wondering whether this is something that would be useful (or
even possible) to have built into Elasticsearch.


(Richard Louapre) #11

Is Igor's suggestion still the best way to handle uniqueness constraints in
Elasticsearch?

Are there any plans to add a built-in feature?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #12

Can you please elaborate on what kind of uniqueness you are interested in?

Do you mean the same value in the same field? Do you want to ignore the
whole doc while indexing or searching, or only the field? Do you want to
modify _source or leave _source intact? Or do you also want to version
documents, as with ID uniqueness?

Jörg


(Richard Louapre) #13

I need to reject a document at indexing time if the same value in the same
field is found in a different document.

