Force elasticsearch uniqueness constraint


(rubyprince) #1

Hi,

We are using elasticsearch as our primary datastore, and worker processes are
constantly updating its indexes.

Before inserting, we have to check whether another record with the same field
value is already in the database; if one is, we don't write the duplicate. For
this purpose, we refresh the index after each write so that a later search
finds the existing record (without the refresh, a record written less than
1 second earlier, which is the refresh rate of elasticsearch, would not show
up in search). But this is hurting the insert rate of the database. Is there
a better way to check the uniqueness of a field in the database?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

Can you use this field as your document id?
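
To illustrate the suggestion: if the value of the unique field becomes the document id, elasticsearch keeps at most one document per value, and no refresh-then-search step is needed to detect a duplicate. The following is only a minimal sketch under stated assumptions. The helper names `doc_id` and `doc_url` are hypothetical, hashing the value with `sha256sum` (GNU coreutils) is an assumption made so that arbitrary values are safe in a URL path, and the host, index and type simply reuse the names from the example elsewhere in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a deterministic, URL-safe document id from the
# value of the field that must be unique. Hashing is an assumption made here
# so that values containing '/', spaces, etc. cannot break the URL path.
doc_id() {
  printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

# Hypothetical helper: build the document URL for a given unique value.
# Host, index ("twitter") and type ("tweet") are placeholders.
doc_url() {
  printf 'http://localhost:9200/twitter/tweet/%s' "$(doc_id "$1")"
}

# The same input always yields the same URL, so two workers writing the
# same value address the same document.
doc_url 'kimchy@example.com'; echo
```

If the field value is already short and URL-safe, it could probably be used as the id directly, without hashing.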

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

In reply to Prince (princejcet@gmail.com), 20 November 2013 at 09:57:26.


(David Pilato) #3

Moving this to the elasticsearch mailing list because you replied to me instead of the ML.

By default, it will overwrite the existing document.
If you want elasticsearch to fail in that case, use
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1?op_type=create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search"
}'
or
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1/_create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search"
}'
See: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#operation-type
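
A worker also has to tell "created" apart from "already exists" after such a create-only request. The following is a rough sketch, not a definitive implementation. The helper names `classify_create_status` and `create_once` are hypothetical. It assumes elasticsearch answers a successful create with HTTP 201 and an existing id with HTTP 409 (worth verifying against the version in use), and the `Content-Type` header is included because newer elasticsearch versions require it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the HTTP status of a create-only PUT to an outcome.
# Assumption: 201 = document created, 409 = a document with this id exists.
classify_create_status() {
  case "$1" in
    201) echo created ;;
    409) echo duplicate ;;
    *)   echo "error:$1" ;;
  esac
}

# Hypothetical helper: create-only write. $1 = document id, $2 = JSON body.
# The URL shape reuses the example's host, index and type; adjust as needed.
create_once() {
  local status
  # -w '%{http_code}' prints only the status code; the body is discarded.
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -XPUT "http://localhost:9200/twitter/tweet/$1/_create" \
    -H 'Content-Type: application/json' -d "$2")
  classify_create_status "$status"
}
```

A call such as `create_once 1 '{"user" : "kimchy"}'` would then print `created` on the first write and `duplicate` on a repeat, with no index refresh in between.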

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

On 21 November 2013 at 12:51:44, Prince (princejcet@gmail.com) wrote:

What will happen if I try to insert another document with the same id? Will it give an error, or overwrite the existing document?

On Wednesday, November 20, 2013 2:41:44 PM UTC+5:30, David Pilato wrote:
Can you use this field as your document id?



(rubyprince) #4

Awesome man!! Thanks for your great support.

In reply to David Pilato, Thursday, November 21, 2013 5:30:31 PM UTC+5:30.

