Unique Constraint?

I'm having problems understanding where I'm going wrong. I've set up a
simple ES cluster and I am able to insert data fairly quickly using a
simple program.

I would like to be able to define a unique constraint (ID field) on the
data and use it to remove any true duplicates, that is, records that are
identical across all fields. If I can use an ID column that I provide, I
would be able to re-add a range of data knowing I had not created any
duplicates.

It kind of sounded like the ID field does that, but it isn't clear from the
documentation.

Following the ES web page suggestion, I added the following...
http://www.elasticsearch.org/guide/reference/mapping/id-field/

When I run the script below, I get an error saying that the provided ID
does not match the content one.

What am I doing wrong?

curl -XPOST 'http://localhost:9200/log20130207/webservicebolcalls?replication=async' -d '{
"_id": {
"type": "string",
"index": "not_analyzed",
"store" : "yes"
}
}
'
#log20130116

#curl -XPOST 'http://internal-cabinlb-279349935.us-east-1.elb.amazonaws.com:9200/log20130207/webservicebolcalls?replication=async?op_type=create' -d '{
curl -XPOST 'http://localhost:9200/log20130207/webservicebolcalls?replication=async' -d '{
"_timestamp":"1/16/2013 2:03:07 AM",
"_id":"2171985207026",
"Milliseconds":"720",
"MessageId":"e242134d-15b4-46e9-be88-61c96f97a797",
"Direction":"Outbound",
"Webid":"",
"Token":"",
"ErrorCode":"0",
"MessageText":"|results|",
"StackTrace":"",
"Client":"10.3.44.1",
"ClientId":"",
"EmployeeId":""
}
'

#{"ok":true,"_index":"log20130207","_type":"webservicebolcalls","_id":"UE6H-LrARnmqF_JBC5LPkw","_version":1}
#{"error":"MapperParsingException[Failed to parse [_id]]; nested: MapperParsingException[Provided id [TPGerHo3QLKz3jx1UXR09g] does not match the content one [2171985207026]]; ","status":400}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hello Ralph,

The problem seems to be that the URI you're using is of the following
format:

$HOST:$PORT/$INDEX/$TYPE

The other option is to have

$HOST:$PORT/$INDEX/$TYPE/$ID

When you index without ID in the URI, ES creates a random ID for you
(TPGerHo3QLKz3jx1UXR09g in your case), which doesn't match the one you
provide in the document (2171985207026). I would say the way to go is to
skip providing the "_id" field in the document (because ES takes it from
the URI), and put it in the URI, like this:

curl -XPOST 'http://localhost:9200/log20130207/webservicebolcalls/2171985207026?op_type=create' -d '{
"Milliseconds":"720",
[...]
}'
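
To illustrate the suggestion above, here is a minimal sketch of re-adding a record so that a duplicate is rejected rather than created. The helper names (doc_url, classify_status, index_once) and the ES_HOST variable are hypothetical, not part of ES. The 409 handling assumes that ES answers an op_type=create request for an already existing ID with a conflict status, so please check that against your version before relying on it.

```shell
#!/bin/sh
# Sketch only: ES_HOST and the function names below are assumptions for illustration.
ES_HOST="http://localhost:9200"

# Build $HOST:$PORT/$INDEX/$TYPE/$ID with op_type=create appended.
# Arguments: index, type, id.
doc_url() {
  printf '%s/%s/%s/%s?op_type=create' "$ES_HOST" "$1" "$2" "$3"
}

# Map an HTTP status code to an outcome word.
classify_status() {
  case "$1" in
    2??) echo created ;;
    409) echo duplicate ;;  # assumed: the ID already exists and nothing was overwritten
    *)   echo error ;;
  esac
}

# index_once INDEX TYPE ID JSON_FILE
# POST the document (which should not contain an "_id" field) and print the outcome.
index_once() {
  status=$(curl -s -o /dev/null -w '%{http_code}' -XPOST \
    "$(doc_url "$1" "$2" "$3")" -d @"$4")
  classify_status "$status"
}
```

With a hypothetical file doc.json holding the document body, running `index_once log20130207 webservicebolcalls 2171985207026 doc.json` twice should print "created" the first time and, under the assumption above, "duplicate" the second time, which is what makes re-adding a range of data safe.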

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

On Tue, May 7, 2013 at 10:21 PM, Ralph Trickey ralphtrickey@gmail.com wrote:



Thanks Radu, I'll give that a try.
