Lowercase id field


(Mustafa Sener) #1

Hi,
To get a document by a known id, we have to be careful about case
sensitivity. This is problematic in some situations. However, if we
convert all id fields to lowercase, the problem is solved. We modified the
following files to convert the id to its lowercase version:
SinglePingRequest.java
DeleteRequest.java
GetRequest.java
IndexRequest.java
SingleShardOperationRequest.java

Is it possible to make such a modification in ES master? Does it make sense?

Thanks...

Mustafa Sener
www.ifountain.com


(James Cook) #2

You can specify a custom analyzer which will index the id as a lowercase
value.

In our Elasticsearch properties we specify:

index.analysis.analyzer.lowercase_keyword.type=custom
index.analysis.analyzer.lowercase_keyword.tokenizer=keyword
index.analysis.analyzer.lowercase_keyword.filter.0=lowercase

Then in the mapping file, we specify the use of this analyzer:

"username":{"type": "string", "index": "analyzed",
"analyzer":"lowercase_keyword"},

Jim Cook
tracermedia interactive http://www.tracermedia.com/



(Shay Banon) #3

I don't think it makes sense to have case-insensitive ids. As one of many examples, some auto-generated ids rely on being case sensitive. elasticsearch itself, by the way, base64-encodes the UUIDs it auto-generates to make them smaller.
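
To illustrate the point about case-sensitive generated ids, here is a minimal Python sketch. It is not elasticsearch's actual id generator; `encode_id` is a hypothetical helper and the byte values are chosen by hand. It shows two different byte strings whose base64 forms differ only in case, so lowercasing them would merge two distinct ids:

```python
import base64


def encode_id(raw: bytes) -> str:
    """Hypothetical helper: URL-safe base64 without '=' padding."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Two different 3-byte values, picked so their encodings differ only in case.
raw_a = bytes([0x00, 0x00, 0x00])
raw_b = bytes([0x69, 0xA6, 0x9A])

id_a = encode_id(raw_a)
id_b = encode_id(raw_b)

# Distinct as generated, identical once lowercased.
collides_after_lowercasing = id_a != id_b and id_a.lower() == id_b.lower()
print(id_a, id_b, collides_after_lowercasing)  # AAAA aaaa True
```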


(dbenson) #4

We ended up making our id field case insensitive, as we were getting
duplicates. We had some data providers who weren't consistent in their
case usage, so an updated document would appear as a duplicate.

We ended up solving this in our index submission app.

David
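
A minimal sketch of what normalizing ids in an index submission app might look like, assuming records arrive as dictionaries with an `id` key. This is not David's actual code; `normalize_id` and `dedupe_by_id` are hypothetical names:

```python
def normalize_id(doc_id: str) -> str:
    """Lowercase the id so that 'ABC-1' and 'abc-1' refer to one document."""
    return doc_id.lower()


def dedupe_by_id(records):
    """Keep the last record seen for each normalized id (last write wins)."""
    latest = {}
    for record in records:
        latest[normalize_id(record["id"])] = record
    return latest


# Assumed sample data: the same document sent twice with inconsistent casing.
records = [
    {"id": "ABC-1", "title": "first version"},
    {"id": "abc-1", "title": "updated version"},
]
docs = dedupe_by_id(records)
```

With this in place, the update replaces the earlier document instead of showing up as a second one, at the cost of treating ids that differ only in case as the same document.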


(Mustafa Sener) #5

I think we cannot specify an analyzer for id fields, right? I agree with David.
We have the same situation here: some data providers cause duplicate records. I
think there should at least be an option for this when defining the mapping for
id fields.


--
Mustafa Sener
www.ifountain.com


(Shay Banon) #6

No, you can't do it with an analyzer on the id field. The casing has to be handled before the request reaches a shard, since the hashing needs to take it into account. It can be added as a feature in the mapping definition.
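
A rough sketch of why the casing has to be settled before a shard is chosen, assuming the shard is picked by hashing the id modulo the number of shards. CRC32 is only a stand-in here, not elasticsearch's own hash function, and `shard_for` and `shard_for_normalized` are hypothetical names:

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard from the id as given. CRC32 is a stand-in hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def shard_for_normalized(doc_id: str, num_shards: int) -> int:
    """Lowercase first, so every casing of an id maps to the same shard."""
    return shard_for(doc_id.lower(), num_shards)


# Without normalization, "Doc-42" and "DOC-42" hash different bytes and may
# land on different shards; with it, they always agree.
same_shard = shard_for_normalized("Doc-42", 5) == shard_for_normalized("DOC-42", 5)
print(same_shard)  # True
```

An analyzer runs on the shard at indexing time, which is after this choice has already been made.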


(James Cook) #7

Yes, sorry for the misleading answer. I didn't realize that by "id", the OP
meant the "_id" field.

-- jim



(Mustafa Sener) #8

I created an issue for this feature.

Thanks


