Tweet id (long number) being rounded

Walter_dos_Santos_Fi · February 29, 2012, 2:27pm

Hi,

I am trying to index some tweets to learn more about ES. It seems to be
a very nice product! But I am struggling with a problem. I created an index
and defined a mapping to index only 3 fields (screen_name, geo and
created_at). The index also stores the source.

Tweets have a 64-bit identifier (id). But when the document is
inserted into ES, this field is stored with a rounded value in the
_source field. For example, the value 174204927055368192 becomes
174204927055368200.
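The specific rounding in the example matches what happens when an integer passes through an IEEE 754 double and is then printed with the shortest decimal string that maps back to the same double, which is how JavaScript formats numbers and how Python's repr(float) works. The sketch below reproduces that with the Python standard library only (Python because the poster uses pyes). It illustrates the mechanism; it is an assumption, not something established in this thread, that this is exactly where the conversion happened in this particular pipeline.

```python
from decimal import Decimal

tweet_id = 174204927055368192

# A double has a 53-bit significand, so not every integer above 2**53
# is representable: 2**53 + 1 rounds to 2**53.
print(int(float(2**53 + 1)))        # 9007199254740992

# This particular id happens to be exactly representable as a double...
as_double = float(tweet_id)
print(int(as_double) == tweet_id)   # True

# ...but the shortest decimal string that maps back to the same double
# has only 16 significant digits.
shortest = repr(as_double)
print(shortest)                     # 1.742049270553682e+17

# Written out as an integer, that string is the "rounded" value.
print(int(Decimal(shortest)))       # 174204927055368200
```

So the digits are lost in the text form of the number, not by Elasticsearch: any client that parses JSON numbers as doubles and re-serializes them can produce this value.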

I tried to find a setting in the documentation to prevent this, but I
couldn't find one.

I am using pyes over HTTP and the latest version of ES. I also tried the
Thrift plugin (1.0.0), without success.

Any help will be appreciated. Thanks in advance.
---- mapping definition ---
{
  "tweet" : {
    "_ttl" : { "enabled" : true, "default" : "120d" },
    "_all" : { "enabled" : false },
    "_source" : { "enabled" : true, "compress" : true },
    "dynamic" : false,
    "properties" : {
      "created_at" : { "type" : "date", "index" : "not_analyzed", "format" : "yyyy-MM-dd'T'HH:mm:ss.SSSSSS" },
      "text" : { "type" : "string", "analyzer" : "default" },
      "user" : {
        "dynamic" : false,
        "type" : "object",
        "properties" : {
          "screen_name" : { "type" : "string", "index" : "not_analyzed" }
        }
      },
      "geo" : {
        "dynamic" : false,
        "type" : "object",
        "properties" : {
          "coordinates" : { "type" : "geo_point", "lat_lon" : true }
        }
      }
    }
  }
}
--- end mapping definition --

kimchy · February 29, 2012, 4:30pm

Long values are signed 64-bit integers; provide the id as a string instead (which is what Twitter recommends for ids: its API has long provided a string version, id_str, alongside the numeric id). By the way, the rounding happens on your end when you construct the JSON. The stored _source is the exact bytes of the JSON you provided, so when you ask for it, you get it back exactly as it was sent.
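A minimal Python sketch of the string-id suggestion, using only the standard json module rather than any pyes call. The helper tweet_to_doc and the sample tweet are hypothetical, made up for illustration. The helper prefers Twitter's own id_str when the payload has one and otherwise stringifies the integer id.

```python
import json


def tweet_to_doc(tweet: dict) -> str:
    """Hypothetical helper: serialize a tweet for indexing, id as a string."""
    doc = {
        # Prefer Twitter's string form if present; else stringify the int.
        "id": tweet.get("id_str") or str(tweet["id"]),
        "created_at": tweet["created_at"],
        "user": {"screen_name": tweet["user"]["screen_name"]},
    }
    return json.dumps(doc)


# Made-up sample payload; Python's json parses integer literals into
# arbitrary-precision ints, so the id is still exact at this point.
raw = ('{"id": 174204927055368192, '
       '"created_at": "2012-02-29T14:27:00.000000", '
       '"user": {"screen_name": "example"}}')
tweet = json.loads(raw)
body = tweet_to_doc(tweet)
print(body)
```

Because the id is now a JSON string, no client that later reads the stored _source has to parse it as a number. With the mapping above ("dynamic": false and no id property) the field is not indexed, only kept in _source.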
