I get huge differences in index size when bulk indexing large MultiPolygons on two versions of ES (1.5.2 and 5.5.0) with the following mapping:
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "lowercase": {
          "type": "custom",
          "filter": "lowercase",
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "_default_": {
      "properties": {
        "geometry": {
          "precision": "1m",
          "tree": "quadtree",
          "type": "geo_shape"
        },
        "uri": {
          "index": "not_analyzed",
          "type": "string"
        },
        "id": {
          "index": "not_analyzed",
          "store": true,
          "type": "string"
        },
        "type": {
          "index": "not_analyzed",
          "type": "string"
        },
        "name": {
          "fields": {
            "analyzed": {
              "index": "analyzed",
              "store": true,
              "type": "string"
            },
            "exact": {
              "analyzer": "lowercase",
              "store": true,
              "type": "string"
            }
          },
          "type": "string"
        },
        "dataset": {
          "index": "not_analyzed",
          "type": "string"
        },
        "validSince": {
          "format": "date_optional_time",
          "type": "date"
        },
        "validUntil": {
          "format": "date_optional_time",
          "type": "date"
        }
      }
    }
  }
}
What could be the reason for this?
The 1.5.2 version runs on an AWS cluster and the 5.5.0 version on a single machine, with ES getting 16 GB of RAM.
The index size on the cluster is 7.4 MB, while on the single node it is 4.8 GB (using a precision of 10m, since otherwise it runs out of heap space).
I get the same behaviour if I update the mapping to use text instead of string for 5.5.0.
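To give a rough sense of why the precision setting weighs so heavily on index size and heap, here is a back-of-the-envelope sketch. It is not Elasticsearch's actual formula: the function name approx_quadtree_levels and the circumference constant are assumptions made for illustration. It assumes only that each quadtree level halves the cell width, starting from a cell as wide as the equator.

```python
import math

# Assumption: approximate equatorial circumference of the Earth, in metres.
EARTH_EQUATOR_M = 40_075_017


def approx_quadtree_levels(precision_m: float) -> int:
    """Estimate how many halvings are needed before a cell is no wider
    than precision_m. Hypothetical helper, not an Elasticsearch API."""
    return math.ceil(math.log2(EARTH_EQUATOR_M / precision_m))


for precision_m in (1, 10):
    levels = approx_quadtree_levels(precision_m)
    print(f"precision {precision_m}m -> about {levels} levels")
```

Under these assumptions, 1m needs about 26 levels and 10m about 22. Since the number of finest-level cells along a polygon boundary grows roughly in proportion to perimeter divided by cell width, a few extra levels can mean many more indexed terms per shape. This does not by itself explain the difference between the two versions.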
Stefano
