Simon,
I wrote a small C# .NET app to index directly into ES and got the same size difference, along with very slow indexing.
For ES 1.7.2 I used NEST client 1.7.2, and for ES 2.3.2 I used NEST client 2.3.2. I think the 1.7.2 client would also have been able to index into ES 2.3.2, but I tested with matching clients to stay within the supported ES/client combinations.
Stats from the head plugin:

ES 1.7.2 (index upgarde-test)
size: 15.5Mi (15.5Mi)
docs: 2,000 (2,000)

ES 2.3.2 (index upgarde-test)
size: 2.84Gi (2.84Gi)
docs: 2,000 (2,000)
Mapping (identical in both versions):
{
  "mappings": {
    "tomato": {
      "_all": { "enabled": false },
      "properties": {
        "id": { "type": "string", "index": "not_analyzed", "doc_values": false },
        "dateTimeCreated": { "type": "date" },
        "dateTimeModified": { "type": "date" },
        "name": { "type": "string" },
        "description": { "type": "string" },
        "isPublic": { "type": "boolean" },
        "tomatoCenter": {
          "type": "geo_point",
          "geohash": true,
          "geohash_prefix": true,
          "geohash_precision": 3
        },
        "tomatoShape": {
          "type": "geo_shape",
          "tree": "quadtree",
          "precision": "1m"
        },
        "type": { "type": "string" },
        "farmId": { "type": "string", "index": "not_analyzed", "doc_values": false }
      }
    }
  }
}
Sample document (same content in both versions; all shapes are of type "circle"):
{
  "_index": "upgarde-test",
  "_type": "tomato",
  "_id": "AVURtqKTfJzofNCcHO8s",
  "_version": 1,
  "_score": 1,
  "_source": {
    "description": "Random batchNo5 #tomatoIdx3",
    "tomatorShape": {
      "coordinates": [ -87.21241972439854, 41.53887701360597 ],
      "type": "circle",
      "radius": 4444
    },
    "tomatoCenter": [ -87.21241972439854, 41.53887701360597 ],
    "isPublic": true,
    "name": "Batch#5 Count#3",
    "tags": [ "batchNo5", "tomatoIdx3" ],
    "type": "tomato",
    "farmId": "farm_c6840442-7312-473c-8501-ed035dcc65bf",
    "dateTimeCreated": "0001-01-01T00:00:00"
  }
}
C# code used to index (one bulk request per batch of 200 documents):
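// i is the batch number from the enclosing loop (not shown).
// GetRandomTomato, EsIndexName and ElasticNestClient are defined elsewhere in the app.
var bulkDescriptor = new BulkDescriptor();
for (int b = 0; b < 200; b++)
{
    // Build one random document (batch i, item b) and queue it as an index operation.
    var tomato = GetRandomTomato(i, b);
    bulkDescriptor.Index<Tomato>(op => op.Document(tomato).Index(EsIndexName));
}
// Send all 200 queued operations in a single bulk request.
var result = ElasticNestClient.Bulk(bulkDescriptor);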