Index size

Your mapping looks fine. It seems like you store each field individually.
You can do it another way though: instead of storing each field, you can
store _source but exclude the data field; then you will be able to
compress it as well (see the exclude part of the _source field
documentation).
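A minimal sketch of that approach, assuming the mapping from the quoted message below (the excludes/compress syntax follows the _source field documentation; field names other than "data" are kept from the original mapping):

```json
"csv" : {
  "_source" : {
    "enabled" : true,
    "compress" : true,
    "excludes" : ["data"]
  },
  "properties" : {
    "data" : {
      "type" : "string"
    }
  }
}
```

With this, the data field is still indexed and searchable, but its value is dropped from the stored _source, and the remaining _source is stored compressed.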

Reducing the number of segments in the actual merge configuration will mean
slower indexing. I suggested using the optimize call to reduce it once you
have loaded the data (it's a one-time effort).
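For reference, a sketch of that one-time optimize call over REST (hypothetical index name `csv_index`; `max_num_segments` is the parameter the optimize API documents):

```shell
# After bulk loading, merge the index down to a single segment.
# One-time operation; do not put this in the regular indexing path.
curl -XPOST 'http://localhost:9200/csv_index/_optimize?max_num_segments=1'
```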

On Sat, Jan 14, 2012 at 7:42 PM, slavag slavago@gmail.com wrote:

The mapping is :
"csv" : {
"_source" : {
"enabled" : false
},
"properties" : {
"id" : {
"index" : "not_analyzed",
"store" : "yes",
"type" : "string"
},
"source" : {
"index" : "not_analyzed",
"store" : "yes",
"type" : "string"
},
"file" : {
"index" : "no",
"store" : "yes",
"type" : "string"
},
"taskid" : {
"index" : "not_analyzed",
"store" : "yes",
"type" : "string"
},
"name" : {
"index" : "not_analyzed",
"store" : "yes",
"type" : "string"
},
"data" : {
"type" : "string"
},
"date" : {
"store" : "yes",
"format" : "dateOptionalTime",
"type" : "date"
},
"account" : {
"index" : "not_analyzed",
"store" : "yes",
"type" : "string"
}
}
}

The CSV has about 55k rows; each row has 5-10 fields, and each field can
hold a list of values, so all the values within the same row are indexed as
one list (array) mapped into the "data" field (as you can see in the
mapping). The total amount of source data, including the additional fields,
is about 25 MB, and the index on disk is also about the same size (more or
less).
So the question is: is this OK?
And another question: could changing max_segments affect indexing speed or
query speed? Where is this configuration in the Java API?

Thank You and Best Regards.