[https://www.elastic.co/guide/en/elasticsearch/guide/current/_deep_dive_on_doc_values.html]
According to the above link, doc values for numeric fields should (in general) use less disk space than doc values for string fields. I ran a simple test to check this, but the results were the opposite. Please refer to the steps below. (I tried this on both 2.3.2 and 5.4.1.)
I created two indices, namely str_index and int_index.
str_index has 5 keyword fields (with field mapping {"norms":false,"index":"not_analyzed","type":"keyword","doc_values":true})
int_index has 5 integer fields (with field mapping {"type":"integer","doc_values":true})
Refer to the following code snippet:
public static void indexTestData()
{
    int fieldCardinality = 5000;
    int numDocsToBeIndexed = 10000000; // 10 million
    for (int i = 0; i < numDocsToBeIndexed; i++)
    {
        // Same doc number and cardinality for both indices; only the value type differs.
        JSONObject strIndexDoc = getDocumentJson(i, fieldCardinality, true);
        JSONObject intIndexDoc = getDocumentJson(i, fieldCardinality, false); // Did Bulk indexing on every 1000 docs.
    }
}

public static JSONObject getDocumentJson(int docnumber, int fieldCardinality, boolean strField)
{
    // Each field cycles through fieldCardinality distinct values.
    int value = docnumber % fieldCardinality;
    JSONObject json = new JSONObject();
    json.put("f1", strField ? "This is field1 doc value" + value : value);
    json.put("f2", strField ? "This is field2 doc value" + value : value);
    json.put("f3", strField ? "This is field3 doc value" + value : value);
    json.put("f4", strField ? "This is field4 doc value" + value : value);
    json.put("f5", strField ? "This is field5 doc value" + value : value);
    return json;
}
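The snippet above leaves out the bulk step. A minimal, self-contained sketch of how the batching of 1000 docs could look is given here for illustration only: it uses a plain Map instead of JSONObject, and the names BulkBatchSketch, buildDoc, indexInBatches and the sink callback are hypothetical stand-ins, not the actual test code or any Elasticsearch client API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class BulkBatchSketch {

    // Same value scheme as getDocumentJson, but with a plain Map (assumption: no JSON library).
    static Map<String, Object> buildDoc(int docNumber, int fieldCardinality, boolean strField) {
        int value = docNumber % fieldCardinality;
        Map<String, Object> doc = new LinkedHashMap<>();
        for (int f = 1; f <= 5; f++) {
            Object v = strField
                    ? ("This is field" + f + " doc value" + value)
                    : Integer.valueOf(value);
            doc.put("f" + f, v);
        }
        return doc;
    }

    // Hypothetical helper: collects docs into batches and hands each batch to `sink`,
    // which stands in for whatever actually sends the bulk request.
    static int indexInBatches(int numDocs, int fieldCardinality, boolean strField,
                              int batchSize, Consumer<List<Map<String, Object>>> sink) {
        int batches = 0;
        List<Map<String, Object>> batch = new ArrayList<>(batchSize);
        for (int i = 0; i < numDocs; i++) {
            batch.add(buildDoc(i, fieldCardinality, strField));
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batches++;
                batch = new ArrayList<>(batchSize);
            }
        }
        // Flush the last, partially filled batch.
        if (!batch.isEmpty()) {
            sink.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Small run for demonstration; the real test used 10 million docs.
        int batches = indexInBatches(2500, 5000, true, 1000,
                b -> System.out.println("would send " + b.size() + " docs"));
        System.out.println(batches + " batches");
    }
}
```

The sink is kept abstract on purpose, so the same loop can feed str_index and int_index with identical doc numbers.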
The results: str_index is 320.2 MB and int_index is 373.5 MB.
Any ideas?
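For scale, these totals can be turned into per-document figures. A rough sketch, assuming "mb" means 2^20 bytes and that all 10 million docs ended up in each index (the class and method names are made up for illustration). Note that the reported index size presumably covers everything on disk for the index, not only the doc values files.

```java
public class SizePerDocSketch {

    // Converts an index size in MB (assumed 1 MB = 1024 * 1024 bytes) to bytes per document.
    static double bytesPerDoc(double sizeMb, long numDocs) {
        return sizeMb * 1024 * 1024 / numDocs;
    }

    public static void main(String[] args) {
        long numDocs = 10_000_000L; // from the test above
        int numFields = 5;          // f1..f5

        double strBytes = bytesPerDoc(320.2, numDocs);
        double intBytes = bytesPerDoc(373.5, numDocs);

        System.out.printf("str_index: %.1f bytes/doc, %.1f bytes/field value%n",
                strBytes, strBytes / numFields);
        System.out.printf("int_index: %.1f bytes/doc, %.1f bytes/field value%n",
                intBytes, intBytes / numFields);
    }
}
```

Under those assumptions this comes to roughly 33.6 bytes per doc for str_index and 39.2 bytes per doc for int_index, i.e. a difference of about 5.6 bytes per doc across the 5 fields.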