Doc_values disk usage

[https://www.elastic.co/guide/en/elasticsearch/guide/current/_deep_dive_on_doc_values.html]

According to the above link, doc_values for numeric fields should, in general, use less disk space than doc_values for string fields. I ran a simple test to check this, but the results were the opposite. Please refer to the steps below. (I tried this on both 2.3.2 and 5.4.1.)

I created two indices, str_index and int_index.

str_index has 5 keyword fields (with field mapping {"norms":false,"index":"not_analyzed","type":"keyword","doc_values":true})

int_index has 5 integer fields (with field mapping {"type":"integer","doc_values":true})

Refer to the following code snippet:

public static void indexTestData()
{
    int fieldCardinality = 5000;
    int numDocsToBeIndexed = 10000000; // 10 million
    for (int i = 0; i < numDocsToBeIndexed; i++)
    {
        // One document per index for every i: string values for str_index, integers for int_index.
        JSONObject strIndexDoc = getDocumentJson(i, fieldCardinality, true);
        JSONObject intIndexDoc = getDocumentJson(i, fieldCardinality, false);
        // Did bulk indexing on every 1000 docs.
    }
}

public static JSONObject getDocumentJson(int docnumber, int fieldCardinality, boolean strField)
{
    // The value cycles through 0..fieldCardinality-1, so each field has 5000 distinct values.
    int value = docnumber % fieldCardinality;
    JSONObject json = new JSONObject();
    json.put("f1", strField ? "This is field1 doc value" + value : value);
    json.put("f2", strField ? "This is field2 doc value" + value : value);
    json.put("f3", strField ? "This is field3 doc value" + value : value);
    json.put("f4", strField ? "This is field4 doc value" + value : value);
    json.put("f5", strField ? "This is field5 doc value" + value : value);
    return json;
}
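To make the shape of the test data easier to see, here is a minimal sketch that mirrors getDocumentJson but returns a plain Map, so it runs without any JSON library. The class name TestDataShape and the helpers buildDoc and distinctValues are hypothetical names used only for illustration; they are not part of the original test.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class TestDataShape {
    // Hypothetical stand-in for getDocumentJson: same values, plain Map instead of JSONObject.
    public static Map<String, Object> buildDoc(int docNumber, int fieldCardinality, boolean strField) {
        int value = docNumber % fieldCardinality;
        Map<String, Object> doc = new LinkedHashMap<>();
        for (int f = 1; f <= 5; f++) {
            Object fieldValue = strField
                    ? ("This is field" + f + " doc value" + value)
                    : Integer.valueOf(value);
            doc.put("f" + f, fieldValue);
        }
        return doc;
    }

    // Counts how many distinct values one field takes over the first numDocs documents.
    public static int distinctValues(String field, int numDocs, int fieldCardinality, boolean strField) {
        Set<Object> seen = new HashSet<>();
        for (int i = 0; i < numDocs; i++) {
            seen.add(buildDoc(i, fieldCardinality, strField).get(field));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Document 5001 wraps around to value 1 in both variants.
        System.out.println(buildDoc(5001, 5000, true));
        System.out.println(buildDoc(5001, 5000, false));
        // A smaller document count is enough to see the cardinality cap.
        System.out.println(distinctValues("f1", 20000, 5000, true));
    }
}
```

Within one document all five integer fields hold the same number, while the five string fields differ only in the field number embedded in the text. With 10 million documents and a cardinality of 5000, each distinct value occurs 2000 times per field.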

The results: str_index is 320.2mb and int_index is 373.5mb.
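For scale, the two sizes can be broken down into bytes per stored field value. The sketch below assumes that "mb" means 1024 * 1024 bytes; the class and method names are hypothetical. Note that the reported figures are whole-index sizes, so they presumably also include _source and the other index structures, not doc_values alone.

```java
public class SizePerValue {
    // Assumption: the reported "mb" unit is 1024 * 1024 bytes.
    static final long BYTES_PER_MB = 1024L * 1024L;

    // Average on-disk bytes per field value, given the total index size.
    public static double bytesPerFieldValue(double indexSizeMb, long numDocs, int numFields) {
        return indexSizeMb * BYTES_PER_MB / numDocs / numFields;
    }

    public static void main(String[] args) {
        long numDocs = 10_000_000L; // documents indexed in the test
        int numFields = 5;          // f1..f5
        System.out.printf("str_index: %.2f bytes per field value%n",
                bytesPerFieldValue(320.2, numDocs, numFields));
        System.out.printf("int_index: %.2f bytes per field value%n",
                bytesPerFieldValue(373.5, numDocs, numFields));
    }
}
```

Under that assumption this works out to roughly 6.7 bytes per field value for str_index and 7.8 for int_index, so the integer index is about 17% larger overall.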

Any ideas?

Any idea? @Adrein Grand
