Question on Index Size


(Rahul Sharma) #1

Hi,

I was trying to understand how much space an index takes for a given amount
of input data.

Below is the scenario and observation:

  1. For the test I picked a 10-column CSV with 1000 rows. The size of the
    CSV is 111 KB.
  2. I created 2 fields (of type string) for each column: one analyzed, to
    run search, and one not analyzed, to run facets.
  3. The index was configured with 5 shards.
  4. After indexing I found that the size of the index was 4.5 MB (this
    includes all 5 shards, the translog, etc.).
  5. That is roughly 40 times the original size, which is a very
    significant increase.
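
The two fields per column described in step 2 can be sketched as a multi_field mapping (a sketch only: the field names are placeholders, and the multi_field syntax here assumes the pre-1.0 mapping format):

```json
{
  "doc": {
    "properties": {
      "column1": {
        "type": "multi_field",
        "fields": {
          "column1":   { "type": "string", "index": "analyzed" },
          "untouched": { "type": "string", "index": "not_analyzed" }
        }
      }
    }
  }
}
```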

Then I tried not to include _source and found that the index size reduced
by about 25%, coming down to 3 MB, which is still significant.

Am I missing something? Or are there other ways to reduce the size of the
indices?

Thanks
Rahul


(Shay Banon) #2

Store-level compression is one way to compress the data (available in 0.19.8). Note that CSV by itself is quite different from the JSON format.
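
As a sketch, store-level compression is enabled per index through settings along these lines (setting names as documented for the 0.19.x line; verify against the exact release you run):

```json
{
  "index.store.compress.stored": true,
  "index.store.compress.tv": true
}
```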

On Aug 5, 2012, at 2:12 PM, Rahul Sharma rahul.sharma.coder@gmail.com wrote:

Hi,

I was trying to understand how much size an index would take for a certain size of input data.

Below is the scenario and observation:

  1. For the purpose I picked 10 column csv with 1000 rows. The size of the csv is 111 KB.
  2. I created 2 Field (of type String) for each column. 1 Analyzed to run search and 1 Not Analyzed to run facet.
  3. The index was configured to create 5 segments.
  4. After indexing I found that the size of the index was 4.5MB. (This includes all 5 shards , trans log etc...)
  5. Which means its almost 45 times more than the original size. Which is very significant increase.

Then I tried not to include _source and found that the index size reduced by 25%. Came down to 3 mb. Which is still significant.

Am I missing something? Or is there any other ways to reduce the size of Indices?

Thanks
Rahul


(Rahul Sharma) #3

I am on 0.19.3.

I tried the following, which helped:

  1. Reduced the size of the field keys.
  2. Enabled _source compression.

This halved the index size, but in my context it is still very big.

  1. I tried removing _source. This reduced the size of the index by a
    further 50%. But if I understand correctly, search will stop working,
    as it relies on _source.
  2. Since I store each value as a field for faceting, I am wondering
    whether there is a way to reconstruct the document from the fields I
    index as "not analyzed".
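
A sketch of both ideas, assuming the pre-1.0 mapping syntax: _source compression goes in the type mapping, and individual values can be fetched back from stored fields instead of _source (the field must be marked as stored for this to work, and the names here are placeholders):

```json
{
  "doc": {
    "_source": { "compress": true },
    "properties": {
      "column1": {
        "type": "string",
        "index": "not_analyzed",
        "store": "yes"
      }
    }
  }
}
```

and then request the stored field in the search instead of the source:

```json
{
  "query": { "match_all": {} },
  "fields": [ "column1" ]
}
```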

Thanks
Rahul

On Wed, Aug 8, 2012 at 3:30 AM, Shay Banon kimchy@gmail.com wrote:

Store level compression is one way to compress the data (available in
0.19.8). Note that CSV is quite different by itself than json format.

On Aug 5, 2012, at 2:12 PM, Rahul Sharma rahul.sharma.coder@gmail.com
wrote:

Hi,

I was trying to understand how much size an index would take for a certain
size of input data.

Below is the scenario and observation:

  1. For the purpose I picked 10 column csv with 1000 rows. The size of the
    csv is 111 KB.
  2. I created 2 Field (of type String) for each column. 1 Analyzed to run
    search and 1 Not Analyzed to run facet.
  3. The index was configured to create 5 segments.
  4. After indexing I found that the size of the index was 4.5MB. (This
    includes all 5 shards , trans log etc...)
  5. Which means its almost 45 times more than the original size. Which
    is very significant increase.

Then I tried not to include _source and found that the index size reduced
by 25%. Came down to 3 mb. Which is still significant.

Am I missing something? Or is there any other ways to reduce the size of
Indices?

Thanks
Rahul


(Rahul Sharma) #4

When you say store-level compression, do you mean compression works only
for fields marked field("store", "yes"), or for other fields as well?
Is it in addition to _source compression?

Does it impact faceting performance?


