I don't run a blog, but I thought I would share some results with the
community.
Using Elasticsearch 1.4.3.
I wanted to test the various ways we could save some storage on our ES
index, and here are some numbers.
I created 6 different indexes, each with different mapping settings.
Each index contains 4 types.
I inserted 100,000 documents per type, so 400,000 in total per index.
The average document size is 300-400 bytes.
The values represent the total primary space taken by each index under
its mapping settings.
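For anyone who wants to try a similar comparison, the mapping variants could be generated roughly like this. It is only a sketch under assumptions: the post does not list the six settings that were tested, so the code covers just the four on/off combinations of _source and _all, and the type names, index names and function name are made up for illustration. The `"enabled"` flags follow the Elasticsearch 1.x mapping syntax.

```python
import itertools
import json

# Hypothetical type names; the post only says each index has 4 types.
TYPE_NAMES = ["type1", "type2", "type3", "type4"]


def build_index_body(source_enabled, all_enabled, type_names=TYPE_NAMES):
    """Return a create-index body that toggles _source and _all for every type."""
    mappings = {}
    for name in type_names:
        mappings[name] = {
            "_source": {"enabled": source_enabled},
            "_all": {"enabled": all_enabled},
        }
    return {"mappings": mappings}


# Four combinations of the two flags (an assumption, not the six tested settings).
# Index names are lowercased because Elasticsearch requires lowercase index names.
variants = {
    "source_%s_all_%s" % (str(s).lower(), str(a).lower()): build_index_body(s, a)
    for s, a in itertools.product([True, False], repeat=2)
}

if __name__ == "__main__":
    for index_name, body in sorted(variants.items()):
        print(index_name, json.dumps(body["mappings"]["type1"], sort_keys=True))
```

Each body would be sent when creating its index, before any documents are inserted, so that the setting applies to everything that gets indexed.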
If you have no _source you cannot reindex or view the actual raw content
that was sent to ES, only the analysed portions you keep.
No _all means you have to know the exact field you want to search on or
else you may get no results, as ES will search _all by default (think of it
as a shortcut search field).
As an aside, we are working on adding a new compression algorithm for ES
which will also improve storage capacity.
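To make the _all point concrete, here is a rough sketch of the two kinds of query body. The helper name and the "message" field are made up for illustration; the only thing taken from the thread is that a search with no field named ends up against _all.

```python
def build_match_query(text, field=None):
    """Build a match query body, falling back to _all when no field is named."""
    target = field if field is not None else "_all"
    return {"query": {"match": {target: text}}}


# "message" is a hypothetical field name.
by_field = build_match_query("timeout", field="message")
by_all = build_match_query("timeout")
```

With _all disabled in the mapping, the second form would be expected to come back empty, so clients have to use the field-specific form.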
I think there are some interesting use cases here. For instance, if you are
building a pure analytics dashboard where it's 100% aggregations, then you
can save a lot of space with _source: false, _all: false.
In my case I'm opting for _source: true, _all: false, since I need to
re-index documents but don't care about the _all search. My users are
required to pick the field they want to search on from a drop-down, so
it's good for the 25% saving.
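A saving like that can be worked out from the primary store size of each index. The sketch below assumes a response shaped like the indices stats API output (indices, then index name, then primaries, store, size_in_bytes). The index names and sizes are invented for illustration and are not the figures from the test.

```python
def primary_size_bytes(stats_response, index_name):
    """Read the primary store size from an indices-stats style response."""
    return stats_response["indices"][index_name]["primaries"]["store"]["size_in_bytes"]


def percent_saving(baseline_bytes, candidate_bytes):
    """Percentage of space saved by the candidate relative to the baseline."""
    if baseline_bytes <= 0:
        raise ValueError("baseline size must be positive")
    return 100.0 * (baseline_bytes - candidate_bytes) / baseline_bytes


# Invented sizes, for illustration only.
example_stats = {
    "indices": {
        "baseline": {"primaries": {"store": {"size_in_bytes": 1000}}},
        "candidate": {"primaries": {"store": {"size_in_bytes": 800}}},
    }
}

saving = percent_saving(
    primary_size_bytes(example_stats, "baseline"),
    primary_size_bytes(example_stats, "candidate"),
)

if __name__ == "__main__":
    print("saving: %.1f%%" % saving)
```

Comparing primaries rather than totals keeps replicas out of the picture, which matches how the values in the original post are described.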
On Monday, 23 February 2015 17:04:35 UTC-5, Jack Park wrote:
Thank you very much, Mark.
On Mon, Feb 23, 2015 at 12:54 PM, Mark Walkom <markw...@gmail.com> wrote:
Thanks John, this is a really interesting test.
What is lost (the tradeoff) when _source is disabled?
What is lost when _all is disabled?
This is interesting!
Thanks
Jack
On Mon, Feb 23, 2015 at 12:10 PM, John Smith <java.d...@gmail.com> wrote:
Mark, when you say you cannot re-index the document, do you mean re-index
within the cluster? If we resubmit the document using the index API, it will
get re-indexed and updated to version 2, right?
So Elasticsearch will mark the old document as deleted in the segment and
eventually merge in the "updated" data?
On Monday, 23 February 2015 17:19:49 UTC-5, John Smith wrote:
Yeah, sorry, I should have mentioned the tradeoffs.