Snapshots don't get compressed

Hi Everyone,

I have just started experimenting with the cool snapshot feature of ES
(ES 1.3.2 on Ubuntu 14.04) via Curator. I created a new repository on a
mounted NFS share using only the default options (compression turned
on), and checked it with curl:

user@myserver:~# curl -XGET 'http://IP_ADDRESS:9200/_snapshot/logBack?pretty'
{
  "logBack" : {
    "type" : "fs",
    "settings" : {
      "compress" : "true",
      "location" : "/es_snapshots"
    }
  }
}
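For reference, a repository like this one is registered with a `PUT _snapshot/<name>` request whose body holds `type` and `settings`. The sketch below only builds that request with Python's standard library and never sends it. The helper name `build_fs_repository_request` is hypothetical, and `IP_ADDRESS:9200` is the same placeholder used above. Setting `compress` explicitly just makes the default visible.

```python
import json
import urllib.request

def build_fs_repository_request(host, name, location, compress=True):
    """Build (but do not send) a PUT _snapshot/<name> request for an fs repository."""
    body = {
        "type": "fs",
        "settings": {
            "location": location,
            # Per this thread, in ES 1.x this flag only affects metadata files.
            "compress": compress,
        },
    }
    return urllib.request.Request(
        url=f"http://{host}/_snapshot/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = build_fs_repository_request("IP_ADDRESS:9200", "logBack", "/es_snapshots")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it against a real cluster: urllib.request.urlopen(req)
```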

So, after that I used Curator to create a snapshot of some older
indices. The process finished after a few minutes, so I had a look at
the files it created. It turned out that the snapshot files take up
exactly as much space as the indices did in the cluster, so no
compression happened at all. This is a bit of a problem for me, because
I assumed that compression would greatly reduce the size of the indices
I put in a snapshot. Is there anything I'm doing wrong?
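One way to make the "same size" observation reproducible is to sum the file sizes under the index data directory and under the repository location, roughly like `du -sb` on Linux. This is a minimal sketch. The function names are illustrative, and the paths in the usage comment are placeholders, since the actual data path of this installation is not given.

```python
import os

def directory_size(path):
    """Return the total size in bytes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # skip symlinks to avoid double counting
                total += os.path.getsize(full)
    return total

def size_ratio(snapshot_dir, index_dir):
    """Snapshot size divided by index size; a value near 1.0 means no compression."""
    index_bytes = directory_size(index_dir)
    if index_bytes == 0:
        raise ValueError("index directory is empty")
    return directory_size(snapshot_dir) / index_bytes

# Usage (placeholder paths): size_ratio("/es_snapshots", "/path/to/elasticsearch/data")
```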

Thank you,
Domonkos

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/541ADFEF.7080800%40modit.hu.
For more options, visit https://groups.google.com/d/optout.

I tried compressing the indices folder with tar: it resulted in a 1.5G
tarball (compared to the 2.8G size of the folder), so I really think
something is wrong here.
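The tar experiment can be repeated for any directory without writing the archive to disk. A 1.5G result from 2.8G corresponds to a size reduction of roughly 46%. The sketch below uses only Python's `tarfile` module and streams a gzip-compressed tar into a byte counter. The gzip level of 6 and the helper names are illustrative assumptions, not what the poster used.

```python
import os
import tarfile

class _ByteCounter:
    """Write-only sink that just counts bytes, so no archive is stored."""
    def __init__(self):
        self.count = 0

    def write(self, data):
        self.count += len(data)
        return len(data)

    def flush(self):
        pass

def raw_size(path):
    """Total size in bytes of regular files below path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def targz_size(path, level=6):
    """Size in bytes of a .tar.gz of path, computed by streaming into a counter."""
    sink = _ByteCounter()
    with tarfile.open(fileobj=sink, mode="w:gz", compresslevel=level) as tar:
        tar.add(path, arcname=".")
    return sink.count

# Usage (placeholder path):
#   d = "/path/to/indices"; print(targz_size(d) / raw_size(d))
```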

In reply to "Tomcsányi, Domonkos", 2014-09-18 15:36.

Only metadata are compressed.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

In reply to "Tomcsányi, Domonkos" <tomcsanyid@modit.hu>, 2014-09-18 15:36.

Thank you for the answer, but may I ask why? What is the reason behind
this?

thanks,
Domonkos

In reply to David Pilato, 2014-09-18 16:15.

I don't know. I think this could be added in the future, but I'm not sure.
Maybe Igor could answer this?

Here is a related doc PR: https://github.com/elasticsearch/elasticsearch/pull/7654 ("Clarify s3 snapshot compress behavior" by ppearcy)

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

In reply to "Tomcsányi, Domonkos" <tomcsanyid@modit.hu>, 2014-09-18 16:35:14.

Thank you again, I'll keep an eye on this issue.

Domonkos

In reply to David Pilato, 2014-09-18 16:38.

There were two reasons for not enabling compression on data files. First of
all, the way "chunking" was implemented in the snapshot/restore API didn't
allow a simple implementation of compression on data files. Moreover, the
data files are already compressed to a certain degree. In my tests I was
getting compression rates of about 20% on index data with recent versions
of Elasticsearch (we have a limitation that we can compress only one file
at a time). So, the implementation difficulties together with the limited
benefits made this feature not very compelling. After the recent
refactoring of the storage code, it's now much easier to add this feature
if it makes sense. However, I am really curious how you got such great
compression rates. Which version of Elasticsearch were these indices
created with? Did you recently upgrade from an older version of
Elasticsearch, so that most of your data files were created with older
versions of Lucene?
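The "only one file at a time" limitation matters when comparing against a tarball. A `.tar.gz` is compressed as one stream, so repetition across files also helps, while per-file compression only sees repetition inside each file. The toy comparison below uses `zlib` on synthetic data. The data, the level of 6, and the helper names are illustrative assumptions. It says nothing about real Lucene segment files or about the compressor Elasticsearch would use.

```python
import zlib

def per_file_compressed(blobs, level=6):
    """Total size when each blob (file) is compressed independently."""
    return sum(len(zlib.compress(b, level)) for b in blobs)

def single_stream_compressed(blobs, level=6):
    """Size when all blobs are concatenated and compressed as one stream (tar.gz-like)."""
    return len(zlib.compress(b"".join(blobs), level))

# Synthetic "files": many small blobs that share the same content pattern.
header = b'{"field": "value", "timestamp": "2014-09-18"}\n'
blobs = [header * 3 + str(i).encode() for i in range(200)]

raw = sum(len(b) for b in blobs)
print("raw bytes:         ", raw)
print("per-file zlib:     ", per_file_compressed(blobs))
print("single-stream zlib:", single_stream_compressed(blobs))
```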

In reply to Domonkos Tomcsanyi, Thursday, September 18, 2014 10:53:58 AM UTC-4.

Yes, I upgraded from 0.90 (meaning some of my data was created with 0.90,
then some with 1.0) to the latest version of Elasticsearch. Is there a
way to compress the older data?

Thanks!

In reply to Igor Motov, 2014-09-19 19:42.