39M tweets on two nodes requires 11 TB on S3?

Hi,

ES 0.11.0-SNAPSHOT has been running solid for a couple of months with an S3
gateway, but there's a serious problem. The index status shows 39m docs with
a store size of 50.5 GB, see https://gist.github.com/728313. An 'ls' of the
contents of the nodes shows thousands of '__*.partNN' files, some dated a
couple of months ago, and a 'du' of the indices/twitter directory (2 nodes)
shows 11 TB, quite a bit of wasted space!
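
For reference, this is roughly how I'm getting those numbers; the index name
'twitter' is ours, and the gateway path below is just a placeholder for
wherever the gateway data actually lives on your nodes:

  # doc count and store size from the index status API
  curl -s 'http://localhost:9200/twitter/_status?pretty=true'

  # on-disk usage of the gateway's twitter index directory, run on each node
  du -sh /path/to/gateway/indices/twitter
  ls -l /path/to/gateway/indices/twitter | grep '\.part'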

What's the best way to compress and clean up this mess? Are old part files
eligible for deletion? How do I keep this from happening again?

...Thanks,
...Ken

Each snapshot operation should, at the end, clean up all files that are no longer referenced. The "part" files are files broken into chunks to improve write performance to S3 and to work around its 5 GB object size limit. Is there a chance that you can upgrade to 0.13.1?
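
If you want to force a snapshot (and the cleanup that should run at the end of it) rather than wait for the scheduled one, you can call the gateway snapshot API, something along these lines:

  # snapshot a single index through the gateway
  curl -XPOST 'http://localhost:9200/twitter/_gateway/snapshot'
  # or snapshot all indices
  curl -XPOST 'http://localhost:9200/_gateway/snapshot'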

It looks like the cleanups are not happening in 0.11 as they should.

Will be upgrading to 0.13.1 today, in 'local' gateway mode. That should clear
up the problem, and since we have two nodes, we should be able to quiesce
one, back it up, and bring it back online; at least that's the way I think
the backup should be done. Any thoughts on that?
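
Roughly what I have in mind for the switch and the backup; the yml setting is
what I understand 0.13.1 wants for local mode, and the data path is just a
placeholder for wherever path.data points:

  # elasticsearch.yml on both nodes after the upgrade:
  #   gateway.type: local

  # before taking a node out, flush so everything is committed to disk
  curl -XPOST 'http://localhost:9200/twitter/_flush'

  # then stop ES on that node, copy its data directory somewhere safe, restart
  rsync -a /path/to/data/ backup-host:/backups/es-node1/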

If you want to move from the shared (S3) gateway to the local gateway, you will need to reindex the data.

Doing that now. I could not figure out how to clear out the old part files
safely and wanted to get away from S3 anyway, so reindex it is.

Are you using EBS for the local storage? It should give you better IO perf (well, sometimes, depending on how AWS woke up in the morning) and better long-term persistence.

No, just the local storage on the machines themselves. We're not hosted on
Amazon and the transfer costs were getting horrendous, much more than the
storage costs.

Right, I see. I agree, using S3 outside of Amazon makes much less sense than using the local gateway.