Files not deleted on upgrade


(Nik Everett) #1

I started a rolling restart yesterday but had to stop because the disks
were filling up oddly. It looks like when a node comes up it no longer
deletes shards it can't use.

Elasticsearch reports that the disk is nearly full but that it isn't using
most of the space. When I look myself the disk is mostly full and most of
the space is taken up by shards.

I'm not clear where to go from here though. Should I find the files
Elasticsearch doesn't have open and delete them?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPmjWd2oZ1rVNrNewfOA646KBv4jGoh_mqQ%2Bcyq5CqdBxwTXRQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Lee Hinman) #2

Hi Nikolas,

Can you provide the output of curl 'localhost:9200/_cat/shards?v' and
curl 'localhost:9200/_cat/health?v'? Also, can you describe your cluster
topology and what the current disk usages are for all nodes across the
cluster?

Additionally, what version of ES are you using before and after the upgrade?



(Nik Everett) #3

Hi Lee! Thanks for responding. Ok, here goes:

Version: 1.2.1->1.3.2
curl 'localhost:9200/_cat/health?v':
epoch      timestamp cluster                 status node.total node.data shards pri  relo init unassign
1408630877 14:21:17  production-search-eqiad green  17         17        6050   2017 0    0    0

curl 'localhost:9200/_cat/shards?v':
https://gist.github.com/nik9000/815f3e39a3673e6f48ac

I'll keep digging into it now that I'm properly awake as well.

Nik



(Nik Everett) #4

This gist shows the error in action:
https://gist.github.com/nik9000/3acdb38052dba3fbc5a0

Total - free on disk is 479163707392
But used is 238902736642
Meaning about 50% of used space isn't accounted for.
But everything on that partition is in elasticsearch's directory:
manybubbles@elastic1001:/var/lib/elasticsearch/production-search-eqiad/nodes/0/indices$ du -h | tail -n1
447G .

It's as if some files weren't deleted during the upgrade once they were
no longer in use.
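
The "about 50%" figure can be checked with a few lines of Python. This is only a sketch built from the two numbers quoted above; the variable names are illustrative, and it assumes both values are byte counts and that "used" is the store size Elasticsearch reports for the node.

```python
# Numbers quoted in the post above (assumed to be bytes).
used_on_disk = 479163707392  # filesystem total minus free
used_by_es = 238902736642    # what Elasticsearch reports as used

def unaccounted(used_on_disk, used_by_es):
    """Return (bytes not accounted for, fraction of the used disk space)."""
    gap = used_on_disk - used_by_es
    return gap, gap / used_on_disk

gap_bytes, gap_fraction = unaccounted(used_on_disk, used_by_es)
print(f"{gap_bytes} bytes unaccounted for ({gap_fraction:.1%} of used space)")
```

The disk-side number also lines up with the `du` output: 479163707392 bytes is a little over 446 GiB, which `du -h` rounds up to 447G.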



(Nik Everett) #5

Whatson shows this very well:

Other points of interest:

  1. We're using auto_expand_replicas.
  2. The logs are totally clean.

Nik



(Nik Everett) #6

Moving this to https://github.com/elasticsearch/elasticsearch/issues/7386.
It's a bug, but I have no idea what caused it.

Side note: after digging through the code for two hours I can't find
anything that sweeps up unused files, directories, or local shard storage.
I see lots of deletes done in finally blocks, but I'm not sure how I got
into this state or whether anything is designed to dig me out of it.
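
For anyone in the same state, here is a rough sketch of how the leftover shard directories could be listed from outside Elasticsearch. It is not something Elasticsearch ships. It assumes the on-disk layout is `<indices dir>/<index>/<shard number>/`, that the `_cat/shards` output has the default columns (index, shard, prirep, state, docs, store, ip, node), and that node names contain no whitespace. All function names are made up for illustration. It only reports candidates and deletes nothing.

```python
import os

def shards_on_disk(indices_dir):
    """Collect (index, shard) pairs from <indices_dir>/<index>/<shard>/ directories."""
    found = set()
    for index in os.listdir(indices_dir):
        index_path = os.path.join(indices_dir, index)
        if not os.path.isdir(index_path):
            continue
        for entry in os.listdir(index_path):
            # Assumption: shard directories are named by their shard number.
            if entry.isdigit() and os.path.isdir(os.path.join(index_path, entry)):
                found.add((index, int(entry)))
    return found

def shards_assigned_to(cat_shards_text, node_name):
    """Collect (index, shard) pairs that the `_cat/shards` text places on node_name."""
    assigned = set()
    for line in cat_shards_text.splitlines():
        cols = line.split()
        # Skip the ?v header row and unassigned shards (which have no node column).
        if len(cols) < 8 or cols[0] == "index":
            continue
        nodes = {cols[7]}
        if "->" in cols:
            # Assumption: a relocating shard lists its target node last;
            # count it too so a copy that is still arriving is not flagged.
            nodes.add(cols[-1])
        if node_name in nodes:
            assigned.add((cols[0], int(cols[1])))
    return assigned

def unused_shards(indices_dir, cat_shards_text, node_name):
    """Shard directories on disk that the cluster does not place on this node."""
    on_disk = shards_on_disk(indices_dir)
    assigned = shards_assigned_to(cat_shards_text, node_name)
    return sorted(on_disk - assigned)
```

Usage would be to save the output of curl 'localhost:9200/_cat/shards?v' to a string and call `unused_shards(indices_dir, text, "elastic1001")` with the node's indices directory. Treat the result as a list to inspect by hand, ideally while the cluster is green and nothing is relocating.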



(Nik Everett) #7

Resolved: https://github.com/elasticsearch/elasticsearch/issues/7386

For posterity: if you nuke the contents of your node's disk after stopping
Elasticsearch 1.2 but before starting Elasticsearch 1.3, then you won't end
up with too much data that can't be cleared. The more nodes you upgrade, the
more shards you'll be able to delete anyway. https://github.com/s1monw



(Lee Hinman) #8

Okay cool, thanks for the heads up and followup!

;; Lee


