Files not deleted on upgrade

nik9000 · August 21, 2014, 12:44pm

I started a rolling restart yesterday but had to stop because the disks
were filling up oddly. It looks like when the node comes up it no longer
deletes shards it can't use.

Elasticsearch reports that the disk is nearly full but that it isn't using
most of the space. When I look myself the disk is mostly full and most of
the space is taken up by shards.

I'm not clear where to go from here though. Find the files elasticsearch
doesn't have open and delete them?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPmjWd2oZ1rVNrNewfOA646KBv4jGoh_mqQ%2Bcyq5CqdBxwTXRQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

dakrone · August 21, 2014, 1:36pm

Hi Nikolas,

Can you provide the output of curl 'localhost:9200/_cat/shards?v' and
curl 'localhost:9200/_cat/health?v'? Also, can you describe your cluster
topology and what the current disk usages are for all nodes across the
cluster?

Additionally, what version of ES are you using before and after the upgrade?

nik9000 · August 21, 2014, 2:24pm

Hi Lee! Thanks for responding. Ok, here goes:

Version: 1.2.1 -> 1.3.2

curl 'localhost:9200/_cat/health?v':
epoch      timestamp cluster                 status node.total node.data shards pri  relo init unassign
1408630877 14:21:17  production-search-eqiad green  17         17        6050   2017 0    0    0

curl 'localhost:9200/_cat/shards?v':

https://gist.github.com/nik9000/815f3e39a3673e6f48ac

index                                        shard prirep state      docs    store ip           node        
fdcwiki_general_1408025031                   0     p      STARTED     142    1.9mb 127.0.1.1    elastic1008 
fdcwiki_general_1408025031                   0     r      STARTED     142    1.9mb 10.64.32.143 elastic1011 
fdcwiki_general_1408025031                   0     r      STARTED     142    1.9mb 10.64.0.113  elastic1006 
dawikisource_general_1407948319              0     r      STARTED    3057    7.4mb 10.64.0.110  elastic1003 
dawikisource_general_1407948319              0     p      STARTED    3057    7.4mb 127.0.1.1    elastic1008 
dawikisource_general_1407948319              0     r      STARTED    3057    7.5mb 10.64.48.11  elastic1014 
cawikisource_general_1407943227              0     p      STARTED    2966   15.4mb 127.0.1.1    elastic1008 
cawikisource_general_1407943227              0     r      STARTED    2966   15.4mb 10.64.32.144 elastic1012 
cawikisource_general_1407943227              0     r      STARTED    2966   15.4mb 10.64.48.11  elastic1014
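To compare what Elasticsearch thinks each node stores against what the disk actually holds, the `_cat/shards` output can be totalled per node. The following is only a rough sketch: the function names are made up for illustration, the sample rows are copied from the listing above, and it assumes the size suffixes are binary multiples (1kb = 1024 bytes).

```python
from collections import defaultdict

# Assumption: _cat sizes use binary multiples (1kb = 1024 bytes).
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

def parse_size(text):
    """Convert a _cat size such as '1.9mb' to a number of bytes."""
    text = text.lower()
    # Check the two-letter suffixes before the bare 'b'.
    for suffix in ("kb", "mb", "gb", "tb", "b"):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNITS[suffix]
    raise ValueError("unrecognised size: %r" % text)

def store_per_node(cat_shards_output):
    """Sum the 'store' column per node from `_cat/shards?v` text output."""
    lines = cat_shards_output.strip().splitlines()
    header = lines[0].split()
    store_col, node_col = header.index("store"), header.index("node")
    totals = defaultdict(float)
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip rows with missing columns, e.g. unassigned shards
        totals[fields[node_col]] += parse_size(fields[store_col])
    return dict(totals)

# A few rows copied from the listing above.
SAMPLE = """
index                                        shard prirep state      docs    store ip           node
fdcwiki_general_1408025031                   0     p      STARTED     142    1.9mb 127.0.1.1    elastic1008
fdcwiki_general_1408025031                   0     r      STARTED     142    1.9mb 10.64.32.143 elastic1011
dawikisource_general_1407948319              0     p      STARTED    3057    7.4mb 127.0.1.1    elastic1008
"""

if __name__ == "__main__":
    for node, size in sorted(store_per_node(SAMPLE).items()):
        print("%s %.1f MiB" % (node, size / 1024**2))
```

Running the same function over the full gist output would give a per-node total to hold up against `du` on each host.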

I'll keep digging into it now that I'm properly awake as well.

Nik

nik9000 · August 21, 2014, 2:35pm

This gist shows the error in action:

https://gist.github.com/nik9000/3acdb38052dba3fbc5a0

{
  "cluster_name" : "production-search-eqiad",
  "nodes" : {
    "MbtXUh9pQwSOOVpwxp3dnQ" : {
      "timestamp" : 1408631417970,
      "name" : "elastic1001",
      "transport_address" : "inet[/10.64.0.108:9300]",
      "host" : "elastic1001",
      "ip" : [ "inet[/10.64.0.108:9300]", "NONE" ],
      "attributes" : {

Total - free on disk is 479163707392 bytes,
but used is only 238902736642 bytes,
meaning about 50% of the used space isn't accounted for.
But everything on that partition is in Elasticsearch's directory:

manybubbles@elastic1001:/var/lib/elasticsearch/production-search-eqiad/nodes/0/indices$ du -h | tail -n1
447G .

It's like when we did the upgrade some files weren't deleted when they were
no longer in use.
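
For anyone checking the arithmetic, here is a small sketch using the two figures quoted above. It assumes both numbers are in bytes; the variable and function names are only illustrative.

```python
# Figures quoted in the post above (assumed to be bytes).
disk_used = 479163707392    # total - free, as seen by the filesystem
es_reported = 238902736642  # what Elasticsearch says it is using

def unaccounted(disk_used, es_reported):
    """Return (bytes not accounted for, fraction of the used space)."""
    gap = disk_used - es_reported
    return gap, gap / disk_used

gap, fraction = unaccounted(disk_used, es_reported)

if __name__ == "__main__":
    print("disk used:   %.1f GiB" % (disk_used / 2**30))
    print("unaccounted: %.1f GiB (%.1f%% of used space)"
          % (gap / 2**30, fraction * 100))
```

The filesystem figure works out to roughly 446 GiB, which lines up with the 447G that `du -h` reports for the indices directory, so the missing half really does appear to live there.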

nik9000 · August 21, 2014, 3:10pm

Whatson shows this very well:

Other points of interest:

We're using auto_expand_replicas.
The logs are totally clean.

Nik

nik9000 · August 21, 2014, 6:37pm

Moving this to https://github.com/elasticsearch/elasticsearch/issues/7386.
It's a bug, but I have no idea what caused it.

Side note: after digging through the code for two hours I can't find
anything that sweeps up files/directories/local shard storage that is
unused. I see lots of deletes done in finally blocks but I'm not sure how
I got in this state nor if there is something designed to dig me out of it.
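
As a rough illustration of the kind of sweep described above (this is not something found in the Elasticsearch code), the sketch below lists shard directories on disk that are not in a given set of assigned shards. It assumes the on-disk layout is `<indices dir>/<index name>/<shard number>/`; the function name and the `assigned` set are hypothetical, and the set would have to be built from something like the `_cat/shards` output for the local node. It only reports candidates and deletes nothing.

```python
import os

def orphaned_shard_dirs(indices_root, assigned):
    """List shard directories under indices_root that are not in `assigned`.

    `assigned` is a set of (index_name, shard_number) tuples for the shards
    this node is supposed to hold.
    Assumption: directories are laid out as <indices_root>/<index>/<shard>/.
    """
    orphans = []
    for index in sorted(os.listdir(indices_root)):
        index_dir = os.path.join(indices_root, index)
        if not os.path.isdir(index_dir):
            continue
        for shard in sorted(os.listdir(index_dir)):
            shard_dir = os.path.join(index_dir, shard)
            # Only numeric directory names are treated as shards; anything
            # else (metadata directories and the like) is left alone.
            if not (shard.isdigit() and os.path.isdir(shard_dir)):
                continue
            if (index, int(shard)) not in assigned:
                orphans.append(shard_dir)
    return orphans

if __name__ == "__main__":
    # Path taken from the du output earlier in the thread.
    root = "/var/lib/elasticsearch/production-search-eqiad/nodes/0/indices"
    if os.path.isdir(root):
        # With an empty set every shard directory is listed; fill it in first.
        for path in orphaned_shard_dirs(root, assigned=set()):
            print(path)
```

Anything it prints should be double-checked against the live cluster state before touching it, since a shard can be assigned or relocated between the listing and the check.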

nik9000 · August 21, 2014, 7:57pm

Resolved: Internal: Upgrade caused shard data to stay on nodes · Issue #7386 · elastic/elasticsearch · GitHub

For posterity: if you wipe the contents of your node's disk after stopping
Elasticsearch 1.2 but before starting Elasticsearch 1.3, you won't end
up with too much data that can't be cleared. The more nodes you upgrade, the
more shards you'll be able to delete anyway. https://github.com/s1monw

dakrone · August 21, 2014, 8:52pm

Okay cool, thanks for the heads up and followup!

;; Lee
