ECE XFS Quota Question

(Rockybean) #1


I'm testing ECE 2.0 to see whether it is a better fit for our current Elasticsearch cluster management work.
I see the following warning in the cluster log.

[2018-11-02 16:20:59,091][WARN ][no.found.runner.allocation.elasticsearch.ElasticsearchDockerContainer] Quota path not initialized, creating directory for path: [/mnt/data/elastic/] {"ec_container_kind":"elasticsearch","ec_container_group":"1b01fd74758543d7857b6aa53a54389b","ec_container_name":"instance-0000000001"}

It seems that the XFS quota is not functioning normally, and I see inconsistent values in the ECE Cloud UI and the Kibana Monitoring page.

For example, I created a cluster with 1GB of memory and 32GB of disk. In the Cloud UI, the ES node is shown with 1GB of memory and 32GB of disk. But if I connect to Kibana and look at the nodes in the Monitoring panel, the values are different: the node is shown with 1GB of memory but 200GB of disk, which is the total disk size of the VM running the ECE allocator.

Maybe this is related to the XFS quota. Could you give me some advice?
This seems to be a severe issue that will confuse our users.


(Alex Piggott) #2

I think that's what you get if XFS isn't set up properly. What is your fstab?

(Alex Piggott) #3

Following up on this with a bit more detail:

I have some recollection that we only describe configuring XFS on Ubuntu, because it's usually installed/configured by default on RHEL/CentOS, but there are some setups that bypass that. The key thing to check is whether /etc/fstab has a line like /dev/xvdg1 /mnt/data xfs defaults,nofail,pquota,prjquota 0 2 corresponding to the data directory.
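A minimal sketch of that check, assuming the fstab contents have already been read into a string. The function name `fstab_quota_enabled` is made up for illustration, and the sketch only inspects the file; it says nothing about whether the option is active on the live mount.

```python
def fstab_quota_enabled(fstab_text, mount_point):
    """Return True if the fstab entry for mount_point is XFS with a project-quota option."""
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, target, fstype, options = fields[:4]
        if target == mount_point:
            opts = set(options.split(","))
            # pquota and prjquota are two spellings of the same XFS option.
            return fstype == "xfs" and bool(opts & {"pquota", "prjquota"})
    return False


sample = "/dev/xvdg1 /mnt/data xfs defaults,nofail,pquota,prjquota 0 2"
print(fstab_quota_enabled(sample, "/mnt/data"))
```

On a real host you would pass in the contents of /etc/fstab and the mount point that holds the ECE data directory.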


(Rockybean) #4

Sorry for the late reply.

I will try your configuration.


(Rockybean) #5


I changed fstab as you suggested.

But it did not solve my problem. The node still gets the disk size of the container host, as below.

But in the Cloud UI, the disk size is as below:

My docker info output is as below:

[root@ip-172-31-27-255 ~]# docker info
Containers: 20
 Running: 20
 Paused: 0
 Stopped: 0
Images: 7
Server Version: 18.03.1-ce
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
  Profile: default
Kernel Version: 3.10.0-957.1.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.26GiB
Docker Root Dir: /mnt/data/docker
Debug Mode (client): false
Debug Mode (server): false
Experimental: false
Insecure Registries:
Live Restore Enabled: false 

Please help!

(Alex Piggott) #6

Can you gist the output of _nodes/stats/fs?

Eg I get:

        "data": [
            {
                "total_in_bytes": 34359738368,
                "free_in_bytes": 34347282432,
                "mount": "QuotaAwareFileStore(/app (/dev/mapper/data))",
                "path": "/app/data/nodes/0",
                "type": "xfs",
                "available_in_bytes": 34347282432
            }
        ]

for a container that has 32GB available via XFS
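As a rough sanity check on those numbers (a sketch, not part of ECE): 34359738368 bytes is exactly 32 GiB, which matches the configured disk size. The helper name `looks_quota_limited` and the trimmed `entry` dict are illustrative assumptions, and treating the `QuotaAwareFileStore` prefix as a sign that the quota is applied is inferred from my sample output, not a documented contract.

```python
GIB = 1024 ** 3


def looks_quota_limited(fs_entry, expected_gib):
    """Heuristic: total size equals the configured quota and the mount is quota-aware."""
    size_matches = fs_entry["total_in_bytes"] == expected_gib * GIB
    # Assumption: the mount string starts with QuotaAwareFileStore when the quota applies.
    quota_aware = fs_entry["mount"].startswith("QuotaAwareFileStore")
    return size_matches and quota_aware


# One entry from the "data" array of _nodes/stats/fs, trimmed to the fields used here.
entry = {
    "total_in_bytes": 34359738368,
    "mount": "QuotaAwareFileStore(/app (/dev/mapper/data))",
}

print(entry["total_in_bytes"] / GIB)   # 32.0
print(looks_quota_limited(entry, 32))  # True
```

If the node instead reports the size of the whole host disk, the first comparison fails, which is the symptom described in this thread.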

Can you also check the contents of /app/config/

How big is the actual disk incidentally?

(Alex Piggott) #7

(Oh, another question: what steps did you go through to get from "XFS not configured" to "XFS configured"? E.g. you may need to reallocate the clusters; I'm not sure how "dynamic" that setting is.)

(Rockybean) #8

Sounds good! I will try to recreate the cluster. Wait a moment!


(Rockybean) #9

Still not working :sob:
_nodes/stats/fs info is as below

/app/config/ is as below:

[root@ip-172-31-27-255 ~]# docker exec -it ae bash
root@ae1af9b2cead:/# cat /app/config/
#Usage in xfs quota
#Tue Jan 22 15:03:22 GMT 2019

The actual disk size is as below:

[root@ip-172-31-27-255 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      300G  6.2G  294G   3% /

I changed /etc/fstab and then restarted all the ECE machines.

(Rockybean) #10

I found some quota-related entries in the allocator service logs.

[2019-01-22 15:03:22,688][INFO ][no.found.runner.managers.XFSQuotaManager] Adding quota using project id [XFSProjectName(es,61657d98b1454298aac2fefbb2f00e91,instance-0000000005)]
 with hard limit: [65536] MB {"ec_container_kind":"elasticsearch","ec_container_group":"61657d98b1454298aac2fefbb2f00e91","ec_container_name":"instance-0000000005"}
[2019-01-22 15:03:22,708][WARN ][no.found.runner.managers.XFSQuotaManager] Command [[sudo, -n, xfs_quota, -x, -c, limit -p bhard=65536m rtbhard=65536m 11292, /mnt/data]] returned
 status code [1]] with output: [O: [Setting up project 11292 (path /mnt/data/elastic/
stance-0000000005/data)...], O: [Processed 1 (/etc/projects and cmdline) paths for project 11292 with recursion depth infinite (-1).], O: [Setting up project 11292 (path /mnt/dat
a/elastic/], O: [Processed 1 (/etc/projects and cmdline) pa
ths for project 11292 with recursion depth infinite (-1).], O: [Setting up project 11292 (path /mnt/data/elastic/
8b1454298aac2fefbb2f00e91/instance-0000000005/heap_dumps/compressed)...], O: [Processed 1 (/etc/projects and cmdline) paths for project 11292 with recursion depth infinite (-1).]
, E: [xfs_quota: cannot set limits: Function not implemented]] {} 

Hope this helps.

(Rockybean) #11

Maybe it is caused by SELinux.

I will try to disable it and reallocate the cluster.

(Alex Piggott) #12

Looks like it's an XFS side issue ... XFS has some handy command line tools for debugging this sort of thing (eg see which shows use of xfs_quota -c report, and also has a candidate solution)


(Alex Piggott) #13

That "Function not implemented" error is what you get when project quotas are not enabled, so I think we must still be missing some setup.

What does mount | grep xfs return?
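A small sketch for reading that output, assuming the usual `device on mountpoint type fstype (options)` line format. The function name `xfs_quota_status` and the sample lines are illustrative; `noquota` in the options means quotas are off for that filesystem, while `prjquota` means project quotas are on.

```python
import re

# Matches lines such as: /dev/sdb1 on /data type xfs (rw,relatime,prjquota)
MOUNT_RE = re.compile(r"^(\S+) on (\S+) type (\S+) \(([^)]*)\)$")


def xfs_quota_status(mount_output):
    """Map each XFS mount point to True if project quotas are enabled on it."""
    status = {}
    for line in mount_output.splitlines():
        match = MOUNT_RE.match(line.strip())
        if not match or match.group(3) != "xfs":
            continue  # skip non-XFS filesystems such as selinuxfs
        opts = set(match.group(4).split(","))
        status[match.group(2)] = bool(opts & {"prjquota", "pquota"})
    return status


sample = """/dev/xvda1 on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
/dev/xvdf on /mnt type xfs (rw,relatime,seclabel,attr2,inode64,prjquota)"""

print(xfs_quota_status(sample))
```

Any mount point that maps to False and holds the ECE data directory would explain the xfs_quota failure.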

(Rockybean) #14

You are right.

The quota is not enabled.

[root@ip-172-31-27-255 ~]# mount|grep xfs
/dev/xvda1 on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)

This machine is on AWS, and I restarted it after changing fstab as below.

UUID=8c1540fa-e2b4-407d-bcd1-59848a73e463 / xfs defaults,nofail,pquota,prjquota 0 0

Do you know how to enable quota on AWS EC2?


(Alex Piggott) #15

Looks like this might be the issue: see under 'If you see "noquota" in the xfs mount options for the / partition'

(Rockybean) #16

I tried mounting a dedicated volume on the ECE node, as below:

    /dev/xvda1 on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
    /dev/xvdf on /mnt type xfs (rw,relatime,seclabel,attr2,inode64,prjquota) 

But when I restart Docker, the ES containers keep restarting.

I found the logs below.

[root@ip-172-31-27-255 ~]# docker ps|grep fac
72480aa4690f        regist***/cloud-assets/elasticsearch:5.6.13-0               "/sbin/entry-point"   23 hours ago        Restarting (10) Less than a second ago                                                                                                          fac-61657d98b1454298aac2fefbb2f00e91-instance-0000000008
78bbf3883341        registr***/cloud-assets/elasticsearch:5.6.13-0               "/sbin/entry-point"   6 days ago          Restarting (10) Less than a second ago                                                                                                          fac-a83495d0b60947a28df8edc4aa64f641-instance-0000000003
[root@ip-172-31-27-255 ~]# docker logs --tail 10 72480aa4690f
usermod: no changes
groupmod: failure while writing changes to /etc/group
usermod: no changes
groupmod: failure while writing changes to /etc/group
usermod: no changes
groupmod: failure while writing changes to /etc/group
usermod: no changes
groupmod: failure while writing changes to /etc/group
usermod: no changes
groupmod: failure while writing changes to /etc/group

I cannot tell what causes this problem. I tried disabling SELinux but still got this error.

As you can see, the containers keep restarting.

It's very strange. Do you have any idea what is going on here?

(Alex Piggott) #17

I've seen this in 2 different cases (one good and one bad)

The good one is simply that this is a permissions error (see all the permissions setup required in the install docs). Let's assume it's that: what permissions do you have set up?

The bad one is when this is a nasty OS/docker incompatibility - I've only seen this with certain pre-built Azure images that had some non-standard modules compiled in, so it's not likely this is the issue.

(Rockybean) #18

I solved this problem by reinstalling ECE on the dedicated disk with quota enabled.

Thanks for your time!
