ECE XFS Quota Question

rockybean · January 18, 2019, 1:11am

Hi,

I'm testing ECE 2.0 to see whether it is a better solution for managing our current Elasticsearch clusters.
I see the following warning in the cluster log:

[2018-11-02 16:20:59,091][WARN ][no.found.runner.allocation.elasticsearch.ElasticsearchDockerContainer] Quota path not initialized, creating directory for path: [/mnt/data/elastic/172.31.16.63/services/allocator/containers/elasticsearch/1b01fd74758543d7857b6aa53a54389b/instance-0000000001/data] {"ec_container_kind":"elasticsearch","ec_container_group":"1b01fd74758543d7857b6aa53a54389b","ec_container_name":"instance-0000000001"}

It seems that XFS quotas are not working properly, and I see inconsistent values in the ECE Cloud UI and the Kibana Monitoring page.

For example, I created a cluster with 1GB of memory and 32GB of disk. In the Cloud UI, the ES node is shown with 1GB of memory and 32GB of disk. But if I connect to Kibana and look at the nodes in the Monitoring panel, the values are different: the node is shown with 1GB of memory but 200GB of disk, which is the total disk size of the VM running the ECE allocator.

Maybe this is related to the XFS quota. Could you give me some advice?
This seems to be a severe issue that will confuse our users.

thanks!

Alex_Piggott · January 18, 2019, 2:00am

I think that's what you get if XFS isn't set up properly. What is your fstab?

Alex_Piggott · January 18, 2019, 2:16pm

Following up on this with a bit more detail:

I have some recollection that we only describe configuring XFS on Ubuntu (https://www.elastic.co/guide/en/cloud-enterprise/2.0/ece-configure-hosts.html#ece-xfs-setup-trusty) because it's usually installed/configured by default on RHEL/CentOS, but there are some setups that bypass that. The key thing to check is whether /etc/fstab has a line like the following, corresponding to the data directory:

    /dev/xvdg1 /mnt/data xfs defaults,nofail,pquota,prjquota 0 2
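
As a rough way to run that check, here is a minimal Python sketch. It is not part of ECE; the function name `fstab_has_project_quota` is hypothetical, and it assumes a standard whitespace-separated fstab layout. XFS accepts both `pquota` and `prjquota` for project quotas, so either one counts.

```python
def fstab_has_project_quota(fstab_text: str, mount_point: str) -> bool:
    """Return True if the fstab entry for mount_point is XFS with a project-quota option.

    Hypothetical helper for illustration only; it inspects text, not the live mount.
    """
    for raw in fstab_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        _device, target, fstype, options = fields[:4]
        if target == mount_point and fstype == "xfs":
            opts = set(options.split(","))
            return bool(opts & {"pquota", "prjquota"})
    return False


sample = "/dev/xvdg1 /mnt/data xfs defaults,nofail,pquota,prjquota 0 2\n"
print(fstab_has_project_quota(sample, "/mnt/data"))
```

Note that this only tells you what fstab asks for. The options actually in effect are the ones reported by `mount`.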

Alex

rockybean · January 21, 2019, 12:30am

Sorry for the late reply.

I will try your configuration.

thanks!

rockybean · January 22, 2019, 7:38am

@Alex_Piggott

I changed fstab as you suggested.

But it does not solve my problem. The node still reports the disk size of the container host, while the Cloud UI shows a different disk size.

My docker info is as below:

[root@ip-172-31-27-255 ~]# docker info
Containers: 20
 Running: 20
 Paused: 0
 Stopped: 0
Images: 7
Server Version: 18.03.1-ce
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-957.1.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.26GiB
Name: ip-172-31-27-255.cn-north-1.compute.internal
ID: GO5W:7PKF:DTBB:7WK2:2Z4Z:QQYX:2ZZQ:OI5R:E7FR:QZRS:XXQK:22IF
Docker Root Dir: /mnt/data/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Please help!

Alex_Piggott · January 22, 2019, 2:53pm

Can you gist the output of _nodes/stats/fs?

Eg I get:

        "data": [
          {
            "total_in_bytes": 34359738368,
            "free_in_bytes": 34347282432,
            "mount": "QuotaAwareFileStore(/app (/dev/mapper/data))",
            "path": "/app/data/nodes/0",
            "type": "xfs",
            "available_in_bytes": 34347282432
          }

for a container that has 32GB available via XFS
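
To read an entry like that quickly, a small Python sketch can convert the byte count and look at the mount string. The helper name `describe_fs_entry` is hypothetical, and treating a mount string that starts with `QuotaAwareFileStore(` as "quota applied" is an assumption based only on the sample above.

```python
def describe_fs_entry(entry: dict) -> str:
    """Summarise one element of the fs "data" array from _nodes/stats/fs."""
    total_gib = entry["total_in_bytes"] / 2**30  # bytes -> GiB
    # Assumption: the QuotaAwareFileStore wrapper appears when the quota is in effect.
    quota_aware = entry.get("mount", "").startswith("QuotaAwareFileStore(")
    return f"{entry['path']}: {total_gib:.1f} GiB total, quota-aware={quota_aware}"


entry = {
    "total_in_bytes": 34359738368,
    "free_in_bytes": 34347282432,
    "mount": "QuotaAwareFileStore(/app (/dev/mapper/data))",
    "path": "/app/data/nodes/0",
    "type": "xfs",
    "available_in_bytes": 34347282432,
}
print(describe_fs_entry(entry))  # /app/data/nodes/0: 32.0 GiB total, quota-aware=True
```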

Can you also check the contents of /app/config/fsquota.properties?

How big is the actual disk incidentally?

Alex_Piggott · January 22, 2019, 2:56pm

(oh other question .. what steps did you go through to get from "XFS not configured" to "XFS configured"? Eg you may need to reallocate the clusters, I'm not sure how "dynamic" that setting is?)

rockybean · January 22, 2019, 2:59pm

Sounds good! I will try to recreate the cluster. Wait a moment!

thanks!

rockybean · January 22, 2019, 3:08pm

Still not working.
The _nodes/stats/fs output is in this gist (es_node_stats_fs):

https://gist.github.com/rockybean/263978092d03eda669d2449a26a627c5

The beginning of the file (the preview is truncated; see the gist for the full output):

{
  "_nodes": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "cluster_name": "61657d98b1454298aac2fefbb2f00e91",
  "nodes": {
    "pQdNfyCJQC2lvG5wxmtWig": {
      "timestamp": 1548169455122,

/app/config/fsquota.properties is as below:

[root@ip-172-31-27-255 ~]# docker exec -it ae bash
root@ae1af9b2cead:/# cat /app/config/fsquota.properties
#Usage in xfs quota
#Tue Jan 22 15:03:22 GMT 2019
remaining=9223372036854775807
total=9223372036854775807
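
For reference, 9223372036854775807 is 2^63 - 1, the largest signed 64-bit value. A minimal Python sketch for spotting that case is below. The helper names are hypothetical, and reading this value as "no quota was picked up" is an assumption from this thread, not documented ECE behaviour.

```python
LONG_MAX = 2**63 - 1  # 9223372036854775807, the largest signed 64-bit integer


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def quota_looks_unset(text: str) -> bool:
    """Assumption: total == LONG_MAX means no real XFS quota was read."""
    return int(parse_properties(text)["total"]) == LONG_MAX


sample = """#Usage in xfs quota
#Tue Jan 22 15:03:22 GMT 2019
remaining=9223372036854775807
total=9223372036854775807
"""
print(quota_looks_unset(sample))  # True
```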

The actual disk size is as below:

[root@ip-172-31-27-255 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      300G  6.2G  294G   3% /

I changed /etc/fstab and then restarted all the ECE machines.

rockybean · January 22, 2019, 3:12pm

@Alex_Piggott
I found some quota-related entries in the allocator service logs.

[2019-01-22 15:03:22,688][INFO ][no.found.runner.managers.XFSQuotaManager] Adding quota using project id [XFSProjectName(es,61657d98b1454298aac2fefbb2f00e91,instance-0000000005)] with hard limit: [65536] MB {"ec_container_kind":"elasticsearch","ec_container_group":"61657d98b1454298aac2fefbb2f00e91","ec_container_name":"instance-0000000005"}
[2019-01-22 15:03:22,708][WARN ][no.found.runner.managers.XFSQuotaManager] Command [[sudo, -n, xfs_quota, -x, -c, limit -p bhard=65536m rtbhard=65536m 11292, /mnt/data]] returned status code [1]] with output: [O: [Setting up project 11292 (path /mnt/data/elastic/172.31.27.255/services/allocator/containers/elasticsearch/61657d98b1454298aac2fefbb2f00e91/instance-0000000005/data)...], O: [Processed 1 (/etc/projects and cmdline) paths for project 11292 with recursion depth infinite (-1).], O: [Setting up project 11292 (path /mnt/data/elastic/172.31.27.255/services/allocator/containers/elasticsearch/61657d98b1454298aac2fefbb2f00e91/instance-0000000005/logs)...], O: [Processed 1 (/etc/projects and cmdline) paths for project 11292 with recursion depth infinite (-1).], O: [Setting up project 11292 (path /mnt/data/elastic/172.31.27.255/services/allocator/containers/elasticsearch/61657d98b1454298aac2fefbb2f00e91/instance-0000000005/heap_dumps/compressed)...], O: [Processed 1 (/etc/projects and cmdline) paths for project 11292 with recursion depth infinite (-1).], E: [xfs_quota: cannot set limits: Function not implemented]] {}

Hope this helps.

rockybean · January 22, 2019, 3:25pm

Maybe it is caused by SELinux.

I will try disabling it and reallocating the cluster.

Alex_Piggott · January 22, 2019, 3:34pm

Looks like it's an XFS side issue ... XFS has some handy command line tools for debugging this sort of thing (eg see https://unix.stackexchange.com/questions/224606/xfs-directory-quota-doesnt-work which shows use of xfs_quota -c report, and also has a candidate solution)

Alex

Alex_Piggott · January 22, 2019, 3:42pm

That Function not implemented is what you get when project quotas are not enabled, so I think we must still be missing some setup

What does mount | grep xfs return?

rockybean · January 22, 2019, 3:44pm

You are right.

The quota is not enabled.

[root@ip-172-31-27-255 ~]# mount|grep xfs
/dev/xvda1 on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
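
The same check can be scripted. Below is a minimal Python sketch that reads `mount` output and reports, per XFS mount point, whether a project-quota option is active. The function name `xfs_quota_status` is hypothetical, and it assumes the usual Linux `mount` line format `DEVICE on TARGET type FSTYPE (OPTIONS)`.

```python
import re

# Matches lines such as: /dev/xvda1 on / type xfs (rw,relatime,noquota)
MOUNT_RE = re.compile(
    r"^(?P<dev>\S+) on (?P<target>\S+) type (?P<fstype>\S+) \((?P<opts>[^)]*)\)$"
)


def xfs_quota_status(mount_output: str) -> dict:
    """Map each XFS mount point to True if pquota/prjquota is among its options."""
    status = {}
    for line in mount_output.splitlines():
        m = MOUNT_RE.match(line.strip())
        if not m or m["fstype"] != "xfs":
            continue  # skip non-XFS entries such as selinuxfs
        opts = set(m["opts"].split(","))
        status[m["target"]] = bool(opts & {"pquota", "prjquota"})
    return status


sample = """/dev/xvda1 on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
"""
print(xfs_quota_status(sample))  # {'/': False}
```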

This machine is on AWS, and I restarted it after changing fstab as below.

UUID=8c1540fa-e2b4-407d-bcd1-59848a73e463 / xfs defaults,nofail,pquota,prjquota 0 0

Do you know how to enable quotas on AWS EC2?

Thanks!

Alex_Piggott · January 22, 2019, 5:10pm

Looks like this might be the issue: https://help.directadmin.com/item.php?id=557 see under If you see "noquota" in the xfs mount options for the / partition
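
As I understand that approach, quota options for an XFS root filesystem have to be passed on the kernel command line (`rootflags=...`) rather than only in fstab. The Python sketch below shows the text edit on the contents of a GRUB defaults file. Treat all of it as an assumption to verify for your distribution: the function name `add_rootflags` is hypothetical, the variable `GRUB_CMDLINE_LINUX` and the file `/etc/default/grub` are what CentOS 7 typically uses, and the boot config still has to be regenerated and the machine rebooted afterwards.

```python
import re


def add_rootflags(grub_default_text: str, flag: str = "prjquota") -> str:
    """Append rootflags=<flag> to GRUB_CMDLINE_LINUX unless rootflags is already set.

    Works on text only; it does not touch any file or regenerate the boot config.
    """
    out = []
    for line in grub_default_text.splitlines():
        m = re.match(r'^(GRUB_CMDLINE_LINUX=")(.*)(")\s*$', line)
        if m and "rootflags=" not in m.group(2):
            args = (m.group(2) + f" rootflags={flag}").strip()
            line = f"{m.group(1)}{args}{m.group(3)}"
        out.append(line)
    return "\n".join(out) + "\n"


before = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto"\n'
print(add_rootflags(before), end="")
```

Mounting a separate data volume with `prjquota` avoids touching the root filesystem at all.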

rockybean · January 23, 2019, 2:30pm

@Alex_Piggott
I tried mounting a dedicated volume on the ECE node, like below:

    /dev/xvda1 on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
    /dev/xvdf on /mnt type xfs (rw,relatime,seclabel,attr2,inode64,prjquota)

But when I restart Docker, the Elasticsearch containers keep restarting.

This is a terrible problem. I found the logs below.

[root@ip-172-31-27-255 ~]# docker ps|grep fac
72480aa4690f        regist***/cloud-assets/elasticsearch:5.6.13-0               "/sbin/entry-point"   23 hours ago        Restarting (10) Less than a second ago                                                                                                          fac-61657d98b1454298aac2fefbb2f00e91-instance-0000000008
78bbf3883341        registr***/cloud-assets/elasticsearch:5.6.13-0               "/sbin/entry-point"   6 days ago          Restarting (10) Less than a second ago                                                                                                          fac-a83495d0b60947a28df8edc4aa64f641-instance-0000000003
[root@ip-172-31-27-255 ~]# docker logs --tail 10 72480aa4690f
usermod: no changes
groupmod: failure while writing changes to /etc/group
usermod: no changes
groupmod: failure while writing changes to /etc/group
usermod: no changes
groupmod: failure while writing changes to /etc/group
usermod: no changes
groupmod: failure while writing changes to /etc/group
usermod: no changes
groupmod: failure while writing changes to /etc/group

I cannot work out what causes this problem. I tried disabling SELinux but still got this error.

As you can see, the container keeps restarting.

It's very strange. Do you have any idea what is going on here?

Alex_Piggott · January 23, 2019, 2:45pm

I've seen this in 2 different cases (one good and one bad)

The good one is simply that this is a permissions error (see all the permissions set up required in the install docs eg https://www.elastic.co/guide/en/cloud-enterprise/current/ece-configure-hosts.html#ece-xfs-setup-trusty) .. let's assume it's that ... what permissions do you have set up?

The bad one is when this is a nasty OS/docker incompatibility - I've only seen this with certain pre-built Azure images that had some non-standard modules compiled in, so it's not likely this is the issue.

rockybean · January 24, 2019, 1:01am

I solved this problem by reinstalling ECE on the dedicated disk with quotas enabled.

Thanks for your time!
@Alex_Piggott

system · February 7, 2019, 1:01am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
