We recently switched from ext4 to XFS as part of an upgrade from RHEL 7 to RHEL 9. Both setups used LVM and RAID-0.
Since the switch, we've noticed that I/O utilization stays near 100%, whereas before it hovered around 30%.
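(By utilization we mean the device %util figure, which matches the util=100.00% in fio's disk-stats sections below. A simple way to watch it live, assuming the sysstat package is installed:

# %util is the right-most column; sda is the backing disk, dm-* the LVM volumes
iostat -dx 1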
That said, raw disk performance actually seems to have improved:
XFS (RHEL 9):
Jobs: 1 (f=1): [m(1)][100.0%][r=179MiB/s,w=60.0MiB/s][r=45.8k,w=15.4k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=137015: Thu Feb 15 16:35:38 2024
read: IOPS=53.8k, BW=210MiB/s (220MB/s)(37.5GiB/182764msec)
bw ( KiB/s): min=162680, max=233032, per=100.00%, avg=215332.00, stdev=9183.35, samples=365
iops : min=40670, max=58258, avg=53832.96, stdev=2295.89, samples=365
write: IOPS=17.9k, BW=70.0MiB/s (73.4MB/s)(12.5GiB/182764msec); 0 zone resets
bw ( KiB/s): min=54584, max=77864, per=100.00%, avg=71764.69, stdev=3038.69, samples=365
iops : min=13646, max=19466, avg=17941.14, stdev=759.70, samples=365
cpu : usr=13.34%, sys=35.70%, ctx=1301039, majf=0, minf=466
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=9830837,3276363,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=210MiB/s (220MB/s), 210MiB/s-210MiB/s (220MB/s-220MB/s), io=37.5GiB (40.3GB), run=182764-182764msec
WRITE: bw=70.0MiB/s (73.4MB/s), 70.0MiB/s-70.0MiB/s (73.4MB/s-73.4MB/s), io=12.5GiB (13.4GB), run=182764-182764msec
Disk stats (read/write):
dm-2: ios=9892502/3815818, merge=0/0, ticks=8119978/3427405, in_queue=11547383, util=100.00%, aggrios=9900533/3832297, aggrmerge=3/1918, aggrticks=8119893/3475893, aggrin_queue=11595786, aggrutil=100.00%
sda: ios=9900533/3832297, merge=3/1918, ticks=8119893/3475893, in_queue=11595786, util=100.00%
ext4 (RHEL 7):
Jobs: 1 (f=1): [m(1)][100.0%][r=138MiB/s,w=46.5MiB/s][r=35.2k,w=11.9k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=22582: Thu Feb 15 11:31:51 2024
read: IOPS=40.1k, BW=157MiB/s (164MB/s)(37.5GiB/245228msec)
bw ( KiB/s): min=69820, max=214576, per=100.00%, avg=160348.09, stdev=18431.93, samples=490
iops : min=17455, max=53644, avg=40086.98, stdev=4607.97, samples=490
write: IOPS=13.4k, BW=52.2MiB/s (54.7MB/s)(12.5GiB/245228msec)
bw ( KiB/s): min=22586, max=70568, per=100.00%, avg=53440.14, stdev=6173.39, samples=490
iops : min= 5646, max=17642, avg=13360.01, stdev=1543.36, samples=490
cpu : usr=16.55%, sys=75.96%, ctx=130793, majf=0, minf=10804
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=9830837,3276363,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=157MiB/s (164MB/s), 157MiB/s-157MiB/s (164MB/s-164MB/s), io=37.5GiB (40.3GB), run=245228-245228msec
WRITE: bw=52.2MiB/s (54.7MB/s), 52.2MiB/s-52.2MiB/s (54.7MB/s-54.7MB/s), io=12.5GiB (13.4GB), run=245228-245228msec
Disk stats (read/write):
dm-3: ios=10027319/6362087, merge=0/0, ticks=4203766/1995710, in_queue=6209508, util=100.00%, aggrios=10033482/4346423, aggrmerge=1/2020208, aggrticks=4201752/1496265, aggrin_queue=5697005, aggrutil=100.00%
sda: ios=10033482/4346423, merge=1/2020208, ticks=4201752/1496265, in_queue=5697005, util=100.00%
Tests were run with fio.
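The job was a 4 KiB random read/write mix (75/25) at queue depth 64, so something along these lines; the ioengine, direct flag, and file path below are illustrative rather than the exact invocation (the block size follows from the bandwidth-to-IOPS ratio, the mix from the 37.5 GiB/12.5 GiB split, and the depth from the "IO depths" line):

# 4k randrw, 75% reads, QD64, 50G total; engine/direct/path shown as typical values
fio --name=test --rw=randrw --rwmixread=75 --bs=4k --iodepth=64 \
    --ioengine=libaio --direct=1 --size=50G --filename=/path/to/testfile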
Any thoughts on what is causing utilization and IOPS to stay so high, and should we be concerned that they do?