How to check disk write and read speed

I have a k8s cluster and a Ceph cluster connected to each other, both built on HDDs. Let's call this pair the hdd-cluster. I also have a k8s cluster and a Ceph cluster built on SSDs, likewise connected to each other. Let's call that pair the ssd-cluster.

OK, so we have:

  • hdd-cluster: the first k8s cluster creates PersistentVolumes from Ceph (HDD)
  • ssd-cluster: the second k8s cluster creates PersistentVolumes from Ceph (SSD)

Let's create a pod with a PV attached. Do this in each cluster.

cat <<EOF > pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-rbd-sc
EOF

cat <<EOF > pod.yml
---
apiVersion: v1
kind: Pod
metadata:
  name: csi-rbd-demo-pod
spec:
  containers:
    - name: webserver
      image: nginx:latest
      volumeMounts:
        - name: mypvc
          mountPath: /data
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: rbd-pvc
        readOnly: false
EOF

kubectl apply -f pvc.yml
kubectl get pvc
kubectl get pv

kubectl apply -f pod.yml
kubectl get po

kubectl exec -it pod/csi-rbd-demo-pod -- bash

I will check read and write speed in two directories (a quick check of the mounts follows right after this list):

  • ceph (/data) - the RBD volume mounted from Ceph
  • host (/tmp) - the node's local disk, via the container filesystem
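
Inside the pod, you can quickly confirm what backs each path (a minimal sketch; device names and sizes will differ in your setup):

df -h /data /tmp
# /data should show up as an rbd device (e.g. /dev/rbd0),
# while /tmp lives on the container/overlay filesystem of the node
grep -E 'rbd|overlay' /proc/mounts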

We need to check both sequential and random access.

Sequential access is what matters for large files, while operating systems and databases depend mostly on random access.

The biggest advantage of an SSD is random access: it is roughly 10-20 times faster than an HDD.

Sequential access

To check the disk write speed using the dd command in Linux, you can use the following command syntax:

dd if=/dev/zero of=/path/to/your/file oflag=dsync bs=block_size count=block_count status=progress

Where:

  • if=/dev/zero specifies the data source (in this case, zero bytes)
  • of=/path/to/your/file specifies the file where the data will be written
  • bs=block_size defines the size of the data block
  • count=block_count specifies the number of data blocks to write
  • oflag=dsync forces synchronous writes: each block is physically written to the device before dd continues, so the page cache does not inflate the result
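
To see why oflag=dsync matters, here is a small sketch (file names are just examples): the same write with and without the flag can report very different speeds, because without it dd largely measures the page cache rather than the disk.

# cached write: the reported speed mostly reflects the page cache
dd if=/dev/zero of=/tmp/dd-cache-test bs=1M count=1000 status=progress
# synchronous write: each block is flushed to the device before dd continues
dd if=/dev/zero of=/tmp/dd-dsync-test bs=1M count=1000 oflag=dsync status=progress
rm -f /tmp/dd-cache-test /tmp/dd-dsync-test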

For example, to measure the write speed to the disk, you can run the commands below. Two notes:

  • /data is the disk mounted from Ceph
  • presumably, the HDDs in this installation also have a cache on the RAID controller or the hypervisor
time dd if=/dev/zero of=/tmp/tempfile-1M count=10000 oflag=dsync status=progress bs=1M
  10485760000 bytes (10 GB, 9.8 GiB) copied, 38.7035 s, 271 MB/s
time dd if=/dev/zero of=/tmp/tempfile-100M count=100 oflag=dsync status=progress bs=100M
  10485760000 bytes (10 GB, 9.8 GiB) copied, 13.7906 s, 760 MB/s
time dd if=/dev/zero of=/tmp/tempfile-200M count=51 oflag=dsync status=progress bs=200M
  10695475200 bytes (11 GB, 10 GiB) copied, 12.3486 s, 866 MB/s

time dd if=/dev/zero of=/data/tempfile-1M count=10000 oflag=dsync status=progress bs=1M
  10485760000 bytes (10 GB, 9.8 GiB) copied, 123.676 s, 84.8 MB/s
time dd if=/dev/zero of=/data/tempfile-100M count=100 oflag=dsync status=progress bs=100M
  10485760000 bytes (10 GB, 9.8 GiB) copied, 28.226 s, 371 MB/s
time dd if=/dev/zero of=/data/tempfile-200M count=50 oflag=dsync status=progress bs=200M
  10485760000 bytes (10 GB, 9.8 GiB) copied, 28.939 s, 362 MB/s

Drop the page cache and read the files back:

# both commands do the same thing: drop the page cache
/sbin/sysctl -w vm.drop_caches=3
echo 3 > /proc/sys/vm/drop_caches
# the first read will be without cache :)

dd if=/data/tempfile-1M of=/dev/null bs=1M count=10000
  10485760000 bytes (10 GB, 9.8 GiB) copied, 29.7642 s, 352 MB/s
dd if=/data/tempfile-100M of=/dev/null bs=100M count=100
  10485760000 bytes (10 GB, 9.8 GiB) copied, 20.5058 s, 511 MB/s
dd if=/data/tempfile-200M of=/dev/null bs=200M count=50
  10485760000 bytes (10 GB, 9.8 GiB) copied, 20.8717 s, 502 MB/s
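
The test files are about 10 GB each, so it is worth removing them before moving on (same paths as used above):

rm -f /data/tempfile-1M /data/tempfile-100M /data/tempfile-200M
rm -f /tmp/tempfile-1M /tmp/tempfile-100M /tmp/tempfile-200M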

Random access (with fio)

Random ssd (ceph):

25.5 MB/s - at first glance that doesn't seem right at all. I was puzzled!

fio --filename=/data/fio-test --size=10GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=myjob --eta-newline=1
Run status group 0 (all jobs):
   READ: bw=24.3MiB/s (25.5MB/s), 24.3MiB/s-24.3MiB/s (25.5MB/s-25.5MB/s), io=2915MiB (3057MB), run=120031-120031msec
  WRITE: bw=24.3MiB/s (25.5MB/s), 24.3MiB/s-24.3MiB/s (25.5MB/s-25.5MB/s), io=2921MiB (3063MB), run=120031-120031msec
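
To interpret that number, convert bandwidth to IOPS (fio prints IOPS directly in its full per-job output, which is trimmed here; this is just the same figure derived from bandwidth divided by block size):

# 24.3 MiB/s with 4 KiB blocks:
#   24.3 * 1024 / 4 ≈ 6200 read IOPS, plus roughly the same again for writes,
# i.e. about 12000 mixed 4k IOPS in total for this network-attached Ceph volume.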

Random ssd (host):

fio --filename=/tmp/fio-test --size=10GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=myjob --eta-newline=1
Run status group 0 (all jobs):
   READ: bw=5063KiB/s (5184kB/s), 5063KiB/s-5063KiB/s (5184kB/s-5184kB/s), io=593MiB (622MB), run=120002-120002msec
  WRITE: bw=5072KiB/s (5194kB/s), 5072KiB/s-5072KiB/s (5194kB/s-5194kB/s), io=594MiB (623MB), run=120002-120002msec

Random hdd (ceph):

fio --filename=/data/fio-test --size=10GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=myjob --eta-newline=1
Run status group 0 (all jobs):
   READ: bw=316KiB/s (324kB/s), 316KiB/s-316KiB/s (324kB/s-324kB/s), io=37.5MiB (39.3MB), run=121389-121389msec
  WRITE: bw=328KiB/s (336kB/s), 328KiB/s-328KiB/s (336kB/s-336kB/s), io=38.9MiB (40.8MB), run=121389-121389msec

Random hdd (host):

fio --filename=/tmp/fio-test --size=10GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=myjob --eta-newline=1
Run status group 0 (all jobs):
   READ: bw=1077KiB/s (1103kB/s), 1077KiB/s-1077KiB/s (1103kB/s-1103kB/s), io=126MiB (132MB), run=120010-120010msec
  WRITE: bw=1087KiB/s (1113kB/s), 1087KiB/s-1087KiB/s (1113kB/s-1113kB/s), io=127MiB (134MB), run=120010-120010msec

As a bonus, let's check the speed of my local NVMe disk.

Random local nvme (myhost):

fio --filename=/mnt/pve/nvme-pool/test/fio-test --size=10GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=myjob --eta-newline=1
Run status group 0 (all jobs):
   READ: bw=43.5MiB/s (45.6MB/s), 43.5MiB/s-43.5MiB/s (45.6MB/s-45.6MB/s), io=5227MiB (5481MB), run=120119-120119msec
  WRITE: bw=43.5MiB/s (45.7MB/s), 43.5MiB/s-43.5MiB/s (45.7MB/s-45.7MB/s), io=5230MiB (5485MB), run=120119-120119msec
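
Applying the same bandwidth-to-IOPS conversion to all of the runs gives a rough side-by-side comparison (read side only, 4 KiB blocks, rounded):

# ceph ssd    24.3 MiB/s  ->  ~6200 IOPS
# host ssd    5063 KiB/s  ->  ~1270 IOPS
# ceph hdd     316 KiB/s  ->    ~80 IOPS
# host hdd    1077 KiB/s  ->   ~270 IOPS
# local nvme  43.5 MiB/s  -> ~11100 IOPS

The gap between the Ceph SSD and Ceph HDD volumes is much larger than the raw 10-20x SSD advantage, presumably because HDD seek latency is compounded by Ceph replication and network round-trips.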

See also: Sample FIO Commands for Block Volume Performance Tests on Linux-based Instances.