I have a k8s cluster and a Ceph cluster connected to each other, both built on HDDs. Let's name this pair hdd-cluster.
I also have a k8s cluster and a Ceph cluster built on SSDs, likewise connected to each other. Let's name them ssd-cluster.
So we have:
hdd-cluster: the first k8s cluster can create a PersistentVolume from Ceph (HDD)
ssd-cluster: the second k8s cluster can create a PersistentVolume from Ceph (SSD)
Let's create a pod connected to a PV. Do it in each cluster.
cat <<EOF > pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-rbd-sc
EOF
cat <<EOF > pod.yml
---
apiVersion: v1
kind: Pod
metadata:
  name: csi-rbd-demo-pod
spec:
  containers:
    - name: webserver
      image: nginx:latest
      volumeMounts:
        - name: mypvc
          mountPath: /data
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: rbd-pvc
        readOnly: false
EOF
kubectl apply -f pvc.yml
kubectl get pvc
kubectl get pv
kubectl apply -f pod.yml
kubectl get po
kubectl exec -it pod/csi-rbd-demo-pod -- bash
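The PVC above refers to a StorageClass named csi-rbd-sc, so the ceph-csi RBD driver and such a StorageClass must already exist in each cluster. A rough sketch of what it typically looks like is below; clusterID, pool and the secret name/namespace are placeholders for your environment, not values from this installation:

cat <<EOF > sc.yml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-rbd-sc
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <ceph-fsid>              # placeholder: your Ceph cluster ID
  pool: <rbd-pool>                    # placeholder: HDD- or SSD-backed RBD pool
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi
reclaimPolicy: Delete
EOF
kubectl apply -f sc.yml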
I will check read and write speed in two directories: /tmp (the node's local disk) and /data (the RBD volume mounted from Ceph).
We need to check both sequential access and random access.
Sequential access is what matters for big files; for operating systems and databases, random access is what you should care about.
The biggest advantage of an SSD is random access: it is roughly 10-20 times faster than an HDD.
Sequential access
To check the disk write speed using the dd command in Linux, you can use the following command syntax:
dd if=/dev/zero of=/path/to/your/file oflag=dsync bs=block_size count=block_count status=progress
Where:
if=/dev/zero specifies the data source (in this case, a stream of zero bytes)
of=/path/to/your/file specifies the file where the data will be written
bs=block_size defines the size of each data block
count=block_count specifies the number of data blocks to write
oflag=dsync uses synchronized I/O for data: each block is physically flushed to the device before dd continues, so the page cache does not inflate the result
For example, to measure the write speed of the disks, I run the commands below.
/data is the volume mounted from Ceph; /tmp is on the node's local disk.
Note that oflag=dsync syncs every block separately, so larger block sizes mean fewer syncs and a higher reported throughput.
- Presumably, the HDD disks in this installation also benefit from a cache on the RAID controller or hypervisor.
time dd if=/dev/zero of=/tmp/tempfile-1M count=10000 oflag=dsync status=progress bs=1M
10485760000 bytes (10 GB, 9.8 GiB) copied, 38.7035 s, 271 MB/s
time dd if=/dev/zero of=/tmp/tempfile-100M count=100 oflag=dsync status=progress bs=100M
10485760000 bytes (10 GB, 9.8 GiB) copied, 13.7906 s, 760 MB/s
time dd if=/dev/zero of=/tmp/tempfile-200M count=51 oflag=dsync status=progress bs=200M
10695475200 bytes (11 GB, 10 GiB) copied, 12.3486 s, 866 MB/s
time dd if=/dev/zero of=/data/tempfile-1M count=10000 oflag=dsync status=progress bs=1M
10485760000 bytes (10 GB, 9.8 GiB) copied, 123.676 s, 84.8 MB/s
time dd if=/dev/zero of=/data/tempfile-100M count=100 oflag=dsync status=progress bs=100M
10485760000 bytes (10 GB, 9.8 GiB) copied, 28.226 s, 371 MB/s
time dd if=/dev/zero of=/data/tempfile-200M count=50 oflag=dsync status=progress bs=200M
10485760000 bytes (10 GB, 9.8 GiB) copied, 28.939 s, 362 MB/s
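One way to read the /data numbers: with oflag=dsync every block is an individually synced write, so at bs=1M the Ceph volume handles roughly 80 synced 1 MiB writes per second, about 12 ms each (dd reports decimal MB/s while the block size is 1 MiB). A quick check of that arithmetic:

# 84.8 MB/s (decimal) with 1 MiB blocks -> synced writes per second and latency per write
awk 'BEGIN { w = 84.8e6 / (1024*1024); printf "%.0f synced writes/s, %.1f ms each\n", w, 1000/w }'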
Clear the page cache and read the files back:
# drop the page cache so the first read is not served from memory
# (both forms are equivalent; this is node-wide and needs root)
/sbin/sysctl -w vm.drop_caches=3
echo 3 > /proc/sys/vm/drop_caches
# the first read will be without cache :)
dd if=/data/tempfile-1M of=/dev/null bs=1M count=10000
10485760000 bytes (10 GB, 9.8 GiB) copied, 29.7642 s, 352 MB/s
dd if=/data/tempfile-100M of=/dev/null bs=100M count=100
10485760000 bytes (10 GB, 9.8 GiB) copied, 20.5058 s, 511 MB/s
dd if=/data/tempfile-200M of=/dev/null bs=200M count=50
10485760000 bytes (10 GB, 9.8 GiB) copied, 20.8717 s, 502 MB/s
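A side note (not something I ran here): dd can also bypass the page cache on the read path with iflag=direct, which makes dropping caches unnecessary for read tests:

# variant with O_DIRECT reads, so the page cache is never involved
dd if=/data/tempfile-1M of=/dev/null bs=1M count=10000 iflag=direct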
Random access (with fio)
Random ssd (ceph):
25.5 MB/s - wow, that number doesn't seem right to me; I'm puzzled!
fio --filename=/data/fio-test --size=10GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=myjob --eta-newline=1
Run status group 0 (all jobs):
READ: bw=24.3MiB/s (25.5MB/s), 24.3MiB/s-24.3MiB/s (25.5MB/s-25.5MB/s), io=2915MiB (3057MB), run=120031-120031msec
WRITE: bw=24.3MiB/s (25.5MB/s), 24.3MiB/s-24.3MiB/s (25.5MB/s-25.5MB/s), io=2921MiB (3063MB), run=120031-120031msec
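To put that number in perspective: random 4k results are usually compared in IOPS rather than MB/s, and the conversion is simply bandwidth divided by block size. For the read side of this run:

# IOPS ≈ bandwidth / block size: 24.3 MiB/s at bs=4k
awk 'BEGIN { print 24.3 * 1024 / 4 }'   # ≈ 6200 read IOPS (writes are about the same)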
Random ssd (host):
fio --filename=/tmp/fio-test --size=10GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=myjob --eta-newline=1
Run status group 0 (all jobs):
READ: bw=5063KiB/s (5184kB/s), 5063KiB/s-5063KiB/s (5184kB/s-5184kB/s), io=593MiB (622MB), run=120002-120002msec
WRITE: bw=5072KiB/s (5194kB/s), 5072KiB/s-5072KiB/s (5194kB/s-5194kB/s), io=594MiB (623MB), run=120002-120002msec
Random hdd (ceph):
fio --filename=/data/fio-test --size=10GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=myjob --eta-newline=1
Run status group 0 (all jobs):
READ: bw=316KiB/s (324kB/s), 316KiB/s-316KiB/s (324kB/s-324kB/s), io=37.5MiB (39.3MB), run=121389-121389msec
WRITE: bw=328KiB/s (336kB/s), 328KiB/s-328KiB/s (336kB/s-336kB/s), io=38.9MiB (40.8MB), run=121389-121389msec
Random hdd (host):
fio --filename=/tmp/fio-test --size=10GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=myjob --eta-newline=1
Run status group 0 (all jobs):
READ: bw=1077KiB/s (1103kB/s), 1077KiB/s-1077KiB/s (1103kB/s-1103kB/s), io=126MiB (132MB), run=120010-120010msec
WRITE: bw=1087KiB/s (1113kB/s), 1087KiB/s-1087KiB/s (1113kB/s-1113kB/s), io=127MiB (134MB), run=120010-120010msec
As a bonus, let's check the speed of my local NVMe disk.
Random local nvme (myhost):
fio --filename=/mnt/pve/nvme-pool/test/fio-test --size=10GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=myjob --eta-newline=1
Run status group 0 (all jobs):
READ: bw=43.5MiB/s (45.6MB/s), 43.5MiB/s-43.5MiB/s (45.6MB/s-45.6MB/s), io=5227MiB (5481MB), run=120119-120119msec
WRITE: bw=43.5MiB/s (45.7MB/s), 43.5MiB/s-43.5MiB/s (45.7MB/s-45.7MB/s), io=5230MiB (5485MB), run=120119-120119msec
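The mixed randrw test above can also be split into separate random-read and random-write runs by changing only --rw and keeping everything else as in the commands above; a sketch:

# random read only
fio --filename=/data/fio-test --size=10GB --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=randread-job --eta-newline=1
# random write only
fio --filename=/data/fio-test --size=10GB --direct=1 --rw=randwrite --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=randwrite-job --eta-newline=1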
For more ready-made recipes, see: Sample FIO Commands for Block Volume Performance Tests on Linux-based Instances.