DR failover and failback for RBD Async mirroring with minikube
This doc assumes that you already set up the rbd mirroring between two clusters.
Please refer to doc on how to setup RBD mirroring.
Set up initial DR components
we need to Enable two new sidecar deployment in the RBD provisioner pods which will help in achieving rbd mirroring
-
OMAP Regenerator -> This is a cephcsi component that regenerates the required OMAP data for cephcsi to perform CSI operations after failover.
-
Volume Replication operator -> Volume replication operator which exposes Replication CRD to perform the RBD async operations like enable and disable mirroring, promote, demote, force promote, and resync. Refer operator for more details about it.
Enable DR components
Upgrade to Rook v1.6.0+
Rook v1.6.0 comes with the new volume replication support and also cephcsi v3.3.0, Install/Upgrade to Rook v1.6.0 or higher versions.
Re-apply the common.yaml and crds.yaml
kubectl apply -f crds.yaml common.yaml
Once we reapply/create the crds.yaml we should see two new CRD’s related to volume replication
$ kubectl get crd |grep volumereplication
volumereplicationclasses.replication.storage.openshift.io 2021-04-28T05:57:57Z
volumereplications.replication.storage.openshift.io 2021-04-28T05:57:57Z
As we need to provide Edit rook-ceph-operator-config
configmap and add below
two configurations to the list
apiVersion: v1
data:
CSI_ENABLE_OMAP_GENERATOR: "true"
CSI_ENABLE_VOLUME_REPLICATION: "true"
$kubectl edit cm rook-ceph-operator-config -nrook-ceph
Once you edit and save the configmap you will see that CSI pods are getting recreated, just wait until you see 8 containers in
csi-rbdplugin-provisioner-xxx
pod.
kubectl get po -nrook-ceph
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-bkgkt 3/3 Running 0 5d23h
csi-cephfsplugin-provisioner-6544c5b576-ndf58 6/6 Running 0 5d23h
csi-rbdplugin-provisioner-7465df5764-5g8mj 8/8 Running 0 6m7s
csi-rbdplugin-vpq4k 3/3 Running 0 5d23h
rook-ceph-mgr-a-54bbfb54b9-pg827 1/1 Running 0 6d
rook-ceph-mon-a-79dfbdcdb6-qwssl 1/1 Running 0 6d
rook-ceph-operator-54cf7487d4-gdp5s 1/1 Running 0 6d
rook-ceph-osd-0-b99c76b7b-6cbnk 1/1 Running 0 6d
rook-ceph-osd-prepare-minicluster2-57552 0/1 Completed 0 6d
rook-ceph-rbd-mirror-a-d449c66db-2c4hr 1/1 Running 0 5d23h
rook-ceph-tools-78cdfd976c-nm2jx 1/1 Running 0 6d
kubectl logs po/csi-rbdplugin-provisioner-7465df5764-5g8mj -nrook-ceph
error: a container name must be specified for pod csi-rbdplugin-provisioner-7465df5764-5g8mj, choose one of: [csi-provisioner csi-resizer csi-attacher csi-snapshotter csi-omap-generator volume-replication csi-rbdplugin liveness-prometheus]
In the above logs we can see that
csi-omap-generator
andvolume-replication
containers got created incsi-rbdplugin-provisioner-xxx
pods.
Note The above steps need to be done on both clusters.
Now the initial bootstrapping is completed.
Provision storage on cluster1
Create a storageclass
and PVC
Create storageclass on cluster1
$cat <<EOF | kubectl --context=cluster1 apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
clusterID: rook-ceph
pool: replicapool
imageFormat: "2"
imageFeatures: layering
csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Retain
EOF
storageclass.storage.k8s.io/rook-ceph-block created
Create a storageclass with
Retain
reclaimPolicy
so that we can use static binding for DR.
Create PersistentVolumeClaim on cluster1
$cat <<EOF | kubectl --context=cluster1 apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: rbd-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: rook-ceph-block
EOF
Check PVC is in bound state
$kubectl get pvc --context=cluster1
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
rbd-pvc Bound pvc-57cfca36-cb76-4d79-b258-a107bc5c4d15 1Gi RWO rook-ceph-block 44s
Create Volume Replication class on cluster1
$cat <<EOF | kubectl --context=cluster1 apply -f -
apiVersion: replication.storage.openshift.io/v1alpha1
kind: VolumeReplicationClass
metadata:
name: rbd-volumereplicationclass
spec:
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
mirroringMode: snapshot
replication.storage.openshift.io/replication-secret-name: rook-csi-rbd-provisioner
replication.storage.openshift.io/replication-secret-namespace: rook-ceph
EOF
The volume replication class holds the storage admin information required for the volume replication operator.
Create Volume Replication for PVC on cluster1
$cat <<EOF | kubectl --context=cluster1 apply -f -
apiVersion: replication.storage.openshift.io/v1alpha1
kind: VolumeReplication
metadata:
name: pvc-volumereplication2
spec:
volumeReplicationClass: rbd-volumereplicationclass
replicationState: primary
dataSource:
apiGroup: ""
kind: PersistentVolumeClaim
name: rbd-pvc # Name of the PVC to which mirroring to be enabled.
EOF
replicationState
is the state of the volume being referenced. Possible values areprimary
,secondary
, andresync
.
primary
denotes that the volume is primarysecondary
denotes that the volume is secondaryresync
denotes that the volume needs to be resynced
Note the
VolumeReplication
is a namespace scoped object it should be in the namespace as of PVC.
Check VolumeReplication CR status
$ kubectl get volumereplication pvc-volumereplication -oyaml
...
spec:
dataSource:
apiGroup: ""
kind: PersistentVolumeClaim
name: rbd-pvc
replicationState: primary
volumeReplicationClass: rbd-volumereplicationclass
status:
conditions:
- lastTransitionTime: "2021-05-04T07:39:00Z"
message: ""
observedGeneration: 1
reason: Promoted
status: "True"
type: Completed
- lastTransitionTime: "2021-05-04T07:39:00Z"
message: ""
observedGeneration: 1
reason: Healthy
status: "False"
type: Degraded
- lastTransitionTime: "2021-05-04T07:39:00Z"
message: ""
observedGeneration: 1
reason: NotResyncing
status: "False"
type: Resyncing
lastCompletionTime: "2021-05-04T07:39:00Z"
lastStartTime: "2021-05-04T07:38:59Z"
message: volume is marked primary
observedGeneration: 1
state: Primary
Run commands on toolbox pod on cluster1 to check to mirror is enabled on ceph block pool
$kubectl get po --context=cluster1 -nrook-ceph -l app=rook-ceph-tools
NAME READY STATUS RESTARTS AGE
rook-ceph-tools-78cdfd976c-jrhjm 1/1 Running 0 3m53s
Verify that mirroring enabled on the image
$kubectl exec --context=cluster1 -nrook-ceph rook-ceph-tools-78cdfd976c-jrhjm -- rbd info csi-vol-1b82d349-2d5a-11eb-b8d5-0242ac110004 --pool=replicapool
rbd image 'csi-vol-1b82d349-2d5a-11eb-b8d5-0242ac110004':
size 1 GiB in 256 objects
order 22 (4 MiB objects)
snapshot_count: 1
id: 13a672b41f5f
block_name_prefix: rbd_data.13a672b41f5f
format: 2
features: layering
op_features:
flags:
create_timestamp: Mon Nov 23 07:04:21 2020
access_timestamp: Mon Nov 23 07:04:21 2020
modify_timestamp: Mon Nov 23 07:04:21 2020
mirroring state: enabled
mirroring mode: snapshot
mirroring global id: 7da10ec2-aa67-4b05-86f0-487b4fa9fdbb
mirroring primary: true
You can get the RBD image name from the PV object.
Verify that the image is mirrored on the secondary cluster
Run commands on toolbox pod on cluster2
$kubectl get po --context=cluster2 -nrook-ceph -l app=rook-ceph-tools
NAME READY STATUS RESTARTS AGE
rook-ceph-tools-78cdfd976c-2566m 1/1 Running 0 3m53s
$kubectl exec --context=cluster2 -nrook-ceph rook-ceph-tools-78cdfd976c-2566m -- rbd info csi-vol-1b82d349-2d5a-11eb-b8d5-0242ac110004 --pool=replicapool
rbd image 'csi-vol-1b82d349-2d5a-11eb-b8d5-0242ac110004':
size 1 GiB in 256 objects
order 22 (4 MiB objects)
snapshot_count: 1
id: 12a55edc8361
block_name_prefix: rbd_data.12a55edc8361
format: 2
features: layering, non-primary
op_features:
flags:
create_timestamp: Mon Nov 23 07:12:33 2020
access_timestamp: Mon Nov 23 07:12:33 2020
modify_timestamp: Mon Nov 23 07:12:33 2020
mirroring state: enabled
mirroring mode: snapshot
mirroring global id: 7da10ec2-aa67-4b05-86f0-487b4fa9fdbb
mirroring primary: false
Take a backup of PVC and PV object on cluster1
kubectl get pvc rbd-pvc -oyaml >pvc-bkp.yaml
kubectl get pv/pvc-0b93de3a-3946-4909-a189-f44dba0abc76 -oyaml >pv_bkp.yaml
Remove the claimRef on the backed up PV objects in yaml files
sample claimRef in PV object
apiVersion: v1
kind: PersistentVolume
metadata:
...
name: pvc-0b93de3a-3946-4909-a189-f44dba0abc76
resourceVersion: "2516"
selfLink: /api/v1/persistentvolumes/pvc-0b93de3a-3946-4909-a189-f44dba0abc76
uid: 0b1a1560-2c0b-4906-8059-d6be95954bc0
spec:
claimRef:
apiVersion: v1
kind: PersistentVolumeClaim
name: rbd-pvc
namespace: default
resourceVersion: "2512"
uid: 0b93de3a-3946-4909-a189-f44dba0abc76
...
Remove the claimRef in all PV backup’s
vi pv_bkp.yaml
Planned Storage Migration
Failover
Follow the Below steps for planned migration when you want to move the workload from one cluster to another cluster
- Scale down all the application pods which are using the mirrored PVC
- Take a back up of PVC and PV object from the cluster1
- This can be done using some backup tools like velero also
- update replicationState to secondary in VolumeReplication CR at Primary Site. When the operator sees this change, it will pass the information down to the driver via GRPC request to mark the dataSource as secondary.
- If you are manually recreating the PVC and PV on the secondary cluster. remove the claimRef section in the PV objects.
- Recreate the storageclass, PVC, and PV objects on the cluster2
- As you are creating the static binding between PVC and PV, a new PV won’t be created here, the PVC will get bind to the existing PV.
- Create the VolumeReplicationClass on the cluster2
- Create the VolumeReplications for all the PVC’s for which mirroring is enabled
replicationState
should beprimary
for all the PVC’s on the cluster1
- Check whether the image is primary on cluster2 by looking at toolbox or VolumeReplication CR status
- Once the Image is marked as Primary create the application to use the PVC In the case of planned migration,
Failback
Once the planned work is done and you want to failback, follow the above Failover steps again. the only difference, in this case, is we don’t need to recreate anything we just need to change replicationState
from secondary
to primary
in current cluster1.
Storage Disaster Recovery
Failover
In case of disaster recovery, create VolumeReplication CR at the Secondary Site. Since the connection to the Primary Site is lost, the operator automatically sends a GRPC request down to the driver to forcefully mark the dataSource as primary.
- If you are manually recreating the PVC and PV on the secondary cluster. remove the claimRef section in the PV objects.
- Recreate the storageclass, PVC, and PV objects on the cluster2
- As you are creating the static binding between PVC and PV, a new PV won’t be created here, the PVC will get bind to the existing PV.
- Create the VolumeReplicationClass and VolumeReplication CR on the cluster2
- Check whether the image is primary on cluster2 by looking at toolbox or VolumeReplication CR status
- Once the Image is marked as Primary create the application to use the PVC In the case of planned migration,
Failback
Once the failed cluster is recovered and you want to failback, follow the below steps
- Update the VolumeReplication CR
replicationState
state fromprimary
tosecondary
in cluster1 - Scale down the applications on the cluster2
- Update the VolumeReplication CR
replicationState
state fromprimary
tosecondary
in cluster2 - On the cluster1 verify that the VR status is marked as volume ready to use
- Once the volume is marked to ready to use, change the
replicationState
state fromsecondary
toprimary
in cluster1 - Scale up the applications again on cluster1