StatefulSet¶
StatefulSet is a workload API object used to manage stateful applications. Unlike Deployments, StatefulSets maintain a sticky identity for each pod and provide guarantees about ordering and uniqueness.
When to Use StatefulSet¶
Use StatefulSets when you need:
Stable, unique network identifiers: Each pod gets a persistent hostname
Stable, persistent storage: Volumes persist across pod rescheduling
Ordered, graceful deployment and scaling: Pods are created/deleted in order
Ordered, automated rolling updates: Updates happen in a specific sequence
Common Use Cases:
Databases (MySQL, PostgreSQL, MongoDB)
Distributed systems (Kafka, Elasticsearch, etcd)
Applications requiring stable network identity (ZooKeeper)
Stateful applications with leader election
StatefulSet vs Deployment¶
| Feature | Deployment | StatefulSet |
|---|---|---|
| Pod Identity | Random names, interchangeable | Stable, ordered names |
| Network Identity | Changes on restart | Persistent hostname |
| Storage | Shared or ephemeral | Persistent per pod |
| Scaling Order | Parallel, no guarantee | Sequential, ordered |
| Update Strategy | Can be parallel | Sequential, ordered |
| Use Case | Stateless apps | Stateful apps |
StatefulSet Basics¶
Pod Identity
StatefulSet pods have a unique, stable identity consisting of:
Ordinal Index: Pods are numbered 0 to N-1
Stable Network ID: <statefulset-name>-<ordinal>
Stable Hostname: Each pod has a predictable DNS name
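Because the ordinal is the last component of the pod name, a script running inside a pod can recover it from the hostname, for example to derive a per-replica ID. The following is a minimal sketch; `pod_ordinal` is a hypothetical helper written for this illustration, not a kubectl or Kubernetes feature.

```shell
# pod_ordinal NAME
# Print the ordinal index encoded in a StatefulSet pod name (hypothetical helper).
pod_ordinal() {
  name=$1
  ordinal=${name##*-}   # drop everything up to and including the last "-"
  case $ordinal in
    ''|*[!0-9]*)
      echo "not a StatefulSet pod name: $name" >&2
      return 1
      ;;
  esac
  echo "$ordinal"
}

pod_ordinal web-2   # prints 2
```

Inside a pod, the same function could be called as `pod_ordinal "$(hostname)"`, assuming the container's hostname has not been overridden.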
Headless Service
StatefulSets require a headless Service (clusterIP: None) to manage the network identity of their pods:
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
spec:
  clusterIP: None # Headless service
  selector:
    app: nginx
  ports:
  - port: 80
This creates DNS records for each pod: <pod-name>.<service-name>.<namespace>.svc.cluster.local
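To make the naming pattern concrete, the sketch below prints the DNS name of every pod in a StatefulSet. `pod_dns_names` is a hypothetical helper, and the `cluster.local` suffix assumes the default cluster domain; a cluster configured with a different domain would use that instead.

```shell
# pod_dns_names STATEFULSET REPLICAS SERVICE [NAMESPACE]
# Print one fully qualified DNS name per pod (hypothetical helper).
# Assumes the default cluster domain "cluster.local".
pod_dns_names() {
  sts=$1 replicas=$2 svc=$3 ns=${4:-default}
  i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "$sts-$i.$svc.$ns.svc.cluster.local"
    i=$((i + 1))
  done
}

pod_dns_names web 3 nginx-svc
# web-0.nginx-svc.default.svc.cluster.local
# web-1.nginx-svc.default.svc.cluster.local
# web-2.nginx-svc.default.svc.cluster.local
```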
Creating a StatefulSet¶
Example: Nginx StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None # Headless service
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx-svc" # Must match the headless service
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates: # Creates PVC for each pod
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "standard" # Use your storage class
      resources:
        requests:
          storage: 1Gi
Create and verify:
# Create the StatefulSet
$ kubectl apply -f statefulset.yaml
service/nginx-svc created
statefulset.apps/web created
# Watch pods being created (notice the ordered creation)
$ kubectl get pods -w
NAME READY STATUS RESTARTS AGE
web-0 0/1 Pending 0 0s
web-0 0/1 ContainerCreating 0 0s
web-0 1/1 Running 0 2s
web-1 0/1 Pending 0 0s
web-1 0/1 ContainerCreating 0 0s
web-1 1/1 Running 0 2s
web-2 0/1 Pending 0 0s
web-2 0/1 ContainerCreating 0 0s
web-2 1/1 Running 0 2s
# View the StatefulSet
$ kubectl get statefulset
NAME READY AGE
web 3/3 1m
# View the pods (notice the ordinal names)
$ kubectl get pods -l app=nginx
NAME READY STATUS RESTARTS AGE
web-0 1/1 Running 0 1m
web-1 1/1 Running 0 1m
web-2 1/1 Running 0 1m
# View the PVCs (one per pod)
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY
www-web-0 Bound pvc-8c8c8c8c-8c8c-8c8c-8c8c-8c8c8c8c8c8c 1Gi
www-web-1 Bound pvc-9d9d9d9d-9d9d-9d9d-9d9d-9d9d9d9d9d9d 1Gi
www-web-2 Bound pvc-aeaeaeae-aeae-aeae-aeae-aeaeaeaeaeae 1Gi
Stable Network Identity¶
Each pod gets a stable DNS name:
# Format: <pod-name>.<service-name>.<namespace>.svc.cluster.local
# For example:
# - web-0.nginx-svc.default.svc.cluster.local
# - web-1.nginx-svc.default.svc.cluster.local
# - web-2.nginx-svc.default.svc.cluster.local
# Test DNS resolution
$ kubectl run -it --rm debug --image=busybox --restart=Never -- sh
# Inside the debug pod:
/ # nslookup web-0.nginx-svc.default.svc.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: web-0.nginx-svc.default.svc.cluster.local
Address 1: 10.244.1.5 web-0.nginx-svc.default.svc.cluster.local
# Ping specific pod
/ # ping web-0.nginx-svc
PING web-0.nginx-svc (10.244.1.5): 56 data bytes
64 bytes from 10.244.1.5: seq=0 ttl=64 time=0.123 ms
Even if a pod is deleted and recreated, it maintains the same name and DNS record:
# Delete a pod
$ kubectl delete pod web-1
pod "web-1" deleted
# StatefulSet recreates it with the same name
$ kubectl get pods -l app=nginx
NAME READY STATUS RESTARTS AGE
web-0 1/1 Running 0 5m
web-1 1/1 Running 0 10s # Same name, new pod
web-2 1/1 Running 0 5m
Persistent Storage¶
Volume Claim Templates
StatefulSets use volumeClaimTemplates to automatically create a PVC for each pod:
volumeClaimTemplates:
- metadata:
    name: data
  spec:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: "fast-ssd"
    resources:
      requests:
        storage: 10Gi
Key Points:
Each pod gets its own PVC
PVCs are named: <template-name>-<statefulset-name>-<ordinal>
PVCs persist even if pods are deleted
When a pod is recreated, it reattaches to the same PVC
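Since a pod's name already has the form `<statefulset-name>-<ordinal>`, the PVC names for a given pod follow directly from the template names. A small sketch, where `pvc_for_pod` is a hypothetical helper and `logs` is an illustrative second template name not used elsewhere in this guide:

```shell
# pvc_for_pod POD TEMPLATE...
# Print the PVC name that each volumeClaimTemplate yields for POD (hypothetical helper).
pvc_for_pod() {
  pod=$1
  shift
  for template in "$@"; do
    echo "$template-$pod"
  done
}

pvc_for_pod web-1 data        # prints data-web-1
pvc_for_pod web-1 data logs   # prints data-web-1 and logs-web-1, one per line
```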
# Check PVCs
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS
data-web-0 Bound pv-001 10Gi RWO fast-ssd
data-web-1 Bound pv-002 10Gi RWO fast-ssd
data-web-2 Bound pv-003 10Gi RWO fast-ssd
# Delete a pod
$ kubectl delete pod web-1
# The PVC remains
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY
data-web-0 Bound pv-001 10Gi
data-web-1 Bound pv-002 10Gi # Still here!
data-web-2 Bound pv-003 10Gi
# New pod reattaches to the same PVC
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
web-1 1/1 Running 0 30s # Using data-web-1 PVC
Scaling¶
Scaling Up (Sequential)
# Scale from 3 to 5 replicas
$ kubectl scale statefulset web --replicas=5
# Pods are created sequentially
$ kubectl get pods -w
web-3 0/1 Pending 0 0s
web-3 1/1 Running 0 2s
web-4 0/1 Pending 0 0s # web-4 starts after web-3 is ready
web-4 1/1 Running 0 2s
Scaling Down (Reverse Order)
# Scale from 5 to 3 replicas
$ kubectl scale statefulset web --replicas=3
# Pods are deleted in reverse order
$ kubectl get pods -w
web-4 1/1 Terminating 0 5m
web-4 0/1 Terminating 0 5m
web-3 1/1 Terminating 0 5m # web-3 deleted after web-4
web-3 0/1 Terminating 0 5m
# Final state
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
web-0 1/1 Running 0 10m
web-1 1/1 Running 0 10m
web-2 1/1 Running 0 10m
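Important: By default, scaling down does NOT delete PVCs automatically (recent Kubernetes versions can change this through spec.persistentVolumeClaimRetentionPolicy):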
# PVCs remain after scaling down
$ kubectl get pvc
NAME STATUS VOLUME
data-web-0 Bound pv-001
data-web-1 Bound pv-002
data-web-2 Bound pv-003
data-web-3 Bound pv-004 # Still exists
data-web-4 Bound pv-005 # Still exists
# If you scale up again, pods reuse existing PVCs
$ kubectl scale statefulset web --replicas=5
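The scale-down behavior described above can be summarized as a small planning sketch that prints the termination order and the PVCs left behind. `scale_down_plan` is a hypothetical helper that only prints text; it does not talk to a cluster, and it assumes the default behavior of retaining PVCs.

```shell
# scale_down_plan STATEFULSET FROM TO TEMPLATE
# Print pods in the order they are terminated when scaling from FROM to TO
# replicas, with the PVC each one leaves behind (hypothetical helper).
scale_down_plan() {
  sts=$1 from=$2 to=$3 template=$4
  i=$((from - 1))
  while [ "$i" -ge "$to" ]; do
    echo "terminate $sts-$i (PVC $template-$sts-$i is retained)"
    i=$((i - 1))
  done
}

scale_down_plan web 5 3 data
# terminate web-4 (PVC data-web-4 is retained)
# terminate web-3 (PVC data-web-3 is retained)
```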
Update Strategies¶
RollingUpdate (Default)
Updates pods in reverse ordinal order (N-1 to 0):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0 # Update all pods
# Update image
$ kubectl set image statefulset/web nginx=nginx:1.22
# Watch the rolling update (reverse order)
$ kubectl rollout status statefulset/web
waiting for statefulset rolling update to complete 0 pods at revision web-2...
waiting for statefulset rolling update to complete 1 pods at revision web-2...
waiting for statefulset rolling update to complete 2 pods at revision web-2...
statefulset rolling update complete 3 pods at revision web-2...
Partition Updates
Use partition to stage updates (canary deployments):
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partition: 2 # Only update pods with ordinal >= 2
# This will only update web-2, web-3, web-4, etc.
# web-0 and web-1 remain at the old version
$ kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}'
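The effect of a partition value is easy to misread, so here is a sketch that lists, in rollout order (highest ordinal first), which pods would be updated and which would stay on the old revision. `partition_plan` is a hypothetical helper that only prints a plan; it does not query or change a cluster.

```shell
# partition_plan STATEFULSET REPLICAS PARTITION
# Walk ordinals from highest to lowest and mark each pod as "update"
# (ordinal >= PARTITION) or "keep" (ordinal < PARTITION). Hypothetical helper.
partition_plan() {
  sts=$1 replicas=$2 partition=$3
  i=$((replicas - 1))
  while [ "$i" -ge 0 ]; do
    if [ "$i" -ge "$partition" ]; then
      echo "$sts-$i update"
    else
      echo "$sts-$i keep"
    fi
    i=$((i - 1))
  done
}

partition_plan web 5 2
# web-4 update
# web-3 update
# web-2 update
# web-1 keep
# web-0 keep
```

With a partition of 0 every pod is marked `update`, which corresponds to a normal full rolling update.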
OnDelete Strategy
Pods are updated only when manually deleted:
updateStrategy:
  type: OnDelete
Complete Example: MySQL StatefulSet¶
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  clusterIP: None
  selector:
    app: mysql
  ports:
  - port: 3306
    name: mysql
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "password" # Demo only; use a Secret in real deployments
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
        - name: config
          mountPath: /etc/mysql/conf.d
      volumes:
      - name: config
        configMap:
          name: mysql-config # This ConfigMap must already exist
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "standard"
      resources:
        requests:
          storage: 10Gi
Connect to a MySQL instance:
# Connect to a specific MySQL instance
$ kubectl run -it --rm mysql-client --image=mysql:8.0 --restart=Never -- \
mysql -h mysql-0.mysql.default.svc.cluster.local -p
# From another pod, you can connect to:
# - mysql-0.mysql.default.svc.cluster.local
# - mysql-1.mysql.default.svc.cluster.local
# - mysql-2.mysql.default.svc.cluster.local
Managing StatefulSets¶
Viewing Status
# Get StatefulSet status
$ kubectl get statefulset web
NAME READY AGE
web 3/3 10m
# Describe StatefulSet
$ kubectl describe statefulset web
# Get detailed output
$ kubectl get statefulset web -o yaml
Deleting a StatefulSet
# Delete StatefulSet but keep pods running
$ kubectl delete statefulset web --cascade=orphan
# Delete StatefulSet and pods (default)
$ kubectl delete statefulset web
# Note: PVCs are NOT deleted automatically
# Delete PVCs manually if needed
$ kubectl delete pvc www-web-0 www-web-1 www-web-2
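Pausing and Resuming
# kubectl rollout pause/resume is not supported for StatefulSets (Deployments only)
# To hold a rolling update, set partition to the replica count (3 here)
$ kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":3}}}}'
# To resume, lower partition back to 0
$ kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":0}}}}'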
# Check rollout history
$ kubectl rollout history statefulset/web
# Rollback to previous version
$ kubectl rollout undo statefulset/web
Best Practices¶
Always Use Headless Service
StatefulSets require a headless service for stable network identity.
Plan Storage Carefully
Choose appropriate StorageClass
Size PVCs appropriately (they can’t be shrunk)
Consider backup strategy
Set Pod Management Policy
spec:
  podManagementPolicy: OrderedReady # Default, sequential
  # OR
  # podManagementPolicy: Parallel # Faster, but no ordering (affects scaling, not updates)
Use Init Containers for Setup
Initialize data or configuration before main container starts.
Implement Readiness Probes
Ensure pods are truly ready before proceeding to next pod:
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
Don’t Forget Resource Limits
resources:
  requests:
    memory: "1Gi"
    cpu: "500m"
  limits:
    memory: "2Gi"
    cpu: "1000m"
Plan for PVC Cleanup
Automate PVC deletion for scaled-down pods:
# Clean up PVCs left behind by scaled-down pods (ordinals 3-9 in this example)
$ kubectl get pvc | awk '$1 ~ /^www-web-[3-9]$/ {print $1}' | xargs -r kubectl delete pvc
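A fixed ordinal range in a regular expression is easy to get wrong once replica counts change. As one possible alternative, the sketch below compares each PVC's ordinal with the current replica count. `orphaned_pvcs` is a hypothetical helper that only filters names read from standard input; it deletes nothing itself.

```shell
# orphaned_pvcs TEMPLATE STATEFULSET REPLICAS
# Read PVC names from stdin and print those whose ordinal is >= REPLICAS.
# Hypothetical helper; accepts plain names or `kubectl get pvc -o name` output.
orphaned_pvcs() {
  prefix="$1-$2-"
  replicas=$3
  while IFS= read -r pvc; do
    pvc=${pvc#persistentvolumeclaim/}   # strip the resource prefix if present
    case $pvc in
      "$prefix"*) ordinal=${pvc#"$prefix"} ;;
      *) continue ;;                    # belongs to something else
    esac
    case $ordinal in
      ''|*[!0-9]*) continue ;;          # suffix is not a plain number
    esac
    [ "$ordinal" -ge "$replicas" ] && echo "$pvc"
  done
  return 0
}

printf '%s\n' www-web-0 www-web-2 www-web-3 www-web-10 other-pvc |
  orphaned_pvcs www web 3
# www-web-3
# www-web-10
```

Against a real cluster, the input could come from `kubectl get pvc -o name`, and the reviewed output could then be passed to `kubectl delete pvc`. Deleting a PVC may permanently remove its data, depending on the reclaim policy of the underlying volume.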
Use Anti-Affinity for HA
Spread pods across nodes:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - mysql
      topologyKey: kubernetes.io/hostname
Troubleshooting¶
Pod Stuck in Pending
# Check PVC status
$ kubectl get pvc
# Check events
$ kubectl describe statefulset web
$ kubectl describe pod web-0
# Common causes:
# - No PV available for PVC
# - StorageClass doesn't exist
# - Insufficient node resources
Pod Not Getting Stable Identity
# Verify headless service exists
$ kubectl get svc nginx-svc
# Check if serviceName matches
$ kubectl get statefulset web -o yaml | grep serviceName
Slow Rolling Updates
# Check pod readiness probe
$ kubectl get pods
# Check events
$ kubectl get events --sort-by='.lastTimestamp'
# Consider using partition for staged rollouts
PVC Not Binding
# Check StorageClass
$ kubectl get storageclass
# Check PV availability
$ kubectl get pv
# Describe PVC
$ kubectl describe pvc data-web-0
References¶
StatefulSet Basics: https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/
StatefulSet Concepts: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
Run Replicated Stateful Application: https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/