Introduction to Kubernetes Scheduling and Node Affinity
You've deployed your first Kubernetes cluster, and your application is running. But have you ever wondered how Kubernetes decides which node to place each pod on? The default scheduler does a decent job, but real-world applications often have specific requirements that the default behavior doesn't handle. Maybe you need to place a database pod on a node with high memory, or keep a GPU-intensive application on nodes with dedicated hardware. This is where scheduling and node affinity come into play.
In this article, you'll learn how Kubernetes scheduling works under the hood, why it matters for your applications, and how to use node affinity to control pod placement with precision. Understanding these concepts will help you optimize resource utilization, improve performance, and avoid common scheduling pitfalls that can lead to application instability.
How Kubernetes Scheduling Works
Kubernetes uses a pluggable scheduler architecture. The default scheduler (kube-scheduler) is a single process that runs on the control plane and makes scheduling decisions for all pods that don't already have a node assigned. When a pod is created without a nodeName field, the scheduler evaluates all available nodes and selects the best one based on a set of predicates and priorities.
Scheduling Predicates
Predicates are rules that determine whether a pod can run on a given node. The scheduler checks each predicate in sequence, and if any predicate fails, the node is eliminated from consideration. (In recent Kubernetes versions, predicates and priorities are implemented as filter and score plugins in the scheduling framework, but the concepts are the same.) Common predicates include:
- PodFitsResources: Ensures the node has sufficient CPU and memory resources
- PodFitsHostPorts: Checks if the node has available host ports
- MatchNodeSelector: Verifies the node matches the pod's node selector
- HostName: Confirms the node's name matches the pod's nodeName field
Scheduling Priorities
Once predicates filter the candidate nodes, priorities are applied to rank them. Higher-scoring nodes are preferred. Common priorities include:
- LeastRequestedPriority: Prefers nodes with fewer allocated resources
- ImageLocalityPriority: Prefers nodes that already have the pod's images
- InterPodAffinityPriority: Considers pod affinity and anti-affinity rules
The scheduler selects the node with the highest total priority score. If multiple nodes have the same score, it picks one arbitrarily.
Understanding Node Affinity
Node affinity is a Kubernetes feature that allows you to influence which nodes a pod is scheduled on. It's similar to node selectors but more powerful, offering both required and preferred rules.
Node Selector vs Node Affinity
Node selectors are simple key-value pairs that require exact matches. If a pod has a node selector, the pod can only be scheduled on nodes that have all the specified labels.
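For illustration, here is a pod pinned to SSD-backed nodes with a node selector (the disktype label and nginx image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ssd-pod
spec:
  nodeSelector:
    disktype: ssd        # the node must carry exactly this label
  containers:
  - name: app
    image: nginx:1.25
```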
Node affinity, on the other hand, supports operators like In, NotIn, Exists, DoesNotExist, Gt, and Lt, giving you much more control over pod placement.
Required Node Affinity
Required node affinity rules are similar to node selectors but more flexible. If a pod specifies required affinity, the scheduler will only consider nodes that satisfy all the affinity rules.
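A required node-affinity spec along these lines might look as follows (pod name, image, and label keys are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:        # all expressions in one term must match
          - key: disktype
            operator: In
            values: ["ssd"]
          - key: hardware
            operator: In
            values: ["gpu"]
  containers:
  - name: app
    image: nginx:1.25
```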
In this example, the pod will only be scheduled on nodes that have both disktype=ssd and hardware=gpu labels. If no such node exists, the pod will remain in Pending state until one becomes available.
Node Selector Terms
Each nodeSelectorTerms list is an OR condition. If you have multiple nodeSelectorTerms, the pod can be scheduled on a node that matches any one of them. Within a nodeSelectorTerm, matchExpressions are AND conditions.
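For example, the following spec uses two nodeSelectorTerms, each combining a zone with an instance type (the topology.kubernetes.io/zone and node.kubernetes.io/instance-type keys are standard well-known labels; the large value is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: zonal-app
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:        # term 1: us-east-1a AND large
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["us-east-1a"]
          - key: node.kubernetes.io/instance-type
            operator: In
            values: ["large"]
        - matchExpressions:        # term 2 (ORed with term 1): us-east-1b AND large
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["us-east-1b"]
          - key: node.kubernetes.io/instance-type
            operator: In
            values: ["large"]
  containers:
  - name: app
    image: nginx:1.25
```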
This pod can be scheduled on any node in zones us-east-1a or us-east-1b with instance type large.
Preferred Node Affinity
Preferred node affinity rules are soft constraints. The scheduler will try to satisfy them, but if no node matches them, the pod can still be scheduled on a node that doesn't meet the preferred rules.
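A sketch of preferred affinity with two weighted preferences (label keys and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache-client
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: cache             # any node carrying a cache label
            operator: Exists
      - weight: 50
        preference:
          matchExpressions:
          - key: dedicated
            operator: In
            values: ["redis"]
  containers:
  - name: app
    image: redis:7
```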
This pod prefers nodes with the cache label (weight 100) and nodes with the dedicated=redis label (weight 50). The scheduler will assign higher priority to nodes matching the first preference.
Weight Values
Weights range from 1 to 100. The scheduler calculates a total score for each node by summing the weights of all matching preferences. A node matching the cache label gets 100 points, while a node matching both cache and dedicated=redis gets 150 points.
Practical Example: Database Pod Scheduling
Let's walk through a real-world scenario where node affinity is essential. You're deploying a PostgreSQL database that requires:
- High memory (at least 8GB)
- SSD storage (not HDD)
- Placement on dedicated database nodes
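One way to express these requirements is sketched below. The role, disktype, and memory-gb labels are assumptions about how your nodes are labeled; Gt compares the label's value as an integer, so it only works if nodes carry a numeric memory-gb label:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
          - key: role
            operator: In
            values: ["database"]
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: memory-gb         # assumes nodes are labeled with integer memory size
            operator: Gt
            values: ["16"]
  containers:
  - name: postgres
    image: postgres:16
    resources:
      requests:
        memory: 8Gi                # guarantees the 8GB minimum via PodFitsResources
        cpu: "2"
```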
This configuration ensures the database runs on nodes with SSD storage and database role, while also preferring nodes with more than 16GB of memory.
Node Affinity vs Pod Affinity
It's important to distinguish between node affinity and pod affinity. Node affinity controls which nodes a pod can run on, while pod affinity controls which pods should be co-located on the same node.
Pod Affinity Example
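A sketch using required pod affinity, co-locating this pod with database pods on the same node (names and image are illustrative; kubernetes.io/hostname is the standard per-node topology key):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: database          # match nodes already running a pod with this label
        topologyKey: kubernetes.io/hostname
  containers:
  - name: app
    image: nginx:1.25
```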
This pod will only be scheduled on nodes that already run a pod with the app=database label. This is useful for keeping related workloads close together.
When to Use Which
Use node affinity when you need to control node characteristics (hardware, location, labels). Use pod affinity when you need to control pod placement relative to other pods (database and application co-location).
Common Scheduling Patterns
1. Dedicated Nodes for Critical Workloads
Create dedicated nodes for high-priority applications:
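A sketch, assuming critical nodes carry a dedicated=critical label (in practice you would usually pair this with a taint so other pods stay off these nodes):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: critical-app
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: dedicated
            operator: In
            values: ["critical"]
  containers:
  - name: app
    image: critical-app:1.0        # placeholder image
```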
2. GPU-Accelerated Workloads
Schedule GPU pods on nodes with dedicated hardware:
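For example, combining an affinity rule on an accelerator label (an illustrative label name) with a GPU resource limit, which on NVIDIA nodes is exposed by the device plugin as nvidia.com/gpu:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: accelerator       # illustrative label applied to GPU nodes
            operator: In
            values: ["nvidia-a100"]
  containers:
  - name: trainer
    image: training-image:1.0      # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1          # requires the NVIDIA device plugin
```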
3. Geographic Distribution
Schedule pods in specific availability zones:
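For example, using the well-known topology.kubernetes.io/zone label (the zone name is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: zonal-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["us-east-1a"]
  containers:
  - name: app
    image: nginx:1.25
```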
Troubleshooting Scheduling Issues
Pod Stays in Pending State
If a pod remains in Pending state, check the events:
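For example (replace the pod name and namespace placeholders with your own):

```shell
kubectl describe pod <pod-name> -n <namespace>
# or list all events in the namespace, newest last:
kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp
```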
Look for messages like:
- 0/3 nodes are available: 3 Insufficient cpu.
- 0/3 nodes are available: 3 node(s) didn't match node affinity.
- 0/3 nodes are available: 3 node(s) had taint {key: value}, that the pod didn't tolerate.
Insufficient Resources
If the pod can't find a node with enough resources, check node capacity:
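For example (the node name is a placeholder; kubectl top requires the metrics-server add-on):

```shell
kubectl describe node <node-name>   # see the Capacity and Allocatable sections
kubectl top nodes                   # current CPU/memory usage, needs metrics-server
```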
Consider increasing node sizes or adding more nodes to your cluster.
Node Affinity Not Working
Verify that nodes have the required labels:
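For example (the disktype and hardware keys are placeholders for whatever labels your affinity rules reference):

```shell
kubectl get nodes --show-labels
# or show only the labels you care about, as columns:
kubectl get nodes -L disktype,hardware
```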
If labels are missing, you can add them with kubectl label:
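For example (node name and label are placeholders):

```shell
kubectl label nodes <node-name> disktype=ssd
# change an existing value:
kubectl label nodes <node-name> disktype=ssd --overwrite
```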
Best Practices
1. Use Required Affinity for Critical Constraints
For requirements that must be met (like hardware specifications), use requiredDuringSchedulingIgnoredDuringExecution. This prevents pods from being scheduled on incompatible nodes.
2. Prefer Lightweight Constraints
Avoid overly specific affinity rules that reduce node availability. For example, if you need at least 8GB of memory, prefer a Gt expression over an exact match so that nodes with 16GB still qualify.
3. Combine Affinity Rules
Use both required and preferred affinity to create flexible scheduling policies. Required rules ensure compatibility, while preferred rules optimize placement.
4. Document Your Scheduling Strategy
Document the labels and affinity rules you use so other team members understand pod placement decisions.
5. Monitor Scheduling Performance
Use metrics like pod scheduling duration and node utilization to evaluate if your scheduling strategy is effective.
Conclusion
Kubernetes scheduling and node affinity give you fine-grained control over pod placement, enabling you to optimize resource utilization, improve performance, and ensure your applications run on appropriate hardware. By understanding predicates, priorities, and affinity rules, you can design scheduling strategies that meet your application's specific requirements.
The key takeaways are:
- Use node selectors for simple exact matches
- Use node affinity for flexible, powerful pod placement rules
- Combine required and preferred affinity for optimal scheduling
- Monitor pod status and events to troubleshoot scheduling issues
As you scale your Kubernetes deployments, these concepts become increasingly important. Platforms like ServerlessBase simplify deployment management and can help you implement these scheduling strategies more easily by providing intuitive interfaces for configuring node labels and pod placement rules.
Next Steps
Now that you understand Kubernetes scheduling and node affinity, consider exploring related concepts:
- Pod Disruption Budgets: Ensure high availability during node maintenance
- Taints and Tolerations: Control pod placement on tainted nodes
- Cluster Autoscaling: Automatically add nodes based on resource requests
- Horizontal Pod Autoscaler: Scale pods based on CPU, memory, or custom metrics
Experiment with these features in your development cluster to build a deeper understanding of Kubernetes scheduling mechanics.