Over time, AKS has added mechanisms to cover the scenarios that corporate environments need. This post lists those features and gives some examples.

Private clusters

For some time now it has been possible to place worker nodes in their own private vnet, which prevents outside access to those nodes. The master nodes have never been accessible, but the API Server is. If the API Server is publicly reachable, anyone who knows its URL can try to connect to our cluster. Of course, access is secured by certificate, but being reachable at all opens up possible attack vectors, and there is also the problem of a kubectl configuration file being “found” by an unauthorized person.

To avoid that problem, AKS has two options:

  1. Filter the IPs that can access the API Server. The API Server is still public, but a firewall blocks every IP that is not on an allow list. It is similar to the model followed by Azure SQL Server. See the AKS documentation for more information.
  2. Use a private cluster. In this case the API Server gets an IP within a private vnet. The master nodes are still outside the customer’s Azure subscription (customers still neither have access to them nor pay for them), but an Azure private link is created between the private network of the API Server and the customer’s vnet where the worker nodes live.

The second option is more secure since the traffic never leaves the internal network. However, it requires users to connect either from a VM that can reach the private IP of the API Server or through a configured VPN connection. With the first option, they can use their own machine directly as long as their IP is on the list of allowed IPs.
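As a rough sketch, both options map to flags of az aks create. The resource group, cluster name and CIDR range below are placeholders, not values from this post, and the script only prints the command unless it is run with AZ=az, so nothing gets created by accident.

```shell
#!/usr/bin/env bash
# Dry run by default: the command is printed, not executed.
# Run with AZ=az to execute it against your subscription.
AZ="${AZ:-echo az}"

# Placeholder names (assumptions, replace with your own).
RESOURCE_GROUP="my-rg"
CLUSTER_NAME="my-aks"

# Option 1: public API server, reachable only from an allow list of CIDR ranges.
create_ip_restricted_cluster() {
  $AZ aks create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --api-server-authorized-ip-ranges "203.0.113.0/24"
}

# Option 2: private cluster, the API server only gets a private IP.
create_private_cluster() {
  $AZ aks create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --enable-private-cluster
}

# The two options are alternatives: choose one with OPTION=ip or OPTION=private.
OPTION="${OPTION:-ip}"
case "$OPTION" in
  ip)      create_ip_restricted_cluster ;;
  private) create_private_cluster ;;
  *)       echo "unknown OPTION: $OPTION" >&2 ;;
esac
```

For example, OPTION=private AZ=az ./create-cluster.sh would create the private variant for real (the script name is hypothetical).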

Identity with AAD

Kubernetes has never had an identity management mechanism of its own. Instead, administrators are expected to connect it to some external system that provides those identities. These identities can then be referenced in RBAC rules to grant permissions to users or groups of users.

In AKS, Azure Active Directory can be used as an identity management mechanism. Thus, users must authenticate to AAD before they can access the cluster. Once authenticated with AAD, the RBAC rules defined in the cluster will be applied, so that each user will only have access to those resources that an administrator has specified.

If you share a cluster between several environments and/or teams, it is almost mandatory to establish RBAC rules to prevent a developer from project X from accessing resources from project Y, thus minimizing the impact of errors. The problem is that, without an identity system, it is very difficult to give each user their own credentials. It could be achieved with service accounts, but that implies a lot of management work and the centralized creation and distribution of kubeconfig files. In these contexts we need an identity server, and using AAD is the best option in AKS.

The first thing we need is, obviously, to be a user with permissions in AAD for administration tasks: create users, create groups and assign users to groups. To create a cluster that integrates with AAD, just use two parameters in the command az aks create:

--enable-aad #To enable integration with AAD
--aad-admin-group-object-ids <object-id-of-admins-group> #To establish an AAD group whose users are cluster admins

Thus, if you want administrative access to the cluster, you must add your user to the administrators group and in this way you will be able to access all its resources. By default, other users will not be able to access anything, so you must give them explicit permissions.

Those explicit permissions can be granted using Kubernetes RBAC. The RBAC rules can be applied to specific users (using kind: User in RoleBinding.subjects) or to groups (using kind: Group).
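A minimal sketch of such a rule, assuming a namespace called project-x and a made-up AAD group object ID (both hypothetical). It writes a RoleBinding that grants the built-in edit ClusterRole to the group inside that namespace only; in AKS, AAD groups are referenced by their object ID.

```shell
#!/usr/bin/env bash
# Hypothetical values: replace with your namespace and your AAD group object ID.
NAMESPACE="project-x"
GROUP_OBJECT_ID="00000000-0000-0000-0000-000000000000"

# Grant the built-in "edit" ClusterRole to the AAD group, limited to one namespace.
cat > rolebinding-project-x.yaml <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: project-x-editors
  namespace: ${NAMESPACE}
subjects:
  - kind: Group                 # use "kind: User" to bind a single user instead
    name: ${GROUP_OBJECT_ID}    # AAD group object ID
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit
  apiGroup: rbac.authorization.k8s.io
EOF

# Apply it with: kubectl apply -f rolebinding-project-x.yaml
```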

Using Azure RBAC

Use Azure RBAC (instead of Kubernetes RBAC) to manage the permissions of the different users / groups that access the cluster. Currently that must be enabled when creating the cluster, using --enable-azure-rbac. Then we use az role assignment to assign a role to an AAD entity:

az role assignment create --role "Azure Kubernetes Service RBAC Admin" --assignee <AAD-ENTITY-ID> --scope <ID_AKS>/namespaces/<namespace>  # use just <ID_AKS> as the scope if the permission applies to the whole cluster

AKS provides four built-in roles:

  1. Azure Kubernetes Service RBAC Reader: Read-only for most objects in a namespace
  2. Azure Kubernetes Service RBAC Writer: As above but with read/write
  3. Azure Kubernetes Service RBAC Admin: Administrative privileges on a namespace or on the whole cluster
  4. Azure Kubernetes Service RBAC Cluster Admin: Super administrative access to the cluster (all namespaces).

We can create additional roles with az role definition create, passing a JSON file that defines the permissions the role grants. The advantage of using Azure RBAC instead of Kubernetes’ own RBAC is that all user and security management is done in Azure.
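As an illustration, a custom role that can only read Deployments could be defined as follows. The role name, file name and subscription ID are placeholders chosen for this sketch; the Kubernetes permissions go under DataActions.

```shell
#!/usr/bin/env bash
# Placeholder subscription ID (assumption, replace with your own).
SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"

# Custom Azure role: read-only access to Deployments.
cat > deployment-reader-role.json <<EOF
{
  "Name": "AKS Deployment Reader",
  "Description": "Read-only access to Deployments in a cluster or namespace.",
  "Actions": [],
  "NotActions": [],
  "DataActions": [
    "Microsoft.ContainerService/managedClusters/apps/deployments/read"
  ],
  "NotDataActions": [],
  "AssignableScopes": [
    "/subscriptions/${SUBSCRIPTION_ID}"
  ]
}
EOF

# Register it with:
#   az role definition create --role-definition @deployment-reader-role.json
```

Once registered, the role can be assigned with az role assignment create like any built-in role.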

Pod Identity

It’s one thing for cluster users to have AAD-managed identity, but it’s also possible for the pods themselves to get it. That is, a pod can run with the permissions of an AAD identity.

Once the identities have been created in Azure, we must link them to the different pods. For this we use two CRDs:

  1. AzureIdentity: Represents an Azure identity within the cluster. Basically, for each Azure identity that we want to use, we create one of these objects. The field spec.resourceID contains the ID of the identity in Azure, while the field spec.clientID contains the client ID of the identity. Finally, there is a field (spec.type) that indicates the type of identity (managed identity or service principal).
  2. AzureIdentityBinding: Represents the link between one AzureIdentity and a set of pods. Through this object we define a selector, and the pods that match it will use the identity defined in the referenced AzureIdentity. The field spec.azureIdentity is the name (metadata.name) of the referenced AzureIdentity object, while spec.selector is the value that acts as a selector.

Finally, for a pod to use a specific identity, it must have a label called aadpodidbinding whose value matches the spec.selector field of the AzureIdentityBinding object. Using pod identity, our pods can automatically gain access to certain Azure resources, without the need to store passwords or tokens as secrets. The most typical case is accessing Key Vault from a pod.
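The three pieces fit together roughly as in the sketch below. All names (keyvault-reader, keyvault-reader-pods, demo) are made up for illustration, and the values in angle brackets must be replaced with real ones before applying the file.

```shell
#!/usr/bin/env bash
# Writes an AzureIdentity, its AzureIdentityBinding and a pod that uses them.
cat > pod-identity.yaml <<'EOF'
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
  name: keyvault-reader
spec:
  type: 0   # 0 = user-assigned managed identity, 1 = service principal
  resourceID: /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>
  clientID: <identity-client-id>
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
  name: keyvault-reader-binding
spec:
  azureIdentity: keyvault-reader    # metadata.name of the AzureIdentity above
  selector: keyvault-reader-pods    # compared with the pods' aadpodidbinding label
---
apiVersion: v1
kind: Pod
metadata:
  name: demo
  labels:
    aadpodidbinding: keyvault-reader-pods   # must equal spec.selector of the binding
spec:
  containers:
    - name: app
      image: <your-image>
EOF

# After filling in the placeholders: kubectl apply -f pod-identity.yaml
```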

Multi-tenant

If you deploy several applications in your cluster, you may be interested in having them isolated in a multitenant model. A multi-tenant approach in Kubernetes (AKS in this case) must be approached from several points:

  1. Scheduling (assigning pods to nodes): This is important to ensure that each tenant’s pods run on specific nodes.
  2. Network security rules: Prevent pods of different tenants from seeing each other.
  3. Authentication and authorization
  4. Container security.

Scheduling

Kubernetes provides many mechanisms to suggest, or directly indicate, on which node or type of node a pod should run:

  • Assign a type of node to a type of pod, to ensure that only those pods will run on those nodes. That requires the use of taints and tolerations working together.
  • Force a pod to run on a specific type of node. This allows pods with certain requirements (e.g. high memory requests) to run on nodes with large amounts of memory. Node selectors are often used for this.
  • Ensure that certain pods never run on the same node (or, conversely, that they are placed together). For this, affinities and anti-affinities (node or inter-pod) are usually used.
  • Ensure that a minimum number or percentage of pods of a certain workload is always available: PDBs (pod disruption budgets) are used for this. PDBs limit voluntary disruptions, such as pod evictions during a node drain or upgrade. Involuntary disruptions (for example a node failure) cannot be prevented, although they do count against the budget.
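A small sketch combining several of these mechanisms, assuming a hypothetical tenant called team-a whose nodes carry both a tenant=team-a label and a tenant=team-a:NoSchedule taint. The Deployment tolerates the taint and selects the label, and the PDB keeps at least half of its pods available during voluntary disruptions.

```shell
#!/usr/bin/env bash
# Node preparation (run once per node, names are placeholders):
#   kubectl label nodes <node-name> tenant=team-a
#   kubectl taint nodes <node-name> tenant=team-a:NoSchedule

cat > team-a-scheduling.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: team-a-app
spec:
  replicas: 4
  selector:
    matchLabels:
      app: team-a-app
  template:
    metadata:
      labels:
        app: team-a-app
    spec:
      nodeSelector:
        tenant: team-a          # only schedule on team-a nodes
      tolerations:
        - key: tenant           # tolerate the taint that keeps other pods away
          operator: Equal
          value: team-a
          effect: NoSchedule
      containers:
        - name: app
          image: nginx
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: team-a-app-pdb
spec:
  minAvailable: 50%             # evictions are refused below this threshold
  selector:
    matchLabels:
      app: team-a-app
EOF

# Apply it with: kubectl apply -f team-a-scheduling.yaml
```

The taint keeps other tenants’ pods off the nodes, while the node selector keeps team-a pods from landing anywhere else; both are needed for a dedicated set of nodes.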

Networking

Use network policies to prevent pods from different teams / namespaces / projects from sending traffic to each other, and also to restrict the inbound or outbound traffic of the pods. To support network policies in AKS you can use either Azure’s own network policy implementation or Calico. There are some differences between them: for example, the first requires Azure CNI (AKS advanced networking) while the second supports both Azure CNI and kubenet (basic networking). In any case, you choose which option you prefer when creating the cluster.

Another option to consider is the cluster’s egress IPs, which let you specify the IPs from which pod traffic leaves the cluster. This is important when you integrate with third-party services that may require registering those IPs in some type of allow list.
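For instance, a policy like the following sketch only lets pods in a namespace receive traffic from pods in that same namespace. The namespace name team-a is hypothetical, and the policy only takes effect if the cluster was created with a network policy engine (for example with the --network-policy calico or --network-policy azure flag of az aks create).

```shell
#!/usr/bin/env bash
# Deny ingress from other namespaces, allow it from the same namespace.
cat > allow-same-namespace.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace-only
  namespace: team-a
spec:
  podSelector: {}            # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}    # any pod, but only from this same namespace
EOF

# Apply it with: kubectl apply -f allow-same-namespace.yaml
```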

Authentication and authorization

These are the points we’ve seen before:

  • Use AAD for authentication
  • Use Kubernetes RBAC for authorization
  • Use Azure RBAC for authorization
  • Use pod identity for pod authorization

Container security

This includes using pod security contexts, using rootless images, and defining PSPs (pod security policies) that pods must comply with in order to be admitted.
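As a sketch of what a restrictive security context can look like (the pod name, user and group IDs, and image placeholder are assumptions for illustration):

```shell
#!/usr/bin/env bash
# Pod that refuses to run as root and drops privileges it does not need.
cat > hardened-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
spec:
  securityContext:
    runAsNonRoot: true       # the kubelet rejects containers that would run as UID 0
    runAsUser: 1000
    fsGroup: 2000
  containers:
    - name: app
      image: <your-rootless-image>
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
EOF

# After setting the image: kubectl apply -f hardened-pod.yaml
```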

In AKS you can also use the Azure Policy Addon.

Cluster separation (logical vs physical)

Many organizations create “too many” AKS clusters, when AKS offers many tools to allow separating a physical cluster into several “logical clusters”:

  • Logical isolation: Use namespaces to divide between environments (dev, integration, QA, etc.) and also between teams or projects. RBAC lets you give users, and the pods themselves, access only to the resources / namespaces they require. In turn, network policies let you restrict network traffic between pods of different owners / environments.
  • Nodepools: Nodepools allow having different nodes depending on the needs of each project. Thus, a project using ML and CUDA can run on nodes with a powerful GPU, leaving the rest of the nodes for other projects that require different hardware. In turn, this allows mixing Windows and Linux workloads in the same cluster.
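Adding a dedicated GPU nodepool could look like the sketch below. The names, the VM size and the taint are only examples (check which GPU sizes are available in your region), and the script prints the command unless it is run with AZ=az.

```shell
#!/usr/bin/env bash
# Dry run by default: the command is printed, not executed.
# Run with AZ=az to execute it against your subscription.
AZ="${AZ:-echo az}"

# Placeholder names (assumptions, replace with your own).
RESOURCE_GROUP="my-rg"
CLUSTER_NAME="my-aks"

# Add a GPU nodepool that only pods tolerating the taint will be scheduled on.
add_gpu_nodepool() {
  $AZ aks nodepool add \
    --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER_NAME" \
    --name gpupool \
    --node-count 1 \
    --node-vm-size Standard_NC6s_v3 \
    --node-taints "sku=gpu:NoSchedule" \
    --labels "workload=ml"
}

add_gpu_nodepool
```

ML pods would then need a toleration for sku=gpu:NoSchedule and a node selector on workload=ml to land on this pool.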

The advantage of logical separation is that it lets you increase the density of pods per node, which generally results in lower costs. The downside is that, despite the security mechanisms applied, the cluster itself is the true unit of isolation. For a real level of isolation between tenants, you have to go to a physical separation of clusters, with everything that entails: lower density of pods per node and, at the same time, greater operational complexity.

Conclusion

In short, AKS has many options for “corporate” clients, and this post only outlines the most important ones. I hope it is useful for you.