How to Implement Your Own Distributed Filesystem With GlusterFS and Kubernetes
Learn the advantages of using GlusterFS and how it can help you achieve a highly scalable, distributed filesystem
Introduction
Anyone who has worked with a container orchestration platform, most commonly Kubernetes, knows that managing storage can be a real pain. Not because of the complexity or the number of underlying components, but because of the dynamic nature of such a platform's architecture.
Within a distributed cluster, Kubernetes manipulates containers and pods in all sorts of ways: it creates, destroys, replicates, and autoscales them. As a maintainer, you often won't know which physical (or virtual) machine they will land on. Therefore, to access persistent storage, some kind of external data provisioning service must be used.
Kubernetes has the concept of a PersistentVolume, which can be backed in numerous ways. One of the simplest and most commonly used is, of course, NFS. All you need to do is install the Helm chart for an NFS provisioner and reference the created StorageClass in any volume claim, as sketched below. But despite being straightforward and easy to get started with, this approach is unfortunately not as efficient or flexible as you may sometimes need.
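For illustration, here is a minimal sketch of what that looks like in practice, assuming an NFS provisioner chart (such as nfs-subdir-external-provisioner) has already been installed and exposes a StorageClass. The class name `nfs-client`, the claim name, and the pod are all placeholders; adjust them to whatever your installation actually created.

```yaml
# PersistentVolumeClaim backed by the NFS StorageClass
# ("nfs-client" is an assumed name; use the class your provisioner created).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany          # NFS allows many pods to mount the volume at once
  storageClassName: nfs-client
  resources:
    requests:
      storage: 10Gi
---
# Any pod can then mount the claim, regardless of which node it lands on.
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: shared-data
```

Because the claim requests ReadWriteMany access, any number of pods can mount it simultaneously, which is exactly what makes NFS such a convenient first step.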
The crucial limitation I faced was the inability to provide shared storage to pods in different namespaces, as illustrated below. And even though I could have neglected logical separation and merged the namespaces, this issue, combined with poor scaling efficiency, was the decisive factor in looking for a replacement.
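A short sketch of why this happens with dynamic provisioning as above: a PersistentVolumeClaim is a namespaced object, so a pod can only reference claims in its own namespace, and a dynamic provisioner binds each new claim to its own freshly provisioned PersistentVolume (typically its own directory on the NFS export). The namespaces and claim names below are hypothetical.

```yaml
# Claim in namespace "team-a": the provisioner creates a new PV
# (and a new directory on the NFS share) just for this claim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
  namespace: team-a
spec:
  accessModes: [ReadWriteMany]
  storageClassName: nfs-client
  resources:
    requests:
      storage: 10Gi
---
# An identical claim in namespace "team-b" binds to a different PV with
# different data; pods in team-b cannot reference the claim living in team-a.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
  namespace: team-b
spec:
  accessModes: [ReadWriteMany]
  storageClassName: nfs-client
  resources:
    requests:
      storage: 10Gi
```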
When searching for a distributed or clustered filesystem, the choice eventually comes down to a few options. Today, the outstanding ones are, of course, GlusterFS and CephFS. However, I don't recommend basing your choice on this article alone; do your own research. Besides these two, plenty of other options are available, such as GFS2, Lustre, MinIO, and MooseFS. I also recommend checking out Rook, which isn't actually a DFS itself, but rather a cloud-native storage orchestrator designed to extend Kubernetes' storage management capabilities.