Ceph architecture is a comprehensive data storage solution that aims to provide completely distributed operations, hosted on-premises, without a single point of failure when properly deployed. The Ceph open source platform delivers file, block and object storage all in one unified software stack.
Ceph architecture enables organizations to decouple data from physical hardware storage and build highly scalable and resilient data platforms. Its scalability and stability have made it a favorite for users spanning a broad range of industries, from academia to telecommunications. Ceph is highly compatible with Kubernetes, OpenStack, libvirt and KVM, making it an excellent solution for companies operating petabyte-scale storage that requires resilience and flexibility.
Wondering what Ceph architecture looks like and how it works? Keep reading to learn how it can make a difference in your organization and the benefits of deploying it on-prem.
How Does Ceph Work?
When a Ceph client reads or writes data (via its file, block or object interfaces), several logical layers determine exactly which hosts in a cluster the data will be stored on or replicated to. These layers involve Ceph pools, CRUSH rulesets and placement groups. Let’s examine how these components work together:
Pools are logical dynamic partitions that are created for specific storage types, such as block devices, user groups or object gateways. A given pool has a policy applied to express intents such as the number of replicas that should be kept or where data can be placed across the cluster based upon the application’s requirements and/or cost model, among other things.
Placement groups are subsets of a pool. Each placement group is a shard of a pool that maps to a set of Ceph OSD daemons, which together store and replicate its objects. Placement groups are less granular than individual objects and ultimately help with balancing data across the cluster.
CRUSH (Controlled Replication Under Scalable Hashing) rulesets are what allow Ceph to operate in a distributed fashion: they are the algorithmic rules that govern how data is placed throughout a cluster. It is these rules that give Ceph its scalability and flexibility across large compute/storage systems.
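As a toy sketch of the object-to-placement-group-to-OSD mapping described above: Ceph's real CRUSH algorithm is far more sophisticated (hierarchical buckets, device weights, failure domains), and the pool name, PG count and OSD IDs below are hypothetical, but the key idea survives: placement is computed deterministically from a hash, so every client arrives at the same answer without consulting a central lookup table.

```python
import hashlib

def pg_for_object(obj_name: str, pg_count: int) -> int:
    """Hash an object name into one of the pool's placement groups."""
    digest = hashlib.md5(obj_name.encode()).hexdigest()
    return int(digest, 16) % pg_count

def osds_for_pg(pg_id: int, osd_ids: list, replicas: int) -> list:
    """Deterministically pick `replicas` distinct OSDs for a placement
    group, so every client computes the same mapping."""
    start = pg_id % len(osd_ids)
    return [osd_ids[(start + i) % len(osd_ids)] for i in range(replicas)]

# Hypothetical pool policy: 128 placement groups, 3 replicas.
pool = {"name": "rbd_pool", "pg_count": 128, "replicas": 3}
osds = list(range(6))  # six OSD daemons in this toy cluster

pg = pg_for_object("volume-0001", pool["pg_count"])
placement = osds_for_pg(pg, osds, pool["replicas"])
print(f"object 'volume-0001' -> PG {pg} -> OSDs {placement}")
```

Because the mapping is a pure function of the object name and the cluster layout, adding or removing OSDs only requires recomputing placements, not migrating a central index.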
Ceph Architecture Components
At the host layer, Ceph runs on commodity server hardware and networking gear in the datacenter. A Ceph cluster consists of the following key software components:
- Ceph monitors
- Ceph managers
- Ceph OSDs
- Ceph metadata servers (when deploying Ceph file system)
These components are collectively responsible for the storage cluster itself, exposing endpoints (i.e. file, block and object targets), replicating data across storage devices and processing metadata. This design keeps Ceph flexible and scalable for organizations using it for different types of use cases.
Let’s take a deeper dive into the components and their basic responsibilities:
Ceph Monitor
A Ceph monitor manages the cluster state, including monitor, manager, object storage daemon (OSD), metadata server (MDS) and CRUSH maps. At least three monitors are recommended for high availability and redundancy.
In addition, the maps mentioned above constitute the critical cluster state that Ceph daemons need in order to coordinate with one another. Monitors also manage authentication between clients and daemons.
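To illustrate the three-monitor recommendation, a minimal `ceph.conf` fragment for a hypothetical cluster might look like the following (the cluster UUID, hostnames and addresses are all placeholders):

```
[global]
fsid = b4c1a2d0-5e7f-4c3a-9b8d-1f2e3a4b5c6d   # hypothetical cluster UUID
mon_initial_members = mon1, mon2, mon3         # hypothetical monitor hostnames
mon_host = 10.0.0.11, 10.0.0.12, 10.0.0.13     # hypothetical monitor addresses
```

With three monitors, the cluster can maintain quorum (two of three) even if one monitor host fails.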
Ceph Manager
Ceph manager daemons expose cluster information through interfaces such as a REST API and the web-based Ceph dashboard. They also reliably keep track of the current Ceph cluster state and runtime metrics, including storage utilization, system load and current performance.
Ceph OSD (Object Storage Daemon)
Ceph OSDs are the beating heart of a Ceph cluster. At least three OSDs are needed to maintain data redundancy and high availability, and it is not uncommon for large installations to run hundreds or even thousands of OSDs. A Ceph OSD stores data and handles replication, recovery and rebalancing; OSDs also check each other’s heartbeats and report status to the Ceph monitors and managers.
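The heartbeat idea can be sketched in a few lines. This is a toy model only, loosely inspired by how Ceph tracks OSD liveness, not Ceph's actual protocol; the grace period and the heartbeat timestamps are hypothetical.

```python
import time

GRACE_SECONDS = 20  # hypothetical grace period before an OSD is marked down

def osd_status(last_heartbeat: float, now: float,
               grace: float = GRACE_SECONDS) -> str:
    """Mark an OSD 'up' if its last heartbeat is within the grace window."""
    return "up" if now - last_heartbeat <= grace else "down"

now = time.time()
# Last heartbeat seen from each OSD (seconds ago): 5, 30 and 1.
heartbeats = {0: now - 5, 1: now - 30, 2: now - 1}
statuses = {osd: osd_status(hb, now) for osd, hb in heartbeats.items()}
print(statuses)  # OSD 1 missed its grace window and is marked down
```

In a real cluster, a down OSD triggers recovery: the remaining OSDs re-replicate the affected placement groups to restore the pool's replica count.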
Ceph Metadata Server (MDS)
A Ceph MDS stores metadata on behalf of the Ceph file system (CephFS). It is required only when deploying CephFS; block and object storage do not use the MDS.
When to Consider Using Ceph On-Premises
With Ceph architecture, administrators get a single, consolidated system that avoids silos and brings storage under a common management framework. Ceph consolidates several storage use cases, improves resource utilization and lets organizations deploy servers where needed.
By using Ceph on-premises, organizations large and small can secure their critical data or IT infrastructure without prohibitive costs. Doing so can greatly reduce administration time and free up other resources because once deployed, the system is both self-healing and self-managing.
An on-premises Ceph cluster can help organizations carefully manage storage, provide flexibility and manage costs. Platina Systems natively handles Ceph configuration and maintenance in a single pane of glass, further simplifying the deployment and operations of petabyte scale storage. To see how Platina and Ceph can help you, request a demo today.