Deploying PostgreSQL as a StatefulSet in Kubernetes (2024)

Applications come in many forms, from large monolithic applications to microservices-based applications that cater to different needs. In terms of state, applications fall into two categories:

  • Stateless applications can be run independently in isolation without any knowledge of past transactions.
  • Stateful applications retain knowledge of past information (state).

Most applications we use are stateful, and their state data may include user preferences, recent activity, database transactions, credentials, settings, etc.

Kubernetes provides StatefulSets for running stateful applications in a Kubernetes cluster. Managing state within a containerized environment has become even more important as deploying database clusters on Kubernetes has grown in popularity.

In this article, we will focus on how to deploy a PostgreSQL database on a Kubernetes cluster using StatefulSets.


What is a Kubernetes StatefulSet?

StatefulSet is a Kubernetes workload API object that can be used to manage stateful applications.

In a normal Deployment, the user does not care how pods are scheduled as long as scheduling doesn't negatively affect the deployed application. Stateful applications with persistent storage, however, need each pod to be properly identified so that its state is preserved.

A StatefulSet provides this by giving each pod a persistent identifier that it keeps across rescheduling; for example, the pods of a StatefulSet named postgresql-db are named postgresql-db-0, postgresql-db-1, and so on. This way, a pod is mapped to the correct storage volume even if it is recreated, and the application state is preserved.

StatefulSet use cases

There are several use cases for StatefulSets.

Ordered deployment and scaling

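When an application relies on multiple pods that depend on each other, ordered deployment ensures they are created in a predictable order during deployment and scaling. By default, a StatefulSet creates pods one at a time by ordinal (0, 1, 2, ...), starting each only after its predecessors are Running and Ready, and removes them in reverse order when scaling down.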
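The ordering described above is controlled by the StatefulSet's podManagementPolicy field. The fragment below is a sketch, assuming a StatefulSet named postgresql-db like the one used later in this article; OrderedReady is the default, and the field can only be set when the StatefulSet is created.

```yaml
# Fragment of a StatefulSet manifest (selector, template, etc. omitted)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgresql-db
spec:
  # OrderedReady (default): create pods one at a time in ordinal order
  # (postgresql-db-0, then postgresql-db-1, ...), each only after the
  # previous ones are Running and Ready; scale down in reverse order.
  # Parallel: create and delete pods without waiting on each other.
  podManagementPolicy: OrderedReady
  replicas: 2
```

Parallel trades the ordering guarantee for faster scaling and only affects scaling operations, not updates, so it suits workloads whose pods do not depend on one another.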

Ordered automated rolling updates

Dependent applications or microservices should also be updated in an orderly manner, and an update should not break functionality.

A StatefulSet's RollingUpdate strategy updates pods one at a time in reverse ordinal order, waiting for each updated pod to become Running and Ready before moving on, and its partition setting lets users control which pods receive an update.
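Kubernetes exposes this control through the StatefulSet's updateStrategy. A minimal sketch, assuming the two-replica postgresql-db StatefulSet used later in this article; the partition value is chosen for illustration:

```yaml
# Fragment of the postgresql-db StatefulSet spec (other fields omitted)
spec:
  updateStrategy:
    type: RollingUpdate        # the default; OnDelete is the alternative
    rollingUpdate:
      # Only pods whose ordinal is >= partition are updated when the pod
      # template changes. With 2 replicas, partition: 1 updates
      # postgresql-db-1 and leaves postgresql-db-0 on the old template.
      partition: 1
```

Once the updated pod looks healthy, lowering partition to 0 rolls the change out to the remaining pod. With OnDelete, the controller updates a pod only after you delete it manually.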

Mapping persistent storage

When considering databases, persistent storage is the most critical part as applications need to store data. With a StatefulSet, users can:

  1. Define which pods correspond to each persistent volume
  2. Create resilient application deployments
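The pod-to-storage mapping comes from volumeClaimTemplates: the controller creates one PersistentVolumeClaim per pod, named <template name>-<StatefulSet name>-<ordinal>, and reuses that claim whenever the pod is recreated. The sketch below assumes a StatefulSet named postgresql-db; the StorageClass name fast-ssd is hypothetical.

```yaml
# Fragment of the postgresql-db StatefulSet spec (other fields omitted)
spec:
  volumeClaimTemplates:
    - metadata:
        name: postgresql-db-disk
      spec:
        # Hypothetical StorageClass; omit to use the cluster default
        storageClassName: fast-ssd
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 25Gi
```

With two replicas, this yields the claims postgresql-db-disk-postgresql-db-0 and postgresql-db-disk-postgresql-db-1, each bound to its own volume.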

Using unique network identifiers to create persistent network connectivity

With unique identifiers, users can manage and route network traffic to specific pods without worrying about IP changes when pods are rescheduled. This gives greater control over network communication between pods, making it possible to configure persistent routing, policies, and security configurations for the desired pods.
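Because each pod keeps a stable name, traffic can be routed to one specific pod. The StatefulSet controller labels every pod with statefulset.kubernetes.io/pod-name, so a regular Service can select exactly one of them. A minimal sketch, assuming the postgresql-db StatefulSet used later in this article; the Service name is hypothetical.

```yaml
# Hypothetical Service that sends traffic only to postgresql-db-0
apiVersion: v1
kind: Service
metadata:
  name: postgresql-db-0-svc
spec:
  selector:
    # Label set automatically by the StatefulSet controller on each pod
    statefulset.kubernetes.io/pod-name: postgresql-db-0
  ports:
    - port: 5432
      targetPort: 5432
```

The same label can be used in a NetworkPolicy podSelector to apply rules to a single pod.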

Even with these benefits, StatefulSets are not a solution for every requirement. For instance, StatefulSets are not interchangeable with Deployments or ReplicaSets, which are geared toward stateless workloads.

Drawbacks of StatefulSets

StatefulSets also come with a set of limitations that users should be aware of before deploying the application.

  • The storage for a StatefulSet must either be provisioned by a PersistentVolume provisioner based on the requested storage class or be pre-provisioned by an administrator.
  • By default, scaling down or deleting a StatefulSet does not delete its volumes, to ensure data safety. The PersistentVolumeClaims and their volumes remain in the cluster until they are deleted manually.
  • The user must create a headless Service manually to provide network identity for the StatefulSet's pods.
  • StatefulSets do not guarantee the termination of pods when the StatefulSet is deleted. To achieve ordered, graceful termination, best practice is to scale the StatefulSet down to zero pods before deleting it.
  • Rolling updates with the default pod management policy (OrderedReady) can reach a broken state that requires manual intervention if a pod never becomes Running and Ready (due to an application config error, a bad binary, etc.). In such instances, users need to:
    • Revert the pod template to a previous, working configuration.
    • Delete the broken Pods that the StatefulSet already tried to run with the bad configuration; the StatefulSet then recreates them from the reverted template.
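In recent Kubernetes releases, the fate of these volumes can be stated explicitly with persistentVolumeClaimRetentionPolicy (beta and enabled by default since Kubernetes 1.27; check your cluster version before relying on it). The sketch below spells out the default behavior, which keeps the claims:

```yaml
# Fragment of a StatefulSet spec (other fields omitted)
spec:
  persistentVolumeClaimRetentionPolicy:
    # Retain (default) keeps the PVCs when the StatefulSet is deleted;
    # Delete removes the PVCs created from volumeClaimTemplates.
    whenDeleted: Retain
    # Same choice for PVCs of pods removed by scaling down.
    whenScaled: Retain
```

Setting either field to Delete trades data safety for automatic cleanup, so Retain remains the cautious choice for databases.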

Setting up a StatefulSet in a Kubernetes cluster

Now that we have a basic understanding of a StatefulSet, let’s look at a sample StatefulSet deployment.

StatefulSets are ideal for database deployments. In this example, we will create a PostgreSQL deployment as a StatefulSet with a persistent storage volume.

postgresql.yaml

# PostgreSQL StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgresql-db
spec:
  # Headless Service that gives the pods their network identity
  serviceName: postgresql-db-service
  selector:
    matchLabels:
      app: postgresql-db
  replicas: 2
  template:
    metadata:
      labels:
        app: postgresql-db
    spec:
      containers:
        - name: postgresql-db
          image: postgres:latest
          volumeMounts:
            - name: postgresql-db-disk
              mountPath: /data
          env:
            - name: POSTGRES_PASSWORD
              value: testpassword
            # Use a subdirectory so initdb does not fail if the volume
            # root is not empty (e.g. it contains lost+found)
            - name: PGDATA
              value: /data/pgdata
  # Volume Claim: one PersistentVolumeClaim per pod
  volumeClaimTemplates:
    - metadata:
        name: postgresql-db-disk
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 25Gi

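In the above YAML file, we have defined a simple StatefulSet to deploy a PostgreSQL database. We are creating a StatefulSet called postgresql-db with two pods (replicas: 2). Note that each pod runs an independent PostgreSQL instance with its own volume; this manifest does not configure replication between them.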

We also request persistent storage through volumeClaimTemplates: each pod gets its own PersistentVolumeClaim, mounted at /data, where PostgreSQL stores its data. The cluster's default StorageClass provisions the volumes, so we can deploy the StatefulSet by running the following command.

kubectl apply -f postgresql.yaml

We have now created the PostgreSQL StatefulSet, but we still need a Service to expose it outside the Kubernetes cluster. We can do that by creating a Service that points to the StatefulSet's pods.

postgresql-service.yaml

# PostgreSQL StatefulSet Service
apiVersion: v1
kind: Service
metadata:
  name: postgres-db-lb
spec:
  selector:
    app: postgresql-db
  type: LoadBalancer
  ports:
    - port: 5432
      targetPort: 5432

This creates a Service of type LoadBalancer that exposes our PostgreSQL database by selecting pods with the label app: postgresql-db. Because both pods carry that label, incoming connections are spread across them. You can create the Service by running the command below.

kubectl apply -f postgresql-service.yaml

Now let’s check that both the StatefulSet and the Service were created successfully by running the following command.

kubectl get all

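The result shows two running pods and a LoadBalancer Service exposing the StatefulSet at IP 10.111.253.4 on port 5432. On clusters without a load balancer integration, such as many local development clusters, the Service's EXTERNAL-IP may remain <pending>.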
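The postgresql-db StatefulSet sets serviceName: postgresql-db-service, but none of the manifests above create that Service, and a StatefulSet needs its governing headless Service to exist for per-pod DNS names. A minimal sketch, assuming the default namespace and the cluster.local cluster domain:

```yaml
# Headless Service matching serviceName in the StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: postgresql-db-service
spec:
  # clusterIP: None makes the Service headless: no virtual IP is
  # allocated, and DNS returns records for the individual pods.
  clusterIP: None
  selector:
    app: postgresql-db
  ports:
    - port: 5432
      targetPort: 5432
```

After applying it, each pod is reachable inside the cluster at a stable name such as postgresql-db-0.postgresql-db-service.default.svc.cluster.local, which lets clients address a specific instance.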

Testing the connectivity

Having deployed PostgreSQL, we need to verify that we can access it without any issues. We will use the pgAdmin 4 client to open a connection:

  1. Download and install the pgAdmin client in your environment.
  2. Register a new server and initialize a connection.

With the above deployment, we use the external IP of the postgres-db-lb Service (10.111.253.4) and port 5432. Since we defined only a password in the StatefulSet's environment variables, the connection uses the default username “postgres” with the password we defined (testpassword).

If all the details are correct, clicking “Save” initiates the connection, and you will see the connected database.
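pgAdmin checks connectivity by hand; Kubernetes can run a similar check continuously. The fragment below is a sketch, not part of the original manifests, that adds a readiness probe using pg_isready (shipped with the postgres image) so a pod only receives Service traffic once PostgreSQL accepts connections. The timing values are illustrative assumptions.

```yaml
# Fragment of the postgresql-db pod template (other fields omitted)
spec:
  containers:
    - name: postgresql-db
      image: postgres:latest
      ports:
        - containerPort: 5432
      readinessProbe:
        exec:
          # pg_isready exits 0 when the server accepts connections; the
          # user name does not need to be valid to get a status.
          command: ["pg_isready", "-U", "postgres"]
        initialDelaySeconds: 5
        periodSeconds: 10
```

With the default OrderedReady policy, the probe also delays the creation of postgresql-db-1 until postgresql-db-0 reports ready.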

Using ConfigMap in StatefulSet

In our earlier example, we defined the environment variables with the StatefulSet YAML.

However, a better practice is to move the environment variables into a ConfigMap and reference that ConfigMap from the StatefulSet. This makes each component of the deployment easier to manage and maintain.

So, let’s create a ConfigMap and modify the StatefulSet YAML as shown below.

postgresql-configmap.yaml

# PostgreSQL StatefulSet ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-db-config
  labels:
    app: postgresql-db
data:
  POSTGRES_DB: testdb
  POSTGRES_USER: testdbuser
  POSTGRES_PASSWORD: testdbuserpassword
  PGDATA: /data/pgdata

In the above ConfigMap, we have extended our environment variables to specify a PostgreSQL database, user, password, and data directory. Keep in mind that ConfigMaps are not designed for confidential data; in a real deployment, the password belongs in a Secret. Now let’s create the ConfigMap with this command:

kubectl apply -f postgresql-configmap.yaml

We can view the details of the created ConfigMap using the describe command:

kubectl describe configmap postgres-db-config

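The next step is to modify the StatefulSet to read its environment variables from the ConfigMap. We do this with the envFrom field and a configMapRef pointing to the ConfigMap above; each key in the ConfigMap becomes an environment variable in the container. Note that the postgres image applies POSTGRES_DB, POSTGRES_USER, and POSTGRES_PASSWORD only when it initializes an empty data directory, so pods that reuse the volumes from the earlier deployment keep their existing database and credentials.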

postgresql.yaml

# PostgreSQL StatefulSet - ConfigMap
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgresql-db
spec:
  serviceName: postgresql-db-service
  selector:
    matchLabels:
      app: postgresql-db
  replicas: 2
  template:
    metadata:
      labels:
        app: postgresql-db
    spec:
      containers:
        - name: postgresql-db
          image: postgres:latest
          volumeMounts:
            - name: postgresql-db-disk
              mountPath: /data
          # Config from ConfigMap
          envFrom:
            - configMapRef:
                name: postgres-db-config
  # Volume Claim
  volumeClaimTemplates:
    - metadata:
        name: postgresql-db-disk
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 25Gi
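ConfigMaps are not intended for confidential values such as POSTGRES_PASSWORD. A minimal sketch of moving the password into a Secret; the Secret name postgres-db-secret is hypothetical, and the StatefulSet change is shown as a comment because it only adds one entry to the envFrom list in the StatefulSet above.

```yaml
# Hypothetical Secret holding the database password
apiVersion: v1
kind: Secret
metadata:
  name: postgres-db-secret
type: Opaque
stringData:
  POSTGRES_PASSWORD: testdbuserpassword
# In the StatefulSet's container spec, list the Secret after the
# ConfigMap (and remove POSTGRES_PASSWORD from the ConfigMap):
#
#   envFrom:
#     - configMapRef:
#         name: postgres-db-config
#     - secretRef:
#         name: postgres-db-secret
```

If a key appears in more than one envFrom source, the value from the last source wins. Secret values are only base64-encoded unless the cluster enables encryption at rest, so access to Secrets should still be restricted with RBAC.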

Thanks to the persistent volumes, the data in the underlying database is preserved even if a Pod fails. Because this is a StatefulSet, each Pod also keeps its identity and is reattached to its own volume when it is recreated.

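We can list the PersistentVolumeClaims created for the pods using the following command (each claim is named after the template and the pod, e.g. postgresql-db-disk-postgresql-db-0):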

kubectl get pvc

Managing state is crucial to app functionality

Kubernetes StatefulSets allow users to easily create and manage stateful applications or services within a Kubernetes cluster. However, StatefulSet configurations involve some complexity, so plan your deployments carefully before carrying them out.

StatefulSets are well suited to database applications, payment services, and similar workloads where managing state is a crucial part of the application's functionality.

These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.

FAQs

How do I deploy a PostgreSQL database on Kubernetes?

How to use PostgreSQL on Kubernetes:
  1. Step 1 - Create a PostgreSQL Container Image with Docker. Create a file named “Dockerfile” for PostgreSQL. ...
  2. Step 2 - Build PostgreSQL Docker Image. ...
  3. Step 3 - Create a Storage Volume (Using GlusterFS) ...
  4. Step 4 - Deploy PostgreSQL with Docker on Kubernetes.

Should I deploy PostgreSQL on Kubernetes?

Running PostgreSQL on Kubernetes lets you scale by adding replicas (copies of the database) as needed, provided replication between them is configured. This lets database capacity follow usage-based traffic and helps ensure high availability. Using Kubernetes to deploy PostgreSQL also makes it easier to manage the lifecycle of your database.

Is it safe to run Postgres on Kubernetes?

PostgreSQL should indeed run inside Kubernetes, alongside applications rather than outside, provided that your organization has already adopted Kubernetes and has both Kubernetes and PostgreSQL skills. For assistance, it is fine to rely on third-party professional support for either or both of them.

Why use a StatefulSet instead of a Deployment?

Unlike a Deployment, a StatefulSet maintains a sticky identity for each of its Pods. These pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.

What is the difference between a StatefulSet and a Deployment?

Kubernetes Deployment and Kubernetes StatefulSet are two powerful resources for managing containerized applications on Kubernetes. Deployments are useful for managing stateless applications, while StatefulSets are useful for managing stateful applications that require stable network identities and persistent storage.

How do I deploy PostgreSQL for high availability?

How to deploy PostgreSQL along with monitoring and high availability tools on VMware vSphere using Terraform:
  1. Clone the GitHub Repository.
  2. Create a Postgres cluster (3 nodes: 1 master, 2 standby).
  3. Set up EFM agents for each database node that will be responsible for failover management.

Is it a good idea to run a database on Kubernetes?

If you are running on-prem or in a private cloud, Kubernetes is generally one of the best options, since it unifies workload management and monitoring. For databases in particular, Kubernetes is the best choice when the database has an operator and is Kubernetes-friendly.

Does Airflow need PostgreSQL?

If you want to take a real test drive of Airflow, you should consider setting up a database backend to PostgreSQL, MySQL, or MSSQL. By default, Airflow uses SQLite, which is intended for development purposes only.

How do I install PostgreSQL on a Kubernetes cluster?

Helm gives you a quick and easy way to deploy a PostgreSQL instance on your cluster.
  1. Step 1: Add Helm Repository. ...
  2. Step 2: Create and Apply Persistent Storage Volume. ...
  3. Step 3: Create and Apply Persistent Volume Claim. ...
  4. Step 4: Install Helm Chart. ...
  5. Step 5: Connect to PostgreSQL Client.

What are the limitations of Postgres?

PostgreSQL limitations
  • No horizontal scaling.
  • Lack of NoSQL Features.
  • Complex to manage.
  • Not great for analytics workloads.

Is Postgres still relevant?

And yet through it all, PostgreSQL continues to grow. DB-Engines data shows PostgreSQL is growing significantly faster than would-be rivals MongoDB and Redis. PostgreSQL isn't just a relational database. PostgreSQL invented many post-relational concepts like in-database functions and user defined datatypes.

When should I use a StatefulSet?

StatefulSets are typically used for applications that require persistent storage for stateful workloads, and ordered, automated rolling updates.

Why is stateful better than stateless?

The key difference between stateful and stateless applications is that stateless applications don't “store” data, whereas stateful applications require backing storage. Stateful applications such as the Cassandra, MongoDB, and MySQL databases all require some type of persistent storage that survives service restarts.

What is the difference between a DaemonSet and a StatefulSet?

A StatefulSet is used for stateful applications: each replica of the pod has its own state and uses its own volume. A DaemonSet is a controller, similar to a ReplicaSet, that ensures a copy of the pod runs on all (or a selected set of) nodes in the cluster.

Why do we use a headless Service in a StatefulSet?

Advantages of headless Services:

  • Direct access to each pod.
  • Easy pod discovery in the StatefulSet.
  • Pods can be addressed by their DNS names.
  • Each pod's sticky identity in a stateful service is preserved (i.e. you can address a specific pod by name).

What are the use cases for StatefulSet?

A StatefulSet fits cases such as the following: a Redis pod that must keep access to the same volume it used before, even if it is redeployed or restarted; or a Cassandra cluster whose nodes must each keep access to their own data.

What is an example of a StatefulSet application?

Databases such as MySQL, Oracle, and PostgreSQL are examples of stateful applications. Stateless applications, on the other hand, do not keep data; Node.js and Nginx are examples of stateless applications.

How do I deploy a PostgreSQL database in the cloud?

Deploying a Cloud SQL for PostgreSQL instance:
  1. In the Google Cloud console, go to the SQL > Instances page.
  2. Click Create instance, and then click Choose PostgreSQL.
  3. In the Create a PostgreSQL instance page, provide the following details: ...
  4. Click Create to deploy the PostgreSQL instance.

What is the best practice for creating a highly available PostgreSQL deployment?

The first step towards achieving high availability is making sure you don't rely on a single database server: your data should be replicated to at least one standby replica/slave. Database replication can be done using the two options available with PostgreSQL community software: Streaming replication.

How do I load data faster in PostgreSQL?

Improve PostgreSQL performance
  1. Use indexes in moderation. ...
  2. Reconsider foreign key constraints. ...
  3. Avoid unnecessary UNIQUE keys. ...
  4. Use separate disks for WAL and data. ...
  5. Use performant disks. ...
  6. Use parallel writes. ...
  7. Insert rows in batches. ...
  8. Properly configure shared_buffers.

How do I deploy a PostgreSQL database in Docker?

Three steps to set up a PostgreSQL environment in Docker:
  1. Step 1: Download and Install Docker. ...
  2. Step 2: Download the Docker PostgreSQL Image. There are two different ways to download the Docker PostgreSQL Image that allow you to set up PostgreSQL on Docker. ...
  3. Step 3: Install pgAdmin on Docker.

Which is the best tool to migrate SQL Server to PostgreSQL?

pgloader is a free migration tool released under the PostgreSQL license that enables users to load data from files into PostgreSQL or migrate a whole database to it. The newest version is pgloader 3.6.

Where can I deploy a PostgreSQL database for free?

4 Best Free PostgreSQL Database Hosting
  • Heroku.
  • ElephantSQL.
  • HelioHost.
  • Amazon RDS.

What is the best database for Kubernetes?

Choosing a Database to Complement Kubernetes
  • Cassandra.
  • PostgreSQL.
  • MongoDB.
  • CockroachDB.

Why not use Kubernetes for a database?

You should not use Kubernetes if:

  • You have a small engineering team and do not have much need for scalability.
  • Your application is relatively simple, does not require high availability, and is not performance-intensive.
  • Your application is a traditional monolith that does not need an upgrade.

What should I not use Kubernetes for?

5 Reasons You Should NOT Use Kubernetes
  • If you don't need high availability. Kubernetes was created to solve a particular set of problems. ...
  • If your app is monolithic. What is meant by 'monolithic'? ...
  • The cost of the learning curve. ...
  • The cost … in general. ...
  • Its sheer complexity. ...

How do I connect PostgreSQL with Airflow?

How to Connect to Apache Airflow using Postgres Operator Airflow?
  1. Step 1: Import Airflow Modules.
  2. Step 2: Configure Default Airflow Postgres Operator Arguments.
  3. Step 3: Instantiate your Airflow DAG.
  4. Step 4: Set up Airflow Task using the Postgres Operator.
  5. Step 5: Configure Dependencies for Airflow Operators.

When should I use PostgreSQL?

In general, PostgreSQL is best suited for systems that require execution of complex queries, or data warehousing and data analysis. MySQL is the first choice for those web-based projects which require a database merely for data transactions and not anything intricate.

Which database is best for Airflow?

Airflow uses SQLAlchemy and Object Relational Mapping (ORM) in Python to connect with the metadata database from the application layer. Any database supported by SQLAlchemy can theoretically be configured to host Airflow's metadata.
...
The most common databases used are:
  • Postgres.
  • MySQL.
  • MSSQL.
  • SQLite.

How do I connect to a Postgres database in a cluster?

How to Connect to PostgreSQL Database Clusters
  1. Retrieve Database Connection Details Using the CLI.
  2. Retrieve Database Connection Details Using the API.
  3. View PostgreSQL Cluster Connection Details.
  4. Download the SSL Certificate.
  5. Connect to the Database.

How do I connect a shell to PostgreSQL?

On Windows, press the Windows key, go to All apps > PostgreSQL 14, and click SQL Shell (psql). This launches the SQL Shell (psql) command-line interface. First, you need to enter the PostgreSQL server name; if it is on your localhost, press Enter.

How do I connect to a PostgreSQL database from Cloud Shell?

To use the Google Cloud CLI, cURL, or PowerShell, see Create instances.
  1. In the Google Cloud console, go to the Cloud SQL Instances page.
  2. Click Create Instance.
  3. Click Choose PostgreSQL.
  4. Enter myinstance for Instance ID.
  5. Enter a password for the postgres user.
  6. Click Create.

What is the greatest weakness of Postgres?

The advantages and disadvantages of PostgreSQL at a glance:

Advantages | Disadvantages
Open source | Not available on all hosts by default
Highly expandable | Expandable documentation only available in English
Largely compliant with SQL standard | Comparatively low reading speed

How many rows is too many for PostgreSQL?

There is no limit on the number of rows in a table but it is limited to available disk space and memory/swap space. If you are storing rows that exceed 2 KB aggregated data size, then the maximum number of rows may be limited to 4 billion or less.

Can Postgres handle millions of records?

If you're simply filtering the data and data fits in memory, Postgres is capable of parsing roughly 5-10 million rows per second (assuming some reasonable row size of say 100 bytes). If you're aggregating then you're at about 1-2 million rows per second.

Does NASA use PostgreSQL?

Federal agencies using Postgres include the Federal Aviation Administration (FAA), the National Aeronautics and Space Administration (NASA), the Department of Labor and multiple agencies throughout the Department of Defense (DoD).

What is better than Postgres?

Most developers will tell you that MySQL is better for websites and online transactions, while PostgreSQL is better for large and complicated analytical processes.

Do big companies use PostgreSQL?

6,240 companies reportedly use PostgreSQL in their tech stacks, including Uber, Netflix, and Instagram.

How do I deploy a database in Kubernetes?

There are three vital resources for deploying databases on Kubernetes:
  1. ConfigMaps store the application configuration; you can use them as files, environment variables, or command-line arguments in pods.
  2. Secrets are the Kubernetes resources for storing sensitive data like passwords, tokens, or keys.

How do I host a PostgreSQL database on Docker?

Create a Dockerfile and Docker Image
  1. Pull down the latest Postgres image from the Docker Hub.
  2. Set the environment variable for password to 'docker'
  3. Create a database, let's call it 'world'
  4. Use a sql dump file to create the table schema and populate it with data.

How do I deploy a PostgreSQL server?

  1. Go to the Connection tab.
  2. Enter your Elastic IP (e.g. 54.210.213.181) as the 'Host name/address'.
  3. Enter 'Port' as 5432, the 'User name' (e.g. postgres), and the 'Password' you generated previously.
  4. Click 'Save'.

Is Kubernetes good for databases?

Running database clusters in public, private and hybrid environments gives you multiple benefits. Kubernetes provides the additional advantages of portability, no vendor lock-in, DevOps friendliness, scalability and cost-effectiveness.

Should I use Docker for PostgreSQL?

While the quick start-up and easy configuration of Docker is a boon for development and testing, it's generally not advised to run production databases in Docker. The primary reason is that Docker containers are great for running stateless applications.

How do I access a local Postgres database from a Docker container?

Steps to connect to PostgreSQL from a Docker container:
  1. Get the local IP address, similar to 192.168.1.111 (use ipconfig on Windows, or ip addr on Linux-based systems).
  2. Get inside the docker using docker exec -it <containerid> <entrypoint>
  3. Try the Ping Command Ping 192.168. ...
  4. Try the Telnet Command telnet 192.168.

How do I run PostgreSQL and pgAdmin using Docker?

Now we need to run pgAdmin in a Docker container by setting a full command calling the image “dpage/pgadmin4” with the following information:
  1. Environment variables: User/Email and Password.
  2. Port: 8080.
  3. Network: the same network that you set for PostgreSQL.
  4. Name for the container.

How do I deploy a PostgreSQL database for free?

You can use Amazon RDS to deploy and run a Postgres database. The Amazon RDS comes with automatic backups and point-in-time recovery.
...
The Amazon RDS free tier includes the following:
  1. 20 GB of storage.
  2. An extra 20 GB of storage for backups and snapshots.
  3. 750 hours of DB usage.

What is serverless PostgreSQL?

PostgreSQL and Serverless are primarily classified as "Databases" and "Serverless / Task Processing" tools respectively. "Relational database" is the primary reason why developers consider PostgreSQL over the competitors, whereas "API integration" was stated as the key factor in picking Serverless.

What is the host name for PostgreSQL?

The PostgreSQL database service is available on localhost, and the default PostgreSQL port is 5432. A default user (hosting-db) and database (postgres) exist so you can quickly test your connection and perform management tasks.

Article information

Author: Foster Heidenreich CPA
