Azure Database for PostgreSQL Part 2: Data Migration

Jonathan
6 min read · Jun 3, 2021

If you have not read part 1 of this series, please check it out here first.

Create Sample Data in Source PostgreSQL Server

If everything was set up correctly in Part 1, the PostgreSQL server can be accessed without issues. Follow this article to create a new PostgreSQL database inside the server and insert sample data; the output should be similar to the snippet below. This source PostgreSQL server does not use CMK data encryption.

# log in to the PostgreSQL server
psql "sslmode=verify-full sslrootcert=./BaltimoreCyberTrustRoot.crt host=jonwpostgresqlsrv2.postgres.database.azure.com port=5432 dbname=postgres user=jonw@jonwpostgresqlsrv2 password=xxxx"
# show all databases in the server
\l
# connect to the right database
\c <database name>
# show all tables in the current database
\dt
# show all the data inside the target table
select * from <table name>;
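For reference, here is a minimal sketch of what the sample data creation could look like from the psql prompt above. The users table mirrors the schema created on the destination later in this article; the sample e-mail values are illustrative assumptions, not taken from the linked article.

-- assumption: a simple users table matching the destination schema shown later
CREATE TABLE users(
    id SERIAL PRIMARY KEY,
    email VARCHAR(40) NOT NULL UNIQUE
);
-- insert a couple of illustrative rows
INSERT INTO users(email) VALUES ('alice@example.com'), ('bob@example.com');
-- verify the rows landed
SELECT * FROM users;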

Create the Same Table Schema in Destination PostgreSQL Server / Database

We need to ensure the data coming from the source PostgreSQL server/DB can land in the destination PostgreSQL server/DB in a table with the same schema. This destination PostgreSQL server uses CMK data encryption.

# log in to the destination PostgreSQL server
psql "sslmode=verify-full sslrootcert=./BaltimoreCyberTrustRoot.crt host=jonwpostgresqlsrv2.postgres.database.azure.com port=5432 dbname=postgres user=jonw@jonwpostgresqlsrv2 password=xxxx"
# create a new database
create database <database name>;
# show all databases in the server
\l
# connect to the right database
\c <database name>
# create a new table
CREATE TABLE users(
    id SERIAL PRIMARY KEY,
    email VARCHAR(40) NOT NULL UNIQUE
);
# show all tables in the database
\dt
# show data in the table (empty for now)
SELECT * FROM users;

Create Azure Data Factory (ADF)

ADF serves as the middleman of the data migration process: it can connect to different Azure services and handle data operations between them. If you need step-by-step guidance on creating the service, please check here. After the deployment completes, the service should appear as below.

Click on “Author & Monitor” to open the main user interface.
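If you prefer scripting over the portal, the factory can also be created with the Azure CLI. A minimal sketch, assuming the datafactory CLI extension is available and using placeholder resource names:

# requires the Azure CLI "datafactory" extension
az extension add --name datafactory
# <rg>, <adf name> and <region> are placeholders
az datafactory create --resource-group <rg> --name <adf name> --location <region>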

Data Migration

As you read through the article above, you will notice that within ADF we need to create:

  • A source PostgreSQL server linked service and source dataset
  • A destination PostgreSQL server linked service and destination dataset
  • A copy data pipeline

In this article, we will walk through only the source linked service and dataset creation; the destination is the same process pointed at the other PostgreSQL server.
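These objects can also be scripted instead of clicked through the UI steps below. A hedged sketch with the Azure CLI datafactory extension: the “AzurePostgreSql” type comes from the Azure Database for PostgreSQL connector, and all the names and connection string values are illustrative assumptions, so check your CLI version for the exact syntax.

# creates a linked service from an inline JSON definition; all <...> values are placeholders
az datafactory linked-service create \
  --resource-group <rg> --factory-name <adf name> \
  --linked-service-name "linkedservice-<server>" \
  --properties '{"type":"AzurePostgreSql","typeProperties":{"connectionString":"host=<server>.postgres.database.azure.com port=5432 database=<database> uid=<user>@<server> pwd=<password> sslmode=require"}}'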

Create PostgreSQL Server Linked Service and Dataset

  • In ADF, click on the pencil icon (author) → ellipsis on the side of datasets → New dataset.
  • Search for “postgresql” → select “Azure Database for PostgreSQL”.
  • Give the new linked service a name that can be easily identified, such as “linkedservice-<postgreSQL server name>”. Click on “+ New” to start the configuration.
  • Make sure the integration runtime (IR) is in the same Azure region as your PostgreSQL server. This ensures the managed compute resource (the IR) has the smallest communication latency with the target PostgreSQL resource. For more information on what an IR is, please check here.
  • Continue to set up a managed private endpoint for the PostgreSQL server. This gives ADF a way to communicate with the PostgreSQL server over a private IP address in the same Azure virtual network. If no other public access is needed, “Deny public access” can be set to “Yes” (see the CLI sketch at the end of this section). For more information on managed private endpoints, please check here.
  • At first, the managed private endpoint state will be “pending”. It will not change to “approved” until you head over to the PostgreSQL server’s private endpoint section and approve the connection. Make sure the state shows “approved” both in the PostgreSQL server and in ADF.
  • Make sure the server and database are the ones you are targeting.
  • Input the username and password and select SSL. Click on “Test connection” to check whether the managed private endpoint to the target PostgreSQL server works.

**There is a caveat here: “Test connection” might not work even if all the setup is correct. This can be caused by delayed synchronization between ADF and other Azure services. However, if every other setting is correct (all the steps provided previously), you can freely click “Create” at the bottom left of the panel. Mine shows “Apply” because the linkage was already made in the past.**

  • “Test connection” can still be performed in the panel below. Please ensure you select the right table name.
  • After confirming all the settings, you can safely publish the newly created linked service and dataset.

Please repeat the whole process for destination PostgreSQL server linked service and dataset.
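The private endpoint approval and the “Deny public access” toggle mentioned above can also be done from the Azure CLI. A minimal sketch, assuming an Azure Database for PostgreSQL single server and placeholder names:

# approve the pending private endpoint connection on the server
# (<connection id> is the resource ID shown in the server's private endpoint section)
az network private-endpoint-connection approve --id <connection id> --description "Approved for ADF"
# deny public access to the server
az postgres server update --resource-group <rg> --name <server> --public-network-access Disabled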

Create Copy Data Pipeline

  • Head back to the pencil icon (author) → click on the ellipsis of pipelines → select “New pipeline”
  • Expand “Move & transform” → drag “Copy data” and drop it on the canvas on the right
  • Select the source PostgreSQL server dataset as “Source”
  • Select the destination PostgreSQL server dataset as “Sink”
  • Validate the pipeline
  • Debug the pipeline and check the result
  • If everything goes well, you should see the same data from the source PostgreSQL server also in the destination PostgreSQL server’s table, as the verification snippet below shows.
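A quick way to confirm is to query the destination directly. A minimal sketch, assuming the users table from earlier; the host, user, and database names are placeholders:

# log in to the destination server and check the copied rows
psql "sslmode=verify-full sslrootcert=./BaltimoreCyberTrustRoot.crt host=<destination server>.postgres.database.azure.com port=5432 dbname=<database name> user=<admin user>@<destination server> password=xxxx"
SELECT * FROM users;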

That is all for migrating data from one PostgreSQL server to another. In this 2-part series, you will have noticed that whether or not CMK data encryption is enabled does not really affect the migration process. So, if your environment also needs security reinforcement, this could be a good place to start. Happy learning!
