Thursday 12 May 2016

Database Migration and Integration using AWS DMS



Amazon Web Services (AWS) recently released a product called AWS Data Migration Services (DMS) to migrate data between databases.

The experiment

I have used AWS DMS to try a migration from a source MySQL database to a target MySQL database, a homogeneous database migration.

The DMS service lets you use a resource in the middle Replication Instance - an automatically created EC2 instance - plus source and target Endpoints. Then you move data from the source database to the target database. Simple as that. DMS is also capable of doing heterogeneous database migrations like from MySQL to Oracle and even synchronous integrations. In addition AWS DMS also gives you a client tool called AWS Schema Converter tool which helps you convert your source database objects like stored procedures to the target database format. All things a cloud data integration project needs!

In my experiment and POC, I was particularly interested in the ability of the tool to move a simple data model as below, with 1-n relationship between tables t0(parent) and t1(child) like below.

(Pseudo code to quickly create two tables t0, t1 with 1-n relationship to try it. Create the tables both on source and target database)

t0 -> t1 Table DDL (Pseudo code)

CREATE TABLE `t0` (
  `id` int(11) NOT NULL,
  `txt` varchar(100) CHARACTER SET ucs2 DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `t1` (
  `id` mediumint(9) NOT NULL AUTO_INCREMENT,
  `t0id` int(9) DEFAULT NULL,
  `txt` char(100) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `t0id` (`t0id`),
  CONSTRAINT `t1_ibfk_1` FOREIGN KEY (`t0id`) REFERENCES `t0` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


In this experiment, I didn't want to see just a migration, a copy, of a table from source database to a target database. I was interested more to see how easy is to migrate a data model - with Primary Key and Foreign Key relationship in place -  from the source database to the target database with zero downtime and using their CDC (Changed data Capture) or Ongoing-Replication migration option and capabilities of AWS DMS. That is, zero downtime database migration.

Here are the results of the experiment.

AWS DMS is ubiquitous, you can quickly set-up an agent (Replication Instance) and define source & target endpoints and start mapping your tables to be migrated from source database to target database with the tool. All conveniently using the AWS console.

Once you setup your replication instance and endpoints, create a Migration Task (say Alpha) and do an initial full migration (load) from the source database to the target database. Do this with the foreign keys (FKs) disabled on the target. This is a recommendation in the AWS DMS Guide in order to dump the data super fast as it does it with parallel threads, at least this is the recommendations for MySQL targets.

Then you can create a second Migration Task (say Beta) using a different endpoint, but this time with the foreign keys enabled on the target. You can do this even before your full load with Alpha to avoid waiting times. Configure Beta interface/task to run forever and let it integrate and sync the delta which occurred during the initial load. You can even start the Beta interface from a cut-off timestamp point. It uses source MySQL database's binlogs to propagate the changes. If you don't create beta interface, that is to use a different endpoint for the target with the parameter which enables the FKs, the DELETE SQL statements on the source which occur during the migration will not propagate to the target correctly and the CASCADEs to the child tables will not work on the target. CASCADE is a property of the Foreign Key.

To reconcile, to find out if you have migrated everything, I had to count the rows in each table on source and the target databases to monitor and see if it all worked. To do that I used Pentaho Spoon CE to quickly create a job to count the rows on both source and target database and validate migration/integration interfaces.

Overall, I found AWS DMS very easy to use, it quickly helps you wire an integration interface in the Cloud and start pumping and syncing data between sources and targets databases be it on Premise or Cloud. A kind of Middleware setup in AWS style, in the Cloud. No more middleware tools for data migration, AWS now has it's own.