Migrating a Django app from MySQL to PostgreSQL
We just finished migrating the database for our Django 1.6 app from MySQL to PostgreSQL. If you have a clean environment, the process is as simple as running syncdb and/or migrate to create the tables, truncating the data in those tables, and running the dumpdata and loaddata management commands.
Here are the steps to do this:
Step 1: Create an empty database in your PostgreSQL instance
CREATE DATABASE dbname OWNER rolename;
Step 2: Create a database alias for PostgreSQL in your Django settings file
In your settings.py, change the database setting to something like this:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'dbname',
        'USER': 'dbuser',
        'PASSWORD': 'dbpass',
        'HOST': 'mysql.example.com',
        'PORT': '',
    },
    'postgresql': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'dbname',
        'USER': 'dbuser',
        'PASSWORD': 'dbpass',
        'HOST': 'postgresql.example.com',
        'PORT': '',
    },
}
This will allow us to specify which database to run the management commands on in the later steps.
Step 3: Create the tables in the PostgreSQL instance
python manage.py syncdb --migrate --no-initial-data --database=postgresql
This runs syncdb and the South migrations against the PostgreSQL database, skipping the initial data.
Step 4: Truncate the newly created tables
Even though we specified --no-initial-data in the previous step, it's probably a good idea to perform this step anyway, in case the app has custom data migrations that added records to the tables. This command generates an SQL script we can run against the database:
python manage.py sqlflush --database=postgresql
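Note that sqlflush only prints the script; it doesn't execute it. One way to run it is to pipe the output straight into psql, reusing the host, user, and database name placeholders from the settings example above:

python manage.py sqlflush --database=postgresql | psql -h postgresql.example.com -U dbuser dbname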
Step 5: Dump the data from the MySQL database into a JSON file
Django has a management command called dumpdata, which is what you'd normally use here, as it generates a database-agnostic backup. The default format is JSON.
python manage.py dumpdata --all --natural --indent=4 > dbname.json
The --all option will make sure Django's base manager is used when dumping the models in case you have custom managers that filter or modify the data. The --natural option will tell Django to use natural keys if available. The --indent option just makes the output more readable.
You may also want to exclude the data for certain apps or models such as celery logs. You do this with the --exclude option. For example:
python manage.py dumpdata --all --natural --exclude=djcelery --exclude=search.SearchLog --indent=4 > dbname.json
Step 6: Load the JSON data into the PostgreSQL database
python manage.py loaddata dbname.json --database=postgresql
That's pretty much it! You've now fully migrated your database to PostgreSQL. All that's left is to update your database settings so the PostgreSQL one becomes the default:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'dbname',
        'USER': 'dbuser',
        'PASSWORD': 'dbpass',
        'HOST': 'postgresql.example.com',
        'PORT': '',
    }
}
Now, in our case, we don't really have a clean, standard Django environment. Our database was originally created for a legacy PHP app, which we're still in the process of retiring, so there are tables in the database outside of Django that the legacy app still uses. To move these over to PostgreSQL, I used this tool, which made the process very easy.
Things to watch out for
Django signals
You'll probably want to disable these. For example, you may have signals that send emails to your users when a record is created; to avoid spamming them, make sure those handlers are disabled before loading the data. One way to handle this is with a decorator that keeps the signal handler from running during the loaddata management command.
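Here's a minimal sketch of such a decorator. It relies on the fact that loaddata passes raw=True to pre_save and post_save handlers while it saves deserialized fixture objects:

from functools import wraps

def disable_for_loaddata(signal_handler):
    """Skip the handler while fixtures are being loaded via loaddata."""
    @wraps(signal_handler)
    def wrapper(*args, **kwargs):
        # loaddata sets raw=True when saving deserialized objects,
        # so we can use it to bail out before any emails are sent.
        if kwargs.get('raw'):
            return
        signal_handler(*args, **kwargs)
    return wrapper

Apply it directly above any pre_save/post_save receiver, e.g. a hypothetical send_welcome_email handler.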
Constraints (such as not null, unique, foreign key)
I've run into a bunch of these. There were records with foreign keys to another table where the related record no longer existed, records with null values in columns the model definition doesn't allow to be null, and duplicate values in columns with a unique constraint. Most of these were likely due to manual intervention, and I had to clean them all up with SQL queries. PostgreSQL enforces these constraints when loaddata inserts the rows, so your database should be in a clean, consistent state after the migration.
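As an illustration of the kind of cleanup involved (the table and column names here are hypothetical), a query like this finds child rows whose foreign key points at a parent that no longer exists:

-- app_child.parent_id is a foreign key to app_parent.id
SELECT child.id
FROM app_child AS child
LEFT JOIN app_parent AS parent ON parent.id = child.parent_id
WHERE child.parent_id IS NOT NULL
  AND parent.id IS NULL;

Run against the MySQL database before dumping, this points you at the rows to fix or delete.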
Hardcoded primary keys
This was painful. We had hardcoded primary keys everywhere in our test suite, and a lot of tests failed when using PostgreSQL because the sequence is handled differently there. I had to go through 700+ tests to remove these and fix the tests. Most were simple changes, but some were very time consuming because I had to change the test logic, and a few tests were just difficult to follow.
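The typical fix looked something like this (the model and values are made up for illustration): instead of assuming what id a row will get, hold on to the object you created and use its actual primary key.

# Brittle: assumes this test's row is the first ever inserted, so pk=1.
# Under PostgreSQL the ids depend on the sequence state, so this breaks.
self.assertEqual(Article.objects.get(pk=1).title, 'Hello')

# Robust: use the primary key the database actually assigned.
article = Article.objects.create(title='Hello')
self.assertEqual(Article.objects.get(pk=article.pk).title, 'Hello')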
Raw SQL queries
Fortunately, we only had one of these. The query used MySQL functions that aren't available in PostgreSQL, so we had to rewrite it using the equivalent PostgreSQL functions.
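Our actual query isn't worth reproducing, but as a hypothetical example of this kind of rewrite, MySQL's IFNULL() and DATE_FORMAT() map to PostgreSQL's COALESCE() and to_char():

-- MySQL
SELECT IFNULL(score, 0) AS score,
       DATE_FORMAT(created, '%Y-%m') AS month
FROM results;

-- PostgreSQL equivalent
SELECT COALESCE(score, 0) AS score,
       to_char(created, 'YYYY-MM') AS month
FROM results;

COALESCE() also works in MySQL, so it's the portable choice going forward.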
Case sensitivity
String comparisons are case sensitive in PostgreSQL, but not under MySQL's default collations. I also ran into an issue while moving the non-Django tables: while the indexes were being created, the tool was looking for an 'id' column, but the column was named 'Id' (capital 'I'). I just had to rename that column to 'id' and everything worked fine.
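On the Django side this mostly shows up in queryset filters. If your code relied on MySQL's case-insensitive matching, you can make the intent explicit with Django's case-insensitive lookups (the query below is just an illustration):

from django.contrib.auth.models import User

# Matched 'alice', 'Alice', and 'ALICE' under MySQL's default collation,
# but only the exact string 'Alice' under PostgreSQL.
User.objects.filter(username='Alice')

# Case-insensitive lookup that behaves the same on both databases.
User.objects.filter(username__iexact='Alice')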
Tags: howto, python, django, tech, software development, database, postgresql, mysql