Django, MySQL & UTF-8

Last week I finally decided to switch my hosting service from MediaTemple to SliceHost. I had been postponing this move for almost a year, since I simply didn’t have the time to redeploy ten separate domains and move all their e-mail accounts. But, in a moment of foolishness, I cancelled my MediaTemple account and opened a brand new SliceHost one with Ubuntu 8.04 - little did I know what I was getting myself into…

Actually, the move was pretty painless. MediaTemple’s insistence on using CentOS had been putting me off for the past couple of years, and redeploying my Django sites basically meant changing a few lines in my Fabric deployment scripts. I had already been forwarding all my mail to separate Gmail accounts for a while, so I ended up opening a Google for Domains account, and after some DNS setup that was ready as well.

The only problem I faced was with MySQL, as I had expected. I never use MySQL for new projects anymore; PostgreSQL provides a much more robust and consistent solution. However, my MediaTemple service didn’t have a PostgreSQL server, so all my Django projects had to run on MySQL. I ran mysqldump on the CentOS server and moved everything over to Ubuntu, and the moment I checked the sites, I started seeing the dreaded encoding issues.

My first reaction was to run iconv on the SQL files and import them again. I was creating my new databases with the CHARSET UTF8 option, so I thought everything would be okay. But when the iconv-converted files started showing encoding issues as well, I started to panic. My MediaTemple account was already closed, so the original database server was no longer accessible (they don’t give you a grace period).
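
For illustration, here is a minimal Python sketch of one common way this kind of garbling shows up: text stored as UTF-8 but read back as latin-1. This is an assumption about the general failure mode, not a diagnosis of these particular dumps, and the sample string is made up.

```python
# A sample string as it should appear on the site (made-up example).
original = "café"

# Simulate the mismatch: the UTF-8 bytes are interpreted as latin-1.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)  # cafÃ©

# As long as no bytes were lost along the way, the mistake is reversible.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

If the mismatch happens between the client and the server rather than inside the dump file, converting the file itself would not be expected to help, which would be consistent with iconv making no difference here.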

Thankfully, the next thing I thought about was MySQL’s collation settings. I really don’t know why these are separate from the charset option, and I don’t want to know. But they were causing the issues. I did some research and found out that I could update the MySQL settings to make sure all new databases are created with the utf8 charset and the utf8_unicode_ci collation. So I edited /etc/mysql/my.cnf:

    [mysqld]
    character-set-server = utf8
    collation-server = utf8_unicode_ci
    init_connect = 'set collation_connection = utf8_unicode_ci;'
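
One caveat: MySQL does not run init_connect for users that have the SUPER privilege, so the same collation can also be requested from the Django side. The fragment below is only a sketch, assuming a Django version that uses the DATABASES setting; the database name, user, and password are placeholders, not values from my setup.

```python
# settings.py fragment (sketch); NAME, USER and PASSWORD are hypothetical.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mysite",
        "USER": "mysite_user",
        "PASSWORD": "change-me",
        "HOST": "localhost",
        "OPTIONS": {
            # Options are passed through to the MySQL driver's connect() call.
            "charset": "utf8",
            # Runs once on each new connection, mirroring init_connect in my.cnf.
            "init_command": "SET collation_connection = utf8_unicode_ci",
        },
    }
}
```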

Finally, I also added a line for the MySQL client, to make sure it uses UTF8 when displaying the data:

    [client]
    default-character-set = utf8
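
As a quick sanity check before re-importing, the settings can be verified with a few lines of Python. This is a rough sketch using the standard configparser module on an inlined copy of the two sections above. The helper name and the EXPECTED table are made up for illustration, and a real my.cnf may contain directives that configparser does not understand.

```python
import configparser

# The two my.cnf sections from this post, inlined to keep the sketch self-contained.
MY_CNF = """
[mysqld]
character-set-server = utf8
collation-server = utf8_unicode_ci
init_connect = 'set collation_connection = utf8_unicode_ci;'

[client]
default-character-set = utf8
"""

# Settings that should be present (hypothetical table for this check).
EXPECTED = {
    ("mysqld", "character-set-server"): "utf8",
    ("mysqld", "collation-server"): "utf8_unicode_ci",
    ("client", "default-character-set"): "utf8",
}


def missing_settings(text):
    """Return the (section, key) pairs that are absent or have a different value."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(text)
    return [
        (section, key)
        for (section, key), value in EXPECTED.items()
        if parser.get(section, key, fallback=None) != value
    ]


print(missing_settings(MY_CNF))  # []
```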

I re-imported all the SQL dumps, and no more encoding issues. Next step: migrate everything to PostgreSQL!
