Last week I finally decided to switch my hosting service from MediaTemple to SliceHost. I had been postponing this move for almost a year, since I simply didn’t have the time to redeploy ten separate domains and move all their e-mail accounts. But, in a moment of foolishness, I cancelled my MediaTemple account and opened a brand new SliceHost one running Ubuntu 8.04. Little did I know what I was getting myself into…
Actually, the move was pretty painless. MediaTemple’s insistence on using CentOS has been putting me off for the past couple of years, and redeploying my Django sites basically meant changing a few lines in my Fabric deployment scripts. I had been forwarding all my mail to separate Gmail accounts for a while now, so I ended up opening a Google for Domains account and that was ready as well, after some DNS setup.
The only problem I faced was with MySQL, as I had expected. I never use MySQL for new projects anymore, because Postgres provides a much more robust and consistent solution. However, my MediaTemple service didn’t have a PostgreSQL server, so all my Django projects had to run on MySQL. I ran mysqldump on the CentOS server and moved everything over to Ubuntu, and the moment I checked the sites, I started seeing the dreaded encoding issues.
My first reaction was to run iconv on the SQL files and import them again. I was creating my new databases with the CHARSET UTF8 option, so I thought everything would be okay. But when the iconv-converted files showed the same encoding issues, I started to panic. My MediaTemple account was already closed, so the original database server was no longer accessible (they don’t give you a grace period).
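For readers who have not run into these symptoms before, the sketch below shows one common way this kind of garbling arises: UTF-8 bytes being read as Latin-1 somewhere between the dump and the server. This is an assumption for illustration only. It is not confirmed to be exactly what happened with these dumps, and the sample string and function names are made up.

```python
# Illustration only: how UTF-8 text turns into mojibake when its bytes
# are read as Latin-1, and how that particular damage can be reversed.

def garble(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Reverse garble(): re-encode as Latin-1, then decode as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


sample = "café"  # hypothetical sample value
broken = garble(sample)
print(broken)                    # the accented letter becomes two characters
print(repair(broken) == sample)  # the round trip restores the original
```

If the data has been through a mismatch like this, converting the file again with a tool such as iconv does not necessarily help, because the file itself may already be valid and the misreading happens at import or connection time.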
Thankfully, the next thing I thought about was MySQL’s collation settings. I really don’t know why these are separate from the charset option, and I don’t want to know. But they were causing the issues. I did some research and found out that I could update the MySQL settings so that all new databases are created with the UTF8 charset and the utf8_unicode_ci collation. So I edited /etc/mysql/my.cnf:
[mysqld]
character-set-server = utf8
collation-server = utf8_unicode_ci
init_connect = 'set collation_connection = utf8_unicode_ci;'
Finally, I also added a line for the MySQL client, to make sure it uses UTF8 when displaying the data:
[client]
default-character-set = utf8
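As a quick sanity check before re-importing, the settings above can be verified with a few lines of Python. This is only a rough sketch: the `EXPECTED` table and the `find_mismatches` helper are hypothetical names, the sample text simply repeats the fragments from this post, and Python's `configparser` is not a full MySQL option-file parser (for example, MySQL treats dashes and underscores in option names as interchangeable, which this sketch does not handle).

```python
import configparser

# Expected values, taken from the my.cnf fragments in this post.
EXPECTED = {
    ("mysqld", "character-set-server"): "utf8",
    ("mysqld", "collation-server"): "utf8_unicode_ci",
    ("client", "default-character-set"): "utf8",
}


def find_mismatches(cnf_text: str) -> list:
    """Return (section, key, found) tuples for settings that differ."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # tolerate bare option names
        strict=False,         # tolerate repeated sections or keys
        interpolation=None,   # treat '%' literally
    )
    parser.read_string(cnf_text)
    problems = []
    for (section, key), wanted in EXPECTED.items():
        # fallback=None covers both a missing section and a missing key
        found = parser.get(section, key, fallback=None)
        if found != wanted:
            problems.append((section, key, found))
    return problems


SAMPLE = """
[mysqld]
character-set-server = utf8
collation-server = utf8_unicode_ci
init_connect = 'set collation_connection = utf8_unicode_ci;'

[client]
default-character-set = utf8
"""

print(find_mismatches(SAMPLE))  # an empty list means every setting matched
```

To check a real file, the same function could be given the contents of /etc/mysql/my.cnf, read with `open(...).read()`.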
I re-imported all the SQL dumps, and no more encoding issues. Next step: migrate everything to PostgreSQL!
A painless move to PostgreSQL is possible by using JSON dumps. I believe django-command-extensions can do that.
Julian - your question deserves a good answer, so I am going to write an in-depth blog post about it.
Nick - that’s what I was thinking as well.
Glad you were OK with a case-insensitive collation. I’ve been bitten by it in the past: the only case-sensitive utf8 collation MySQL provides is utf8_bin, but that causes problems in mysql-python and Django, where you end up getting binary chunks back instead of unicode strings. I’ve also had to use “SET NAMES utf8” in the init-connect string to ensure that things work smoothly.
Beyond these things though, I haven’t had problems with MySQL. If you get around to it, I’d certainly enjoy a post on Postgres advantages.
© Copyright 2001-2010 Taylan Pince. All rights reserved.
What do you mean when you say Postgres is more robust and consistent? On which occasions did you see that?