MySQL Replication

The situation

We’ve got a web application that uses a MySQL database as its backend. Some of the data held in that application needed to be used in another web application, but only ever read, never written to. So we had to come up with a method of sharing that data.

Options

1. Use the Rails ActiveResource class

Using this method we could read RESTful data from the remote system directly, and use the Ruby objects we pull into our site to display the necessary data.
This is a relatively easy way of achieving our goal: it requires a minimal amount of coding on the remote application, and we keep a single source of data.

The main downside to this method is that if the remote system goes away (network failure, MySQL crash, etc.), then our new web app falls over. The remote system also always has to do most of the grunt work: running the MySQL queries, generating the XML, and so on. If the remote system is busy, this could have a negative impact on how both systems run.

2. Use the Rails ActiveRecord class

The downsides to this method are the same as for ActiveResource, with a couple of additional problems. We’d have access to all of the database tables, and some of those tables contain sensitive data, so that would require some extra work to lock down. Also, you can only define one database per application, so if we wanted to add any tables to our new web application, we’d have to add them to the database that powers the remote application as well. That could prove very difficult to manage.

3. XML

This is really a less efficient ActiveResource-style solution with the same pros and cons.

4. Master/Slave MySQL Replication

The only real downside to this method is that we’d never done it before. After a bit of testing we quickly found that it eliminated all the problems of the other methods:

We can synchronise only the tables we want, so we wouldn’t run into problems with sensitive data.
If the remote system goes away then our system will remain unaffected and the data changes will “catch up” once the remote comes back online.
There is no additional load on the remote system, as the new system queries its own local copy of the data.
If we need to add any tables to our new application then that will have no effect on the remote application at all.

MySQL Replication

Replication follows a Master/Slave model: in our case the master is the application that is already in place, and the slave is the new application.

I’ve left out the full detail of how to create the user for replication (slave_user) and give it the necessary permissions, but there’s a sketch of the standard commands below if you don’t know how. Also don’t forget to set the bind-address on the master to an address the slave can reach (the master’s network IP rather than 127.0.0.1), and open up the firewall on the master on port 3306 for the IP of the slave.
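For reference, creating that user on the master usually looks something like the following; ‘slave_ip’ and the password are placeholders for your own values:

MASTER
mysql -uusername -ppassword
mysql>CREATE USER 'slave_user'@'slave_ip' IDENTIFIED BY 'slave_users password';
mysql>GRANT REPLICATION SLAVE ON *.* TO 'slave_user'@'slave_ip';
mysql>FLUSH PRIVILEGES;
mysql>quit;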

First off we need to enable binary logging on the master, set the server-id, name the database we want to replicate (we’ll specify the individual tables on the slave), and set some other variables to keep the binary logs from getting out of hand.

MASTER
sudo vi /etc/mysql/my.cnf

Either add these lines to the [mysqld] section or uncomment and edit them if they are already there.

server-id=1                              # unique ID for this server in the replication setup
log_bin=/var/log/mysql/mysql-bin.log     # enable binary logging
expire_logs_days=10                      # throw binary logs away after 10 days
max_binlog_size=100M                     # rotate the log once it reaches 100M
binlog_do_db=database_name               # only log changes made to this database

Restart the MySQL server:

sudo /etc/init.d/mysql restart
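If you want to be sure the change has taken effect, you can log into mysql and check that binary logging is now on; log_bin should report ON:

MASTER
mysql -uusername -ppassword
mysql>SHOW VARIABLES LIKE 'log_bin';
mysql>quit;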

Now we need to tell the slave MySQL server that it is a slave, and which tables to replicate. You can also set some other management options here.

SLAVE
sudo vi /etc/mysql/my.cnf

Either add these lines to the [mysqld] section or uncomment and edit them if they are already there.

server-id=2                                  # must be different from the master's server-id
master-connect-retry=60                      # retry a lost master connection every 60 seconds
replicate-do-table=database_name.table_name  # only replicate this table

Restart the MySQL server:

sudo /etc/init.d/mysql restart

As we already have data in our master system, we’re using the mysqldump method to get the current data (via Sequel Pro). In order for the slave’s initial data to match the master’s, we need to block any writes, by taking a read LOCK on the master database’s tables, before taking the data dump for the slave.

MASTER
mysql -uusername -ppassword
mysql>use database_name;
mysql>FLUSH TABLES WITH READ LOCK;

This will block all writes until you close that mysql session. Export the SQL dump from the master now and import it into the slave. I did this with Sequel Pro and the import/export tools available in it.
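If you’d rather take the dump from the command line than use Sequel Pro, the mysqldump equivalent looks something like this (the credentials are placeholders, and table_name is the table we’re replicating):

mysqldump -uusername -ppassword database_name table_name > dump.sql

Copy dump.sql over to the slave and load it in:

mysql -uusername -ppassword database_name < dump.sql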

We now need to grab some details from the master that we’ll need later on when we tell the slave to start replicating.

mysql>SHOW MASTER STATUS;
+------------------+----------+---------------+------------------+
| File             | Position | Binlog_Do_DB  | Binlog_Ignore_DB |
+------------------+----------+---------------+------------------+
| mysql-bin.000004 |    20060 | database_name |                  |
+------------------+----------+---------------+------------------+

Now, back on the slave, we need to use these values, along with some others, to set up the replication.

SLAVE
mysql -uusername -ppassword
mysql>STOP SLAVE; (in case the slave is already running)
mysql>CHANGE MASTER TO MASTER_HOST='IP of master', MASTER_USER='slave_user', MASTER_PASSWORD='slave_users password', MASTER_LOG_FILE='mysql-bin.000004', MASTER_LOG_POS=20060;
mysql>START SLAVE;
mysql>quit;
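It’s worth checking that replication has actually started. Log back into mysql on the slave and look at the slave status; Slave_IO_Running and Slave_SQL_Running should both say Yes:

SLAVE
mysql>SHOW SLAVE STATUS\G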

Go back to the master and close the mysql session you started earlier. This releases the read lock (UNLOCK TABLES; would do the same) and allows writes on the master again.

MASTER
mysql>quit;

That’s it: any changes to database_name.table_name on the master will be replicated over to the slave. If the master goes away for any reason, the slave system will still function and will ‘catch up’ when the master comes back online.

What we’ve now got is a central data store that pushes any changes to its own data out to a remote system as and when the data changes. It’s pretty easy to add more slaves if you need to for scalability, as sketched below.
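Each extra slave just repeats the slave-side steps above with its own unique server-id. As a sketch, a second slave’s my.cnf section would contain something like:

server-id=3
master-connect-retry=60
replicate-do-table=database_name.table_name

followed by the same CHANGE MASTER TO and START SLAVE commands, using a fresh SHOW MASTER STATUS reading taken while the master’s tables are locked.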

If you had a lot of people or systems all working on the same dataset, and you wanted to make certain that they were all using the same data, then the remote systems that only read data could be the slaves, and any systems that need to write data could use the master. You can also set up master-to-master replication, so that if some remote systems needed to write data as well, they could.