Feb 232011
 

Overview

MySQL Replication is one of the most used and valued features of the MySQL Server.

Unlike some other products on the market, it’s out-of-the-box, easy to configure, non-paid and smart features.

Most of our medium/large/super-large installation base are using replication to achieve “scale-out” scaling. Some will use it for backup purposes (not as HA though) and some will use it for geographic redundancy.
In this article I won’t discuss replication as there are hundreds of posts out there on how and what replication can do for you.

Rather, I would like to discuss our new extension to the good old replication mechanism: Semi-Synchronous Replication.

Before I go on, I think I should define “Semi-Synchronous Replication”. What is “Synchronous” and why it’s only “Semi”.

Asynchronous VS. Synchronous

  • Synchronous: Every write transaction (insert, update, delete) is replicated to the slaves immediately, and the transaction is not considered complete, until all slaves reply they have received the data (and maybe even flushed it to disk).
    If even one slave replies with failure, the transaction will rolled back from all the servers and the failure will be communicated to the application.
    Usually, the master is well aware of all its slaves.

    • Pros: Redundancy; we’re sure the data safely resides in more than one server.
    • Cons: Performance; each transaction takes time till all slaves and the master acknowledge the transaction is replicated and complete.
  • Asynchronous: The write transactions will be written to the master disk and to some kind of replicating log (in MySQL it’s the “binary log”), the application gets acknowledgement that the transaction is complete even though the slaves have not received the data yet.
    The slaves, on their own time, access the master and retrieve data they need to replicate, since the last time they accessed the master bin log.
    That means the master doesn’t “know” about the slaves nor how far behind they are, or even if they are still replicating.

    • Pros: Performance; the transaction waits only for the master to flush and acknowledge.
    • Cons: Redundancy; we have no idea whether the data is on all the slaves, some of the slaves or maybe on none of the slaves.

Get the picture?

This is one of the basic trade-offs of “Performance VS. Redundancy”. You cannot get both. Never.

That is, until now… Our development guys came up with a great idea combining both.

Async Replication you get via MySQL out-of-the-box replication.

Sync Replication you can get with DRBD and Linux-HA.

SemiSync you get if you install MySQL server 5.5 and later.

So what Semi stand for?

In SemiSync replication, the master holds the transaction the same way it does with Sync replication, but it waits only for one of the slaves to acknowledge.

The acknowledgement from the slave means, it has received the transaction and applied it to the relay log and flushed it to disk. It doesn’t mean the data has actually gone into the database.

Just to make this statement clearer, it means the transaction was written to the relay log, and the relay log was flushed to disk.
There is still the need for the SQL thread to pick it up and apply it to the database. It’s very unlikely not to complete this task, but that way we’re getting a bit more performance.

As soon as the master receives the acknowledgement it will complete the transaction and reply to the application.

This article will show you how it works. As I try to keep my posts as hands-on as possible, this one will include lots of code blocks as well.

Preparing the environment

We need at least couple of MySQL Servers version 5.5 or later (either Enterprise or Community) with the SemiSync plugin. (If you download from Oracle eDelivery or MySQL site, it’ll already be inside)

Loading the SemiSync Replication plugin

  • On the master, as user with “super” privileges
mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
Query OK, 0 rows affected (0.01 sec)
  • On the slave, as user with “super” privileges
mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
Query OK, 0 rows affected (0.01 sec)
  • Making sure it’s installed by issue the command
mysql> show plugins;

Look for the output

| rpl_semi_sync_master  | ACTIVE   | REPLICATION        | semisync_master.so | PROPRIETARY |

If you receive an error saying ERROR 1126 (HY000): Can’t open shared library follow the instructions on

http://dev.mysql.com/doc/refman/5.5/en/replication-semisync-installation.html

http://dev.mysql.com/doc/refman/5.5/en/install-plugin.html

Setting Asynchronous Replication

This is the first step. The SemiSync replication is basically the out-of-the-box Async Replication with additional plugin that manages the SemiSync part.

You can follow http://dev.mysql.com/doc/refman/5.5/en/replication.html

We’re doing replication from scratch, so we don’t have to dump the current database from the master to the slave.
For more details, make sure you’re getting your Async replication up and running before you activate the SemiSync replication.

On the master

  • Adding/Uncomment on /etc/my.cnf
[mysqld]

log-bin=mysql-bin
server-id=1
  • Restart the server as root
[root@master ~]# /etc/init.d/mysql restart
Shutting down MySQL..                                      [  OK  ]
Starting MySQL..                                           [  OK  ]
  • Create the replication user (please use less apparent password…)
mysql> CREATE USER 'repl'@'%' IDENTIFIED BY 'slave';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

On the Slave

  • Adding/Uncomment on /etc/my.cnf
[mysqld]
server-id=2
  • Restart the server as root
[root@slave ~]# /etc/init.d/mysql restart
Shutting down MySQL..                                      [  OK  ]
Starting MySQL..                                           [  OK  ]

Initiating the replication

  • On the master, let’s make sure the master is ready for Async replication
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |      524 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
  • On the slave, we now issue a CHANGE MASTER TO & START SLAVE commands to tell the slave to start replicate
mysql> CHANGE MASTER TO
  MASTER_HOST='192.168.0.136',
  MASTER_USER='repl',
  MASTER_PASSWORD='slave',
  MASTER_PORT=3306,
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_CONNECT_RETRY=10;
Query OK, 0 rows affected (0.01 sec)

mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.136
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 1264
               Relay_Log_File: slave-relay-bin.000003
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1264
              Relay_Log_Space: 1295
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
1 row in set (0.00 sec)
  • Let’s see if that works, on the slave (making sure there are no tables in the “test” database)
mysql> show tables;
Empty set (0.00 sec)
  • On the master, let’s create new table inside “test” database
mysql> CREATE TABLE test1 (id INT, name TEXT);
Query OK, 0 rows affected (0.01 sec)

mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| test1          |
+----------------+
1 row in set (0.00 sec)
  • On the slave, confirm the table creation has replicate across
mysql> SHOW TABLES;
+----------------+
| Tables_in_test |
+----------------+
| test1          |
+----------------+
1 row in set (0.00 sec)
  • We saw that the table has been created on the slave. That will work for data as well. On the master
mysql> INSERT INTO test1 VALUES (1,'Test 1');
Query OK, 1 row affected (0.01 sec)

mysql> SELECT * FROM test1;
+------+--------+
| id   | name   |
+------+--------+
|    1 | Test 1 |
+------+--------+
1 row in set (0.00 sec)
  • On the slave
mysql> SELECT * FROM test1;
+------+--------+
| id   | name   |
+------+--------+
|    1 | Test 1 |
+------+--------+
1 row in set (0.00 sec)
  • Let’s block the connection between the master and the slave, on the slave shell
[root@slave ~]# iptables -A INPUT -s 192.168.0.136 -j DROP
  • Now we’ll add another record to the master and check the slave
mysql> INSERT INTO test1 VALUES (2,'Test 2');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO test1 VALUES (3,'Test 3');
Query OK, 1 row affected (0.03 sec)

mysql> SELECT * from test1;
+------+--------+
| id   | name   |
+------+--------+
|    1 | Test 1 |
|    2 | Test 2 |
|    3 | Test 3 |
+------+--------+
3 rows in set (0.00 sec)
  • On the slave
mysql> SELECT * FROM test1;
+------+--------+
| id   | name   |
+------+--------+
|    1 | Test 1 |
+------+--------+
1 row in set (0.00 sec)

Issuing a SHOW SLAVE STATUS command won’t show anything, as the slave doesn’t know it’s offline and the master doesn’t know there’s a problem. Transactions just keep on going into the master without any issues.

As soon as we “fix” the network block and re-initiate the slave replication, the replication will catch up

[root@slave ~]# iptables -D INPUT -s 192.168.0.136 -j DROP
  • On the slave
mysql> START SLAVE;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> SELECT * FROM test1;
+------+--------+
| id   | name   |
+------+--------+
|    1 | Test 1 |
|    2 | Test 2 |
|    3 | Test 3 |
+------+--------+
3 rows in set (0.00 sec)

Moving into Semi-Synchronous Replication

We already installed the plugins, but we have to set the SemiSync replication. We can do it dynamically or permanently via my.cnf.

More info http://dev.mysql.com/doc/refman/5.5/en/replication-semisync-installation.html

  • We will do it dynamically. On the master
mysql> SET GLOBAL rpl_semi_sync_master_enabled = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> SET GLOBAL rpl_semi_sync_master_timeout = 10000;
Query OK, 0 rows affected (0.00 sec)
  • On the slave
mysql> SET GLOBAL rpl_semi_sync_slave_enabled = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;
Query OK, 0 rows affected (0.02 sec)

Query OK, 0 rows affected (0.00 sec)
  • Confirming it worked. On the master
mysql> SHOW VARIABLES LIKE 'rpl_semi_sync%';
+------------------------------------+-------+
| Variable_name                      | Value |
+------------------------------------+-------+
| rpl_semi_sync_master_enabled       | ON    |
| rpl_semi_sync_master_timeout       | 10000 |
| rpl_semi_sync_master_trace_level   | 32    |
| rpl_semi_sync_master_wait_no_slave | ON    |
+------------------------------------+-------+
4 rows in set (0.00 sec)

mysql> SHOW STATUS LIKE 'Rpl_semi_sync%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 1     |
| Rpl_semi_sync_master_net_avg_wait_time     | 0     |
| Rpl_semi_sync_master_net_wait_time         | 0     |
| Rpl_semi_sync_master_net_waits             | 0     |
| Rpl_semi_sync_master_no_times              | 0     |
| Rpl_semi_sync_master_no_tx                 | 0     |
| Rpl_semi_sync_master_status                | ON    |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 0     |
| Rpl_semi_sync_master_tx_wait_time          | 0     |
| Rpl_semi_sync_master_tx_waits              | 0     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 0     |
+--------------------------------------------+-------+
14 rows in set (0.00 sec)

That is quite self explanatory, but let’s look at some that we’re interested in (the rest are in http://dev.mysql.com/doc/refman/5.5/en/server-status-variables.html#statvar_Rpl_semi_sync_master_status)

  1. Rpl_semi_sync_master_status – Whether semisynchronous replication currently is operational on the master.
  2. Rpl_semi_sync_master_yes_tx – The number of commits that were acknowledged successfully by a slave.
  3. Rpl_semi_sync_master_no_tx – The number of commits that were not acknowledged successfully by a slave.
  • On the slave
mysql> SHOW VARIABLES LIKE 'rpl_semi_sync%';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| rpl_semi_sync_slave_enabled     | ON    |
| rpl_semi_sync_slave_trace_level | 32    |
+---------------------------------+-------+
2 rows in set (0.00 sec)

mysql> SHOW STATUS LIKE 'Rpl_semi_sync%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Rpl_semi_sync_slave_status | ON    |
+----------------------------+-------+
1 row in set (0.00 sec)

Now let’s see what the “Semi” means. We’ll add new records to the master, while the slave has connection, than we’ll block the connection and we’ll try to ad new record. Let’s see what happens.

  • On the master
mysql> INSERT INTO test1 VALUES (4,'Test 4');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * from test1;
+------+--------+
| id   | name   |
+------+--------+
|    1 | Test 1 |
|    2 | Test 2 |
|    3 | Test 3 |
|    4 | Test 4 |
+------+--------+
4 rows in set (0.00 sec)

mysql> SHOW STATUS LIKE 'Rpl_semi_sync%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 1     |
| Rpl_semi_sync_master_net_avg_wait_time     | 602   |
| Rpl_semi_sync_master_net_wait_time         | 602   |
| Rpl_semi_sync_master_net_waits             | 1     |
| Rpl_semi_sync_master_no_times              | 0     |
| Rpl_semi_sync_master_no_tx                 | 0     |
| Rpl_semi_sync_master_status                | ON    |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 0     |
| Rpl_semi_sync_master_tx_wait_time          | 0     |
| Rpl_semi_sync_master_tx_waits              | 0     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 1     |
+--------------------------------------------+-------+
14 rows in set (0.00 sec)

Notice Rpl_semi_sync_master_yes_tx is now 1!

  • On the slave
mysql> SELECT * FROM test1;
+------+--------+
| id   | name   |
+------+--------+
|    1 | Test 1 |
|    2 | Test 2 |
|    3 | Test 3 |
|    4 | Test 4 |
+------+--------+
4 rows in set (0.00 sec)
  • Now let’s block the slave network and try to add another record to the master. On the slave
[root@slave ~]# iptables -A INPUT -s 192.168.0.136 -j DROP
  • On the master
mysql> INSERT INTO test1 VALUES (5,'Test 5');
Query OK, 1 row affected (10.00 sec)

mysql> SELECT * from test1;
+------+--------+
| id   | name   |
+------+--------+
|    1 | Test 1 |
|    2 | Test 2 |
|    3 | Test 3 |
|    4 | Test 4 |
|    5 | Test 5 |
+------+--------+
5 rows in set (0.00 sec)

mysql> SHOW STATUS LIKE 'Rpl_semi_sync%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 1     |
| Rpl_semi_sync_master_net_avg_wait_time     | 602   |
| Rpl_semi_sync_master_net_wait_time         | 602   |
| Rpl_semi_sync_master_net_waits             | 1     |
| Rpl_semi_sync_master_no_times              | 1     |
| Rpl_semi_sync_master_no_tx                 | 1     |
| Rpl_semi_sync_master_status                | OFF   |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 0     |
| Rpl_semi_sync_master_tx_wait_time          | 0     |
| Rpl_semi_sync_master_tx_waits              | 0     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 1     |
+--------------------------------------------+-------+
14 rows in set (0.00 sec)

Look closely!
The transaction took 10 seconds to complete. It finally went through and it’s in the database, but two things happened:

Rpl_semi_sync_master_status is OFF
Rpl_semi_sync_master_no_tx increased from 0 to 1

This is what happens when the SemiSync replication fails: it falls back to Async mode and you can monitor your replication by queering Rpl_semi_sync_master_status.

  • We’ll recover and revert to SemiSync. On the slave
[root@slave ~]# iptables -D INPUT -s 192.168.0.136 -j DROP
mysql> SELECT * FROM test1;
+------+--------+
| id   | name   |
+------+--------+
|    1 | Test 1 |
|    2 | Test 2 |
|    3 | Test 3 |
|    4 | Test 4 |
+------+--------+
4 rows in set (0.00 sec)

mysql> STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;
Query OK, 0 rows affected (0.02 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM test1;
+------+--------+
| id   | name   |
+------+--------+
|    1 | Test 1 |
|    2 | Test 2 |
|    3 | Test 3 |
|    4 | Test 4 |
|    5 | Test 5 |
+------+--------+
5 rows in set (0.00 sec)
  • On the master
mysql> SHOW STATUS LIKE 'Rpl_semi_sync%';
+--------------------------------------------+----------+
| Variable_name                              | Value    |
+--------------------------------------------+----------+
| Rpl_semi_sync_master_clients               | 1        |
| Rpl_semi_sync_master_net_avg_wait_time     | 28584321 |
| Rpl_semi_sync_master_net_wait_time         | 57168643 |
| Rpl_semi_sync_master_net_waits             | 2        |
| Rpl_semi_sync_master_no_times              | 1        |
| Rpl_semi_sync_master_no_tx                 | 1        |
| Rpl_semi_sync_master_status                | ON       |
| Rpl_semi_sync_master_timefunc_failures     | 0        |
| Rpl_semi_sync_master_tx_avg_wait_time      | 0        |
| Rpl_semi_sync_master_tx_wait_time          | 0        |
| Rpl_semi_sync_master_tx_waits              | 0        |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0        |
| Rpl_semi_sync_master_wait_sessions         | 0        |
| Rpl_semi_sync_master_yes_tx                | 1        |
+--------------------------------------------+----------+
14 rows in set (0.00 sec)

mysql> INSERT INTO test1 VALUES (6,'Test 6');
Query OK, 1 row affected (0.01 sec)

Clearly, we’re back in business with the SemiSync replication ON again.

That’s it! We’re done….

I would strongly recommend to read on the official user manual, especially this part: http://dev.mysql.com/doc/refman/5.5/en/replication-notes.html

  3 Responses to “MySQL Semi-Synchronous Replication. See the Magic. Try the Magic.”

  1. Great article. It would be worth noting that as well as DRBD, MySQL Cluster (www.mysql.com/cluster) also supports synchronous replication, with higher levels of write scalability and faster failover/recovery

  2. Thanks Ronen!

    I’ve been waiting to read more blogs about semi-sync. This covers the basics very well (like what variables to watch). Well done!

    Now I hope someone will blog about all the gotchas! How does failover happen? What if I have multiple slaves? What is all this talk about global transaction id’s?

    I look forward to evaluating semi-sync replication myself in my new job next month. Maybe I’ll have something to share too.

Sorry, the comment form is closed at this time.