Replication is a very useful functionality which enables creating identical database copies (replicas) on different servers. It is an asynchronous operation and it adds insignificant disk & network I/O load to the primary server. I will try to explain some aspects, benefits & problems that may arise mainly for statement-based replication which is most commonly used (there are a lot similarities with row-based replication though).
Read-Replicas are commonly used for:
- Resource-heavy Read Queries, e.g. BI calculations
- Debugging & Query Optimization
Common Replication Terms
- Primary [Server]:
- the MySQL server from which data are being replicated. All DML statements that modify data (INSERT, DELETE, UPDATE) are (must be) executed here. That goes for DDL statements as well which is also imperative to execute on this server. Every change is replicated as is.
- Replica [Server]:
- the MySQL server which replicates from the Primary. Nothing should be written or modified on this server. All changes to data or structures should pass through the Primary or else you might run into duplicate-key errors which will stop the replication until resolved.
- [Primary] Binary Logs:
- Every data modification statement and DDL statement is written in a special binary format on these files. They are usually located in /var/log/mysql with a filename pattern such as mysql-bin.NNNNNN, unless otherwise specified in the configuration file. Every time you restart the server or execute
FLUSH LOGSa new log is created and the index, used as filename suffix, is incremented.
Tip: mysqlbinlog is a utility that converts binary logs to plain text.
- [Replica] Relay Logs:
- Used to store locally in an identical to Primary Binary Logs format SQL statements from the Primary Binary logs. Usually located in /var/lib/mysql and follows the pattern hostname-relay-bin.NNNNNN. As you may have already guessed, these files can also be read with the
mysqlbinlogutility, mainly for troubleshooting.
- SQL Thread:
- Runs on the Replica and is responsible for executing the SQL statements that need to be replicated.
- IO Thread:
- Runs on the Replica but reads from the Primary binary logs SQL statements and stores them locally on the Relay binary logs. Together with the SQL Thread perform all necessary operations to accomplish asynchronous replication.
- Replication Lag:
- the distance between the Primary and Replica. An approximate measure for this is the value of the Seconds_Behind_Master field, shown by the SHOW SLAVE STATUS command.
- Broken Replication:
- If an error occurs, preventing the SQL thread to execute a certain statement (e.g. a DUPLICATE KEY error) the replication SQL thread stops until the issue is resolved or the statement is skipped. This requires manual handling for the majority of occurrences.
Useful Commands and Configuration Parameters
- starts the entire replication procedure.
- stops the entire replication procedure.
START SLAVE IO_THREAD
- starts the IO thread. It is very useful to keep the IO thread running because the Primary server removes the Binary Logs after some amount of time (specified in the Primary /etc/mysql/my.cnf file).
STOP SLAVE IO_THREAD
- Stops the IO thread.
START SLAVE SQL_THREAD
- Starts the SQL thread. When this starts, SQL statements will start to be executed on the slave. Statements
SHOW SLAVE STATUS
- Shows the status of the replication process. You can then combine it with grep to find what you really need easily or create a BASH script based on the output.
- Replication threads do not start automatically when you restart the MySQL server.
- Specify which databases should be replicated. For multiple values you must include multiple entries of this parameter.
replicate-do-db = db1 replicate-do-db = db2
- Reduce the tables being replicated to specific tables. You must specify the database name as well using the dot notation:
replicate-do-table = db1.tableA replicate-do-table = db1.tableD ... replicate-do-table = db2.tableC
A common issue if you target replication to specific tables is that you may run into replication errors involving cross-table DML statements. If we are replicating table A from database db1 and not table B the following example SQL statement will break the replication:
INSERT INTO A(A_col1, A_col2) SELECT B_col1, B_col2 FROM B WHERE B_id < 500
What can I use replication for?
You can rotate your connection to the database among many database servers, thus distributing the load in a Round-Robin manner:
MYSQL_REPLICAS = ['replica1', 'replica2', ... , 'replicaN'] def get_replica_connection(): current_host = MYSQL_REPLICAS[rand(0, N - 1)] return MySQL.connect(current_host)
This is specifically for read performance, not read & write. If, for example, you have a web application which writes data, you may need to be extra careful.
The above code is simplistic; you should use connection pools and something similar to SQLAlchemy binds, in order to simultaneously have multiple database connections available.
Resource-Heavy & Long-Running Queries
A database replica allows you to execute long running queries for reporting purposes. By controlling replication, “consistent” database snapshots can be obtained, thus enabling you to extract aggregated data without worrying that they will interfere with actual traffic. This reduces the load off your primary server, avoiding possible performance degradation which might be perceivable by users.
Backup & Failover
Backup against a server crash, hard disk failure etc . Replication is not backup but it is a well established method of having a live copy of your databases ready to use in case something goes terribly wrong on the Primary server hardware. If this happens you just reverse the configuration, in order to perform a failover – the Replica becomes Primary.
You can use pt-slave-delay from Percona Toolkit to hold back your Replica server in order to help you prevent damage to your data as well.
An automated failover process can also achieve High-Availability in cases where server restarts are required, such as OS upgrades. In this scenario, the replica is upgraded first (also to ensure that the upgrade will not result in failures) and the replica is promoted and then traffic is redirected to the newly promoted primary. Of course this is a very interesting and complicated process, requiring a lot of coordination and testing. Some details on this procedure can also be found in AWS RDS Read Replica Failover description.