一.MHA简介
1.1.简介
MHA(Master HA ) 是一款开源的 MySQL高可用程序,它为MYSQL 主从复制架构提供了automating master failover
主节点自动迁移功能.MHA在监控到master节点故障时候,会自动提升拥有的数据最近进于主节点的其他从节点为主节点,
并且在此期间,会通过其他节点获取额外的信息来避免数据不一致性的问题.MHA也提供master节点的在线切换功能,
即按需切换master/slave节点.
1.2.MHA服务角色
MHA Manager (管理节点): 通用单独部署一台机器专门用于管理一个或者多个master/slaver集群,每个master/slave
集群成为一个application;
MHA node(数据节点) : 运行在每台mysql 服务器上(master或slave或manager),它通过监控具备解析和清理logs
功能的脚本来实现以及加速故障转移
1.3.MHA故障转移步骤
1.4 MHA组件
1.4.1 Manager节点工具程序:
1.4.2 Node节点工具程序:
1.4.3 其定义拓展:
二.MHA环境要求
2.1 Mysql replication 环境准备
2.1.1 MHA mysql 复制环境要求:
2.1.2 MHA初始化以及配置文件
三. MHA 实例
3.1.环境说明:
3.2.mysql 主从复制环境创建
3.2.1 master-node1配置:
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
skip_name_resolve=ON
innodb_file_per_table=ON
server_id=1
relay-log=relay-bin
log-bin=master-bin
MariaDB [(none)]> GRANT ALL ON *.* TO 'mhaadmin'@'10.1.%.%' IDENTIFIED BY '000000';
MariaDB [(none)]> GRANT REPLICATION CLIENT,REPLICATION SLAVE ON *.* TO 'repluser'@'10.1.%.%' IDENTIFIED BY '000000';
- MariaDB [(none)]> SHOW MASTER STATUS;
+-------------------+----------+--------------+------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +-------------------+----------+--------------+------------------+ | master-bin.000003 | 465 | | | +-------------------+----------+--------------+------------------+ 1 row in set (0.00 sec)
3.2.2 slave-node 配置:
vim /etc/my.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
skip_name_resolve=ON
innodb_file_per_table=ON
server_id=2
relay-log=relay-bin
log-bin=master-bin
relay_log_purge=0
read_only=1
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='10.1.249.184',MASTER_USER='repluser',MASTER_PASSWORD='000000',MASTER_LOG_FILE='master-bin.000003',MASTER_LOG_POS=465;
MariaDB [(none)]> GRANT ALL ON *.* TO 'mhaadmin'@'10.1.%.%' IDENTIFIED BY '000000';
- MariaDB [(none)]> START SLAVE;
- MariaDB [(none)]> SHOW SLAVE STATUS\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.1.249.184
Master_User: mhaadmin
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000003
Read_Master_Log_Pos: 633
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 698
Relay_Master_Log_File: master-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 633
Relay_Log_Space: 986
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
3.2.3 主从复制测试:
MariaDB [(none)]> SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| test |
| test2 |
+--------------------+
5 rows in set (0.00 sec)
MariaDB [(none)]>
3.2.4 manager-node 配置:
#创建私钥
- [root@node5 .ssh]# ssh-keygen -t rsa -P ''
- Generating public/private rsa key pair.Enter file in which to save the key (/root/.ssh/id_rsa): Your identification has been saved in /root/.ssh/id_rsa.Your public key has been saved in /root/.ssh/id_rsa.pub.The key fingerprint is:7f:d0:79:0b:ad:30:fa:0e:61:0d:4c:13:dc:60:1f:20 root@node5The key's randomart image is:+--[ RSA 2048]----+| E.B=. || =.o.. || o . || o . o || S = + o || . + + + . || o . o . || o . || .o |+-----------------+
- #添加公钥到认证信息存放文件中
- [root@node5 .ssh]# cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
#修改认证文件权限
[root@node5 .ssh]# chmod go= /root/.ssh/authorized_keys
#复制认证文件等到其他各节点
- [root@node5 ~]# scp /root/.ssh/authorized_keys id_rsa 10.1.249.70:/root/.ssh/
- [root@node5 ~]# scp /root/.ssh/authorized_keys id_rsa 10.1.252.218:/root/.ssh/
- [root@node5 ~]# scp /root/.ssh/authorized_keys id_rsa 10.1.249.184:/root/.ssh/
- #复制完以后需要逐个登录,测试manager-node可否使用ssh登录其他全部节点
- 注意:最好先清除全部节点/root/.ssh/下的文件再复制认证文件
2. 安装MHA
#官方源中无rpm,需要自行下载rpm包安装
其他所有节点:
3.2.5 初始化MHA
1.创建配置文件,(此处使用application为集群提供默认配置,路径为/etc/masterha/app1.cnf)
vim /etc/masterha/app1.cnf
[server default]
user=mhaadmin
password=000000
manager_workdir=/data/masterha/app1
manager_log=/data/masterha/app1/manager.log
remote_workdir=/data/masterha/app1
ssh_user=root
repl_user=repluser
repl_password=replpass
ping_interval=1
[server1]
hostname=10.1.249.184
candidate_master=1
#ssh_port=22022
[server2]
hostname=10.1.252.218
candidate_master=1
#ssh_port=22022
[server3]
hostname=10.1.249.70
#ssh_port=22022
#no_master=1
2. 修改全部节点的/etc/hosts文件:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.1.249.83 node5.com node5
10.1.249.184 node1.com node1
10.1.252.218 node2.com node2
10.1.249.70 node4.com node4
3.测试各节点间ssh通信配置:
[root@node5 .ssh]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Sun Nov 27 17:24:20 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Nov 27 17:24:20 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Nov 27 17:24:20 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Nov 27 17:24:20 2016 - [info] Starting SSH connection tests..
Sun Nov 27 17:24:21 2016 - [debug]
Sun Nov 27 17:24:20 2016 - [debug] Connecting via SSH from root@10.1.249.184(10.1.249.184:22) to root@10.1.252.218(10.1.252.218:22)..
Sun Nov 27 17:24:21 2016 - [debug] ok.
Sun Nov 27 17:24:21 2016 - [debug] Connecting via SSH from root@10.1.249.184(10.1.249.184:22) to root@10.1.249.70(10.1.249.70:22)..
Warning: Permanently added '10.1.249.70' (ECDSA) to the list of known hosts.
Sun Nov 27 17:24:21 2016 - [debug] ok.
Sun Nov 27 17:24:22 2016 - [debug]
Sun Nov 27 17:24:21 2016 - [debug] Connecting via SSH from root@10.1.252.218(10.1.252.218:22) to root@10.1.249.184(10.1.249.184:22)..
Sun Nov 27 17:24:21 2016 - [debug] ok.
Sun Nov 27 17:24:21 2016 - [debug] Connecting via SSH from root@10.1.252.218(10.1.252.218:22) to root@10.1.249.70(10.1.249.70:22)..
Sun Nov 27 17:24:22 2016 - [debug] ok.
Sun Nov 27 17:24:22 2016 - [debug]
Sun Nov 27 17:24:21 2016 - [debug] Connecting via SSH from root@10.1.249.70(10.1.249.70:22) to root@10.1.249.184(10.1.249.184:22)..
Sun Nov 27 17:24:22 2016 - [debug] ok.
Sun Nov 27 17:24:22 2016 - [debug] Connecting via SSH from root@10.1.249.70(10.1.249.70:22) to root@10.1.252.218(10.1.252.218:22)..
Sun Nov 27 17:24:22 2016 - [debug] ok.
Sun Nov 27 17:24:22 2016 - [info] All SSH connection tests passed successfully.
4. 检查MHA管理的mysql集群主从复制以及连接配置等是否正常:
[root@node5 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
........
......
......
Sun Nov 27 19:46:05 2016 - [info] Checking replication health on 10.1.252.218..Sun Nov 27 19:46:05 2016 - [info] ok.Sun Nov 27 19:46:05 2016 - [info] Checking replication health on 10.1.249.70..Sun Nov 27 19:46:06 2016 - [info] ok.Sun Nov 27 19:46:06 2016 - [warning] master_ip_failover_script is not defined.Sun Nov 27 19:46:06 2016 - [warning] shutdown_script is not defined.Sun Nov 27 19:46:06 2016 - [info] Got exit code 0 (Not master dead).MySQL Replication Health is OK.
5. 启动MHA,并测试主节点工作状况
[root@node5 .ssh]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /data/masterha/app1/manager.log 2>&1&
[1] 7852
#nohup:后台运行masterha_manager,否则当终端关闭时候也会将
masterha_manager关闭- #后续管道作用: 将启动信息等导入指定的日志文件中
#检查主节点状态:
[root@node5 .ssh]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:7852) is running(0:PING_OK), master:10.1.249.184
#若主节点或集群工作不正常,则会显示"app1 is stopped,......"等类似信息
[root@node5 .ssh]# masterha_stop --conf=/etc/masterha/app1.cnf
[root@node5 .ssh]# Stopped app1 successfully.
3.3 故障转移测试
3.3.1 主节点故障转移
1. 在master节点关闭mariadb服务
[root@node1 ~]# ps aux | grep mysql
mysql 23655 0.0 0.0 113252 456 ? Ss 12:53 0:00 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
mysql 23872 0.0 12.4 1102560 80756 ? Sl 12:53 0:11 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/var/lib/mysql/mysql.sock
root 30281 0.0 0.1 112644 952 pts/0 R+ 20:16 0:00 grep --color=auto mysql
[root@node1 ~]#
[root@node1 ~]#
[root@node1 ~]# killall mysqld mysqld_safe
vim /data/masterha/app1/manager.log
......................
Sun Nov 27 20:19:06 2016 - [info] Master is down!
Sun Nov 27 20:19:06 2016 - [info] Terminating monitoring script.
Sun Nov 27 20:19:06 2016 - [info] Got exit code 20 (Master dead).
Sun Nov 27 20:19:06 2016 - [info] MHA::MasterFailover version 0.56.
Sun Nov 27 20:19:06 2016 - [info] Starting master failover.
Sun Nov 27 20:19:06 2016 - [info]
Sun Nov 27 20:19:06 2016 - [info] * Phase 1: Configuration Check Phase..
Sun Nov 27 20:19:06 2016 - [info]
Sun Nov 27 20:19:06 2016 - [info] GTID failover mode = 0
Sun Nov 27 20:19:06 2016 - [info] Dead Servers:
Sun Nov 27 20:19:06 2016 - [info] 10.1.249.184(10.1.249.184:3306)
Sun Nov 27 20:19:06 2016 - [info] Checking master reachability via MySQL(double check)...
Sun Nov 27 20:19:06 2016 - [info] ok.
Sun Nov 27 20:19:06 2016 - [info] Alive Servers:
Sun Nov 27 20:19:06 2016 - [info] 10.1.252.218(10.1.252.218:3306)
Sun Nov 27 20:19:06 2016 - [info] 10.1.249.70(10.1.249.70:3306)
Sun Nov 27 20:19:06 2016 - [info] The latest binary log file/position on all slaves is master-bin.000003:1353
- Sun Nov 27 20:19:08 2016 - [info] * Phase 3.3: Determining New Master Phase..
Sun Nov 27 20:19:08 2016 - [info]
Sun Nov 27 20:19:08 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Sun Nov 27 20:19:08 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Sun Nov 27 20:19:08 2016 - [info] Searching new master from slaves..
Sun Nov 27 20:19:08 2016 - [info] Candidate masters from the configuration file:
Sun Nov 27 20:19:08 2016 - [info] 10.1.252.218(10.1.252.218:3306) Version=5.5.44-MariaDB-log (oldest major version between slaves) log-bin:enabled
Sun Nov 27 20:19:08 2016 - [info] Replicating from 10.1.249.184(10.1.249.184:3306)
Sun Nov 27 20:19:08 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Nov 27 20:19:08 2016 - [info] Non-candidate masters:
Sun Nov 27 20:19:08 2016 - [info] Searching from candidate_master slaves which have received the latest relay log events..- Sun Nov 27 20:19:09 2016 - [info] Setting read_only=0 on 10.1.252.218(10.1.252.218:3306).. Sun Nov 27 20:19:08 2016 - [info] New master is 10.1.252.218(10.1.252.218:3306)
Sun Nov 27 20:19:08 2016 - [info] Starting master failover..- ..........................
- Applying log files succeeded. Sun Nov 27 20:19:09 2016 - [info] All relay logs were successfully applied. Sun Nov 27 20:19:09 2016 - [info] Getting new master's binlog name and position.. Sun Nov 27 20:19:09 2016 - [info] master-bin.000003:629 Sun Nov 27 20:19:09 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.1.252.218', MASTER_PORT=3306, MASTER_LOG_FILE='master-bin.000003', MASTER_LOG_POS=629, MASTER_USER='repluser', MASTER_PASSWORD='xxx'; Sun Nov 27 20:19:09 2016 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address. Sun Nov 27 20:19:09 2016 - [info] Setting read_only=0 on 10.1.252.218(10.1.252.218:3306).. Sun Nov 27 20:19:09 2016 - [info] ok. Sun Nov 27 20:19:09 2016 - [info] ** Finished master recovery successfully. Sun Nov 27 20:19:09 2016 - [info] * Phase 3: Master Recovery Phase completed. Sun Nov 27 20:19:09 2016 - [info] Sun Nov 27 20:19:09 2016 - [info] * Phase 4: Slaves Recovery Phase.. Sun Nov 27 20:19:09 2016 - [info] Sun Nov 27 20:19:09 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
- .....................
3.4 故障修复后重上线
[root@node5 .ssh]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /data/masterha/app1/manager.log 2>&1&
[1] 10188
[root@node5 .ssh]#
[root@node5 .ssh]#
[root@node5 .ssh]#
[root@node5 .ssh]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:10188) is running(0:PING_OK), master:10.1.252.218
[root@node5 .ssh]#
[root@node5 .ssh]# masterha_check_status --conf=/etc/masterha/app1.cnf app1 (pid:10188) is running(0:PING_OK), master:10.1.252.218 [root@node5 .ssh]# masterha_check_repl --conf=/etc/masterha/app1.cnf Sun Nov 27 21:03:04 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Sun Nov 27 21:03:04 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Sun Nov 27 21:03:04 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Sun Nov 27 21:03:04 2016 - [info] MHA::MasterMonitor version 0.56. Sun Nov 27 21:03:04 2016 - [warning] SQL Thread is stopped(no error) on 10.1.252.218(10.1.252.218:3306) Sun Nov 27 21:03:04 2016 - [info] Multi-master configuration is detected. Current primary(writable) master is 10.1.252.218(10.1.252.218:3306) Sun Nov 27 21:03:04 2016 - [info] Master configurations are as below: Master 10.1.252.218(10.1.252.218:3306), replicating from 10.1.249.184(10.1.249.184:3306) Master 10.1.249.184(10.1.249.184:3306), replicating from 10.1.252.218(10.1.252.218:3306), read-only Sun Nov 27 21:03:04 2016 - [info] GTID failover mode = 0 Sun Nov 27 21:03:04 2016 - [info] Dead Servers: Sun Nov 27 21:03:04 2016 - [info] Alive Servers: Sun Nov 27 21:03:04 2016 - [info] 10.1.249.184(10.1.249.184:3306) Sun Nov 27 21:03:04 2016 - [info] 10.1.252.218(10.1.252.218:3306) Sun Nov 27 21:03:04 2016 - [info] 10.1.249.70(10.1.249.70:3306) Sun Nov 27 21:03:04 2016 - [info] Alive Slaves: Sun Nov 27 21:03:04 2016 - [info] 10.1.249.184(10.1.249.184:3306) Version=5.5.44-MariaDB-log (oldest major version between slaves) log-bin:enabled Sun Nov 27 21:03:04 2016 - [info] Replicating from 10.1.252.218(10.1.252.218:3306) Sun Nov 27 21:03:04 2016 - [info] Primary candidate for the new Master (candidate_master is set) Sun Nov 27 21:03:04 2016 - [info] 10.1.249.70(10.1.249.70:3306) Version=5.5.44-MariaDB-log (oldest major version between slaves) log-bin:enabled Sun Nov 27 21:03:04 2016 - [info] Replicating from 10.1.252.218(10.1.252.218:3306)
.....................
- Sun Nov 27 21:03:08 2016 - [info] Checking replication health on 10.1.249.184.. Sun Nov 27 21:03:08 2016 - [info] ok. Sun Nov 27 21:03:08 2016 - [info] Checking replication health on 10.1.249.70.. Sun Nov 27 21:03:08 2016 - [info] ok. Sun Nov 27 21:03:08 2016 - [warning] master_ip_failover_script is not defined. Sun Nov 27 21:03:08 2016 - [warning] shutdown_script is not defined. Sun Nov 27 21:03:08 2016 - [info] Got exit code 0 (Not master dead). MySQL Replication Health is OK.
原创文章,作者:ldt195175108,如若转载,请注明出处:http://www.178linux.com/61094