Preface: The experiment in this post builds a two-node highly available LAMP stack on corosync + pacemaker. It was the most painful setup I have done so far, and the result is still not stable. The main problem was corosync 1.x + pacemaker: no matter how many times I tried, I could not get pacemaker to work as a corosync plugin, so in the end I ran pacemaker as a semi-independent service. With that arrangement, however, crm could no longer be used to configure resources and I had to fall back to pcs; and since pcsd was not running, pcs could configure the cluster but not report its status. All of this was rather frustrating. The end result is a two-node cluster in which one node is always unstable, and I never figured out why.
I. Preparation work with ansible
1. Configure the host names: there are four hosts in total, named as follows.
192.168.98.128  node1.playground.com
192.168.98.129  node2.playground.com
192.168.98.130  node3.playground.com
192.168.98.131  node4.playground.com
For a host name to persist across reboots, it has to be written to /etc/sysconfig/network.
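A minimal sketch for node1 (the other three nodes are handled the same way with their own names); the hostname command only changes the running system, while the file makes it stick on CentOS 6:
# hostname node1.playground.com
# vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node1.playground.com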
2. Install ansible on node1.playground.com
http://docs.ansible.com/ansible/intro_installation.html#getting-ansible
# mkdir /etc/ansible
# cp -r ./ansible/examples/* /etc/ansible
# vim /etc/ansible/hosts
[all_nodes]
192.168.253.134
192.168.253.135
192.168.253.136
[AMP]
192.168.253.134
192.168.253.135
[nfs]
192.168.253.136
3. Copy node1's public key to the other nodes
# ssh-keygen -t rsa
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2.playground.com
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3.playground.com
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node4.playground.com
4. Disable the firewall and SELinux
# ansible all_nodes -m service -a "name=iptables state=stopped enabled=no"
# ansible all_nodes -m copy -a "src=/etc/selinux/config dest=/etc/selinux/config"
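For that copy to actually turn SELinux off, the /etc/selinux/config on node1 is assumed to already contain the disabled setting; a quick check (a reboot, or setenforce 0 on each node for the running system, is still needed for it to take effect):
# grep ^SELINUX= /etc/selinux/config
SELINUX=disabled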
5. Synchronize time
# ansible all_nodes -m yum -a "name=ntp state=present"
# ansible all_nodes -m service -a "name=ntpd state=started"
# ansible all_nodes -a "date"
node3.playground.com | success | rc=0 >>
Fri Jan 22 13:22:12 SGT 2016
node2.playground.com | success | rc=0 >>
Fri Jan 22 13:22:12 SGT 2016
node4.playground.com | success | rc=0 >>
Fri Jan 22 13:22:12 SGT 2016
6. Synchronize the hosts file
# ansible all_nodes -m copy -a "src=/etc/hosts dest=/etc/hosts"
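For reference, the /etc/hosts being pushed is assumed to map every node, following the name and address list from step 1 (adjust the addresses if your nodes sit on a different network, as the 192.168.253.x inventory above does):
192.168.98.128  node1.playground.com node1
192.168.98.129  node2.playground.com node2
192.168.98.130  node3.playground.com node3
192.168.98.131  node4.playground.com node4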
7. Install httpd from the yum repository
# ansible AMP -m yum -a "name=httpd state=present"
8. Install php and the php-mysql module from yum, and push the php.ini configuration file
# ansible AMP -m yum -a "name=php state=present"
# ansible AMP -a "ls /usr/lib64/httpd/modules/libphp5.so"
node2.playground.com | success | rc=0 >>
/usr/lib64/httpd/modules/libphp5.so
node3.playground.com | success | rc=0 >>
/usr/lib64/httpd/modules/libphp5.so
# ansible AMP -m yum -a "name=php-mysql state=present"
Add the following two lines to the default php.ini, then push it to both nodes:
extension = /usr/lib64/php/modules/mysql.so
extension = /usr/lib64/php/modules/mysqli.so
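The push itself is not spelled out in the original; assuming the edited php.ini sits in the working directory on node1, something like this would do it:
# ansible AMP -m copy -a "src=./php.ini dest=/etc/php.ini"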
9. Install php-xcache with yum
# ansible AMP -m yum -a "name=php-xcache state=present"
10. Provide a PHP test page, start the service, and test it; a browser on the physical host works fine for this.
# vim hello.php
<html>
 <head>
  <title>PHP Test</title>
 </head>
 <body>
  <?php echo '<p>Hello World</p>'; ?>
 </body>
</html>
# ansible AMP -m copy -a "src=./hello.php dest=/var/www/html/"
# ansible AMP -m service -a "name=httpd enabled=yes state=started"
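Besides a browser, a quick check from node1 against both AMP nodes (the 192.168.253.x addresses from the inventory) works too; each request should return the Hello World page:
# curl http://192.168.253.134/hello.php
# curl http://192.168.253.135/hello.php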
11. Deploy the NFS server and export the shared directories for mysql and httpd
# vim exports
/mysqldata   192.168.253.134(rw,no_root_squash) 192.168.253.135(rw,no_root_squash)
/httpddata   192.168.253.134(rw) 192.168.253.135(rw)
# ansible nfs -m copy -a "src=./exports dest=/etc/exports"
# ansible nfs -a 'groupadd -g 133 mysql'
# ansible nfs -a 'useradd -u 133 -g 133 mysql'
# ansible nfs -m service -a "name=nfs state=started enabled=yes"
# ansible nfs -a "exportfs -v"
node4.playground.com | success | rc=0 >>
/mysqldata      192.168.98.129(rw,wdelay,no_root_squash,no_subtree_check)
/mysqldata      192.168.98.130(rw,wdelay,root_squash,no_subtree_check)
/httpddata      192.168.98.129(rw,wdelay,root_squash,no_subtree_check)
/httpddata      192.168.98.130(rw,wdelay,root_squash,no_subtree_check)
# ansible all_nodes -m group -a "gid=48 name=apache state=present"
# ansible all_nodes -m user -a "uid=48 name=apache state=present"
# ansible nfs -m acl -a "entity=apache etype=user name=/httpddata permissions=rwx state=present"
12. Deploy the mysql (MariaDB) service on node2 and node3 from the binary tarball
Deploy the binaries:
# ansible AMP -m copy -a "src=./mariadb-10.0.23-linux-x86_64.tar.gz dest=/root/"
# ansible AMP -a "tar -xf /root/mariadb-10.0.23-linux-x86_64.tar.gz -C /usr/local/"
# ansible AMP -a "ln -s /usr/local/mariadb-10.0.23-linux-x86_64 /usr/local/mysql"
Create the mysql user:
# ansible all_nodes -m group -a "gid=133 name=mysql state=present"
# ansible all_nodes -m user -a "uid=133 name=mysql state=present"
# ansible AMP -a "chown -R mysql.mysql /usr/local/mariadb-10.0.23-linux-x86_64"
Give the mysql user read/write permission on the NFS export:
# ansible nfs -m acl -a "entity=mysql etype=user name=/mysqldata permissions=rw state=present"
Push the configuration file; after adding datadir=/data/mysqldata it looks like this:
[mysqld]
datadir=/data/mysqldata
socket=/tmp/mysql.sock
user=mysql
log-bin=/data/mysqldata/mysql_bin
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
# ansible AMP -m copy -a "src=./my.cnf dest=/etc/my.cnf"
Push the service script:
# ansible AMP -m copy -a "src=mysqld dest=/etc/rc.d/init.d/mysqld mode=755"
Add the script to the service list:
# ansible AMP -m service -a "name=mysqld enabled=yes state=stopped"
Note: remember to add /usr/local/mysql/bin to PATH on both nodes, otherwise pacemaker will fail to start mysql later (a sketch follows below).
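The PATH export itself is not shown in the original; a minimal sketch pushed from node1, where the mysql_path.sh file name is my own choice:
# echo 'export PATH=/usr/local/mysql/bin:$PATH' > ./mysql_path.sh
# ansible AMP -m copy -a "src=./mysql_path.sh dest=/etc/profile.d/mysql_path.sh mode=644"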
13. Create the shared data directory
# ansible AMP -a "mkdir -pv /data/mysqldata"
14. Connect to one of the AMP hosts and initialize the database (node2 in this example), drop the anonymous users, and so on
# mount -t nfs node4.playground.com:/mysqldata /data/mysqldata
# ./scripts/mysql_install_db
No options are needed; the script picks them up from the configuration file automatically.
Mount the export on node3 as well and connect to the mysql server there; it works, which shows that both mysql instances can share the NFS backend.
MariaDB [(none)]> CREATE DATABASE discuzData;
MariaDB [(none)]> GRANT ALL ON discuzData.* TO 'discuzUser'@'192.168.253.%' IDENTIFIED BY 'discuz';
MariaDB [(none)]> FLUSH PRIVILEGES;
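The anonymous-user cleanup mentioned in the step title is not shown in the original; one common way to do it, assuming the stock accounts created by mysql_install_db:
MariaDB [(none)]> DELETE FROM mysql.user WHERE User='';
MariaDB [(none)]> FLUSH PRIVILEGES;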
II. Install and configure the Discuz forum site; this is done on node4 (the NFS node)
1. Download and deploy the Discuz forum
# wget http://download.comsenz.com/DiscuzX/3.2/Discuz_X3.2_SC_UTF8.zip
# unzip Discuz_X3.2_SC_UTF8.zip
# cp -r upload/* /httpddata/
# chown -R apache.apache /httpddata
Make sure here that the apache user's UID and GID match those on the httpd nodes:
# ansible all_nodes -a "id apache"
192.168.253.136 | success | rc=0 >>
uid=48(apache) gid=48(apache) groups=48(apache)
192.168.253.134 | success | rc=0 >>
uid=48(apache) gid=48(apache) groups=48(apache)
192.168.253.135 | success | rc=0 >>
uid=48(apache) gid=48(apache) groups=48(apache)
2. Create the discuzData database in MySQL and grant a user access to it
MariaDB [(none)]> CREATE DATABASE discuzData;
MariaDB [(none)]> GRANT ALL ON discuzData.* TO 'discuzUser'@'192.168.98.%' IDENTIFIED BY 'discuzpass';
MariaDB [(none)]> FLUSH PRIVILEGES;
3. Install the Discuz forum.
1) The mysql service will not necessarily run on the same node as httpd, so mysql gets a VIP of its own, 192.168.98.111. Use this VIP as the database address when installing Discuz (one way to bring it up temporarily is sketched after this list).
2) For the installation, mounting the NFS export on any one AMP node is enough; mysqld and httpd (php) do not have to be on the same node.
3) When high availability is configured later, two VIP resources are therefore needed: one for mysql and one for httpd.
The installation steps themselves are not covered in detail here.
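Since no cluster is running yet when Discuz is installed, the mysql VIP has to exist somewhere for the installer to reach the database. This step is not in my original notes; one hedged way is to add the address by hand on whichever node runs mysqld during the install:
# ip addr add 192.168.98.111/24 dev eth0
and remove it again once the install is done, before pacemaker takes the address over:
# ip addr del 192.168.98.111/24 dev eth0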
III. Configure the corosync + pacemaker + crmsh high-availability components
1. Install and configure the corosync messaging layer
1) Install corosync and pacemaker from the yum repository
# ansible AMP -m yum -a "name=corosync state=present"
# ansible AMP -m yum -a "name=pacemaker state=present"
2) The two nodes of the AMP group exchange SSH keys with each other
On node2:
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.253.135
On node3:
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.253.134
3) Prepare the configuration file on node2 and copy it to node3
# cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
# vim /etc/corosync/corosync.conf
Change the following settings:
bindnetaddr: 192.168.253.0      # the AMP nodes sit on the 192.168.253.0 network
mcastaddr: 239.255.12.1         # pick a dedicated multicast address
logfile: /var/log/corosync.log  # an existing path for the log file
Append the following sections:
service {
    ver: 1
    name: pacemaker
    # use_mgmtd: yes
}
aisexec {
    user: root
    group: root
}
A note on the ver directive:
0: pacemaker runs as a corosync plugin.
1: pacemaker runs as its own service and only uses corosync for membership and messaging; start corosync first, then start pacemaker.
I started with ver: 0, but the cib process kept misbehaving; once I let the two run as separate services the problem went away.
# scp /etc/corosync/corosync.conf root@node3.playground.com:/etc/corosync/corosync.conf
4) Generate the authentication key used for inter-node communication (on node2)
# corosync-keygen
# scp /etc/corosync/authkey node3.playground.com:/etc/corosync/authkey
5) From the control node node1, use ansible to install crmsh and pssh on the AMP nodes
# ansible AMP -m copy -a 'src=./crmsh-1.2.6-4.el6.x86_64.rpm dest=/tmp/crmsh-1.2.6-4.el6.x86_64.rpm'
# ansible AMP -m copy -a 'src=./pssh-2.3.1-2.el6.x86_64.rpm dest=/tmp/pssh-2.3.1-2.el6.x86_64.rpm'
# ansible AMP -m yum -a 'name=/tmp/crmsh-1.2.6-4.el6.x86_64.rpm state=present'
# ansible AMP -m yum -a 'name=/tmp/pssh-2.3.1-2.el6.x86_64.rpm state=present'
6) Start the corosync and pacemaker services on the AMP nodes from the control node node1
# ansible AMP -m service -a "name=corosync state=started"
# ansible AMP -m service -a "name=pacemaker state=started"
Check that the corosync engine started correctly:
# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Jan 20 18:18:28 corosync [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Jan 20 18:18:28 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Check that the initial membership notifications went out correctly:
# grep TOTEM /var/log/cluster/corosync.log
Jan 20 18:18:28 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Jan 20 18:18:28 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jan 20 18:18:29 corosync [TOTEM ] The network interface [192.168.253.134] is now up.
Jan 20 18:18:29 corosync [TOTEM ] Process pause detected for 627 ms, flushing membership messages.
Jan 20 18:18:29 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan 20 18:18:30 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Check whether any errors were produced during startup. The errors below only say that pacemaker will soon no longer run as a corosync plugin and that cman is recommended as the cluster infrastructure instead; they can safely be ignored here.
# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
Jan 20 18:18:29 corosync [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Jan 20 18:18:29 corosync [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN
Check that pacemaker started correctly:
# grep pcmk_startup /var/log/cluster/corosync.log
Jan 22 10:23:29 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
Jan 22 10:23:29 corosync [pcmk ] Logging: Initialized pcmk_startup
Jan 22 10:23:29 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Jan 22 10:23:29 corosync [pcmk ] info: pcmk_startup: Service: 9
Jan 22 10:23:29 corosync [pcmk ] info: pcmk_startup: Local hostname: node2.playground.com
Check the cluster status with crmsh:
# crm status
Last updated: Fri Jan 22 10:40:29 2016
Last change: Fri Jan 22 10:15:51 2016
Stack: classic openais (with plugin)
Current DC: node2.playground.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ node2.playground.com node3.playground.com ]
2. Configure the resources with pcs. This is mainly because crm does not cooperate when pacemaker runs as its own service here.
1) Since this is a two-node cluster, disable stonith
# pcs property set stonith-enabled=false
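For a two-node cluster it is usually also necessary to relax the quorum policy, otherwise the surviving node will stop all resources as soon as its peer goes offline. This command is not in my original notes, just a hedged addition:
# pcs property set no-quorum-policy=ignore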
2) Create the resources for the web service
# pcs resource create webip ocf:heartbeat:IPaddr2 ip=192.168.98.133 cidr_netmask=24 nic=eth0 op monitor interval=60s
# pcs resource create webData ocf:heartbeat:Filesystem device=192.168.98.131:/httpddata directory=/var/www/html fstype=nfs
# pcs resource create webservice lsb:httpd
3) Create the web service group
# pcs resource group add webserviceGroup webip webData webservice
# pcs resource group add webserviceGroup webservice --after webData
# pcs resource group add webserviceGroup webData --after webip
4) Create the resources for the mysql service
# pcs resource create mysqlip ocf:heartbeat:IPaddr2 ip=192.168.98.111 cidr_netmask=24 nic=eth0 op monitor interval=60s
# pcs resource create mysqlData ocf:heartbeat:Filesystem device=192.168.98.131:/mysqldata directory=/data/mysqldata fstype=nfs
# pcs resource create mysqlservice lsb:mysqld
5) Create the mysql service group
# pcs resource group add mysqlserviceGroup mysqlip mysqlData mysqlservice
# pcs resource group add mysqlserviceGroup mysqlservice --after mysqlData
# pcs resource group add mysqlserviceGroup mysqlData --after mysqlip
# pcs resource cleanup
# pcs resource
 Resource Group: mysqlserviceGroup
     mysqlip        (ocf::heartbeat:IPaddr2):       Started node2.playground.com
     mysqlData      (ocf::heartbeat:Filesystem):    Started node2.playground.com
     mysqlservice   (lsb:mysqld):                   Started node2.playground.com
 Resource Group: webserviceGroup
     webip          (ocf::heartbeat:IPaddr2):       Started node2.playground.com
     webData        (ocf::heartbeat:Filesystem):    Started node2.playground.com
     webservice     (lsb:httpd):                    Started node2.playground.com
As you can see, all of the resources are now running on node2.
6) Fail over between nodes. crmsh is used here, because doing this with pcs while pcsd is not running is always a pain.
crm(live)node# standby node2.playground.com
crm(live)node# standby node3.playground.com
crm(live)node# online node2.playground.com
# pcs resource
 Resource Group: mysqlserviceGroup
     mysqlip        (ocf::heartbeat:IPaddr2):       Started node2.playground.com
     mysqlData      (ocf::heartbeat:Filesystem):    Started node2.playground.com
     mysqlservice   (lsb:mysqld):                   Started node2.playground.com
 Resource Group: webserviceGroup
     webip          (ocf::heartbeat:IPaddr2):       Started node2.playground.com
     webData        (ocf::heartbeat:Filesystem):    Started node2.playground.com
     webservice     (lsb:httpd):                    Started node2.playground.com
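To double-check that a failover really keeps the services reachable, the two VIPs can be probed from outside the cluster while one node is on standby; a rough sketch reusing the test page and the discuzUser account created earlier:
# curl http://192.168.98.133/hello.php
# mysql -h 192.168.98.111 -udiscuzUser -p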
At this point the setup is basically complete, but my node3 still acts up regularly: the webservice resource often fails to start there, even though starting httpd by hand works fine. I have not figured out why.
All in all, this was one of the hardest experiments I have run into; the configuration differs from the video tutorials in all sorts of ways. Painful as it was, at least a half-finished result came out of it. That will have to do for now.
Original article by 以马内利; if you repost it, please credit the source: http://www.178linux.com/11361