Topology:
Each server uses 4 NICs.
Two 10GbE NICs are bonded and connected to a Netgear switch; the switch ports are access VLAN 30, corresponding to subnet 10.199.16.0/22, with the gateway 10.199.16.1 configured on the Netgear.
Two 1GbE NICs are bonded and connected to a Cisco 3750 switch; the switch ports are trunked for VLANs 30, 40 and 1001-1300, corresponding to subnet 10.199.16.0/22, subnet 10.176.4.0/22 and the KVM guest internal networks, with the gateway 10.176.4.1 configured on the Cisco 3750.
Port-channels are configured on both the Netgear and the Cisco 3750.
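For reference, a minimal sketch of what the Cisco 3750 side of the 1GbE port-channel might look like; the interface names (Gi1/0/1-2) and the channel-group number are assumptions, not taken from the real switch config. Since the server bonds run mode=0 (balance-rr), the switch side would have to be a static EtherChannel (mode on) rather than LACP:
interface Port-channel1
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk allowed vlan 30,40,1001-1300
!
interface range GigabitEthernet1/0/1 - 2
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk allowed vlan 30,40,1001-1300
 channel-group 1 mode on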
Server configuration:
1. iSCSI multipath configuration (from /etc/multipath.conf)
defaults {
udev_dir /dev
polling_interval 10
path_selector "round-robin 0"
# path_grouping_policy multibus
path_grouping_policy failover
getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
prio alua
path_checker readsector0
rr_min_io 100
max_fds 8192
rr_weight priorities
failback immediate
no_path_retry fail
user_friendly_names yes
}
multipaths {
multipath {
wwid 36000d31003157200000000000000000a
alias primary1
}
multipath {
wwid 36000d310031572000000000000000003
alias primary2
}
multipath {
wwid 36000d31003157200000000000000000b
alias primary3
}
multipath {
wwid 36000d31003157200000000000000001b
alias qdisk
}
}
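After editing /etc/multipath.conf, the maps can be checked with the standard device-mapper-multipath tools; this is a generic verification step, not output from the original session:
# reload the configuration and list the resulting multipath maps
service multipathd reload
# confirm that each alias (primary1-primary3, qdisk) maps to the expected WWID and shows active paths
multipath -ll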
2. NIC configuration
[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
USERCTL=no
[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
USERCTL=no
[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
TYPE=Bond
ONBOOT=yes
BOOTPROTO=none
BRIDGE=cloudbr0
[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth4
DEVICE=eth4
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=none
MASTER=bond1
SLAVE=yes
USERCTL=no
[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth5
DEVICE=eth5
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
USERCTL=no
[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond1
DEVICE=bond1
TYPE=Bond
ONBOOT=yes
BOOTPROTO=none
NAME=bond1
BRIDGE=cloudbr1
[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-cloudbr0
DEVICE=cloudbr0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-cloudbr1
DEVICE=cloudbr1
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.199.16.101
NETMASK=255.255.252.0
GATEWAY=10.199.16.1
DNS1=114.114.114.114
[root@hmkvm01 ~]# tail -f -n 5 /etc/modprobe.d/dist.conf
alias char-major-89-* i2c-dev
alias bond0 bonding
options bond0 mode=0 miimon=100
alias bond1 bonding
options bond1 mode=0 miimon=100
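To sanity-check the bond membership and bridge layout, the standard Linux tools below can be used; this is a generic check, not output from these hosts:
# show the bonding mode, active slaves and link state of each bond
cat /proc/net/bonding/bond0
cat /proc/net/bonding/bond1
# show which interfaces are attached to cloudbr0 and cloudbr1
brctl show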
Symptoms:
1. One server shows intermittent stalls; pinging the KVM guests from the office network loses packets, while pinging the gateway loses none.
2. Right after the RHCS cluster is created the first node is fine, but the other machines added afterwards cannot Join Cluster; the luci panel shows red errors, cman and clvmd will not run, and as soon as the cman service is started manually the node falls into an endless reboot loop.
3. Changing the Expected votes value in the luci panel does not take effect. After manually setting it to 1 in the config file, the failed node still fails to Join Cluster and the Expected votes value changes back again. With the network mode set to UDP Multicast, the multicast address (starting with 239) can be pinged from the hmkvm01 node but not from the other nodes, and manually specifying Multicast addresses has no effect.
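A common way to check whether multicast actually works between the nodes is omping (from the omping package); this is a generic suggestion rather than something from the original troubleshooting session. Run the same command on all three nodes at the same time; each node should report multicast replies from the other two:
# run simultaneously on hmkvm01, hmkvm02 and hmkvm04
omping hmkvm01 hmkvm02 hmkvm04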
[root@hmkvm01 ~]# cman_tool status
Version: 6.2.0
Config Version: 28
Cluster Name: hmcloud
Cluster Id: 50417
Cluster Member: Yes
Cluster Generation: 992
Membership state: Cluster-Member
Nodes: 3
Expected votes: 7
Quorum device votes: 3
Total votes: 6
Node votes: 1
Quorum: 4
Active subsystems: 11
Flags:
Ports Bound: 0 11 177 178
Node name: hmkvm01
Node ID: 1
Multicast addresses: 255.255.255.255
Node addresses: 10.199.16.101
4. Starting the cman service hangs at "Waiting for quorum... Timed-out waiting for cluster". After changing the network mode to UDP Broadcast (or adding cman broadcast="yes" to the config file), setting Post Join Delay to 600, manually setting Expected votes to 1 in the config file, and rebooting all servers, all three servers came up in a normal state. The resulting config file is shown below (a validation/propagation sketch follows the listing):
[root@hmkvm01 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="28" name="hmcloud">
  <clusternodes>
    <clusternode name="hmkvm01" nodeid="1">
      <fence>
        <method name="hmkvm01">
          <device name="hmkvm01"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="hmkvm02" nodeid="2">
      <fence>
        <method name="hmkvm02">
          <device name="hmkvm02"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="hmkvm04" nodeid="3">
      <fence>
        <method name="hmkvm04"/>
      </fence>
    </clusternode>
    <clusternode name="pcs1" nodeid="4"/>
  </clusternodes>
  <cman broadcast="yes" expected_votes="7"/>
  <fence_daemon post_join_delay="600"/>
  <fencedevices>
    <fencedevice agent="fence_idrac" ipaddr="10.199.2.224" login="root" name="hmkvm01" passwd="HMIDC#88878978"/>
    <fencedevice agent="fence_idrac" ipaddr="10.199.2.225" login="root" name="hmkvm02" passwd="HMIDC#88878978"/>
    <fencedevice agent="fence_idrac" ipaddr="10.199.2.227" login="root" name="hmkvm04" passwd="HMIDC#88878978"/>
  </fencedevices>
  <quorumd label="qdisk" min_score="1">
    <heuristic interval="10" program="ping -c3 -t2 10.199.16.1" tko="10"/>
  </quorumd>
  <logging debug="on"/>
</cluster>
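As a side note, whenever cluster.conf is edited by hand, the usual RHEL 6 workflow is to bump config_version, validate the file, and push it to the running cluster; this is a generic sketch, not a step taken from the original session:
# validate the edited /etc/cluster/cluster.conf against the schema
ccs_config_validate
# propagate the new config_version to all running cluster nodes
cman_tool version -r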
5. Once the cluster is healthy, running echo c > /proc/sysrq-trigger on one node crashes it, and after that node reboots, the steps from symptom 4 have to be repeated before it can rejoin the cluster normally.
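To check whether the rebooted node has made it back into the membership, the following can be run on a surviving node (a generic check, not output from these hosts):
# list the nodes and their membership state as seen by cman
cman_tool nodes
# show the fence domain members and any node currently being fenced
fence_tool ls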
6. The quorum disk qdisk is visible on every machine; its details are as follows (a creation sketch follows the mkqdisk output):
[root@hmkvm01 ~]# mkqdisk -L
mkqdisk v3.0.12.1
/dev/block/253:5:
/dev/disk/by-id/dm-name-qdisk:
/dev/disk/by-id/dm-uuid-mpath-36000d31003157200000000000000001b:
/dev/dm-5:
/dev/mapper/qdisk:
Magic: eb7a62c2
Label: qdisk
Created: Mon Jun 13 16:23:05 2016
Host: hmkvm01
Kernel Sector Size: 512
Recorded Sector Size: 512
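For reference, a quorum disk with this label would typically have been initialized with something like the command below; the device path /dev/mapper/qdisk is taken from the multipath alias above, but the command itself is an assumption, not the one originally used:
# write a qdisk header with the label "qdisk" onto the shared LUN
mkqdisk -c /dev/mapper/qdisk -l qdisk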
7. The fence devices are working:
[root@hmkvm01 ~]# fence_idrac -a 10.199.2.227 -l root -p ****** -o status
Status: ON
8. Nothing unusual was found in the logs.
9. Restarting the NICs, or interrupting the network for a few seconds, causes the current node to reboot (see the timing sketch below).
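How long such an interruption is tolerated before a node is evicted and rebooted is influenced mainly by the totem token timeout and by the quorumd/heuristic interval and tko values in cluster.conf; the snippet below is purely illustrative, and the values are not a recommendation taken from the original setup:
<!-- illustrative only: allow roughly 30 s of token loss before a node is declared dead -->
<totem token="30000"/>
<!-- qdiskd considers a node dead after roughly interval * tko seconds without disk updates -->
<quorumd label="qdisk" min_score="1" interval="2" tko="15">
  <heuristic interval="10" program="ping -c3 -t2 10.199.16.1" tko="10"/>
</quorumd>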
Questions:
1. Does my NIC bonding configuration need any changes?
2. Is there anything wrong with the multipath configuration?
3. Is there anything misconfigured in my cluster?
4. Should the Multicast addresses be pingable from every node?
5. What is the relationship between the IP addresses in the red box under the network section (of the luci panel)?
6. A tcpdump capture shows no traffic between the nodes and the Multicast addresses; is that normal?
7. Where is the reboot delay seen in symptom 9 configured?
Original article by Mrl_Eric. If reposted, please credit the source: http://www.178linux.com/18356