keepalived

HA Cluster:

Cluster types: LB (lvs/nginx (http/upstream, stream/upstream)), HA, HP

SPoF: Single Point of Failure

Formula for system availability: A = MTBF/(MTBF+MTTR)
A lies in (0,1), e.g. 95%
"Number of nines" (the metric): 99%, …, 99.999%, 99.9999%;
99%: 1% downtime; 99.9%: 0.1% downtime
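To see how MTTR drives the number of nines, here is a small bash/awk sketch. The function names are illustrative and not part of any tool, and a 365-day year is assumed for the downtime figures.

```shell
#!/usr/bin/env bash
# A = MTBF / (MTBF + MTTR); MTBF and MTTR must use the same time unit.
availability() {
  awk -v mtbf="$1" -v mttr="$2" 'BEGIN { printf "%.6f\n", mtbf / (mtbf + mttr) }'
}

# Minutes of downtime per 365-day year permitted by availability A (0 < A < 1).
downtime_minutes_per_year() {
  awk -v a="$1" 'BEGIN { printf "%.1f\n", (1 - a) * 365 * 24 * 60 }'
}

# Same MTBF, shorter MTTR (e.g. automatic failover instead of manual repair).
echo "MTTR 2h: $(availability 1000 2)"
echo "MTTR 1h: $(availability 1000 1)"

# Allowed yearly downtime for two to five nines.
for a in 0.99 0.999 0.9999 0.99999; do
  echo "$a -> $(downtime_minutes_per_year "$a") min/year"
done
```

Halving MTTR from 2 h to 1 h moves A from about 0.998 to 0.999 without touching MTBF, which is why redundancy with fast failover is aimed at MTTR. The loop prints roughly 5256, 525.6, 52.6 and 5.3 minutes per year.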

System failures:
Hardware failures: design defects, wear-out, natural disasters, ...
Software failures: design defects, ...

Improving system availability by reducing MTTR:

Means: redundancy

active/passive (primary/standby), active/active (dual-primary)
active --> HEARTBEAT --> passive
active <--> HEARTBEAT <--> active

What is made highly available is the "service":
HA nginx service:
vip/nginx process[/shared storage]

Resources: the "components" that make up an HA service;

(1) How many passive nodes?
(2) How are resources switched over?

shared storage:
NAS: file-sharing server (file-level sharing);
SAN: storage area network, block-level sharing;

Network partition
Fencing (isolation) devices:
node level: STONITH = Shooting The Other Node In The Head
resource level: fence

quorum:
with quorum: > total/2
without quorum: <= total/2
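The quorum rule is plain integer arithmetic. A minimal sketch (has_quorum is a hypothetical helper name) that also shows why, after a split, neither half of a two-node cluster has quorum:

```shell
#!/usr/bin/env bash
# A partition has quorum when it holds more than half of all votes:
# votes > total/2, written as votes * 2 > total to stay in integer arithmetic.
has_quorum() {
  local votes=$1 total=$2
  (( votes * 2 > total ))
}

for pair in "3 5" "2 5" "2 4" "1 2"; do
  read -r votes total <<< "$pair"
  if has_quorum "$votes" "$total"; then
    echo "$votes of $total votes: with quorum"
  else
    echo "$votes of $total votes: without quorum"
  fi
done
```

The "1 of 2" line is the two-node problem: each side sees exactly half the votes, so an extra tie-breaker such as a ping node or quorum disk is needed.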

Two-node cluster?
Auxiliary devices: ping node, quorum disk;

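Failover: when the primary node of a resource fails, the resource is moved to another node;
Failback: after the failed primary node has been repaired and brought back online, the resources that were moved to other nodes are moved back to it;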

HA Cluster implementations:
VRRP protocol implementations:
keepalived
AIS: full-featured HA cluster stacks:
RHCS (cman)
heartbeat
corosync

keepalived:

VRRP: Virtual Router Redundancy Protocol
Terms:
Virtual Router
Virtual router identifier: VRID (1-255)
Physical routers:
master: the active device
backup: the standby device
priority: priority of a physical router
VIP: Virtual IP
VMAC: Virtual MAC (00-00-5e-00-01-VRID)
Gratuitous ARP
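The VMAC is derived from the VRID by writing the VRID as the last octet in hex. A sketch of that IPv4 mapping (vmac is a hypothetical function name). Note that keepalived by default keeps using the interface's own MAC and announces the VIP with gratuitous ARP; a VMAC is only used when use_vmac is configured.

```shell
#!/usr/bin/env bash
# IPv4 VRRP virtual MAC: 00-00-5e-00-01-{VRID}, VRID printed as two hex digits.
vmac() {
  local vrid=$1
  if (( vrid < 1 || vrid > 255 )); then
    echo "VRID must be 1-255: $vrid" >&2
    return 1
  fi
  printf '00-00-5e-00-01-%02x\n' "$vrid"
}

vmac 14    # the virtual_router_id used in the examples
vmac 255
```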

Advertisements: carry the heartbeat, priority, etc.; sent periodically;

Preemptive and non-preemptive modes;
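A simplified model of who becomes MASTER and when a recovered node preempts, as a bash sketch with hypothetical function names. It ignores advertisement timers, and real VRRP breaks priority ties by the higher primary IP address, not by list order as done here.

```shell
#!/usr/bin/env bash
# Among "name:priority" arguments, print the name with the highest priority.
# Ties keep the first node listed (a simplification, see above).
elect_master() {
  local best_name="" best_prio=-1 entry name prio
  for entry in "$@"; do
    name=${entry%%:*}
    prio=${entry##*:}
    if (( prio > best_prio )); then
      best_name=$name
      best_prio=$prio
    fi
  done
  echo "$best_name"
}

# Does a BACKUP with priority $1 take over from a MASTER advertising priority $2?
# Only in preemptive mode ($3 = yes) and only with a strictly higher priority.
should_preempt() {
  local local_prio=$1 master_prio=$2 preempt=$3
  [[ $preempt == yes ]] && (( local_prio > master_prio ))
}

echo "elected: $(elect_master node1:100 node2:98)"
# node1 (100) comes back while node2 (98) holds the VIP:
if should_preempt 100 98 yes; then echo "preemptive: node1 takes the VIP back"; fi
if ! should_preempt 100 98 no; then echo "non-preemptive: node2 keeps the VIP"; fi
```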

Security:
Authentication:
none
simple password (plain-text string)
MD5

Working modes:
master/backup: a single virtual router;
master/master: master/backup (virtual router 1) plus backup/master (virtual router 2)

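keepalived:
A software implementation of VRRP, originally designed to make ipvs services highly available:
uses VRRP to float addresses (the VIP) between nodes;
generates ipvs rules (predefined in the configuration file) on the node that holds the VIP;
health-checks each RS of the ipvs cluster;
provides a script-calling interface: it runs scripts that perform the actions defined in them, thereby influencing cluster behavior;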

Components:
Core components:
vrrp stack
ipvs wrapper
checkers
Control component: configuration file parser
I/O multiplexer
Memory management component

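Prerequisites for an HA cluster:
(1) the clocks of all nodes must be synchronized;
ntp, chrony
(2) make sure iptables and SELinux do not get in the way;
(3) nodes can reach each other by hostname (not required for keepalived);
using the /etc/hosts file for this is recommended;
(4) root on each node can reach the other nodes over ssh with key-based authentication (not required);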
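Prerequisite (3) is easy to verify by script. A minimal sketch, assuming the peers are named node1 and node2 (hypothetical names); time synchronization (1) can be checked by hand, for example with `chronyc tracking` when chrony is used.

```shell
#!/usr/bin/env bash
# Succeed if NAME appears as a hostname or alias in FILE (default /etc/hosts).
in_hosts() {
  local name=$1 file=${2:-/etc/hosts}
  # Skip comment lines; fields 2..NF of each entry are hostnames/aliases.
  awk -v n="$name" '
    /^[[:space:]]*#/ { next }
    { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
    END { exit !found }
  ' "$file"
}

# Hypothetical peer names; replace with the real node names.
for peer in node1 node2; do
  if in_hosts "$peer"; then
    echo "$peer: ok"
  else
    echo "$peer: missing from /etc/hosts"
  fi
done
```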

Installing and configuring keepalived:
shipped in the base repository since CentOS 6.4;

Program environment:
Main configuration file: /etc/keepalived/keepalived.conf
Main program: /usr/sbin/keepalived
Unit file: keepalived.service
Environment file for the unit file: /etc/sysconfig/keepalived

Sections of the configuration file:
TOP HIERARCHY
    GLOBAL CONFIGURATION
        Global definitions
        Static routes/addresses
    VRRPD CONFIGURATION
        VRRP synchronization group(s): VRRP sync groups;
        VRRP instance(s): each vrrp instance is one VRRP virtual router;
    LVS CONFIGURATION
        Virtual server group(s)
        Virtual server(s): the VS and RSs of the ipvs cluster;

Single-master configuration example (this node is the BACKUP):
! Configuration File for keepalived

global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from keepalived@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id node1
    vrrp_mcast_group4 224.0.100.19
}

vrrp_instance VI_1 {
    state BACKUP
    interface eno16777736
    virtual_router_id 14
    priority 98
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 571f97b2
    }
    virtual_ipaddress {
        10.1.0.91/16 dev eno16777736
    }
}

Configuration syntax:
Defining a virtual router:
vrrp_instance <STRING> {
    ...
}

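Instance parameters:
state MASTER|BACKUP: initial state of this node in this virtual router; only one node should be MASTER, all others should be BACKUP;
interface IFACE_NAME: physical interface bound to this virtual router;
virtual_router_id VRID: unique identifier of this virtual router, in the range 1-255;
priority 100: priority of this host in this virtual router; range 1-254;
advert_int 1: interval (seconds) between VRRP advertisements;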
authentication {
    auth_type AH|PASS
    auth_pass <PASSWORD>
}
virtual_ipaddress {
    <IPADDR>/<MASK> brd <IPADDR> dev <STRING> scope <SCOPE> label <LABEL>
    192.168.200.17/24 dev eth1
    192.168.200.18/24 dev eth2 label eth2:1
}
track_interface {
    eth0
    eth1
}
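track_interface: network interfaces to monitor; if any of them fails, the instance enters the FAULT state;
nopreempt: non-preemptive mode (the instance's initial state must be BACKUP for this to take effect);
preempt_delay 300: in preemptive mode, the delay (seconds) after the node comes online before it triggers a new election;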

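Defining notification scripts:
notify_master <STRING>|<QUOTED-STRING>: script run when this node becomes MASTER;
notify_backup <STRING>|<QUOTED-STRING>: script run when this node becomes BACKUP;
notify_fault <STRING>|<QUOTED-STRING>: script run when this node enters the FAULT state;

notify <STRING>|<QUOTED-STRING>: generic notification hook; one script can handle all three state transitions above;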
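A sketch of a handler for the generic notify hook. It assumes keepalived appends the type (INSTANCE or GROUP), the instance or group name, the target state and, in newer releases, the priority as the last arguments; check keepalived.conf(5) for the installed version. The function name and the script path in the comment are hypothetical.

```shell
#!/bin/bash
# Hypothetical handler for keepalived's generic "notify" hook.
# Assumed arguments: TYPE (INSTANCE|GROUP), NAME, STATE, [PRIORITY].
handle_transition() {
  local type=$1 name=$2 state=$3 prio=${4:-unknown}
  case $state in
    MASTER|BACKUP|FAULT|STOP)
      # One log line per transition; replace echo with mail, logger, etc.
      echo "$(date '+%F %T') $type $name -> $state (priority $prio)"
      ;;
    *)
      echo "unexpected state: $state" >&2
      return 1
      ;;
  esac
}

# As a standalone script (e.g. notify "/etc/keepalived/notify_all.sh"),
# the last line would be:  handle_transition "$@"
handle_transition INSTANCE VI_1 MASTER 100
```

A single case statement on the state replaces three separate notify_master/notify_backup/notify_fault entries.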

Recap:
HA Cluster:
A=MTBF/(MTBF+MTTR)
99.5%, …, 99.9999%

Implementations:
VRRP: keepalived
AIS: OpenAIS

vrrp:
virtual router:
VRID,VIP,VMAC,priority, …

keepalived:
vrrp/ipvs_wrapper/checkers

keepalived(2)

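Dual-master example (this node; the peer swaps the MASTER/BACKUP states and the priorities of VI_1 and VI_2):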
! Configuration File for keepalived

global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from keepalived@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id node1
    vrrp_mcast_group4 224.0.100.19
}

vrrp_instance VI_1 {
    state MASTER
    interface eno16777736
    virtual_router_id 14
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 571f97b2
    }
    virtual_ipaddress {
        10.1.0.91/16 dev eno16777736
    }
}

vrrp_instance VI_2 {
    state BACKUP
    interface eno16777736
    virtual_router_id 15
    priority 98
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 578f07b2
    }
    virtual_ipaddress {
        10.1.0.92/16 dev eno16777736
    }
}

Using notification scripts:
Example notification script:

#!/bin/bash
# Called by keepalived as: notify.sh master|backup|fault

contact='root@localhost'

# Mail $contact about the VRRP state transition ($1 = new state).
notify() {
    mailsubject="$(hostname) to be $1, vip floating"
    mailbody="$(date +'%F %T'): vrrp transition, $(hostname) changed to be $1"
    echo "$mailbody" | mail -s "$mailsubject" "$contact"
}

case $1 in
master)
    notify master
    ;;
backup)
    notify backup
    ;;
fault)
    notify fault
    ;;
*)
    echo "Usage: $(basename "$0") {master|backup|fault}"
    exit 1
    ;;
esac

How the script is invoked:
notify_master "/etc/keepalived/notify.sh master"
notify_backup "/etc/keepalived/notify.sh backup"
notify_fault "/etc/keepalived/notify.sh fault"

Virtual servers:
Syntax:
virtual_server IP port |
virtual_server fwmark int
{
    ...
    real_server {
        ...
    }
}

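Common parameters:
delay_loop <INT>: interval (seconds) between health-check polls;
lb_algo rr|wrr|lc|wlc|lblc|sh|dh: scheduling algorithm;
lb_kind NAT|DR|TUN: cluster (forwarding) type;
persistence_timeout <INT>: persistent-connection timeout (seconds);
protocol TCP: service protocol; only TCP is used in these notes (newer keepalived releases also accept UDP and SCTP);
sorry_server <IPADDR> <PORT>: fallback server, used when all real servers are down;
real_server <IPADDR> <PORT>
{
    weight <INT>
    notify_up <STRING>|<QUOTED-STRING>
    notify_down <STRING>|<QUOTED-STRING>
    HTTP_GET|SSL_GET|TCP_CHECK|SMTP_CHECK|MISC_CHECK { ... }: health-check method for this host;
}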

HTTP_GET|SSL_GET: application-layer checks

HTTP_GET|SSL_GET {
    url {
        path <URL_PATH>: URL to check;
        status_code <INT>: response status code that counts as healthy;
        digest <STRING>: digest of the response body that counts as healthy;
    }
    nb_get_retry <INT>: number of retries;
    delay_before_retry <INT>: delay before each retry;
    connect_ip <IP ADDRESS>: RS IP address the health-check request is sent to;
    connect_port <PORT>: RS port the health-check request is sent to;
    bindto <IP ADDRESS>: source address for health-check requests;
    bind_port <PORT>: source port for health-check requests;
    connect_timeout <INTEGER>: connection timeout;
}
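The retry parameters can be mimicked by hand when debugging a real server. A bash sketch with hypothetical function names: check_with_retry roughly mirrors nb_get_retry/delay_before_retry (keepalived's exact attempt counting may differ), and http_probe approximates an HTTP_GET check with curl, which must be installed.

```shell
#!/usr/bin/env bash
# Run a check command up to RETRIES times, sleeping DELAY seconds between
# failed attempts; succeed as soon as one attempt succeeds.
check_with_retry() {
  local retries=$1 delay=$2 attempt
  shift 2
  for (( attempt = 1; attempt <= retries; attempt++ )); do
    if "$@"; then
      return 0
    fi
    if (( attempt < retries )); then
      sleep "$delay"
    fi
  done
  return 1
}

# HTTP_GET-style probe: succeed when URL returns the expected status code.
# curl prints "000" when it cannot connect, which never matches.
http_probe() {
  local url=$1 expect=$2 timeout=$3 code
  code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout "$timeout" "$url" || true)
  [[ $code == "$expect" ]]
}
```

For example, `check_with_retry 3 1 http_probe http://10.1.0.69/ 200 1` approximates path /, status_code 200, connect_timeout 1, nb_get_retry 3 and delay_before_retry 1 against one RS.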

TCP_CHECK {
    connect_ip <IP ADDRESS>: RS IP address the health-check request is sent to;
    connect_port <PORT>: RS port the health-check request is sent to;
    bindto <IP ADDRESS>: source address for health-check requests;
    bind_port <PORT>: source port for health-check requests;
    connect_timeout <INTEGER>: connection timeout;
}
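TCP_CHECK only verifies that a TCP connection can be opened. The same test can be run by hand with bash's /dev/tcp redirection and coreutils timeout; tcp_probe is a hypothetical name, and this is a sketch rather than what keepalived executes internally.

```shell
#!/usr/bin/env bash
# Succeed if a TCP connection to IP:PORT opens within TIMEOUT seconds.
# The child bash opens fd 3 on /dev/tcp/IP/PORT and exits, closing it again.
tcp_probe() {
  local ip=$1 port=$2 timeout=${3:-1}
  timeout "$timeout" bash -c "exec 3<>/dev/tcp/$ip/$port" 2>/dev/null
}
```

Usage: `tcp_probe 10.1.0.69 80 1 && echo up || echo down`.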

HA ipvs cluster example:
! Configuration File for keepalived

global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from keepalived@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id node1
    vrrp_mcast_group4 224.0.100.19
}

vrrp_instance VI_1 {
    state MASTER
    interface eno16777736
    virtual_router_id 14
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 571f97b2
    }
    virtual_ipaddress {
        10.1.0.93/16 dev eno16777736
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}

virtual_server 10.1.0.93 80 {
    delay_loop 3
    lb_algo rr
    lb_kind DR
    protocol TCP

    sorry_server 127.0.0.1 80

    real_server 10.1.0.69 80 {
        weight 1
        HTTP_GET {
            url {
                path /
                status_code 200
            }
            connect_timeout 1
            nb_get_retry 3
            delay_before_retry 1
        }
    }
    real_server 10.1.0.71 80 {
        weight 1
        HTTP_GET {
            url {
                path /
                status_code 200
            }
            connect_timeout 1
            nb_get_retry 3
            delay_before_retry 1
        }
    }
}
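With lb_kind DR, each real server must accept traffic for the VIP without answering ARP for it. A dry-run sketch (dr_rs_setup is a hypothetical name) that prints the usual preparation commands for one VIP; it assumes the VIP is bound to the loopback interface, which is the common approach.

```shell
#!/usr/bin/env bash
# Print (do not run) LVS-DR real-server preparation for one VIP:
# arp_ignore=1 answers ARP only for addresses on the receiving interface,
# arp_announce=2 uses the best local source address in ARP requests,
# and the VIP is added to the loopback as a /32 so the RS accepts packets for it.
dr_rs_setup() {
  local vip=$1 iface=${2:-lo}
  echo "sysctl -w net.ipv4.conf.all.arp_ignore=1"
  echo "sysctl -w net.ipv4.conf.all.arp_announce=2"
  echo "sysctl -w net.ipv4.conf.$iface.arp_ignore=1"
  echo "sysctl -w net.ipv4.conf.$iface.arp_announce=2"
  echo "ip addr add $vip/32 dev $iface label $iface:0"
}

# On each real server of the example (10.1.0.69 and 10.1.0.71):
dr_rs_setup 10.1.0.93
```

After reviewing the output, it can be applied as root with `dr_rs_setup 10.1.0.93 | sh`.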

Blog assignment, part 1:
a dual-master LVS cluster: topology and implementation steps;

Configuration example (one node):

! Configuration File for keepalived

global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from kaadmin@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id node1
    vrrp_mcast_group4 224.0.100.67
}

vrrp_instance VI_1 {
    state MASTER
    interface eno16777736
    virtual_router_id 44
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass f1bf7fde
    }
    virtual_ipaddress {
        172.16.0.80/16 dev eno16777736 label eno16777736:0
    }
    track_interface {
        eno16777736
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}

vrrp_instance VI_2 {
    state BACKUP
    interface eno16777736
    virtual_router_id 45
    priority 98
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass f2bf7ade
    }
    virtual_ipaddress {
        172.16.0.90/16 dev eno16777736 label eno16777736:1
    }
    track_interface {
        eno16777736
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}

virtual_server fwmark 3 {
    delay_loop 2
    lb_algo rr
    lb_kind DR
    nat_mask 255.255.0.0
    protocol TCP
    sorry_server 127.0.0.1 80

    real_server 172.16.0.69 80 {
        weight 1
        HTTP_GET {
            url {
                path /
                status_code 200
            }
            connect_timeout 2
            nb_get_retry 3
            delay_before_retry 3
        }
    }
    real_server 172.16.0.6 80 {
        weight 1
        HTTP_GET {
            url {
                path /
                status_code 200
            }
            connect_timeout 2
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}
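A `virtual_server fwmark 3` only matches packets that already carry firewall mark 3, so each director needs mangle rules that set it. A dry-run sketch (fwmark_rules is a hypothetical name), assuming, as the configuration implies, that TCP port 80 on both VIPs should be grouped under mark 3:

```shell
#!/usr/bin/env bash
# Print one iptables mangle rule per VIP that marks TCP traffic to PORT with MARK.
# Usage: fwmark_rules MARK PORT VIP...
fwmark_rules() {
  local mark=$1 port=$2 vip
  shift 2
  for vip in "$@"; do
    echo "iptables -t mangle -A PREROUTING -d $vip -p tcp --dport $port -j MARK --set-mark $mark"
  done
}

fwmark_rules 3 80 172.16.0.80 172.16.0.90
```

Grouping both VIPs under one mark lets a single virtual_server block (and one set of health checks) serve whichever VIPs a director currently holds. Apply on both directors, e.g. `fwmark_rules 3 80 172.16.0.80 172.16.0.90 | sh` as root.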

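keepalived can call external helper scripts to monitor resources and dynamically adjust priority based on the result;
This takes two steps: (1) define a script; (2) reference it from an instance: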
vrrp_script <SCRIPT_NAME> {
    script "<COMMAND>"
    interval INT
    weight -INT
}

track_script {
    SCRIPT_NAME_1
    SCRIPT_NAME_2
}
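How a tracked script's weight changes priority can be modeled in a few lines. This sketch (effective_priority is a hypothetical name) assumes the usual keepalived rule: a negative weight is applied while the script fails, a positive weight while it succeeds. It ignores fall/rise counting, priority clamping and the special handling of weight 0.

```shell
#!/usr/bin/env bash
# Effective priority = base priority adjusted by tracked scripts.
# Each argument after the base is "weight:exit_status" for one vrrp_script:
#   negative weight -> applied when the script fails (exit status != 0)
#   positive weight -> applied when the script succeeds (exit status == 0)
effective_priority() {
  local prio=$1 pair weight rc
  shift
  for pair in "$@"; do
    weight=${pair%%:*}
    rc=${pair##*:}
    if (( weight < 0 && rc != 0 )) || (( weight > 0 && rc == 0 )); then
      prio=$(( prio + weight ))
    fi
  done
  echo "$prio"
}

# chk_down passes, chk_nginx fails (weight -5 each) on a node with priority 100:
echo "node1: $(effective_priority 100 -5:0 -5:1)"   # 95, below a peer at 98
```

This is why weight -5 works with priorities 100 and 98: one failing check drops the master to 95, below a backup at 98, so the VIP moves.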

Example: highly available nginx service

! Configuration File for keepalived

global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from keepalived@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id node1
    vrrp_mcast_group4 224.0.100.19
}

vrrp_script chk_down {
    script "[[ -f /etc/keepalived/down ]] && exit 1 || exit 0"
    interval 1
    weight -5
}

vrrp_script chk_nginx {
    script "killall -0 nginx && exit 0 || exit 1"
    interval 1
    weight -5
    fall 2
    rise 1
}

vrrp_instance VI_1 {
    state MASTER
    interface eno16777736
    virtual_router_id 14
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 571f97b2
    }
    virtual_ipaddress {
        10.1.0.93/16 dev eno16777736
    }
    track_script {
        chk_down
        chk_nginx
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}
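The chk_down script turns the file /etc/keepalived/down into a manual maintenance switch. A small helper for it (maint and KA_DOWN_FLAG are hypothetical names; the variable defaults to the path chk_down tests and can be overridden):

```shell
#!/usr/bin/env bash
# Flag file tested by chk_down; override KA_DOWN_FLAG for testing.
KA_DOWN_FLAG=${KA_DOWN_FLAG:-/etc/keepalived/down}

# maint on     -> create the flag: chk_down fails, priority drops by 5
# maint off    -> remove the flag: chk_down succeeds again
# maint status -> print "on" or "off"
maint() {
  case $1 in
    on) touch "$KA_DOWN_FLAG" ;;
    off) rm -f "$KA_DOWN_FLAG" ;;
    status) if [[ -f $KA_DOWN_FLAG ]]; then echo on; else echo off; fi ;;
    *) echo "Usage: maint {on|off|status}" >&2; return 1 ;;
  esac
}
```

With a peer at priority 98, `maint on` on the master (100 - 5 = 95) hands the VIP over; in the default preemptive mode, `maint off` restores 100 and the node takes the VIP back.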

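Blog assignments:
(1) an active/active (dual-master) HA ipvs cluster;
(2) an active/active (dual-master) HA nginx proxy cluster;

Test: when ipvs uses the sh algorithm or persistent connections, is the same client still directed to the RS it was bound to before a failover?
When nginx uses ip_hash or hash $request_uri, is the same client still directed to the upstream server it was bound to before a failover?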
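One way to run these tests from a client is to request the VIP repeatedly and check whether every answer came from the same backend. A sketch, assuming each backend serves a page that identifies it (for example its hostname); all_same is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Succeed if every non-empty line on stdin is identical,
# i.e. all requests were answered by the same backend.
all_same() {
  awk 'NF { seen[$0] = 1 }
       END { n = 0; for (k in seen) n++; exit (n == 1 ? 0 : 1) }'
}
```

Run it before and after triggering a failover and compare the backend names, e.g. `for i in $(seq 10); do curl -s http://10.1.0.93/; done | all_same && echo "same backend"`.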

Original article by shewei. If you repost it, please credit the source: http://www.178linux.com/76696

