I. Preparation
Heartbeat 3.0.6:
|
# wget -O heartbeat-3.0.6.tar.bz2 http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/958e11be8686.tar.bz2
|
Cluster Glue 1.0.12:
|
# wget -O cluster-glue-1.0.12.tar.bz2 http://hg.linux-ha.org/glue/archive/0a7add1d9996.tar.bz2
|
Resource Agents 3.9.6:
|
# wget -O resource-agents-3.9.6.tar.gz https://github.com/ClusterLabs/resource-agents/archive/v3.9.6.tar.gz
|
|
# yum install gcc gcc-c++ autoconf automake libtool glib2-devel libxml2-devel bzip2 bzip2-devel e2fsprogs-devel libxslt-devel libtool-ltdl-devel asciidoc
# groupadd haclient
# useradd -g haclient hacluster
# yum install httpd
|
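Before starting the builds it can help to confirm that the tools are actually on PATH. The following is only a sketch: the helper name `missing_tools` and the tool list are my own choice for illustration (a2x is the command shipped by the asciidoc package), not part of the original steps.

```shell
#!/bin/sh
# Print every command from the argument list that is not on PATH.
# (missing_tools is a hypothetical helper name used for this sketch.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools the builds in this guide call directly; adjust the list as needed.
missing=$(missing_tools gcc make autoconf automake libtool a2x bzip2)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All build tools found"
fi
```

Anything reported as missing can be installed with the yum command above before running ./autogen.sh.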
II. Compiling Cluster Glue
|
# tar -jxvf cluster-glue-1.0.12.tar.bz2
# cd Reusable-Cluster-Components-glue--0a7add1d9996/
# ./autogen.sh
# ./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1' ## Note: on a 32-bit system use /lib/libuuid.so.1
# make
# make install
|
Build error 1:
|
Making all in libltdl
gmake[1]: Entering directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/libltdl'
gmake[1]: *** No rule to make target `all'.  Stop.
gmake[1]: Leaving directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/libltdl'
make: *** [all-recursive] Error 1
|
Fix:
|
# yum install libtool-ltdl-devel
|
Build error 2:
|
collect2: error: ld returned 1 exit status
gmake[2]: *** [ipctest] Error 1
gmake[2]: Leaving directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/lib/clplumbing'
gmake[1]: *** [all-recursive] Error 1
gmake[1]: Leaving directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/lib'
make: *** [all-recursive] Error 1
|
Fix:
|
# ./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1'
|
Note: on a 32-bit system, change LIBS to LIBS='/lib/libuuid.so.1'.
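The 32-bit versus 64-bit choice can also be made by a script instead of by hand. This is a small sketch under the assumption that the library lives at exactly the two paths named in this guide; `libuuid_path` is a hypothetical helper name.

```shell
#!/bin/sh
# Map a word size (32 or 64) to the libuuid path used in this guide.
# (libuuid_path is a hypothetical helper; the two paths come from the text.)
libuuid_path() {
    case "$1" in
        64) echo /lib64/libuuid.so.1 ;;
        32) echo /lib/libuuid.so.1 ;;
        *)  echo "unsupported word size: $1" >&2; return 1 ;;
    esac
}

# getconf LONG_BIT reports the word size of the running userland.
bits=$(getconf LONG_BIT)
echo "LIBS='$(libuuid_path "$bits")'"
```

The printed value can be pasted onto the ./configure command line in place of the hard-coded LIBS setting.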
Build error 3:
|
gmake[2]: a2x: Command not found
gmake[2]: *** [hb_report.8] Error 127
gmake[2]: Leaving directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/doc'
gmake[1]: *** [all-recursive] Error 1
gmake[1]: Leaving directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/doc'
make: *** [all-recursive] Error 1
|
Fix:
|
# yum install asciidoc
|
III. Compiling Resource Agents
|
# tar -zxvf resource-agents-3.9.6.tar.gz
# cd resource-agents-3.9.6
# ./autogen.sh
# ./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1'
# make
# make install
|
IV. Compiling Heartbeat
|
# tar -jxvf heartbeat-3.0.6.tar.bz2
# cd Heartbeat-3-0-958e11be8686/
# ./bootstrap
# export CFLAGS="$CFLAGS -I/usr/local/heartbeat/include -L/usr/local/heartbeat/lib"
# ./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1'
# make
# make install
|
|
# cp doc/{ha.cf,haresources,authkeys} /usr/local/heartbeat/etc/ha.d/
# chkconfig --add heartbeat
# chkconfig heartbeat on
# chmod 600 /usr/local/heartbeat/etc/ha.d/authkeys
# mkdir -pv /usr/local/heartbeat/usr/lib/ocf/lib/heartbeat/
# cp /usr/lib/ocf/lib/heartbeat/ocf-* /usr/local/heartbeat/usr/lib/ocf/lib/heartbeat/
# ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/RAExec/* /usr/local/heartbeat/lib/heartbeat/plugins/RAExec/
# ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/* /usr/local/heartbeat/lib/heartbeat/plugins/
|
Build error 1:
|
checking heartbeat/glue_config.h usability... no
checking heartbeat/glue_config.h presence... no
checking for heartbeat/glue_config.h... no
configure: error: in `/root/Heartbeat-3-0-958e11be8686':
configure: error: Core development headers were not found
See `config.log' for more details
|
Fix:
|
# export CFLAGS="$CFLAGS -I/usr/local/heartbeat/include -L/usr/local/heartbeat/lib"
|
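The configure error above is about heartbeat/glue_config.h, which Cluster Glue installs under the chosen prefix. A quick way to check that the header is where the CFLAGS setting expects it, sketched here with a hypothetical helper name `has_glue_headers`:

```shell
#!/bin/sh
# Succeed when Cluster Glue's glue_config.h exists under the given prefix.
# (has_glue_headers is a hypothetical helper name used for this sketch.)
has_glue_headers() {  # usage: has_glue_headers PREFIX
    [ -f "$1/include/heartbeat/glue_config.h" ]
}

prefix=/usr/local/heartbeat
if has_glue_headers "$prefix"; then
    echo "glue headers found; add -I$prefix/include to CFLAGS"
else
    echo "glue_config.h not found under $prefix; build and install Cluster Glue first"
fi
```

If the header is missing, exporting CFLAGS will not help; go back to section II and finish make install there.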
Build error 2:
|
In file included from ../include/lha_internal.h:41:0,
from uuid_parse.c:25:
/usr/local/heartbeat/include/heartbeat/glue_config.h:105:0: error: "HA_HBCONF_DIR" redefined [-Werror]
#define HA_HBCONF_DIR "/usr/local/heartbeat/etc/ha.d/"
^
In file included from ../include/lha_internal.h:38:0,
from uuid_parse.c:25:
../include/config.h:390:0: note: this is the location of the previous definition
#define HA_HBCONF_DIR "/usr/local/heartbeat/etc/ha.d"
^
uuid_parse.c:36:26: fatal error: replace_uuid.h: No such file or directory
#include <replace_uuid.h>
^
cc1: all warnings being treated as errors
compilation terminated.
gmake[1]: *** [uuid_parse.lo] Error 1
gmake[1]: Leaving directory `/root/Heartbeat-3-0-958e11be8686/replace'
make: *** [all-recursive] Error 1
|
Fix:
|
# ./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1'
|
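V. Configuring Heartbeat
Heartbeat is configured mainly through three files: ha.cf, haresources, and authkeys. ha.cf is the main configuration file, haresources lists the services that Heartbeat manages, and authkeys sets how the nodes authenticate each other's heartbeats.
1. Configuring ha.cf: the main configuration file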
|
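# cat /usr/local/heartbeat/etc/ha.d/ha.cf | grep '^[^#]'
debugfile /var/log/ha-debug ## file for heartbeat debug messages
logfile /var/log/ha-log ## file for heartbeat log messages
logfacility local0 ## syslog facility used for heartbeat logging
keepalive 2 ## interval between heartbeats: 2 seconds
deadtime 30 ## if the standby node receives no heartbeat from the primary for 30 seconds, it takes over the primary's resources
warntime 10 ## if no heartbeat arrives for 10 seconds, write a warning to the log without failing over
initdead 120 ## grace period after a boot or reboot; should be at least twice deadtime
udpport 694 ## UDP port used for broadcast/unicast communication
bcast eno16777736 # Linux ## send heartbeats by broadcast on interface eno16777736
#mcast eth0 225.0.0.1 694 1 0 ## heartbeat over UDP multicast on eth0, typically used with more than one standby node; bcast, ucast and mcast (broadcast, unicast, multicast) are alternative heartbeat transports, pick one
#ucast eno16777736 192.168.10.133 ## heartbeat over UDP unicast on eno16777736; the address is the peer node's IP
auto_failback on ## whether resources move back automatically once the primary recovers
#watchdog /dev/watchdog ## optional: lets Heartbeat monitor system health
node node1 ## primary node name, must match the output of uname -n
node node2 ## standby node name
ping 192.168.10.1 ## ping the gateway to test network connectivity only
respawn hacluster /usr/local/heartbeat/libexec/heartbeat/ipfail ## optional: process started and stopped together with heartbeat
#apiauth ipfail gid=haclient uid=hacluster ## user and group allowed to run ipfail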
|
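Notes:
① watchdog /dev/watchdog: optional; lets Heartbeat monitor the system's health. It requires the "softdog" kernel module, which provides the actual device file. If the kernel lacks the module, enable it and rebuild the kernel. Then run "insmod softdog" to load the module, check "grep misc /proc/devices" and "cat /proc/misc | grep watchdog" for the major and minor numbers, and finally create the device file with "mknod /dev/watchdog c 10 130".
② respawn hacluster /usr/lib/heartbeat/ipfail (with the prefix used in this guide, /usr/local/heartbeat/libexec/heartbeat/ipfail): optional; names a process that is started and stopped together with heartbeat. Such processes are usually plugins integrated with heartbeat and are restarted automatically if they fail. ipfail detects and handles network failures and relies on the ping directive, which names the ping node used to test connectivity; hacluster is the user that runs ipfail.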
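The timing directives depend on each other: warntime should stay below deadtime, and initdead should be at least twice deadtime. A rough way to check a ha.cf for this is sketched below. It assumes the values are plain integers in seconds, as in the listing above, and the helper names `hacf_value` and `check_timing` are made up for this example.

```shell
#!/bin/sh
# Print the value of a single-valued ha.cf directive; commented lines are
# skipped because their first field starts with "#".
hacf_value() {  # usage: hacf_value FILE DIRECTIVE
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Check the timing relationships described in the ha.cf comments.
# Assumption: values are whole seconds (no "ms" suffix).
check_timing() {  # usage: check_timing FILE
    dead=$(hacf_value "$1" deadtime)
    warn=$(hacf_value "$1" warntime)
    init=$(hacf_value "$1" initdead)
    [ -n "$dead" ] && [ -n "$warn" ] && [ -n "$init" ] || { echo "missing directive"; return 1; }
    [ "$warn" -lt "$dead" ] || { echo "warntime should be below deadtime"; return 1; }
    [ "$init" -ge $((dead * 2)) ] || { echo "initdead should be at least 2 x deadtime"; return 1; }
    echo "timing ok"
}

# Demo on a temporary file holding the three values from this guide.
demo=$(mktemp)
printf 'deadtime 30\nwarntime 10\ninitdead 120\n' > "$demo"
check_timing "$demo"
rm -f "$demo"
```

On a real node the call would be check_timing /usr/local/heartbeat/etc/ha.d/ha.cf.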
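2. Configuring haresources: the resource file
The haresources file names the primary node, the cluster IP address, netmask, broadcast address, and the services started as cluster resources. Each line may contain one or more resource script names; resources are separated by spaces, and a script's arguments are separated by two colons (::). The haresources file must be identical on the primary and standby nodes.
General format:
node-name network <resource-group>
node-name is the host name of the primary node and must match a node name given in ha.cf. network sets the cluster IP address, netmask, and network device. resource-group lists the services to be managed by Heartbeat, that is, the services Heartbeat starts and stops.
Note: the IP address given here is the address on which the cluster serves clients.
For Heartbeat to manage a service, the service must be a script that accepts start/stop and lives in /etc/init.d/ or /etc/ha.d/resource.d/; Heartbeat looks the script up by name in those directories and runs it to start or stop the service. With the --prefix used in this guide, the resource.d directory is /usr/local/heartbeat/etc/ha.d/resource.d/.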
|
# cat /usr/local/heartbeat/etc/ha.d/haresources |grep -v "#"
node1 IPaddr::192.168.10.222/24/eno16777736
|
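node1 is the primary node of the HA cluster, and IPaddr is a script shipped with heartbeat. Heartbeat first runs /usr/local/heartbeat/etc/ha.d/resource.d/IPaddr 192.168.10.222/24/eno16777736 start, which brings up the virtual address 192.168.10.222 with netmask 255.255.255.0 on interface eno16777736. This is the address on which heartbeat serves clients.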
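To make the entry format concrete, here is a sketch that splits the IPaddr resource from the example into its parts using plain shell parameter expansion. It assumes all three fields (address, prefix length, interface) are present; `parse_ipaddr` is a hypothetical name, not a heartbeat tool.

```shell
#!/bin/sh
# Split one haresources resource of the form SCRIPT::ADDRESS/PREFIX/INTERFACE.
# (parse_ipaddr is a hypothetical helper; it assumes all three fields exist.)
parse_ipaddr() {
    entry=$1
    script=${entry%%::*}      # text before the first "::" is the script name
    args=${entry#*::}         # text after it is the script's argument
    ip=${args%%/*}            # address part
    rest=${args#*/}
    prefix=${rest%%/*}        # prefix length (netmask bits)
    iface=${rest#*/}          # network interface
    printf 'script=%s ip=%s prefix=%s iface=%s\n' "$script" "$ip" "$prefix" "$iface"
}

parse_ipaddr 'IPaddr::192.168.10.222/24/eno16777736'
# prints: script=IPaddr ip=192.168.10.222 prefix=24 iface=eno16777736
```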
Note: a detailed explanation of haresources (in Chinese) is available here:
http://blog.chinaunix.net/uid-20788470-id-1841644.html
3. Configuring authkeys: the heartbeat authentication key file
|
# grep -v "#" /usr/local/heartbeat/etc/ha.d/authkeys
auth 2
2 sha1 HI!
|
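Note: the number after auth is a key index and can be any number, but the second line must begin with the same index, followed by the authentication method (crc, md5, or sha1) and finally a key of your choosing.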
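A short fixed string such as HI! is easy to guess, so outside a lab a random key is the safer choice. The sketch below writes an authkeys file with one sha1 key and owner-only permissions; `write_authkeys` is a made-up helper, and the demo writes to a temporary file rather than the real location.

```shell
#!/bin/sh
# Write an authkeys file with a single random sha1 key, readable by owner only.
# (write_authkeys is a hypothetical helper name used for this sketch.)
write_authkeys() {  # usage: write_authkeys FILE
    # 20 random bytes rendered as 40 hex characters serve as the shared key.
    key=$(od -An -N20 -tx1 /dev/urandom | tr -d ' \n')
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$1" )
    chmod 600 "$1"
}

# Demo: write to a temporary file and show the result.
demo=$(mktemp)
write_authkeys "$demo"
ls -l "$demo"
cat "$demo"
rm -f "$demo"
```

The same file must then be copied to the other node, since both sides need an identical key.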
VI. Setting up SSH trust between the two nodes (optional) and copying files to the standby node
HA-01 (192.168.10.132):
|
ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected] ## the target (user@host of the peer node, HA-02) is obscured in the source
|
HA-02 (192.168.10.133):
|
ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected] ## the target (user@host of the peer node, HA-01) is obscured in the source
|
Copy the configuration files to the standby node:
|
# scp /usr/local/heartbeat/etc/ha.d/* [email protected]:/usr/local/heartbeat/etc/ha.d/
|
VII. Testing
|
# systemctl start httpd
# /etc/init.d/heartbeat start ## start heartbeat
# setenforce 0
# systemctl stop firewalld
|
Check the log:
|
# tail /var/log/ha-log
Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Illegal directive [ucast] in /usr/local/heartbeat/etc/ha.d//ha.cf
Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Illegal directive [ping] in /usr/local/heartbeat/etc/ha.d//ha.cf
Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Client child command [/usr/lib/heartbeat/ipfail] is not executable
Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Heartbeat not started: configuration error.
Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Configuration error, heartbeat not started.
|
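When the log gets long, pulling out just the distinct error messages makes it easier to see what needs fixing. A minimal sketch, with `ha_log_errors` as a hypothetical helper name and a demo that uses two lines copied from the log above:

```shell
#!/bin/sh
# Print each distinct ERROR message from a heartbeat log, without the
# timestamp and process prefix. (ha_log_errors is a hypothetical helper.)
ha_log_errors() {  # usage: ha_log_errors LOGFILE
    sed -n 's/^.*ERROR: //p' "$1" | sort -u
}

# Demo on two lines copied from the log above.
demo=$(mktemp)
cat > "$demo" <<'EOF'
Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Illegal directive [ping] in /usr/local/heartbeat/etc/ha.d//ha.cf
Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Heartbeat not started: configuration error.
EOF
ha_log_errors "$demo"
rm -f "$demo"
```

On a node the call would be ha_log_errors /var/log/ha-log.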
Fix:
Correct the ipfail path:
|
respawn hacluster /usr/local/heartbeat/libexec/heartbeat/ipfail
|
Create the plugin symlinks:
|
# ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/RAExec/* /usr/local/heartbeat/lib/heartbeat/plugins/RAExec/
# ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/* /usr/local/heartbeat/lib/heartbeat/plugins/
|
Check the log again:
|
# tail /var/log/ha-log
Oct 26 13:11:46 node1 heartbeat: [9744]: info: remote resource transition completed.
Oct 26 13:11:46 node1 heartbeat: [9744]: info: node1 wants to go standby [foreign]
Oct 26 13:11:46 node1 heartbeat: [9744]: info: standby: node2 can take our foreign resources
Oct 26 13:11:46 node1 heartbeat: [11892]: info: give up foreign HA resources (standby).
Oct 26 13:11:46 node1 heartbeat: [11892]: info: foreign HA resource release completed (standby).
Oct 26 13:11:46 node1 heartbeat: [9744]: info: Local standby process completed [foreign].
Oct 26 13:11:47 node1 heartbeat: [9744]: WARN: 1 lost packet(s) for [node2] [11:13]
Oct 26 13:11:47 node1 heartbeat: [9744]: info: remote resource transition completed.
Oct 26 13:11:47 node1 heartbeat: [9744]: info: No pkts missing from node2!
Oct 26 13:11:47 node1 heartbeat: [9744]: info: Other node completed standby takeover of foreign resources.
|
Fix:
|
# vi /usr/local/heartbeat/etc/ha.d/haresources
node1 IPaddr::192.168.10.222/24/eno16777736
|
Note: the resource entry in haresources needs the IPaddr:: prefix.
Problem:
|
# tail /var/log/ha-log
Oct 26 17:01:55 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (425 messages in queue)
Oct 26 17:01:56 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (426 messages in queue)
Oct 26 17:01:57 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (427 messages in queue)
Oct 26 17:01:57 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (428 messages in queue)
Oct 26 17:01:58 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (429 messages in queue)
Oct 26 17:01:59 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (430 messages in queue)
Oct 26 17:01:59 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (431 messages in queue)
Oct 26 17:02:00 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (432 messages in queue)
Oct 26 17:02:01 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (433 messages in queue)
Oct 26 17:02:01 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (434 messages in queue)
|
Fix: the firewall on node2 had not been stopped; running systemctl stop firewalld on node2 resolved the problem.
Problem:
|
# tail /var/log/ha-log
IPaddr(IPaddr_192.168.10.222)[6854]:2015/10/26_17:20:58 ERROR: Setup problem: couldn't find command: ifconfig
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.10.222)[6828]:2015/10/26_17:20:58 ERROR: Program is not installed
|
Fix: run yum install net-tools to provide the ifconfig command.
Restart heartbeat and check the log again:
|
# systemctl restart heartbeat
# tail /var/log/ha-log
Oct 26 19:25:36 node1 heartbeat: [1783]: info: Configuration validated. Starting heartbeat 3.0.6
Oct 26 19:25:37 node1 heartbeat: [1783]: info: heartbeat: version 3.0.6
Oct 26 19:25:37 node1 heartbeat: [1783]: info: Heartbeat generation: 1445827146
Oct 26 19:25:37 node1 heartbeat: [1783]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eno16777736
Oct 26 19:25:37 node1 heartbeat: [1783]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eno16777736 - Status: 1
Oct 26 19:25:37 node1 heartbeat: [1783]: info: glib: ping heartbeat started.
Oct 26 19:25:37 node1 heartbeat: [1783]: info: Local status now set to: 'up'
Oct 26 19:25:37 node1 heartbeat: [1783]: info: Link 192.168.10.1:192.168.10.1 up.
Oct 26 19:25:37 node1 heartbeat: [1783]: info: Status update for node 192.168.10.1: status ping
Oct 26 19:25:37 node1 heartbeat: [1783]: info: Link node1:eno16777736 up.
|
Check the address with ifconfig.
Open http://localhost in a browser to check the page.
Stop node1 and see whether the address moves to node2.
node1:
|
# systemctl stop heartbeat
|
node2:
|
# tail /var/log/ha-log
mach_down(default)[1937]:2015/10/26_20:03:58 info: Taking over resource group IPaddr::192.168.10.222/24/eno16777736
ResourceManager(default)[1964]:2015/10/26_20:03:58 info: Acquiring resource group: node1 IPaddr::192.168.10.222/24/eno16777736
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.10.222)[1992]:2015/10/26_20:03:58 INFO: Resource is stopped
ResourceManager(default)[1964]:2015/10/26_20:03:58 info: Running /usr/local/heartbeat/etc/ha.d//resource.d/IPaddr 192.168.10.222/24/eno16777736 start
IPaddr(IPaddr_192.168.10.222)[2083]:2015/10/26_20:03:58 INFO: Using calculated netmask for 192.168.10.222: 255.255.255.0
IPaddr(IPaddr_192.168.10.222)[2083]:2015/10/26_20:03:58 INFO: eval ifconfig eno16777736:0 192.168.10.222 netmask 255.255.255.0 broadcast 192.168.10.255
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.10.222)[2057]:2015/10/26_20:03:58 INFO: Success
mach_down(default)[1937]:2015/10/26_20:03:58 info: /usr/local/heartbeat/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[1937]:2015/10/26_20:03:58 info: mach_down takeover complete for node node1.
Oct 26 20:03:58 node2 heartbeat: [1711]: info: mach_down takeover complete.
mach_down(default)[1937]:2015/10/26_20:03:58 info: mach_down takeover complete for node node1.
Oct 26 20:03:58 node2 heartbeat: [1711]: info: mach_down takeover complete.
Oct 26 20:04:29 node2 heartbeat: [1711]: WARN: node node1: is dead
Oct 26 20:04:29 node2 heartbeat: [1711]: info: Dead node node1 gave up resources.
Oct 26 20:04:29 node2 heartbeat: [1711]: info: Link node1:eno16777736 dead.
Oct 26 20:04:29 node2 ipfail: [1737]: info: Status update: Node node1 now has status dead
Oct 26 20:04:29 node2 ipfail: [1737]: info: NS: We are still alive!
Oct 26 20:04:29 node2 ipfail: [1737]: info: Link Status update: Link node1/eno16777736 now has status dead
Oct 26 20:04:30 node2 ipfail: [1737]: info: Asking other side for ping node count.
Oct 26 20:04:30 node2 ipfail: [1737]: info: Checking remote count of ping nodes.
|
Use ifconfig to check whether the IP has moved to node2:
The IP has moved to node2; open http://localhost in a browser to check.
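The same check can be scripted, which is handy when repeating the failover test. The sketch below reads the one-line-per-address output of `ip -o -4 addr show` and looks for the virtual IP; `vip_present` is a hypothetical helper, and the sample line in the demo is illustrative rather than captured from these machines.

```shell
#!/bin/sh
# Read `ip -o -4 addr show` output on stdin; succeed if the address is listed.
# (vip_present is a hypothetical helper name used for this sketch.)
vip_present() {  # usage: ip -o -4 addr show | vip_present ADDRESS
    grep -qF " inet $1/"
}

# Demo on an illustrative line in the format printed by `ip -o -4 addr show`.
sample='2: eno16777736    inet 192.168.10.222/24 brd 192.168.10.255 scope global secondary eno16777736:0'
if printf '%s\n' "$sample" | vip_present 192.168.10.222; then
    echo "VIP is active on this node"
else
    echo "VIP is not on this node"
fi
```

On node2 after the failover, ip -o -4 addr show | vip_present 192.168.10.222 should succeed, and on node1 it should fail.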
All done!
Appendix: heartbeat official site:
http://www.linux-ha.org/wiki/Main_Page
This article comes from the 「記事本」 (Notepad) blog; please keep this source link: http://wangzhijian.blog.51cto.com/6427016/1708694