Note: applies to Corosync 2.3.3 + Pacemaker + RabbitMQ cluster + OpenStack Icehouse environments.
When creating a virtual machine or performing other operations in the OpenStack environment fails, and the logs show the cause to be a RabbitMQ connection timeout, try the following steps to resolve the problem:
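To confirm that the failure really is a RabbitMQ timeout, the OpenStack service logs can be searched for AMQP errors first. A minimal sketch, assuming the usual log locations for this kind of deployment (exact paths and message wording vary by service and release):
[root@node-1 ~](controller)# grep -riE 'AMQP server.*unreachable|Timed out waiting for a reply' /var/log/nova/ /var/log/neutron/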
1. Check the RabbitMQ status. Connect to any controller node:
1) Run the following command to check the Pacemaker resource status, and confirm that the node lists (the Started: [ ... ] lines in the example below) include every controller node:
[root@node-3 ~](controller)# pcs resource
vip__public (ocf::mirantis:ns_IPaddr2): Started
Clone Set: clone_ping_vip__public [ping_vip__public]
Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
vip__management (ocf::mirantis:ns_IPaddr2): Started
Clone Set: clone_p_openstack-heat-engine [p_openstack-heat-engine]
Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
p_openstack-ceilometer-central (ocf::mirantis:ceilometer-agent-central): Started
p_openstack-ceilometer-alarm-evaluator (ocf::mirantis:ceilometer-alarm-evaluator): Started
Clone Set: clone_p_neutron-openvswitch-agent [p_neutron-openvswitch-agent]
Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
p_neutron-dhcp-agent (ocf::mirantis:neutron-agent-dhcp): Started
Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
Clone Set: clone_p_neutron-l3-agent [p_neutron-l3-agent]
Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
Clone Set: clone_p_mysql [p_mysql]
Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
Clone Set: clone_p_rabbitmq-server [p_rabbitmq-server]
Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
Clone Set: clone_p_haproxy [p_haproxy]
Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
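For contrast, if RabbitMQ had failed outright on one node, Pacemaker would report it in a separate Stopped list, along these lines (illustrative output only, not captured from this environment):
Clone Set: clone_p_rabbitmq-server [p_rabbitmq-server]
Started: [ node-1.abc.com node-2.abc.com ]
Stopped: [ node-3.abc.com ]
Note that in the partition scenario described below the clone can still show Started on all three nodes, which is why the rabbitmqctl check in step 2) is needed.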
2) If the output above looks normal, log in to every controller node and run the following command to check the RabbitMQ cluster status. Output similar to the following indicates a problem with the RabbitMQ cluster:
[root@node-1 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2']}]},
{running_nodes,['rabbit@node-1','rabbit@node-2']},
{cluster_name,<<"rabbit@node-1.abc.com">>}, {partitions,[]}]
...done.
[root@node-2 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-2' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2']}]},
{running_nodes,['rabbit@node-1','rabbit@node-2']},
{cluster_name,<<"rabbit@node-1.abc.com">>}, {partitions,[]}]
...done.
[root@node-3 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-3' ...
[{nodes,[{disc,['rabbit@node-3']}]},
{running_nodes,['rabbit@node-3']},
{cluster_name,<<"rabbit@node-3.abc.com">>}, {partitions,[]}]
...done.
Output like the above means the environment contains two separate RabbitMQ clusters,
{cluster_name,<<"rabbit@node-1.abc.com">>} and
{cluster_name,<<"rabbit@node-3.abc.com">>}; in other words, node-3 has not joined the
{cluster_name,<<"rabbit@node-1.abc.com">>} cluster.
2. Fix the problem
Given the analysis above, we only need to rejoin node-3 to the {cluster_name,<<"rabbit@node-1.abc.com">>} cluster. Log in to any controller node and run the following commands (ban stops the p_rabbitmq-server resource on node-3; clear then removes that constraint so Pacemaker restarts RabbitMQ there, and it rejoins the existing cluster on startup):
[root@node-3 ~](controller)# pcs resource ban p_rabbitmq-server node-3.abc.com
[root@node-3 ~](controller)# pcs resource clear p_rabbitmq-server node-3.abc.com
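For reference, the ban/clear sequence works because the restart makes the resource agent rejoin node-3 to the surviving cluster. The equivalent manual procedure is RabbitMQ's standard rejoin sequence, shown here only to illustrate what happens under the hood; avoid running it by hand while Pacemaker is managing the resource:
[root@node-3 ~]# rabbitmqctl stop_app
[root@node-3 ~]# rabbitmqctl reset
[root@node-3 ~]# rabbitmqctl join_cluster rabbit@node-1
[root@node-3 ~]# rabbitmqctl start_app
The reset step wipes node-3's local cluster state (it currently believes it is its own cluster), which is what allows join_cluster to succeed.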
3. Check the status again to confirm the problem is resolved
[root@node-3 ~](controller)# pcs resource
vip__public (ocf::mirantis:ns_IPaddr2): Started
Clone Set: clone_ping_vip__public [ping_vip__public]
Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
vip__management (ocf::mirantis:ns_IPaddr2): Started
Clone Set: clone_p_openstack-heat-engine [p_openstack-heat-engine]
Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
p_openstack-ceilometer-central (ocf::mirantis:ceilometer-agent-central): Started
p_openstack-ceilometer-alarm-evaluator (ocf::mirantis:ceilometer-alarm-evaluator): Started
Clone Set: clone_p_neutron-openvswitch-agent [p_neutron-openvswitch-agent]
Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
p_neutron-dhcp-agent (ocf::mirantis:neutron-agent-dhcp): Started
Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
Clone Set: clone_p_neutron-l3-agent [p_neutron-l3-agent]
Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
Clone Set: clone_p_mysql [p_mysql]
Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
Clone Set: clone_p_rabbitmq-server [p_rabbitmq-server]
Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
Clone Set: clone_p_haproxy [p_haproxy]
Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
[root@node-1 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
{running_nodes,['rabbit@node-3','rabbit@node-2','rabbit@node-1']},
{cluster_name,<<"rabbit@node-1.abc.com">>}, {partitions,[]}]
...done.
[root@node-2 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-2' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
{running_nodes,['rabbit@node-3','rabbit@node-1','rabbit@node-2']},
{cluster_name,<<"rabbit@node-1.abc.com">>}, {partitions,[]}]
...done.
[root@node-3 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-3' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
{running_nodes,['rabbit@node-1','rabbit@node-2','rabbit@node-3']},
{cluster_name,<<"rabbit@node-1.abc.com">>}, {partitions,[]}]
...done.
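Finally, with {partitions,[]} empty on all three nodes, retry the operation that originally failed, for example booting a small test instance (the flavor and image names below are placeholders for whatever exists in the environment):
[root@node-1 ~](controller)# nova boot --flavor m1.tiny --image TestVM test-rabbitmq-fix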