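pgpool-II in master/slave mode: how can I most easily trigger a failover?
I am testing a toy PostgreSQL infrastructure on some local VMs to determine how pgpool behaves during a failover. I have a basic setup with two database machines (192.168.0.2 and 192.168.0.3) and one pgpool machine (192.168.0.4). 192.168.0.3 has been set up as a slave of 192.168.0.2 using streaming replication. pgpool-II has been set up with the following configuration: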
listen_addresses = '*'
backend_hostname0 = '192.168.0.2'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/9.4/main/'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '192.168.0.3'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/9.4/main/'
backend_flag1 = 'ALLOW_TO_FAILOVER'
enable_pool_hba = on
replication_mode = false
master_slave_mode = on
master_slave_sub_mode = 'stream'
fail_over_on_backend_error = true
failover_command = '/root/pgpool_failover_stream.sh %d %H /tmp/postgresql.trigger.5432'
load_balance_mode = false
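I have verified that all of this works. That is, replication works when I make changes on the master database, and I can connect to the master, the slave, and pgpool-II from a sample application and get the results I expect.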
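Now I have started a long-running application connected to pgpool, and then tried to force a failover by SSHing into the master database server and stopping the postgres service (service postgresql stop, run as root). My application keeps executing queries correctly, but no failover occurs (the script is never run). I have even tested connecting directly to the master database, and in that case stopping the postgres service does eventually crash the application.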
What am I doing wrong? Have I misconfigured pgpool? Or is there a better way to trigger a failover?
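For reference, the failover_command above passes %d (the id of the node that went down), %H (the host name of the new master) and a trigger file path to /root/pgpool_failover_stream.sh, but the script itself is not shown in the post. The following is only a minimal sketch of what such a script commonly looks like, not the actual script. It assumes that node 0 is the primary, that the standby's recovery.conf has trigger_file pointing at the same path, and that passwordless SSH as the postgres user to the standby is available. The function names and the DRY_RUN variable are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a pgpool-II failover script. This is NOT the
# author's /root/pgpool_failover_stream.sh, which the post does not show.
# Arguments follow the failover_command in the configuration above:
#   $1 = %d  id of the node that went down
#   $2 = %H  host name of the new master
#   $3 = path of the trigger file that promotes the standby

# Print the command that would promote the standby, or print nothing when
# the failed node is not the primary. Assumption: node 0 is the primary.
promote_command() {
    failed_node_id=$1
    new_master_host=$2
    trigger_file=$3

    # Losing a standby (any node other than 0) needs no promotion.
    if [ "$failed_node_id" != "0" ]; then
        return 0
    fi

    # Assumption: passwordless SSH as the postgres user to the standby.
    printf 'ssh -T postgres@%s touch %s\n' "$new_master_host" "$trigger_file"
}

# With DRY_RUN=1 the command is only printed; otherwise it is executed.
run_failover() {
    cmd=$(promote_command "$@")
    if [ -z "$cmd" ]; then
        return 0
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
    else
        # Unquoted on purpose so the string splits into a command and its
        # arguments; this assumes the host and path contain no spaces.
        $cmd
    fi
}

# Only act when pgpool (or a manual test) supplies all three arguments.
if [ "$#" -ge 3 ]; then
    run_failover "$@"
fi
```

Saved under a name of your choice, the sketch can be exercised without touching any server by running it with DRY_RUN=1 and the arguments 0 192.168.0.3 /tmp/postgresql.trigger.5432, which only prints the ssh command it would otherwise run. Writing a line to a log file at the top of the real script is a simple way to tell whether pgpool ever invokes it.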
EDIT: As requested, here is the part of the log where the first errors appear:
...
2016-03-15 18:47:15: pid 1232: DEBUG: initializing backend status
2016-03-15 18:47:15: pid 1231: DEBUG: initializing backend status
2016-03-15 18:47:15: pid 1230: DEBUG: initializing backend status
2016-03-15 18:47:15: pid 1209: ERROR: failed to authenticate
2016-03-15 18:47:15: pid 1209: DETAIL: invalid authentication message response type, Expecting 'R' and received 'E'
2016-03-15 18:47:15: pid 1209: LOG: find_primary_node: checking backend no 1
2016-03-15 18:47:15: pid 1209: ERROR: failed to authenticate
2016-03-15 18:47:15: pid 1209: DETAIL: invalid authentication message response type, Expecting 'R' and received 'E'
2016-03-15 18:47:15: pid 1209: DEBUG: find_primary_node: no primary node found
...
Strangely, I can still connect to pgpool and execute queries, so there is clearly something I don't understand there.
EDIT 2: These are the errors I get after running service postgresql shutdown on the master. I show everything up to the point where pgpool starts shutting down.
...
2016-03-16 17:24:57: pid 1012: DEBUG: session context: clearing doing extended query messaging. DONE
2016-03-16 17:24:57: pid 1012: DEBUG: session context: setting doing extended query messaging. DONE
2016-03-16 17:24:57: pid 1012: DEBUG: session context: setting query in progress. DONE
2016-03-16 17:24:57: pid 1012: DEBUG: reading backend data packet kind
2016-03-16 17:24:57: pid 1012: DETAIL: backend:0 of 2 kind = 'E'
2016-03-16 17:24:57: pid 1012: DEBUG: processing backend response
2016-03-16 17:24:57: pid 1012: DETAIL: received kind 'E'(45) from backend
2016-03-16 17:24:57: pid 1012: ERROR: unable to forward message to frontend
2016-03-16 17:24:57: pid 1012: DETAIL: FATAL error occured on backend
2016-03-16 17:24:57: pid 1012: DEBUG: session context: setting query in progress. DONE
2016-03-16 17:24:57: pid 1012: DEBUG: decide where to send the queries
2016-03-16 17:24:57: pid 1012: DETAIL: destination = 3 for query= "DISCARD ALL"
2016-03-16 17:24:57: pid 1012: DEBUG: waiting for query response
2016-03-16 17:24:57: pid 1012: DETAIL: waiting for backend:0 to complete the query
2016-03-16 17:24:57: pid 1012: FATAL: unable to read data from DB node 0
2016-03-16 17:24:57: pid 1012: DETAIL: EOF encountered with backend
2016-03-16 17:24:57: pid 998: DEBUG: reaper handler
2016-03-16 17:24:57: pid 998: LOG: child process with pid: 1012 exits with status 256
2016-03-16 17:24:57: pid 998: LOG: fork a new child process with pid: 1033
2016-03-16 17:24:57: pid 998: DEBUG: reaper handler: exiting normally
2016-03-16 17:24:57: pid 1033: DEBUG: initializing backend status
2016-03-16 17:25:02: pid 1031: DEBUG: PCP child receives shutdown request signal 2
2016-03-16 17:25:02: pid 1029: LOG: child process received shutdown request signal 2
...
Note that my sample application does in fact die when the master goes down.
EDIT 3: These are the errors I get in the new logs after setting sr_check_period, sr_check_user, and sr_check_password to sensible values. All of the previous errors are now gone:
2016-03-31 17:45:00: pid 18363: DEBUG: detect error: kind: 1
2016-03-31 17:45:00: pid 18363: DEBUG: reading backend data packet kind
2016-03-31 17:45:00: pid 18363: DETAIL: backend:0 of 2 kind = '1'
...
2016-03-31 17:45:00: pid 18363: DEBUG: detect error: kind: S
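Hi Raveesh, thanks for your reply! I have enabled logging, and even at startup I noticed some errors that seem like they might be related. I have edited my question to include the necessary information. – gdoug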
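Can you post the logs produced after the master is shut down? I don't think these logs point to the real problem of "why the failover script is not executed". –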
Updated again with the requested log information. – gdoug