· 7 years ago · Nov 22, 2018, 05:26 PM
1>>> Setting up STOP handlers...
2
3
4>>> STARTING SSH (if required)...
5
6
7>>> TUNING UP SSH CLIENT...
8
9
10> STARTING SSH SERVER...
11
12
13>>> STARTING POSTGRES...
14
15
16>>> SETTING UP POLYMORPHIC VARIABLES (repmgr=3+postgres=9 | repmgr=4, postgres=10)...
17
18
19>>> TUNING UP POSTGRES...
20
21
22>>> Cleaning data folder which might have some garbage...
23
24
25>>> Check all partner nodes for common upstream node...
26
27
28>>>>>> Checking NODE=pgmaster...
29
30
31psql: could not connect to server: Connection refused
32
33
34 Is the server running on host "pgmaster" (10.0.0.3) and accepting
35
36
37 TCP/IP connections on port 5432?
38
39
40>>>>>> Skipping: failed to get master from the node!
41
42
43>>> Auto-detected master name: ''
44
45
46>>> Setting up repmgr...
47
48
49>>> Setting up repmgr config file '/etc/repmgr.conf'...
50
51
52>>> Setting up upstream node...
53
54
55cat: /var/lib/postgresql/data/standby.lock: No such file or directory
56
57
58>>> Previously Locked standby upstream node LOCKED_STANDBY=''
59
60
61>>> Waiting for upstream postgres server...
62
63
64>>> Wait schema replication_db.repmgr_pg_cluster on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
65
66
67psql: could not connect to server: Connection refused
68
69
70 Is the server running on host "pgmaster" (10.0.0.3) and accepting
71
72
73 TCP/IP connections on port 5432?
74
75
76>>>>>> Host pgmaster:5432 is not accessible (will try 30 times more)
77
78
79>>>>>> Schema replication_db.repmgr_pg_cluster is still not accessible on host pgmaster:5432 (will try 29 times more)
80
81
82>>>>>> Schema replication_db.repmgr_pg_cluster is still not accessible on host pgmaster:5432 (will try 28 times more)
83
84
85>>>>>> Schema replication_db.repmgr_pg_cluster exists on host pgmaster:5432!
86
87
88>>> REPLICATION_UPSTREAM_NODE_ID=1
89
90
91>>> Sending in background postgres start...
92
93
94>>> Waiting for upstream postgres server...
95
96
97>>> Wait schema replication_db.repmgr_pg_cluster on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
98
99
100>>>>>> Schema replication_db.repmgr_pg_cluster exists on host pgmaster:5432!
101
102
103>>> Starting standby node...
104
105
106>>> Instance hasn't been set up yet.
107
108
109>>> Clonning primary node...
110
111
112>>> Waiting for upstream postgres server...
113
114
115>>> Wait schema replication_db.repmgr_pg_cluster on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
116
117
118NOTICE: destination directory '/var/lib/postgresql/data' provided
119
120
121INFO: connecting to upstream node
122
123
124INFO: Successfully connected to upstream node. Current installation size is 37 MB
125
126
127INFO: checking and correcting permissions on existing directory /var/lib/postgresql/data ...
128
129
130>>>>>> Schema replication_db.repmgr_pg_cluster exists on host pgmaster:5432!
131
132
133>>> Waiting for cloning on this node is over(if any in progress): CLEAN_UP_ON_FAIL=, INTERVAL=30
134
135
136>>> Replicated: 4
137
138
139NOTICE: starting backup (using pg_basebackup)...
140
141
142INFO: executing: '/usr/lib/postgresql/10/bin/pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/data -h pgmaster -p 5432 -U replication_user -c fast -X stream -S repmgr_slot_424 '
143
144
145NOTICE: standby clone (using pg_basebackup) complete
146
147
148NOTICE: you can now start your PostgreSQL server
149
150
151HINT: for example : pg_ctl -D /var/lib/postgresql/data start
152
153
154HINT: After starting the server, you need to register this standby with "repmgr standby register"
155
156
157[REPMGR EVENT] Node id: 424; Event type: standby_clone; Success [1|0]: 1; Time: 2018-11-22 16:29:34.355397+00; Details: Cloned from host 'pgmaster', port 5432; backup method: pg_basebackup; --force: Y
158
159
160>>> Configuring /var/lib/postgresql/data/postgresql.conf
161
162
163>>>>>> Will add configs to the exists file
164
165
166>>>>>> Adding config 'max_replication_slots'='10'
167
168
169>>>>>> Adding config 'shared_preload_libraries'=''repmgr_funcs''
170
171
172>>> Starting postgres...
173
174
175>>> Waiting for local postgres server recovery if any in progress:LAUNCH_RECOVERY_CHECK_INTERVAL=30
176
177
178>>> Recovery is in progress:
179
180
1812018-11-22 16:29:34.741 UTC [156] LOG: listening on IPv4 address "0.0.0.0", port 5432
182
183
1842018-11-22 16:29:34.741 UTC [156] LOG: listening on IPv6 address "::", port 5432
185
186
1872018-11-22 16:29:34.808 UTC [156] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
188
189
1902018-11-22 16:29:34.931 UTC [165] LOG: database system was interrupted; last known up at 2018-11-22 16:29:32 UTC
191
192
1932018-11-22 16:29:35.282 UTC [165] LOG: entering standby mode
194
195
1962018-11-22 16:29:35.359 UTC [165] LOG: redo starts at 0/2000028
197
198
1992018-11-22 16:29:35.410 UTC [165] LOG: consistent recovery state reached at 0/20000F8
200
201
2022018-11-22 16:29:35.411 UTC [156] LOG: database system is ready to accept read only connections
203
204
2052018-11-22 16:29:35.450 UTC [169] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
206
207
208>>> Cloning is done
209
210
211>>>>>> WAL id: 000000010000000000000003
212
213
214>>>>>> WAL_RECEIVER_FLAG=1!
215
216
217>>> Not in recovery state (anymore)
218
219
220>>> Waiting for local postgres server start...
221
222
223>>> Wait schema replication_db.public on aaa5ca7e285c:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90)
224
225
226>>>>>> Schema replication_db.public exists on host aaa5ca7e285c:5432!
227
228
229>>> Unregister the node if it was done before
230
231
232DELETE 0
233
234
235>>> Registering node with role standby
236
237
238INFO: connecting to standby database
239
240
241INFO: connecting to master database
242
243
244INFO: retrieving node list for cluster 'pg_cluster'
245
246
247INFO: registering the standby
248
249
250[REPMGR EVENT] Node id: 424; Event type: standby_register; Success [1|0]: 1; Time: 2018-11-22 16:30:05.013207+00; Details:
251
252
253INFO: standby registration complete
254
255
256NOTICE: standby node correctly registered for cluster pg_cluster with id 424 (conninfo: user=replication_user password=replication_pass host=aaa5ca7e285c dbname=replication_db port=5432 connect_timeout=2)
257
258
259 Locking standby (NEW_UPSTREAM_NODE_ID=1)...
260
261
262>>> Starting repmgr daemon...
263
264
265[2018-11-22 16:30:05] [NOTICE] looking for configuration file in current directory
266
267
268[2018-11-22 16:30:05] [NOTICE] looking for configuration file in /etc
269
270
271[2018-11-22 16:30:05] [NOTICE] configuration file found at: /etc/repmgr.conf
272
273
274[2018-11-22 16:30:05] [INFO] connecting to database 'user=replication_user password=replication_pass host=aaa5ca7e285c dbname=replication_db port=5432 connect_timeout=2'
275
276
277[2018-11-22 16:30:05] [INFO] connected to database, checking its state
278
279
280[2018-11-22 16:30:05] [INFO] connecting to master node of cluster 'pg_cluster'
281
282
283[2018-11-22 16:30:05] [INFO] retrieving node list for cluster 'pg_cluster'
284
285
286[2018-11-22 16:30:05] [INFO] checking role of cluster node '1'
287
288
289[2018-11-22 16:30:05] [INFO] checking cluster configuration with schema 'repmgr_pg_cluster'
290
291
292[2018-11-22 16:30:05] [INFO] checking node 424 in cluster 'pg_cluster'
293
294
295[2018-11-22 16:30:05] [INFO] reloading configuration file
296
297
298[2018-11-22 16:30:05] [INFO] configuration has not changed
299
300
301[2018-11-22 16:30:05] [INFO] starting continuous standby node monitoring
302
303
3042018-11-22 16:37:19.236 UTC [415] LOG: incomplete startup packet
305
306
3072018-11-22 17:16:43.597 UTC [169] FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
308
309
310 This probably means the server terminated abnormally
311
312
313 before or while processing the request.
314
315
3162018-11-22 17:16:43.597 UTC [165] LOG: invalid record length at 0/3004230: wanted 24, got 0
317
318
3192018-11-22 17:16:43.622 UTC [1555] FATAL: could not connect to the primary server: server closed the connection unexpectedly
320
321
322 This probably means the server terminated abnormally
323
324
325 before or while processing the request.
326
327
328[2018-11-22 17:16:45] [ERROR] connection to database failed: could not translate host name "pgmaster" to address: Temporary failure in name resolution
329
330
331
332
333[2018-11-22 17:16:45] [ERROR] unable to connect to upstream node: could not translate host name "pgmaster" to address: Temporary failure in name resolution
334
335
336
337
338[2018-11-22 17:16:45] [ERROR] connection to database failed: could not translate host name "pgmaster" to address: Temporary failure in name resolution
339
340
341
342
343[2018-11-22 17:16:45] [WARNING] connection to master has been lost, trying to recover... 15 seconds before failover decision
344
345
3462018-11-22 17:16:48.617 UTC [1557] FATAL: could not connect to the primary server: could not translate host name "pgmaster" to address: Temporary failure in name resolution
347
348
349[2018-11-22 17:16:50] [WARNING] connection to master has been lost, trying to recover... 10 seconds before failover decision
350
351
3522018-11-22 17:16:53.614 UTC [1560] FATAL: could not connect to the primary server: could not translate host name "pgmaster" to address: Temporary failure in name resolution
353
354
355[2018-11-22 17:16:55] [WARNING] connection to master has been lost, trying to recover... 5 seconds before failover decision
356
357
3582018-11-22 17:16:58.627 UTC [1571] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
359
360
361[2018-11-22 17:17:00] [ERROR] unable to reconnect to master (timeout 60 seconds)...
362
363
364[2018-11-22 17:17:05] [NOTICE] this node is the best candidate to be the new master, promoting...
365
366
367NOTICE: looking for configuration file in current directory
368
369
370NOTICE: looking for configuration file in /etc
371
372
373NOTICE: configuration file found at: /etc/repmgr.conf
374
375
376DEBUG: slot name initialised as: repmgr_slot_424
377
378
379INFO: connecting to standby database
380
381
382DEBUG: connecting to: 'user=replication_user password=replication_pass host=aaa5ca7e285c dbname=replication_db port=5432 connect_timeout=2 fallback_application_name='repmgr''
383
384
385DEBUG: set_config():
386
387
388SET synchronous_commit TO 'local'
389
390
391INFO: connected to standby, checking its state
392
393
394DEBUG: is_standby(): SELECT pg_catalog.pg_is_in_recovery()
395
396
397INFO: retrieving node list for cluster 'pg_cluster'
398
399
400DEBUG: get_master_connection():
401
402
403 SELECT id, conninfo, CASE WHEN type = 'master' THEN 1 ELSE 2 END AS type_priority FROM "repmgr_pg_cluster".repl_nodes WHERE cluster = 'pg_cluster' AND type != 'witness' ORDER BY active DESC, type_priority, priority, id
404
405
406INFO: checking role of cluster node '1'
407
408
409DEBUG: connecting to: 'user=replication_user password=replication_pass host=pgmaster dbname=replication_db port=5432 connect_timeout=2 fallback_application_name='repmgr''
410
411
412DEBUG: set_config():
413
414
415SET synchronous_commit TO 'local'
416
417
418DEBUG: is_standby(): SELECT pg_catalog.pg_is_in_recovery()
419
420
421DEBUG: get_master_connection(): current master node is 1
422
423
424ERROR: this cluster already has an active master server
425
426
427[2018-11-22 17:17:05] [INFO] retrieving node list for cluster 'pg_cluster'
428
429
430[2018-11-22 17:17:05] [INFO] checking role of cluster node '1'
431
432
433[2018-11-22 17:17:05] [NOTICE] Original master reappeared before this standby was promoted - no action taken
434
435
436[2018-11-22 17:17:07] [INFO] connecting to master node of cluster 'pg_cluster'
437
438
439[2018-11-22 17:17:07] [INFO] retrieving node list for cluster 'pg_cluster'
440
441
442[2018-11-22 17:17:07] [INFO] checking role of cluster node '1'
443
444
445[2018-11-22 17:17:07] [INFO] checking cluster configuration with schema 'repmgr_pg_cluster'
446
447
448[2018-11-22 17:17:07] [INFO] checking node 424 in cluster 'pg_cluster'
449
450
451[2018-11-22 17:17:07] [INFO] reloading configuration file
452
453
454[2018-11-22 17:17:07] [INFO] configuration has not changed
455
456
457[2018-11-22 17:17:07] [INFO] starting continuous standby node monitoring
458
459
4602018-11-22 17:17:25.804 UTC [1569] ERROR: cannot execute CREATE TABLE in a read-only transaction
461
462
4632018-11-22 17:17:25.804 UTC [1569] STATEMENT: create table test (id int, name varchar(324));
464
465
4662018-11-22 17:18:15.745 UTC [1618] ERROR: cannot execute CREATE TABLE in a read-only transaction
467
468
4692018-11-22 17:18:15.745 UTC [1618] STATEMENT: create table test (id int, name varchar(234));
470
471
4722018-11-22 17:19:51.205 UTC [1571] FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
473
474
475 This probably means the server terminated abnormally
476
477
478 before or while processing the request.
479
480
4812018-11-22 17:19:51.205 UTC [165] LOG: invalid record length at 0/3019460: wanted 24, got 0
482
483
4842018-11-22 17:19:51.211 UTC [1673] FATAL: could not connect to the primary server: could not connect to server: Connection refused
485
486
487 Is the server running on host "pgmaster" (10.0.0.3) and accepting
488
489
490 TCP/IP connections on port 5432?
491
492
493[2018-11-22 17:19:51] [ERROR] connection to database failed: could not connect to server: Connection refused
494
495
496 Is the server running on host "pgmaster" (10.0.0.3) and accepting
497
498
499 TCP/IP connections on port 5432?
500
501
502
503
504[2018-11-22 17:19:51] [ERROR] unable to connect to upstream node: could not connect to server: Connection refused
505
506
507 Is the server running on host "pgmaster" (10.0.0.3) and accepting
508
509
510 TCP/IP connections on port 5432?
511
512
513
514
515[2018-11-22 17:19:51] [ERROR] connection to database failed: could not connect to server: Connection refused
516
517
518 Is the server running on host "pgmaster" (10.0.0.3) and accepting
519
520
521 TCP/IP connections on port 5432?
522
523
524
525
526[2018-11-22 17:19:51] [WARNING] connection to master has been lost, trying to recover... 15 seconds before failover decision
527
528
5292018-11-22 17:19:56.220 UTC [1675] FATAL: could not connect to the primary server: could not translate host name "pgmaster" to address: Temporary failure in name resolution
530
531
532[2018-11-22 17:19:56] [WARNING] connection to master has been lost, trying to recover... 10 seconds before failover decision
533
534
5352018-11-22 17:20:01.221 UTC [1676] FATAL: could not connect to the primary server: could not translate host name "pgmaster" to address: Temporary failure in name resolution
536
537
538[2018-11-22 17:20:01] [WARNING] connection to master has been lost, trying to recover... 5 seconds before failover decision
539
540
5412018-11-22 17:20:06.231 UTC [1678] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
542
543
544[2018-11-22 17:20:06] [ERROR] unable to reconnect to master (timeout 60 seconds)...
545
546
547[2018-11-22 17:20:11] [NOTICE] this node is the best candidate to be the new master, promoting...
548
549
550NOTICE: looking for configuration file in current directory
551
552
553NOTICE: looking for configuration file in /etc
554
555
556NOTICE: configuration file found at: /etc/repmgr.conf
557
558
559DEBUG: slot name initialised as: repmgr_slot_424
560
561
562INFO: connecting to standby database
563
564
565DEBUG: connecting to: 'user=replication_user password=replication_pass host=aaa5ca7e285c dbname=replication_db port=5432 connect_timeout=2 fallback_application_name='repmgr''
566
567
568DEBUG: set_config():
569
570
571SET synchronous_commit TO 'local'
572
573
574INFO: connected to standby, checking its state
575
576
577DEBUG: is_standby(): SELECT pg_catalog.pg_is_in_recovery()
578
579
580INFO: retrieving node list for cluster 'pg_cluster'
581
582
583DEBUG: get_master_connection():
584
585
586 SELECT id, conninfo, CASE WHEN type = 'master' THEN 1 ELSE 2 END AS type_priority FROM "repmgr_pg_cluster".repl_nodes WHERE cluster = 'pg_cluster' AND type != 'witness' ORDER BY active DESC, type_priority, priority, id
587
588
589INFO: checking role of cluster node '1'
590
591
592DEBUG: connecting to: 'user=replication_user password=replication_pass host=pgmaster dbname=replication_db port=5432 connect_timeout=2 fallback_application_name='repmgr''
593
594
595DEBUG: set_config():
596
597
598SET synchronous_commit TO 'local'
599
600
601DEBUG: is_standby(): SELECT pg_catalog.pg_is_in_recovery()
602
603
604DEBUG: get_master_connection(): current master node is 1
605
606
607ERROR: this cluster already has an active master server
608
609
610[2018-11-22 17:20:11] [INFO] retrieving node list for cluster 'pg_cluster'
611
612
613[2018-11-22 17:20:11] [INFO] checking role of cluster node '1'
614
615
616[2018-11-22 17:20:11] [NOTICE] Original master reappeared before this standby was promoted - no action taken
617
618
619[2018-11-22 17:20:13] [INFO] connecting to master node of cluster 'pg_cluster'
620
621
622[2018-11-22 17:20:13] [INFO] retrieving node list for cluster 'pg_cluster'
623
624
625[2018-11-22 17:20:13] [INFO] checking role of cluster node '1'
626
627
628[2018-11-22 17:20:13] [INFO] checking cluster configuration with schema 'repmgr_pg_cluster'
629
630
631[2018-11-22 17:20:13] [INFO] checking node 424 in cluster 'pg_cluster'
632
633
634[2018-11-22 17:20:13] [INFO] reloading configuration file
635
636
637[2018-11-22 17:20:13] [INFO] configuration has not changed
638
639
640[2018-11-22 17:20:13] [INFO] starting continuous standby node monitoring