mongodb 复制集随机同步数据
记一次MongoDB主从切换,重新同步数据。
副本集架构:
PRIMARY> rs.conf()<br />
{<br />
“_id” : “poptask”,<br />
“version” : 4,<br />
“members” : [<br />
{<br />
“_id” : 0,<br />
“host” : “10.0.0.105:20011″<br />
},<br />
{<br />
“_id” : 1,<br />
“host” : “10.0.0.106:20011″<br />
},<br />
{<br />
“_id” : 2,<br />
“host” : “10.0.0.107:20011″<br />
},<br />
{<br />
“_id” : 3,<br />
“host” : “10.0.0.107:20012”,<br />
“arbiterOnly” : true<br />
},<br />
{<br />
“_id” : 4,<br />
“host” : “10.0.0.106:20012”,<br />
“arbiterOnly” : true<br />
},<br />
{<br />
“_id” : 5,<br />
“host” : “10.0.0.105:20012”,<br />
“arbiterOnly” : true<br />
}<br />
]<br />
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
|
)
{
,
,
[
{
,
,
{
,
,
{
,
,
{
,
,
true
,
{
,
,
true
,
{
,
,
true
}
]
}
|
将主节点从105切换到106,
步骤:
107执行 rs.freeze(),
105执行 rs.stepdown()
切换后的环境:
10.0.0.105(secondary)
10.0.0.106(primary)
10.0.0.107 (secondary)
18号晚上开始切换,105的数据重新同步,数据较快同步完了,昨天在建索引,
今天重新同步107的数据,发现了一点小意外,算是有惊无险。
关掉107的mongod进程后,数据开始同步,此时发现105 down了,105的数据自动删除,然后重新开始同步。
此时停掉105,让107同步完。
经过分析,找到了原因:
1.105上的数据并非是看起来同步完了,关掉107的时刻,105还在做同步;
2.105是从107(从节点)上做的同步,并非是从106主节点同步数据;
3.107关掉时候,105发现连接不到107了,就自己删除已经同步的所有数据,然后从106主节点开始同步数据。
总结:mongodb同步有了新的认识,并非都是从主节点同步数据;即使做切换这样一个简单的操作的时候也要慎重,考虑周全。
105上log:
Tue Dec 18 20:00:15 [clientcursormon] mem (MB) res:61273 virt:1966680 mapped:982801<br />
Tue Dec 18 20:02:48 [rsHealthPoll] replSet info 10.0.0.107:20012 is down (or slow to respond): socket exception<br />
Tue Dec 18 20:02:48 [rsHealthPoll] replSet member 10.0.0.107:20012 is now in state DOWN<br />
Tue Dec 18 20:02:48 [rsHealthPoll] replSet info 10.0.0.107:20011 is down (or slow to respond): socket exception<br />
Tue Dec 18 20:02:48 [rsHealthPoll] replSet member 10.0.0.107:20011 is now in state DOWN<br />
Tue Dec 18 20:02:49 [rsHealthPoll] replSet info 10.0.0.105:20012 is down (or slow to respond): socket exception<br />
Tue Dec 18 20:02:49 [rsHealthPoll] replSet member 10.0.0.105:20012 is now in state DOWN<br />
Tue Dec 18 20:02:49 [rsHealthPoll] replSet info 10.0.0.106:20011 is down (or slow to respond): socket exception<br />
Tue Dec 18 20:02:49 [rsHealthPoll] replSet member 10.0.0.106:20011 is now in state DOWN<br />
Tue Dec 18 20:02:49 [rsHealthPoll] replSet info 10.0.0.106:20012 is down (or slow to respond): socket exception<br />
Tue Dec 18 20:02:49 [rsHealthPoll] replSet member 10.0.0.106:20012 is now in state DOWN<br />
Tue Dec 18 20:02:50 [rsHealthPoll] replSet member 10.0.0.107:20012 is up<br />
Tue Dec 18 20:02:50 [rsHealthPoll] replSet member 10.0.0.107:20012 is now in state ARBITER<br />
Tue Dec 18 20:02:50 [rsHealthPoll] replSet member 10.0.0.107:20011 is up<br />
Tue Dec 18 20:02:50 [rsHealthPoll] replSet member 10.0.0.107:20011 is now in state SECONDARY<br />
Tue Dec 18 20:02:51 [rsHealthPoll] replSet member 10.0.0.105:20012 is up<br />
Tue Dec 18 20:02:51 [rsHealthPoll] replSet member 10.0.0.105:20012 is now in state ARBITER<br />
Tue Dec 18 20:02:51 [rsHealthPoll] replSet member 10.0.0.106:20011 is up<br />
Tue Dec 18 20:02:51 [rsHealthPoll] replSet member 10.0.0.106:20011 is now in state SECONDARY<br />
Tue Dec 18 20:02:51 [rsHealthPoll] replSet member 10.0.0.106:20012 is up<br />
Tue Dec 18 20:02:51 [rsHealthPoll] replSet member 10.0.0.106:20012 is now in state ARBITER<br />
Tue Dec 18 20:02:57 [rsHealthPoll] replSet member 10.0.0.106:20011 is now in state PRIMARY<br />
Tue Dec 18 20:02:57 [rsSync] replSet syncing to: 10.0.0.106:20011<br />
Tue Dec 18 20:04:15 got kill or ctrl c or hup signal 15 (Terminated), will terminate after current cmd ends<br />
Tue Dec 18 20:04:15 [interruptThread] now exiting<br />
Tue Dec 18 20:04:15 dbexit:<br />
Tue Dec 18 20:04:15 [interruptThread] shutdown: going to close listening sockets…<br />
Tue Dec 18 20:04:15 [interruptThread] closing listening socket: 7<br />
Tue Dec 18 20:04:15 [interruptThread] closing listening socket: 8<br />
Tue Dec 18 20:04:15 [interruptThread] closing listening socket: 9<br />
Tue Dec 18 20:04:15 [interruptThread] removing socket file: /tmp/mongodb-20011.sock<br />
Tue Dec 18 20:04:15 [interruptThread] shutdown: going to flush diaglog…<br />
Tue Dec 18 20:04:15 [interruptThread] shutdown: going to close sockets…<br />
Tue Dec 18 20:04:15 [interruptThread] shutdown: waiting for fs preallocator…<br />
Tue Dec 18 20:04:15 [interruptThread] shutdown: lock for final commit…<br />
Tue Dec 18 20:04:15 [interruptThread] shutdown: final commit…<br />
Tue Dec 18 20:04:15 [interruptThread] shutdown: closing all files…<br />
Tue Dec 18 20:04:15 [rsSync] Socket recv() errno:9 Bad file descriptor 10.0.0.106:20011<br />
Tue Dec 18 20:04:15 [rsSync] SocketException: remote: 10.0.0.106:20011 error: 9001 socket exception [1] server [10.0.0.106:20011]<br />
Tue Dec 18 20:04:15 ERROR: Client::shutdown not called: slaveTracking<br />
Tue Dec 18 20:04:15 [rsSync] replSet syncThread: 10278 dbclient error communicating with server: 10.0.0.106:20011<br />
25/491 5%<br />
392/491 79%<br />
Tue Dec 18 20:04:19 [interruptThread] closeAllFiles() finished<br />
Tue Dec 18 20:04:19 [interruptThread] journalCleanup…<br />
Tue Dec 18 20:04:19 [interruptThread] removeJournalFiles<br />
Tue Dec 18 20:04:19 [interruptThread] shutdown: removing fs lock…<br />
Tue Dec 18 20:04:19 dbexit: really exiting now<br />
***** SERVER RESTARTED *****<br />
Tue Dec 18 20:19:24 [initandlisten] MongoDB starting : pid=10105 port=20011 dbpath=/export/data/mongodb_data/ 64-bit host=.com<br />
Tue Dec 18 20:19:24 [initandlisten] db version v2.0.4, pdfile version 4.5<br />
Tue Dec 18 20:19:24 [initandlisten] git version: 329f3c47fe8136c03392c8f0e548506cb21f8ebf<br />
Tue Dec 18 20:19:24 [initandlisten] build info: <a href=”http://www.ttlsa.com/linux/”>Linux</a> ip-10-110-9-236 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41<br />
Tue Dec 18 20:19:24 [initandlisten] options: { dbpath: “/export/data/mongodb_data/”, directoryperdb: true, fork: true, keyFile: “/export/data/key/key”, logappend: true, logpath: “/export/data/logs/rs.log”, maxConns: 12000, port: 20011, replSet: “poptask”, rest: true }<br />
Tue Dec 18 20:19:24 [initandlisten] journal dir=/export/data/mongodb_data/journal<br />
Tue Dec 18 20:19:24 [initandlisten] recover : no journal files present, no recovery needed<br />
Tue Dec 18 20:19:24 [initandlisten] preallocateIsFaster=true 4.48<br />
Tue Dec 18 20:19:25 [initandlisten] preallocateIsFaster=true 8.12<br />
Tue Dec 18 20:19:26 [initandlisten] preallocateIsFaster=true 7.96<br />
Tue Dec 18 20:19:26 [initandlisten] preallocating a journal file /export/data/mongodb_data/journal/prealloc.0<br />
Tue Dec 18 20:19:30 [initandlisten] preallocating a journal file /export/data/mongodb_data/journal/prealloc.1<br />
Tue Dec 18 20:19:34 [initandlisten] preallocating a journal file /export/data/mongodb_data/journal/prealloc.2<br />
Tue Dec 18 20:19:37 [rsStart] replSet can’t get local.system.replset config from self or any seed (EMPTYCONFIG)<br />
Tue Dec 18 20:19:37 [rsStart] replSet info you may need to run replSetInitiate — rs.initiate() in the <a href=”http://www.ttlsa.com/shell/”>shell</a> — if that is not already done<br />
Tue Dec 18 20:19:47 [rsStart] trying to contact 10.0.0.105:20012<br />
Tue Dec 18 20:19:47 [rsStart] trying to contact 10.0.0.106:20011<br />
Tue Dec 18 20:19:47 [rsStart] trying to contact 10.0.0.106:20012<br />
Tue Dec 18 20:19:47 [rsStart] trying to contact 10.0.0.107:20011<br />
Tue Dec 18 20:19:47 [rsStart] trying to contact 10.0.0.107:20012<br />
Tue Dec 18 20:19:47 [rsStart] replSet got config version 4 from a remote, saving locally<br />
Tue Dec 18 20:19:47 [rsStart] replSet info saving a newer config version to local.system.replset<br />
…<br />
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.107:20011 is up<br />
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.106:20011 is up<br />
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.105:20012 is up<br />
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.107:20012 is up<br />
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.106:20011 is now in state PRIMARY<br />
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.105:20012 is now in state ARBITER<br />
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.107:20012 is now in state ARBITER<br />
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.106:20012 is up<br />
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.107:20011 is now in state SECONDARY<br />
Tue Dec 18 20:19:49 [rsHealthPoll] replSet member 10.0.0.106:20012 is now in state ARBITER</p>
<p>Tue Dec 18 20:22:56 [FileAllocator] done allocating datafile /export/data/mongodb_data/local/local.26, size: 2047MB, took 7.54 secs<br />
Tue Dec 18 20:22:58 [rsSync] ******<br />
Tue Dec 18 20:22:58 [rsSync] replSet initial sync pending<br />
Tue Dec 18 20:22:58 [rsSync] replSet syncing to: 10.0.0.107:20011<br />
Tue Dec 18 20:22:58 [rsSync] build index local.me { _id: 1 }<br />
Tue Dec 18 20:22:58 [rsSync] build index done 0 records 0 secs<br />
Tue Dec 18 20:22:58 [rsSync] replSet initial sync drop all databases<br />
Tue Dec 18 20:22:58 [rsSync] dropAllDatabasesExceptLocal 1<br />
Tue Dec 18 20:22:58 [rsSync] replSet initial sync clone all databases<br />
Tue Dec 18 20:22:58 [rsSync] replSet initial sync cloning db: admin</p>
<p>Tue Dec 18 23:00:36 [rsSync] 643882501 objects cloned so far from collection poptask_mg.TASK_SUCCESS<br />
Tue Dec 18 23:00:48 [rsSync] build index poptask_mg.TASK_SUCCESS { _id: 1 }<br />
Tue Dec 18 23:00:52 [clientcursormon] mem (MB) res:58149 virt:828136 mapped:413735<br />
2137000/644712203 0%<br />
4484800/644712203 0%<br />
6770300/644712203 1%<br />
…<br />
643722400/644712203 99%<br />
Tue Dec 18 23:48:50 [rsSync] external sort used : 645 files in 2881 secs<br />
116300/644712203 0%<br />
…<br />
643308900/644712203 99%<br />
Wed Dec 19 05:42:30 [rsSync] done building bottom layer, going to commit<br />
Wed Dec 19 05:45:33 [rsSync] build index done 644712203 records 24284.9 secs<br />
Wed Dec 19 05:45:35 [rsSync] build index poptask_mg.TASK_EXPIRED { _id: 1 }<br />
Wed Dec 19 05:45:35 [rsSync] build index done 8 records 0.03 secs<br />
Wed Dec 19 05:45:41 [rsSync] build index poptask_mg.TASK_FAILURE { _id: 1 }<br />
Wed Dec 19 05:45:42 [rsSync] build index done 115927 records 0.62 secs<br />
Wed Dec 19 05:45:42 [rsSync] build index poptask_mg.TASK_TO_DO { _id: 1 }<br />
Wed Dec 19 05:45:42 [rsSync] build index done 0 records 0.01 secs<br />
Wed Dec 19 05:46:35 [clientcursormon] mem (MB) res:25750 virt:869184 mapped:434205<br />
Wed Dec 19 05:46:42 [rsSync] clone poptask_mg.TASK_SUCCESS201211 3464575<br />
Wed Dec 19 05:46:43 [rsSync] 3530351 objects cloned so far from collection poptask_mg.TASK_SUCCESS201211</p>
<p>Wed Dec 19 07:31:46 [rsSync] info: indexing in foreground on this replica; was a background index build on the primary<br />
Wed Dec 19 07:31:46 [rsSync] build index poptask_mg.TASK_SUCCESS { taskId: 1.0 }<br />
Wed Dec 19 07:31:49 [clientcursormon] mem (MB) res:53381 virt:1131366 mapped:565213<br />
2164200/644712203 0%<br />
…<br />
643273700/644712203 99%<br />
Wed Dec 19 08:23:09 [rsSync] external sort used : 645 files in 3083 secs</p>
<p>Wed Dec 19 13:25:07 [rsSync] done building bottom layer, going to commit<br />
Wed Dec 19 13:25:22 [rsSync] old journal file will be removed: /export/data/mongodb_data/journal/j._115<br />
Wed Dec 19 13:25:22 [rsSync] old journal file will be removed: /export/data/mongodb_data/journal/j._116<br />
Wed Dec 19 13:26:25 [rsSync] DR101 latency warning on journal file open 659ms<br />
Wed Dec 19 13:27:44 [rsSync] build index done 644712203 records 21357.6 secs<br />
Wed Dec 19 13:27:44 [rsSync] info: indexing in foreground on this replica; was a background index build on the primary<br />
Wed Dec 19 13:27:44 [rsSync] build index poptask_mg.TASK_SUCCESS { externalTaskId: 1.0 }<br />
1451500/644712203 0%<br />
…<br />
644000000/644712203 99%<br />
Wed Dec 19 14:34:52 [rsSync] external sort used : 645 files in 4027 secs<br />
96200/644712203 0%<br />
226100/644712203 0%<br />
…<br />
643000000/644712203 99%<br />
Thu Dec 20 02:05:41 [rsSync] external sort used : 645 files in 3595 secs<br />
169700/644712203 0%<br />
…<br />
643867200/644712203 99%<br />
Thu Dec 20 06:44:39 [rsSync] done building bottom layer, going to commit</p>
<p>Thu Dec 20 06:46:25 [rsSync] DR101 latency warning on journal file open 1695ms</p>
<p>Thu Dec 20 06:47:11 [rsSync] build index done 644712203 records 20485.7 secs<br />
Thu Dec 20 06:47:11 [rsSync] info: indexing in foreground on this replica; was a background index build on the primary<br />
…<br />
Thu Dec 20 06:47:16 [rsSync] build index poptask_mg.TASK_SUCCESS201211 { taskId: 1.0 }<br />
1271200/33636123 3%<br />
…<br />
Thu Dec 20 09:49:13 [rsSync] done building bottom layer, going to commit<br />
Thu Dec 20 09:49:17 [rsSync] old journal file will be removed: /export/data/mongodb_data/journal/j._159<br />
Thu Dec 20 09:49:38 [rsSync] old journal file will be removed: /export/data/mongodb_data/journal/j._160<br />
Thu Dec 20 09:50:01 [rsSync] build index done 197750701 records 2770.61 secs<br />
Thu Dec 20 09:50:01 [rsSync] replSet initial sync query minValid<br />
Thu Dec 20 09:50:12 [rsSync] replSet initial oplog application from 10.0.0.107:20011 starting at Dec 18 20:22:52:11 to Dec 20 09:49:55:d2<br />
Thu Dec 20 09:50:29 [rsSync] replSet initialSyncOplogApplication applied 84000 operations, synced to Dec 18 20:35:44:1c<br />
Thu Dec 20 09:50:40 [rsSync] replSet initialSyncOplogApplication applied 219000 operations, synced to Dec 18 20:59:46:8<br />
…<br />
Thu Dec 20 10:18:29 [rsSync] replSet initialSyncOplogApplication applied 13481000 operations, synced to Dec 20 00:58:42:123<br />
Thu Dec 20 10:18:35 [rsSync] Socket recv() errno:104 Connection reset by peer 10.0.0.107:20011<br />
Thu Dec 20 10:18:35 [rsSync] SocketException: remote: 10.0.0.107:20011 error: 9001 socket exception [1] server [10.0.0.107:20011]<br />
Thu Dec 20 10:18:35 [rsSync] replSet initial sync failing, error applying oplog 10278 dbclient error communicating with server: 10.0.0.107:20011<br />
Thu Dec 20 10:18:35 [rsSync] Socket flush send() errno:32 Broken pipe 10.0.0.107:20011<br />
Thu Dec 20 10:18:35 [rsSync] caught exception (socket exception) in destructor (~PiggyBackData)<br />
Thu Dec 20 10:18:35 [rsSync] replSet initial sync failed during applyoplog<br />
Thu Dec 20 10:18:35 [rsSync] replSet cleaning up [1]<br />
Thu Dec 20 10:18:35 [rsSync] replSet cleaning up [2]<br />
Thu Dec 20 10:18:36 [rsHealthPoll] DBClientCursor::init call() failed<br />
Thu Dec 20 10:18:36 [rsHealthPoll] replSet info 10.0.0.107:20011 is down (or slow to respond): DBClientBase::findN: transport error: 10.0.0.107:20011 query: { replSetHeartbeat: “poptask”, v: 4, pv: 1, checkEmpty: false, from: “10.0.0.105:20011” }Thu Dec 20 10:18:36 [rsHealthPoll] replSet member 10.0.0.107:20011 is now in state DOWN<br />
Thu Dec 20 10:18:41 [rsSync] replSet initial sync pending<br />
Thu Dec 20 10:18:41 [rsSync] replSet syncing to: 10.0.0.106:20011<br />
Thu Dec 20 10:18:41 [rsSync] replSet initial sync drop all databases<br />
Thu Dec 20 10:18:41 [rsSync] dropAllDatabasesExceptLocal 3<br />
Thu Dec 20 10:18:41 [rsSync] removeJournalFiles<br />
Thu Dec 20 10:18:42 [rsSync] removeJournalFiles<br />
Thu Dec 20 10:18:43 [conn22750] warning: virtual size (480327MB) – mapped size (471258MB) is large (9003MB). could indicate a memory leak
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
|
982801
exception
DOWN
exception
DOWN
exception
DOWN
exception
DOWN
exception
DOWN
up
ARBITER
up
SECONDARY
up
ARBITER
up
SECONDARY
up
ARBITER
PRIMARY
20011
ends
exiting
:
.
7
8
9
20011.sock
.
.
.
.
.
.
20011
]
slaveTracking
20011
%
%
finished
.
removeJournalFiles
.
now
*
com
4.5
329f3c47fe8136c03392c8f0e548506cb21f8ebf
#1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
}
journal
needed
4.48
8.12
7.96
0
1
2
)
done
20012
20011
20012
20011
20012
locally
replset
.
up
up
up
up
PRIMARY
ARBITER
ARBITER
up
SECONDARY
ARBITER
secs
*
pending
20011
}
secs
databases
1
databases
admin
TASK_SUCCESS
}
413735
%
%
%
.
%
secs
%
.
%
commit
secs
}
secs
}
secs
}
secs
434205
3464575
TASK_SUCCESS201211
primary
}
565213
%
.
%
secs
commit
_115
_116
659ms
secs
primary
}
%
.
%
secs
%
%
.
%
secs
%
.
%
commit
1695ms
secs
primary
.
}
%
.
commit
_159
_160
secs
minValid
d2
1c
8
.
123
20011
]
20011
20011
)
applyoplog
]
]
failed
DOWN
pending
20011
databases
3
removeJournalFiles
removeJournalFiles
leak
|
107重做同步完毕
Sat Dec 22 02:00:07 [rsSync] replSet initialSyncOplogApplication applied 21693000 operations, synced to Dec 22 01:08:12:e<br />
Sat Dec 22 02:00:15 [rsSync] replSet initial sync finishing up<br />
Sat Dec 22 02:00:15 [rsSync] replSet set minValid=50d49943:20<br />
Sat Dec 22 02:00:15 [rsSync] build index local.replset.minvalid { _id: 1 }<br />
Sat Dec 22 02:00:15 [rsSync] build index done 0 records 0.021 secs<br />
Sat Dec 22 02:00:15 [rsSync] replSet initial sync done<br />
Sat Dec 22 02:00:16 [rsSync] replSet syncing to: 10.0.0.106:20011<br />
Sat Dec 22 02:00:16 [rsSync] replSet SECONDARY
1
2
3
4
5
6
7
8
|
e
up
20
}
secs
done
20011
SECONDARY
|
107
[root@107 ~]# sync
[root@107 ~]# free -m
total used free shared buffers cached
Mem: 64454 64287 167 0 305 59896
-/+ buffers/cache: 4085 60369
Swap: 16386 560 15825
[root@107 ~]# echo 3 >/proc/sys/vm/drop_caches
[root@107 ~]# free -m
total used free shared buffers cached
Mem: 64454 56087 8367 0 1 52068
-/+ buffers/cache: 4017 60437
Swap: 16386 560 15825
[root@107 ~]# free -g
total used free shared buffers cached
Mem: 62 54 8 0 0 50
-/+ buffers/cache: 3 59
Swap: 16 0 15
numactl –interleave=all /export/servers/mongodb/bin/mongod –replSet poptask –keyFile /export/data/key/key –port 20011 –dbpath /export/data/mongodb_data/ –logpath /export/data/logs/rs.log –logappend –rest –directoryperdb –maxConns 12000 –fork
root 11095 9799 0 10:51 pts/0 00:00:00 grep mongo
numactl –interleave=all /export/servers/mongodb/bin/mongod -f /export/servers/mongodb/etc/arbiter.cnf
[root@107 ~]#init 6
[root@107 ~]# free -m
total used free shared buffers cached
Mem: 64454 430 64024 0 13 251
-/+ buffers/cache: 166 64288
Swap: 16386 0 16386
107重启后(此时105mongod是停掉的,arbiter启动状态),主节点106降级了,应用不能写了,这算是一个事故。
降级原因,整个副本集一半成员都是down掉的。
106:
Mon Dec 24 10:52:30 [rsHealthPoll] DBClientCursor::init call() failed<br />
Mon Dec 24 10:52:30 [rsHealthPoll] replSet info 10.0.0.107:20011 is down (or slow to respond): DBClientBase::findN: transport error: 10.0.0.107:20011 query: { replSetHeartbeat: “poptask”,<br />
v: 4, pv: 1, checkEmpty: false, from: “10.0.0.106:20011” }<br />
Mon Dec 24 10:52:30 [rsHealthPoll] replSet member 10.0.0.107:20011 is now in state DOWN<br />
Mon Dec 24 10:52:37 [rsHealthPoll] DBClientCursor::init call() failed<br />
Mon Dec 24 10:52:37 [rsHealthPoll] replSet info 10.0.0.107:20012 is down (or slow to respond): DBClientBase::findN: transport error: 10.0.0.107:20012 query: { replSetHeartbeat: “poptask”,<br />
v: 4, pv: 1, checkEmpty: false, from: “10.0.0.106:20011” }<br />
Mon Dec 24 10:52:37 [rsHealthPoll] replSet member 10.0.0.107:20012 is now in state DOWN<br />
Mon Dec 24 10:52:57 [rsMgr] can’t see a majority of the set, relinquishing primary<br />
Mon Dec 24 10:52:57 [rsMgr] replSet relinquishing primary state<br />
Mon Dec 24 10:52:57 [rsMgr] replSet SECONDARY<br />
Mon Dec 24 10:52:57 [rsMgr] replSet closing client sockets after reqlinquishing primary<br />
Mon Dec 24 10:52:57 [rsHealthPoll] replSet info 10.0.0.106:20012 is down (or slow to respond): socket exception<br />
Mon Dec 24 10:52:57 [rsHealthPoll] replSet member 10.0.0.106:20012 is now in state DOWN<br />
Mon Dec 24 10:52:57 [rsMgr] replSet can’t see a majority, will not try to elect self<br />
Mon Dec 24 10:52:59 [rsHealthPoll] replSet info 10.0.0.105:20012 is down (or slow to respond): socket exception<br />
Mon Dec 24 10:52:59 [rsHealthPoll] replSet member 10.0.0.105:20012 is now in state DOWN<br />
Mon Dec 24 10:52:59 [rsHealthPoll] replSet member 10.0.0.106:20012 is up<br />
Mon Dec 24 10:52:59 [rsHealthPoll] replSet member 10.0.0.106:20012 is now in state ARBITER<br />
Mon Dec 24 10:53:01 [rsHealthPoll] replSet member 10.0.0.105:20012 is up<br />
Mon Dec 24 10:53:01 [rsHealthPoll] replSet member 10.0.0.105:20012 is now in state ARBITER<br />
Mon Dec 24 10:56:08 [clientcursormon] mem (MB) res:60093 virt:1373714 mapped:685986<br />
Mon Dec 24 10:57:53 [rsHealthPoll] replSet member 10.0.0.107:20011 is up<br />
Mon Dec 24 10:57:53 [rsHealthPoll] replSet member 10.0.0.107:20011 is now in state SECONDARY<br />
Mon Dec 24 10:57:53 [rsMgr] not electing self, 10.0.0.105:20012 would veto<br />
Mon Dec 24 10:57:58 [rsMgr] replSet info electSelf 1<br />
Mon Dec 24 10:57:58 [rsMgr] replSet PRIMARY<br />
Mon Dec 24 10:58:45 [rsHealthPoll] DBClientCursor::init call() failed<br />
Mon Dec 24 10:58:45 [rsHealthPoll] replSet info 10.0.0.107:20011 is down (or slow to respond): DBClientBase::findN: transport error: 10.0.0.107:20011 query: { replSetHeartbeat: “poptask”,<br />
v: 4, pv: 1, checkEmpty: false, from: “10.0.0.106:20011” }<br />
Mon Dec 24 10:58:45 [rsHealthPoll] replSet member 10.0.0.107:20011 is now in state DOWN<br />
Mon Dec 24 10:59:05 [rsMgr] can’t see a majority of the set, relinquishing primary<br />
Mon Dec 24 10:59:05 [rsMgr] replSet relinquishing primary state<br />
Mon Dec 24 10:59:05 [rsMgr] replSet SECONDARY<br />
Mon Dec 24 10:59:05 [rsMgr] replSet closing client sockets after reqlinquishing primary<br />
Mon Dec 24 10:59:05 [rsHealthPoll] replSet info 10.0.0.105:20012 is down (or slow to respond): socket exception<br />
Mon Dec 24 10:59:05 [rsHealthPoll] replSet member 10.0.0.105:20012 is now in state DOWN<br />
Mon Dec 24 10:59:06 [rsHealthPoll] replSet info 10.0.0.106:20012 is down (or slow to respond): socket exception<br />
Mon Dec 24 10:59:06 [rsHealthPoll] replSet member 10.0.0.106:20012 is now in state DOWN<br />
Mon Dec 24 10:59:07 [rsHealthPoll] replSet member 10.0.0.105:20012 is up<br />
Mon Dec 24 10:59:07 [rsHealthPoll] replSet member 10.0.0.105:20012 is now in state ARBITER<br />
Mon Dec 24 10:59:08 [rsHealthPoll] replSet member 10.0.0.106:20012 is up<br />
Mon Dec 24 10:59:08 [rsHealthPoll] replSet member 10.0.0.106:20012 is now in state ARBITER<br />
Mon Dec 24 10:59:41 [rsHealthPoll] replSet member 10.0.0.107:20011 is up<br />
Mon Dec 24 10:59:41 [rsHealthPoll] replSet member 10.0.0.107:20011 is now in state SECONDARY<br />
Mon Dec 24 10:59:41 [rsMgr] not electing self, 10.0.0.105:20012 would veto<br />
Mon Dec 24 10:59:46 [rsMgr] replSet info electSelf 1<br />
Mon Dec 24 10:59:46 [rsMgr] replSet PRIMARY<br />
Mon Dec 24 11:01:08 [clientcursormon] mem (MB) res:60108 virt:1373684 mapped:685986<br />
Mon Dec 24 11:01:14 [rsHealthPoll] replSet member 10.0.0.107:20012 is up<br />
Mon Dec 24 11:01:14 [rsHealthPoll] replSet member 10.0.0.107:20012 is now in state STARTUP2<br />
Mon Dec 24 11:01:16 [rsHealthPoll] replSet member 10.0.0.107:20012 is now in state ARBITER<br />
Mon Dec 24 11:06:08 [clientcursormon] mem (MB) res:60132 virt:1373462 mapped:685986
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
|
failed
,
}
DOWN
failed
,
}
DOWN
‘t see a majority of the set, relinquishing primary
Mon Dec 24 10:52:57 [rsMgr] replSet relinquishing primary state
Mon Dec 24 10:52:57 [rsMgr] replSet SECONDARY
Mon Dec 24 10:52:57 [rsMgr] replSet closing client sockets after reqlinquishing primary
Mon Dec 24 10:52:57 [rsHealthPoll] replSet info 10.0.0.106:20012 is down (or slow to respond): socket exception
Mon Dec 24 10:52:57 [rsHealthPoll] replSet member 10.0.0.106:20012 is now in state DOWN
self
exception
DOWN
up
ARBITER
up
ARBITER
685986
up
SECONDARY
veto
1
PRIMARY
failed
,
}
DOWN
primary
state
SECONDARY
primary
exception
DOWN
exception
DOWN
up
ARBITER
up
ARBITER
up
SECONDARY
veto
1
PRIMARY
685986
up
STARTUP2
ARBITER
685986
|
105升级操作系统后
***** SERVER RESTARTED *****<br />
Mon Dec 24 16:15:23 permissions on /export/data/mongodb/key/key are too open<br />
Mon Dec 24 16:15:23 dbexit:<br />
Mon Dec 24 16:15:23 shutdown: going to close listening sockets…<br />
Mon Dec 24 16:15:23 shutdown: going to flush diaglog…<br />
Mon Dec 24 16:15:23 shutdown: going to close sockets…<br />
Mon Dec 24 16:15:23 shutdown: waiting for fs preallocator…<br />
Mon Dec 24 16:15:23 shutdown: lock for final commit…<br />
Mon Dec 24 16:15:23 shutdown: final commit…<br />
Mon Dec 24 16:15:23 shutdown: closing all files…<br />
Mon Dec 24 16:15:23 closeAllFiles() finished<br />
Mon Dec 24 16:15:23 dbexit: really exiting now
1
2
3
4
5
6
7
8
9
10
11
12
|
*
open
:
.
.
.
.
.
.
.
finished
now
|
http://duoyun.org/topic/51ceaffa0acf7bac02001f4b