
Рамиль Абдулбяров, 2015-04-29 18:26

h1. Removing a failed disk from an array

h1. Collecting information about the disk

http://serverfault.com/questions/381177/megacli-get-the-dev-sd-device-name-for-a-logical-drive

We need the 'Target Id' from the output of
*megacli -ldinfo -Lall -aall*
<pre>
Virtual Drive: 5 (Target Id: 5)
Name                :r0-2-ssd
</pre>

After installing lshw, compare the 'Target Id' with the 'bus info' field. 'bus info' has the form scsi@host:channel.target.lun, so here the target is 5:
<pre>
bus info: scsi@0:2.5.0
logical name: /dev/sdf
</pre>
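The same mapping can be scripted. The following is a minimal sketch and not part of the original procedure. It assumes a single controller, so only the target number in scsi@host:channel.target.lun is compared. It also assumes the disk list comes from lshw -class disk. The function names ld_target_id and dev_for_target are hypothetical, and both functions read the tool output from stdin.

```shell
# Hypothetical helpers: map a MegaRAID virtual drive name to its Target Id,
# then to the /dev/sdX whose lshw "bus info" carries the same target number.
# Assumes one controller: host and channel numbers are ignored.

# Print the Target Id of the virtual drive named $1.
# stdin: output of "megacli -ldinfo -Lall -aall"
ld_target_id() {
    awk -v want="$1" '
        /^Virtual Drive:/ {
            # "Virtual Drive: 5 (Target Id: 5)" -> last field "5)" -> "5"
            tid = $NF
            sub(/\)$/, "", tid)
        }
        /^Name[ \t]*:/ {
            name = $0
            sub(/^Name[ \t]*:[ \t]*/, "", name)   # drop the "Name   :" label
            sub(/[ \t\r]+$/, "", name)            # drop trailing blanks
            if (name == want) print tid
        }'
}

# Print the logical name of the disk whose SCSI target equals $1.
# stdin: output of "lshw -class disk" (assumed invocation)
dev_for_target() {
    awk -v want="$1" '
        /bus info: scsi@/ {
            # scsi@0:2.5.0 = host 0, channel 2, target 5, lun 0
            split($NF, a, /[@:.]/)
            tgt = a[4]
        }
        /logical name:/ && tgt != "" && tgt == want {
            print $NF
            tgt = ""        # report only the first logical name per disk
        }'
}
```

Usage: *megacli -ldinfo -Lall -aall | ld_target_id r0-2-ssd*, then *lshw -class disk | dev_for_target 5*.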

Find out which logical volume is on this disk:
*lvs -o +seg_pe_ranges | grep /dev/sdf*
<pre>
ssd-kvm321-chi-slave-db       ssd   -wi-ao-- 400.00g      /dev/sdf:2560-104959
</pre>
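The grep above would also match /dev/sdf1 or /dev/sdfa. The following is a stricter variant and a sketch only. It assumes lvs is asked for exactly the lv_name, vg_name and seg_pe_ranges columns with --noheadings, and it compares the device part of each PE range exactly. lvs_on_dev is a hypothetical name.

```shell
# Hypothetical helper: print "VG/LV range" for every LV segment on device $1.
# stdin: output of "lvs --noheadings -o lv_name,vg_name,seg_pe_ranges"
lvs_on_dev() {
    awk -v dev="$1" '{
        # Fields 3..NF are PE ranges such as "/dev/sdf:2560-104959"
        for (i = 3; i <= NF; i++) {
            split($i, p, ":")
            if (p[1] == dev) {
                print $2 "/" $1 " " $i
                break
            }
        }
    }'
}
```

Usage: *lvs --noheadings -o lv_name,vg_name,seg_pe_ranges | lvs_on_dev /dev/sdf*. If the PV is a partition, pass the partition (for example /dev/sdf1) instead.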

Collect details on "Other Error Count: 1" from the controller event log:
*megacli -AdpEventLog -GetEvents -f megacli.log -a0*

From megacli.log:
<pre>
===========
Device ID: 15
Enclosure Index: 32
Slot Number: 15
Error: 3

seqNum: 0x00000f9a
Time: Sun Sep  7 15:54:17 2014

Code: 0x00000071
Class: 0
Locale: 0x02
Event Description: Unexpected sense: PD 0f(e0x20/s15) Path 500056b36789abdc, CDB: 2a 00 17 6e 08 00 00 00 80 00, Sense: 6/29/00
Event Data:
===========
</pre>
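megacli.log contains every event the adapter has recorded, so it can help to keep only the entries for the suspect drive. This is a minimal sketch. It assumes the log layout shown above: blocks are separated by lines of "=", and the drive is identified by "Enclosure Index" and "Slot Number". events_for_slot is a hypothetical name.

```shell
# Hypothetical helper: print the event blocks for enclosure $1, slot $2.
# $3: log written by "megacli -AdpEventLog -GetEvents -f megacli.log -a0";
# when omitted, the log is read from stdin ("-").
events_for_slot() {
    awk -v enc="$1" -v slot="$2" '
        function flush() {
            if (hit_enc && hit_slot) printf "%s", block
            block = ""; hit_enc = 0; hit_slot = 0
        }
        /^=+[ \t\r]*$/ { flush(); next }          # block separator
        {
            block = block $0 "\n"
            if ($0 ~ ("^Enclosure Index: " enc "[ \t\r]*$")) hit_enc = 1
            if ($0 ~ ("^Slot Number: " slot "[ \t\r]*$")) hit_slot = 1
        }
        END { flush() }' "${3:--}"
}
```

Usage: *events_for_slot 32 15 megacli.log*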

<pre>
Slot Number: 15
Serial number: OCZ-6R12G0UG3MU5KHK2OCZ-VERTEX460
SCSI WWN: 5e83a970e3f8ae05
</pre>

h3. Find which virtual drive the failed physical disk belongs to

*megacli -LdPdInfo -a0 -nolog*
<pre>
Virtual Drive: 5 (Target Id: 5)
Name                :r0-2-ssd
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
Size                : 446.625 GB
Sector Size         : 512
Parity Size         : 0
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 1
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No
Number of Spans: 1
Span: 0 - Number of PDs: 1

PD: 0 Information
Enclosure Device ID: 32
Slot Number: 15
Drive's position: DiskGroup: 5, Span: 0, Arm: 0
Enclosure position: 1
Device Id: 15
WWN: 5e83a970e3f8ae05
Sequence Number: 2
Media Error Count: 0
Other Error Count: 3
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 447.130 GB [0x37e436b0 Sectors]
Non Coerced Size: 446.630 GB [0x37d436b0 Sectors]
Coerced Size: 446.625 GB [0x37d40000 Sectors]
Sector Size:  0
Firmware state: Online, Spun Up
Device Firmware Level: 1.0
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x500056b36789abdc
Connected Port Number: 0(path0)
Inquiry Data: OCZ-6R12G0UG3MU5KHK2OCZ-VERTEX460                           1.0
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Solid State Device
Drive:  Not Certified
Drive Temperature : N/A
PI Eligibility:  No
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No
</pre>
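When there are several virtual drives, this output gets long. The following is a sketch of a hypothetical pd_error_summary helper that reduces -LdPdInfo output to one line per physical disk, which makes drives with non-zero error counters easy to spot. It assumes the field labels and their order are exactly as printed above.

```shell
# Hypothetical helper: one summary line per physical disk.
# stdin: output of "megacli -LdPdInfo -a0 -nolog"
# Assumes each PD section lists Enclosure Device ID, Slot Number and the
# error counters in the order shown above, ending with Predictive Failure Count.
pd_error_summary() {
    awk -F': *' '
        /^Virtual Drive:/            { vd = $2; sub(/ .*/, "", vd) }
        /^Enclosure Device ID:/      { enc = $2 }
        /^Slot Number:/              { slot = $2 }
        /^Media Error Count:/        { media = $2 }
        /^Other Error Count:/        { other = $2 }
        /^Predictive Failure Count:/ {
            printf "VD %s [%s:%s] media=%s other=%s predictive=%s\n",
                   vd, enc, slot, media, other, $2
        }'
}
```

Usage: *megacli -LdPdInfo -a0 -nolog | pd_error_summary*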

h3. Check the array (virtual drive) name

*megacli -LDGetProp -Name -L5 -a0 -nolog*
<pre>
Adapter 0-VD 5(target id: 5): Name: r0-2-ssd

Exit Code: 0x00
</pre>

h1. Taking the disk out of service

h2. Deleting the array

*megacli -CfgLdDel -L5 -a0 -nolog*
<pre>
Adapter 0: Deleted Virtual Drive-5(target id-5)

Exit Code: 0x00
</pre>
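Deleting the virtual drive makes its data inaccessible, and a typo in the -L number would delete a different, healthy array. The following is a minimal guard that combines the name check and the deletion above. It is a sketch under these assumptions:
* The hypothetical delete_vd_checked function reads the name through -LDGetProp.
* It only prints the deletion command unless DRY_RUN=0.
* MEGACLI (default megacli) can point at MegaCli64 or a similar binary.

Neither variable comes from MegaCLI itself.

```shell
# Hypothetical guard around "megacli -CfgLdDel".
# MEGACLI and DRY_RUN are this sketch's own variables, not MegaCLI settings.

# Extract the name from "Adapter 0-VD 5(target id: 5): Name: r0-2-ssd".
vd_name() {
    sed -n '/: Name:/{s/.*: Name:[[:space:]]*//;s/[[:space:]]*$//;p;}'
}

# $1 = VD number, $2 = adapter number, $3 = expected VD name
delete_vd_checked() {
    megacli=${MEGACLI:-megacli}
    actual=$("$megacli" -LDGetProp -Name -L"$1" -a"$2" -nolog | vd_name)
    if [ "$actual" != "$3" ]; then
        echo "VD $1 on adapter $2 is named '$actual', expected '$3'; not deleting" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: $megacli -CfgLdDel -L$1 -a$2 -nolog"
    else
        "$megacli" -CfgLdDel -L"$1" -a"$2" -nolog
    fi
}
```

Usage: *delete_vd_checked 5 0 r0-2-ssd* prints the command. Run it with DRY_RUN=0 once the output looks right.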

h2. Inspect the physical disk by its Enclosure Device ID and Slot Number, [E:S]

*megacli -pdInfo -PhysDrv [32:15] -a0 -nolog*

<pre>
Enclosure Device ID: 32
Slot Number: 15
Enclosure position: 1
Device Id: 15
WWN: 5e83a970e3f8ae05
Sequence Number: 3
Media Error Count: 0
Other Error Count: 3
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 447.130 GB [0x37e436b0 Sectors]
Non Coerced Size: 446.630 GB [0x37d436b0 Sectors]
Coerced Size: 446.625 GB [0x37d40000 Sectors]
Sector Size:  0
Firmware state: Unconfigured(good), Spun Up
Device Firmware Level: 1.0
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x500056b36789abdc
Connected Port Number: 0(path0)
Inquiry Data: OCZ-6R12G0UG3MU5KHK2OCZ-VERTEX460                           1.0
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Solid State Device
Drive:  Not Certified
Drive Temperature : N/A
PI Eligibility:  No
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No

Exit Code: 0x00
</pre>

http://tech.2499.pl/?p=96

h2. 1. Setting the drive offline (skipped)

The drive is already in the "Firmware state: Unconfigured(good), Spun Up" state, so there is no need to take it offline.
It would be Online only if it still belonged to one of the logical drives, and after the array was deleted above it no longer does. Running the command anyway fails:

*-megacli -PDOffline -PhysDrv [32:15] -a0 -nolog-*
<pre>
Adapter: 0: Failed to change PD state at EnclId-32 SlotId-15.

Exit Code: 0x01
</pre>
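The reasoning of step 1 can be checked against the pdInfo output shown above. The following is a sketch with hypothetical helper names. pd_firmware_state extracts the "Firmware state" value. needs_offline succeeds only for an Online drive, which is the case only while the drive still belongs to a logical drive.

```shell
# Hypothetical helpers; stdin: output of
# "megacli -pdInfo -PhysDrv [E:S] -aN -nolog"

# Print the value of the "Firmware state:" line.
pd_firmware_state() {
    sed -n 's/^Firmware state: *//p'
}

# Exit 0 if the drive is Online (i.e. -PDOffline would apply), 1 otherwise.
needs_offline() {
    case "$(pd_firmware_state)" in
        Online*) return 0 ;;
        *)       return 1 ;;
    esac
}
```

Usage: *megacli -pdInfo -PhysDrv '[32:15]' -a0 -nolog | needs_offline && echo "take it offline first"*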

h2. 2. Marking the drive as missing

*-megacli -PDMarkMissing -PhysDrv [32:15] -a0 -nolog-*
<pre>
Adapter: 0: Failed to change PD state at EnclId-32 SlotId-15.

FW error description:
  The specified device is in a state that doesn't support the requested command.

Exit Code: 0x32
</pre>

That did not work. Apparently this is expected on PERC H710P controllers. The page below notes the same for the R510 H700 card:
"Mark the drive as missing (seems to not work on R510 H700 card, so just do the next step)"
http://www.maths.cam.ac.uk/computing/docs/public/megacli_raid_lsi.html

h2. 3. Preparing the drive for removal

*megacli -PdPrpRmv -PhysDrv [32:15] -a0 -nolog*
<pre>
Prepare for removal Success

Exit Code: 0x00
</pre>

After this step the state changes to: Firmware state: Unconfigured(good), Spun down

h2. 4. Turning on the drive's locate LED

*megacli -PdLocate -start -PhysDrv [32:15] -a0 -nolog*
<pre>
Adapter: 0: Device at EnclId-32 SlotId-15  -- PD Locate Start Command was successfully sent to Firmware

Exit Code: 0x00
</pre>
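Steps 3 and 4 can be run together. The following is a sketch built on a hypothetical run_or_echo wrapper, which only prints commands unless DRY_RUN=0. The [E:S] argument is quoted so the shell does not treat the brackets as a glob pattern. stop_locate uses the -PdLocate -stop form of the same MegaCLI command to switch the LED off once it is no longer needed.

```shell
# Hypothetical wrapper: print the command, or run it when DRY_RUN=0.
run_or_echo() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# $1 = drive address "E:S" (e.g. 32:15), $2 = adapter number
prepare_and_locate() {
    m=${MEGACLI:-megacli}
    # "[$1]" is quoted so the brackets are not expanded as a glob pattern
    run_or_echo "$m" -PdPrpRmv -PhysDrv "[$1]" -a"$2" -nolog || return 1
    run_or_echo "$m" -PdLocate -start -PhysDrv "[$1]" -a"$2" -nolog
}

# Switch the locate LED off again.
stop_locate() {
    run_or_echo "${MEGACLI:-megacli}" -PdLocate -stop -PhysDrv "[$1]" -a"$2" -nolog
}
```

Usage: *prepare_and_locate 32:15 0* shows the two commands. Run it with DRY_RUN=0 to execute them.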

h2. 5. Removing the drive

Physically pull the drive whose locate LED is blinking.

h2. 6. Checking that the right drive was removed

*megacli -pdinfo -physdrv [32:15] -aall -nolog*
<pre>
Adapter 0: Device at Enclosure - 32, Slot - 15 is not found.

Exit Code: 0x00
</pre>
Everything is correct: the controller no longer sees a drive at [32:15].
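The check in step 6 reports Exit Code: 0x00 even though the drive is not found, so a script cannot rely on the exit code alone. The following is a minimal sketch with a hypothetical pd_is_gone helper. It looks for the "is not found" message for the given enclosure and slot instead.

```shell
# Hypothetical check: succeed if pdInfo output (stdin) reports that the
# drive at enclosure $1, slot $2 is no longer present.
# Usage: megacli -pdinfo -physdrv '[32:15]' -aall -nolog | pd_is_gone 32 15
pd_is_gone() {
    grep -q "Enclosure - $1, Slot - $2 is not found"
}
```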