Remove-failed-disk » History » Version 4
  Рамиль Абдулбяров, 2015-04-29 18:26 
  
| 1 | 1 | Рамиль Абдулбяров | h1. Removing a failed disk from an array | 
|---|---|---|---|
| 2 | |||
| 3 | 2 | Рамиль Абдулбяров | h1. Gathering information about the disk | 
| 4 | |||
| 5 | 1 | Рамиль Абдулбяров | http://serverfault.com/questions/381177/megacli-get-the-dev-sd-device-name-for-a-logical-drive | 
| 6 | |||
| 7 | We need the 'Target Id' from the output of: | ||
| 8 | *megacli -ldinfo -Lall -aall* | ||
| 9 | <pre> | ||
| 10 | Virtual Drive: 5 (Target Id: 5) | ||
| 11 | Name :r0-2-ssd | ||
| 12 | </pre> | ||
| 13 | |||
| 14 | With lshw installed, compare the 'Target Id' with 'bus info' (here the target is the 5 in scsi@0:2.5.0): | ||
| 15 | <pre> | ||
| 16 | bus info: scsi@0:2.5.0 | ||
| 17 | logical name: /dev/sdf | ||
| 18 | </pre> | ||
| 19 | |||
| 20 | Check which logical volume is located on this disk: | ||
| 21 | *lvs -o +seg_pe_ranges |grep /dev/sdf* | ||
| 22 | <pre> | ||
| 23 | ssd-kvm321-chi-slave-db ssd -wi-ao-- 400.00g /dev/sdf:2560-104959 | ||
| 24 | </pre> | ||
| 25 | |||
| 26 | Collect details on "Other Error Count: 1". | ||
| 27 | *megacli AdpEventLog -GetEvents -f megacli.log -a0* | ||
| 28 | |||
| 29 | From the megacli.log file: | ||
| 30 | <pre> | ||
| 31 | =========== | ||
| 32 | Device ID: 15 | ||
| 33 | Enclosure Index: 32 | ||
| 34 | Slot Number: 15 | ||
| 35 | Error: 3 | ||
| 36 | |||
| 37 | seqNum: 0x00000f9a | ||
| 38 | Time: Sun Sep 7 15:54:17 2014 | ||
| 39 | |||
| 40 | Code: 0x00000071 | ||
| 41 | Class: 0 | ||
| 42 | Locale: 0x02 | ||
| 43 | Event Description: Unexpected sense: PD 0f(e0x20/s15) Path 500056b36789abdc, CDB: 2a 00 17 6e 08 00 00 00 80 00, Sense: 6/29/00 | ||
| 44 | Event Data: | ||
| 45 | =========== | ||
| 46 | </pre> | ||
| 47 | |||
| 48 | <pre> | ||
| 49 | Slot Number: 15 | ||
| 50 | Serial number: OCZ-6R12G0UG3MU5KHK2OCZ-VERTEX460 | ||
| 51 | SCSI WWN: 5e83a970e3f8ae05 | ||
| 52 | 2 | Рамиль Абдулбяров | </pre> | 
| 53 | |||
| 54 | h3. Find which virtual drive the failed physical disk belongs to | ||
| 55 | |||
| 56 | *megacli -LdPdInfo -a0 -nolog* | ||
| 57 | <pre> | ||
| 58 | Virtual Drive: 5 (Target Id: 5) | ||
| 59 | Name :r0-2-ssd | ||
| 60 | RAID Level : Primary-0, Secondary-0, RAID Level Qualifier-0 | ||
| 61 | Size : 446.625 GB | ||
| 62 | Sector Size : 512 | ||
| 63 | Parity Size : 0 | ||
| 64 | State : Optimal | ||
| 65 | Strip Size : 64 KB | ||
| 66 | Number Of Drives : 1 | ||
| 67 | Span Depth : 1 | ||
| 68 | Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU | ||
| 69 | Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU | ||
| 70 | Default Access Policy: Read/Write | ||
| 71 | Current Access Policy: Read/Write | ||
| 72 | Disk Cache Policy : Disk's Default | ||
| 73 | Encryption Type : None | ||
| 74 | Default Power Savings Policy: Controller Defined | ||
| 75 | Current Power Savings Policy: None | ||
| 76 | Can spin up in 1 minute: No | ||
| 77 | LD has drives that support T10 power conditions: No | ||
| 78 | LD's IO profile supports MAX power savings with cached writes: No | ||
| 79 | Bad Blocks Exist: No | ||
| 80 | Is VD Cached: No | ||
| 81 | Number of Spans: 1 | ||
| 82 | Span: 0 - Number of PDs: 1 | ||
| 83 | |||
| 84 | PD: 0 Information | ||
| 85 | Enclosure Device ID: 32 | ||
| 86 | Slot Number: 15 | ||
| 87 | Drive's position: DiskGroup: 5, Span: 0, Arm: 0 | ||
| 88 | Enclosure position: 1 | ||
| 89 | Device Id: 15 | ||
| 90 | WWN: 5e83a970e3f8ae05 | ||
| 91 | Sequence Number: 2 | ||
| 92 | Media Error Count: 0 | ||
| 93 | Other Error Count: 3 | ||
| 94 | Predictive Failure Count: 0 | ||
| 95 | Last Predictive Failure Event Seq Number: 0 | ||
| 96 | PD Type: SATA | ||
| 97 | |||
| 98 | Raw Size: 447.130 GB [0x37e436b0 Sectors] | ||
| 99 | Non Coerced Size: 446.630 GB [0x37d436b0 Sectors] | ||
| 100 | Coerced Size: 446.625 GB [0x37d40000 Sectors] | ||
| 101 | Sector Size: 0 | ||
| 102 | Firmware state: Online, Spun Up | ||
| 103 | Device Firmware Level: 1.0 | ||
| 104 | Shield Counter: 0 | ||
| 105 | Successful diagnostics completion on : N/A | ||
| 106 | SAS Address(0): 0x500056b36789abdc | ||
| 107 | Connected Port Number: 0(path0) | ||
| 108 | Inquiry Data: OCZ-6R12G0UG3MU5KHK2OCZ-VERTEX460 1.0 | ||
| 109 | FDE Capable: Not Capable | ||
| 110 | FDE Enable: Disable | ||
| 111 | Secured: Unsecured | ||
| 112 | Locked: Unlocked | ||
| 113 | Needs EKM Attention: No | ||
| 114 | Foreign State: None | ||
| 115 | Device Speed: 6.0Gb/s | ||
| 116 | Link Speed: 6.0Gb/s | ||
| 117 | Media Type: Solid State Device | ||
| 118 | Drive: Not Certified | ||
| 119 | Drive Temperature : N/A | ||
| 120 | PI Eligibility: No | ||
| 121 | Drive is formatted for PI information: No | ||
| 122 | PI: No PI | ||
| 123 | Port-0 : | ||
| 124 | Port status: Active | ||
| 125 | Port's Linkspeed: 6.0Gb/s | ||
| 126 | Drive has flagged a S.M.A.R.T alert : No | ||
| 127 | </pre> | ||
| 128 | |||
| 129 | h3. Inspect the array | ||
| 130 | |||
| 131 | *megacli -LDGetProp -Name -L5 -a0 -nolog* | ||
| 132 | <pre> | ||
| 133 | Adapter 0-VD 5(target id: 5): Name: r0-2-ssd | ||
| 134 | |||
| 135 | Exit Code: 0x00 | ||
| 136 | 1 | Рамиль Абдулбяров | </pre> | 
| 137 | 3 | Рамиль Абдулбяров | |
| 138 | h1. Taking the disk out of service | ||
| 139 | |||
| 140 | h2. Dismantle the array | ||
| 141 | |||
| 142 | *megacli -CfgLdDel -L5 -a0 -nolog* | ||
| 143 | <pre> | ||
| 144 | Adapter 0: Deleted Virtual Drive-5(target id-5) | ||
| 145 | |||
| 146 | Exit Code: 0x00 | ||
| 147 | </pre> | ||
| 148 | |||
| 149 | h2. Inspect the physical disk by its Enclosure Device ID and Slot Number - [E:S] | ||
| 150 | |||
| 151 | *megacli -pdInfo -PhysDrv [32:15] -a0 -nolog* | ||
| 152 | |||
| 153 | <pre> | ||
| 154 | Enclosure Device ID: 32 | ||
| 155 | Slot Number: 15 | ||
| 156 | Enclosure position: 1 | ||
| 157 | Device Id: 15 | ||
| 158 | WWN: 5e83a970e3f8ae05 | ||
| 159 | Sequence Number: 3 | ||
| 160 | Media Error Count: 0 | ||
| 161 | Other Error Count: 3 | ||
| 162 | Predictive Failure Count: 0 | ||
| 163 | Last Predictive Failure Event Seq Number: 0 | ||
| 164 | PD Type: SATA | ||
| 165 | |||
| 166 | Raw Size: 447.130 GB [0x37e436b0 Sectors] | ||
| 167 | Non Coerced Size: 446.630 GB [0x37d436b0 Sectors] | ||
| 168 | Coerced Size: 446.625 GB [0x37d40000 Sectors] | ||
| 169 | Sector Size: 0 | ||
| 170 | Firmware state: Unconfigured(good), Spun Up | ||
| 171 | Device Firmware Level: 1.0 | ||
| 172 | Shield Counter: 0 | ||
| 173 | Successful diagnostics completion on : N/A | ||
| 174 | SAS Address(0): 0x500056b36789abdc | ||
| 175 | Connected Port Number: 0(path0) | ||
| 176 | Inquiry Data: OCZ-6R12G0UG3MU5KHK2OCZ-VERTEX460 1.0 | ||
| 177 | FDE Capable: Not Capable | ||
| 178 | FDE Enable: Disable | ||
| 179 | Secured: Unsecured | ||
| 180 | Locked: Unlocked | ||
| 181 | Needs EKM Attention: No | ||
| 182 | Foreign State: None | ||
| 183 | Device Speed: 6.0Gb/s | ||
| 184 | Link Speed: 6.0Gb/s | ||
| 185 | Media Type: Solid State Device | ||
| 186 | Drive: Not Certified | ||
| 187 | Drive Temperature : N/A | ||
| 188 | PI Eligibility: No | ||
| 189 | Drive is formatted for PI information: No | ||
| 190 | PI: No PI | ||
| 191 | Port-0 : | ||
| 192 | Port status: Active | ||
| 193 | Port's Linkspeed: 6.0Gb/s | ||
| 194 | Drive has flagged a S.M.A.R.T alert : No | ||
| 195 | |||
| 196 | Exit Code: 0x00 | ||
| 197 | </pre> | ||
| 198 | 4 | Рамиль Абдулбяров | |
| 199 | http://tech.2499.pl/?p=96 | ||
| 200 | |||
| 201 | h2. 1. Set the disk offline (skipped) | ||
| 202 | |||
| 203 | Since the disk is in the state "Firmware state: Unconfigured(good), Spun Up", there is no need to set it offline. | ||
| 204 | It would be Online if it belonged to one of the logical drives. | ||
| 205 | |||
| 206 | *-megacli -PDOffline -PhysDrv [32:15] -a0 -nolog-* | ||
| 207 | <pre> | ||
| 208 | Adapter: 0: Failed to change PD state at EnclId-32 SlotId-15. | ||
| 209 | |||
| 210 | Exit Code: 0x01 | ||
| 211 | </pre> | ||
| 212 | |||
| 213 | h2. 2. Mark the disk as missing | ||
| 214 | |||
| 215 | *-megacli -PDMarkMissing -PhysDrv [32:15] -a0 -nolog-* | ||
| 216 | <pre> | ||
| 217 | Adapter: 0: Failed to change PD state at EnclId-32 SlotId-15. | ||
| 218 | |||
| 219 | FW error description: | ||
| 220 | The specified device is in a state that doesn't support the requested command. | ||
| 221 | |||
| 222 | Exit Code: 0x32 | ||
| 223 | </pre> | ||
| 224 | |||
| 225 | This did not work. Apparently that is expected on PERC H710P controllers; compare the note from the page linked below: | ||
| 226 | Mark the drive as missing (seems to not work on R510 H700 card, so just do the next step) | ||
| 227 | http://www.maths.cam.ac.uk/computing/docs/public/megacli_raid_lsi.html | ||
| 228 | |||
| 229 | h2. 3. Prepare the disk for removal | ||
| 230 | |||
| 231 | *megacli -PdPrpRmv -PhysDrv [32:15] -a0 -nolog* | ||
| 232 | <pre> | ||
| 233 | Prepare for removal Success | ||
| 234 | |||
| 235 | Exit Code: 0x00 | ||
| 236 | </pre> | ||
| 237 | |||
| 238 | After this step the state changes to: Firmware state: Unconfigured(good), Spun down | ||
| 239 | |||
| 240 | h2. 4. Turn on the disk's locate LED | ||
| 241 | |||
| 242 | *megacli -PdLocate -start -PhysDrv [32:15] -a0 -nolog* | ||
| 243 | <pre> | ||
| 244 | Adapter: 0: Device at EnclId-32 SlotId-15 -- PD Locate Start Command was successfully sent to Firmware | ||
| 245 | |||
| 246 | Exit Code: 0x00 | ||
| 247 | </pre> | ||
| 248 | |||
| 249 | h2. 5. Remove the disk | ||
| 250 | |||
| 251 | h2. 6. Verify that the right disk was removed | ||
| 252 | |||
| 253 | *megacli -pdinfo -physdrv [32:15] -aall -nolog* | ||
| 254 | <pre> | ||
| 255 | Adapter 0: Device at Enclosure - 32, Slot - 15 is not found. | ||
| 256 | |||
| 257 | Exit Code: 0x00 | ||
| 258 | |||
| 259 | </pre> | ||
| 260 | Everything is correct. |
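
The lookup from a physical slot to its virtual drive, done by eye in the LdPdInfo output earlier on this page, could be scripted roughly as follows. This is only a sketch: the helper name vd_for_slot is hypothetical, and it assumes the output has the same "Virtual Drive: N (Target Id: N)", "Enclosure Device ID:" and "Slot Number:" lines as the listings on this page. Other MegaCLI versions may format them differently.

```shell
#!/bin/sh
# vd_for_slot ENCLOSURE SLOT   (hypothetical helper, not part of megacli)
# Reads `megacli -LdPdInfo -a0 -nolog` output on stdin and prints the
# Target Id of the virtual drive that contains the given physical drive.
# Prints nothing if the slot is not a member of any virtual drive.
vd_for_slot() {
    awk -v encl="$1" -v slot="$2" '
        /^[ \t]*Virtual Drive:/ {
            # Assumed line shape: "Virtual Drive: 5 (Target Id: 5)"
            vd = $0
            sub(/.*Target Id: */, "", vd)   # drop everything up to the id
            sub(/\).*/, "", vd)             # drop the closing parenthesis
        }
        /^[ \t]*Enclosure Device ID:/ { e = $NF }
        /^[ \t]*Slot Number:/ {
            # The slot line follows the enclosure line of the same PD
            if (e == encl && $NF == slot) { print vd; exit }
        }
    '
}
```

Assumed usage on the host from this page: `megacli -LdPdInfo -a0 -nolog | vd_for_slot 32 15`, which for the listing above should print the target id of r0-2-ssd. The function only reads text, so it can be tried on a saved copy of the output before anything is run against the controller.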