xaminmo posting in eserver
Notes from digging into a DS6800 ("baby shark"), with info I didn't find online.
What is online is pretty sparse when it comes to actual raw troubleshooting.
Basically, though, it looks like the primary processor card has recurring I/O channel failures.
That card has been replaced before, so it's probably a chassis issue.
There were also power failures, so it may be a poor power regulation issue.
The PSU should take the hit and not kill the proc cards.

The site PDU has since been replaced, so hopefully those problems are gone.

Anyway, a CE is coming out to replace this processor card.


kona = processor card. 2 per unit.
shark = unit, or drawer
reef = cluster (multiple units)
complex = all DS6800s managed by the same storage server.

Unit 1 was stuck with proc0 comm lost.
ncrestartSM - restarts the storage manager on the local shark/unit.
ncwd_kill -d - kills and stops processes
ncwd_kill -a - restarts any missing processes
ncl - runs a command on the other kona in this shark/unit.

# ncwd_kill -d ; sleep 5 ; ncwd_kill -a
After about 5 minutes, some Java processes were running on proc 0, but not the same ones as on the primary.
Storage Manager now shows up in discovery and is upgrading the firmware utilities.


####################################
An hour later, SMGUI said "cannot get password"
Updated Storage Manager; no change.
Firmware update still isn't working.

Copied SEA.tar manually to the controllers, but the activate step fails.
The same happens on the "normal" unit that has no controller problems:

CMUN80091E
IBM.1750-####### The concurrent firmware update operation reports an internal error. The concurrent firmware update did not start because prerequisite conditions are not met, or the concurrent firmware update stopped prematurely.

###########################################
/persist/scratch
/curr_lic
/other_lic
/other

# rss_warmstart
ERROR: Dev Open connect rc: -1, curSock: 3, errno: 111
Failed to open/connect to LCPSS socket - -1
getdebugreef bad ioctl return code - 6

# rss_reboot -s
rss_reboot: pid=29823 Error - machine in FENCED stat

## from other node:
# rss_reboot -d
rss_reboot: pid=16308 Error - machine in FENCED stat

### Force reboot
# rss_reboot -f -d
rss_reboot: pid=16308 Error - machine in FENCED stat

# ncwd_kill -Cd
# ncwd_kill -d
# sleep 60
# shutdown -r now

It took 85 seconds to come up.
It went into single-cluster IML mode and had dual-boot processes.
After about 5 minutes, it rebooted again.
The timeout is 3600 seconds....
NOTE: all nodes have the hostname "noname".

Tried to start cpss manually with /lic/ucode/bin/rc.cpss
[Tue Apr 16 12:53:50] root@noname:/lic/bin # more /home/shark/config/sm_mnta.cfg
rsPkgInstalled=TRUE
plantOfManufacturing=13
boxTypeModel=1750511
boxSerialNumber=#####
dateOfManufacturing=1772005
wwnn=50050763########
as400SerialNum=###
c0PartClass=R04F24F2
c0SerialNumber=YM10MY######
c1PartClass=R04F24F2
c1SerialNumber=YM10MY######
mfgMode=FALSE
microcodeECLevel=.565
osVRMFValues.VRMF_STRING=5.2.2.565
setdumpnum=1,4,4
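
The sm_mnta.cfg above is plain key=value text. Comparing the copies on kona 0 and kona 1 is a quick way to spot drift in serials or code levels. Here's a minimal Python sketch for doing that off-box. It assumes one key=value pair per line. parse_kv_config and diff_configs are hypothetical helper names, not anything shipped on the DS6800.

```python
from typing import Dict, Optional, Tuple

def parse_kv_config(text: str) -> Dict[str, str]:
    """Parse 'key=value' lines into a dict (assumed format, as in sm_mnta.cfg)."""
    config: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and anything without an '=' separator.
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config

def diff_configs(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (value_in_a, value_in_b)} for keys that differ or exist on only one side."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

# Illustrative inputs built from lines in the file above; kona1 is missing one key.
kona0 = parse_kv_config("rsPkgInstalled=TRUE\nmicrocodeECLevel=.565\nsetdumpnum=1,4,4\n")
kona1 = parse_kv_config("rsPkgInstalled=TRUE\nmicrocodeECLevel=.565\n")
print(diff_configs(kona0, kona1))  # {'setdumpnum': ('1,4,4', None)}
```

A key that exists on only one side shows up with None for the other side.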

root@noname:/lic/bin # rss_warmstart
WARMSTART !!
ERROR: Dev Open connect rc: -1, curSock: 3, errno: 111
Failed to open/connect to LCPSS socket - -1
getdebugreef bad ioctl return code - 6

root@noname:/lic/bin # /lic/ucode/bin/rc.cpss
Waiting for machine to become ready before starting lcpss . done
Loading lcpssddm.o Kernel module
Warning: loading /lib/modules/2.4.19-178/misc/lcpssddm.o will taint the kernel: forced load
See http://www.tux.org/lkml/#export-tainted for information about tainted modules
Module lcpssddm loaded, with warnings
Setting up swapspace version 1, size = 65532 KiB
Loading ublkdev.o kernel module
Warning: loading /lib/modules/2.4.19-178/misc/ublkdev.o will taint the kernel: forced load
See http://www.tux.org/lkml/#export-tainted for information about tainted modules
Module ublkdev loaded, with warnings

ls: cpsslocal.log.*: No such file or directory
ls: cpssremote.log.*: No such file or directory
Collecting Network Data - nciplnetcfg
log[main]: load_configuration returned 0
log[write_binary_configuration]: write returned 32
log[main]: write_binary_configuration returned 0
DEBUG: chksmcfg2 called, sm_mnta results:
-rw-r--r-- 1 root root 342 Apr 16 12:08 /home/shark/config/sm_mnta.cfg
404 root 2860 S /bin/bash /usr/sbin/lnxsysdaemon
lnxsysdaemon, already started
Waiting for IML to complete...
lcpss reported IML start failed
failcode = 53
failpoint = lnxioctl(OS_IOCTL_PLATFORM,
line = &ioctlBuffer) 898
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
rc.cpss failed in lcpssabort
with return code = 53
at phase marker = lnxioctl(OS_IOCTL_PLATFORM,
and line number = &ioctlBuffer) 898
Abort IML.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Broadcast message from root :

The system is going down for reboot NOW!

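### If failcode 53 is an errno: on Linux that is EBADR (invalid request descriptor). ECONNABORTED (connection aborted) is 53 only on BSD-derived systems; on Linux it is 103.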
### I don't see the slip connection here...
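
errno numbers are OS-specific, so decode them against the right table. The konas run Linux (the modules loaded above are for kernel 2.4.19). On Linux, errno 111 is ECONNREFUSED, meaning nothing was listening on the LCPSS socket. That fits lcpss aborting IML. Treating rc.cpss's failcode 53 as an errno is only an assumption. This small Python sketch decodes codes with the host's own tables, so run it on a Linux box. describe_errno is a hypothetical helper name.

```python
import errno
import os

def describe_errno(code: int) -> str:
    """Map a numeric errno to 'SYMBOL: message' using this host's tables.

    Values differ between OSes (ECONNABORTED is 103 on Linux but 53 on
    BSD/macOS), so run this on Linux to match the kona's numbering.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"

# Codes from the logs above: errno 111 from the LCPSS socket, failcode 53 from rc.cpss.
for code in (111, 53):
    print(code, describe_errno(code))
```

On a Linux host, 111 decodes to ECONNREFUSED ("Connection refused") and 53 decodes to EBADR ("Invalid request descriptor").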

#
root@noname:~ # ncwd_kill -Ca
root@noname:~ # ncwd_kill -Sd
root@noname:~ # clStat
hangs
#
Set hostnames on all units from the guest; no change.
#
root@noname:~ # /home/shark/microcode/lcpss
...
LCPSSDDM_IRQ_SIGNAL 46
lnxmemio.c: lnxSetupRegion: failed to open: No such device
lnxMemIoSetup(): lnxTWInit failed! Dying...
lnxcpssconfig: lnxMemIoSetup **FAILED** STOP IML
lnxmain: lnxcpssconfig **FAILED** EXITING IML rc=22

root@noname:~ # /home/shark/essni/bin/niStopServers
root@noname:~ # /home/shark/essni/bin/niStartServers
Hangs, then disconnects much later.

The audit log doesn't show anything useful, just connections and GUI commands.

root@noname:~ # cmdmenu.pl
+------------------------------------------------------------------------+
| Text Based Menu v0.03 running on noname |
+------------------------------------------------------------------------+
| 1) Clear Message Router Files |
| 2) Check and Clear Failed Controller Flag (Window Files) |
| 3) Display and Reset Controller Reboot Count (imlretry) |
| 4) Display/Modify Controller Autoboot Flag (norsStart) |
| 5) Delete Nonvolatile Write Cache Data (CST) |
| 6) Rebuild Configuration Database (Clean PDM) |
| 7) Delete Config and Return to Factory Defaults (Clear&Pave part 1/2) |
| 8) Delete Config and Return to Factory Defaults (Clear&Pave part 2/2) |
| 9) Force CPSS Dump |
| 10) PE_Package (...) |
| 11) Statesaves (...) |
| 12) Arrowhead_Dumps (...) |
| 13) FTP a File (from current node) |
| 14) Exit |
+------------------------------------------------------------------------+
>>> Your choice? 10

+---------------------------------------------------+
| > Sub-menu of 'PE_Package' |
+---------------------------------------------------+
| 1) Generate pe_package |
| 2) Generate pe_package and ftp to remote machine |
| 3) Back to parent menu |
| 4) Exit |
+---------------------------------------------------+
>>> Your choice? 1
Begin Procedure: Generate pe_package
[timestamp] /lic/sm/bin/rss_mkpe -noftp on kona 0
[timestamp] rss_mkpe: Generate the PEPackage. Please wait few minutes ...
pe_generate log available in /persist/dc/pepackage/pe_generate.log
adding: dc/pepackage/pepackage.tar.gz (deflated 1%)
adding: dc/pepackage/header (stored 0%)

rss_mkpe: pepackge /dc/pepackage/.############.cl0.pe.zip created

[5 mins later] End Procedure: Generate pe_package


### cat smsanity.out

Sanity Checker v0.30 invoked on c0 (noname)
------------------------------------------ Kona 0 --------- Kona 1 ---------
Checking dates............................ same same
Checking free memory...................... Passed Passed
Verifying RW partitions................... Passed Passed
Verifying Kona replacement is enabled..... Passed Passed
Checking running processes................ FAILED! Passed
Checking disk space....................... Passed Passed
Checking SBR status....................... skipped skipped
Verifying four online DA partitions....... FAILED! Passed
Verifying certain files do not exist...... Passed Passed
Verifying that LCPSS is in Dual mode...... FAILED! FAILED!
Verifying no open hardware problems....... FAILED! FAILED!
Verifying no open software problems....... FAILED! FAILED!
Checking file permissions................. Passed Passed
Verifying no open cabling problems........ Passed Passed
Verifying no open data loss problems...... Passed Passed
Checking symbolic links................... Passed Passed
Checking number of IML retries............ Passed Passed
Verifying no CF R/W errors................ Passed Passed
Scanning ranks............................ FAILED! Passed
Checking serials in ncipl (strict)........ FAILED! FAILED!
Checking serials in ncipl (vote).......... skipped skipped
Checking PDM ISS consistency.............. skipped skipped
Checking PDM corruption................... Passed skipped
Checking Pulled out BANJO................. FAILED! FAILED!
----------------------------------------------------------------------------

(*) Detailed information about the failed checks:


(*) This situation is OK for the following scenarios:

- Node config status report
- Nonconcurrent code load (before quiesce)
- Concurrent code load (before quiesce)
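
With two konas and about two dozen checks, it helps to pull just the failures out of smsanity.out. Here's a minimal Python sketch. It assumes every result row looks like the ones above: the check name, a run of dots, then the kona 0 and kona 1 status. parse_sanity and failures_by_kona are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# Assumed row layout: "<check name>.....<kona 0 status> <kona 1 status>".
ROW = re.compile(r"^(?P<check>.+?)\.{2,}\s*(?P<k0>\S+)\s+(?P<k1>\S+)\s*$")

def parse_sanity(text: str) -> List[Tuple[str, str, str]]:
    """Return (check, kona0_status, kona1_status) for each result row; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append((m.group("check").strip(), m.group("k0"), m.group("k1")))
    return rows

def failures_by_kona(rows: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """List the checks that report FAILED on each kona."""
    result: Dict[str, List[str]] = {"Kona 0": [], "Kona 1": []}
    for check, k0, k1 in rows:
        if k0.startswith("FAILED"):
            result["Kona 0"].append(check)
        if k1.startswith("FAILED"):
            result["Kona 1"].append(check)
    return result

sample = """Checking running processes................ FAILED! Passed
Verifying that LCPSS is in Dual mode...... FAILED! FAILED!
Checking disk space....................... Passed Passed"""
print(failures_by_kona(parse_sanity(sample)))
```

Splitting the failures per kona makes it easier to see which checks fail only on kona 0 in the full output (running processes, DA partitions, scanning ranks) and which fail on both.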

### cat smsanity.err
processes: Kona 0: The following processes are not running: niStartServers
4-dapart: Kona 0: DA Partition /dapart/s0 is missing
DA Partition /dapart/s2 is missing

dual-lcpss: Kona 0: Unable to determine LCPSS status
dual-lcpss: Kona 1: Bad LCPSS status 'Single Cluster Operational'
hw-problems: Kona 0: Open problem of type 0 (hardware) found, id=date.time.pmr from 2 yrs ago
hw-problems: Kona 1: Open problem of type 0 (hardware) found, id=date.time.pmr from 2 yrs ago
sw-problems: Kona 0: Open problem of type 1 (software) found, id=date.time.pmr from 2 yrs ago
sw-problems: Kona 1: Open problem of type 1 (software) found, id=date.time.pmr from 2 yrs ago
scanrank: Kona 0: Failed to execute 'dacmd -x scanrank'
ncipl-strict: C0 serial number was found in only 2 files: /dapart/s1/ncipl.da,/dapart/s3/ncipl.da
Box serial number was found in only 2 files: /dapart/s1/ncipl.da,/dapart/s3/ncipl.da
C1 serial number was found in only 2 files: /dapart/s1/ncipl.da,/dapart/s3/ncipl.da
pulled_out_banjo: Kona 0: Pulled out BANJO detected
pulled_out_banjo: Kona 1: Pulled out BANJO detected
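
smsanity.err puts some messages on lines without a check prefix, such as the second missing DA partition and the box and C1 serial lines. Here's a minimal Python sketch that groups messages by check id. It assumes an unprefixed line continues the previous check and kona. group_sanity_errors is a hypothetical name.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

HEAD = re.compile(r"^(?P<check>[\w-]+): (?P<rest>.*)$")   # e.g. "4-dapart: ..."
KONA = re.compile(r"^(?P<kona>Kona \d): (?P<msg>.*)$")     # optional "Kona N: " prefix

def group_sanity_errors(text: str) -> Dict[str, List[str]]:
    """Group smsanity.err messages by check id; unprefixed lines continue the previous check."""
    groups: Dict[str, List[str]] = defaultdict(list)
    current: Optional[str] = None
    kona: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        head = HEAD.match(line)
        if head:
            current = head.group("check")
            rest = head.group("rest")
            k = KONA.match(rest)
            kona, msg = (k.group("kona"), k.group("msg")) if k else (None, rest)
        elif current is None:
            continue  # stray text before any check id
        else:
            msg = line  # continuation: assumed to share the previous check and kona
        groups[current].append(f"{kona}: {msg}" if kona else msg)
    return dict(groups)

sample = """4-dapart: Kona 0: DA Partition /dapart/s0 is missing
DA Partition /dapart/s2 is missing

dual-lcpss: Kona 0: Unable to determine LCPSS status
dual-lcpss: Kona 1: Bad LCPSS status 'Single Cluster Operational'
ncipl-strict: C0 serial number was found in only 2 files: /dapart/s1/ncipl.da,/dapart/s3/ncipl.da
Box serial number was found in only 2 files: /dapart/s1/ncipl.da,/dapart/s3/ncipl.da"""
for check, messages in group_sanity_errors(sample).items():
    print(check, messages)
```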


### cat sm.out
>: gSmpmSMgrNumOfSaels = 70
(PM) gSmpmSMgrNumOfCrus = 210
(PM) gSmpmSMgrSaelPoolThresholdPercent = 0.800000
ERROR: Dev Open connect rc: -1, curSock: 53, errno: 111
Failed to open/connect to LCPSS socket - -1
ERROR: Dev Open connect rc: -1, curSock: 53, errno: 111
Failed to open/connect to LCPSS socket - -1
.
.
>: gSmpmSMgrNumOfSaels = 70
(PM) gSmpmSMgrNumOfCrus = 210
(PM) gSmpmSMgrSaelPoolThresholdPercent = 0.800000
ERROR: Dev Open connect rc: -1, curSock: 53, errno: 111
Failed to open/connect to LCPSS socket - -1
ERROR: Dev Open connect rc: -1, curSock: 53, errno: 111
Failed to open/connect to LCPSS socket - -1
[LOG] DEBUG: [timestamp] in function smrrcSageMutexInit, tualQueue.c:55: Start smrrcSageMutexInit
[LOG] DEBUG: [timestamp] in function smrrcSageMutexInit, tualQueue.c:60: Finish smrrcSageMutexInit
(PM) : node state: 2, node Id: 0, sm mode: 2
(PM) : CNM thread created. ID: 6817
/lic/bin/ncupdate: [dbg] --updatedapart is set
/lic/bin/ncupdate: [log] /persist/etc/ncFailed2UpdateDapartOnRemoteKona Doesn't exist.
(SO)- smrrcSoCiPrepareLocationLocks: found enclosure - 13#####
(SO)- smrrcSoCiPrepareLocationLocks: found enclosure - 13#####
(SO)- smrrcSoCiPrepareLocationLocks: found enclosure - 13#####
(SO)- smrrcSoCiPrepareLocationLocks: found enclosure - 13#####
(SO)- smrrcSoCiPrepareLocationLocks: found enclosure - 13#####
(SO)- smrrcSoCiPrepareLocationLocks: found enclosure - 68#####
(SO)- smrrcSoCiPrepareLocationLocks: found enclosure - 13#####
(SO)- smrrcSoCiPrepareLocationLocks: found enclosure - 11#####
STARTING THE CCW LOGGER!!!!!!!!!!!!!!!!!!!!
Can't find any properties files using FileInputStream on client
Can't find essclientlogging.properties, attempting 800 server values.
Can't find essserverlogging.properties, attempting DS6000 server values.
Found DS6000 logging properties from fccwlogging.properties
Trace debugging is NOT enabled
CJL0001E No listeners are registered with the logger .
ERROR: Dev Open connect rc: -1, curSock: 42, errno: 111
Failed to open/connect to LCPSS socket - -1
(PM) : smpmSMgrGenerateSael was called (SRN = 0xBE850071. numOfCrus = 0)
(PM) : Sending ErrorInfo (SRN = 0xBE850071. numOfCrus = 0) to Leader
!!! smrrcPhyGetAllAdapters [136] Leave Failure!
!!!FILE smrrcOmlHarvestHelper.c LINE 1824
!!!Leaving function smrrcOmlHarvestGetIssAdapters with failure
(Harvest) : failed in getting Iss adapters
(OML): ObjectListener: processId = 7363
(OML): opening socket = 48
timestamp SM components started
timestamp Start Compress the SM dumps in /var/log/sm/,/var/log/essni/
ERROR: Dev Open connect rc: -1, curSock: 97, errno: 111
Failed to open/connect to LCPSS socket - -1
(PM) : SIM was offloaded for problem 0xbe8d0001 , simId: 0, deviceAddr: 0
(OML): Current time = timestamp
(OML): Time we want to start from = timestamp
(OML): Time of last error log entry = timestamp
(OML): lastEntryLogSegNumIn = 392
(OML): Start processing from event with seq num = 337
(OML): SN 337 won't be processed in replay mode
(OML): SN 338 won't be processed in replay mode
(OML): Ignoring event = L/F: Time: timestamp, SN: 343, ESC: 0xCEB6, Format: 0x0911, AH: 0x4442, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 344, ESC: 0xCEB6, Format: 0x0911, AH: 0x6017, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 345, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x1D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 346, ESC: 0xCEB6, Format: 0x0911, AH: 0x6017, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 347, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x1D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 348, ESC: 0xCEB6, Format: 0x0911, AH: 0x6017, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 349, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x1D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 350, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x5D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 351, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x5D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 352, ESC: 0xCEBC, Format: 0x0911, AH: 0x0501, KCQ: 0x0000
(OML): SN 353 won't be processed in replay mode
(OML): SN 355 won't be processed in replay mode
(OML): SN 357 won't be processed in replay mode
(OML): SN 359 won't be processed in replay mode
(OML): SN 361 won't be processed in replay mode
(OML): SN 363 won't be processed in replay mode
(OML): SN 365 won't be processed in replay mode
(OML): SN 367 won't be processed in replay mode
(OML): Ignoring event = L/F: Time: timestamp, SN: 343, ESC: 0xCEB6, Format: 0x0911, AH: 0x4442, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 344, ESC: 0xCEB6, Format: 0x0911, AH: 0x6017, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 345, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x1D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 346, ESC: 0xCEB6, Format: 0x0911, AH: 0x6017, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 347, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x1D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 348, ESC: 0xCEB6, Format: 0x0911, AH: 0x6017, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 349, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x1D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 350, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x5D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 351, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x5D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 352, ESC: 0xCEBC, Format: 0x0911, AH: 0x0501, KCQ: 0x0000
timestamp Finish Compress the SM dumps /var/log/SMLogs2.tar.gz created.
(OML): Ignoring event = L/F: Time: timestamp, SN: 343, ESC: 0xCEB6, Format: 0x0911, AH: 0x4442, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 344, ESC: 0xCEB6, Format: 0x0911, AH: 0x6017, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 345, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x1D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 346, ESC: 0xCEB6, Format: 0x0911, AH: 0x6017, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 347, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x1D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 348, ESC: 0xCEB6, Format: 0x0911, AH: 0x6017, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 349, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x1D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 350, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x5D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 351, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x5D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 352, ESC: 0xCEBC, Format: 0x0911, AH: 0x0501, KCQ: 0x0000
(OML): SN 391 won't be processed in replay mode
(OML): SN 392 won't be processed in replay mode
(OML): Can't open /persist/sm/cl/cl.log file
(OML): errno message = No such file or directory
ERROR: Dev Open connect rc: -1, curSock: 98, errno: 111
Failed to open/connect to LCPSS socket - -1
(PM) : SIM was offloaded for problem 0xbe876017 , simId: 4, deviceAddr: 0
ERROR: Dev Open connect rc: -1, curSock: 98, errno: 111
Failed to open/connect to LCPSS socket - -1
(PM) : SIM was offloaded for problem 0xbe876017 , simId: 5, deviceAddr: 0
ERROR: Dev Open connect rc: -1, curSock: 98, errno: 111
Failed to open/connect to LCPSS socket - -1
(PM) : SIM was offloaded for problem 0xbe876017 , simId: 7, deviceAddr: 0
(OML): L/F: Time: timestamp, SN: 393, ESC: 0x2403, Format: 0x09B8
(OML): eThresholdEventType = 2
(OML): Event can be processed, no need to be thresholded
(OML): Location code 0 =
(OML): Calling special function
(OML): Finished to process SN: 393
(OML): Ignoring event = L/F: Time: timestamp, SN: 343, ESC: 0xCEB6, Format: 0x0911, AH: 0x4442, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 344, ESC: 0xCEB6, Format: 0x0911, AH: 0x6017, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 345, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x1D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 346, ESC: 0xCEB6, Format: 0x0911, AH: 0x6017, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 347, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x1D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 348, ESC: 0xCEB6, Format: 0x0911, AH: 0x6017, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 349, ESC: 0xCEB0, Format: 0x0911, AH: 0x1000, KCQ: 0x1D00
(OML): Ignoring event = L/F: Time: timestamp, SN: 352, ESC: 0xCEBC, Format: 0x0911, AH: 0x0501, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 404, ESC: 0xCEBC, Format: 0x0911, AH: 0x040C, KCQ: 0x0000
(OML): sResourceLogicalNameOut = iss001
(OML): Ignoring event = L/F: Time: timestamp, SN: 405, ESC: 0x88A4, Format: 0x0938
(OML): sResourceLogicalNameOut = iss001
(OML): Ignoring event = L/F: Time: timestamp, SN: 405, ESC: 0x88A4, Format: 0x0938
(OML): sResourceLogicalNameOut = iss001
(OML): Ignoring event = L/F: Time: timestamp, SN: 405, ESC: 0x88A4, Format: 0x0938
(OML): sResourceLogicalNameOut = iss001
(OML): Ignoring event = L/F: Time: timestamp, SN: 405, ESC: 0x88A4, Format: 0x0938
(OML): sResourceLogicalNameOut = iss001
(OML): Ignoring event = L/F: Time: timestamp, SN: 405, ESC: 0x88A4, Format: 0x0938
(OML): sResourceLogicalNameOut = iss001
(OML): Ignoring event = L/F: Time: timestamp, SN: 405, ESC: 0x88A4, Format: 0x0938
(OML): sResourceLogicalNameOut = iss001
(OML): Ignoring event = L/F: Time: timestamp, SN: 405, ESC: 0x88A4, Format: 0x0938
(OML): sResourceLogicalNameOut = iss001
(OML): Ignoring event = L/F: Time: timestamp, SN: 419, ESC: 0x88A4, Format: 0x0938
(OML): Ignoring event = L/F: Time: timestamp, SN: 404, ESC: 0xCEBC, Format: 0x0911, AH: 0x040C, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 404, ESC: 0xCEBC, Format: 0x0911, AH: 0x040C, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 404, ESC: 0xCEBC, Format: 0x0911, AH: 0x040C, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 404, ESC: 0xCEBC, Format: 0x0911, AH: 0x0407, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 404, ESC: 0xCEBC, Format: 0x0911, AH: 0x040C, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 404, ESC: 0xCEBC, Format: 0x0911, AH: 0x040C, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 404, ESC: 0xCEBC, Format: 0x0911, AH: 0x040C, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 404, ESC: 0xCEBC, Format: 0x0911, AH: 0x040C, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 404, ESC: 0xCEBC, Format: 0x0911, AH: 0x040C, KCQ: 0x0000
(OML): sResourceLogicalNameOut = cpserver0
(OML): L/F: Time: timestamp, SN: 432, ESC: 0x3004, Format: 0x0942
(OML): eThresholdEventType = 2
(OML): Event can be processed, no need to be thresholded
(OML): Location code 0 =
(OML): Calling special function
(OML): Statesave file = cpssdump01
(OML): stateSaveNum = 1
(OML): smdcNotifyNewStatesave rc = 0
(OML): Finished to process SN: 432
(OML): strerror(errno) = Broken pipe
(OML): smrrcOmlSendMsg failed: last error log entry
(OML): closing socket = 48
(OML): shutdown() rc = 0
(OML): close() rc = 0
(OML): opening socket = 48
(OML): Try to reconnect
(OML): Didn't succeed to connect to RELM, trying to reconnect
(OML): errno = Connection refused
timestamp Shutdown hook envoked
timestamp Start SMMaster terminator tread.
timestamp SMMaster dumpSm started
timestamp SM components flushing started
(OML): Didn't succeed to connect to RELM, trying to reconnect
(OML): errno = Connection refused
(OML): Didn't succeed to connect to RELM, trying to reconnect
(OML): errno = Connection refused
(OML): Didn't succeed to connect to RELM, trying to reconnect
(OML): errno = Connection refused
(OML): Didn't succeed to connect to RELM, trying to reconnect
(OML): errno = Connection refused
(OML): Didn't succeed to connect to RELM, trying to reconnect
(OML): errno = Connection refused
(OML): Didn't succeed to connect to RELM, trying to reconnect
(OML): errno = Connection refused
(OML): Didn't succeed to connect to RELM, trying to reconnect
(OML): errno = Connection refused
timestamp Before flush(): Component SBR
timestamp Before flush(): Component PDM
timestamp Before flush(): Component smrrcPhysical
timestamp Before flush(): Component OmlHarvest
timestamp Before flush(): Component SMRRC_LOGICAL
timestamp Before flush(): Component SAGE
timestamp Before flush(): Component PM
timestamp Before flush(): Component SO
timestamp SM components flushing finished
timestamp SM components dumping started
timestamp Before dump(): Component CrossNodeMR
timestamp Before dump(): Component ESSNI
timestamp Before dump(): Component HEARTBEAT
timestamp Before dump(): Component FASTCCW
timestamp Before dump(): Component SMWatchdogWhispererComponent
timestamp Before dump(): Component SBR
ERROR: Dev Open connect rc: -1, curSock: 110, errno: 111
Failed to open/connect to LCPSS socket - -1
timestamp Before dump(): Component PDM
(OML): Didn't succeed to connect to RELM, trying to reconnect
(OML): errno = Connection refused
timestamp Before dump(): Component SMPEPackageGenerationComponent
timestamp Before dump(): Component smrrcPhysical
timestamp Before dump(): Component OmlHarvest
timestamp Before dump(): Component SMRRC_LOGICAL
timestamp Before dump(): Component SAGE
timestamp Before dump(): Component PM
timestamp Before dump(): Component SO
timestamp SM dump finished
timestamp SMMaster dumpSm ended
timestamp SMMaster shutdown started
timestamp Before stop(): Component CrossNodeMR
timestamp Before stop(): Component ESSNI
timestamp Before stop(): Component HEARTBEAT
timestamp Before stop(): Component FASTCCW
timestamp Before stop(): Component SMWatchdogWhispererComponent
timestamp Before stop(): Component SBR
timestamp Before stop(): Component PDM
timestamp Before stop(): Component SMPEPackageGenerationComponent
timestamp Before stop(): Component smrrcPhysical
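
Most of sm.out is the same few failures repeating. This Python sketch counts them. It assumes the line formats shown above; the regexes are copied from those lines, not from any IBM spec. summarize_sm_out is a hypothetical name.

```python
import re
from collections import Counter
from typing import Dict

LCPSS_FAIL = re.compile(r"Dev Open connect rc: -1, curSock: \d+, errno: (\d+)")
IGNORED_SN = re.compile(r"Ignoring event = L/F: .*?SN: (\d+)")

def summarize_sm_out(text: str) -> Dict[str, object]:
    """Count LCPSS connect failures by errno, RELM reconnect attempts, and distinct ignored OML SNs."""
    errnos = Counter(int(m.group(1)) for m in LCPSS_FAIL.finditer(text))
    relm_retries = text.count("Didn't succeed to connect to RELM")
    ignored = sorted({int(m.group(1)) for m in IGNORED_SN.finditer(text)})
    return {"lcpss_connect_errnos": dict(errnos), "relm_retries": relm_retries, "ignored_sns": ignored}

sample = """ERROR: Dev Open connect rc: -1, curSock: 53, errno: 111
Failed to open/connect to LCPSS socket - -1
(OML): Ignoring event = L/F: Time: timestamp, SN: 343, ESC: 0xCEB6, Format: 0x0911, AH: 0x4442, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 343, ESC: 0xCEB6, Format: 0x0911, AH: 0x4442, KCQ: 0x0000
(OML): Ignoring event = L/F: Time: timestamp, SN: 405, ESC: 0x88A4, Format: 0x0938
(OML): Didn't succeed to connect to RELM, trying to reconnect
ERROR: Dev Open connect rc: -1, curSock: 98, errno: 111"""
print(summarize_sm_out(sample))
```

Every LCPSS connect failure in this log reports errno 111. The same ignored OML sequence numbers replay several times, so the summary is far shorter than the raw log.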



####################################
While waiting, I walked through the errors from the last 4 days:
BE860104 The storage unit is unable to communicate with the SMTP server provided for the Call Home functionality. Call Home notifications cannot be sent.
BE8FFFFF A heartbeat record was generated. Heartbeat records are generated periodically, indicating that the storage unit is healthy enough to communicate with the outside world. If the Call Home option is enabled, your SMTP server is configured properly and the IP network is working, then a Call Home record is sent to IBM.
BE8D0001 Due to a software problem, the system management process on the storage unit is semi operational. Configuration commands can fail. Collect the PE package and contact IBM support for further assistance. Navigate to the Contact IBM page or call the next level of support.
BE862002 I/O is running in Single Cluster mode. The storage unit is now using only the other processor card, and might lose data on a single additional component failure. If multiple paths from the host to storage have not been provided, there can be loss of access to data. The problem can result from a hardware problem. Verify that there is an open problem that is associated with one of the processor cards, one of the battery backup units or with a firmware update. If there is no open problem, collect the PE package and contact IBM support for further assistance. Navigate to the Contact IBM page or call the next level of support.
BE862006 The storage unit has encountered an internal error but is functioning properly. However, the error condition prevents future firmware update tasks from initiating. Collect the PE package and contact IBM support for further assistance. Navigate to the Contact IBM page or call the next level of support.
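
The hex problem codes in sm.out (for example 0xbe8d0001) look like the same SRNs listed above, since BE8D0001 appears in both. Treating them as one namespace is still an assumption. This Python sketch extracts BE-prefixed codes from log text and annotates them from a small local table. The table summaries are paraphrased from the descriptions above. SRN_SUMMARY and annotate_codes are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# Short summaries paraphrased from the SRN list above; extend as new codes turn up.
SRN_SUMMARY: Dict[str, str] = {
    "BE860104": "Cannot reach the SMTP server; Call Home notifications cannot be sent",
    "BE8FFFFF": "Periodic heartbeat record (informational)",
    "BE8D0001": "System management process is semi-operational; collect PE package",
    "BE862002": "I/O running in Single Cluster mode on one processor card",
    "BE862006": "Internal error blocks future firmware updates; collect PE package",
}

# Matches codes such as "SRN = 0xBE850071" or "problem 0xbe8d0001" in sm.out.
CODE = re.compile(r"0x(BE[0-9A-F]{6})\b", re.IGNORECASE)

def annotate_codes(text: str) -> List[Tuple[str, str]]:
    """Return (code, summary) for each distinct BE-prefixed code, in first-seen order."""
    found: Dict[str, str] = {}
    for m in CODE.finditer(text):
        code = m.group(1).upper()
        found.setdefault(code, SRN_SUMMARY.get(code, "not in local table"))
    return list(found.items())

sample = """(PM) : smpmSMgrGenerateSael was called (SRN = 0xBE850071. numOfCrus = 0)
(PM) : SIM was offloaded for problem 0xbe876017 , simId: 4, deviceAddr: 0
(PM) : SIM was offloaded for problem 0xbe876017 , simId: 5, deviceAddr: 0
(PM) : SIM was offloaded for problem 0xbe8d0001 , simId: 0, deviceAddr: 0"""
for code, summary in annotate_codes(sample):
    print(code, summary)
```

Codes missing from the table, such as BE850071 and BE876017 here, fall through to "not in local table", which flags them for a manual lookup.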

An IBM CE is en route with a replacement processor card.

Page generated Jul. 21st, 2017 10:40 am