xaminmo: Josh 2016 (Default)
[personal profile] xaminmo
This changes periodically, but for today, here is what I would do.

My PowerHA selection process would be:
* 7.1.3 SP06 if I needed to deploy quickly, because I have build docs for that.
* 7.1.4 doesn't exist, but if it came out before deployment, I would consider it. Whichever was a newer
xaminmo: Josh 2016 (Default)
[personal profile] xaminmo
Plan is to import this from LJ, but LJ is refusing my login for community import.
My guess is that they throttled me after my personal journal import. Will try again later.
xaminmo: (Josh 2014)
[personal profile] xaminmo
I always forget instfix and oslevel -rl....
tags: aix oslevel incorrect backlevel wrong upgrade update

When these things show nothing:
lppchk -v
oslevel -sl `oslevel -sq 2>/dev/null | head -1`

and your bos.rte.install and bos.mp64 show the correct level compared to:

You should see the correct level here as well:
oslevel -sq | head

Check these other two things.
oslevel -r -l `oslevel -rq 2>/dev/null | sed -n '1p'`
instfix -icqk 6100-09-06-1543 | grep ":-:"
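
The `grep ":-:"` filter above leans on the colon-separated output of `instfix -icqk`. As far as I know, the fifth field is a status flag and `-` marks a down-level fileset. Here is a minimal sketch that prints only the fileset name with its required and installed levels. The function name `downlevel` and the sample lines are made up for illustration, and the field order is an assumption.

```shell
# Hypothetical helper: reads `instfix -icqk <level>` output on stdin.
# Assumed field order: keyword:fileset:required:installed:status:abstract
downlevel() {
    awk -F: '$5 == "-" { printf "%s needs %s, has %s\n", $2, $3, $4 }'
}

# Made-up sample lines in the assumed format (not real output):
sample='6100-09_AIX_ML:bos.mp64:6.1.9.100:6.1.9.100:=:AIX 6100-09 Release
6100-09_AIX_ML:bos.rte.install:6.1.9.100:6.1.9.45:-:AIX 6100-09 Release'

printf '%s\n' "$sample" | downlevel
```

On a real system this would be used as `instfix -icqk 6100-09-06-1543 | downlevel`.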
xaminmo: (Josh 2014)
[personal profile] xaminmo
In a couple of instances, I've found bos.rte.* filesets broken during upgrade, perhaps with the root part missing.

It's always a pain, and I always forget how to fix it.

The problem is that the AIX base media does not include base install images for these. They are S (single) updates instead of I (install) images. This is because, during install, a bff called "bos" is laid down first, and that includes 10-20 core filesets, /usr, /, and all the core stuff. It's basically a prototype mksysb. Sort of.

Anyway, in rare instances, when there is a known defect, IBM will release a fileset as a patch through support/ztrans to get you fixed. If you don't have time to wait, or if you are a biz partner, working with a customer who hasn't yet approved you using their support, then you might have to fix it yourself.

The solution was ODM surgery.

First, I took a mksysb and copied it to somewhere safe (another server with NIM installed).

Then, I looked into ODM, and found /etc/objrepos/product was missing the entry for this version.
You might be able to copy from /usr/lib/objrepos, but I copied from a valid clone of this system.

# export ODMDIR=/etc/objrepos
# ssh goodserver odmget -q lpp_name=bos.rte.security product | odmadd

Then, I needed to add the history line, which was identical between root and usr:
# odmget -q name=bos.rte.security lpp     (note the lpp_id)
# ODMDIR=/usr/lib/objrepos odmget -q lpp_id=39 history | ODMDIR=/etc/objrepos odmadd

The "inventory" ODM is also keyed by lpp_id, but that had a long list of files already. I did not mess with any of that.

Now, install_all_updates from my TLSP worked fine.
xaminmo: (Josh 2014)
[personal profile] xaminmo
Because there is NOTHING on the web about this.
PRODUCT: Recover Now / Double Take / MIMIX / EchoStream

Vision Solutions bought Double-Take. Double-Take wrote Recover Now, which is called MIMIX on AS-400. The replication tools underneath are called "EchoStream".

NOTE: Documentation is hard to find, but here is a shortened URL form of the Windows docs: http://omnitech.net/u/rn35docs

Most functions can be managed from the web GUI:
Obviously, put your correct IP here if you are not on the same host.

Install Licenses

Stop the license manager on PRIMARY:
stopsrc -cs scrt_lca-1

Stop the license manager on BACKUP:
stopsrc -cs scrt_aba-1

Copy the new license files:
scp -rp NIMSERVER:/export/Vision/license.perm/*_`hostname`_ES_node_license.properties /usr/scrt/run/node_license.properties

Start the license manager on PRIMARY:
startsrc -s scrt_lca-1

Start the license manager on BACKUP:
startsrc -s scrt_aba-1

Define initial contexts

/usr/scrt/bin/rtdr -C PRIMARYID (usually 1) -F BACKUPID (usually 101) setup

Query RN Contexts

Context 1 is Primary. DR shows as BACKUP to this, and prod shows PRODUCTION for this.
Context 101 is Recovery. DR shows PRIMARY for this, and prod shows BACKUP for this.


# /usr/scrt/bin/sccfgd_getctxs

Uninstall EchoStream

/usr/scrt/bin/scsetup -R -C1
/usr/scrt/bin/sclist -DD -C1
odmdelete -o SCCuAt
odmdelete -o SCCuObj
odmdelete -o SCCuRel


### EchoStream start
/usr/scrt/bin/rtstart -C1

### RN Check to see if kernel module is loaded
/usr/scrt/bin/scconfig -sC1

### RN Check if services are online
lssrc -a | grep scrt
scrt_lca-1 is the sender
scrt_aba-101 is the receiver

### Protected filesystem mount
NOTE: This is usually handled by rtstart.
/usr/scrt/bin/rtmnt -C1

### Protected filesystem umount
NOTE: This is usually handled by rtstop.
/usr/scrt/bin/rtumnt -C1

### EchoStream sync, stop, and unload service
/usr/scrt/bin/rtstop -SC1

### EchoStream stop & unload service
/usr/scrt/bin/rtstop -C1

### EchoStream stop & unload kernel extension
/usr/scrt/bin/rtstop -FC1

### Check dirty blocks in state map
This will show how many blocks need to be sync'd for the recovery group:
/usr/scrt/bin/scconfig -PC1

### RN List buffer utilization
NOTE: When the local buffer overflows, it just reverts to state-map tracking without point-in-time recovery.
/usr/scrt/bin/esmon 1

### Shutdown all contexts
NOTE: This can be added to /etc/rc.shutdown, or in cluster start/stop scripts.


Much is missing here. This is what I could find on the internet.
You can also do this from the WebUI.

### Fail back to Primary Server
/usr/scrt/bin/rtdr -qC 1 failback

### Failover to Recovery Server
/usr/scrt/bin/rtdr -qC 101 failback

### Make clone of filesystem
/usr/scrt/bin/scrt_ra -C1 -X

### Release clone of filesystem
/usr/scrt/bin/scrt_ra -C1 -W -L /dev/dbfs01lv


### RN Primary Manual start
In troubleshooting and testing, these commands can start Recover Now manually:
varyonvg rnvspvg
/usr/bin/startsrc -s scrt_scconfigd
/usr/scrt/bin/rtstart -C1

### Start without mount and fsck
/usr/scrt/bin/rtstart -C1 -M

### RN Primary Manual stop
In troubleshooting and testing, these commands will stop Recover Now manually:

# Unmount the protected filesystems
/usr/scrt/bin/rtumnt -DC1 | tee -a $log

# Kill processes if the filesystem is still mounted.
for i in `/usr/scrt/bin/sclist -C1 -f` ; do
    mount | grep "$i"
    if [[ $? -eq 0 ]]; then
        fuser -kxuc "$i"
    fi
done

# Try rtumnt again due to some timing issues observed.
sleep 3
/usr/scrt/bin/rtumnt -DC1

# Sync outstanding lfc's to DR server
/usr/scrt/bin/scconfig -SC1

# Stop RecoverNow
/usr/scrt/bin/rtstop -FkC1

Recover Now Reset State Map

This will cause the entire recovery group to be resync'd as if new, clearing any rollback points.

First, manually stop all resources, as listed above, then bring the context online:
varyonvg rnvspvg
/usr/scrt/bin/scconfig -MC1

### RTDR Resync
# Remote resync of prod, run from DR
/usr/scrt/bin/sccfgd_cmd -H PRODNODE -T "1 resync"

# Local on DR
/usr/scrt/bin/rtdr -qC101 resync

### Mount the filesystems on Primary
/usr/scrt/bin/rtmnt -C1

### Mount the filesystems on Recovery
/usr/scrt/bin/rtmnt -C101

### Unmount filesystems
/usr/scrt/bin/rtumnt -C1 # or -C 101

Recover Now Release Stuck Config

For errors such as:
scsmutil: log anchor cksum mismatch
ERROR: Failed to load EchoStream Production Server Drivers
ERROR: Drivers not loaded... Will not mount into an unprotected state

Clear the error:
/usr/scrt/bin/scsetup -MC1
/usr/scrt/bin/scconfig -uC1

Then you can use rtstart as normal.


### Start Recover Now
varyonvg rnvspvg
/usr/bin/startsrc -s scrt_scconfigd
/usr/scrt/bin/rtstart -C1
Context not properly defined on this system

# /usr/scrt/bin/sccfgd_getctxs
HOSTID (new hostid)
IPADDRESS (multiple lines)
No context for production or backup listed

#/usr/scrt/bin/rtdr -C 1 -F 101 setup
/usr/scrt/bin/rtdr[14]: test: argument expected
rtdr: Configuration error -
rtdr: Primary Context ID <1> is not enabled.
rtdr: The Primary Context ID <1> must be enabled
rtdr: when creating a Failover Context ID.

### Shutdown the context
# /usr/scrt/bin/scsetup -MC1
scsetup: AET_TMO_NOVOTE: Setup failed.
scsetup: Detail: On wrong host.

# /usr/scrt/bin/scconfig -uC1
scconfig: AET_TMO_NOVOTE: Unexpected error
scconfig: Detail: On wrong host.

# cat /usr/scrt/run/node_license.properties
## begin signed data
vision.license.expirydatemig=YYYY-MM-DD HH\:MM\:SS

### Vision support is via:
RecoverNow/GeoCluster AIX, Replicate1 24x7 CustomerCare Technical Support:
U.S. and Canada: (800) 337-8214
International: +1 (949) 724-5465
CustomerCare Support Email: support@visionsolutions.com

After hours will just page out, but not make a ticket.
Email will have a ticket created within a few minutes.

### Test startup
# /usr/scrt/bin/scsetup
scsetup: AET_TMO_NOVOTE: Setup failed.
scsetup: Detail: On wrong host.

### Set path properly
cat <<'EOF' >> /etc/environment
export PATH=/usr/scrt/bin:$PATH
EOF

### Collect reference info from "production" node and "backup" node.
/usr/scrt/bin/scconfig -v
/usr/scrt/bin/scconfig -q
/usr/scrt/bin/rtattr -C1 -a HostId
/usr/scrt/bin/rtattr -C101 -a HostId

### Update hostid for changed production node
/usr/scrt/bin/rtattr -C1 -a HostId -o production -v $HOSTID
/usr/scrt/bin/rtattr -C101 -a HostId -o backup -v $HOSTID
ssh BACKUPNODE /usr/scrt/bin/rtattr -C1 -a HostId -o production -v $HOSTID

### Re-collect all of the same reference data as above.

### Reconfigure the repository
scconfig -sC1
ssh BACKUPNODE /usr/scrt/bin/rtdr -C1 -F101 setup
/usr/scrt/bin/rtdr -C1 -F101 setup

### Restart everything
/usr/scrt/bin/rtstart -C1 && startsrc -s scrt_scconfigd
until df -k /databasedir 2>/dev/null >/dev/null ; do date ; sleep 10 ; done
/opt/visionsolutions/http/vsisvr/httpsvr/bin/strvsisvr 2>/dev/null
xaminmo: (Josh 2004 Happy)
[personal profile] xaminmo
This is from AIXMIND, on March 26, 2010 2:46 pm
It doesn't show up high enough in search queries, so I'm duplicating it here.
Note that AIX recreates most of /dev on boot, but we need a certain amount.

Problem(Abstract): The /dev directory was accidentally deleted.
Symptom: System won't boot
Environment: AIX 5.3 (and others)
Resolving the problem

Boot system into maintenance mode.
Access a root volume group before mounting filesystems

mount /dev/hd4 /mnt
mount /dev/hd2 /mnt/usr
mknod /mnt/dev/hd1 b 10 8
mknod /mnt/dev/hd2 b 10 5
mknod /mnt/dev/hd3 b 10 7
mknod /mnt/dev/hd4 b 10 4
mknod /mnt/dev/hd5 b 10 1
mknod /mnt/dev/hd6 b 10 2
mknod /mnt/dev/hd8 b 10 3
mknod /mnt/dev/hd9var b 10 6
umount /mnt/usr
umount /mnt
shutdown -Fr

source: http://www.aixmind.com/?p=728
Ref: http://wp.me/p3ecOp-zh
xaminmo: (Josh 2004 Happy)
[personal profile] xaminmo
yAy. More TSM issues. There's a defect in TSM client 7.1, but IBM says it only exists in 6.3 and 6.4. There's a patch level about 2 weeks old, but there is no list of what's fixed in this patch.

I installed this, plus the VMware agent, and the node that runs the GUI failed to update. The installer won't uninstall, reinstall, or repair. Windows uninstall just runs their installer.

IBM says that I should not call them, but I should use their simple website to submit a problem report. To report the defect, I have to use my customer's ID. I can't do this as a business partner unless I have my own support contract, beyond the money we pay to have access to support and software on a yearly basis.

To have access to a customer's ID, I have to wait for approval, of course. Beyond that, it shows up in a list that simply says "United States". So if I have, say 10 customers, I have no idea which number is for which customer.

Also, when selecting the product I want, there is no tree. It's a JAVA APPLET which has a list of products. I can search, but the naming is not consistent. Some say "Tivoli Storage Manager" and some say "TSM". Even for different versions of the same product, this naming difference occurs.

When I find it, it says that there will be a delay if I choose this product. Am I sure I want to choose this product? WTF?

There are no places to report any of these errors through the support organization, and no links on the pages to report them either. I have to report them to a general form 8 links away that may or may not be able to help.

Ginni Rometty is so focused on cutting cost to boost stock prices so her stock options at company exit have value, that she's downright gutting the infrastructure required for things like quality assurance and customer usability. Yes, everything is being updated for usability, but if it has worse functionality, or breaks entirely, then it's not REALLY a usability update.

Anyway, after supper, I'm going to call on the phone and listen to all of the messages telling me how easy it is to open a support request online, and that I should hang up and visit the web instead of wasting their dollars to fill out a new PMR that takes them months of training to be able to almost figure out. Then, I'll wait for an email because they don't ever call back anymore, and haven't been live-call-in for years.

The email will ask me to uninstall the software and try again. I won't be able to preemptively tell them anything in advance because I won't have online access to the PMR because I'm waiting for approval and then I have to remember which customer number to look under.
xaminmo: (Josh 2004 Happy)
[personal profile] xaminmo
Description: All print queues are hung.

Solution: Kill off all hung print IO processes
This should resolve anything other than a printer offline.
If you cannot ping the printer, then that has to be fixed first.

stopsrc -g spooler
sleep 60
ps auxww | grep pio

kill everything listed
Use kill -9 if needed.

cd /var/spool/lpd/qdir
ls -alt
remove any bad or old jobs listed that are not needed (usually anything over a few hours or days)

cd /var/spool/lpd/stat
ls -alt
remove any stale status files listed (or all of them; they will regenerate).

ls /etc/q*
Verify that qconfig.bin has the same date as qconfig, or a newer one. If not, remove qconfig.bin.
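
The date comparison in that last step can be scripted instead of eyeballed from `ls`. This is a minimal sketch, assuming the shell's `test` supports `-nt` (ksh and bash both do). The function name `drop_stale_bin` is made up.

```shell
# Sketch: remove the .bin digest when the text config is newer than it.
# On AIX the arguments would be /etc/qconfig and /etc/qconfig.bin,
# the two files named in the step above.
drop_stale_bin() {
    cfg=$1
    bin=$2
    # -nt is true when the first file has a newer modification time.
    if [ -f "$bin" ] && [ "$cfg" -nt "$bin" ]; then
        rm -f "$bin"
        echo "removed $bin"
    else
        echo "kept $bin"
    fi
}
```

Usage would be `drop_stale_bin /etc/qconfig /etc/qconfig.bin`, run while the spooler group is still stopped.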

Restart the print subsystem
startsrc -g spooler

All should be well.
Try using enq or lpr to print a job.
lpstat -p printername to list jobs.
lpstat with no flags will list all print queues, but will hang on unpingable printers.
xaminmo: (Josh 2004 Happy)
[personal profile] xaminmo
More and more are moving to access through fixcentral, but for now, HMC recovery media is at:


That looks to be the same as service.software and ftp.boulder.
xaminmo: (Josh 2004 Happy)
[personal profile] xaminmo
This is to prime search engines with something I couldn't find last time I looked.

The new TSM 7.1 operations center, also called TSM Operations Center, or internally even TSM Control Center...

The default URL is https://xxx.xxx.xxx.xxx:11090/oc/

The port wasn't in the docs, and the path wasn't in some of the docs. None of it came up Googling.

So frustrating. I had a screen scrape of my install, but somehow missed this, or maybe it's only under advanced?

I couldn't find it in other docs, but Will was able to help me out.
xaminmo: (Josh 2004 Happy)
[personal profile] xaminmo
In the past, I set up TSM.PWD as root, but this seems to not be what I needed.

I'm posting because the error messages and IBM docs don't cover this.

tsmdbmgr.log shows:
ANS2119I An invalid replication server address return code rc value = 2 was received from the server.

TSM Activity log shows:
ANR2983E Database backup terminated due to environment or setup issue related to DSMI_DIR - DB2 sqlcode -2033 sqlerrmc 168. (SESSION: 1, PROCESS: 9)

db2diag.log shows:

2014-02-26- E415619A371 LEVEL: Error
PID : 15138852 TID : 1 PROC : db2vend
INSTANCE: tsminst1 NODE : 000
HOSTNAME: tsmserver
FUNCTION: DB2 UDB, database utilities, sqluvint, probe:321
DATA #1 : TSM RC, PD_DB2_TYPE_TSM_RC, 4 bytes
TSM RC=0x000000A8=168 -- see TSM API Reference for meaning.

EDUID : 38753 EDUNAME: db2med.35926.0 (TSMDB1) 0
FUNCTION: DB2 UDB, database utilities, sqluMapVend2MediaRCWithLog, probe:656
DATA #1 : String, 134 bytes
Vendor error: rc = 11 returned from function sqluvint.
Return_code structure from vendor library /tsm/tsminst1/sqllib/adsm/libtsm.a:

DATA #2 : Hexdump, 48 bytes
0x0A00030462F0C4D0 : 0000 00A8 3332 3120 3136 3800 0000 0000 ....321 168.....
0x0A00030462F0C4E0 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0A00030462F0C4F0 : 0000 0000 0000 0000 0000 0000 0000 0000 ................

EDUID : 38753 EDUNAME: db2med.35926.0 (TSMDB1) 0
FUNCTION: DB2 UDB, database utilities, sqluMapVend2MediaRCWithLog, probe:696
MESSAGE : Error in vendor support code at line: 321 rc: 168

RC 168 per dsmrc.h means:
#define DSM_RC_NO_PASS_FILE 168 /* password file needed and user is
not root */

Verified everything required for this:
• passworddir points to the right directory
• DSMI_DIR points to the right directory
• dsmtca runs okay
• dsmapipw runs okay

Verified hostname info was correct

dsmffdc.log shows:
[ FFDC_GENERAL_SERVER_ERROR ]: (rdbdb.c:4200) GetOtherLogsUsageInfo failed, rc=2813, archLogDir = /tsm/arch.

Checked, and the log directory inside dsmserv.opt was typoed as /tsm/arch instead of /tsm/arc as was used to create the instance and as exists on the filesystems.

Updated dsmserv.opt and restarted tsm server. No change other than fixing Q LOG

The TSM.PWD file must be owned by the instance user, not by root.
Make sure to run the dsmapipw as the instance user, or chown the file after.
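
A quick ownership check would have saved me some time here. This is a minimal sketch: the function name `owned_by` is made up, and the real TSM.PWD path depends on your passworddir setting.

```shell
# Sketch: report whether a file is owned by the expected user.
owned_by() {
    file=$1
    want=$2
    # The third column of `ls -ld` is the owning user name.
    owner=$(ls -ld "$file" | awk '{ print $3 }')
    if [ "$owner" = "$want" ]; then
        echo "ok: $file owned by $owner"
    else
        echo "fix: $file owned by $owner, expected $want"
        return 1
    fi
}
```

For example, `owned_by /tsm/tsminst1/TSM.PWD tsminst1 || chown tsminst1 /tsm/tsminst1/TSM.PWD`, where both the path and the instance user are placeholders for your own values.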

Simple, and fairly obvious, but maybe not always so obvious.
xaminmo: (Josh 2004 Happy)
[personal profile] xaminmo
Sometimes IBM Director seems to lose the navigation pane, tab bar, etc., so I only see the current "window". There is no way "back". There is no logout option, because it's part of the navigation pane. If I just go to logout.do, then it says "Cross Site Forgery", aka "SCREW OFF! WE NEVER MAKE MISTAKES!"

I had to log in with a different browser to see what the logout link is, then go back to the broken one, and find a link with the SS variable. This is the session key. Then, replace the front part with /ibm/console/logout.do. *sigh*
xaminmo: (Josh 2004 Happy)
[personal profile] xaminmo
I ran into an issue that might be procedural, but I thought you guys might want to know anyway.

We are pursuing with IBM HW support as of 2013-07-18.
I am going to test further in my lab. I suspect this may be related to
bkprofdata and rstprofdata copying over some internal seed for MAC addresses.

I plan to try this in my lab on p5 rackmount servers via an HMC.
If I can reproduce it there, then I expect support's response to be "don't do that".
As such, I also will try a factory reset to see if that will clear the condition.

If I cannot reproduce it there, then it's either SDMC/FSM related (which is going away),
or it's blade/Flex Node related (No other test resources, but maybe L3 can help).

If L3 decides that rstprofdata cannot be used on a different system,
then I would want them to A) Limit the command to that functionality,
and B) Update documentation for both commands to reflect this.

bkprofdata & rstprofdata were used to clone the LPAR layout from one blade to another.
To reset the WWNs, I was able to delete and re-add the virtual fibre adapters.
New LPARs and new virtual fibre adapters automatically get WWNs with the blade/node number as part of the WWN.
This part works as I would expect.

Resetting the MAC addresses this way did not work.
Deleting and re-adding virtual ethernet adapters does not change the MAC addresses.
When I add a new adapter that did not exist before to the same slot number,
on the same LPAR ID, on two different Flex nodes, both get the same MAC address.

Current resolution is to override the MAC address with a user specified value in the LPAR profile.
This can be done from Profile -> Virtual -> Ethernet -> Advanced -> checkbox

Change from commandline:
chsyscfg -m Server-7895-23X-SN1012345 -r prof -i \

To remove and re-add:
chsyscfg -m Server-7895-23X-SN1012345 -r prof -i \
chsyscfg -m Server-7895-23X-SN1012345 -r prof -i \

I've never seen this happen on any other POWER series servers, and I've built a lot of p7 systems, ranging from p710 to p780, including matching LPARs between CECs. This is on top of the whole slew of LPARable systems I've built and/or supported.

I looked into the profile data backup files themselves, and there is no mention of system serial, system name, WWN prefix, or MAC prefix.

I restored mode 3 of the profile data backups prior to any config work, and when adding new virtual NICs to LPARs, the MAC addresses still mirror each other.

I plan to test this with two p505 systems on an HMC to see if similar issues occur.

I don't have the resources to test this on blades, or on another SDMC.

We are pursuing with IBM HW support as of 2013-07-18

### END NOTICE ###

After a week, still no response from support,
but I think I found out why this was a problem.

On physical hardware, "lssyscfg -r lpar" will show virtual_eth_mac_base_value=
On the flex nodes, this value is not exposed.

I can't tell if this is an SDMC/FSM limitation, or a flex node limitation.
I know that IVM sees it, but am not sure about HMC.

So, when LPAR profiles are copied over, they will bring the VEMBV,
and there is no way to change it short of deleting and re-creating.

All in all, it may just be easier to use mksyscfg from the start.
An example might be:

mksyscfg -r lpar -m Server-8205-E6D-SN10FFFFF -i profile_name=DefaultProfile,\

But there's already reference online for this sort of command.

Also, while working on a p740 via IVM, I ran into more differences from HMC/SDMC.
When you add a client LPAR with virtual SCSI, IVM automagically creates the VIO server virtual scsi server adapter. In addition, +1 from that slot it creates a virtual serial adapter for mkvterm.

If you're used to adding virtual scsi adapters in order, and you don't skip a slot on the mksyscfg lines, then you'll get this error:
[VIOSE01050173-0290] Cannot create virtual serial adapter in the management partition in the virtual slot number specified 20.

I couldn't find this error anywhere else on the internet, and it was a little confusing since I wasn't making a virtual serial adapter.
xaminmo: (Josh 2004 Happy)
[personal profile] xaminmo
Not just for GLVM, this can be good if you have a bunch of LV backed devices from 2 different VIO servers, all in the same volume group.

xaminmo: (Computer Drive)
[personal profile] xaminmo
Info from digging into DS6800 (baby shark) which I didn't find online.
It's pretty sparse for the actual raw troubleshooting.
Basically though, it looks like there are recurring I/O channel failures for the primary processor card.
This has been replaced before, so it's probably a chassis issue.
There were power failures, so maybe it's a poor power regulation issue.
The PSU should take the hit and not kill the proc cards.

Since the site PDU was replaced, hopefully those problems will be gone.

Anyway, a CE is coming out to replace this processor card.


Dec. 4th, 2012 12:45 pm
xaminmo: (Josh 2004 Happy)
[personal profile] xaminmo
So, it came out 2 months ago, but here's the summary:
* D model numbers (9117-MMD, 9179-MHD, etc)
* Up to 128 cores in a p780+ (vs 96 in MMC)
* Double the RAM capacity (4 TB)
* CoD CPU and RAM can be in a pool shared by multiple systems.

* AIX 7.1 TL2, 71TL01SP06, 71TL00SP08, 61TL08, 61TL07SP06, 61TL06SP10
* AIX 5.3 TL12 SP07 (Expected but not released yet. Only for extended support)
* VIO 2220, VIO2215 (Dec19)
* HMC 7.7.6 (CR3 or later, and 3GB of RAM if over 256 LPARs total)
* i6.1 only supported through VIO or i7.1 client.

New 1.8" SSD enclosure:
* UltraSSD: New 1U drawer with 30SSDs (GX++ PCIe Cable and two SAS RAID controllers)
* UltraSSD 1U drawer has four 4xSAS ports for running two EXP24S 2U drawers.
* UltraSSD controllers will support EASY TIER for AIX and VIO in 2013.
* UltraSSD will be added to DS8k line in 2013.

New Disk-as-Tape device:
* "RDX" Removable Disk - looks like tape, but it is a hot-swap disk to replace pre-LTO tech.
* RDX SATA supported on iSeries as optical
* RDX USB supported on AIX and VIO as well.

New I/O components:
* IBM Rackswitch options, with 1GB, 10GB and 40GB ethernet ports.
* PCIe2 dual-port Remote DMA over Ethernet (vs Infiniband for low latency MPI)
* GX++ Dual Port 10gbit FCoE or 10gbit FC Adapters for p770/p780 (no Linux. iSeries through VIO)
* GX++ Dual Port 16GBFC or 10GBFCoE Adapters for p795 (no Linux or iSeries)

Hardware Enhancements:
* 4 sockets per CPU card (vs 2-sockets)
* Supports 64GB DIMMs (vs 32GB)
* Lower heat/power consumption with 32nm vs 45nm
* Better performance per core with 10MB L3 Cache vs 4MB and up to 4.4GHz
* Active Memory Expansion performance improved with on-chip Compression Accelerator
* Crypto accelerator for AES, SHA and RSA
* Random number generator on die
* Four floating point pipelines vs 2 (single precision takes 1, DP takes 2)
* Higher concurrency during firmware updates (Can reset one core at a time)
* Higher uptime with redundant lanes in cache and in CEC interconnect cables
* CPU upgrades for MMA, MMB and MMC will include new CEC enclosures.
* Free CoD: Includes 240GB memory days and 15 processor days per CPU initially shipped
* Free CoD: Includes 90 days of full activation (one shot)
* http://www-03.ibm.com/partnerworld/partnerinfo/src/atsmastr.nsf/WebIndex/TD105846

FLEX Hardware:
* p260 and p460 dual-port FCoE Mezzanine to support dual VIO
* New FCoE 8-port switch module to support new FCoE mezzanine cards
* New FC switch module
* New v7000 module
* New USB-3 storage drawer (1x RDX, 2X DVD-RAM)

Hardware Withdrawals:
* No PCI-X, HSL, RIO-1, or IOP support in POWER8
* 3.5" SAS drawers to be withdrawn in 2013.
* SCSI DISK SUPPORT IS DROPPED!!! SCSI Tape still okay on PCI-X #5736 in I/O drawer
xaminmo: (Logo IBM CATE)
[personal profile] xaminmo
I always run into issues when I work in a multiple VLAN environment, because it's not *that* common for my builds. This is a reminder for me.

The magic is when using multiple VLANs:
1) Don't use the real VLAN ID for the trunk PVID unless you know for certain that was set on the switch. It is stripped off of all packets, and who knows what the PVID of the switch is, if any.
2) Any mismatch between PVID on the SEA and the trunk will cause packets to be dropped.
3) Don't use IEEE VLAN mode for the client adapter unless you're going to add VLAN interfaces from AIX. When not in VLAN mode, the PVID is ADDED to all packets on client adapters.
4) When using multiple trunks on one SEA, they all have to be the same trunk priority. ha_mode=sharing balances not using trunk priority, but based on the order of the virt_adapters field.
xaminmo: Josh 2016 (Default)
[personal profile] xaminmo
This is from a decade ago, so I thought it time to update the URLs and post it to LJ.

Here is information on how to decode SCSI Sense Data. This revolved around IBM Magstar products since that is where I was first exposed to the guts of SCSI errors.

The AIX Error Report records for TAPE_ERR# (usually 1-6) often include SENSE DATA in the Detail section. A SCSI LOG PAGE 06h can be parsed manually to provide the SENSE KEY, ASC and ASCQ values, as well as the ERROR CODE which will tell us if it is current or past errors being reported. An example Log Page 6 is below:
	0600 0000 0300 0000 FF80 0000 0000 0000 0000 0000 7000 0000 0000 0015 0000 000B 
	0000 0000 001C 7F00 2000 0033 7E58 0000 0000 0000 0000 0000 0000 0000 0000 0000 
	0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
	0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
	0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 B041 0000 0000 

If you'll notice, byte 0 is 06. Also note that there are 32 bytes per line, and two hex digits per byte.

Byte 20 represents the SCSI error class. Valid classes are:
    * 70 - Current Error (Direct Access Logical Block NOT From Sense Data).
    * F0 - Current Error (Direct Access Logical Block IS From Sense Data)
    * 71 - Deferred Error (Direct Access Logical Block NOT From Sense Data).
    * 7F - Vendor Spec. Error (Direct Access Logical Block NOT From Sense Data).
    * EE - Encryption Error
    * F1 - Deferred Error (Direct Access Logical Block IS From Sense Data).
    * FF - Vendor Spec. Error (Direct Access Logical Block IS From Sense Data).

In this example, EC (byte 20) is 70, which is valid and means this is a current error.

When the error class is valid, we can get the sense key from byte 22.

In this example, the sense key is 00 (zero) which means "NO SENSE". The standard list of sense keys is:
	X0 - No Sense         X6 - Unit Attention    XC - Equal.
	X1 - Recovered Error  X7 - Data Protect      XD - Volume Overflow.
	X2 - Not Ready        X8 - Blank Check       XE - Miscompare.
	X3 - Medium Error     X9 - Vendor Specific   XF - RESERVED.
	X4 - Hardware Error   XA - Copy Aborted
	X5 - Illegal Request  XB - Aborted Command

ASC is at byte 32 (first byte on line 2) and ASCQ is byte 33.
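
The byte offsets above can be pulled out with a few lines of shell instead of counting by hand. This is a minimal sketch using the first two lines of the example page. The function name `byte_at` is made up. Reading the sense key as the low nibble of byte 22 follows the X0..XF table above.

```shell
# Sketch: pull one byte (two hex digits) out of a hex dump by byte offset.
# Whitespace is stripped first, so byte N starts at character 2N+1.
byte_at() {
    n=$1
    hex=$(printf '%s' "$2" | tr -d ' \t\n')
    start=$((n * 2 + 1))
    printf '%s\n' "$hex" | cut -c "${start}-$((start + 1))"
}

# First two lines of the example Log Page 6 from this post.
page='0600 0000 0300 0000 FF80 0000 0000 0000 0000 0000 7000 0000 0000 0015 0000 000B
0000 0000 001C 7F00 2000 0033 7E58 0000 0000 0000 0000 0000 0000 0000 0000 0000'

echo "error class: $(byte_at 20 "$page")"
# The sense key is the low nibble of byte 22.
echo "sense key:   $(byte_at 22 "$page" | cut -c 2)"
echo "ASC/ASCQ:    $(byte_at 32 "$page")/$(byte_at 33 "$page")"
```

For this page it reports error class 70, sense key 0, and ASC/ASCQ 00/00, matching the manual walk-through.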

The ASC and ASCQ chart is pretty extensive. Please see the ASC/ASCQ Code Listing from the SCSI Technical Committee for an authoritative reference:

Note also that sometimes the ASC/ASCQ pair you're looking up may fall under a different sense key than is expected. The Sense key gives general information, such as "Recovered error", "hardware error", or the like. The ASC/ASCQ pair tells what the actual problem is. This isn't always 100% helpful, but is close.

Good reference was had from the 3590 Maintenance Information Guide, Msgs section. This gives 90% of what anyone would need to decode SCSI LOG PAGE 06h messages for IBM tape drives. The Jaguar Tape Drives (IBM 3590 & 3592) Information Center is at:

Included within are how to decode SIM/MIM Records, Log Page 6, and other related information. The 3590 Hardware Reference Guide, Appendix B also shows decent information in regards to non SIM/MIM errors. It makes reference to sense key and ASC/ASCQ bytes. You can acquire PDF copies of tape removable media storage systems' manuals via the following URLS:

The Magstar Maintenance and Ultrium SCSI Reference books make reference to "Fault Symptom Codes" which are more definitive; however, due to confidentiality of the 3590 microcode, a complete list of fault symptom codes is not available.

For encryption records, see the Troubleshooting section of the IBM TS3500 Tape Library (IBM 3584) Information Center:

The above also has general SCSI SENSE KEY/ASC/ASCQ and extended IBM codes under the Reference section.

There are other ways to get this information, but this was easiest for me.

Yours truly,
Josh Davis
xaminmo: (Logo IBM AIX 3.2.5)
[personal profile] xaminmo
On most AIX 7.1 systems, I find stray object files in /.

I finally got around to looking at them, and they are libiconv shared objects.

This is most likely an error in packaging of bos.rte.iconv.

The ones inside of /usr/lib/libiconv.a are from 2010,
but the ones in / are from 2011.

It's rare to run into NLS problems, so it's not been worth the hassle of calling in.

I typically leave them there, in case there is a real reason, or if IBM fixes/cleans them up in a future PTF.
xaminmo: (Logo IBM CATE)
[personal profile] xaminmo
If you've ended up upgrading to a newer AIX, and you SAN boot with SDDPCM, there is hope:
* mksysb -eXpi /dev/rmt0 ## just in case it all blows up
* ## Stop/quiesce what you can, unmount filesystems, vary off volume groups.
* vi /usr/lpp/devices.sddpcm.53/deinstl/devices.sddpcm.53.rte.pre_d
* ## Add "exit 0" as the first line after the shebang.
* ZZ
* installp -ug devices.sddpcm.53
* installp -acXYgd /export/lpp_source/sddpcm devices.fcp.disk.ibm.mpio.rte devices.sddpcm.71.rte
* lspv | grep rootvg | cut -f 1 -d \ | xargs -n1 bosboot -ad
* shutdown -Fr now
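
One caution on the `lspv | grep rootvg` step: a plain grep also matches names like altinst_rootvg or old_rootvg if a clone is lying around. Here is a minimal sketch that matches the volume group column exactly. The function name `rootvg_disks` and the sample output are made up, and the four-column `lspv` layout is an assumption.

```shell
# Sketch: list the disks that belong to rootvg, given `lspv` output on stdin.
# Assumed columns: disk name, PVID, volume group, state.
rootvg_disks() {
    awk '$3 == "rootvg" { print $1 }'
}

# Made-up sample in the assumed layout (not real output):
sample='hdisk0          00c0ffee12345678                    rootvg          active
hdisk1          00c0ffee87654321                    altinst_rootvg
hdisk2          00c0ffee0badcafe                    rootvg          active'

printf '%s\n' "$sample" | rootvg_disks
```

The result can be fed to the same `xargs -n1 bosboot -ad` as in the list above, in place of the grep and cut.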

This worked for me on several different systems.


eserver: (Default)
IBM POWER servers

Page generated Oct. 23rd, 2017 09:30 am
Powered by Dreamwidth Studios