xaminmo: Josh 2016 (Default)
[personal profile] xaminmo
This changes periodically, but for today, here is what I would do.

My PowerHA selection process would be:
* 7.1.3 SP06 if I needed to deploy quickly, because I have build docs for that.
* 7.1.4 doesn't exist, but if it came out before deployment, I would consider it. Whichever was a newer
Read more... )
http://omnitech.net/reference/2017/06/09/aix-and-powerha-versions-2017-06/
[personal profile] xaminmo
In a couple of instances, I've found bos.rte.* filesets broken during upgrade, perhaps with the root part missing.

It's always a pain, and I always forget how to fix it.

The problem is that the AIX base media does not include base install images for these. They are S (single) updates instead of I (install) images. This is because, during install, a bff called "bos" is laid down first, and that includes 10-20 core filesets, /usr, /, and all the core stuff. It's basically a prototype mksysb. Sort of.

Anyway, in rare instances, when there is a known defect, IBM will release a fileset as a patch through support/ztrans to get you fixed. If you don't have time to wait, or if you are a biz partner, working with a customer who hasn't yet approved you using their support, then you might have to fix it yourself.

list of errors )
The solution was ODM surgery.

First, I took a mksysb and copied it to somewhere safe (another server with NIM installed).

Then, I looked into ODM, and found /etc/objrepos/product was missing the entry for this version.
You might be able to copy from /usr/lib/objrepos, but I copied from a valid clone of this system.

# export ODMDIR=/etc/objrepos
# ssh goodserver odmget -q lpp_name=bos.rte.security product | odmadd


Then, I needed to add the history line, which was identical between root and usr:
# odmget -q name=bos.rte.security lpp     (note the lpp_id; it was 39 here)
# ODMDIR=/usr/lib/objrepos odmget -q lpp_id=39 history | ODMDIR=/etc/objrepos odmadd


The "inventory" ODM is keyed by lpp_id also, but that had a long list of files already. I did not mess with any of that.

Now, install_all_updates from my TLSP worked fine.
[personal profile] xaminmo
Because there is NOTHING on the web about this.
PRODUCT: Recover Now / Double Take / MIMIX / EchoStream

Vision Solutions bought Double-Take. Double-Take wrote Recover Now, which is called MIMIX on AS-400. The replication tools underneath are called "EchoStream".

NOTE: Documentation is hard to find, but here is a shortened URL form of the Windows docs: http://omnitech.net/u/rn35docs

Most functions can be managed from the web GUI:
http://127.0.0.1:8410/ui/portal
Obviously, put your correct IP here if you are not on the same host.

Install Licenses


Stop the license manager on PRIMARY:
stopsrc -cs scrt_lca-1

Stop the license manager on BACKUP:
stopsrc -cs scrt_aba-1

Copy the new license files:
scp -rp NIMSERVER:/export/Vision/license.perm/*_`hostname`_ES_node_license.properties /usr/scrt/run/node_license.properties

Start the license manager on PRIMARY:
startsrc -s scrt_lca-1

Start the license manager on BACKUP:
startsrc -s scrt_aba-1

Define initial contexts


/usr/scrt/bin/rtdr -C PRIMARYID (usually 1) -F BACKUPID (usually 101) setup

Query RN Contexts


Context 1 is Primary. DR shows BACKUP for this, and prod shows PRODUCTION for this.
Context 101 is Recovery. DR shows PRODUCTION for this, and prod shows BACKUP for this.

root@BACKUPNODE
# /usr/scrt/bin/sccfgd_getctxs
HOSTID HEXNUMBER
IPADDRESS MULTIPLELINES
BACKUP 1
PRODUCTION 101

root@PRODUCTIONNODE
# /usr/scrt/bin/sccfgd_getctxs
HOSTID DIFFERENTHEXNUMBER
IPADDRESS MULTIPLELINES
PRODUCTION 1
BACKUP 101
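
If you want to check a node's role for a context from a script, the output above is easy to parse. This is only a minimal sketch: it assumes the output looks like the samples here (a role keyword followed by a context ID), and the function name ctx_role is made up for illustration.

```sh
# ctx_role CONTEXT_ID
# Read sccfgd_getctxs-style output on stdin and print the role
# (PRODUCTION or BACKUP) this node holds for the given context.
# Assumes each context line looks like "ROLE ID", as in the samples above.
ctx_role() {
    awk -v id="$1" '($1 == "PRODUCTION" || $1 == "BACKUP") && $2 == id { print $1 }'
}

# Usage on a live node (not run here):
#   /usr/scrt/bin/sccfgd_getctxs | ctx_role 1
```

On the production node in the sample above, asking for context 1 would print PRODUCTION.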


Uninstall EchoStream


/usr/scrt/bin/scsetup -R -C1
/usr/scrt/bin/sclist -DD -C1
odmdelete -o SCCuAt
odmdelete -o SCCuObj
odmdelete -o SCCuRel



NORMAL OPERATIONS


### EchoStream start
/usr/scrt/bin/rtstart -C1

### RN Check to see if kernel module is loaded
/usr/scrt/bin/scconfig -sC1

### RN Check if services are online
lssrc -a | grep scrt
scrt_lca-1 is the sender
scrt_aba-101 is the receiver


### Protected filesystem mount
NOTE: This is usually handled by rtstart.
/usr/scrt/bin/rtmnt -C1

### Protected filesystem umount
NOTE: This is usually handled by rtstop.
/usr/scrt/bin/rtumnt -C1

### EchoStream sync, stop, and unload service
/usr/scrt/bin/rtstop -SC1

### EchoStream stop & unload service
/usr/scrt/bin/rtstop -C1

### EchoStream stop & unload kernel extension
/usr/scrt/bin/rtstop -FC1

### Check dirty blocks in state map
This will show how many blocks need to be sync'd for the recovery group:
/usr/scrt/bin/scconfig -PC1

### RN List buffer utilization
NOTE: When the local buffer overflows, it just reverts to state-map tracking without point-in-time recovery.
/usr/scrt/bin/esmon 1

### Shutdown all contexts
NOTE: This can be added to /etc/rc.shutdown, or in cluster start/stop scripts.
/usr/scrt/bin/rn_shutdown


FAILOVER PROCEDURES


Much is missing here. This is what I could find on the internet.
You can also do this from the WebUI.

### Fail back to Primary Server
/usr/scrt/bin/rtdr -qC 1 failback

### Failover to Recovery Server
/usr/scrt/bin/rtdr -qC 101 failover

### Make clone of filesystem
/usr/scrt/bin/scrt_ra -C1 -X

### Release clone of filesystem
/usr/scrt/bin/scrt_ra -C1 -W -L /dev/dbfs01lv

MANUAL OPERATIONS


### RN Primary Manual start
In troubleshooting and testing, these commands can start Recover Now manually:
/opt/visionsolutions/http/vsisvr/httpsvr/bin/strvsisvr
varyonvg rnvspvg
/usr/bin/startsrc -s scrt_scconfigd
/usr/scrt/bin/rtstart -C1


### Start without mount and fsck
/usr/scrt/bin/rtstart -C1 -M

### RN Primary Manual stop
In troubleshooting and testing, these commands will stop Recover Now manually:

# Unmount the protected filesystems
/usr/scrt/bin/rtumnt -DC1 | tee -a $log

# Kill processes if the filesystem is still mounted.
for i in `/usr/scrt/bin/sclist -C1 -f` ; do
    if mount | grep -q "$i" ; then
        fuser -kxuc "$i"
    fi
done


# Try rtumnt again due to some timing issues observed.
sleep 3
/usr/scrt/bin/rtumnt -DC1


# Sync outstanding lfc's to DR server
/usr/sbin/sync
/usr/scrt/bin/scconfig -SC1


# Stop RecoverNow
/usr/scrt/bin/rtstop -FkC1

Recover Now Reset State Map


This will cause the entire recovery group to be resync'd as if new, clearing any rollback points.

First, manually stop all resources as listed above, then bring the context online:
varyonvg rnvspvg
/usr/scrt/bin/scconfig -MC1


### RTDR Resync
# Remote resync of prod, run from DR
/usr/scrt/bin/sccfgd_cmd -H PRODNODE -T "1 resync"

# Local on DR
/usr/scrt/bin/rtdr -qC101 resync

### Mount the filesystems on Primary
/usr/scrt/bin/rtmnt -C1

### Mount the filesystems on Recovery
/usr/scrt/bin/rtmnt -C101

### Unmount filesystems
/usr/scrt/bin/rtumnt -C1 # or -C 101

Recover Now Release Stuck Config


For errors such as:
scsmutil: log anchor cksum mismatch
ERROR: Failed to load EchoStream Production Server Drivers
ERROR: Drivers not loaded... Will not mount into an unprotected state

Clear the error:
/usr/scrt/bin/scsetup -MC1
/usr/scrt/bin/scconfig -uC1


Then you can use rtstart as normal.



FIX HOSTID CHANGED


### Start Recover Now
/opt/visionsolutions/http/vsisvr/httpsvr/bin/strvsisvr
varyonvg rnvspvg
/usr/bin/startsrc -s scrt_scconfigd
/usr/scrt/bin/rtstart -C1
Context not properly defined on this system

# /usr/scrt/bin/sccfgd_getctxs
HOSTID (new hostid)
IPADDRESS (multiple lines)
No context for production or backup listed

# /usr/scrt/bin/rtdr -C 1 -F 101 setup
/usr/scrt/bin/rtdr[14]: test: argument expected
rtdr: Configuration error -
rtdr: Primary Context ID <1> is not enabled.
rtdr: The Primary Context ID <1> must be enabled
rtdr: when creating a Failover Context ID.


### Shutdown the context
# /usr/scrt/bin/scsetup -MC1
scsetup: AET_TMO_NOVOTE: Setup failed.
scsetup: Detail: On wrong host.

# /usr/scrt/bin/scconfig -uC1
scconfig: AET_TMO_NOVOTE: Unexpected error
scconfig: Detail: On wrong host.

# cat /usr/scrt/run/node_license.properties
## begin signed data
#DoW Mon DD HH:MM:SS CDT YYYY
vision.license.customer=Company_name_with_underscores
vision.license.productname=EchoStream
vision.license.expirydatemig=YYYY-MM-DD HH\:MM\:SS
vision.license.machineid=0123456789abcdefghijLMNOPQR\=
vision.license.hostname=hostname


### Vision support is via:
RecoverNow/GeoCluster AIX, Replicate1 24x7 CustomerCare Technical Support:
U.S. and Canada: (800) 337-8214
International: +1 (949) 724-5465
CustomerCare Support Email: support@visionsolutions.com

After hours will just page out, but not make a ticket.
Email will have a ticket created within a few minutes.

### Test startup
# /usr/scrt/bin/scsetup
scsetup: AET_TMO_NOVOTE: Setup failed.
scsetup: Detail: On wrong host.


### Set path properly
# /etc/environment only takes plain NAME=value lines, so the export goes in /etc/profile.
cat <<'EOF' >> /etc/profile
export PATH=/usr/scrt/bin:$PATH
EOF


### Collect reference info from "production" node and "backup" node.
/usr/scrt/bin/scconfig -v
/usr/scrt/bin/scconfig -q
/usr/scrt/bin/rtattr -C1 -a HostId
/usr/scrt/bin/rtattr -C101 -a HostId
/usr/scrt/bin/rthostid


### Update hostid for changed production node
HOSTID=`rthostid`
/usr/scrt/bin/rtattr -C1 -a HostId -o production -v $HOSTID
/usr/scrt/bin/rtattr -C101 -a HostId -o backup -v $HOSTID
ssh BACKUPNODE /usr/scrt/bin/rtattr -C1 -a HostId -o production -v $HOSTID


### Re-collect all of the same reference data as above.

### Reconfigure the repository
scconfig -sC1
ssh BACKUPNODE /usr/scrt/bin/rtdr -C1 -F101 setup
/usr/scrt/bin/rtdr -C1 -F101 setup


### Restart everything
/usr/scrt/bin/rtstart -C1 && startsrc -s scrt_scconfigd
until df -k /databasedir 2>/dev/null >/dev/null ; do date ; sleep 10 ; done
/opt/visionsolutions/http/vsisvr/httpsvr/bin/strvsisvr 2>/dev/null
[personal profile] xaminmo
This is from AIXMIND, on March 26, 2010 2:46 pm
It doesn't show up high enough in search queries, so I'm duplicating it here.
Note that AIX recreates most of /dev on boot, but we need a certain amount.

===============================================
Problem(Abstract): The /dev directory was accidentally deleted.
===============================================
Symptom: System won't boot
===============================================
Environment: AIX 5.3 (and others)
==============================================
Resolving the problem

Boot the system into maintenance mode.
Choose to access the root volume group and start a shell before mounting filesystems.

mount /dev/hd4 /mnt
mount /dev/hd2 /mnt/usr
mknod /mnt/dev/hd1 b 10 8
mknod /mnt/dev/hd2 b 10 5
mknod /mnt/dev/hd3 b 10 7
mknod /mnt/dev/hd4 b 10 4
mknod /mnt/dev/hd5 b 10 1
mknod /mnt/dev/hd6 b 10 2
mknod /mnt/dev/hd8 b 10 3
mknod /mnt/dev/hd9var b 10 6
umount /mnt/usr
umount /mnt
shutdown -Fr

source: http://www.aixmind.com/?p=728
Ref: http://wp.me/p3ecOp-zh
[personal profile] xaminmo
Description: All print queues are hung.

Solution: Kill off all hung print IO processes
This should resolve anything other than a printer offline.
If you cannot ping the printer, then that has to be fixed first.

stopsrc -g spooler
sleep 60
ps auxww | grep pio

kill everything listed
Use kill -9 if needed.

cd /var/spool/lpd/qdir
ls -alt
remove any bad or old jobs listed that are not needed (usually anything over a few hours or days)

cd /var/spool/lpd/stat
ls -alt
remove any stale status files listed (or all of them, and they will regenerate).

ls -l /etc/q*
Verify that qconfig.bin has the same date as qconfig or newer. If not, remove qconfig.bin.
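
The date comparison can also be scripted rather than eyeballed. This is a rough sketch, not a tested AIX tool: the function name is my own invention, and it relies only on find -newer, which prints the first file when it was modified after the second.

```sh
# qconfig_bin_is_stale CONFIG BIN
# Succeed only when BIN exists and CONFIG was modified after it.
# find -newer prints CONFIG only if it is newer than BIN.
qconfig_bin_is_stale() {
    [ -f "$2" ] || return 1
    [ -n "$(find "$1" -newer "$2" 2>/dev/null)" ]
}

# Usage (not run here):
#   qconfig_bin_is_stale /etc/qconfig /etc/qconfig.bin && rm /etc/qconfig.bin
```

If qconfig.bin is missing, the function reports nothing to remove, since the file is rebuilt when the print subsystem starts.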

Restart the print subsystem
startsrc -g spooler

All should be well.
Try using enq or lpr to print a job.
Use lpstat -p printername to list jobs.
lpstat with no flags will list all print queues, but will hang on unpingable printers.

POWER7+

Dec. 4th, 2012 12:45 pm
[personal profile] xaminmo
So, it came out 2 months ago, but here's the summary:
* D model numbers (9117-MMD, 9179-MHD, etc)
* Up to 128 cores in a p780+ (vs 96 in MMC)
* Double the RAM capacity (4 TB)
* MINIMUM CPU ENTITLEMENT 0.05 VS 0.10
* CoD CPU and RAM can be in a pool shared by multiple systems.
* NO p795 POWER7+ OPTION AT THIS TIME

Requirements:
* AIX 7.1 TL2, 71TL01SP06, 71TL00SP08, 61TL08, 61TL07SP06, 61TL06SP10
* AIX 5.3 TL12 SP07 (Expected but not released yet. Only for extended support)
* VIO 2220, VIO2215 (Dec19)
* HMC 7.7.6 (CR3 or later, and 3GB of RAM if over 256 LPARs total)
* i6.1 only supported through VIO or i7.1 client.

New 1.8" SSD enclosure:
* UltraSSD: New 1U drawer with 30SSDs (GX++ PCIe Cable and two SAS RAID controllers)
* UltraSSD 1U drawer has four 4xSAS ports for running two EXP24S 2U drawers.
* UltraSSD controllers will support EASY TIER for AIX and VIO in 2013.
* UltraSSD will be added to DS8k line in 2013.

New Disk-as-Tape device:
* "RDX" Removable Disk - looks like tape, but it is a hot-swap disk to replace pre-LTO tech.
* RDX SATA supported on iSeries as optical
* RDX USB supported on AIX and VIO as well.

New I/O components:
* IBM Rackswitch options, with 1Gb, 10Gb and 40Gb Ethernet ports.
* PCIe2 dual-port Remote DMA over Ethernet (vs Infiniband for low latency MPI)
* GX++ Dual Port 10gbit FCoE or 10gbit FC Adapters for p770/p780 (no Linux. iSeries through VIO)
* GX++ Dual Port 16GBFC or 10GBFCoE Adapters for p795 (no Linux or iSeries)

Hardware Enhancements:
* 4 sockets per CPU card (vs 2-sockets)
* Supports 64GB DIMMs (vs 32GB)
* Lower heat/power consumption with 32nm vs 45nm
* Better performance per core with 10MB L3 Cache vs 4MB and up to 4.4GHz
* Active Memory Expansion performance improved with on-chip Compression Accelerator
* Crypto accelerator for AES, SHA and RSA
* Random number generator on die
* Four floating point pipelines vs 2 (single precision takes 1, DP takes 2)
* Higher concurrency during firmware updates (Can reset one core at a time)
* Higher uptime with redundant lanes in cache and in CEC interconnect cables
* CPU upgrades for MMA, MMB and MMC will include new CEC enclosures.
* Free CoD: Includes 240GB memory days and 15 processor days per CPU initially shipped
* Free CoD: Includes 90 days of full activation (one shot)
* http://www-03.ibm.com/partnerworld/partnerinfo/src/atsmastr.nsf/WebIndex/TD105846

FLEX Hardware:
* p260 and p460 dual-port FCoE Mezzanine to support dual VIO
* New FCoE 8-port switch module to support new FCoE mezzanine cards
* New FC switch module
* New v7000 module
* New USB-3 storage drawer (1x RDX, 2X DVD-RAM)

Hardware Withdrawals:
* No PCI-X, HSL, RIO-1, or IOP support in POWER8
* 3.5" SAS drawers to be withdrawn in 2013.
* SCSI DISK SUPPORT IS DROPPED!!! SCSI Tape still okay on PCI-X #5736 in I/O drawer
[personal profile] xaminmo
I always run into issues when I work in a multiple VLAN environment, because it's not *that* common for my builds. This is a reminder for me.

The magic is when using multiple VLANs:
1) Don't use the real VLAN ID for the trunk PVID unless you know for certain that was set on the switch. It is stripped off of all packets, and who knows what the PVID of the switch is, if any.
2) Any mismatch between PVID on the SEA and the trunk will cause packets to be dropped.
3) Don't use IEEE VLAN mode for the client adapter unless you're going to add VLAN interfaces from AIX. When not in VLAN mode, the PVID is ADDED to all packets on client adapters.
4) When using multiple trunks on one SEA, they all have to be the same trunk priority. ha_mode=sharing does not balance by trunk priority; it balances based on the order of the virt_adapters field.
[personal profile] xaminmo
This is from a decade ago, so I thought it time to update the URLs and post it to LJ.

Here is information on how to decode SCSI Sense Data. This revolved around IBM Magstar products since that is where I was first exposed to the guts of SCSI errors.

The AIX Error Report records for TAPE_ERR# (usually 1-6) often include SENSE DATA in the Detail section. A SCSI LOG PAGE 06h can be parsed manually to provide the SENSE KEY, ASC and ASCQ values, as well as the ERROR CODE which will tell us if it is current or past errors being reported. An example Log Page 6 is below:
	0600 0000 0300 0000 FF80 0000 0000 0000 0000 0000 7000 0000 0000 0015 0000 000B 
	0000 0000 001C 7F00 2000 0033 7E58 0000 0000 0000 0000 0000 0000 0000 0000 0000 
	0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
	0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
	0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 B041 0000 0000 

If you'll notice, byte 0 is 06. Also note that there are 32 bytes per line, and two hex digits per byte.

Byte 20 represents the SCSI error class. Valid classes are:
    * 70 - Current Error (Direct Access Logical Block NOT From Sense Data).
    * F0 - Current Error (Direct Access Logical Block IS From Sense Data)
    * 71 - Deferred Error (Direct Access Logical Block NOT From Sense Data).
    * 7F - Vendor Spec. Error (Direct Access Logical Block NOT From Sense Data).
    * EE - Encryption Error
    * F1 - Deferred Error (Direct Access Logical Block IS From Sense Data).
    * FF - Vendor Spec. Error (Direct Access Logical Block IS From Sense Data).

In this example, EC (byte 20) is 70, which is valid and means this is a current error.

When the error class is valid, we can get the sense key from byte 22.

In this example, the sense key is 00 (zero) which means "NO SENSE". The standard list of sense keys is:
	X0 - No Sense         X6 - Unit Attention    XC - Equal.
	X1 - Recovered Error  X7 - Data Protect      XD - Volume Overflow.
	X2 - Not Ready        X8 - Blank Check       XE - Miscompare.
	X3 - Medium Error     X9 - Vendor Specific   XF - RESERVED.
	X4 - Hardware Error   XA - Copy Aborted
	X5 - Illegal Request  XB - Aborted Command

ASC is at byte 32 (first byte on line 2) and ASCQ is byte 33.
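
Counting bytes by hand gets old fast, so here is a small sketch that pulls out the same offsets (20, 22, 32 and 33) from a pasted dump. The function names are mine, not from any IBM tool, and it assumes the dump is plain hex with whitespace between groups, as in the example above.

```sh
# sense_byte HEXDUMP N
# Print byte N (0-based) of a hex dump as two hex digits.
# Whitespace in the dump is ignored, so errpt detail text can be pasted as-is.
sense_byte() {
    printf '%s' "$1" | tr -d ' \t\n' | cut -c"$(( $2 * 2 + 1 ))-$(( $2 * 2 + 2 ))"
}

# decode_sense HEXDUMP
# Print the fields at the offsets described in this post.
decode_sense() {
    echo "error_class=$(sense_byte "$1" 20)"
    # The sense key is the low nibble of byte 22 (the "X" in the list is the high nibble).
    echo "sense_key=$(sense_byte "$1" 22 | cut -c2)"
    echo "asc=$(sense_byte "$1" 32)"
    echo "ascq=$(sense_byte "$1" 33)"
}

# First line and the start of the second line of the example Log Page 6.
page='0600 0000 0300 0000 FF80 0000 0000 0000 0000 0000 7000 0000 0000 0015 0000 000B
0000 0000 001C 7F00 2000 0033 7E58 0000'
decode_sense "$page"
```

For the example page this prints error_class=70, sense_key=0, asc=00 and ascq=00, matching the manual decode.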

The ASC and ASCQ chart is pretty extensive. Please see the ASC/ASCQ Code Listing from the SCSI Technical Committee for an authoritative reference:

Note also that sometimes the ASC/ASCQ pair you're looking up may fall under a different sense key than is expected. The Sense key gives general information, such as "Recovered error", "hardware error", or the like. The ASC/ASCQ pair tells what the actual problem is. This isn't always 100% helpful, but is close.

Good reference was had from the 3590 Maintenance Information Guide, Msgs section. This gives 90% of what anyone would need to decode SCSI LOG PAGE 06h messages for IBM tape drives. The Jaguar Tape Drives (IBM 3590 & 3592) Information Center is at:

Included within are how to decode SIM/MIM Records, Log Page 6, and other related information. The 3590 Hardware Reference Guide, Appendix B also shows decent information in regards to non SIM/MIM errors. It makes reference to sense key and ASC/ASCQ bytes. You can acquire PDF copies of tape removable media storage systems' manuals via the following URLS:

The Magstar Maintenance and Ultrium SCSI Reference books make reference to "Fault Symptom Codes" which are more definitive; however, due to confidentiality of the 3590 microcode, a complete list of fault symptom codes is not available.

For encryption records, see the Troubleshooting section of the IBM TS3500 Tape Library (IBM 3584) Information Center:

The above also has general SCSI SENSE KEY/ASC/ASCQ and extended IBM codes under the Reference section.

There are other ways to get this information, but this was easiest for me.

Yours truly,
Josh Davis
[personal profile] xaminmo
On most AIX 7.1 systems, I find stray object files in /.

I finally got around to looking at them, and they are libiconv shared objects.

This is most likely an error in packaging of bos.rte.iconv.

The ones inside of /usr/lib/libiconv.a are from 2010 (7.1.1.0),
but the ones in / are from 2011 (7.1.1.15)

It's rare to run into NLS problems, so it's not been worth the hassle of calling in.

I typically leave them there, in case there is a real reason, or if IBM fixes/cleans them up in a future PTF.
[personal profile] xaminmo
If you've ended up upgraded to a newer AIX and are SAN-booting with SDDPCM, there is hope:
* mksysb -eXpi /dev/rmt0 ## just in case it all blows up
* ## Stop/quiesce what you can, unmount filesystems, vary off volume groups.
* vi /usr/lpp/devices.sddpcm.53/deinstl/devices.sddpcm.53.rte.pre_d
* ## Add "exit 0" as the first line after the shebang.
* ZZ ## save and exit vi
* installp -ug devices.sddpcm.53
* installp -acXYgd /export/lpp_source/sddpcm devices.fcp.disk.ibm.mpio.rte devices.sddpcm.71.rte
* lspv | awk '$3 == "rootvg" {print $1}' | xargs -n1 bosboot -ad
* shutdown -Fr now

This worked for me at 2.6.0.3 on several different systems.
[personal profile] xaminmo
The system was installed on the single internal SAS disk, but was provisioned some 60G LUNs from a VNX (aka CLARiiON).
Got the MPIO drivers on (because powerpath boot has management encumbrances), and verified the devices looked good.
Rebooted, then mirrored onto the new LUNs, removed the old SAS disk, and updated bootlist and bosboot.
shutdown -Fr and I get this:

IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
QLogic QMI3572  Host Adapter FCode(IBM): 3.14 2010-04-30 15:03
ISP Firmware version 5.03.06
/
Elapsed time since release of system processors: 37167 mins 43 secs
DEFAULT CATCH!, exception-handler=fff00400
at   %SRR0: 0000000048000104   %SRR1: 0000000040003002
Open Firmware exception handler entered from non-OF code

Client's Fix Pt Regs:
 00 0000000000000001 0000000000038160 0000000060000000 0000000000044bd0
 04 0000000000000001 0000000000000000 0000000000000100 0000000000044bd0
 08 ffffffffffffffff 0000000000000003 0000000048000104 0000000060000000
 0c 0000000000000078 0000000000000000 0000000000000000 0000000000000000
 10 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 14 0000000002000000 0000000000000008 0000000000000000 0000000000000000
 18 0000000000000000 0000000000000001 0000000000039b58 0000000000044e10
 1c 0000000000044e08 0000000000038350 0000000000000000 0000000000000000
Special Regs:
    %IV: 00000400     %CR: 22000040    %XER: 00000000  %DSISR: 00000000
  %SRR0: 0000000048000104   %SRR1: 0000000040003002
    %LR: 0000000000011d7c    %CTR: 0000000048000104
   %DAR: 0000000000000000
Virtual PID = 0
 ok
0 > 


It's been years since I got a DEFAULT CATCH so I don't remember where to go from here.

Google is giving me no joy.
[personal profile] xaminmo
I found out today that if you're using smaller than a /24, NIM will fail to accept your host into the subnet.

I had an "ent" definition with snm 255.255.255.0. The IP was 192.168.1.17, network was .0, gateway .1.

I redefined this as 255.255.255.127, and when adding it to the master, it said that .17 was not a part of the .0 network.

What a strange, and lame limitation.

UPDATE And it was brought to my attention what my error was. For a subnet mask, it would be .128, not .127. I was mixing up the oddness of 255 plus broadcast address. *sigh*

I M SMRT.
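
For the next time I fat-finger a mask, here is a quick sanity check. It is a sketch only, assuming a dotted-quad mask with numeric octets (no range checking), and the function name is invented for this post. A real netmask is a run of 1 bits followed by a run of 0 bits, so the inverted mask must look like 2^n - 1.

```sh
# is_valid_netmask MASK
# Succeed if the dotted-quad MASK is contiguous (all 1 bits, then all 0 bits).
# Assumes four numeric octets; octet ranges are not checked.
is_valid_netmask() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the mask into four positional parameters
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    mask=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    inv=$(( ~mask & 0xFFFFFFFF ))   # the host bits
    # Host bits of the form 2^n - 1 share no bits with their successor.
    [ $(( inv & (inv + 1) )) -eq 0 ]
}

is_valid_netmask 255.255.255.128 && echo "255.255.255.128 is a valid mask"
is_valid_netmask 255.255.255.127 || echo "255.255.255.127 is not a valid mask"
```

Both messages print: .128 is a proper /25 mask, and .127 leaves a hole in the middle of the bits.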
[personal profile] xaminmo
This is entirely from IBM's docs. Some is cut and paste, and some is abbreviated or reworded slightly. No credit is taken for this material. All links herein go back to the source documentation which will have more details.

In the AIX Event Infrastructure, an event is defined as any change of a state or a value that can be detected by the kernel or a kernel extension at the time the change occurs. The events that can be monitored are represented as files in a pseudo file system.

The only steps necessary to set up the AIX Event Infrastructure are:
  1. Install the bos.ahafs fileset.
  2. Create the directory for the desired mount point:
     mkdir /aha
  3. Mount the file system:
     mount -v ahafs /aha /aha

Mounting an AIX Event Infrastructure file system will automatically load the kernel extension and create all monitor factories. Only one instance of an AIX Event Infrastructure file system may be mounted at a time. An AIX Event Infrastructure file system may be mounted on any regular directory, but it is suggested that users use /aha.

Once mounted, the ahafs will contain these structures:
evProds.list "contains" the names of all currently defined event producers.
*.monFactory directories are automatically created for each event producer.
Subdirectories may be used for logical separation within a factory.
*.mon files within the factory, potentially created on open by the consumer, represent the events that can be monitored. For example, the file /aha/fs/modFile.monFactory/etc/passwd.mon is used to monitor modifications to the /etc/passwd file.

The AIX Event Infrastructure will translate text input written to .mon files into specifications on how the user wants to be notified of event occurrences. Once a user has issued a select() or a blocking read() call to signify the beginning of their monitoring, the AIX Event Infrastructure will notify the corresponding event producer to start monitoring the specified event.

NOTE: The AIX® Event Infrastructure tracks monitoring per process, and is not thread safe. Processes should not use multiple threads to monitor the same event. Instead, spawn a separate process should the same event need to be monitored in multiple ways.

An example output for a state change event producer who has specified that a stack trace should be taken:
    BEGIN_EVENT_INFO
    TIME_tvsec=1269377315
    TIME_tvnsec=955475223
    SEQUENCE_NUM=0
    PID=2490594
    UID=0
    UID_LOGIN=0
    GID=0
    PROG_NAME=cat
    RC_FROM_EVPROD=1000
    END_EVENT_INFO

An example for a threshold value event:
    BEGIN_EVENT_INFO
    TIME_tvsec=1269378095
    TIME_tvnsec=959865951
    SEQUENCE_NUM=0
    CURRENT_VALUE=2
    RC_FROM_EVPROD=1000
    END_EVENT_INFO
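
Since the event records are plain KEY=value text, a consumer script can pick fields out with awk. The helper below is my own sketch, not part of the IBM documentation: the name event_field is invented, and it assumes one record on stdin in the format of the two examples above.

```sh
# event_field KEY
# Read one event record on stdin and print the value of KEY.
# Only lines between BEGIN_EVENT_INFO and END_EVENT_INFO are considered.
event_field() {
    awk -F= -v key="$1" '
        /BEGIN_EVENT_INFO/ { inside = 1; next }
        /END_EVENT_INFO/   { inside = 0 }
        inside {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # tolerate indented records
            if (name == key) { print $2; exit }
        }
    '
}

# Usage with a saved record (the file name is a placeholder):
#   event_field CURRENT_VALUE < /tmp/event.txt
```

Fed the threshold example above, asking for CURRENT_VALUE prints 2. A key that is not in the record prints nothing.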

NOTE: Due to the asynchronous nature of process notification, the current value returned may be stale by the time a process reads the monitor file. Users are notified when the threshold is first met or exceeded, but other operations which may alter the values being monitored will not be blocked.

All directories in the AIX® Event Infrastructure file system have an access mode of 1777 and all files have access mode of 0666. These modes cannot be changed, but the ownership of files and directories may be changed. Access control for monitoring events is done at the event producer level. Creation / modification times are not maintained and are always returned as the current time when issuing stat () on a file object within the pseudo file system. Any attempt to modify these times will return an error.

More details on monitoring events are available in the publib section on AIX 6.1 System Information on AHAFS

See Predefined Event Producers for info on what producers come with AIX 6.1.

Finally, PowerHA (aka HA/CMP) has producers which can show events from other nodes in your active cluster. See Pre-Defined Event Producers for a Cluster-Aware AIX Instance to see what's available in an HA/CMP aka PowerHA environment.
[personal profile] xaminmo
So far, IBM says you cannot mount a USB mass storage device. I wanted to test this out a bit.
Howto and Details )
Suffice it to say that if you're willing to do a little bit of manual work for the set-up, it's very usable.

UPDATE 2014-07-29: Minor typos and cleanup. Also, note that LVM is still not allowed on USBMSDs, even with AIX 7.1.

UPDATE 2015-05-21: In case it was not obvious, this is not supported by IBM. Currently, it's not in any whitebook, and ZTRANS says it's not supported. The biggest issue is a risk of kernel panic if the USB device is unplugged. Many of the safety mechanisms for filesystems going offline would be handled by LVM, which is completely bypassed. Doing this with a ramdisk is one thing, since if RAM goes away, you're in trouble anyway. Still no support for USBMSD being in a volume group.

Also, an IBMer had claimed copyright on the procedure in 2014, but that was rejected and deleted in 2015, mostly because it ended up in a defect complaint due to USB pull kernel panic.
[personal profile] xaminmo
DNS query timeouts are mediated by the environment variables:

RES_TIMEOUT
RES_RETRY

The defaults are 5 and 4 respectively. Each retry doubles the timeout from before.

5 + 10 + 20 + 40 = 75 seconds.

Multiply this by the number of nameservers you have listed in /etc/resolv.conf and that's your DNS timeout PER QUERY when DNS is unreachable.
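
To play with the numbers, the arithmetic above can be wrapped in a few lines of shell. This only models the doubling sum described in this post, not the resolver source, and the function name and its "attempts" parameter are my own labels.

```sh
# dns_worst_case TIMEOUT ATTEMPTS NAMESERVERS
# Sum a timeout that doubles on every attempt, then multiply by the
# number of nameservers. Models the arithmetic in this post only.
dns_worst_case() {
    t=$1
    total=0
    n=0
    while [ "$n" -lt "$2" ]; do
        total=$(( total + t ))
        t=$(( t * 2 ))
        n=$(( n + 1 ))
    done
    echo $(( total * $3 ))
}

dns_worst_case 5 4 1   # defaults, one nameserver: 75
dns_worst_case 5 4 3   # defaults, three nameservers: 225
```

Three unreachable nameservers at the defaults works out to almost four minutes per query.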



The minimum values are 1 and 1, which means:

1 + 2 = 3 seconds per nameserver.

If your network is routinely laggy, overloaded or has packet drops, then setting this to minimums could lead to unexpected and intermittent DNS failures.

However, something like this could let you use multiple nameservers and not have to worry about SSH hanging for 5 minutes in your DR site.



The best place to make these changes is /etc/environment. Log out and back in, then restart any processes you want to have faster timeouts.

For anything started and owned by init, you may need to reboot.


-JD
[personal profile] xaminmo
The formal name of the package is "tofrodos" and it's from http://www.thefreecountry.com/tofrodos/index.shtml

I've built version 1.7.6 on AIX 5.3 TL7 using XLc 9 and gmake as an installp package. The package makes symlinks "dos2unix" "unix2dos" "fromdos" and "todos". The binaries seem to run, but I haven't put them through rigorous testing.

The bandwidth isn't great, but the files aren't big. There's a bff, and the tgz of the source as well as an md5 file at http://www.omnitech.net/aix/
[personal profile] xaminmo
This is from 2004, but I forgot to post it.

I tried to submit it as a Techdoc back in the day, but the overhead of submitting, maintaining, and then eventually proving I was authorized to submit became too much.

Anyone with access to the megadatabase should be able to find this in the 2003/2004 RETAIN database as well.

I can't backdate it since this is a non-personal journal, so please excuse this post-for-posterity.

-Josh
UPDATED: 2011-04-07 to have newer URLs.

Keywords: 6027-1127 6027-1242 6027-1306 6027-1371 6027-1909 6027-531 6027-535 6027-540 6027-572 6027-701 cluster CT_MANAGEMENT_SCOPE /dev/gpfs hacmp maxblocksize mkrpdomain mmaddnode mmchcluster mmchconfig mmchfs mmcommon mmconfig mmcrcluster mmcrfs mmcrlv mmcrnsd mmdelcluster mmdeldisk mmdelfs mmdelnode mmdelnsd mmexportfs MMFS mmimportfs mmlscluster mmlsconfig mmlsdisk mmlsfs mmlsgpfsdisk mmlsnsd mmshutdown mmstartup mmvsdhelper NSD preprpnode primary_node recoverPeerDomain rmrpdomain RPD RSCT runmmfs startrpdomain stoprpdomain tie-breaker useDiskLease /usr/lpp/mmfs/bin/ /var/adm/ras/mmfs.log.LATEST /var/mmfs VSDs

GPFS Superdoc )
[personal profile] xaminmo
Program Services is the IBM support structure for defects, PTF orders, etc. when a customer has not paid for how-to/usage support.

Internet Program Service problem reports are submitted at:

http://techsupport.services.ibm.com/server/pserv

Fax Program Service problem reports are sent to:
IBM Corporation

Attn: AIX Program Services
FAX phone: (512) 823-7634 or t/l 793-7634
793-7634 is forwarded to the Dallas site for processing

US Mail Program Service problem reports are sent to:
IBM Corporation

Department CST
IMAD: 30-01-0C
13800 Diplomat Drive
Dallas, TX 75234

Profile

eserver: (Default)
IBM POWER servers

Page generated Sep. 25th, 2017 10:27 pm
Powered by Dreamwidth Studios