Rolling Stack Upgrade (RSU) allows a near-zero-downtime firmware upgrade of all members in a switch stack. It uses a scripted process, so you don't have to run the upgrade from each stack member. At the time of writing, the RSU process has a few caveats that require special steps to make it work, and I plan to address them here.
The currently documented RSU process results in either an extended outage or an incomplete upgrade.
-Two 3750-X switches
-15.0(2)SE: this version has proven the most stable for me on the 15.x train and contains some relevant bug fixes (referenced below)
-IP Services License
-Two /30 hand-offs from ISP (link and box redundancy)
-Multiple LACP Cross-stack EtherChannels to LAN
-LAN consists of redundant Nexus switches with Dual-attached hosts
-Catalyst 3750-X and 3560-X Switch Software Configuration Guide
-Bugs (CSCts07947 and CSCtx05704)
-Verify health of the switch stack
show switch detail
show switch stack-ring speed
-Monitor status of the upgrade
show switch stack-upgrade status
show switch stack-upgrade sequence
RSU Process + Caveats
To avoid reinventing the wheel, I would definitely start with the “Rolling Stack Upgrade” section of the config guide (referenced above). However, three key differences from that procedure cost me some digging and troubleshooting:
-Manually removing the current image
-Running version 15.0(2)SE or higher
-Using the /reload option of the archive download-sw command, which is a little unclear in the config guide
Note: Extracting the images (archive command) took about 15 minutes, and the staggered reload process took another 15 minutes.
Enable the persistent MAC address, if not already enabled :)
stack-mac persistent timer 0
Define redundant uplinks to "network", in my case
interface interface-id <-connection on the Master switch
interface interface-id <-connection on the Member switch
Remove the current image from all stack members
delete /force /recursive flash1:image-tar-folder
delete /force /recursive flash2:image-tar-folder
archive download-sw /reload /rolling-stack-upgrade tftp://ipaddress/image.tar
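For stacks with more than two members, the cleanup and upgrade commands above follow a simple pattern: one delete per member's flash, then a single archive command. Here is a minimal Python sketch that only builds that command list for review or pasting. It does not connect to a switch. The function name `build_rsu_commands` and its parameters are my own illustration, not part of any Cisco tooling, and the `flashN:` numbering simply assumes the same scheme as the two-member example.

```python
def build_rsu_commands(member_count, image_folder, tftp_url):
    """Return the ordered exec-mode commands for the cleanup and upgrade steps.

    Hypothetical helper: it only formats strings, it does not talk to a switch.
    """
    if member_count < 1:
        raise ValueError("a stack needs at least one member")
    commands = []
    # One delete per stack member (flash1:, flash2:, ...), as in the steps above.
    for member in range(1, member_count + 1):
        commands.append(f"delete /force /recursive flash{member}:{image_folder}")
    # A single archive command then drives the rolling upgrade for the whole stack.
    commands.append(f"archive download-sw /reload /rolling-stack-upgrade {tftp_url}")
    return commands


if __name__ == "__main__":
    # Placeholders copied from the steps above; substitute your own values.
    for line in build_rsu_commands(2, "image-tar-folder", "tftp://ipaddress/image.tar"):
        print(line)
```

Printing the list before touching the stack makes it easy to confirm that every member's old image folder is covered.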
Force master switch (optional)
After the RSU process is completed, the master switch will have changed due to the staggered upgrade process. If you wish to force a specific switch to become master, you will need to reload only the current master switch. This is done with the “reload slot” command below. DO NOT execute the “reload” command, as it will reload the whole stack and cause a 7-minute outage.
reload slot switch-number
With this RSU process, downtime in the environment was reduced from 7 minutes to sub-second. To measure the availability of the environment (in my case a SaaS solution), I monitored these connections:
Inbound SSH, HTTPS, RDP, ping
With the traditional firmware upgrade process, both stack members had to be reloaded at the same time to avoid a version mismatch, hence the need for a maintenance window. With RSU I recorded 0-1 packets lost, and any loss was not noticeable to the end user.
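If you want to repeat this kind of availability check, here is a rough Python sketch of a TCP reachability probe. It is not the tooling I used for the numbers above. The names `tcp_probe`, `summarize` and `monitor` are made up for illustration. ICMP ping is left out because raw sockets need elevated privileges, so the sketch only covers the TCP services (SSH on 22, HTTPS on 443, RDP on 3389).

```python
import socket
import time


def tcp_probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def summarize(results, interval):
    """Summarize a list of probe booleans taken roughly `interval` seconds apart."""
    longest = current = 0
    for ok in results:
        # Track the longest run of consecutive failed probes.
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return {
        "sent": len(results),
        "lost": results.count(False),
        # Rough estimate only: consecutive failures times the probe interval.
        "longest_gap_s": longest * interval,
    }


def monitor(host, port, count, interval=1.0):
    """Probe host:port `count` times, sleeping `interval` seconds between probes."""
    results = []
    for _ in range(count):
        results.append(tcp_probe(host, port))
        time.sleep(interval)
    return summarize(results, interval)
```

For example, `monitor("app.example.com", 443, 1800, interval=1.0)` (hostname is a placeholder) would probe HTTPS once a second for the roughly 30 minutes the extraction and staggered reload took. A 1-second interval cannot resolve a sub-second blip, so shorten the interval if you need finer resolution.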
Thanks to TAC for their help in uncovering the correct process. Also, please share your experiences, comments, etc. below.