I've just been reading about yesterday's Internet outages and stability issues. They were traced (maybe I'd better add "allegedly") to a bug in Juniper's edge routers, in the code which controls the BGP (Border Gateway Protocol) tables. Think of BGP as the protocol routers use to build a big table containing the best or only routes an IP packet can take. It's pretty necessary stuff to keep networks networking. According to the report in The Register (http://www.theregister.co.uk/2011/11/07/global_net_outage/), the bug caused router devices to core dump and automatically restart.
Juniper seemed to get on the case quickly, and they can already say that "a software fix is available, and we've been working with our customers to immediately deploy the fix."
Think about this for a moment...
There is a crisis, and senior executives at your company are in an agitated state, demanding action. After hours of investigation and remedial coding (not to mention some fast-track QA), there is a fix available.
Now, how do you get that fix to the hundreds of companies who use your software in their networking products?
We faced this scenario often enough when I was software VP at a semiconductor company, and that is why we developed the Licensing features in CogniDox. In a scenario like this we would have a product (let's call it the "Router 1000") which contained the latest version of our embedded software (let's call it "RouterWare 5.1"). Now we had an even newer version, "5.1.1", containing the fix.
CogniDox holds records of all customers (Licensees), and there would be a license for RouterWare 5.1. This license would be applied to a whole set of documents in CogniDox, including the zipped file containing the images for RouterWare 5.1. When the new version of the software was ready, it would be uploaded, issued and approved. It was then published (usually by a Product Manager or a senior Software person) to the customer support web portal.
Every customer contact who had that license was automatically emailed and told how to download the new software. As they did so, a record was kept of who had downloaded it. A day or two later, when someone asked about this, it was easy to say that "85% of the customer base has downloaded it already". The other 15% were also easy to identify, in case Marketing or Sales wanted to contact them to see why they hadn't downloaded. If one of those 15% called in to report the same problem, the quick remedy was to suggest they download and upgrade.
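To make that bookkeeping concrete, here is a minimal Python sketch of the notify-and-track idea. It is not CogniDox code or its API: the `Release` class, the three helper functions, and the customer names are all hypothetical, and a real system would send emails and persist the records rather than keep them in memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Release:
    """A published software version covered by a license (hypothetical model)."""
    product: str
    version: str
    license_name: str
    downloaded_by: set[str] = field(default_factory=set)


def contacts_to_notify(licensees: dict[str, set[str]], release: Release) -> list[str]:
    """Return, in sorted order, every customer holding the license for this release."""
    return sorted(
        name for name, licenses in licensees.items()
        if release.license_name in licenses
    )


def record_download(release: Release, customer: str) -> None:
    """Log that a customer has downloaded the release."""
    release.downloaded_by.add(customer)


def coverage(licensees: dict[str, set[str]], release: Release) -> tuple[float, list[str]]:
    """Return the percentage of entitled customers who downloaded, and who has not."""
    entitled = contacts_to_notify(licensees, release)
    if not entitled:
        return 0.0, []
    pending = [c for c in entitled if c not in release.downloaded_by]
    pct = 100.0 * (len(entitled) - len(pending)) / len(entitled)
    return pct, pending


# Made-up customers: four hold the RouterWare license, one does not.
licensees = {
    "Acme Networks": {"RouterWare 5.1"},
    "Bolt Telecom": {"RouterWare 5.1"},
    "Cirrus ISP": {"RouterWare 5.1"},
    "Delta Labs": {"RouterWare 5.1"},
    "Echo Systems": {"SomeOtherWare 2.0"},
}

fix = Release(product="Router 1000", version="5.1.1", license_name="RouterWare 5.1")
notified = contacts_to_notify(licensees, fix)

for customer in ("Acme Networks", "Bolt Telecom", "Cirrus ISP"):
    record_download(fix, customer)

pct, pending = coverage(licensees, fix)
print(f"{pct:.0f}% downloaded; still pending: {pending}")
```

The point of the design is that the license, not a hand-maintained mailing list, decides who gets notified, so the "who hasn't downloaded yet" list falls out of the same records.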
Nothing completely cancels out the stress of an urgent fix-to-the-field, but this level of automation certainly helps.