Wednesday, April 08, 2009

Eleven IT horror stories

Automatic updates
BrilliantCompany.com was growing at dot-com bubble rates. With departments popping up like daisies in spring, the IT staff was ceding desktop control to department heads because most everyone was technical anyway.

Shortly after a batch of 75 new Dell desktops arrived to populate a new product division, the network suddenly died in the middle of the day. All lights were green in infrastructure land, but performance had slowed to such a crawl that the LAN was effectively paralyzed. Some diligent sniffing and log file snooping revealed the culprit.
Turns out Windows XP’s Automatic Update had defaulted to high noon on a weekday, and all 75 machines attempted to download several hundred megs of Service Pack 2 simultaneously and individually. Instant network clog.
Solution:  Centralize IT control so one somebody can be responsible for all the details. This was done in short order after I released a sprightly memo to the appropriate folks. Then, I did what I should have done earlier and set up SUS (Software Update Services), now WSUS (Windows Server Update Services), to download updates and distribute at an appropriate time and after appropriate testing against departmental OS images.
Moral:  Just because your users are technical doesn’t mean they’ll behave with any more attention to detail than the average Joe. If network uptime is your responsibility, then take responsibility and manage what needs managing.
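A little back-of-the-envelope arithmetic shows why 75 machines made for an instant clog. The sketch below is only that: the 300MB update size stands in for "several hundred megs," and the 10Mbps shared link is an assumption, since the story never gives a link speed. Plug in your own numbers.

```python
def transfer_minutes(total_megabytes: float, link_megabits_per_sec: float) -> float:
    """Minutes needed to move total_megabytes over a fully used link."""
    total_megabits = total_megabytes * 8  # 8 bits per byte
    return total_megabits / link_megabits_per_sec / 60


MACHINES = 75    # from the story
UPDATE_MB = 300  # assumption: "several hundred megs" of Service Pack 2
LINK_MBPS = 10   # assumption: speed of the shared link the downloads compete for

# Every desktop pulls its own copy at the same moment.
every_pc_downloads = transfer_minutes(MACHINES * UPDATE_MB, LINK_MBPS)

# One update server pulls a single copy and redistributes it off-hours.
one_central_download = transfer_minutes(UPDATE_MB, LINK_MBPS)

print(f"75 individual downloads: {every_pc_downloads:.0f} minutes of saturated link")
print(f"One central download: {one_central_download:.0f} minutes")
```

Under those assumptions the individual downloads tie up the link for five hours, against four minutes for a single central copy, which is the whole argument for a central update server in two lines of output.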

Client protection
InfoWorld reader SEnright relates a tearful tale: A mobile user called to say that his laptop was no longer functioning. After a lengthy phone conversation, during which the user initially denied anything unusual had happened, he disclosed that he had spilled an entire can of Coke on the keyboard. “He continued by telling me that he had tried to dry it with a hair dryer, but that it still would not boot. I asked him to send it back to me, and that I would have it repaired.”

But when SEnright opened the laptop’s shipping box the very next day, he had a bit of a shock. “The gentleman had not used a ‘hair dryer,’ but must have borrowed a heat gun at one of our locations, because all that was left of the keyboard was a cooled pool of molten black plastic.” Ouch.
Solution:  The laptop was insured for “accidental” damage only. Since the incident, maintaining full coverage of mobile equipment has been a matter of course for SEnright.
Moral:  Cover your mobile warriors. That means not only insuring their hardware, but giving them training and clear policy documents on what can and can’t be done with company hardware on the road. Further, make sure their data is backed up religiously, both when they’re at the home office and when they’re on the road.

Executive clout
Here, we’re concerned with that senior executive who just has to have full administrative rights to every machine on the network. Even though he’s about as technical as my cat--and my cat is dead.
Senior users can be dangers even without special access rights. John Schoonover, who worked for the Department of Defense on one of the largest network deployments in history during Operation Enduring Freedom, was “witness to a huge lack of IQ points” in a senior manager.
According to Schoonover, military infosec installations generally follow a concept termed “the separation of red and black.” Red is simply data that has not been encrypted yet. (Danger, the world and sniffers can see you!) Black is the same data after it has been encrypted and is now ready to traverse the world. “These areas [red and black] are required to be separated by a six foot physical gap,” Schoonover says.
Our hero proceeds to follow these guidelines and deploys the network, but comes back from lunch one day to find the firewall down. Investigation shows that a senior manager “had taken the cabling from the inside router and connected to the Internet for connectivity, thus bypassing all firewall services, encryption, and -- oh yeah, that’s right -- the entire secure network with a jump straight to the Internet!”
Solution:  John says they “removed the culprit’s thumbs, because if you can’t grip the cable, you can’t unplug it.” I didn’t ask for any more details.
Moral:  Managing rogue senior users is an art in itself that requires diplomacy and even outright deception. In several installations I’ve renamed the Administrator account something like “IT” and made “Administrator” a functionally limited account with simply more read/write access to data directories, while still blocking access to things like the Windows system directory or Unix root directories. Most times they never notice; and if they do, I’m pretty good at making up excuses why those directories remain closed off. (“Oh, that’s something Microsoft did in the last service pack. Gosh darn that Bill Gates.”)

Legal eagles hunting IT mice
Lawyers ruin everything -- including smoothly running networks. But IT managers who ignore the ever-changing legal landscape’s impact on technology do so at their peril.

I was once called in as referee among in-house counsel, senior management, and IT staff after the company was informed that child pornography had been tracked to its servers. The company didn’t know whether to aid the investigation by figuring out which employee was responsible or to just delete all the offending files immediately and most likely incur a fine but protect the firm from getting shut down.
In the end, the lawyers managed to make a deal with investigators. The company’s IT network stayed active and we tracked the lowlife down and had him arrested. Quietly.
Solution:  Talk to senior management and corporate counsel about legal issues, such as corporate response to third-party audits or company responsibility for data it’s holding concerning third parties, before they happen.
This discussion goes beyond IT-centric solutions. Management must decide whether it wants to retain all pertinent data (the best course of action for those third-party audits) or automatically delete offending data (such as whatever’s found in porn filters).
IT and management must see eye to eye on how the company will respond to law enforcement inquiries, investigations, or even raids. If Homeland Security agents believe a terrorist is masquerading as an employee and storing data on corporate servers, they can come in and pretty much take anything they want. That could put a real crimp in the style of, say, an e-business.
Developing the best course of action should involve senior management, corporate counsel, and law enforcement. The FBI is usually pretty helpful in these discussions -- and so, sometimes, is the local computer crimes department, such as the large Computer Investigation and Technology Unit division of the NYPD.
Moral:  The higher you are on the IT food chain, the more such liability can spell serious trouble. If you make sure to discuss at least general legal eventualities with senior management, you’re much more likely to do yourself and your employer some real service in specific situations. If they refuse to discuss the matter, archive everything you can.

Disasters in disaster recovery
Gary Crispens reports an incident he encountered after questioning an IT director about the company’s preparedness for disaster recovery. The director responded huffily that the hot site was ready for any disaster, including the necessary space and equipment all backed by a diesel-powered generator with “plenty of fuel.”

After about a year, the company had a hurricane-related power outage that forced it to roll over to the hot site. “Sure enough, the IT Director had critical functions up and running and I could hear that generator running out back. But after about eight hours the power went out for good and all systems crashed when the generator stopped.”
It turned out that “plenty of fuel” was one 55-gallon barrel that was already half empty from the monthly testing.
Solution:  A disaster recovery plan that called for fuel checks in addition to generator testing.
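The numbers in this story are enough for a rough fuel check. Here is a minimal sketch, assuming the barrel was exactly half full and the generator burned fuel at a steady rate for the "about eight hours" it ran. Both are simplifications, and the 72-hour target is a hypothetical planning figure, not something from the story.

```python
import math


def runtime_hours(fuel_gallons: float, burn_gallons_per_hour: float) -> float:
    """Hours of generator runtime the fuel on hand will buy."""
    return fuel_gallons / burn_gallons_per_hour


def fuel_needed(target_hours: float, burn_gallons_per_hour: float) -> float:
    """Gallons required to run for target_hours."""
    return target_hours * burn_gallons_per_hour


BARREL_GALLONS = 55                 # from the story
fuel_on_hand = BARREL_GALLONS / 2   # "already half empty"
observed_hours = 8                  # "after about eight hours"

# Burn rate implied by the outage itself: 27.5 gallons over 8 hours.
implied_burn_rate = fuel_on_hand / observed_hours

TARGET_HOURS = 72  # hypothetical: how long the plan says the hot site must last
gallons_required = fuel_needed(TARGET_HOURS, implied_burn_rate)
barrels_required = math.ceil(gallons_required / BARREL_GALLONS)

print(f"Implied burn rate: {implied_burn_rate:.2f} gallons/hour")
print(f"Fuel for {TARGET_HOURS} hours: {gallons_required:.1f} gallons "
      f"({barrels_required} full barrels)")
```

A check like this belongs next to the monthly generator test, because every test run draws down the same barrel.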
Moral:  Disaster recovery isn’t a static issue. One plan or one policy is never perfect out of the gate. Ever. Pass such concepts by as many experienced eyes as you can and then revisit them annually or even twice a year for refinement.

 

Rogue peripherals
CompUSA and the Dummies books are teaching users just enough of the tech alphabet to spell trouble.

One of my favorite stories was the network that was severely hacked by someone who came in from the outside and deleted the main Exchange message store. Firewall logs had gotten the local IT admin nowhere, so we were called in to do a little snooping around. I wish I’d thought of it, but another guy on the team had the sense to run AirSnort. He found a wide open Linksys wireless access point in about six seconds.
The internal admin insisted there was no wireless running anywhere on the network. It took some sneaker netting, but we found the rogue AP in a senior exec’s office about 20 minutes later. Seemed he saw how cheap they were at the local CompUSA and decided to plug one into the secondary network port in his office so he could use his notebook’s wireless instead of the wired connection because no wires “looks better.”
Another problem in this vein is USB. Being able to plug in a peripheral and achieve working status without the need to install drivers has rapidly spread the popularity of personal peripherals. You don’t want to get yourself sucked into supporting things such as printers that aren’t on your official purchase list -- or external hard disks, DVD drives, sound systems, and even monitors.
Nor do you want the security risk of an employee plugging a gig or two of empty space into any workstation’s USB port and copying important corporate information. Source code, accounting data, and historical records all can be copied quickly and then walk out in somebody’s hip pocket.
Solution:  Let employees know what is and isn’t acceptable as corporate peripherals. Keep an accurate asset record of what belongs to the IT department so you can more easily find or ignore the stuff that doesn’t. And if data theft is a problem, think about protecting yourself by disabling USB drives, uninstalling CD-RW drives, or similar measures. The work you do now can save your bacon later.
Moral:  Asset management isn’t just for the anal. Knowing exactly what’s supposed to be on your network is a key step to solving a wide variety of IT mysteries.
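One way to put an asset record to work is to compare it against whatever hardware addresses actually show up on the wire. The sketch below assumes you already have two lists of MAC addresses, one from your inventory and one gathered from switch tables or a scan. How you collect them is up to you, and every address shown is made up for illustration.

```python
def normalize_mac(mac: str) -> str:
    """Reduce common MAC notations (00-11-.., 0011.2233.., 00:11:..) to one form."""
    hex_digits = "".join(ch for ch in mac if ch.isalnum()).lower()
    return ":".join(hex_digits[i:i + 2] for i in range(0, 12, 2))


def find_unknown_devices(inventory, seen_on_network):
    """Return addresses seen on the network that the asset record doesn't list."""
    known = {normalize_mac(mac) for mac in inventory}
    seen = {normalize_mac(mac) for mac in seen_on_network}
    return sorted(seen - known)


# Hypothetical data: two inventoried machines, three addresses observed.
asset_record = ["00-11-22-33-44-55", "00:11:22:AA:BB:CC"]
observed = ["0011.2233.4455", "00:11:22:aa:bb:cc", "DE:AD:BE:EF:00:01"]

unknown = find_unknown_devices(asset_record, observed)
print("Not in the asset record:", unknown)
```

Anything the comparison turns up is either a gap in the record or a device somebody plugged in without asking, and both are worth a walk down the hall.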

Security silliness
Security should be everyone’s job, from CTO to administrative assistant. It’s surprising how few organizations recognize this.

I think back to a time right after a fairly large network upgrade. All weekend, day and night, had been spent migrating a nightmare network from a hodgepodge of Windows 95/98/ME and even OS/2 clients with NetWare and Windows NT servers to a clean, homogeneous utopia of redundant Windows 2000 Servers on the back and Windows XP Professional desktops on the front. Things hadn’t gone quite as smoothly as we’d hoped, so instead of finishing up on Sunday afternoon, we were still putting final tweaks in place on Monday morning.
After we did our last test (making sure all local tape backups were working properly) it was about noon. (Most users by now had logged in, been informed that they needed to choose a new password in accordance with our medium-strong password guidelines, and had chosen a new password.) I stumbled bleary-eyed into the lunchroom for my umpteenth caffeine fix. Chugging my Coke, I almost missed it while mincing out of the lunchroom. But it grabbed my attention from the corner of my eye and caused Coca-Cola to shoot from my schnoz like some enraged soda dragon.
“Password List.” Yes, every user’s new password along with IT and even some specific switch passwords had been printed out by a well-meaning secretary and posted in the lunchroom. After they pried my hands from her throat, she explained that she just figured it’d be easier to post them there than to answer all the phone calls when users inevitably forgot them. So she went around and collected them (in my name), built her list, and posted it.
Solution:  User training. Passwords should not be regarded as obstacles but as keys for very important locks. Users must be made aware of such concepts, not simply dropped into new environments. If the secretary had been given a clue, she never would have done it, but the only training this company ever gave her was how to use Word.
Moral:  Preaching may be a pain, but it can sure stop a lot of FUBAR stupidity before it gets very far.

Curiosity killed the kilobyte
These situations can vary, but have the common denominator of a user experimenting with something he knows is dangerous … and not watching what he’s doing. P. A. Dunkin relates a situation that, surprisingly, I’ve encountered myself. (Mr. Dunkin declined his family’s donut fortune in favor of becoming a sys admin for a software engineering firm.)

After a recent virus outbreak, a curious engineer decided to crack open a sample of the virus to “see what made it tick.” But instead of doing this on a PC that wasn’t connected to the LAN or even one using an operating system immune to the virus, he did neither and promptly reinfected the network.
Dunkin’s user had the good sense to come forward immediately -- the guy I had experience with didn’t even realize what he’d done, so we didn’t detect the new infection until anti-virus software caught it.
Solution:  For me, it was multiple areas of virus detection, both server and client. Nowadays you can even get this at the infrastructure layer and I highly recommend it. Just because a virus is killed once doesn’t mean it can’t get resurrected.
Moral:  Dunkin says his users learned from the experience -- the advantage of having geek users. For many of us, however, his subsequent strategy is applicable: “I maintain an open-door anti-virus policy: No question about viruses is stupid, ever; and any time I have to send out a warning about an especially dangerous threat, I include an offer to help set up whatever measures are required, reminding them that it takes much less time to prevent an infection than to clean up after one.”

Server abuse
You can clean your servers till they sparkle, but users can still find ways to abuse them -- especially on the storage front, as reader Yan Fortin relates. Fortin was having such a boring day, he was actually browsing his firewall logs simply for something to do (I hit Playboy.com in that situation, but to each his own). Suddenly, he received a user call that network file access was being denied. Another call prompted him to put down his fascinating log reading and do a little investigating.
“Lo and behold, I had five e-mails warning me that the free space on the F: network share was getting dangerously low. Unfortunately for me, I had turned off the Windows Messenger Service on my workstation, so I couldn’t receive any warning that way. Shame on me.” Indeed.
Fortin searched the drive for every file bigger than 50MB and stumbled upon a marketing user who was copying approximately 30 TIFF files of 150MB each from a DVD to the network. “I called her to inform her that I would delete all her [expletive deleted] files, and did so right after.” Crisis over.
Solution:  Fortin purchased additional hard disk space for the server right after this incident and also had a firm talk with the user about the relatively finite nature of server disk space.
Moral:  Explaining things to inexperienced or even tech-phobic users may be a pain in the posterior, but it sure can save you time, trouble, and screaming managers in the long run.
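Fortin's hunt for every file bigger than 50MB is easy to script. What follows is a minimal sketch using only Python's standard library, not a description of the tool he actually used. The demo runs against a throwaway directory with a tiny threshold so it finishes instantly; on a real share you would pass the share's path and the 50MB constant instead.

```python
import os
import tempfile

FIFTY_MB = 50 * 1024 * 1024  # the threshold from the story, in bytes


def find_large_files(root, min_bytes=FIFTY_MB):
    """Walk root and return (size, path) pairs for files over min_bytes, biggest first."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size > min_bytes:
                results.append((size, path))
    return sorted(results, reverse=True)


# Demo on a temporary directory with made-up file names and sizes.
with tempfile.TemporaryDirectory() as demo_root:
    for name, size in [("memo.txt", 10), ("scan.tif", 5000)]:
        with open(os.path.join(demo_root, name), "wb") as handle:
            handle.write(b"\0" * size)
    hits = find_large_files(demo_root, min_bytes=1000)
    demo_names = [(size, os.path.basename(path)) for size, path in hits]

print(demo_names)
```

Run on a schedule, a report like this spots the 150MB TIFFs before the free-space warnings start arriving.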

Telecommuting terrors
Always remember that even telecommuters eventually come to the office. One reader relates the experience of a remote user visiting the home office and immediately killing the entire network. A little laptop investigation showed that the user had decided to configure his laptop as a DHCP server for his home network, which “suddenly made his machine the default gateway for that segment.”

Other examples include mamas and papas who genially allow their kids to play high-end games on the corporate hardware, or (worse) to surf the Internet in all those dark and fringelike nooks that teenagers like to explore on the Web. While the adults are out having dinner, the kids are home infecting the workstation, which promptly begins to spew out viruses the next time daddy either logs in or visits the office.
Solution:  Perimeter defense. End-point security technologies such as Cisco’s NAC or Microsoft’s NAP are specifically designed to minimize this risk by scanning outside machines the moment they’re connected to the network. Failure to meet specific criteria, including everything from minimal patch levels to scheduled virus scans, means the PC is dumped into a quarantine area of the network where it can be scanned, updated, and fixed without risk of harm to other nodes.
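The admit-or-quarantine decision itself is simple enough to sketch. The code below is purely illustrative and is not the Cisco NAC or Microsoft NAP API. The Posture record, the function name, and the thresholds (patch level 2, a virus scan within seven days) are all assumptions invented for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Posture:
    """What a connecting machine reports about itself (hypothetical fields)."""
    patch_level: int
    last_virus_scan: date


def admit_or_quarantine(posture, today, min_patch_level=2, max_scan_age_days=7):
    """Return the network zone for a machine plus the list of failed checks."""
    failures = []
    if posture.patch_level < min_patch_level:
        failures.append("patch level below minimum")
    if today - posture.last_virus_scan > timedelta(days=max_scan_age_days):
        failures.append("virus scan overdue")
    zone = "quarantine" if failures else "production"
    return zone, failures


today = date(2009, 4, 8)
office_pc = Posture(patch_level=2, last_virus_scan=date(2009, 4, 6))
road_laptop = Posture(patch_level=1, last_virus_scan=date(2009, 2, 1))

print(admit_or_quarantine(office_pc, today))
print(admit_or_quarantine(road_laptop, today))
```

Returning the reasons along with the verdict matters: the quarantine area is only useful if it knows what needs scanning, updating, or fixing before the machine is let back in.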
Moral:  Talk to your telecommuters. Fair use policies with a little bit of disciplining oomph behind them can go a long way toward having mommy buy her precious offspring their own PC to infect rather than risking her job by letting them use hers.

Ultimate weirdness
This one won our Deepest Chuckle Award. Dave Schultz related an incident in which he tagged a note to a network laser printer informing users that if print quality suffered enough to warrant a toner cartridge replacement, they should first “shake a few times to yield a few additional copies.”

Schultz was later berated because a user suffered a work-related back injury by reading the note, then picking up the entire HP LaserJet 4000 and trying to shake the printer back and forth.
Solution:  Shoot the user; he’s lame now, anyway.
Moral:  Never let your blood pressure get too far into the dangerous numbers, and keep a bottle of Advil handy.

(http://www.infoworld.com/print/21822)