Tuesday, August 23, 2011

Screw Ups Happen: Recovering from a Work Related Screw Up

This is part two in my series "Screw Ups Happen." Today we explore something going wrong on your watch and how to recover from it.

The Set-Up: It's the third release in a series; the other two have gone off with nary a hitch. When the execs ask if you can release on a Friday since the product has been stable for previous releases, you reluctantly agree to it. You stay until 7 pm and life is good, so you go home. You do not check email until Monday when you realize that something went horribly wrong early Saturday morning and people were either unable to look up your phone number or somehow expected you to be checking email at 4 am. When they did finally call you, your phone was out of charge (which you did not realize until Monday), and the emails get worse. You respond Monday at 8 am and try to get things settled down with people who are really pissed off and think you screwed up.

Managing This:

1) Accept you screwed up, because you did. You know releasing things on Friday is a bad idea because things can be broken over the weekend, but you did it anyway (never mind the execs that pressured you into it; they won't see it that way come Monday). You compounded this issue by not checking email or checking to see if your phone was charged. As a manager, when things go wrong you point people to yourself and not your team so they can get the work done and not get yelled at. When they can't find you, they yell at your team, randomize them, and, without your additional knowledge of the situation and your team, possibly make the whole situation worse.

2) Accept your screw up publicly, via email. Do not go into detail. Do not make excuses. Make it short so it's hard to argue with or pick apart.

3) In the same email, list the things that can be done immediately to rectify the situation. The list should be high level, no more than five things (preferably three), and give the overall impression that you know what to do and who should do it, and that it's being resolved.

4) In a separate mail, promise a "post mortem" or final meeting after the crisis is over to analyze what went wrong to prevent it happening again. Yes, it seems likely that "charging your phone" and "not launching on a Friday" are the big learnings, but when this type of screw up happens, it's not about logically resolving the issues, it's about restoring trust and emotionally addressing people so the panic ends.

5) DO NOT CALL A BIG MEETING INCLUDING EVERYONE INVOLVED TO DISCUSS THE ISSUE. These are a) a waste of time, b) a platform for the blaming to resume/continue/get worse, and c) a way to allow your team to come to harm, either by reputation or if one of those folks in the meeting insists that your team members who worked on the area that borked ("borked" being a technical term derived from the Swedish Chef to mean "all messed up") be in the meeting and then climbs all over them with no technical knowledge to understand why it's not their fault. Note: this is the favored method of managing an issue by execs eager to put out a fire and stop looking bad. Resist it. Tell your people to decline it if someone else calls the meeting, and if necessary, you attend, alone. But do not put your people through it, and keep your temper in the meeting, singing the "I'm responsible for what happened, so I know how to address the issue and resolve it" song as often as necessary.

6) While you're assuaging the overall panic that has occurred, you need to also be juggling fixes/solutions. Get your team on the issue and, if possible, make executive decisions with their data to correct the problem. List other possible fixes, of course, but pick something and fix immediately. Many folks want to be consulted on which fix will be selected, which makes those people happy that additional mistakes will not take place because they are involved with the decision-making process. But what it really does is leave your system borked for longer while people who are not the experts (like your team) argue over the best approach.

7) After fixing it/getting it fixed, take the other possible options to the table of the stakeholders (not everyone who was ever interested, but the people who are on the line for the project). Tell them that for the immediate future your team selected option A because of whatever good reasons you selected, and then give them the other available options (high level) with pros and cons. If this was a PowerPoint extravaganza, there should be no more than one slide for each option, and the font for each slide should be 20-30 point. Preferably, though, you won't use PowerPoint, you'll just diagram on a white board.

8) Whatever the stakeholders decide, send an email to all involved and let them know the decision and the implementation plan and schedule. If they agreed with your emergency response for the long haul, explain the reasons, and refer any additional questions/issues to the stakeholders.

9) Implement the plan.

10) Have a meeting with your team after the emergency has passed and talk about what happened and what you can do to prevent it. Do not talk about or focus on blame. Your team gets paid for the work they do, so presumably no one screwed this up on purpose; get to the heart of the discussions and decisions that led to the issue, and again, admit your culpability. I like to put three columns together: Things that went right, Things that went wrong, and Action Items. Then the team and I fill the first two columns. When we've got the wrong and right, we derive action items to keep the issues that came up from happening again and fill in that column. It's not required, but I often add an additional column of Kudos; these are thanks, usually to people or departments, who came through for us. This means recognizing the good work under pressure in your own team, but also collecting those names and departments and sending thank you emails to those folks (with their bosses cc'd) after your meeting with your team is over. Just because something crappy happened doesn't mean that people who did an amazing job should go unnoticed.

11) Call a meeting of all panickers and stakeholders and give each person 1 minute to discuss their concerns and issues. Time it. They'll go over a bit, but you want this to be a 1 hour meeting, and you need 20 minutes to explain the action items your team came up with. But these folks need to feel heard. Redirect them when they go all blame-game; let them know that knowing what went wrong is imperative to preventing it in the future, but blaming individual groups or people is not actually productive and can be harmful. People need to be free to make mistakes or they don't take chances and do amazing work. Accept your culpability in the mess. If they make suggested action items during their time to talk, record them on a white board. Then include them in your discussion of the Action Items your team has come up with to prevent this happening again. Map action items to things that went wrong as defined by the people in this meeting (which may not correlate, exactly, to what went wrong, but they need the info in a way they understand). Ask for any additional action item suggestions, and then tell them that there will be an "artifact" created after this meeting of the discussion and final decisions. THEN END THE MEETING. People will want to keep talking; some are still scared, others are lobbying for position like wolves in a pack. Keep control. You screwed up, yes, but that doesn't make you any less the professional you have always been, and they're going to push you on that. Stand your ground. Control the meeting.

12) Write up a document of the two meetings and place it in a location where your team and other teams can see it. Write a final email on the topic and point to the artifact. If people wish to continue to discuss, disrupt their ongoing email conversation and suggest they meet with you directly. Use hallway convos and impromptu discussions, but get it out of everyone's email box so everyone can get on with whatever the next big crisis is.

13) Try not to screw up again. Note, you will. Your team will. Failure is a part of success. I should amend this to "try not to screw up in the same way again." People want to trust you learn from your mistakes.

In all (and usually), making an honest mistake is not a firing offense, even if there is an avalanche of other small issues that transform together into a Voltron of serious "wow this is bad."

What determines your future at review time is how you recover from mistakes that are going to happen and how you show others that you have learned from those mistakes. No one is perfect, and attempting to go through something like this without admitting culpability just makes you look worse no matter how wonderfully you rescue the thing. Not allowing your team to take it in the chops is also very important as it trains the stakeholders that you are in charge, and they can't go around you to give orders to people whom you manage. It also trains your people that you are there for them, and will fall on your sword when it's your sword to fall on, rather than blaming them and letting them take the heat.

On the whole, while unpleasant, this type of screw up (or ones like it) eventually builds a stronger team and impresses management over time. Like a broken bone that grows back stronger, mistakes show people what you're capable of almost more impressively than your successes, which they might take for granted.
