Public Power Magazine

Game Changer

From the July-August 2013 issue (Vol. 71, No. 5) of Public Power

Originally published June 11, 2013

By Brent Barker
The New York City skyline remained dark at sunrise on Aug. 15, 2003. More than 12 hours after the biggest North American power outage in history left huge swaths of the Northeast in sweltering darkness, much of New York and its suburbs were still without electricity. Image by © Chip East/Reuters/Corbis


It was rush hour in Manhattan on a hot August day in 2003 when the lights went out. Elevators stopped and hundreds of thousands of people were trapped inside the subways. But New Yorkers were not alone in their frustration.

The electric grid had failed massively, beginning in northern Ohio, then radiating outward to the north and east. Fifty million people throughout Ohio, Michigan, Ontario, Pennsylvania, New York, Massachusetts, Connecticut and Vermont were without power. It proved to be the largest power outage in North American history, resulting in at least 11 deaths and an estimated $6 billion economic loss in the United States and more than $2 billion in Canada. In some parts of the United States, power was not restored for four days.

The Aug. 14, 2003, blackout proved to be a game-changing event for the power industry. Congressional action upended the voluntary reliability standards that had held sway for nearly 40 years following the 1965 Northeast blackout, ushering in a new era of mandatory, enforceable reliability standards and a new culture focused on reliability excellence and accountability. That culture is underpinned by a hybrid system of bottom-up, industry-led standards development coupled with top-down, government-led regulation and compliance monitoring. Ten years on, the result is significant improvement in reliability performance.

After the Disaster: New Legislation

The day after the blackout, President George W. Bush and Prime Minister Jean Chretien created a joint U.S.-Canadian task force charged with determining the causes of the outage and finding ways to reduce the possibility of future blackouts. Within three months, the panel issued a detailed report that sorted out the sequence of events and missteps, and laid out a sweeping series of 46 recommendations. First on the list was a recommendation that Congress add reliability provisions to two pending energy bills, H.R. 6 and S. 2095. The specific recommendation was the establishment of mandatory and enforceable reliability standards, with penalties for non-compliance. This led to Section 215 of the Federal Power Act, enacted under the Energy Policy Act of 2005. The act authorized the Federal Energy Regulatory Commission to certify an Electric Reliability Organization, or ERO, to shoulder the responsibility of establishing and enforcing the standards. The existing North American Electric Reliability Council (subsequently renamed the North American Electric Reliability Corp.) was the logical choice for the ERO.

NERC had been established in 1968 as a voluntary organization, relying on reciprocity, peer pressure and mutual self-interest within the industry to ensure compliance with reliability requirements. It was an organization suitable for its time, an era when power system reliability was the responsibility of large integrated utilities that owned and operated the bulk power system. Deregulation, however, had changed the paradigm. “The old vertically integrated relationships of the industry were breaking down due to the emergence of competitive generators that didn't have transmission operations,” said Allen Mosher, vice president of policy analysis and reliability standards for the American Public Power Association. “Independents didn't really care about how the grid operated; they just wanted it to be there to deliver their power to wholesale markets.”

Reconciling competition and reliability had become a critical issue for all parties in the 1990s. The electric utility industry began supporting stand-alone reliability legislation as early as 2000-2001, per the recommendation of a U.S. Department of Energy blue ribbon task force, Mosher said. “We were saying, in effect, that a voluntary regime was no longer going to work due to the emergence of competition. We had the general rules right but this was a voluntary club, and like every club you don't have to be a member. That meant that compliance was going to be inconsistent; and since we are all interconnected by the grid, we were all vulnerable to the weakest link. This was proven to be true by the 2003 blackout.”

Tree, Power Line in Ohio Triggered Massive Outage

More than 800 events occurred during the blackout, most of them during the last six minutes when things accelerated. In the lead-up, just after 2 p.m. Eastern time, a 345-kV overhead transmission line in northern Ohio sagged from the afternoon heat, contacted a tree and tripped. An hour later, another 345-kV line just south of Cleveland sagged into a tree and was taken out of service. Power then shifted onto a third 345-kV line, causing it, too, to sag and touch a tree extending too far into the right-of-way. “There were only three lines that actually had tree faults,” said Gerry Cauley, president and CEO of NERC. “The other lines were then fooled into thinking they were overloaded. Voltage was dipping quite a bit in the Cleveland-Akron area, and the relays thought they were seeing an over-current problem and began to trip.”

The threshold moment came at 4:05 p.m., when the Sammis-Star 345-kV line, interpreting the under-voltage and overcurrent as a short circuit, took protective action and tripped. Subsequent analysis suggested that the blackout could have been averted prior to this failure by cutting 1.5 GW of load in the Cleveland-Akron area. But FirstEnergy had lost sight of the situation. Its energy management system had failed earlier in the afternoon; after rebooting, an undetected problem with its alarm system persisted. The Midwest Independent Transmission System Operator, the area’s reliability coordinator, was having problems with its state estimator, limiting its situational awareness as well. By 4:09 p.m., outages began to accelerate in an uncontrollable manner. “In a matter of minutes, 300 transmission lines across the Northeast went out,” Cauley said.

“If FirstEnergy had understood what was happening on its system and acted on it quickly, it could have been prevented,” Mosher said. “If relays had been set differently it could have been prevented. If they had maintained their trees better it might have just been a localized outage. A lot of things had to go wrong for it to happen. In electric reliability we talk about planning on an n-1 basis. We lose all kinds of elements every day and maintain reliability. Somewhere between 5 percent and 25 percent of the generation on the grid is out of service, and somewhere between 1 percent and 10 percent of the transmission lines are out of service every day. Yet, we reliably serve all customers 24/7. In most cases you make provision for n-1-1 conditions, where you lose one element, reconfigure, then lose another element. On the day of the blackout, I would say FirstEnergy was really up to around n-14.”
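Mosher's n-1 arithmetic can be sketched in a few lines of code. The function, line ratings, and load figures below are invented for illustration, assuming a simple capacity-only model; real contingency analysis uses full power-flow studies:

```python
# Toy N-1 screen: remove each transmission element in turn and check whether
# the remaining parallel capacity can still carry the load. This is a
# capacity-only illustration, not a real power-flow contingency study.

def survives_n_minus_1(line_capacities_mw, load_mw):
    """Return True if the load can still be served after losing any one line."""
    total = sum(line_capacities_mw)
    return all(total - cap >= load_mw for cap in line_capacities_mw)

# Three hypothetical 345-kV corridors rated 1,500 MW each, serving 2,800 MW:
print(survives_n_minus_1([1500, 1500, 1500], 2800))  # True: any single line can be lost

# After successive tree faults leave only one corridor, the margin is gone:
print(survives_n_minus_1([1500], 1400))              # False: losing the last line drops the load
```

The n-1-1 case Mosher describes would simply repeat this test after each loss and reconfiguration; on Aug. 14, 2003, the system blew far past any such margin.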

Early Standards Focused on 3 Ts

Cauley was in Princeton, N.J., working at NERC when the blackout hit. “We immediately assembled a team of 30-40 people, experts from industry and government and began what became a landmark assessment. A number of CEOs were involved on an executive team. We all recognized this was a real turning point for the industry, a chance to get down to the core issues.” Three issues jumped out: trees, tools, and training—the “three T’s,” they called them.

It had always been questionable whether right-of-way maintenance and vegetation fit within NERC’s purview. Because trees had been implicated as a triggering event in numerous historical outages, and played a direct role in the 2003 blackout, NERC decided that vegetation management belonged in the regulatory framework. By 2007, it established a vegetation standard and immediately began to enforce it. “We levied some pretty significant fines in 2008—in the range of $100,000 to $200,000—and it got everybody’s attention,” Cauley said. “That standard has led to a significant reduction in line-tree contacts in the last five years.”

A relay-loadability standard followed in short order. That became “the single biggest technical change made by NERC in preventing future blackouts,” Cauley said. “We put in a standard that says if the system gets into this weakened condition, the line relays should not trip automatically. They should trip only if there is an actual contact [with a tree] on that line. We are very confident this standard solved the problem. It has been put in place all across North America. Every relay had to be adjusted.”

Situational awareness, a major contributing factor in the blackout, was a much more complex problem, involving people, systems, emerging technology and institutional coordination on an unprecedented scale. “A fundamental of safe reliable operation is for operators to know exactly where they are, and how close they are to the edge,” Cauley said. “Improving awareness involves operator training, visualization tools, information sharing between operators and reliability coordinators from system to system, so there is redundancy in awareness. It is a shared problem, not a weakest-link problem.”

With determined effort, NERC had seen evidence of steady improvement in situational awareness, but got a rude shock on Sept. 8, 2011, when a localized event in Arizona cascaded into Southern California. Nearly 3 million people, from the desert region to San Diego, lost power. “The trigger was a switching error at an Arizona Public Service substation,” Cauley said. “But the real issue was that some entities in the Southwest lost sight of the value of situational awareness.” NERC is working with the Western Electricity Coordinating Council to prevent similar occurrences going forward, he said.

Standard-Setting Process Involves Industry, Government

The NERC standard-setting process was established by Congress to be a hybrid affair, where the technical expertise of the industry would be drawn together into the drafting and vetting of standards. One result is that the process itself has helped inculcate a closer working relationship within the industry, and create the continuous learning environment that now characterizes NERC regulation. “We decided not to just take a regulatory, rule-based approach,” said Cauley. “We have a hybrid approach, one that emerged from the standards committees. Are we regulators? Yes, we have standards and we do compliance. But we are also focused on learning opportunities, helping the industry improve reliability.”

APPA’s Mosher, who chaired the NERC Standards Committee from 2010-2012, described the standard-setting process. “Under the NERC structure, there is a panoply of industry-driven, but NERC-administered committees. We try to have a diverse balance of industry segments and sectors on those committees. We have teams made up of industry subject matter experts that draft the reliability standards, which are sent out to another body that reviews the standards, makes comments and then votes to approve them. This really complex structure actually works pretty well to integrate the technical expertise of the industry into a regulatory framework.”

Standards are submitted to the independent NERC board and, if adopted, are then sent to FERC and the Canadian regulatory authorities for approval. Once approved, they become mandatory and enforceable with financial penalties. “What we have is an industry-led effort to impose government regulation upon itself,” Mosher said. “Very unusual—the reason it happened here is that all the pieces are interconnected to each other.”

Risk-Based Approach

The power industry, through NERC’s leadership, has adopted the risk-based, preventative approach to bulk power reliability that was in many ways pioneered by the nuclear power and airline industries. The idea is to aggressively go after the many small failures, such as maintenance issues, that can aggregate into larger problems. “Over the last four years we have become more sophisticated in terms of how we manage reliability using a defense-in-depth approach,” Cauley said. “We take care of a lot of little things. The industry averages about 200 events per year that are relatively small in terms of magnitude and consequences, but rich in information. We are instituting a continuous learning process. We look statistically for the opportunities. What things keep recurring?”

A number of databases now feed NERC’s analytical approach to risk management. The Transmission Availability Data System collects transmission outage data in a common format. Its counterpart, the Generation Availability Data System, similarly captures information on generation outages. The NERC board of trustees approved mandatory GADS reporting for conventional generating units above 50 MW in 2012 and in 2013 expanded it to include all units above 20 MW. Renewable generation—wind and solar—is not part of the mandatory requirements. Two other databases, covering demand response systems and relay misoperations, respectively, have been added to the mix of data streams that NERC monitors and evaluates.

A relative newcomer, the Event Analysis Database takes a broader look, trying to get to the root cause of outages. “Event analysis is a more complex, organic understanding of the whole event,” Cauley said. “What did the company do? The equipment? The people? How do you reconstruct and learn from that? I have a department at NERC that does risk analysis on this data, trying to extract meaning from it. The big challenge is to get these five databases working together. Our job at NERC is to connect the dots, to see patterns at the higher level.”

Improving Situational Awareness

Event analysis is, by definition, after-the-fact. Situational awareness is a real-time, operational process, made all the more difficult by emerging issues, including variable generation, increasing severity of storms and cybersecurity. “We have a number of situational awareness programs to help guide the industry and an Electric Sector Information Sharing Analysis Center,” Cauley said. “At a macro level, we have a visualization room in our office in Atlanta that allows us to see what is going on in the grid—voltages, line flows, and the like—at the highest level. We have a cybersecurity room in our D.C. office. When we see signs of stress, we talk to the specific reliability coordinators or operators and, when we are really concerned, we communicate across regions.”

NERC has three levels of alerts, all tied to situational awareness. The first is informational, a heads-up alert requiring no response from operators. The second level, a “recommendation,” requires acknowledgment that the alert has been received and asks operators to report what specific actions they have taken. The third level, “essential action,” is the most serious, describing mandatory action that must be taken immediately.

Synchrophasors offer great potential for situational awareness. These advanced meters allow precise measurements of phase angles at strategic points on the grid. The readings are time-stamped via GPS so they can be synchronized across vast regions. Phasor measurement units take the pulse of grid conditions at 30 observations per second, in contrast to readings made every 4 seconds by traditional supervisory control and data acquisition systems. Thus, they represent a potential improvement in situational awareness of two orders of magnitude. “In 2003, we had a handful of these devices and they became the anchor points of precision when we did the forensics work on the sequence of events,” Cauley said.
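The "two orders of magnitude" figure follows directly from those two sampling rates; a quick back-of-the-envelope check (illustrative arithmetic only):

```python
# PMUs report 30 times per second; traditional SCADA polls roughly every 4 seconds.
pmu_samples_per_sec = 30.0
scada_samples_per_sec = 1.0 / 4.0  # one scan every ~4 seconds

ratio = pmu_samples_per_sec / scada_samples_per_sec
print(ratio)  # 120.0 -- roughly two orders of magnitude more observations
```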

Since 2003, the U.S. Department of Energy has invested heavily in synchrophasor deployment as part of the American Reinvestment and Recovery Act. Western Electricity Coordinating Council utilities now lead the nation with hundreds of these units strategically placed, all communicating to a central database. “They give a razor-sharp image of the state of the power system,” Cauley said. “Current state estimation technology involves a bunch of independent measures, each one of which has an error range of 2 to 3 percent. These measures are then averaged to estimate grid conditions. Synchrophasors, when ready, will provide a leap to the next generation of precision.”

But they are not quite ready for prime time, largely because they generate a fire hose of data that can’t be readily absorbed. “The issue with synchrophasors right now is that you can only use them for forensics,” said Nathan Mitchell, director of electric reliability standards and compliance at APPA. “The tool that will allow us to use synchrophasors for real-time data evaluation hasn’t been invented yet. Everybody would love to have that level of situational awareness, but it doesn’t yet exist. We just don’t have the technology to evaluate that much data coming so fast.”

Cybersecurity Is Growing Focus of Reliability Standards

Mitchell is vice chairman of the NERC Critical Infrastructure Protection Committee focused on cybersecurity. “The cyber system is used to control the bulk electric system. If you are attacked or fail to properly update software, you’ll go black just as surely as if the transmission line were taken out of service. Cyber wasn’t a big concern at the time of the 2003 blackout but now it is a very hot issue. Standards are being upgraded on a regular basis—we are up to version five.”

NERC now has nine mandatory critical infrastructure protection standards covering everything from sabotage reporting to the identification of critical cyber assets. NERC has conducted the first ever grid security exercise for the U.S. electric sector. The ES-ISAC (Electricity Sector Information Sharing and Analysis Center) gathers information from around the grid about security-related events, disturbances, and off-normal occurrences and shares that information with government agencies. In turn, these government entities provide ES-ISAC with information regarding risks, threats and warnings.

The Future of Reliability

No one can say for sure whether cascading blackouts of the scale and consequence of the one in 2003 are a thing of the past. Most close observers are both optimistic and cautious about the future. Cauley sees NERC’s primary role as “driving the chance of another major blackout as close to zero as we can. You can never say zero, but we are definitely improving reliability by eliminating a lot of underlying causes that tend to pile up and by improving the resiliency of the grid,” he said.

Mosher feels that the hybrid system that taps industry and government expertise and commitment bodes well for the future. Reliability is the one area where the usual divisions of the highly diversified electric power industry tend to fade away, he said. “When we go before FERC, we find we all agree on the same policy structure. People [who] fight like cats and dogs on market-related issues all agree on the same set of standards. That’s a remarkable thing. When it comes to reliability, we are all on the same page.”



Members of the American Public Power Association receive Public Power magazine as part of their annual dues payments.  The subscription rate for non-members without the annual directory is $100 per year in the United States and $130 per year outside of the United States. A subscription that includes the annual directory is $200.  The annual directory alone can be purchased for $150.

Public Power is published eight times a year by the American Public Power Association. Opinions expressed in single articles are not necessarily policies of the association.

The Sheridan Group of Hunt Valley, Md., is the authorized exclusive seller of reprints of articles published in Public Power magazine. Reprints may be ordered online.

Manager, Integrated Media
David L. Blaylock

Integrated Media Editor 
Laura D’Alessandro 

Senior Vice President, Publishing
Jeanne Wickline LaBella

Art Director
Robert Thomas III