How backups, backups, backups protect NYC’s cellular infrastructure

The infrastructure that underpins our lives is not something we ever want to think about. Nothing good has come from suddenly needing to wonder “where does my water come from?” or “how does electricity connect into my home?” That pondering gets even more intense when we talk about cellular infrastructure, where a single dropped call or a choppy YouTube video can cause an expletive-laden tirade.

Recently, I visited Verizon’s cellular switch for the New York City metro area (disclosure: TechCrunch is owned by Oath, and Oath is part of Verizon). It’s a completely nondescript building in a nondescript suburb north of the city, so nondescript that it took Verizon’s representative about 15 minutes of circling around just to find it (frankly, the best security through obscurity I have seen in some time).

This switch, along with its sister, powers all cellular service in New York City, including three million voice or voice over LTE (VoLTE) calls and 708 million data connections a day. High reliability and redundancy are a must for the facility, where dropping even one in 100,000 connections would create more than 7,000 angry customers a day. As Christine Williams, the senior operations manager who oversees the facility, explained, “It doesn’t matter what percentage of dropped calls you have if you are that person.”
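To make that arithmetic concrete, here is a back-of-the-envelope sketch in Python. The volumes are the ones quoted above; the function name is mine, and treating every failed connection as one distinct unhappy customer is a simplifying assumption, not something Verizon stated.

```python
# Daily volumes quoted in the article.
VOICE_CALLS_PER_DAY = 3_000_000
DATA_CONNECTIONS_PER_DAY = 708_000_000


def expected_daily_failures(connections_per_day: int, one_in: int) -> float:
    """Expected failed connections per day at a failure rate of one in `one_in`.

    Assumption: failures are spread evenly, and each one hits a different customer.
    """
    return connections_per_day / one_in


total = VOICE_CALLS_PER_DAY + DATA_CONNECTIONS_PER_DAY
failures = expected_daily_failures(total, 100_000)
print(f"{total:,} connections/day -> about {failures:,.0f} failures/day")
```

With 711 million total connections, a one-in-100,000 failure rate comes to about 7,110 failures a day, which squares with the "more than 7,000" figure.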

As we walked through the server rows that processed those hundreds of millions of connections, I was surprised by just how little digital equipment was actually in the switch itself. “Software-defined networking” has taken full hold here, according to Michele White, who is Verizon’s Executive Director for Network Assurance in the U.S. northeast. As the team has replaced older equipment, the actual physical footprint has continued to downsize, even today. All of New York City’s traffic is run from a handful of feet of server racks.

The key to network assurance is twofold. First is redundancy, layered at every level of the infrastructure. Inside the switch, independent server racks can take over from other servers that fail, providing redundancy at the machine level. If the air conditioning, which is critical for machine performance, were to fail, mobile AC units can be deployed to pick up the burden.

All equipment in the building is serviced by DC power, and in the event of an external power loss, two diesel generators connected to a large fuel storage tank will take over. The facility is also equipped with battery backups that can sustain the facility for eight hours if the generators themselves don’t function appropriately.

Diesel generators can sustain power to the switch in the event of an external power outage
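The power fallback order described above can be sketched as a simple priority list. This is an illustrative model only: the class and function names are hypothetical, and it follows the order given in this article (external power, then generators, then batteries) rather than any actual Verizon control logic.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PowerSource:
    name: str
    available: bool
    # Hours the source can carry the facility; None means "not limited in this model".
    runtime_hours: Optional[float] = None


def select_power_source(sources: Sequence[PowerSource]) -> Optional[PowerSource]:
    """Return the first available source in priority order, or None if all are down."""
    for source in sources:
        if source.available:
            return source
    return None


# Hypothetical scenario: external power is out and the generators did not start.
chain = [
    PowerSource("external power", available=False),
    PowerSource("diesel generators", available=False),
    PowerSource("battery backup", available=True, runtime_hours=8.0),
]

active = select_power_source(chain)
print(active.name if active else "no power available")
```

In this scenario the batteries carry the load, which per the figures above buys about eight hours to get a generator or the external feed back.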

At a higher level, the switch and its sister share all New York City cellular traffic, but either one could handle the full load if necessary. In short, the goal of the switch’s design is to ensure that no matter how small or large a problem it might experience, there is an instant backup ready to go to keep those cellular connections alive.
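That “either one could handle the full load” property amounts to surviving the loss of any single site. A minimal check, assuming capacities and load are expressed in the same arbitrary units (the numbers below are made up for illustration, not Verizon figures):

```python
from typing import Sequence


def survives_single_failure(capacities: Sequence[float], total_load: float) -> bool:
    """True if the remaining sites can still carry `total_load` after losing any one site."""
    combined = sum(capacities)
    return all(combined - capacity >= total_load for capacity in capacities)


# Two switches, each sized for the whole city's load (normalized to 1.0).
print(survives_single_failure([1.0, 1.0], total_load=1.0))

# Two switches that only cover the load together: losing either one drops traffic.
print(survives_single_failure([0.6, 0.6], total_load=1.0))
```

The first configuration passes and the second does not, which is the difference between sharing traffic for efficiency and sharing it with a full spare standing by.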

The other half of network assurance is centralization, something that I was surprised to hear in this supposed era of decentralization. Cellular sites in an urban area like New York are often placed on buildings, as anyone looking at roof lines can see from the street. Given those locations, it can be hard to provide backup generators and other failover infrastructure, and servicing them can also be challenging. With centralization, increasingly only the antenna is located at the site, with almost all other operations handled in central control offices and switches where Verizon has greater control of the environment.

Even with an intense focus on redundancy, natural disasters can overwhelm the best-laid plans. The telecom company has an additional layer of redundancy with its mobile units, which are placed in a “barnyard” owing to the names of the equipment stored there. There are GOATs (generator on a truck), and COWs (cell on wheels), and BATs (bi-directional amplifier on a truck). These units get deployed to areas of the network that are experiencing unusually strong demand (think the U.S. Open or a presidential inauguration) or where a natural disaster has struck (like Hurricane Harvey).

A barnyard filled with animal-named mobile cell infrastructure, including COWs, COLTs, HORSEs, and others

That said, both White and Williams noted that mobile cell deployment is much rarer than people would guess. One reason is that cell sites are increasingly being installed with Remote Electrical Tilt, which allows nearby cell sites to adjust their antennas so as to provide some signal to an area formerly covered by an out-of-commission cell. That process, I was told, is increasingly automated, allowing the network to essentially self-heal in emergencies.

The other reason their deployment is rare is that network assurance already has to handle a remarkable amount of surging traffic throughout the normal ebb and flow of a dense urban city. “Rush hour in Times Square is pretty heavy,” noted Williams. Even something as heavy as a parade through Midtown Manhattan won’t typically exceed the network’s surge capacity.

One other redundancy that Verizon has been exploring is using drones to provide more adaptive coverage. The company has been testing “femto-cell” drone aircraft designed by American Aerospace Technologies that can provide one square mile of coverage for about sixteen hours. A drone capability could be particularly useful in cases like hurricanes, where roads are often littered with debris, making it hard for network engineers to deploy ground-based mobile cells.

I asked about 5G, which I have been covering more heavily this year as telecom deployments pick up. Given the current design of 5G, White and Williams didn’t expect too much change to happen at the switch level, where most of the core technology was likely to remain unchanged.

The trend that is changing things, though, is edge computing, which is in vogue due to the need for computing to be located closer to users to power applications like virtual reality and autonomous cars. That’s critical, because 50 milliseconds of extra latency could be the difference between an autonomous car hitting another vehicle or a new support pylon and swerving out of the way just in time.
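For a rough sense of scale, distance covered is just speed multiplied by time. The sketch below assumes a car traveling at 100 km/h, a speed I picked for illustration; the 50 milliseconds is the figure above.

```python
def distance_during_latency(speed_kmh: float, latency_ms: float) -> float:
    """Meters traveled at `speed_kmh` during `latency_ms` of delay."""
    meters_per_second = speed_kmh * 1000 / 3600
    seconds = latency_ms / 1000
    return meters_per_second * seconds


# Assumed highway speed of 100 km/h with 50 ms of extra latency.
meters = distance_during_latency(speed_kmh=100, latency_ms=50)
print(f"{meters:.2f} m traveled before the car can react")
```

At that speed, 50 milliseconds works out to roughly 1.4 meters of travel, which is plenty to turn a near miss into a collision.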

Edge computing in many ways is decentralizing, and therefore there is a tension with the increasingly centralized nature of mobile communications infrastructure. Switches like this one are getting outfitted with edge technology, and more installations are expected in the coming years. 5G and edge are also deeply connected at the antenna level, and that will likely affect cell deployments far more than the switch infrastructure itself.

Edge, internet of things, 5G — all will increase the quantity and scale of the connections flowing through these networks. In the future, a cellular outage may not just inconvenience that YouTube user, but could also prevent an automobile from successfully navigating to a hospital during a natural disaster. It takes backups, backups, and backups to prevent us from ever having to ask, “where does that signal come from?”