On the Edge of the Edge: Taking Supercomputing to Space
May 7, 2021 • By Amelia Williamson Smith, Staff Writer
On Earth, scientists are used to having high-performance computers at their fingertips. Such computing capabilities are critical to analyze the rich data from experiments and extract the insights needed to make valuable scientific and technological advancements. But what if your laboratory is not on Earth—but in space?
Can you take a high-performance computer with you that has not been “hardened” to withstand spaceflight? Can it survive the rough forces of launch? Will it work in the extreme space environment, where solar flares, galactic radiation, and cosmic rays may interfere with computing?
Researchers at Hewlett Packard Enterprise (HPE) wanted to find out, so they packaged an unmodified commercial off-the-shelf (COTS) high-performance computer and sent it to the International Space Station (ISS). HPE’s computer, Spaceborne Computer-1 (SBC-1), remained on the ISS for more than 1.5 years (657 days) in the first long-term demonstration of supercomputing capabilities from a COTS computer system on the space station. The SBC-1 mission was sponsored by the ISS U.S. National Laboratory.
“The vision for the Spaceborne Computer program is: Can astronauts fly with the latest COTS computer, and will the hardware stand up to the harsh conditions if it is given some smart software to take care of itself? And SBC-1 was a great success,” said Eng Lim Goh, HPE senior vice president and chief technology officer for artificial intelligence (and principal investigator of the SBC-1 mission).
Not only did SBC-1 work in space, but the demonstration was also nearly flawless. With an innovative approach to protect the system’s hardware using specially designed software, SBC-1 was able to continue successful operations throughout the duration of the mission, despite the extreme conditions in low Earth orbit (LEO), and never once got an incorrect answer or had an interruption caused by the computer. SBC-1 also achieved a significant milestone while in orbit: running at more than one teraflop, or one trillion floating point operations per second, for the first time in space.
Space-Based Supercomputing Success
The project was so successful that HPE sent a follow-on mission to the space station through the ISS National Lab, Spaceborne Computer-2 (SBC-2), which launched on Northrop Grumman’s 15th Commercial Resupply Services (CRS) mission and was recently installed and has begun operations onboard the ISS. SBC-2 will incorporate lessons learned from SBC-1 but will also feature twice the processing power and will include graphics processing units and other artificial intelligence and edge processing capabilities. More importantly, it will also allow real users to leverage the system for in-space processing.
Sending high-definition imagery and large datasets from the ISS to Earth for processing is time-consuming and uses a lot of network bandwidth. Processing the raw data on the ISS and then sending down the results would save both time and bandwidth.
“This system will be valuable going forward,” Goh said. “And if it proves itself even further, I think there is strong motivation to provide it to astronauts for long-duration space exploration, where the communication time back to Earth gets longer and you can rely less and less on immediate responses from computing power on Earth.”
HPE’s successful demonstration benefits not only space-based computer systems but also computers on Earth that operate in harsh environments, said HPE’s Mark Fernandez, lead software engineer for SBC-1 and principal investigator for SBC-2.
“Part of HPE’s mission is edge computing, and we have a whole line of products that are meant to be on the edge—which could be on an offshore oil rig, in the depths of a mine, or in the very back of a massive warehouse,” Fernandez said. “Processing data on the edge is valuable to HPE, as is learning what works and doesn’t work and how to take those consequential actions when things go awry, and there’s no better place to do that than at the edge of the edge, which is the space station.”
A Consequential Design: Hardening With Software
To be able to continuously operate in the extreme space environment, HPE’s computer system needed to be autonomous. It had to not only monitor the hardware but also take action when needed to avoid failures and loss of data, Fernandez said.
Traditional “hardened” electronics are expensive and are designed around the anticipation of specific conditions that could damage a computer, such as radiation. However, in space, it is not always easy to know the exact conditions a computer may encounter. So HPE took a different approach—instead of designing the system for what might damage the equipment, the team considered the possible consequences of the damage and what mitigation would be necessary to continue successful operations.
“We built a whole suite of software around that idea, which we collectively call ‘hardening with software,’ and it proved to be invaluable during SBC-1,” Fernandez said. “The software would kick in and slow the computer down as needed.”
The software monitored all aspects of the hardware, and when the system detected conditions outside the established parameters, it would alert the HPE team on the ground and go into a safe state until the problematic conditions passed. Although the system could not determine what was causing the problem, the software was designed to reduce performance in a stairstep fashion to keep the hardware safe.
As a first step, the system would drop down from optimal performance and reduce its speed to run more slowly. If the problematic conditions persisted, the system would then drop down to idle. Finally, if needed, the system would power itself down to remain safe.
“Looking at all the parameters, the system would go through this stairstep decline in performance,” Fernandez said. “You would rather be running slowly than not running at all, you would rather be powered on than powered down, but you would rather be powered down than damaged.”
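The stairstep behavior described above can be illustrated with a small state machine. This is only a sketch under stated assumptions, not HPE's software: the names `RunLevel`, `within_limits`, and `next_level`, the sensor name `cpu_temp_c`, and its thresholds are all hypothetical. The article also does not describe how the system recovers, so the recovery rules here (return to full speed once readings are back in range, and stay off once powered down) are assumptions.

```python
from enum import IntEnum


class RunLevel(IntEnum):
    """The four performance levels named in the article, lowest to highest."""
    POWERED_DOWN = 0
    IDLE = 1
    REDUCED_SPEED = 2
    FULL_SPEED = 3


def within_limits(readings, limits):
    """Return True if every monitored reading lies inside its (low, high) range."""
    return all(low <= readings[name] <= high
               for name, (low, high) in limits.items())


def next_level(level, readings, limits):
    """Pick the run level to use after one monitoring check."""
    if level is RunLevel.POWERED_DOWN:
        # Assumption: a powered-down system waits for an external restart.
        return level
    if within_limits(readings, limits):
        # Assumption: the recovery policy is not described in the article.
        return RunLevel.FULL_SPEED
    # Conditions are still out of range: step down exactly one level.
    return RunLevel(level - 1)


# Hypothetical sensor and thresholds, used only to exercise the sketch.
limits = {"cpu_temp_c": (5.0, 80.0)}
level = RunLevel.FULL_SPEED
for temp in (95.0, 95.0, 95.0):
    level = next_level(level, {"cpu_temp_c": temp}, limits)
    print(level.name)
```

With three consecutive out-of-range readings, the loop walks from full speed through reduced speed and idle to powered down, mirroring the order of preference in Fernandez's quote: slow rather than stopped, on rather than off, off rather than damaged.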
Establishing Proof of Concept
After arriving at the ISS, SBC-1 was installed and achieved its first milestone: powering up. “Installation was very exciting,” said HPE’s David Petersen, lead hardware engineer for SBC-1 and SBC-2. “It was a very iterative process, and when we energized the system and got a ping command back, that alone was a big accomplishment.”
The HPE team sent two identical computer systems to the ISS for the SBC-1 mission and kept two identical systems on Earth as controls. This allowed the team to confirm SBC-1 was getting the correct results and to compare the time it took to get the results on Earth versus in space.
After installation, the team began running several benchmark codes to assess SBC-1’s performance and establish proof of concept. While running the High-Performance Linpack benchmark, which is used to rank supercomputers according to speed, SBC-1 achieved another major milestone: running an impressive 1.1 trillion calculations per second.
“I think it must have been the first time astronauts in space were able to get that much computing power—one trillion floating point operations per second,” Goh said. “And SBC-1 ran it quite quickly. The system completed the benchmark before the space station completed one revolution around the Earth.”
Overcoming Challenges in Orbit
SBC-1 continued successful operations throughout the mission despite the challenging conditions in space. A common issue encountered was network loss of signal back to Earth, lasting from a few seconds to half an hour. In addition to loss of signal, power interruptions also presented challenges. The ISS is powered by solar cells, and the power distribution on station can fluctuate. There were also several instances of unplanned power loss—from a circuit breaker tripping to an astronaut accidentally bumping into SBC-1 and powering it off.
“We planned for the consequence of losing power; we didn’t really plan for an astronaut’s knee to bump into SBC-1’s emergency power switch,” Fernandez said lightheartedly. “The power was cut off, and when the power was restored and it was safe to begin operations again, we picked up right where we left off with no loss of data, so that was pretty exciting.”
SBC-1 did experience more correctable errors than the Earth-based control systems. The HPE team thinks this is likely due to solar flares, galactic radiation, and other phenomena encountered in space. During the SBC-1 mission, the team ran one of the two systems on the ISS as fast as possible and the other as slow as possible to see if one would encounter more errors than the other.
Running more slowly uses less power and less cooling, and it takes longer to get the answers, but the team thought it would also make the system less susceptible to errors. However, that turned out to be false. Both the fast- and slow-running systems encountered about the same number of correctable errors, Fernandez said. This is important because it showed that running at a high speed does not appear to increase the likelihood of errors.
The space environment also took a particular toll on SBC-1’s solid-state disks—out of 20 solid-state disks, 9 failed during the mission. The system had redundant copies of all data, so no data was lost, but the team plans to try different methods to better protect the solid-state disks during the SBC-2 mission.
Ending a Successful Mission and Beginning In-Space Processing
After more than 1.5 years of successful operations, SBC-1 was powered down for its return to Earth on SpaceX CRS-17. “Decommissioning was probably the hardest day in the whole process,” Petersen said. “Because of the success of the system for so long, the day we actually had to power it down was really a tough day.”
After the SpaceX capsule splashed down in the Pacific Ocean, the HPE team was anxious to retrieve SBC-1 and do the “shake, rattle, and roll” check to see how the system fared during its return. “When I applied power to the system, it booted right back up, and in a matter of minutes, we were back running the same benchmarks we were running in space,” Petersen said. “To successfully go through launch and then return and have the hardware come back and power up without any deformation or degradation I think was exceptional.”
The SBC-1 mission was originally planned to last one year but was extended due to changes in the ISS cargo schedule. During its extended time on station, SBC-1 had its first real user. NASA Langley’s Entry Descent and Landing team used SBC-1 to run code to advance the software being developed for the Mars lander. The code ran with no errors and even ran faster on SBC-1 than on the team’s earthbound computers.
For the SBC-2 mission, HPE’s computer system will be open to any investigator that could benefit from in-space processing. “We will run a set of benchmarks to prove that it works as we did in SBC-1 and then open it up for use out there on the edge of edge,” Fernandez said.
In-space processing could significantly benefit both scientists running experiments on the space station and ISS crew members. For example, a scientist studying lightning from the ISS may only need imagery taken so many seconds before and after a lightning strike. Instead of sending all the imagery to Earth to extract the data of interest, SBC-2 could process the imagery onboard the ISS and only send down the relevant data.
Additionally, objects 3D printed on the ISS must go through a quality control process in which cameras inspect the object to ensure the accuracy of the print. Instead of sending all this imagery to Earth for processing, SBC-2 could process it onboard the ISS and immediately notify the crew that the object is safe to use.
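The lightning example above comes down to keeping only the frames that fall within a time window around each strike. The following is a minimal sketch of that filtering step, not a description of any software actually run on SBC-2. The function name `frames_near_strikes` and its parameters are hypothetical, and it assumes frame timestamps are in seconds and already sorted in ascending order.

```python
from bisect import bisect_left, bisect_right


def frames_near_strikes(frame_times, strike_times, before_s, after_s):
    """Return sorted indices of frames taken within a window around any strike.

    A frame at time f is kept if t - before_s <= f <= t + after_s for some
    strike time t. frame_times must be sorted in ascending order.
    """
    keep = set()
    for t in strike_times:
        first = bisect_left(frame_times, t - before_s)   # first frame in window
        last = bisect_right(frame_times, t + after_s)    # one past last frame
        keep.update(range(first, last))
    return sorted(keep)


# Hypothetical data: one frame per second for ten seconds, one strike at 4.5 s.
frame_times = [float(s) for s in range(10)]
selected = frames_near_strikes(frame_times, [4.5], before_s=1.0, after_s=2.0)
print(selected)
```

Only the selected frames would need to be sent to the ground; the rest could be discarded or archived onboard, which is where the bandwidth saving comes from.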
Looking to the future, in-space computing will also be critical for missions to Mars and beyond. “The purpose of exploration is insight, not the data,” Fernandez said. “If we can take computational resources with us on our mission and they can give us the correct answers, then we can collect the data, process it, and hopefully come up with the necessary insights to continue on our mission.”