When the Sheba Cancer Research Center (SCRC) wanted to transfer 300 terabytes of data from the US National Cancer Institute Center for Cancer Genomics (CCG) Genomic Data Commons (GDC) in Chicago to local storage to advance their research, they thought the process would be relatively straightforward. They acquired the resources needed in Sheba's data center, opened accounts on the GDC Data Portal (a unified data repository that enables data sharing across cancer genomic studies in support of precision medicine), received authorization to access the databases, installed the client tools, and clicked ENTER to begin the transfer of the harmonized datasets. The next message took them by surprise: "Your download will be complete in 43,800 hours." The existing networking infrastructure was not up to the job.
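A quick sanity check shows what that estimate implies about the available bandwidth. This is a back-of-the-envelope sketch, assuming decimal terabytes (1 TB = 10^12 bytes); the exact units the portal used are not stated in the article.

```python
# What sustained rate does a 43,800-hour estimate for 300 TB imply?
# Assumption: 1 TB = 10**12 bytes (decimal terabytes).
TOTAL_BITS = 300 * 10**12 * 8      # 300 TB expressed in bits
SECONDS = 43_800 * 3_600           # 43,800 hours in seconds

mbps = TOTAL_BITS / SECONDS / 1e6  # implied sustained rate in Mb/s
print(round(mbps, 1))              # ~15.2 Mb/s, far below what the task required
```

At roughly 15 Mb/s, 43,800 hours is five years, which is why the team started looking for alternatives.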
Dr. Eran Eyal, SCRC's Head of Bioinformatics, asked the Sheba IT department and commercial ISPs for advice. Neither had a workable solution. They even considered flying from Israel to the NIH facility in Chicago with digital storage devices in their suitcases to physically carry the data back. But the costs were too high, and the NIH team had never come across a lab that wanted to do that.
The GDC portal team suggested they seek advice from IUCC. IUCC had already worked with another Sheba Medical Center unit, using IUCC's ILAN network for high-speed telesurgery applications. But that level of connectivity and speed was nowhere near what was needed to transfer 300 terabytes efficiently and securely.
We suggested a dedicated 1 Gb/sec link between IUCC and SCRC using existing carrier infrastructure. After contacting the NIH GDC portal staff, we benchmarked the application to confirm that the connectivity from Israel could handle a sustained 1 Gb/sec load. In October 2017 the line passed our tests and was put to work. The overall system configuration at Sheba and the NIH prevents them from reaching the full 1 Gb/sec, but it still more than satisfies the task at hand. The NIH's TCP infrastructure is not ideal: we see peaks of 800 Mb/sec and valleys of 600 Mb/sec. In the end, this translates into 4 to 6 terabytes per day, enough to complete the 300-terabyte transfer in the planned time frame. Instead of an unfeasible multi-year run, it was now a workable three-month project.
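The arithmetic behind those figures can be sketched in a few lines. This is a rough model, again assuming decimal terabytes; the gap between the ideal daily volume at 600-800 Mb/sec and the 4-6 TB/day actually observed reflects protocol overhead and off-peak dips not captured here.

```python
def tb_per_day(mbps: float) -> float:
    """Ideal daily transfer volume for a sustained rate in Mb/s (1 TB = 10**12 bytes)."""
    bytes_per_sec = mbps * 1e6 / 8
    return bytes_per_sec * 86_400 / 1e12

def days_to_transfer(total_tb: float, tb_day: float) -> float:
    """Days needed to move total_tb at an effective rate of tb_day per day."""
    return total_tb / tb_day

# A sustained 600-800 Mb/s would ideally move about 6.5-8.6 TB/day;
# TCP overhead and dips brought the observed figure down to 4-6 TB/day.
print(round(tb_per_day(600), 1))   # 6.5
print(round(tb_per_day(800), 1))   # 8.6

# At the observed mid-range of ~5 TB/day, 300 TB takes about 60 days,
# comfortably inside the planned three-month window.
print(days_to_transfer(300, 5))    # 60.0
```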
Hank Nussbacher is Director of Network & Computing Infrastructure at IUCC. He has worked at IUCC for the past 30 years and is responsible for network design, the NOC team, the CERT team, and the cloud team.