Cloud computing services make groundbreaking research possible. This case study describes how a research group at Ben-Gurion University used cloud services to advance its research, achieving performance and cost levels that were not possible in a local computing environment.
Researchers in the Materials Physics group, part of Ben-Gurion University’s Department of Materials Engineering, study and develop models of the physical and mechanical properties of materials under extreme conditions, which can be applied to basic materials science studies and novel materials design. One aspect of their work focuses on studying “dislocations” in metals using numerical simulations. Dislocations are a critical factor in the mechanical properties of every crystalline material, including metals. Understanding this phenomenon is crucial in any application, manufacturing process, or product where the mechanical properties of metals are of utmost importance. From screwdrivers and golf clubs to medical devices, vehicles, and rockets, understanding dislocations is fundamental to producing safer, stronger, and more reliable new metal and alloy products.
Even though dislocations have been studied for over a century, the persistent difficulty has been accessing the relevant time and length scales of the processes involved. While some of these processes involve groups of many individual dislocations interacting as a collective over time spans of seconds to hours, the underlying processes involve individual dislocations and occur within just a few millionths of a billionth of a second, or over lengths of only several atoms.
“This huge disparity in both time and length scales is intractable, both experimentally and computationally, within a one study realm,” says Eyal Oren, a PhD student and researcher in the group. “Parts of these spans were accessible experimentally. For instance, we could study the collective behavior of groups of dislocations over seconds through days. But the smaller extremes were completely out of our reach until the early 1990s, with the advent of sufficient hardware and software computational tools.”
Measuring Up Beyond Atoms
Many dislocation-related processes were successfully reproduced in atomistic simulations, which verified previous theoretical and experimental studies. But these processes were only relevant to extremely small length and time scales, because the simulated objects were atoms. Research therefore focused on ways to draw conclusions applicable to lengths and times larger and longer than atomistic models allow. Since dislocations are, in fact, line defects in the otherwise perfect lattice arrangement of the atoms comprising the crystal, their theorized and verified properties could then be transferred to other simulation codes that describe greater material volumes over longer simulated periods, because the objects simulated in those codes are no longer atoms.
A crucial phenomenon in dislocation motion is known as “cross-slip”. Current codes that simulate dislocations directly lack the ability to include that phenomenon, precisely because detailed knowledge about the cross-slip process is missing. To understand this phenomenon, imagine that dislocations cross from the crystal plane in which they slip to another crystal plane. This allows them to circumvent obstacles that inhibit motion while plastic (irreversible) deformation of the material occurs. In many metals of the face-centered-cubic type, such as aluminum or copper, the dislocations are not readily free to cross to other planes of motion. To do that, they must overcome an energy barrier. This barrier is a function of the material, as well as of the applied stress and temperature conditions. Cross-slip is therefore mathematically equivalent to any other kinetic process, which researchers can simulate. “By simulating many instances of the same conditions in a given material, we can accumulate the statistics for ‘reactants’ and ‘products’,” explains Oren. These statistics let the team derive the kinetic properties of the material’s cross-slip process. With this knowledge, future research by this group and by others may be able to apply the results to other, larger-scale simulation codes.
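To make the idea of accumulating "reactant" and "product" statistics concrete, here is a minimal sketch in Python. It is not the group's code. It assumes cross-slip behaves as a first-order kinetic process with an Arrhenius-type rate, and the function names, argument names, and units are hypothetical choices made for illustration.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def estimate_rate(event_times, n_unreacted, t_max):
    """Estimate a first-order rate constant from many replica simulations.

    event_times: times at which the event (here, cross-slip) was observed,
                 one entry per replica in which it happened ("products").
    n_unreacted: number of replicas that reached t_max without the event
                 ("reactants" that never reacted).
    t_max:       simulated duration of each replica.

    Assumption: exponentially distributed waiting times, so the
    maximum-likelihood rate is (number of events) / (total observed time).
    """
    n_events = len(event_times)
    total_time = sum(event_times) + n_unreacted * t_max
    if n_events == 0 or total_time <= 0:
        raise ValueError("need at least one observed event")
    rate = n_events / total_time
    # Relative uncertainty of a counting estimate shrinks as 1/sqrt(events).
    relative_error = 1.0 / math.sqrt(n_events)
    return rate, relative_error


def activation_energy(rate_1, temp_1, rate_2, temp_2):
    """Energy barrier (eV) from rates at two temperatures (K).

    Assumption: Arrhenius form k = nu * exp(-E / (k_B * T)) with the same
    prefactor nu at both temperatures.
    """
    return K_B_EV * math.log(rate_2 / rate_1) / (1.0 / temp_1 - 1.0 / temp_2)
```

In this sketch, each thermodynamic condition (stress and temperature) would get its own call to `estimate_rate`, and comparing rates across temperatures gives an estimate of the barrier. A real analysis would also have to account for the stress dependence of the barrier, which this illustration ignores.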
The more simulations the merrier
More simulations deliver better statistics and lower uncertainties. The question was how to achieve this in the fastest, most cost-effective way possible. The team explored on-premises options, but it was clear that for this research, a more distributed, flexible offering from a public cloud vendor was the optimal solution.
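The link between the number of simulations and the uncertainty can be illustrated with a short Python sketch. It assumes the runs are independent and that the quantity of interest is a mean over runs, so the standard error falls as one over the square root of the number of runs. The function names are hypothetical and the sketch does not reproduce the group's actual analysis.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean of independent simulation results."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(samples) / math.sqrt(n)


def runs_needed(sample_std, target_error):
    """Number of independent runs needed to reach a target standard error.

    Follows from standard_error = sample_std / sqrt(n).
    """
    if sample_std <= 0 or target_error <= 0:
        raise ValueError("sample_std and target_error must be positive")
    return math.ceil((sample_std / target_error) ** 2)
```

Under these assumptions, halving the uncertainty requires four times as many runs: `runs_needed(2.0, 0.5)` gives 16, while `runs_needed(2.0, 0.25)` gives 64. This is why the ability to launch many simulations in parallel matters for this kind of study.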
With the support of Israel’s Inter-University Computation Center, which offers a variety of public cloud services to academic researchers at Ben-Gurion University and all of Israel’s research universities, a specification and plan to contract services from Amazon Web Services (AWS) were drawn up and put into production.
Amazon Spot instances allowed the team to multiply the number of simulations running per thermodynamic condition, in parallel with the simulations running on local machines. For these computations the most important resource was core count, so instance types of the “c” family were the natural choice, with machines of up to 96 vCPUs. This generated results with sufficiently low uncertainties, all at the same time. Moreover, because the work was already in production mode, there was no need to inspect the simulations closely. Interruptions of long simulations were therefore not an issue: each simulation’s state was saved at regular intervals, and the instance interruption behavior was set to “stop” instead of the default “terminate”.
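The pattern of saving state at regular intervals so that an interrupted run can pick up where it left off can be sketched as follows. This is an illustration only, not the group's simulation code: the state is a toy dictionary, the "simulation step" is a placeholder, and the `stop_at` argument is a hypothetical hook used to imitate an interruption.

```python
import json
import os


def save_checkpoint(path, state):
    """Write the state to disk atomically, so an interruption that lands
    mid-write cannot leave a corrupt checkpoint behind."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename over the old checkpoint


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "value": 0.0}


def run(path, total_steps, checkpoint_every, stop_at=None):
    """Run (or resume) a toy simulation, checkpointing periodically.

    stop_at imitates an interruption: the function returns early
    without saving, as if the machine had been stopped.
    """
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if stop_at is not None and state["step"] >= stop_at:
            return state
        state["value"] += 1.0  # placeholder for one real simulation step
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

With a checkpoint every 10 steps, a run interrupted at step 37 resumes from step 30, so at most one checkpoint interval of work is repeated. The checkpoint file must live on storage that survives the interruption, which is what makes the “stop” interruption behavior useful in combination with this pattern.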
At the same time, the group saved a great deal of money relative to on-demand instances, with discounts of up to approximately 75%. “This way, the ‘money burn’ through leasing was not significant in the overall picture,” says Oren. “Now that we are in production, these Amazon Spot instances became favorable over purchasing new machines. In fact, the use of an external cloud solution was helpful in the development stage as well, when large simulations were carried out that could not be done on local computers. AWS was helpful to overcome any temporary lack of resources.”
Eyal Estrin, IUCC Cloud Architect, notes that the combination of cloud infrastructure, a pay-per-use payment model, and the ability to easily adjust resources according to changing research needs reduces dependence on physical hardware and helps advance groundbreaking research.
For more information on the research: