Wednesday, May 31, 2017 • 4:10pm - 4:40pm
Scaling parallel modeling of agroecosystems with Lustre

Agro-IBIS is a traditional, serial simulation code written in Fortran that falls into the category of Dynamic Global Ecosystem Models. The code simulates coupled water, energy, carbon, and nitrogen cycles and their interactions on a discretized landscape. It was designed to decompose the land surface into a grid and process each cell serially. As data resolution and the precision of model forcing (e.g., land-management decisions by farmers) increase, model run-times become prohibitively long. To date, it has not been possible to efficiently conduct sensitivity analyses or ensemble forecasts, both of which are needed to improve resource management under present and future conditions.

To scale Agro-IBIS to much larger problem sizes and higher resolutions than previously attempted, this research team applied a straightforward domain decomposition to the grid, allowing Agro-IBIS to solve models on subsets of the complete problem space, and developed a parallel C++ post-processing code that manages the results of each independent simulation and combines them into a coherent output.

The I/O model in Agro-IBIS was designed for serial execution, with multiple output streams for the time-evolving solutions of the many variables representing the physical, chemical, and biological processes in each simulation. This pattern of file access and management required significant tuning to make Agro-IBIS an effective parallel application on the Big Red II Cray XE6/XK7 supercomputer and Lustre file systems at Indiana University. Initial runs on Big Red II, computing directly against the 5 PB Data Capacitor 2 (DC2) Lustre file system, drove IOPS and write throughput to approximately 25,000 and 24 GB/s, respectively, in isolated testing. These runs slowed the production system to unacceptable levels of responsiveness, and I/O proved to be the single biggest bottleneck to performance.

With the newly parallelized Agro-IBIS, even after refinements that used on-node memory to reduce read and write operations, DC2's performance remained the limiting factor. We needed a system that would support development of a working solution and could handle the many-file I/O of this parallelized application. To that end, we constructed DCRAM, an SSD-based Lustre (v2.8) file system with 35 TB of storage, two MDS nodes, and six OSS nodes. The 2 MDTs and 12 OSTs each consist of four 800 GB Intel SSDs in striped RAID-0 configurations for the highest possible performance; previous testing had shown that pairs of 4-drive OSTs on each OSS gave the best results. DCRAM, like DC2, connects over InfiniBand to the Big Red II Gemini interconnect through our LNET routing setup. The current approach uses DCRAM exclusively for I/O, with intermediate writes happening on compute nodes and aggressive read-buffer caching via the netCDF library for input and boundary-condition data.

In summary, parallelizing the code exposed excessive I/O operations that were unimportant in serial runs. Even with code and memory-management optimizations to reduce I/O, DC2 could not support the I/O demands of the parallelized code without significant performance losses, so the DCRAM SSD configuration was developed. The combined hardware and software modifications reduced the runtime of a 60-yr simulation across the Mississippi River Basin from about 10 days (single node) to 6 hours (512 compute nodes on Big Red II).

Presenter
Shawn Slavin

Indiana University Pervasive Technology Institute

Authors
H.E. Cicada Dennis

Indiana University Pervasive Technology Institute
Robert Henschel

Indiana University Pervasive Technology Institute
Shawn Slavin

Indiana University Pervasive Technology Institute
Stephen Simms

Indiana University Pervasive Technology Institute
Tyler Balson

Indiana University Pervasive Technology Institute
Adam Ward

Indiana University Pervasive Technology Institute
Yuwei Li

Indiana University Pervasive Technology Institute



Alumni Hall (IMU - 1st Floor) 900 E 7th St, Bloomington, IN, 47405
