In the HPC landscape, the number of cores in client machines is continually increasing. However, even in parallel applications, single-threaded Input/Output (I/O) remains very common. While Lustre was originally optimized for many concurrent I/O operations, a single-threaded application cannot take advantage of this optimization, so its I/O performance suffers when an individual core is slow. As a result, the time cost of single-threaded I/O in parallel applications cannot be reduced simply by adding more compute nodes.
Optimizations in the Lustre client can provide a significant performance gain for single-threaded applications and for parallel applications that contain large amounts of single-threaded I/O. Each stage of the Lustre I/O flow has been analyzed, and this talk will present an overview of a potential solution that substantially improves utilization of the many-core and multiple-network-interface architectures found in today's clusters.
A proof-of-concept solution has been developed and tested with a real-world Hybrid Coordinate Ocean Model (HYCOM) application to demonstrate the significant performance gains that can be realized on many-core architectures. The HYCOM application reads a large amount of data at launch, before beginning the compute-intensive operations that analyze it. If this I/O phase is slow on a many-core architecture, it extends the total run time and becomes a primary bottleneck to increasing application throughput. With the proof-of-concept solution that will be presented, the I/O time for this application was significantly reduced and the application’s performance was no longer restricted by I/O operations.
This work is targeted for the community 2.10 release, and full details can be found in the JIRA ticket: https://jira.hpdd.intel.com/browse/LU-8964
Dmitry is a member of the support and development team in the High Performance Data Division at Intel. He focuses mostly on adapting Lustre to new Intel hardware such as Xeon Phi and Omni-Path.