Name: An extensible and scalable Lustre HSM Agent and Data Movers
Start: 2017-06-01T16:20:00-0400
End: 2017-06-01T16:50:00-0400

Lustre User Group
OpenSFS - https://www.opensfs.org/




This talk will present a technical overview of, and early user experience from, Cambridge University's use of the new Lustre HSM Agent and Data Movers, codenamed Lemur. Lemur is an open source project, available at https://github.com/intel-hpdd/lemur, and can be used with any recent Lustre release (2.6 or later).

Lustre HSM (Hierarchical Storage Management) was developed primarily by CEA and first introduced in the Lustre 2.5 release series. HSM gives filesystem administrators better control over storage utilization by allowing infrequently used file data to be moved off expensive Lustre storage onto less expensive secondary storage.

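
To make the archive/release cycle concrete, the Go sketch below models a single file's HSM state. It is a simplified, hypothetical model and not Lustre's implementation: the flag names are loosely based on the states reported by `lfs hsm_state` (exists, archived, dirty, released), while the `File` type and its methods are invented for illustration. In a real system these transitions are driven by the Lustre coordinator and a copytool, not by in-process calls. The rule it encodes is that data may only be released from Lustre once an up-to-date archive copy exists.

```go
package main

import "fmt"

// HSMState is a hypothetical bitmask loosely modelled on the flags
// reported by `lfs hsm_state`; it is not Lustre's actual type.
type HSMState uint8

const (
	Exists   HSMState = 1 << iota // a copy exists in secondary storage
	Archived                      // the archived copy is complete
	Dirty                         // Lustre data changed since the last archive
	Released                      // data removed from Lustre storage
)

// File is a hypothetical stand-in for a Lustre file under HSM control.
type File struct {
	Name  string
	State HSMState
}

// Archive copies data to secondary storage; a fresh archive is not dirty.
func (f *File) Archive() {
	f.State |= Exists | Archived
	f.State &^= Dirty
}

// Restore brings released data back from secondary storage into Lustre.
func (f *File) Restore() {
	f.State &^= Released
}

// Write models modifying the file. Accessing a released file triggers an
// implicit restore in Lustre, modelled here by restoring first.
func (f *File) Write() {
	if f.State&Released != 0 {
		f.Restore()
	}
	if f.State&Archived != 0 {
		f.State |= Dirty // the archived copy is now stale
	}
}

// Release frees the Lustre copy, but only if the archive copy is current.
func (f *File) Release() error {
	if f.State&Archived == 0 || f.State&Dirty != 0 {
		return fmt.Errorf("%s: cannot release without a current archive copy", f.Name)
	}
	f.State |= Released
	return nil
}

func main() {
	f := &File{Name: "results.dat"}
	fmt.Println(f.Release()) // results.dat: cannot release without a current archive copy
	f.Archive()
	fmt.Println(f.Release()) // <nil>
	f.Write()                // implicit restore, then the file becomes dirty
	fmt.Println(f.Release()) // results.dat: cannot release without a current archive copy
}
```

A policy engine such as Robinhood typically decides which files to archive and release; in this model that amounts to calling `Archive` and `Release` on files it selects.
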
Unlike the in-tree POSIX copytool, which combines the Lustre HSM agent and data movement functions in a single binary, Lemur separates these concerns with a modular design that makes it easier to add new data movement targets. The goal is to enable and encourage a larger ecosystem of Lustre HSM storage tiers (e.g. Amazon S3, Scality, HPSS) and secondary data movement possibilities.

Using modern technologies (the Go programming language, gRPC, etc.), high-performance data movers can be developed quickly, focusing on the specifics of new HSM storage tiers without requiring developers to understand in depth, or reimplement, the Lustre HSM agent functionality.

The current releases of the Lemur project include a common Lustre HSM Agent, which communicates with modular data mover plugins over gRPC, and two data mover plugin implementations: one for POSIX and one for AWS S3-compatible HSM storage tiers.

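
The split between the agent and its data movers can be pictured with a small Go sketch. This is not Lemur's code or API: the `Mover` interface, `Agent` type and in-memory movers are hypothetical, and a plain Go interface call stands in for the gRPC connection Lemur uses between the agent and each mover plugin. Requests are routed by a numeric archive ID, mirroring the way Lustre HSM identifies backend tiers, and the action names follow Lustre's ARCHIVE, RESTORE and REMOVE copytool actions.

```go
package main

import "fmt"

// Action is a hypothetical HSM request kind.
type Action int

const (
	ActionArchive Action = iota
	ActionRestore
	ActionRemove
)

// Request is a hypothetical, simplified HSM action handed to the agent.
type Request struct {
	Action    Action
	ArchiveID uint32 // selects the backend tier
	Path      string
}

// Mover is a hypothetical data mover contract. In Lemur, movers are
// separate processes reached over gRPC; an interface stands in here.
type Mover interface {
	Name() string
	Handle(r Request) error
}

// memMover is a toy mover that records "archived" paths in a map.
type memMover struct {
	name  string
	store map[string]bool
}

func newMemMover(name string) *memMover {
	return &memMover{name: name, store: map[string]bool{}}
}

func (m *memMover) Name() string { return m.name }

func (m *memMover) Handle(r Request) error {
	switch r.Action {
	case ActionArchive:
		m.store[r.Path] = true
	case ActionRestore:
		if !m.store[r.Path] {
			return fmt.Errorf("%s: %s not archived", m.name, r.Path)
		}
	case ActionRemove:
		delete(m.store, r.Path)
	}
	return nil
}

// Agent handles the Lustre-facing side and only dispatches to movers,
// so a new tier means registering a new Mover, not changing the agent.
type Agent struct {
	movers map[uint32]Mover
}

func (a *Agent) Register(id uint32, m Mover) {
	if a.movers == nil {
		a.movers = map[uint32]Mover{}
	}
	a.movers[id] = m
}

func (a *Agent) Dispatch(r Request) error {
	m, ok := a.movers[r.ArchiveID]
	if !ok {
		return fmt.Errorf("no mover registered for archive %d", r.ArchiveID)
	}
	return m.Handle(r)
}

func main() {
	var a Agent
	a.Register(1, newMemMover("posix"))
	a.Register(2, newMemMover("s3"))

	fmt.Println(a.Dispatch(Request{Action: ActionArchive, ArchiveID: 1, Path: "/lustre/a.dat"})) // <nil>
	fmt.Println(a.Dispatch(Request{Action: ActionRestore, ArchiveID: 2, Path: "/lustre/a.dat"})) // s3: /lustre/a.dat not archived
	fmt.Println(a.Dispatch(Request{Action: ActionArchive, ArchiveID: 3, Path: "/lustre/b.dat"})) // no mover registered for archive 3
}
```

In this model, adding another tier (for example, a hypothetical HPSS mover) only requires implementing `Mover` and calling `Register`; `Dispatch` does not change, which is the property the modular design aims for.
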
The University of Cambridge has used Lustre for many years as the primary high-performance storage for its research computing clusters. We are currently building a new storage platform for the wider University, based on Intel Enterprise Edition for Lustre and HSM, to serve as a general-purpose research storage area that is continually archived for disaster-recovery purposes. We have been using the Lemur copytool since an early stage of the project, together with the Robinhood Policy Engine, and have been impressed by its speed and stability compared with the in-tree POSIX copytool. Our current design comprises a 1.2 PB Lustre filesystem on Dell storage hardware, which users access through dedicated gateway machines, and an HSM backend tier consisting of a tape archive and a 300 TB disk cache, combined into a single unified POSIX filesystem by QStar Archive Manager.

Presenters

Matt Rásó-Barnett

Cambridge University

John Hammond

Software Engineer, Intel Corporation

John Hammond is a negative eighth level Lustre* morlock at Intel HPDD. His interests include reviewing, deleting, breaking, and occasionally fixing your code. This is his sixth time speaking at LUG.
