Lustre file system features such as HSM and SSD-based OST pools have enabled
multiple new use cases, which makes data management of a Lustre file system a
daily task. The Robinhood Policy Engine can perform several kinds of data
management based on pre-configured rules and has proven to be a versatile
tool for managing large Lustre file systems. However, using Robinhood requires an
external server with high-end CPUs, memory, and adequate storage for the
back-end RDBMS. It also relies on the Lustre Changelogs feature, which adds
administration effort. We (DDN) propose a different, more integrated
approach, by developing a new policy engine named LIPE (Lustre Integrated
Policy Engine).
LIPE scans the MDTs/OSTs directly and maps the required information in memory,
avoiding the need for an external persistent data store. The implemented
algorithms don’t rely on Lustre Changelogs, which simplifies overall
administration.
The core component of the policy engine is an arithmetic unit that calculates
the values of arithmetic expressions. An expression is either 1) a 64-bit
integer, 2) a constant name, 3) a system attribute name, 4) an object
attribute name, 5) a special function with a configurable argument, or 6) two
expressions combined by an operator. By defining expressions
with different attributes, constants and operators, users have a flexible and
powerful way to define rules that match objects in MDTs/OSTs.
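The recursive expression definition above can be illustrated with a minimal Python sketch. It is not LIPE's implementation: the node classes, the constant names (KiB, MiB, GiB), the attribute names (size, atime, sys_time) and the operator set are all assumptions chosen for illustration, and special functions are omitted for brevity.

```python
import operator
from dataclasses import dataclass
from typing import Union

# Hypothetical constants and operators; LIPE's real names may differ.
CONSTANTS = {"KiB": 1 << 10, "MiB": 1 << 20, "GiB": 1 << 30}
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    ">": lambda a, b: int(a > b),
    "<": lambda a, b: int(a < b),
    "==": lambda a, b: int(a == b),
    "&&": lambda a, b: int(bool(a) and bool(b)),
    "||": lambda a, b: int(bool(a) or bool(b)),
}


@dataclass
class Name:
    """A constant, system attribute, or object attribute referenced by name."""
    name: str


@dataclass
class Binary:
    """Two sub-expressions combined by an operator."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[int, Name, Binary]


def evaluate(expr, system_attrs, object_attrs):
    """Recursively reduce an expression to a single integer value."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, Name):
        # Resolve the name as a constant, then system, then object attribute.
        for table in (CONSTANTS, system_attrs, object_attrs):
            if expr.name in table:
                return table[expr.name]
        raise KeyError(f"unknown name: {expr.name}")
    if isinstance(expr, Binary):
        left = evaluate(expr.left, system_attrs, object_attrs)
        right = evaluate(expr.right, system_attrs, object_attrs)
        return OPERATORS[expr.op](left, right)
    raise TypeError(f"unsupported expression node: {expr!r}")


# "size > 1 * GiB && atime < sys_time - 30 days", built as a tree.
rule = Binary(
    "&&",
    Binary(">", Name("size"), Binary("*", 1, Name("GiB"))),
    Binary("<", Name("atime"), Binary("-", Name("sys_time"), 30 * 86400)),
)
```

Calling evaluate(rule, system_attrs, object_attrs) with dictionaries of attribute values yields a non-zero integer when both conditions hold and 0 otherwise.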
When LIPE runs against an MDT or OST device, it scans all the inodes
on the device and attempts to match each inode against the rules. A rule is
matched if its expression evaluates to a non-zero value. When
a rule is matched, the action defined in that rule is triggered against
the inode.
Different types of actions can be defined in rules, e.g. HSM actions,
counter increment actions, remove actions, copy actions, etc. Except for counter
increments, all kinds of actions are handled by agents, which can be
implemented as LIPE plugins. Thus, new types of actions can easily be
added for new purposes.
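One way to picture the split between built-in counter increments and agent-handled actions is a small plugin registry. The sketch below is an assumption made for illustration, not LIPE's plugin interface: the register_agent decorator, the action names and the inode dictionary are all hypothetical, and the agents only return a description of the request instead of touching any file.

```python
from collections import Counter

# Hypothetical registry mapping action names to agent callables.
AGENTS = {}
counters = Counter()


def register_agent(action):
    """Decorator that registers a callable as the agent for one action type."""
    def decorator(func):
        AGENTS[action] = func
        return func
    return decorator


@register_agent("remove")
def remove_agent(inode):
    # A real agent would unlink the file; this one only describes the request.
    return f"remove {inode['path']}"


@register_agent("hsm_archive")
def hsm_archive_agent(inode):
    # A real agent would submit an HSM archive request.
    return f"archive {inode['path']}"


def trigger(action, argument, inode):
    """Run the action of a matched rule against an inode."""
    if action == "counter_inc":
        # Counter increments are handled in place, without an agent.
        counters[argument] += 1
        return None
    # Every other action type is delegated to its registered agent.
    return AGENTS[action](inode)
```

Under this scheme, supporting a new action type only requires registering one more agent function; the trigger logic itself stays unchanged.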
To provide a clearer functional classification, multiple rules can
be grouped together into a rule group, which is an ordered list. When scanning
an inode, as soon as one rule in a group matches that inode, the evaluation
of the whole group on that inode is finished. That means the
rules below the matched rule in the list are not evaluated against that
inode.
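The first-match behaviour of a rule group can be sketched in a few lines of Python. The rule representation (a pair of predicate and action name), the size thresholds and the action names are assumptions made for this example only.

```python
def evaluate_group(rules, inode):
    """Return the action of the first matching rule, or None if none match."""
    for expression, action in rules:
        # A rule matches when its expression evaluates to a non-zero value.
        if expression(inode) != 0:
            return action  # rules further down the list are skipped
    return None


# Hypothetical size-distribution group; the order of the rules matters.
size_group = [
    (lambda inode: int(inode["size"] < 1 << 20), "count_small"),
    (lambda inode: int(inode["size"] < 1 << 30), "count_medium"),
    (lambda inode: 1, "count_large"),  # catch-all for everything else
]
```

Because evaluation stops at the first match, the second rule does not need to repeat the lower bound of the first one, and a catch-all rule at the end guarantees that every inode is counted exactly once.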
Multiple rule groups can be defined in a LIPE configuration set. Usually, these
rule groups focus on different aspects; for example, one rule group
for HSM, another for size distribution. Other examples of rule groups are
type distribution, access time distribution, modification time distribution,
stripe count distribution, temporary files, location on OSTs, location on OST
pools, etc.
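As a rough illustration of how several rule groups could be applied in a single scan, the sketch below evaluates every group independently on each inode and tallies one counter per matched rule. The group names, counter names and inode fields are hypothetical and do not reflect LIPE's configuration format.

```python
from collections import Counter


def first_match(rules, inode):
    """Return the counter name of the first matching rule in a group."""
    for predicate, counter_name in rules:
        if predicate(inode):
            return counter_name
    return None


# Hypothetical configuration set with two independent rule groups.
GROUPS = {
    "size": [
        (lambda i: i["size"] < 1 << 20, "lt_1MiB"),
        (lambda i: True, "ge_1MiB"),
    ],
    "type": [
        (lambda i: i["type"] == "dir", "dir"),
        (lambda i: i["type"] == "file", "file"),
    ],
}


def scan(inodes, groups):
    """Apply every group to every inode and build one distribution per group."""
    report = {name: Counter() for name in groups}
    for inode in inodes:
        for name, rules in groups.items():
            hit = first_match(rules, inode)
            if hit is not None:
                report[name][hit] += 1
    return report
```

A match in one group does not affect the others, so a single pass over the inodes can produce a size distribution and a type distribution at the same time.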
Editing LIPE’s configuration file could be challenging since hundreds of rules
can be defined in a single configuration file. A web-based GUI has been
developed to simplify the configuration and usage of LIPE.
Rule evaluation and LIPE’s device scanning mechanism are implemented
efficiently, which drastically speeds up the MDT/OST scanning process.
Scanning a single SSD can reach an un-cached rate of
more than 1 million inodes per second. If the server has enough memory to
cache this data, the scanning speed can exceed 50 million inodes per
second.
During the presentation, we'd like to provide more details on the design and
implementation as well as some preliminary benchmark results. Additionally,
possible LIPE use cases will be introduced.
Presenter