LUG17 has ended
Thursday, June 1 • 10:50am - 11:20am
LNet network health

LNet Multi-Rail, which is included in the upcoming community Lustre 2.10 release, allows multiple interfaces to be used on the same LNet network or across multiple LNet networks, over homogeneous or heterogeneous underlying fabrics. The LNet Health feature, which is targeted for the community Lustre 2.11 release, will add the ability to resend messages across different interfaces when interface or network failures are detected.

Implementing this feature at the LNet layer allows LNet to mitigate communication failures before passing them to upper layers for further error handling. To accomplish this, LNet Network Health depends on health information reported by the underlying fabrics, such as MLX and OPA, and on monitoring the transmit timeouts maintained by the LND.

This implementation also allows LNet to retransmit messages across different types of interfaces. For example, if a peer has both MLX and OPA interfaces and a transmit error is detected on one of them, LNet can retransmit the message on the other available interface.

LNet Network Health will monitor three different types of failures, each dealt with separately at the LNet layer:
  • Local interface failures as reported by the underlying fabric to the LND.
    - The LND will notify LNet of the failure and LNet will mark the health of the interface. Future LNet messages will not be sent over that interface. The interface will be added to a recovery queue and pinged periodically to attempt recovery.
    - LNet will attempt to resend the message on a different local interface if one is available. If no interfaces are available for that peer, then the message fails and the failure is reported to PtlRPC, which will commence its failure and recovery operations.
  • Remote interface failures as reported by the remote fabric.
    - LNet will demerit the health of the remote interface, thereby reducing its overall selection priority. If a remote interface is consistently down, it will be marked as down and will not be selected. It will be added to the recovery queue and pinged at regular intervals to determine whether it can be used again.
    - LNet will attempt to resend the message to a different interface for the same peer. If no interfaces are available for that peer, then the message fails and the failure is reported to PtlRPC, which will commence its failure and recovery operations.
  • Network timeouts.
    - LNet will demerit the health of both the local and remote interfaces, since it cannot be determined which side the problem is on.
    - LNet will attempt to resend the message over a new pathway altogether. If none are available, then the message fails and the failure is reported to PtlRPC, which will commence its failure and recovery operations.
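
The health bookkeeping described in the list above can be sketched as a small Python model. This is an illustration only, not LNet code: the class and method names, the health scale (MAX_HEALTH) and the penalty size (DEMERIT) are all assumptions made for the example, and the abstract does not specify how health is actually represented.

```python
from collections import deque
from dataclasses import dataclass

MAX_HEALTH = 1000  # assumed ceiling; not a value taken from LNet
DEMERIT = 250      # assumed penalty per failure; not a value taken from LNet


@dataclass
class Interface:
    name: str
    health: int = MAX_HEALTH

    @property
    def usable(self) -> bool:
        return self.health > 0


class HealthTracker:
    """Hypothetical model of per-interface health with a recovery queue."""

    def __init__(self, names):
        self.ifaces = {n: Interface(n) for n in names}
        self.recovery = deque()  # names of interfaces waiting to be pinged

    def _queue(self, name):
        if name not in self.recovery:
            self.recovery.append(name)

    def mark_down(self, name):
        # Local failure reported by the fabric: stop using the interface.
        self.ifaces[name].health = 0
        self._queue(name)

    def demerit(self, name):
        # Remote failure or timeout: lower selection priority; an interface
        # that keeps failing reaches zero and joins the recovery queue.
        iface = self.ifaces[name]
        iface.health = max(0, iface.health - DEMERIT)
        if not iface.usable:
            self._queue(name)

    def select(self):
        # Prefer the healthiest usable interface; None means the message
        # would fail and be reported to the upper layer.
        usable = [i for i in self.ifaces.values() if i.usable]
        return max(usable, key=lambda i: i.health).name if usable else None

    def run_recovery(self, ping):
        # Ping each queued interface once; restore it if the ping succeeds.
        for _ in range(len(self.recovery)):
            name = self.recovery.popleft()
            if ping(name):
                self.ifaces[name].health = MAX_HEALTH
            else:
                self.recovery.append(name)
```

In this model a network timeout would call demerit() on both the local and the remote tracker, since neither side can be ruled out as the cause.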

In all failure cases, LNet will continue attempting to retransmit the message until the peer_timeout expires. If the peer_timeout expires and the message has not been successfully sent to the next hop, then the message fails and LNet reports the failure to PtlRPC.
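
The retransmit-until-timeout behaviour can be sketched as a deadline-bounded retry loop. Again, this is a hedged illustration rather than the LNet implementation: send_with_retry, try_send, paths and retry_interval are hypothetical names introduced for the example; only peer_timeout comes from the abstract.

```python
import time


def send_with_retry(message, paths, try_send, peer_timeout,
                    retry_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Try each path in turn until one succeeds or peer_timeout elapses.

    Returns (success, attempts). A False result stands in for reporting
    the failure to the upper layer (PtlRPC in the abstract).
    """
    if not paths:
        return False, 0
    deadline = clock() + peer_timeout
    attempts = 0
    while True:
        for path in paths:
            if clock() >= deadline:
                return False, attempts  # peer_timeout expired
            attempts += 1
            if try_send(message, path):
                return True, attempts
        # All paths failed this round; wait before trying them again.
        sleep(retry_interval)
```

The clock and sleep parameters are injectable so the loop can be exercised with a fake clock instead of waiting in real time.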

Presenter
Amir Shehata

Intel
Amir has been working in HPDD at Intel on the Lustre Networking module (LNet) since March 2013. He has worked on the Dynamic LNet Configuration, Multi-Rail and LNet Health/Resiliency features.


Thursday June 1, 2017 10:50am - 11:20am EDT
Alumni Hall (IMU - 1st Floor) 900 E 7th St, Bloomington, IN, 47405