Where Do We Need Smart NICs?

I am copying the title from a post of Ivan Pepelnjak, who runs ipSpace.net, one of the most well-respected blogs on computer networks.

There is not much technical discussion on SmartNICs, I welcome Ivan’s post, and I want to provide my perspective.

There is no doubt in my mind that SmartNICs are needed components of any modern cloud-based architecture. The proof is in the pudding: all cloud providers run some form of SmartNIC. Some think it is so crucial that they have significant internal development efforts (e.g., AWS Nitro). The trend is toward designing Enterprise data centers using a cloud architecture and therefore to adopt many of the same solutions seen in public cloud.

The need does not arise only from performance, mainly discussed in Ivan’s post, but also from many other aspects discussed in the following.

Performance, Latency and Jitter

I don’t doubt any of the performance discussion in Ivan’s post, but the question is: “what’s left of your server after it does any meaningful processing on 200 Gbps of traffic?” It is not just about receiving and sending (i.e., pushing) the traffic or even switching it from port to port. It is about applying useful services to it and doing that without incurring high latency and jitter.

Software Defined Networking

To implement a Software Defined Networking solution capable of operating in a modern Clos environment, a virtual switch must be feature reach, including encapsulation to support underlay and overlay networking, like VXLAN, VTEP, Geneve, etc. The performance greatly benefits from a hardware implementation.

Visibility

Telemetry. Streaming telemetry measurements directly from the data plane is crucial to understand what is happening in the network.
Tap-network. Avoiding a separate tap-network used to monitor the primary network is possible if the SmartNIC contains tap-network functionalities, resulting in considerable cost savings.

Security

Firewall and micro-segmentation to protect (the workloads on) the server from external attacks and avoid that a compromised server can attack the network by moving laterally.
Encryption both symmetric and asymmetric. If performed on the main server CPU, these are functions that will consume a lot of cores.
Containment. The SmartNIC should be a separate entity from the server from a security perspective. It should have its security posture, and the security posture must not be accessible through the PCIe interface. Like in multi-factor authentication, an attacker will need to compromise the two entities to succeed.

Virtual Block Storage

With the advent of NVMe over fabric the SmartNIC is perfectly positioned to act as a block storage adapter.

Scale

Let’s consider ACLs (Access Control Lists). It is easy to run a few of them in software, but let’s try to run an ACL with 100,000 ACE (Access Control Entry). It does not scale!

Combining all together

I have never seen anybody attempting to run all these features in SW on a server, but it is possible on a SmartNIC.

Virtualized or not?

Nowadays, everyone wants to design an architecture that works for bare metal servers, virtual machines, and containers. A SmartNIC allows that since it is placed between the server and the network. A software solution running on the host is problematic in several of these environments. With virtual machines, do you implement the solution in the hypervisor or do you run over it? Can you modify the software of a bare-metal server running a database or an ERP application?

Conclusions

When considering all these factors, it is clear that it is not just about how many Gbps I can push in and out of the server. It is about what features I want to enable and what level of security do I want to achieve.

I think there must be a good reason why all the main cloud-providers deploy SmartNICs on all of their servers.