Musings from P4 Roundtable Series

For Loop

Once again, folks asked why P4 does not support loops. The reasons have been given before and can be found in the p4-dev archives. A for loop is potentially an unbounded computation, which no data plane can afford, let alone one running at several Tbps. Further, if memory gets corrupted and the loop's termination value ends up as a very large number, the data plane thrashes even more.

Note that a pre-processor tool, pcube, exists that supports for loops in P4. However, pcube only supports P4-14. I have filed an issue against pcube asking for P4-16 support.
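To make the pre-processor idea concrete, here is a minimal Python sketch of compile-time loop unrolling. A loop whose bounds are known before compilation is expanded into straight-line P4 statements, so the data plane never executes a loop. The helper name `unroll_for` and the `$i` placeholder are hypothetical and are not pcube's actual syntax; the emitted `hdr.label[...]` statement is only illustrative.

```python
def unroll_for(template: str, start: int, stop: int, var: str = "$i") -> str:
    """Expand `template` once per index in [start, stop), replacing `var`.

    Hypothetical helper, not pcube's real interface. The bounds must be
    compile-time constants, which is what keeps the generated P4 bounded.
    """
    if stop < start:
        raise ValueError("stop must be >= start")
    lines = [template.replace(var, str(i)) for i in range(start, stop)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative P4 statement; the header stack name is an assumption.
    print(unroll_for("hdr.label[$i].setInvalid();", 0, 3))
```

Because the expansion happens before the P4 compiler runs, the number of generated statements is fixed and visible in the source, which is the property a hardware pipeline needs.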

Stateful Features

It is well known that a P4-16 register can support stateful features. Alternatively, if the underlying hardware uses an NPU in the data plane, there is no reason why state cannot be saved in a P4-16 extern implemented in C. In the roundtable, it was also said, "P4 language has been extended in research papers to maintain state, but underlying hardware does not support the extensions". If the data plane uses an NPU, the hardware can support such extensions via P4-16 externs.
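As a rough illustration of what register-style state looks like, the Python sketch below models a fixed-size register array and a per-flow packet counter built on a read-modify-write. It loosely mirrors the read/write-by-index interface that register externs expose in common P4-16 architectures. The class and function names are hypothetical, and this is a software model only, not any target's implementation.

```python
class Register:
    """Toy model of a fixed-size register array with read/write by index.

    Software illustration only; names and widths are assumptions.
    """

    def __init__(self, size: int, width_bits: int = 32) -> None:
        self.size = size
        self.mask = (1 << width_bits) - 1
        self.cells = [0] * size

    def read(self, index: int) -> int:
        return self.cells[index]

    def write(self, index: int, value: int) -> None:
        # Truncate to the cell width, as a fixed-width field would.
        self.cells[index] = value & self.mask


def count_packet(reg: Register, flow_hash: int) -> int:
    """Read-modify-write: bump the counter in the flow's slot, return it."""
    index = flow_hash % reg.size
    reg.write(index, reg.read(index) + 1)
    return reg.read(index)
```

The state survives across packets because it lives in the register array rather than in per-packet metadata. An extern written in C on an NPU could keep richer state in the same way, behind a similar interface.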


Pensando presented an ASIC that merges a P4 pipeline, several ARM cores, and crypto and compression engines. One use of the ASIC is in a smartNIC. SmartNICs are already used in a web-scale deployment in the Microsoft Azure cloud, where Azure has deployed over 1 million hosts with smartNICs. Azure’s smartNIC uses an FPGA, which is programmed using Verilog. See the Microsoft smartNIC presentation slides and video. Microsoft claims its SDN changes every three months, and thus an ASIC won’t work for its smartNIC. Microsoft also says a multicore NPU ASIC is too complex to program, which is why that hardware was also rejected for its smartNIC. The jury is still out on whether a multicore NPU is harder to program in C than an FPGA is in Verilog.

Note that it was Cavium, now Marvell, that pioneered the smartNIC when a large cloud vendor used Cavium’s LiquidIO smartNIC. Seeing the success, Amazon acquired Annapurna Labs, which developed an ARM SoC for use in smartNICs. This is Amazon’s Nitro solution. Also note that the Cavium Octeon ASIC used in the LiquidIO smartNIC has 48 cores. One Octeon ASIC supports crypto and compression as well as a fast path that uses the multiple cores. The fast path is programmed in C. Certainly, the Octeon has a few ARM cores to spare, and thus I don’t see much difference between the Octeon and the Pensando ASIC from 10,000 feet. Note that the Cisco Silicon One ASIC, which runs switching/routing at 10 Tbps, uses multiple NPUs programmed using P4. If Cisco has done it, there is no reason why the Octeon cannot add support for programming its fast path using P4.

smartNIC Software

In the Plenary session for P4 Use Cases in Programmable NIC, I noticed that the Pensando P4 implementation supports an if-statement inside a P4 action, which is great. Please ask your switching ASIC vendor whether they support the same.

The Plenary session started with slides from Xilinx. On slide 5, Xilinx shows DMA above the grey box above the green circles. Once a packet is extracted after DMA, it is sent to the P4 modules. In their slides, Pensando asked to extend P4-16 for DMA and for processing of packet and message data. So far, the internal MAC implementation of a switched port is not described in the P4 language, so why should P4 add PCIe DMA access to the language? The PSA architecture has non-P4-programmable blocks. The PNA (Portable NIC Architecture) can include non-P4-programmable blocks as well. However, I am open to adding externs to P4-16 for NICs if sufficient justification exists.
