[P4] any performance data on your P4ELTE/p4c compiler for DPDK ?

Tue Aug 2 16:41:57 CEST 2016

Hi,
thanks for the reply (and hey, it's summer, it is expected
that people are away).

Really impressive numbers you have!
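
To put the figures in this thread on a common scale, here is a minimal
sketch that converts between per-packet cost (ns/pkt) and throughput
(MPPS). The function names are made up for illustration; the only inputs
are numbers quoted below. The measurements come from different hardware
and setups, so this is a rough comparison only.

```python
# Hypothetical helpers: 1 packet per nanosecond equals 1000 MPPS,
# so the two units are related by mpps = 1000 / ns_per_pkt.

def ns_per_pkt_to_mpps(ns_per_pkt: float) -> float:
    """Per-packet cost in nanoseconds -> millions of packets per second."""
    return 1000.0 / ns_per_pkt


def mpps_to_ns_per_pkt(mpps: float) -> float:
    """Millions of packets per second -> per-packet cost in nanoseconds."""
    return 1000.0 / mpps


# Figures quoted in this thread (reference P4 code, before and after tuning).
reference_ns = {
    "SIMPLE_ROUTER, original": 4900,
    "L2 switch, original": 1400,
    "SIMPLE_ROUTER, tuned": 1400,
    "L2 switch, tuned": 500,
}

# Single-core figures quoted for the P4ELTE compiler with the DPDK HAL.
p4elte_mpps = {
    "L2, single core": 13.04,
    "L3, single core": 10.10,
}

if __name__ == "__main__":
    for name, ns in reference_ns.items():
        print(f"{name}: {ns} ns/pkt = {ns_per_pkt_to_mpps(ns):.3f} MPPS")
    for name, rate in p4elte_mpps.items():
        print(f"{name}: {rate} MPPS = {mpps_to_ns_per_pkt(rate):.1f} ns/pkt")
```

For instance, 500 ns/pkt corresponds to 2 MPPS, while 13.04 MPPS
corresponds to roughly 77 ns/pkt.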

(note to self: repo is at https://github.com/P4ELTE/p4c )

We would be tempted to try and write a netmap (and possibly libpcap) HAL
for your project, to see how much code it would be, and what kind of
performance we can get.

Any ideas or suggestions on how to proceed?
Also, apart from the device I/O, how much does your code depend on
DPDK, e.g. for packet representation?

cheers
luigi

On Tue, Aug 2, 2016 at 3:55 PM, Sándor Laki <lakis at elte.hu> wrote:
> Hi Luigi,
>
> Sorry for the delay; I was out of office for a few days.
>
> First of all, I have to point out that our compiler only reuses the P4
> front-end called HLIR, and the back-end has been fully reimplemented. In
> our case the idea was to support multiple architectures, resulting in a
> core compiler that uses a Hardware Abstraction Library (HAL) to be
> implemented for the given targets. A HAL for DPDK is released with our
> compiler (see GitHub), and one for the Freescale NPU is under development.
>
> The first measurements with Intel NICs show 13.04 and 10.10 MPPS for the L2
> and L3 examples on a single-core setup with few entries in the tables. For
> L2, we also have some preliminary scalability measurements in another setup
> with 200 entries in tables smac and dmac: 17.5 MPPS with 2 cores, 27.8 MPPS
> with 4 cores, and 33.4 MPPS with 16 cores (the hardware, a Mellanox
> ConnectX-4, seems to be the bottleneck here; this was actually measured
> this morning). More comprehensive experiments will be available soon.
>
> So this is where we are now. Next steps are to extend the coverage of P4
> features, e.g. registers, packet re-circulation, etc.
>
> If I understand correctly, you improved the original P4-BM by adding the
> mentioned enhancements. Your numbers are promising, all the more so since
> the original P4-BM was not optimized for good performance (we only
> analyzed BMv1, and after that we decided to fully rewrite it). If you have
> any specific questions on our implementation, please do not hesitate to ask
> me.
>
> Thanks for your mail and I hope that we can somehow collaborate and help
> each other, making the European presence stronger in the P4 community.
>
> Ciao,
> Sandor
>
> --
> Sándor Laki, PhD
> Assistant professor
> Department of Information Systems
> Eötvös Loránd University
> Pázmány Péter stny. 1/C
> H-1117, Budapest, Hungary
> Room 2.506
> Web: http://lakis.web.elte.hu
> Phone: +36 1 372 2869 / 8477
> Cell: +36 70 374 2646
>
> On 2016.07.27. 15:57, Luigi Rizzo wrote:
>>
>> Hi,
>> I have read the thread on the p4-dev list regarding your
>> P4 compiler targeting DPDK, and was wondering if you have
>> some performance data (in terms of packets per second or similar)
>> of your code, say for some simple P4 configuration.
>>
>> With my student Yuri we have been working on accelerating the
>> reference p4 code on github (also adding support for netmap) and while
>> we have very good results in accelerating I/O and queues,
>> the main bottleneck is now in the ingress and egress stages.
>>
>> Roughly speaking (we will post this later to the list):
>>
>> - the reference P4 code has two operating modes:
>>
>>    SINGLE
>>       all stages (input, parse, ingress, egress, deparse, output)
>>       run in the same thread.
>>    MULTI
>>       processing is split in multiple threads (e.g. four)
>>       connected by queues
>>
>>    The reference code, however, has very expensive queues, so the
>>    "MULTI" case is actually slower than the "SINGLE" one.
>>    We measured some 4900 ns/pkt for SIMPLE_ROUTER, and 1400 ns/pkt
>>    for the L2 switch.
>>
>> - Yuri and I made a number of enhancements to the queues,
>>    making them lock free, and that made the MULTI case more
>>    efficient, so the bottleneck is now the slowest stage in
>>    the pipeline.
>>
>> - We also worked on the memory allocator (another significant
>>    bottleneck) and added support for netmap.
>>
>> Overall, we are now down to about 500 ns per packet for the L2
>> switch, and 1400 ns/pkt for the SIMPLE_ROUTER. We still have
>> some room for improvement in the latter case.
>>
>> cheers
>> luigi
>>
>
>

-- 
-----------------------------------------+-------------------------------
 Prof. Luigi RIZZO, rizzo at iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
 TEL      +39-050-2217533               . via Diotisalvi 2
 Mobile   +39-338-6809875               . 56122 PISA (Italy)
-----------------------------------------+-------------------------------