[P4] Fwd: Re: [dpdk-users] Run-to-completion or Pipe-line for QAT PMD in DPDK
Sándor Laki
lakis at elte.hu
Fri Jan 18 14:18:07 CET 2019
This looks relevant for the crypto work and is worth keeping in mind to
avoid nasty surprises.
Regards,
S
-------- Forwarded message --------
Subject: Re: [dpdk-users] Run-to-completion or Pipe-line for QAT PMD in DPDK
Date: Fri, 18 Jan 2019 13:13:43 +0000
From: Trahe, Fiona <fiona.trahe at intel.com>
To: Changchun Zhang <changchun.zhang at oracle.com>, users at dpdk.org
<users at dpdk.org>
CC: Trahe, Fiona <fiona.trahe at intel.com>
Hi Alex,
> -----Original Message-----
> From: users [mailto:users-bounces at dpdk.org] On Behalf Of Changchun Zhang
> Sent: Thursday, January 17, 2019 11:01 PM
> To: users at dpdk.org
> Subject: [dpdk-users] Run-to-completion or Pipe-line for QAT PMD in DPDK
>
> Hi,
>
>
>
> I have a user question about using the QAT device in DPDK.
>
> In a real design, after calling enqueue_burst() on a given queue pair
> from one of the lcores, which of the following is usually done?
>
> 1. Run-to-completion: call dequeue_burst() and wait for the device to
> finish the crypto operation, or
>
> 2. Pipeline: return right after enqueue_burst(), release the CPU, and
> call dequeue_burst() from another thread function?
>
> Option 1 is more synchronous and appears in all the DPDK crypto
> examples, while option 2 is asynchronous and I have never seen it in
> any reference design, unless I missed something.
[Fiona] Option 2 is not possible with QAT: the dequeue must be called
in the same thread as the enqueue. This path is optimised without
atomics for best performance; if this is a problem, let us know.
However, the best performance does not come from using option 1 as a
synchronous, blocking method either. If you enqueue and then go straight
to dequeue, you are not getting the full benefit of the cycles freed up
by offloading. It is best to enqueue a burst, then go and do some other
work, such as collecting more requests for the next enqueue or other
processing, and then dequeue. Take and process whatever ops are
dequeued. That number will not necessarily match the number you
enqueued; it depends on how quickly you call the dequeue.
Don't wait until all the enqueued ops are dequeued before enqueuing the
next batch.
So it's asynchronous, but in the same thread.
You'll get the best throughput when you keep the input filled so that
the device always has operations to work on, and regularly dequeue a
burst. Dequeuing too often wastes cycles on the overhead of calling the
API; dequeuing too slowly causes the device to back up. Ideally, tune
for your application to find the sweet spot between these two extremes.
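
To make the pattern above concrete, here is a minimal sketch of the
"enqueue, do other work, dequeue whatever is ready" loop, all in one
thread. It is a self-contained simulation and does not use DPDK: the
mock_* functions, RING_SIZE, BURST and the work_per_iter parameter are
all hypothetical names made up for illustration. In a real application
the two burst calls would presumably be rte_cryptodev_enqueue_burst()
and rte_cryptodev_dequeue_burst() on the same queue pair, and "other
work" would be real packet processing rather than a tick of a mock
device.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 64u   /* hypothetical queue-pair depth */
#define BURST     16u   /* hypothetical burst size */

/* Mock queue pair standing in for a crypto device. NOT a DPDK type. */
struct mock_qp {
    uint32_t ring[RING_SIZE];
    uint32_t head;   /* next op to dequeue (free-running index) */
    uint32_t tail;   /* next free slot to enqueue into */
    uint32_t done;   /* in-flight ops the mock device has finished */
};

struct poll_stats {
    uint32_t enqueued, dequeued, dequeue_calls, empty_polls;
};

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Accepts as many ops as fit; may accept fewer than requested. */
static uint32_t mock_enqueue_burst(struct mock_qp *qp, const uint32_t *ops,
                                   uint32_t nb)
{
    uint32_t free_slots = RING_SIZE - (qp->tail - qp->head);
    uint32_t n = min_u32(nb, free_slots);
    for (uint32_t i = 0; i < n; i++)
        qp->ring[(qp->tail + i) % RING_SIZE] = ops[i];
    qp->tail += n;
    return n;
}

/* Models the device finishing up to n more in-flight ops. */
static void mock_device_tick(struct mock_qp *qp, uint32_t n)
{
    qp->done = min_u32(qp->done + n, qp->tail - qp->head);
}

/* Returns only ops that are already finished; may return zero. */
static uint32_t mock_dequeue_burst(struct mock_qp *qp, uint32_t *ops,
                                   uint32_t nb)
{
    uint32_t n = min_u32(nb, qp->done);
    for (uint32_t i = 0; i < n; i++)
        ops[i] = qp->ring[(qp->head + i) % RING_SIZE];
    qp->head += n;
    qp->done -= n;
    return n;
}

/* One thread does both enqueue and dequeue. work_per_iter is how many
 * ops the mock device completes while the thread does "other work". */
struct poll_stats run_same_thread(uint32_t total_ops, uint32_t work_per_iter)
{
    struct mock_qp qp = {{0}, 0, 0, 0};
    struct poll_stats st = {0, 0, 0, 0};
    uint32_t in[BURST], out[BURST];

    if (work_per_iter == 0)
        work_per_iter = 1;   /* the mock device must make progress */

    while (st.dequeued < total_ops) {
        /* 1. Keep the input filled; do not wait for earlier ops. */
        if (st.enqueued < total_ops) {
            uint32_t n = min_u32(BURST, total_ops - st.enqueued);
            for (uint32_t i = 0; i < n; i++)
                in[i] = st.enqueued + i;
            st.enqueued += mock_enqueue_burst(&qp, in, n);
        }

        /* 2. Other work happens here; the device runs meanwhile. */
        mock_device_tick(&qp, work_per_iter);

        /* 3. Take whatever is ready; the count need not match step 1. */
        uint32_t got = mock_dequeue_burst(&qp, out, BURST);
        st.dequeue_calls++;
        if (got == 0)
            st.empty_polls++;
        for (uint32_t i = 0; i < got; i++)
            assert(out[i] == st.dequeued + i);  /* this mock is in-order */
        st.dequeued += got;
    }
    return st;
}
```

Calling run_same_thread() with different work_per_iter values and
comparing dequeue_calls and empty_polls gives a rough feel for the
tuning trade-off described above, though the right balance for a real
device can only be found by measuring the actual application.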