[P4] Fwd: Re: [dpdk-users] Run-to-completion or Pipe-line for QAT PMD in DPDK

Fri Jan 18 14:18:07 CET 2019

This looks relevant for the crypto work and is worth keeping in mind
to avoid nasty surprises.

Regards,
S

-------- Forwarded Message --------
Subject: 	Re: [dpdk-users] Run-to-completion or Pipe-line for QAT PMD in DPDK
Date: 	Fri, 18 Jan 2019 13:13:43 +0000
From: 	Trahe, Fiona <fiona.trahe at intel.com>
To: 	Changchun Zhang <changchun.zhang at oracle.com>, users at dpdk.org 
<users at dpdk.org>
CC: 	Trahe, Fiona <fiona.trahe at intel.com>

Hi Alex,

> -----Original Message-----
> From: users [mailto:users-bounces at dpdk.org] On Behalf Of Changchun Zhang
> Sent: Thursday, January 17, 2019 11:01 PM
> To: users at dpdk.org
> Subject: [dpdk-users] Run-to-completion or Pipe-line for QAT PMD in DPDK
>
> Hi,
>
>
>
> I have a user question on using the QAT device in DPDK.
>
> In a real design, after calling enqueue_burst() on the specified queue
> pair on one of the lcores, which of the following is usually done?
>
> 1. Should we do run-to-completion, calling dequeue_burst() and waiting
> for the device to finish the crypto operation,
>
> 2. or should we do a pipeline, in which we return right after
> enqueue_burst() and release the CPU, and call dequeue_burst() in
> another thread function?
>
> Option 1 is more like a synchronous model and can be seen in all the
> DPDK crypto examples, while option 2 is asynchronous, which I have
> never seen in any reference design, unless I missed something.
[Fiona] Option 2 is not possible with QAT: the dequeue must be called in
the same thread as the enqueue. This is optimised without atomics for
best performance. If this is a problem, let us know.

However, the best performance does not come from option 1 as described
either, nor from any synchronous blocking method. If you enqueue and
then go straight to dequeue, you're not getting the full benefit of the
cycles freed up by offloading. It is best to enqueue a burst, then go do
some other work (for example collecting more requests for the next
enqueue, or other processing), then dequeue. Take and process whatever
ops are dequeued. This will not necessarily match the number you've
enqueued; it depends on how quickly you call the dequeue. Don't wait
until all the enqueued ops are dequeued before enqueuing the next batch.

So it's asynchronous, but in the same thread.

You'll get the best throughput when you keep the input filled up, so the
device always has operations to work on, and regularly dequeue a burst.
Dequeuing too often wastes cycles on the overhead of calling the API;
dequeuing too slowly causes the device to back up. Ideally, tune for
your application to find the sweet spot between these two extremes.
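
For illustration only, here is a minimal sketch of the single-thread
loop described above: enqueue a burst, do other work, then dequeue
whatever happens to be ready. It does not use DPDK at all. The sim_*
functions are hypothetical stand-ins for rte_cryptodev_enqueue_burst()
and rte_cryptodev_dequeue_burst(), and RING_SIZE, DEVICE_RATE and the
"work unit" timing model are assumptions invented for this toy, not QAT
figures.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE   64  /* assumed queue-pair depth (hypothetical) */
#define DEVICE_RATE 16  /* assumed ops the toy device finishes per work unit */

/* Toy stand-in for one crypto queue pair. Not a DPDK type. */
struct sim_qp {
    uint32_t inflight;  /* submitted, not yet finished by the device */
    uint32_t done;      /* finished, waiting to be dequeued */
};

struct run_stats {
    uint32_t completed;      /* ops dequeued by the application */
    uint32_t dequeue_calls;  /* how many times dequeue was called */
    uint32_t idle_ticks;     /* work units in which the device had no ops */
};

/* Accepts as many ops as fit in the ring; may accept fewer than asked. */
static uint16_t sim_enqueue_burst(struct sim_qp *qp, uint16_t nb_ops)
{
    uint32_t room = RING_SIZE - (qp->inflight + qp->done);
    uint16_t n = nb_ops < room ? nb_ops : (uint16_t)room;
    qp->inflight += n;
    return n;
}

/* Returns only the ops that are already finished, up to nb_ops. */
static uint16_t sim_dequeue_burst(struct sim_qp *qp, uint16_t nb_ops)
{
    uint16_t n = nb_ops < qp->done ? nb_ops : (uint16_t)qp->done;
    qp->done -= n;
    return n;
}

/* One unit of "other work" on the CPU; the device progresses meanwhile. */
static void sim_other_work(struct sim_qp *qp, struct run_stats *st)
{
    uint32_t n = qp->inflight < DEVICE_RATE ? qp->inflight : DEVICE_RATE;
    if (qp->inflight == 0)
        st->idle_ticks++;
    qp->inflight -= n;
    qp->done += n;
}

/* Enqueue, work, dequeue: all in one thread, never blocking on the device. */
static struct run_stats run_loop(uint32_t total_ops, uint16_t burst,
                                 uint32_t work_units)
{
    struct sim_qp qp = {0, 0};
    struct run_stats st = {0, 0, 0};
    uint32_t submitted = 0;

    if (burst == 0) burst = 1;            /* guard so the loop terminates */
    if (work_units == 0) work_units = 1;  /* device needs time to progress */

    while (st.completed < total_ops) {
        /* 1. Top up the input; leftover ops are retried next iteration. */
        uint32_t left = total_ops - submitted;
        uint16_t want = left < burst ? (uint16_t)left : burst;
        submitted += sim_enqueue_burst(&qp, want);

        /* 2. Do something useful instead of polling for completions. */
        for (uint32_t i = 0; i < work_units; i++)
            sim_other_work(&qp, &st);

        /* 3. Take whatever is ready; it need not match what was enqueued. */
        st.completed += sim_dequeue_burst(&qp, burst);
        st.dequeue_calls++;
    }
    return st;
}
```

Calling run_loop(1000, 32, 1) and run_loop(1000, 32, 8) and comparing
dequeue_calls and idle_ticks gives a rough feel for the two extremes:
in this toy model the short interval makes more dequeue calls for the
same number of ops, while the long interval leaves the simulated device
with nothing to work on for part of the time. Real numbers depend on the
device and the application, so treat this only as a shape for the loop.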
