[P4] Fwd: [NSDI '17] Reviews for paper #252 "OpenP4C: Target-independent Compiler for..."
Dániel Horpácsi
daniel-h at elte.hu
Thu Dec 8 18:08:29 CET 2016
After a quick read-through, it seems they would buy the idea, and they
like the Core-HAL split and this way of realizing it, but they expect
lower-level, more concrete explanations for every related question.
(Also, almost all of them noted that the second half of the paper was
not easy to read.)
A few key problems as I read it (setting aside measurements and evaluation):
- We need to define and name the targets (this also clarifies whether
we are aiming at hardware switches too). If not all of them, then at
least a collection that is representative in the long run.
- We need to give a concrete answer as to what the actual Core-HAL
decomposition is for the defined targets. Without this they will not
accept the high-level argument and will not settle for a mere glimpse
of the ideas.
- We need to show which concrete Core-level and which concrete
HAL-level optimizations we can perform for each target under the
proposed decomposition. They expect a detailed presentation, not a
summary.
- We need to clarify exactly how much overhead the Core-HAL separation
introduces. If it is minimal, then where that minimal overhead arises
and exactly how.
- We need to defend our approach against the alternative of building a
separate compiler for each target (i.e., defend it against the p4_16
approach).
Dani
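
To make the decomposition and overhead questions above concrete, here is a minimal sketch of what a Core-HAL split could look like. It is not OpenP4C's actual interface: the names `Hal`, `DictHal`, `l2_forward`, and the `dmac` table are hypothetical, and Python is used only for brevity (the reviews say the real compiler emits C). The Core function is target-independent and reaches the target only through the two HAL calls, so the call counter marks exactly where the separation could add cost.

```python
from abc import ABC, abstractmethod


class Hal(ABC):
    """Hypothetical HAL: the only target-specific operations the Core may call."""

    @abstractmethod
    def extract_field(self, packet: bytes, offset: int, width: int) -> int: ...

    @abstractmethod
    def table_lookup(self, table: str, key: int): ...


class DictHal(Hal):
    """Toy software target: tables are plain dicts, packets are byte strings."""

    def __init__(self, tables):
        self.tables = tables
        self.calls = 0  # counts Core->HAL crossings, i.e. the indirection to measure

    def extract_field(self, packet, offset, width):
        self.calls += 1
        return int.from_bytes(packet[offset:offset + width], "big")

    def table_lookup(self, table, key):
        self.calls += 1
        return self.tables[table].get(key)


def l2_forward(hal: Hal, packet: bytes):
    """Core: target-independent L2 forwarding expressed only via HAL calls."""
    dst_mac = hal.extract_field(packet, 0, 6)  # destination MAC, first 6 bytes
    port = hal.table_lookup("dmac", dst_mac)
    return port if port is not None else "flood"
```

Supporting a second target would mean writing another `Hal` subclass while `l2_forward` stays unchanged; comparing against a version with the HAL calls inlined by hand is one way to quantify the overhead the reviewers ask about.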
On 2016-12-08 16:02, Sándor Laki wrote:
>
> As requested, here are the reviews! There are quite a lot of them, and the scores vary widely...
>
> Regards,
>
> Sanyi
>
>
>
> -------- Forwarded Message --------
> Subject: [NSDI '17] Reviews for paper #252 "OpenP4C:
> Target-independent Compiler for..."
> Date: Thu, 8 Dec 2016 14:54:56 +0000 (UTC)
> From: NSDI '17 HotCRP <noreply at nsdi17.usenix.hotcrp.com>
> Reply-To: nsdi17chairs at usenix.org
> To: Sándor Laki <lakis at elte.hu>
> CC: nsdi17chairs at usenix.org
>
>
>
> Dear Sándor Laki,
>
> Thank you for submitting your paper to The 14th USENIX Symposium on
> Networked Systems Design and Implementation (NSDI '17).
>
> Title: OpenP4C: Target-independent Compiler for High Performance
> Protocol-independent Packet Processors
> Authors: Dániel Horpácsi (Eötvös Loránd University)
> Dániel Leskó (Eötvös Loránd University)
> Péter Vörös (Eötvös Loránd University)
> Róbert Kitlei (Eötvös Loránd University)
> Máté Tejfel (Eötvös Loránd University)
> Sándor Laki (Eötvös Loránd University)
> Paper site: https://nsdi17.usenix.hotcrp.com/paper/252?cap=0252aynHTuAIqSyA
>
> Reviews and comments on your paper are appended to this email. The
> submission URL above also has the paper's reviews and comments.
>
> -Aditya and Jon
> nsdi17chairs at usenix.org
>
> ===========================================================================
> NSDI '17 Review #252A
> ---------------------------------------------------------------------------
> Paper #252: OpenP4C: Target-independent Compiler for High Performance
> Protocol-independent Packet Processors
> ---------------------------------------------------------------------------
>
> Overall merit: 1. Strong Reject (I'll argue against
> this paper.)
> Reviewer expertise: 4. Expert
>
> ===== Paper summary =====
>
> This paper describes OpenP4C: a new compiler for the P4 data-plane language that is designed to provide portability and good performance. The key idea behind the design of OpenP4C is introducing a distinction between two intermediate representations: Core, which models a generic packet-processing pipeline, and HAL, which captures target-specific implementation details and optimizations. OpenP4C has been implemented in Python and evaluated against OVS on standard L2 and L3 applications, as well as a load balancer.
>
> ===== Strengths =====
>
> + P4 has gained a lot of attention, but only a few implementations have been developed thus far.
>
> + The idea of separating a software run-time into Core and HAL layers seems like a nice design.
>
> + The authors have implemented a prototype with multiple backends (DPDK and Freescale) and executed it on real-world examples.
>
> ===== Areas for improvement =====
>
> - There seems to be no concrete proposal for the Core and HAL. Instead, if I understood correctly, the authors expect an expert to identify the boundary between the two and, presumably, define interfaces between them.
>
> - Although the Introduction discusses the paper in a general setting, OpenP4C seems targeted mostly at software switches. Given that, the novelty over PISCES [SIGCOMM '16] is unclear.
>
> - The evaluation feels preliminary and only compares against OVS. Hence, it doesn't identify the key bottlenecks in the system, the most important optimizations, scalability beyond basic parallelism, etc.
>
> ===== Comments for author =====
>
> Having a portable, high-performance compiler for P4 would be extremely valuable. Unfortunately I'm having trouble putting my finger on the new technical contribution in this paper. The Introduction sets up the idea of a two-tiered Core/HAL architecture, which seems interesting, but then doesn't identify a concrete design for either. The rest of the paper seems sensible, but mostly straightforward -- if I understood correctly, you extended the existing P4 front-end that generates P4-hlir and built a backend that emits C code. Overall, I encourage you to continue this line of work; however, it would be good to focus on either deeper conceptual contributions (e.g., fleshing out the design of the Core/HAL interface) or engineering contributions (e.g., optimizations that show a big win for certain applications, more comprehensive evaluations).
>
> Some questions and comments:
>
> * Why is C a good intermediate representation? This choice sort of comes out of nowhere and isn't explained in any depth.
>
> * Since the design of the Core/HAL interface is a critical part of your design, could you experiment with the impact of different choices? Perhaps you could do some small case studies where you draw the line in different places and see the impact. Could one then automate the decision by systematically and cleverly exploring the design space?
>
> * Lazy extraction seems like it would work fine on a software switch, but it would likely not be possible on a hardware ASIC. This surprised me the first time I read the paper because I didn't realize you were targeting software switches exclusively.
>
> * Can you show the precise Core language used in your implementation (in a technical way -- e.g., by showing ASTs or an API)? Similarly for the HAL? This would shed light on the merits of your design.
>
> * On page 6, why is the build system for Freescale relevant?
>
> * Several \refs and \cites are listed as ??. It seems that LaTeX was not run until a fixed-point.
>
> * How does performance compare against P4's Behavioral Model (which was not designed with performance in mind)? What about PISCES (whose source code is available)? What questions are being answered in the evaluation beyond: it runs and is competitive with OVS as one varies the number of cores and packet sizes? Are these the most important things to measure?
>
> * Have you considered hardware targets?
>
> ===========================================================================
> NSDI '17 Review #252B
> ---------------------------------------------------------------------------
> Paper #252: OpenP4C: Target-independent Compiler for High Performance
> Protocol-independent Packet Processors
> ---------------------------------------------------------------------------
>
> Overall merit: 2. Weak Reject (This paper doesn't
> belong in the conference, but it
> won't upset me.)
> Reviewer expertise: 2. Some familiarity
>
> ===== Paper summary =====
>
> The paper proposes an intermediate representation for P4 programs such that each hardware manufacturer does not have to build an entire compiler but only has to become compatible with a generic HAL. OpenP4C compiles P4 programs down to the language that HAL understands while the manufacturer's HAL compatibility takes care of the rest.
>
> ===== Strengths =====
>
> The solution they propose uses a time-tested way of obtaining portability of high-level code: an intermediate language where hardware-specific low-level details are left to the hardware designer. The paper sets this case up nicely for OpenFlow routers that use the P4 language for data plane configuration.
>
> ===== Areas for improvement =====
>
> Comparison with an end-to-end compiler built for a specific hardware is needed to understand how much overhead the HAL adds. For instance, comparing with a certain functionality built entirely with DPDK without HAL.
>
> ===== Comments for author =====
>
> The motivation is not clear to me. If given an open source P4 compiler, each hardware manufacturer could potentially optimize it for their individual hardware. Given that the performance stakes are high, an end-to-end design and optimization approach may actually be warranted. So it must be shown that the performance is not hurt compared to something that is compiled natively.
>
> Something like the HAL seems extremely heavy-handed if all the hardware heterogeneity is just around the packet format and the table structure as shown in your case studies. Deeper architectural differences must exist to properly motivate HAL. For instance, are there any RISC vs. CISC-like tradeoffs? Are there any differences in memory models? Are there any differences in cache coherence models where optimization opportunities are present in some settings while missing in others? Threading models? Parallelism models? Etc.
>
> Also, if most of the differences are indeed just around packet and table formats, why not simply wrap them into generic data structures that can be understood and linked to C using a generic abstraction? Then the hardware work can be concretized into building such data structures, while OpenP4C can focus on compiling down to C for working with the generic data structure layer.
>
> Also, HAL used in Windows or Linux today is more a binary compatibility layer rather than a compiler construct. So the naming is a bit confusing and can create a misunderstanding. What you are suggesting is a hardware independent intermediate language for portability.
>
> Evaluation against OVS shows that P4 programs compiled by you to C can perform as well as OVS policies. However, what is missing from the evaluation is:
> 1. Code snippets to show that the C code is indeed readable. Size of binaries generated when not using the intermediate representation etc.
> 2. That a program compiled by OpenP4C can perform as well as a program written entirely using DPDK -- there are some software router implementations out there that you could consider.
>
> There are several formatting errors (missing citations and references). E.g.:
> 1. "We can see in Figure ??" on page 8.
> 2. "we have applied the NFPA [?]" on page 7
> 3. "MACLearner in Figure ??"
>
> ===========================================================================
> NSDI '17 Review #252C
> ---------------------------------------------------------------------------
> Paper #252: OpenP4C: Target-independent Compiler for High Performance
> Protocol-independent Packet Processors
> ---------------------------------------------------------------------------
>
> Overall merit: 3. Weak Accept (I can't complain about
> this paper being accepted, but I'm
> not enthusiastic.)
> Reviewer expertise: 3. Knowledgeable
>
> ===== Paper summary =====
>
> The paper presents and discusses a P4 compiler targeting different
> software (and perhaps hardware) platforms. The paper contains a
> reasonable introduction on design principles and architecture
> of the compiler and includes a performance evaluation section
> comparing DPDK results for OpenP4 and OVS.
>
> ===== Strengths =====
>
> I think this is a useful research tool and definitely a good addition to
> the program.
>
> ===== Areas for improvement =====
>
> Presentation and the experimental section could be improved, see below.
>
> ===== Comments for author =====
>
> There are very few details on the actual
> implementation of the CORE-HAL interface, in particular on
> how much effort would be required to support a new target.
> It is unclear how the HAL interface supports hardware offloading,
> the example you give (Freescale) seems to suggest a negative
> answer.
>
> The performance evaluation section may at least need some clarifications,
> such as the data point on Fig.8 OpenP4C with 200 entries (the explanation
> you give is not convincing, and at least you could build a workload that
> provides an even distribution to explore scalability in a better way).
> What is completely missing from the graphs is the cost of slow path operation
> which is an important metric.
>
> The single core performance, although better than on OVS, is not that exciting.
> There are many papers reporting around 15Mpps forwarding between two NICs with a
> single core using DPDK. Could you explain whether the difference is due to a
> more complex pipeline that cannot be further simplified, poor hardware drivers,
> or just implementation issues in the HAL?
>
> ===========================================================================
> NSDI '17 Review #252D
> ---------------------------------------------------------------------------
> Paper #252: OpenP4C: Target-independent Compiler for High Performance
> Protocol-independent Packet Processors
> ---------------------------------------------------------------------------
>
> Overall merit: 2. Weak Reject (This paper doesn't
> belong in the conference, but it
> won't upset me.)
> Reviewer expertise: 3. Knowledgeable
>
> ===== Paper summary =====
>
> This paper proposes a multi-target compiler “OPENP4C” that generates a high-performing switch program from the P4 language. The compiler generates target-independent "Core" code while performing "target independent optimizations". The Core code is then linked to a hardware abstraction library (HAL) that is responsible for doing target-dependent optimizations. This modularity improves retargetability/portability of the switch program, as we just need to link hardware/target-specific libraries (HAL) to the "Core" code, i.e. only the HAL changes for different hardware. The paper also talks about finding the right line of separation (correctly classifying what packet-related operation should be placed where) between Core and HAL. This is important for the system to achieve both its goals of modularity and retargetability. They compare OPENP4C with OVS, and show that it provides performance comparable to and in some cases better than OVS.
>
> ===== Strengths =====
>
> - I like and buy the idea of splitting target-dependent optimization from target-independent optimization to achieve retargetability while achieving optimal performance.
> - Classifying operations correctly into Core and HAL ("target-dependent" and "target-independent" group) seems interesting and challenging
>
> ===== Areas for improvement =====
>
> + Modularity and retargetability seem like goals inherent to the P4 language, so I am not sure OPENP4C can claim them as its selling point
> + The "separation of the components" section (3.1) looks incomplete. (Look into comments section for detailed pointers)
> + Writing in sections 4 and 5 needs to improve. There were lots of missing references and citations in the paper too
>
> ===== Comments for author =====
>
> + Section 3.1 looks interesting but incomplete. The paper claims that "classification is guided by the potential hardware targets considered", and can be done either at high level (using an expert) or at low level (by comparing two hardware switch implementations). The details involved in making the low-level classification are unclear: how do you identify that "a particular operation is substantially different in different switch implementations"? An example showing a set of operations in a set of switches, and how they are classified by OPENP4C, would be useful.
>
> + It's also unclear exactly how OPENP4C decides whether its classification of operations is correct or not (needs refinement or not). The validation part could also be explained in detail.
>
> + The paper claims that "low-level classification is applicable if concrete implementations for at least two different hardware targets are available". Why are 2 target implementations enough? You haven't justified this claim.
>
> + There may be value in seeing whether/how different combinations of targets can lead to a different separation line.
>
> + Would like to see evaluation with more complex use-cases/ real-world switch programs. For example, the P4 github repository has a sample switch program called "switch.p4" that implements many networking features like L2, L3, tunneling, multicast etc. That could be a good starting point.
>
> + I am not sure what the purpose of figures 2 and 4 was. Neither was referred to in the paper.
>
> ===========================================================================
> NSDI '17 Review #252E
> ---------------------------------------------------------------------------
> Paper #252: OpenP4C: Target-independent Compiler for High Performance
> Protocol-independent Packet Processors
> ---------------------------------------------------------------------------
>
> Overall merit: 1. Strong Reject (I'll argue against
> this paper.)
> Reviewer expertise: 3. Knowledgeable
>
> ===== Paper summary =====
>
> This paper presents OpenP4C, an OpenFlow compiler that takes P4 specifications and then generates efficient switch implementations. The key innovation here is a hardware abstraction layer that allows the compiler to tailor the compiled code for different target platforms, together with an upper layer for target-independent optimizations. The evaluation uses the DPDK software for fast packet handling, and OpenP4C is shown to outperform Open vSwitch.
>
> ===== Strengths =====
>
> The goal of the paper, to generate efficient OpenFlow implementations optimized for different hardware, and the separation of optimization concerns (between platform-independent and platform-dependent portions) are a good direction to go.
>
> ===== Areas for improvement =====
>
> The paper lacks sufficient details for one to appreciate the technical metrics (see comments below). The paper is also written in a sloppy manner. Evaluation section does not showcase the initial goals of multi-target optimizations described in the introduction.
>
> ===== Comments for author =====
>
> The goal of having a P4 compiler that can optimize for different target hardware is a compelling one. The use of a hardware abstraction layer is also the right way to go. However the paper lacks details for one to evaluate its metrics. For example,
>
> a. The paper mentions that different architectures pose different constraints. Can you provide some examples of hardware platforms, and what these constraints are, and how your compiler handles them?
>
> b. Can you give more specifics on how hardware-dependent code is abstracted away into separate software component(s)?
>
> c. The boundary between hardware independence and dependence (Figure 3) is too arbitrary. How is this selected, and what are the tradeoffs? For example, one can make the case that packet parsing implementation may be hardware dependent.
>
> d. Section 3.3 shows the categories of target dependent operations. Can you provide more explanations on how/why these are the categories chosen, and tie these choices with the use cases.
>
> e. The performance evaluation compares OpenP4C with OVS, but this seems to be missing the initial point of the work, which is to show how a single P4 specification can run efficiently for different hardware configurations, working around constraints of each hardware.
>
>
> Finally, the paper is written in a sloppy manner. There are a number of dangling references, e.g. in Sections 4.1, 4.2 and 4.3. Some acronyms and system component names are also presented without first introducing them. E.g. "Core" in the intro, and HLIR (which was introduced in Section 3.2 but the acronym is used in the section prior to that). A careful proofread is required.
>
> ===========================================================================
> NSDI '17 Review #252F
> ---------------------------------------------------------------------------
> Paper #252: OpenP4C: Target-independent Compiler for High Performance
> Protocol-independent Packet Processors
> ---------------------------------------------------------------------------
>
> Overall merit: 1. Strong Reject (I'll argue against
> this paper.)
> Reviewer expertise: 2. Some familiarity
>
> ===== Paper summary =====
>
> This paper presents a compiler from P4 to hardware-assisted software switches. The compiler assumes a hardware abstraction layer that is defined based on an examination of hardware features and abstractions of the P4 language. The paper evaluates the implementation by compiling to a DPDK-based Mellanox 100Gbps card.
>
> ===== Strengths =====
>
> * This is a relevant problem, how to compile P4 (and similar) language programs to hardware
> * The paper discusses important considerations when designing such a compiler
>
> ===== Areas for improvement =====
>
> * The paper is quite confusing, and it took a while to understand that the goal is to compile to hardware-assisted software switches
> * The discussion is quite high level
> * The consideration of related work is not sufficient to justify why a new approach is needed
>
> ===== Comments for author =====
>
> As mentioned above, efficient implementations of the P4 language in hardware are an important step to evolve SDNs beyond OpenFlow. It seems that you are making good progress in this goal. This paper, however, is confusing, and needs better positioning regarding what exactly you are trying to do, and how it compares to what other groups are doing.
>
> It took me a while to understand (if I did) what the target of your compilation is. It seems to be hardware-assisted software switches, such that your platform will be able to run a compiled C program (your core), and will expose to this C program the HAL. How does this map to a hardware switch that is not a commodity PC with a fancy network card? Is that a non-goal of your project?
>
> If your target is indeed hardware-assisted soft-switch, you should more carefully compare against existing work, specifically [9], PISCES, and [7]. Another aspect that needs careful discussion is the relationship between your project and the Open dataplane project. Why is OpenDataplane not sufficient as the definition of your HAL, and why do you need to define a new HAL?
>
> One of the key points in your project is finding the balance between what goes in the HAL and what goes in the core. You describe some of the reasoning behind your choices, but the small number of platforms you discuss does not provide enough of a basis to demonstrate your choices are the best ones. You say (page 2) "We had to find the balance between compiler complexity, portability, and performance": what is the evidence that you found all of these balances?
>
> One last potentially interesting discussion: since P4 is a young and changing language, are there features of the language that could change to make it more efficient or expressive to compile?
>
> Some minor issues:
> * page 3: "there is no communication overhead stemming from the modularisation": this is not necessarily true. Depending on the modularization and the interfaces, you may or may not have extra copies of data, which will have different costs.
> * mainstraim-like: this is misspelled, but also too informal
> * page 6: is not a big deal -> also too informal
> * on page 7: figure 5 has the wrong placement of \label, making it be referenced as 4.1
> * page 8: provded -> provided
>