A revised version of your manuscript that takes into account the comments
of the referees will be reconsidered for publication.

We have attempted to address all the referees' comments in the revised version
of the paper, with notes below for each comment.

=============================================================================

Reviewing: 1

As far as I can tell, the article contains three main ideas: an
asynchronous execution / threading model; a model for monitors to provide
mutual exclusion; and an implementation. The first two ideas are drawn
together in Table 1: unfortunately this is on page 25 of 30 pages of
text. Implementation choices and descriptions are scattered throughout the
paper - and the sectioning of the paper seems almost arbitrary.

Fixed, Table 1 is moved to the start and explained in detail.

The article is about its contributions. Simply adding feature X to
language Y isn't by itself a contribution (when feature X isn't already a
contribution).

C++ (Y) added object-oriented programming (X) to C, where OO programming (X)
was not a contribution.

For example: why support two kinds of generators as well as user-level
threads? Why support both low- and high-level synchronization constructs?

Fixed, as part of discussing Table 1.

Similarly I would have found the article easier to follow if it was written
top down: presenting the design principles, then the space of language
features, justifying the chosen language features (and rationale) and those
excluded, and then presenting the implementation and performance.

Fixed, the paper is now restructured in this form.

Then the writing of the article is often hard to follow, to say the
least. Two examples: section 3 "stateful functions" - I've some idea
what that is (a function with Algol's "own" or C's "static" variables?)
but in fact the paper has a rather more specific idea than that.

Fixed, at the start of this section.

The top of page 3 throws a whole lot of definitions at the reader
"generator", "coroutine", "stackful", "stackless", "symmetric", "asymmetric"
without ever stopping to define each one.

Hopefully fixed by moving Table 1 forward.

--- but then in footnote "C" the paper takes the time to explain what C's
"main" function is? I cannot imagine a reader of this paper who doesn't know
what "main" is in C; especially if they understand the other concepts already
presented in the paper.

Fixed by shortening.

The start of section 3 then does the same
thing: putting up a whole lot of definitions, making distinctions and
comparisons, even talking about some runtime details, but the critical
definition of a monitor doesn't appear until three pages later, at the
start of section 5 on p15; lines 29-34 are a good, clear description
of what a monitor actually is. That needs to come first, rather than
being buried again after two sections of comparisons, discussions,
implementations, and options that are ungrounded because they haven't
told the reader what they are actually talking about. First tell the
reader what something is, then how they might use it (as programmers:
what are the rules and restrictions) and only then start comparison
with other things, other approaches, other languages, or
implementations.

Hopefully fixed by moving Table 1 forward.

The description of the implementation is similarly lost in the trees
without ever really seeing the wood. Figure 19 is crucial here, but
it's pretty much at the end of the paper, and comments about
implementations are threaded throughout the paper without the context
(fig 19) to understand what's going on.

We have to agree to disagree on the location of Fig 19. Early discussions
about the implementation of the various control structures are specific to
those features. Fig 19 shows the global runtime structure, which manages only
the threading aspect of the control structures and their global organization.

The protocol for performance testing may just about suffice for C (although
is N constantly ten million, or does it vary for each benchmark?)

Fixed, the paper states N varies per language/benchmark so the benchmark runs
long enough to get a good average per operation.

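For illustration, a minimal sketch of this style of per-operation
microbenchmark harness (our own example, not the paper's actual test code;
the operation and the value of N are placeholders):

  #include <stdio.h>
  #include <time.h>

  static volatile long counter = 0;
  static void operation( void ) { counter += 1; }   // stand-in for the measured operation

  int main( void ) {
      const long N = 10000000;                       // large enough for a stable average
      struct timespec start, end;
      clock_gettime( CLOCK_MONOTONIC, &start );
      for ( long i = 0; i < N; i += 1 ) operation();
      clock_gettime( CLOCK_MONOTONIC, &end );
      double ns = (end.tv_sec - start.tv_sec) * 1e9 + (end.tv_nsec - start.tv_nsec);
      printf( "%.1f ns per operation\n", ns / N );   // elapsed time divided by N
  }
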
but such evaluation isn't appropriate for garbage-collected or JITTed
languages like Java or Go.

Please explain. All the actions in the benchmarks occur independently of the
storage-management scheme, e.g., acquiring a lock is an aspect of execution,
not storage. In fact, garbage-collected or JITTed languages cheat on benchmarks
and we had to take great care to prevent cheating and measure the actual
operation.

p1 only a subset of C-forall extensions?

Fixed, removed.

p1 "has features often associated with object-oriented programming
languages, such as constructors, destructors, virtuals and simple
inheritance." There's no need to quibble about this. Once a language has
inheritance, it's hard to claim it's not object-oriented.

We have to agree to disagree. Object languages are defined by the notion of
nested functions in an aggregate structure with a special receiver parameter
"this", not by inheritance. Inheritance is a polymorphic mechanism, e.g.,
Plan-9 C has simple inheritance but is not object-oriented. Because Cforall
does not have a specific receiver, it is possible to have multiple function
parameters as receivers, which introduces new concepts like bulk acquire for
monitors.

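As an illustration of multiple receivers, a minimal Cforall-style sketch
(assuming the monitor and mutex-parameter syntax used in the paper's figures;
the Account type and transfer function are our own example):

  monitor Account { int balance; };

  // Both parameters are receivers: the bulk acquire locks both monitors
  // atomically, so a transfer cannot deadlock with a reverse transfer.
  void transfer( Account & mutex from, Account & mutex to, int amount ) {
      from.balance -= amount;
      to.balance += amount;
  }
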
p2 barging? signals-as-hints?

Added a footnote for barging. We feel these terms are well known in the
concurrency literature, especially in pthreads and Java, and both terms have
citations with extensive explanations and further citations.

p3 start your discussion of generators with a simple example of a
C-forall generator. Fig 1(b) might do: but put it inline instead of
the python example - and explain the key rules and restrictions on the
construct. Then don't even start to compare with coroutines until
you've presented, described and explained your coroutines...
p3 I'd probably leave out the various "C" versions unless there are
key points to make you can't make in C-forall. All the alternatives
are just confusing.

Hopefully fixed as this block of text has been rewritten.

p4 but what's that "with" in Fig 1(B)?

Footnote D explains the semantics of "with", which is like unqualified access
for the receiver to the fields of a class from member routines, i.e., no
"this->".

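A minimal sketch of the idea in Cforall-style pseudocode (assuming the
with-statement form used in the paper's figures; the Point type is our own
example):

  struct Point { int x, y; };

  void move( struct Point & p, int dx, int dy ) {
      with ( p ) {              // open p's fields in this block
          x += dx;  y += dy;    // unqualified access, i.e., no p.x / p.y
      }
  }
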
p5 start with the high level features of C-forall generators...

Hopefully fixed by moving Table 1 forward.

p5 why is the paper explaining networking protocols?

Fixed, added discussion on this point.

p7 lines 1-9 (transforming generator to coroutine) - why would I do any of
this? Why would I want one instead of the other (do not use "stack" in your
answer!)

As stated on line 1: state declarations from the generator type can be
moved out of the coroutine type into the coroutine main.

p10 last para "A coroutine must retain its last resumer to suspend back
because the resumer is on a different stack. These reverse pointers allow
suspend to cycle backwards, " I've no idea what is going on here? why
should I care? Shouldn't I just be using threads instead? why not?

Hopefully fixed by moving Table 1 forward.

p16 for the same reasons - what reasons?

Hopefully fixed by moving Table 1 forward.

p17 if the multiple-monitor entry procedure really is novel, write a paper
about that, and only about that.

We do not believe this is a practical suggestion.

p23 "Loose Object Definitions" - no idea what that means. In that
section: you can't leave out JS-style dynamic properties. Even in
OOLs that (one way or another) allow separate definitions of methods
(like Objective-C, Swift, Ruby, C#) at any time a runtime class has a
fixed definition. Quite why the detail about bit mask implementation
is here anyway, I've no idea.

Fixed by rewriting the section.

p25 this cluster isn't a CLU cluster then?

No. A CLU cluster is like a class in an object-oriented programming language.
A CFA cluster is a runtime organizational mechanism.

* conclusion should conclude the paper, not the related.

We do not understand this comment.

=============================================================================

Reviewing: 2

There is much description of the system and its details, but nothing about
(non-artificial) uses of it. Although the microbenchmark data is
encouraging, arguably not enough practical experience with the system has
been reported here to say much about either its usability advantages or its
performance.

We have a Catch-22 problem. Without publicity, there is no user community;
without a user community, there are no publications for publicity.

p2: lines 4--9 are a little sloppy. It is not the languages but their
popular implementations which "adopt" the 1:1 kernel threading model.

Fixed.

line 10: "medium work" -- "medium-sized work"?

Fixed.

line 18: "is all sequential to the compiler" -- not true in modern
compilers, and in 2004 H-J Boehm wrote a tech report describing exactly why
("Threads cannot be implemented as a library", HP Labs).

We will have to disagree on this point. First, I am aware of Hans's 2004 paper
because in that paper Hans cites my seminal work on this topic from 1995, which
we cite in this paper. Second, while modern memory-models have been added to
languages like Java/C/C++ and new languages usually start with a memory model,
it is still the programmer's responsibility to use them for racy code. Only
when the programming language provides race-free constructs is the language
aware of the concurrency; otherwise the code is sequential. Hans's paper "You
Don't Know Jack About Shared Variables or Memory Models" talks about these
issues, and is also cited in the paper.

line 20: "knows the optimization boundaries" -- I found this vague. What's
an example?

Fixed.

line 31: this paragraph has made a lot of claims. Perhaps forward-reference
to the parts of the paper that discuss each one.

Fixed by adding a road-map paragraph at the end of the introduction.

line 33: "so the reader can judge if" -- this reads rather
passive-aggressively. Perhaps better: "... to support our argument that..."

Fixed.

line 41: "a dynamic partitioning mechanism" -- I couldn't tell what this
meant

Fixed.

p3. Presenting the concept of a "stateful function" as a new language feature
seems odd. In C, functions often have local state thanks to static local
variables (or globals, indeed). Of course, that has several
limitations. Can you perhaps present your contributions by enumerating
these limitations? See also my suggestion below about a possible framing
centred on a strawman.

Fixed, at the start of this section.

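To make the limitations the referee mentions concrete, a minimal sketch of a
C "stateful function" built from a static local variable (our illustration,
not text from the paper):

  #include <stdio.h>

  // A counter whose state persists across calls via a static local.
  // Limitations: exactly one instance, not reentrant, and not thread-safe,
  // which is what the generator/coroutine constructs address.
  int next_count( void ) {
      static int count = 0;     // the single, shared instance of the state
      return ++count;
  }

  int main( void ) {
      for ( int i = 0; i < 3; i += 1 ) printf( "%d ", next_count() );  // prints 1 2 3
      printf( "\n" );
  }
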
line 2: "an old idea that is new again" -- this is too oblique

Fixed, removed.

lines 2--15: I found this to be a word/concept soup. Stacks, closures,
generators, stackless, stackful, coroutine, symmetric, asymmetric,
resume/suspend versus resume/resume... there needs to be a more gradual and
structured way to introduce all this, and ideally one that minimises
redundancy. Maybe present it as a series of "definitions" each with its own
heading, e.g. "A closure is stackless if its local state has statically
known fixed size"; "A generator simply means a stackless closure." And so
on. Perhaps also strongly introduce the word "activate" as a direct
contrast with resume and suspend. These are just a flavour of the sort of
changes that might make this paragraph into something readable.

Continuing the thought: I found it confusing that by these definitions, a
stackful closure is not a stack, even though logically the stack *is* a
kind of closure (it is a representation of the current thread's
continuation).

Fixed. Rewrote the paragraph and moved Table 1 forward.

lines 24--27: without explaining what the boost functor types mean, I don't
think the point here comes across.

Replaced with a uC++ example because boost appears to have dropped symmetric
coroutines.

line 34: "semantically coupled" -- I wasn't sure what this meant

Fixed.

p4: the point of Figure 1 (C) was not immediately clear. It seems to be
showing how one might "compile down" Figure 1 (B). Or is that Figure 1 (A)?

Fixed. Rewrote the sentence.

It's right that the incidental language features of the system are not
front-and-centre, but I'd appreciate some brief glossing of non-C language
features as they appear. Examples are the square bracket notation, the pipe
notation and the constructor syntax. These explanations could go in the
caption of the figure which first uses them, perhaps. Overall I found the
figure captions to be terse, and a missed opportunity to explain clearly
what was going on.

Fixed, added a descriptive footnote about Cforall. We prefer to put text in
the body of the paper and keep captions short.

p5 line 23: "This restriction is removed..." -- give us some up-front
summary of your contributions and the elements of the language design that
will be talked about, so that this isn't an aside. This will reduce the
"twisty passages" feeling that characterises much of the paper.

Fixed, removed the parenthesis.

line 40: "a killer asymmetric generator" -- this is stylistically odd, and
the sentence about failures doesn't convincingly argue that C\/ will help
with them. Have you any experience writing device drivers using C\/? Or any
argument that the kinds of failures can be traced to the "stack-ripping"
style that one is forced to use without coroutines?

Fixed, added a new paragraph.

Also, a typo on line
41: "device drives". And saying "Windows/Linux" is sloppy... what does the
cited paper actually say?

Fixed.

p6 lines 13--23: this paragraph is difficult to understand. It seems to be
talking about a control-flow pattern roughly equivalent to tail recursion.
What is the high-level point, other than that this is possible?

Fixed, rewrote the start of the paragraph.

line 34: "which they call coroutines" -- a better way to make this point is
presumably that the C++20 proposal only provides a specialised kind of
coroutine, namely generators, despite its use of the more general word.

Fixed.

line 47: "... due to dynamic stack allocation, execution..." -- this
sentence doesn't scan. I suggest adding "and for" in the relevant places
where currently there are only commas.

Fixed.

p8 / Figure 5 (B) -- the GNU C extension of unary "&&" needs to be
explained.

Fixed, added an explanation at the first usage in Figure 1(C) and a reference.

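For reference, a minimal self-contained C sketch of the GNU extension in
question (labels as values: unary "&&" takes the address of a label, and
"goto *" jumps to a computed label; the example itself is ours):

  #include <stdio.h>

  int main( void ) {
      void * states[] = { &&start, &&stop };   // GNU C: && yields a label's address
      int i = 0;
    start:
      printf( "state %d\n", i );
      i += 1;
      goto *states[ i < 2 ? 0 : 1 ];           // computed goto to the next state
    stop:
      return 0;
  }
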
The whole figure needs a better explanation, in fact.

Fixed, rewrote the start of the paragraph.

p9, lines 1--10: I wasn't sure this stepping-through really added much
value. What are the truly important points to note about this code?

Fixed, shortened and merged with the previous paragraph.

p10: similarly, lines 3--27 again are somewhere between tedious and
confusing. I'm sure the motivation and details of "starter semantics" can
both be stated much more pithily.

Fixed, shortened these paragraphs.

line 32: "a self-resume does not overwrite the last resumer" -- is this a
hack or a defensible principled decision?

Fixed, removed, but it is a defensible principled decision.

p11: "a common source of errors" -- among beginners or among production
code? Presumably the former.

Forgetting is not specific to beginners.

line 23: "with builtin and library" -- not sure what this means

Fixed.

lines 31--36: these can be much briefer. The only important point here
seems to be that coroutines cannot be copied.

Fixed, shortened.

p12: line 1: what is a "task"? Does it matter?

Fixed, "task" has been changed to "thread" throughout the paper.

line 7: calling it "heap stack" seems to be a recipe for
confusion. "Stack-and-heap" might be better, and contrast with
"stack-and-VLS" perhaps. When "VLS" is glossed, suggest actually expanding
its initials: say "length" not "size".

Fixed, made the correction and rewrote some of the text.

line 21: are you saying "cooperative threading" is the same as
"non-preemptive scheduling", or that one is a special case (kind) of the
other? Both are defensible, but be clear.

Fixed, clarified the definitions.

line 27: "mutual exclusion and synchronization" -- the former is a kind of
the latter, so I suggest "and other forms of synchronization".

We have to agree to disagree. Included a citation that explains the
differences.

line 30: "can either be a stackless or stackful" -- stray "a", but also,
this seems to be switching from generic/background terminology to
C\/-specific terminology.

Fixed, but the terms stackless and stackful are not specific to Cforall; they
are well known in the literature.

An expositional idea occurs: start the paper with a strawman naive/limited
realisation of coroutines -- say, Simon Tatham's popular "Coroutines in C"
web page -- and identify point by point what the limitations are and how
C\/ overcomes them. Currently the presentation is often flat (lacking
motivating contrasts) and backwards (stating solutions before
problems). The foregoing approach might fix both of these.

We prefer the current structure of our paper and believe the paper does
explain basic coding limitations and how they are overcome using high-level
control-flow mechanisms.

page 13: line 23: it seems a distraction to mention the Python feature
here.

Why? It is the first location in the paper where dynamic allocation and
initialization are mentioned.

p14 line 5: it seems odd to describe these as "stateless" just because they
lack shared mutable state. It means the code itself is even more
stateful. Maybe the "stack ripping" argument could usefully be given here.

Fixed, changed "stateless" to "non-shared".

line 16: "too restrictive" -- would be good to have a reference to justify
this, or at least give a sense of what the state-of-the-art performance in
transactional memory systems is (both software and hardware)

Fixed, added two citations.

line 22: "simulate monitors" -- what about just *implementing* monitors?
isn't that what these systems do? or is the point more about refining them
somehow into something more specialised?

Fixed, changed "simulate monitors" to "manually implement a monitor".

p15: sections 4.1 and 4.2 seem adrift and misplaced. Split them into basic
parts (which go earlier) and more advanced parts (e.g. barging, which can
be explained later).

Fixed, removed them by shortening and merging with the previous section.

line 31: "acquire/release" -- misses an opportunity to contrast the
monitor's "enter/exit" abstraction with the less structured acquire/release
of locks.

Fixed, added "by call/return" to the sentence.

p16 line 12: the "implicit" versus "explicit" point is unclear. Is it
perhaps about the contract between an opt-in *discipline* and a
language-enforced *guarantee*?

Fixed.

line 28: no need to spend ages dithering about which one is default and
which one is the explicit qualifier. Tell us what you decided, briefly
justify it, and move on.

Fixed, shortened the paragraph.

p17: Figure 11: since the main point seems to be to highlight bulk acquire,
include a comment which identifies the line where this is happening.

Fixed.

line 2: "impossible to statically..." -- or dynamically. Doing it
dynamically would be perfectly acceptable (locking is a dynamic operation
after all)

Fixed, clarified that "statically" applies to the unknown-sized pointer types.

"guarantees acquisition order is consistent" -- assuming it's done in a
single bulk acquire.

Fixed.

p18: section 5.3: the text here is a mess. The explanations of "internal"
versus "external" scheduling are unclear, and "signals as hints" is not
explained. "... can cause thread starvation" -- means including a while
loop, or not doing so? "There are three signalling mechanisms.." but the
text does not follow that by telling us what they are. My own scribbled
attempt at unpicking the internal/external thing: "threads already in the
monitor, albeit waiting, have priority over those trying to enter".

Fixed, rewrote and shortened the paragraphs.

p19: line 3: "empty condition" -- explain that condition variables don't
store anything. So being "empty" means that the queue of waiting threads
(threads waiting to be signalled that the condition has become true) is
empty.

Fixed, changed "condition variable" to "condition queue" throughout the paper.

line 6: "... can be transformed into external scheduling..." -- OK, but
give some motivation.

The paper states that it removes the condition queues and signal/wait. Changed
"transform" to "simplified".

p20: line 6: "mechnaism"

Fixed.

lines 16--20: this is dense and can probably only be made clear with an
example

Fixed, rewrote and added an example.

p21 line 21: clarify that nested monitor deadlock was described earlier (in
5.2). (Is the repetition necessary?)

Fixed, put in a forward reference, and the point bears repeating because
releasing a subset of acquired monitors is unique to Cforall concurrency.

line 27: "locks, and by extension monitors" -- this is true but the "by
extension" argument is faulty. It is perfectly possible to use locks as a
primitive and build a compositional mechanism out of them,
e.g. transactions.

True, but that is not what we said. Locks are not composable, and monitors are
built using locks, not transactions, so by extension monitors are not
composable.

p22 line 2: should say "restructured"

Fixed.

line 33: "Implementing a fast subset check..." -- make clear that the
following section explains how to do this. Restructuring the sections
themselves could do this, or noting in the text.

Fixed, added a forward reference to the following sections.

p23: line 3: "dynamic member adding, eg, JavaScript" -- needs to say "as
permitted in JavaScript", and "dynamically adding members" is stylistically
better

Fixed.

p23: line 18: "urgent stack" -- back-reference to where this was explained
before

Fixed.

p24 line 7: I did not understand what was more "direct" about "direct
communication". Also, what is a "passive monitor" -- just a monitor, given
that monitors are passive by design?

The back half of line 7 defines "direct". For example, Go, Java, and pthread
threads cannot directly call/communicate with one another, whereas they can
in Ada, uC++, and Cforall. Figure 18 shows this exact difference.

A monitor object is *passive* because it does not have a thread, while a Go,
Java, or Cforall "thread" object is *active* because it has a thread.

line 14 / section 5.9: this table was useful and it (or something like it)
could be used much earlier on to set the structure of the rest of the
paper.

Fixed, Table 1 is moved to the start and explained in detail.

The explanation at present is too brief, e.g. I did not really understand
the point about cases 7 and 8. Table 1: what does "No / Yes" mean?

Fixed, expanded the explanation.

p25 line 2: instead of casually dropping in a terse explanation for the
newly introduced term "virtual processor", introduce it
properly. Presumably the point is to give a less ambiguous meaning to
"thread" by reserving it only for C\/'s green threads.

Fixed.

p26 line 15: "transforms user threads into fibres" -- a reference is needed
to explain what "fibres" means... guessing it's in the sense of Adya et al.

Fixed. In a prior correction, the term fibre from Adya is defined.

line 20: "Microsoft runtime" -- means Windows?

Fixed.

lines 21--26: don't say "interrupt" to mean "signal", especially not
without clear introduction. You can use "POSIX signal" to disambiguate from
condition variables' "signal".

We have to agree to disagree on this terminology. Interrupt is the action of
stopping the CPU, while a signal is a specific kind of interrupt. The two
terms seem to be well understood in the literature.

p27 line 3: "frequency is usually long" -- that's a "time period" or
"interval", not a frequency

Fixed.

line 5: the lengthy quotation is not really necessary; just paraphrase the
first sentence and move on.

Fixed.

line 20: "to verify the implementation" -- I don't think that means what is
intended

Fixed, changed "verify" to "test".

Tables in section 7 -- too many significant figures. How many overall runs
are described? What is N in each case?

Fixed. As stated, N=31.

p29 line 2: "to eliminate this cost" -- arguably confusing since nowadays
on commodity CPUs most of the benefits of inlining are not to do with call
overheads, but from later optimizations enabled as a consequence of the
inlining

Fixed.

line 41: "a hierarchy" -- are they a hierarchy? If so, this could be
explained earlier. Also, to say these make up "an integrated set... of
control-flow features" verges on the tautologous.

Fixed, rewrote the sentence.

p30 line 15: "a common case being web servers and XaaS" -- that's two cases

Fixed.

=============================================================================

Reviewing: 3

* Expand on the motivations for including both generators and coroutines, vs
trying to build one atop the other

Fixed, Table 1 is moved to the start and explained in detail.

* Expand on the motivations for having both symmetric and asymmetric
coroutines?

A coroutine is not marked as symmetric or asymmetric; it is a coroutine.
Symmetric or asymmetric is a stylistic use of a coroutine. By analogy, a
function is not marked as recursive or non-recursive. Recursion is a style of
programming with a function. So there is no separate motivation for having
both symmetric and asymmetric forms, as they follow from how a programmer
uses suspend and resume.

* Comparison to async-await model adopted by other languages

Fixed, added a new section on this topic.

* Consider performance comparisons against node.js and Rust frameworks

Fixed.

* Discuss performance of monitors vs finer-grained memory models and atomic
operations found in other languages

The paper never suggested high-level concurrency constructs can or should
replace race programming or hardware atomics. The paper suggests programmers
use high-level constructs when and where it is feasible because they are
easier and safer to use. The monitor example of an atomic counter is just
that, an example, not the way it should be done if maximal performance is
required. We have tried to make this point clear in the paper.

* Why both internal/external scheduling for synchronization?

Some additional motivation has been added.

* Generators are not exposed as a "function" that returns a generator
object, but rather as a kind of struct, with communication happening via
mutable state instead of "return values".

Yes, Cforall uses an object style of coroutine, which allows multiple
interface functions that pass and return values through a structure. This
approach allows a generator function to have different kinds of return values
and different kinds of parameters to produce those values. Our generators can
provide this capability via multiple interface functions to the
generator/coroutine state, which is discussed on page 5, lines 13-21.

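To illustrate the object style in plain C terms (our sketch, not the paper's
Cforall generator syntax): the state lives in a structure, and any number of
interface functions with different parameter and return types communicate
through that state.

  #include <stdio.h>

  // Fibonacci generator in object style: the state is a struct instance, so
  // multiple independent generators can exist, and different interface
  // functions can take/return different types against the same state.
  struct Fib { long f1, f2; };

  void fib_init( struct Fib * g ) { g->f1 = 0; g->f2 = 1; }

  long fib_next( struct Fib * g ) {          // one interface: return the next value
      long next = g->f1;
      g->f1 = g->f2;
      g->f2 = next + g->f1;
      return next;
  }

  void fib_skip( struct Fib * g, int n ) {   // another interface: different parameters
      for ( int i = 0; i < n; i += 1 ) fib_next( g );
  }

  int main( void ) {
      struct Fib g;
      fib_init( &g );
      fib_skip( &g, 5 );                     // consume 0 1 1 2 3
      printf( "%ld\n", fib_next( &g ) );     // prints 5
  }
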
That is, the generator must be manually resumed and (if I understood) it
is expected to store values that can then later be read (perhaps via
methods), instead of having a `yield <Expr>` statement that yields up a
value explicitly.

All generators are manually resumed, e.g., Python/nodejs use "next" to resume a
generator. Yes, yield <Expr> has a single interface with one input/return type,
versus the Cforall approach allowing an arbitrary number of interfaces of
arbitrary types.

697 | * Both "symmetric" and "asymmetric" generators are supported, instead of
|
---|
698 | only asymmetric.
|
---|
699 |
|
---|
700 | Yes, because they support different functionality as discussed in Chris
|
---|
701 | Marlin's seminal work and both forms are implemented in Simula67. We did not
|
---|
702 | invent symmetric and asymmetric generators/coroutines, we took them from the
|
---|
703 | literature.
|
---|
704 |
|
---|
705 | * Coroutines (multi-frame generators) are an explicit mechanism.
|
---|
706 |
|
---|
707 | In most other languages, coroutines are rather built by layering
|
---|
708 | single-frame generators atop one another (e.g., using a mechanism like
|
---|
709 | async-await),
|
---|
710 |
|
---|
711 | We disagree. Node.js has async-await but has a separate coroutine feature.
|
---|
712 | While there are claims that coroutines can be built from async-await and/or
|
---|
713 | continuations, in actuality they cannot.
|
---|
714 |
|
---|
715 | and symmetric coroutines are basically not supported. I'd like to see a bit
|
---|
716 | more justification for Cforall including all the above mechanisms -- it
|
---|
717 | seemed like symmetric coroutines were a useful building block for some of
|
---|
718 | the user-space threading and custom scheduler mechanisms that were briefly
|
---|
719 | mentioned later in the paper.
|
---|
720 |
|
---|
721 | Hopefully fixed by moving Table 1 forward.
|
---|
722 |
|
---|
723 | In the discussion of coroutines, I would have expected a bit more of a
|
---|
724 | comparison to the async-await mechanism offered in other languages.
|
---|
725 |
|
---|
726 | We added a new section at the start to point out there is no comparison between
|
---|
727 | coroutines and async-await.
|
---|
728 |
|
---|
729 | Certainly the semantics of async-await in JavaScript implies
|
---|
730 | significantly more overhead (because each async fn is a distinct heap
|
---|
731 | object). [Rust's approach avoids this overhead][zc], however, and might be
|
---|
732 | worthy of a comparison (see the Performance section).
|
---|
733 |
|
---|
734 | We could not get Rust async-await to work, and when reading the description of
|
---|
735 | rust async-await, it appears to be Java-style executors with futures (possibly
|
---|
736 | fast futures).
|
---|

There are several sections in the paper that compare against atomics -- for
example, on page 15, the paper shows a simple monitor that encapsulates an
integer and compares that to C++ atomics. Later, the paper compares the
simplicity of monitors against the `volatile` qualifier from Java. The
conclusion in section 8 also revisits this point.
While I agree that monitors are simpler, they are obviously also
significantly different from a performance perspective -- the paper doesn't
seem to address this at all. It's plausible that (e.g.) the `Aint` monitor
type described in the paper can be compiled and mapped to the specialized
instructions offered by hardware, but I didn't see any mention of how this
would be done.

Fixed, see response above.

There is also no mention of the more nuanced memory ordering
relations offered by C++11 and how one might achieve similar performance
characteristics in Cforall (perhaps the answer is that one simply doesn't
need to; I think that's defensible, but worth stating explicitly).

Cforall is built on C, and therefore has full access to all the gcc atomics,
and automatically gets any gcc updates. Furthermore, section 6.9 states that
Cforall provides the full panoply of low-level locks, as do Java, Go, and C++,
for performance programming.

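For example, because Cforall accepts C, the gcc atomic builtins are directly
available when maximal performance is required (a minimal C sketch; the
counter is our own example, not code from the paper):

  #include <stdio.h>

  long counter = 0;

  int main( void ) {
      // gcc/clang builtin: atomic increment with sequentially consistent ordering.
      __atomic_fetch_add( &counter, 1, __ATOMIC_SEQ_CST );
      // A weaker ordering can be chosen where appropriate, e.g., __ATOMIC_RELAXED.
      __atomic_fetch_add( &counter, 1, __ATOMIC_RELAXED );
      printf( "%ld\n", counter );   // prints 2
  }
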
Cforall includes both internal and external scheduling; I found the
explanation for the external scheduling mechanism to be lacking in
justification. Why include both mechanisms when most languages seem to make
do with only internal scheduling? It would be useful to show some scenarios
where external scheduling is truly more powerful.

Fixed. Pointed out external scheduling is simpler as part of rewriting that
section, and added additional examples.

I would have liked to see some more discussion of external scheduling and
how it interacts with software engineering best practices. It seems
somewhat similar to AOP in certain regards. It seems to add a bit of "extra
semantics" to monitor methods, in that any method may now also become a
kind of synchronization point.

Fixed somewhat. Pointed out that external scheduling has been around for a long
time (40 years) in Ada, so there is a body of software-engineering experience
using it. As well, I have been teaching it for 30 years in the concurrency
course at Waterloo. We don't know what software engineering best practices you
imagine it interacting with. Yes, monitor functions are synchronization points
with external scheduling.

The "open-ended" nature of this feels like it could easily lead to subtle
bugs, particularly when code refactoring occurs (which may e.g. split an
existing method into two).

Any time a public interface is refactored, it invalidates existing calls, so
there is always an issue. For mutex routines and external scheduling, the
waitfor statements may have to be updated, but that update is part of the
refactoring.

This seems particularly true if external scheduling can occur across
compilation units -- the paper suggested that this is true, but I wasn't
entirely clear.

Every aspect of Cforall allows separate compilation. The function prototypes
necessary for separate compilation provide all the information necessary to
compile any aspect of a program.

I would have also appreciated a few more details on how external scheduling
is implemented. It seems to me that there must be some sort of "hooks" on
mutex methods so that they can detect whether some other function is
waiting on them and awaken those blocked threads. I'm not sure how such
hooks are inserted, particularly across compilation units.

Hooks are inserted by the Cforall translator, in the same way that Java
inserts hooks into a "synchronized" member of a monitor. As with Java, as long
as the type information is consistent across compilation units, the correct
code is inserted.

The material in Section 5.6 didn't quite clarify the matter for me. For
example, it left me somewhat confused about whether the `f` and `g`
functions declared were meant to be local to a translation unit, or shared
with other units.

There are no restrictions with respect to static or external mutex functions.
Cforall is C. Any form of access or separate compilation in C applies to
Cforall. As in C, function prototypes carry all necessary information to
compile the code.

To start, I did not realize that the `mutex_opt` notation was a keyword; I
thought it was a type annotation. I think this could be called out more
explicitly.

Fixed, indicated "mutex" is a C-style parameter-only declaration type-qualifier.

Later, in section 5.2, the paper discusses `nomutex` annotations, which
initially threw me, as they had not been introduced (now I realize that
this paragraph is there to justify why there is no such keyword). The
paragraph might be rearranged to make that clearer, perhaps by leading with
the choice that Cforall made.

Fixed, rewrote the paragraph, removing nomutex.

On page 17, the paper states that "acquiring multiple monitors is safe from
deadlock", but this could be stated a bit more precisely: acquiring
multiple monitors in a bulk-acquire is safe from deadlock (deadlock can
still result from nested acquires).

Fixed.

On page 18, the paper states that wait states do not have to be enclosed in
loops, as there is no concern of barging. This seems true but there are
also other reasons to use loops (e.g., if there are multiple reasons to
notify on the same condition). Thus the statement initially surprised me,
as barging is only one of many reasons that I typically employ loops around
waits.

Fixed. Rewrote the sentence. Note, for all non-barging cases where you employ a
loop around a wait, the unblocked thread must change state before blocking
again. In the barging case, the unblocked thread blocks again without
changing state.

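For illustration, the familiar shape of the barging-tolerant pattern in
pthreads (our sketch, not Cforall code): with barging possible, the waiter
must re-check the predicate in a loop because another thread may consume the
state between the signal and the wake-up.

  #include <pthread.h>
  #include <stdbool.h>

  pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
  pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
  bool ready = false;

  void wait_for_ready( void ) {
      pthread_mutex_lock( &m );
      // pthreads permits barging (and spurious wake-ups), so the predicate
      // must be re-checked in a loop: another thread may consume the state
      // between the signal and this thread re-acquiring the mutex.
      while ( ! ready )
          pthread_cond_wait( &c, &m );
      ready = false;                 // consume the state
      pthread_mutex_unlock( &m );
  }

  void make_ready( void ) {
      pthread_mutex_lock( &m );
      ready = true;
      pthread_cond_signal( &c );
      pthread_mutex_unlock( &m );
  }

  // In a barging-free monitor, the signalled thread reacquires the monitor
  // before any caller can intervene, so for this case a single
  // "if ( ! ready ) wait" would suffice.
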
I did not understand the diagram in Figure 12 for some time. Initially, I
thought that it was generic to all monitors, and I could not understand the
state space. It was only later that I realized it was specific to your
example. Updating the caption from "Monitor scheduling" to "Monitor
scheduling in the example from Fig 13" might have helped me quite a bit.

Fixed, updated the text to clarify. Did not change the caption because the
signal_block does not apply to Figure 13.

I spent quite some time reading the boy/girl dating example (\*) and I
admit I found it somewhat confusing. For example, I couldn't tell whether
there were supposed to be many "girl" threads executing at once, or if
there was only supposed to be one girl and one boy thread executing in a
loop.

The paper states:

    The dating service matches girl and boy threads with matching
    compatibility codes so they can exchange phone numbers.

So there are many girl/boy threads. There is nothing preventing an individual
girl/boy from arranging multiple dates.

Are the girl/boy threads supposed to invoke the girl/boy methods or vice
versa?

As long as the girls/boys are consistent in the calls, it does not matter. The
goal is to find a partner and exchange phone numbers.

Surely there is some easier way to set this up?

There are some other solutions using monitors, but they all have a similar
structure.

The paper offered a number of comparisons to Go, C#, Scala, and so forth,
but seems to have overlooked another recent language, Rust. In many ways,
Rust seems to be closest in philosophy to Cforall, so it seems like an odd
omission. I already mentioned above that Rust is in the process of shipping
[async-await syntax][aa], which is definitely an alternative to the
generator/coroutine approach in Cforall (though one with clear pros/cons).

We cannot get Rust async-await example programs to compile, nor does the
select! macro compile.

@plg2[1]% rustc --version
rustc 1.40.0 (73528e339 2019-12-16)

@plg2[2]% cat future.rs
use futures::executor::block_on;

async fn hello_world() {
    println!("hello, world!");
}

fn main() {
    let future = hello_world(); // Nothing is printed
    block_on(future); // `future` is run and "hello, world!" is printed
}

@plg2[3]% rustc -C opt-level=3 future.rs
error[E0670]: `async fn` is not permitted in the 2015 edition
 --> future.rs:3:1
  |
3 | async fn hello_world() {
  | ^^^^^

error[E0433]: failed to resolve: maybe a missing crate `futures`?
 --> future.rs:1:5
  |
1 | use futures::executor::block_on;
  |     ^^^^^^^ maybe a missing crate `futures`?

error[E0425]: cannot find function `block_on` in this scope
 --> future.rs:9:5
  |
9 |     block_on(future); // `future` is run and "hello, world!" is printed
  |     ^^^^^^^^ not found in this scope

error: aborting due to 3 previous errors

Some errors have detailed explanations: E0425, E0433, E0670.
For more information about an error, try `rustc --explain E0425`.

In the performance section in particular, you might consider comparing
against some of the Rust web servers and threading systems.

This paper is not about building web-servers. Nor are web-servers a reasonable
benchmark for language concurrency. Web-servers are a benchmark for the
efficiency of the non-blocking I/O library in the underlying operating
system. Our prior work on web-server performance:

@inproceedings{Pariag07,
    author    = {David Pariag and Tim Brecht and Ashif Harji and Peter Buhr and Amol Shukla},
    title     = {Comparing the Performance of Web Server Architectures},
    booktitle = {Proceedings of the 2007 Eurosys conference},
    month     = mar,
    year      = 2007,
    pages     = {231--243},
}

@inproceedings{Harji12,
    author    = {Ashif S. Harji and Peter A. Buhr and Tim Brecht},
    title     = {Comparing High-Performance Multi-core Web-Server Architectures},
    booktitle = {Proceedings of the 5th Annual International Systems and Storage Conference},
    series    = {SYSTOR '12},
    publisher = {ACM},
    address   = {New York, NY, USA},
    location  = {Haifa, Israel},
    month     = jun,
    year      = 2012,
    articleno = 1,
    pages     = {1:1--1:12},
}

shows the steps to build a high-performance web-server, which are largely
independent of the server architecture and programming language.

It would seem worth trying to compare their "context switching" costs as
well -- I believe both actix and tokio have a notion of threads that could
be readily compared.

Again, context-switching speed is largely irrelevant because the amount of code
to process an HTTP request is large enough to push any concurrency costs into
the background.

Another addition that might be worth considering is to compare against
node.js promises, although I think the comparison to process creation is
not as clean.

Done.