Reviewing: 1

I still have a couple of issues --- perhaps the largest is that it's
still not clear at this point in the paper what some of these options
are, or crucially how they would be used. I don't know if it's
possible to give high-level examples or use cases to be clear about
these up front - or if that would duplicate too much information from
later in the paper - either way expanding out the discussion - even if
just a couple of sentences for each row - would help me more.

Section 2.1 is changed to address this suggestion.


* 1st para section 2 begs the question: why not support each
dimension independently, and let the programmer or library designer
combine features?

As seen in Table 1, not all of the combinations work, and having programmers
directly use these low-level mechanisms is error prone. Accessing these
fundamental mechanisms through higher-level constructs has always been the
purpose of a programming language.


* Why must there "be language mechanisms to create, block/unblock, and join
with a thread"? There aren't in Smalltalk (although there are in the
runtime). Especially given in Cforall those mechanisms are *implicit* on
thread creation and destruction?

Fixed, changed sentence to:

A programmer needs mechanisms to create, block and unblock, and join with a
thread, even if these basic mechanisms are supplied indirectly through
high-level features.


36 | * "Case 1 is a function that borrows storage for its state (stack
|
---|
37 | frame/activation) and a thread from its invoker"
|
---|
38 |
|
---|
39 | this much makes perfect sense to me, but I don't understand how a
|
---|
40 | non-stateful, non-threaded function can then retain
|
---|
41 |
|
---|
42 | "this state across callees, ie, function local-variables are
|
---|
43 | retained on the stack across calls."
|
---|
44 |
|
---|
45 | how can it retain function-local values *across calls* when it
|
---|
46 | doesn't have any functional-local state?
|
---|
47 |
|
---|
48 | In the following example:
|
---|
49 |
|
---|
50 | void foo() {
|
---|
51 | // local variables and code
|
---|
52 | }
|
---|
53 | void bar() {
|
---|
54 | // local variables
|
---|
55 | foo();
|
---|
56 | }
|
---|
57 |
|
---|
58 | bar is the caller and foo is the callee. bar borrows the program stack and
|
---|
59 | thread to make the call to foo. When foo, the callee, is executing, bar's local
|
---|
60 | variables (state) is retained on the *borrowed* stack across the call. (Note, I
|
---|
61 | added *borrowed* to that sentence in the paper to help clarify.) Furthermore,
|
---|
62 | foo's local variables are also retain on the borrowed stack. When foo and bar
|
---|
63 | return, all of their local state is gone (not retained). This behaviour is
|
---|
64 | standard call/return semantics in an imperative language.
|
---|
65 |
|
---|
66 |
|
---|
I'm not sure if I see two separate cases here - roughly equivalent
to C functions without static storage, and then C functions *with*
static storage.

Yes, but there is only one instance of the static storage across all
activations of the C function. For generators and coroutines, each instance has
its own state, like an object in an OO language.


I assumed that was the distinction between cases 1 & 3; but perhaps the
actual distinction is that 3 has a suspend/resume point, and so the
"state" in figure 1 is this component of execution state (viz figs 1 & 2),
not the state representing the cross-call variables?

So case 3 is like an object with the added ability to retain where it was last
executing. When a generator is resumed, the generator object (structure
instance) is passed as an explicit reference, and within this object is the
restart location in the generator's "main". When the generator main executes,
it uses the borrowed stack for its local variables and any functions it calls,
just like an object member borrows the stack for its local variables but also
has an implicit receiver to the object state. A generator can have static
storage, too, which is a single instance across all generator instances of that
type, as for static storage in an object type. All the kinds of storage are
at play, with the same semantics as in other languages.
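
To make this concrete for readers of this response, here is a minimal
Fibonacci-style sketch in the spirit of the paper's generator examples (the
declarations are illustrative, not copied from the paper): the cross-resume
state lives in the generator type, the restart location is stored implicitly in
the generator object, and temporaries in the generator main live on the
borrowed stack.

generator Fib {
    int fn1, fn;                  // state retained across resumes in the object
};
void main( Fib & f ) with( f ) {  // restart location is saved in f at each suspend
    fn1 = 0; fn = 1;
    for () {                      // infinite loop
        suspend;                  // return to resumer; next resume restarts here
        int fn2 = fn1;            // temporary on the borrowed stack, not retained
        fn1 = fn; fn = fn + fn2;
    }
}
int main() {
    Fib f;
    for ( i; 10 ) { resume( f ); sout | f.fn; }  // each resume continues after the suspend
}

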
> but such evaluation isn't appropriate for garbage-collected or JITted
> languages like Java or Go.
For JITted languages in particular, reporting peak performance needs to
"warm up" the JIT with a number of iterations before beginning
measurement. Actually for JITs it's even worse: see Edd Barrett et al
OOPSLA 2017.

Of our testing languages, only Java and Node.js are JITted. To ensure the Java
test-programs correctly measured the specific feature, we consulted with Dave
Dice at Oracle who works directly on the development of the Oracle JVM
Just-in-Time Compiler. We modified our test programs based on his advice, and
he validated our programs as correctly measuring the specified language
feature. Dave's help is recognized in the Acknowledgment section. Similarly,
we verified our Node.js programs with Gregor Richards, an expert in
just-in-time compilation for dynamic typing. Hence, we have taken into account
all issues related to performing benchmarks in JITted languages. Also, all the
benchmark programs are publicly available for independent verification.


* footnote A - I've looked at various other papers & the website to try to
understand how "object-oriented" Cforall is - I'm still not sure. This
footnote says Cforall has "virtuals" - presumably virtual functions,
i.e. dynamic dispatch - and inheritance: that really is OO as far as I
(and most OO people) are concerned. For example Haskell doesn't have
inheritance, so it's not OO; while CLOS (the Common Lisp *Object* System)
or things like Cecil and Dylan are considered OO even though they have
"multiple function parameters as receivers", lack "lexical binding between
a structure and set of functions", and don't have explicit receiver
invocation syntax. Python has receiver syntax, but unlike Java or
Smalltalk or C++, method declarations still need to have an explicit
"self" receiver parameter. Seems to me that Go, for example, is
more-or-less OO with interfaces, methods, and dynamic dispatch (yes, also
an explicit receiver syntax, but that's not determinative); while Rust
lacks dynamic dispatch built-in. C is not OO as a language, but as you
say given it supports function pointers with structures, it does support
an OO programming style.

This is why I again recommend just not buying into this fight: not making
any claims about whether Cforall is OO or is not - because as I see it,
the rest of the paper doesn't depend on whether Cforall is OO or not.
That said: this is just a recommendation, and I won't quibble over this
any further.

We believe it is important to identify Cforall as a non-OO language because it
heavily influences the syntax and semantics used to build its concurrency.
Since many aspects of Cforall are not OO, the rest of the paper *does* depend
on Cforall being identified as non-OO, otherwise readers would have
significantly different expectations for the design. We did not mean to suggest
that languages that support function pointers with structures support an OO
style. We revised the footnote to avoid this interpretation. Finally, Golang
does not identify itself as an OO language.
https://golang.org/doc/faq#Is_Go_an_object-oriented_language


* is a "monitor function" the same as a "mutex function"?
if so the paper should pick one term; if not, make the distinction clear.

Fixed. Picked "mutex". Changed the language and all places in the paper.


153 | * "As stated on line 1 because state declarations from the generator
|
---|
154 | type can be moved out of the coroutine type into the coroutine main"
|
---|
155 |
|
---|
156 | OK sure, but again: *why* would a programmer want to do that?
|
---|
157 | (Other than, I guess, to show the difference between coroutines &
|
---|
158 | generators?) Perhaps another way to put this is that the first
|
---|
159 | para of 3.2 gives the disadvantages of coroutines vs-a-vs
|
---|
160 | generators, briefly describes the extended semantics, but never
|
---|
161 | actually says why a programmer may want those extended semantics,
|
---|
162 | or how they would benefit. I don't mean to belabour the point,
|
---|
163 | but (generalist?) readers like me would generally benefit from
|
---|
164 | those kinds of discussions about each feature throughout the
|
---|
165 | paper: why might a programmer want to use them?
|
---|
166 |
|
---|
167 | On page 8, it states:
|
---|
168 |
|
---|
169 | Having to manually create the generator closure by moving local-state
|
---|
170 | variables into the generator type is an additional programmer burden (removed
|
---|
171 | by the coroutine in Section 3.2). ...
|
---|
172 |
|
---|
173 | also these variables can now be refactored into helper function, where the
|
---|
174 | helper function suspends at arbitrary call depth. So imagine a coroutine helper
|
---|
175 | function that is only called occasionally within the coroutine but it has a
|
---|
176 | large array that is retained across suspends within the helper function. For a
|
---|
177 | generator, this large array has to be declared in the generator type enlarging
|
---|
178 | each generator instance even through the array is only used occasionally.
|
---|
179 | Whereas, the coroutine only needs the array allocated when needed. Now a
|
---|
180 | coroutine has a stack which occupies storage, but the maximum stack size only
|
---|
181 | needs to be the call chain allocating the most storage, whereas the generator
|
---|
182 | has a maximum size of all variable that could be created.
|
---|
183 |
|
---|
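
In code, the pattern looks roughly like the following sketch (names and sizes
are illustrative, and the helper is written as a nested function, as in the
logger example later in this response):

coroutine Cor {};
void main( Cor & c ) {
    void helper( Cor & c ) {
        int big[100000];      // large array lives on the coroutine's own stack
        // ... fill big ...
        suspend;              // suspend at nested call depth; big survives until resumed
        // ... use big after resumption ...
    }
    for ( i; 10 ) {
        suspend;              // common path: the large array is never allocated
        if ( i % 5 == 0 ) helper( c );   // occasional path: array exists only during helper
    }
}

With a generator, big would have to be a field of the generator type, so every
instance pays for it all the time; with the coroutine, the storage exists only
while helper is active on the coroutine's stack.

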
> p17 if the multiple-monitor entry procedure really is novel, write a paper
> about that, and only about that.
> We do not believe this is a practical suggestion.
* I'm honestly not trying to be snide here: I'm not an expert on monitor or
concurrent implementations. Brinch Hansen's original monitors were single
acquire; this draft does not cite any other previous work that I could
see. I'm not suggesting that the brief mention of this mechanism
necessarily be removed from this paper, but if this is novel (and a clear
advance over a classical OO monitor a-la Java which only acquires the
distinguished receiver) then that would be worth another paper in itself.

First, to explain multiple-monitor entry in Cforall as a separate paper would
require significant background on Cforall concurrency, which means repeating
large sections of this paper. Second, it feels like a paper just on
multiple-monitor entry would be a little thin, even if the capability is novel
to the best of our knowledge. Third, we feel multiple-monitor entry springs
naturally from the overarching theme in the paper that Cforall is a non-OO
programming language, allowing multiple-mutex receivers.


My typo: the paper's conclusion should come at the end, after the
future work section.

Combined into a Conclusions and Future Work section.



Reviewing: 2

on the Boehm paper and whether code is "all sequential to the compiler": I
now understand the authors' position better and suspect we are in violent
agreement, except for whether it's appropriate to use the rather breezy
phrase "all sequential to the compiler". It would be straightforward to
clarify that code not using the atomics features is optimized *as if* it
were sequential, i.e. on the assumption of a lack of data races.

Fixed, "as inline and library code is compiled as sequential without any
explicit concurrent directive."


on the distinction between "mutual exclusion" and "synchronization": the
added citation does help, in that it makes a coherent case for the
definition the authors prefer. However, the text could usefully clarify
that this is a matter of definition not of fact, given especially that in
my assessment the authors' preferred definition is not the most common
one. (Although the mention of Hoare's apparent use of this definition is
one data point, countervailing ones are found in many contemporaneous or
later papers, e.g. Habermann's 1972 "Synchronization of Communicating
Processes" (CACM 15(3)), Reed & Kanodia's 1979 "Synchronization with
eventcounts and sequencers" (CACM 22(2)) and so on.)

Fixed, "We contend these two properties are independent, ...".

With respect to the two papers, Habermann fundamentally agrees with our
definitions, where the term mutual exclusion is the same but Habermann uses the
term "communication" for our synchronization. However, the term "communication"
is rarely used to mean synchronization. The fact that Habermann collectively
calls these two mechanisms synchronization is the confusion. Reed & Kanodia
state:

By mutual exclusion we mean any mechanism that forces the time ordering of
execution of pieces of code, called critical regions, in a system of
concurrent processes to be a total ordering.

But there is no timing order for a critical region (which I assume means
critical section); threads can arrive at any time in any order, where the
mutual exclusion for the critical section ensures one thread executes at a
time. Interestingly, Reed & Kanodia's critical region is Habermann's
communication, not critical section. These papers only buttress our contention
about the confusion of these terms in the literature.


section 2 (an expanded version of what was previously section 5.9) lacks
examples and is generally obscure and allusory ("the most advanced feature"
-- name it! "in triplets" -- there is only one triplet!;

Fixed.

These independent properties can be used to compose different language
features, forming a compositional hierarchy, where the combination of all
three is the most advanced feature, called a thread/task/process. While it is
possible for a language to only provide threads for composing
programs~\cite{Hermes90}, this unnecessarily complicates and makes inefficient
solutions to certain classes of problems.


what are "execution locations"? "initialize" and "de-initialize"

Fixed through an example at the end of the sentence.

\item[\newterm{execution state}:] is the state information needed by a
control-flow feature to initialize and manage both compute data and execution
location(s), and de-initialize. For example, calling a function initializes a
stack frame including contained objects with constructors, manages local data
in blocks and return locations during calls, and de-initializes the frame by
running any object destructors and management operations.


what? "borrowed from the invoker" is a concept in need of explaining or at
least a fully explained example -- in what sense does a plain function
"borrow" its stack frame?

A function has no storage except for static storage; it gets its storage from
the program stack. For a function to run it must borrow storage from somewhere.
When the function returns, the borrowed storage is returned. Do you have a more
appropriate word?


"computation only" as opposed to what?

This is a term in concurrency for operations that compute without blocking,
i.e., the operation starts with everything it needs to compute its result and
runs to completion, blocking only when it is done and returns its result.
Computation-only code occurs in "embarrassingly parallel" problems.


in 2.2, in what way is a "request" fundamental to "synchronization"?

I assume you are referring to the last bullet.

Synchronization must be able to control the service order of requests
including prioritizing selection from different kinds of outstanding requests,
and postponing a request for an unspecified time while continuing to accept
new requests.

Habermann states on page 173:

Looking at deposit we see that sender and receiver should be synchronized with
respect to buffer overflow: deposit of another message must be delayed if
there is no empty message frame, and such delay should be removed when the
first empty frame becomes available. This is programmed as
synchronization(frame) : deposit is preceded by wait(frame), accept is
followed by signal( frame), and the constant C[frame] is set equal to
"bufsize."

Here synchronization is controlling the service order of requests: when the
buffer is full, requests for insert (sender) are postponed until a request to
remove (receiver) occurs. Without the ability to control service order among
requests, the producer/consumer problem with a bounded buffer cannot be
solved. Hence, this capability is fundamental.


and the "implicitly" versus "explicitly" point needs stating as elsewhere,
with a concrete example (e.g. Java built-in mutexes versus
java.util.concurrent).

Fixed.

MES must be available implicitly in language constructs, \eg Java built-in
monitors, as well as explicitly for specialized requirements, \eg
@java.util.concurrent@, because requiring programmers to build MES using
low-level locks often leads to incorrect programs.


section 6: 6.2 omits the most important facts in preference for otherwise
inscrutable detail: "identify the kind of parameter" (first say *that there
are* kinds of parameter, and what "kinds" means!); "mutex parameters are
documentation" is misleading (they are also semantically significant!) and
fails to say *what* they mean; the most important thing is surely that
'mutex' is a language feature for performing lock/unlock operations at
function entry/exit. So say it!

These sections have been rewritten to address the comments.
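
For the record, the intended meaning of a mutex parameter is sketched below
(the monitor name and field are illustrative): the parameter marks which
argument's monitor lock is acquired at function entry and released at function
exit.

monitor M { int cnt; };          // shared data protected by the monitor's lock
void inc( M & mutex m ) {        // mutex parameter: acquire m's lock on entry
    m.cnt += 1;
}                                // lock released on return
int read( M & m ) {              // no mutex qualifier: no implicit locking
    return m.cnt;
}

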
The meanings of examples f3 and f4 remain unclear.

Rewrote the paragraph.


Meanwhile in 6.3, "urgent" is not introduced (we are supposed to infer its
meaning from Figure 12,

Defined Hoare's urgent list at the start of the paragraph.


but that Figure is incomprehensible to me), and we are told of "external
scheduling"'s long history in Ada but not clearly what it actually means;

We do not know how to address your comment because the description is clear to
us and other non-reviewers who have read the paper. I had forgotten an
important citation to prior work on this topic, which is now referenced:

Buhr Peter A., Fortier Michel, Coffin Michael H.. Monitor
Classification. ACM Computing Surveys. 1995; 27(1):63-107.

This citation is a 45-page paper expanding on the topic of internal scheduling.
I also added a citation to Hoare's monitor paper where signal_block is defined:

When a process signals a condition on which another process is waiting, the
signalling process must wait until the resumed process permits it to
proceed.

External scheduling from Ada was subsequently added to uC++ and Cforall.
Furthermore, Figure 20 shows a direct comparison of CFA procedure-call
external-scheduling with Go channel external-scheduling. So external scheduling
exists in languages beyond Ada.


6.4's description of "waitfor" tells us it is different from an if-else
chain but tries to use two *different* inputs to tell us that the behavior
is different; tell us an instance where *the same* values of C1 and C2 give
different behavior (I even wrote out a truth table and still don't see the
semantic difference)

Again, it is unclear what the problem is. For the if-statement, if C1 is true,
it only waits for a call to mem1, even if C2 is true and there is an
outstanding call to mem2. For the waitfor, if both C1 and C2 are true and there
is a call to mem2 but not mem1, it accepts the call to mem2, which cannot
happen with the if-statement. So for all true when-clauses, any outstanding
call is immediately accepted. If there are no outstanding calls, the waitfor
blocks until the next call to any of the true when-clauses occurs. I added the
following sentence to further clarify.

Hence, the @waitfor@ has parallel semantics, accepting any true @when@ clause.

Note, the same parallel semantics exists for the Go select statement with
respect to waiting for a set of channels to receive data. While Go select does
not have a when clause, it would be trivial to add it, making the select more
expressive.
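
To spell the difference out in code, here is a schematic fragment (the monitor
and member names mirror the C1/C2/mem1/mem2 discussion above; the guarded
waitfor syntax follows the paper's extended waitfor):

monitor M {};
void mem1( M & mutex m ) {}
void mem2( M & mutex m ) {}

void choose( M & mutex m, int C1, int C2 ) {
    // if-statement: once C1 is true, only a call to mem1 is accepted,
    // even if a call to mem2 is already outstanding
    if ( C1 ) waitfor( mem1 : m );
    else if ( C2 ) waitfor( mem2 : m );

    // guarded waitfor: all true when-clauses are armed in parallel, so an
    // outstanding call to either mem1 or mem2 is accepted immediately
    when ( C1 ) waitfor( mem1 : m );
    or when ( C2 ) waitfor( mem2 : m );
}

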
The authors frequently use bracketed phrases, and sometimes slashes "/", in
ways that are confusing and/or detrimental to readability. Page 13 line
2's "forward (backward)" is one particularly egregious example. In general
I would recommend the authors try to limit their use of parentheses and
slashes as a means of forcing a clearer wording to emerge.

Many of the slashes and parentheticals have been removed. Some are retained to
express joined concepts: call/return, suspend/resume, resume/resume, I/O.


Also, the use of "eg." is often cursory and does not explain the examples
given, which are frequently a one- or two-word phrase of unclear referent.

A few of these are fixed.


Considering the revision more broadly, none of the more extensive or
creative rewrites I suggested in my previous review have been attempted,
nor any equivalent efforts to improve its readability.

If you reread the previous response, we addressed all of your suggestions except

An expositional idea occurs: start the paper with a strawman
naive/limited realisation of coroutines -- say, Simon Tatham's popular
"Coroutines in C" web page -- and identify point by point what the
limitations are and how C\/ overcomes them. Currently the presentation
is often flat (lacking motivating contrasts) and backwards (stating
solutions before problems). The foregoing approach might fix both of
these.

We prefer the current structure of our paper and believe the paper does
explain basic coding limitations and how they are overcome in using
high-level control-flow mechanisms.

and we have addressed readability issues in this version.


The hoisting of the former section 5.9 is a good idea, but the newly added
material accompanying it (around Table 1) suffers fresh deficiencies in
clarity. Overall the paper is longer than before, even though (as my
previous review stated), I believe a shorter paper is required in order to
serve the likely purpose of publication. (Indeed, the authors' letter
implies that a key goal of publication is to build community and gain
external users.)

This comment is the referee's opinion, with which we do not agree. Our
previous 35-page SP&E paper on Cforall:

Moss Aaron, Schluntz Robert, Buhr Peter A.. Cforall: Adding Modern
Programming Language Features to C. Softw. Pract. Exper. 2018;
48(12):2111-2146.

has a similar structure and style to this paper, and it received an award from
John Wiley & Sons:

Software: Practice & Experience for articles published between January 2017
and December 2018, "most downloads in the 12 months following online
publication showing the work generated immediate impact and visibility,
contributing significantly to advancement in the field".

So we have demonstrated an ability to build community and gain external users.


Given this trajectory, I no longer see a path to an acceptable revision of
the present submission. Instead I suggest the authors consider splitting
the paper in two: one half about coroutines and stack management, the other
about mutexes, monitors and the runtime. (A briefer presentation of the
runtime may be helpful in the first paper also, and a brief recap of the
generator and coroutine support is obviously needed in the second too.)

Again we disagree with the referee's suggestion to vastly restructure the
paper. What advantage is there in presenting exactly the same material across
two papers, which will likely end up longer than a single paper? The current
paper has a clear theme that fundamental execution properties generate a set of
basic language mechanisms, and we then proceed to show how these mechanisms can
be designed into the programming language Cforall.


I do not buy the authors' defense of the limited practical experience or
"non-micro" benchmarking presented. Yes, gaining external users is hard and
I am sympathetic on that point. But building something at least *somewhat*
substantial with your own system should be within reach, and without it the
"practice and experience" aspects of the work have not been explored.
Clearly C\/ is the product of a lot of work over an extended period, so it
is a surprise that no such experience is readily available for inclusion.

Understood. There are no agreed-upon concurrency benchmarks, which is why
micro-benchmarks are often used. Currently, the entire Cforall runtime is
written in Cforall (10,000+ lines of code (LOC)). This runtime is designed to
be thread safe, automatically detects the use of concurrency features at link
time, and bootstraps into a threaded runtime almost immediately at program
startup so threads can be declared as global variables and may run to
completion before the program main starts. The concurrent core of the runtime
is 3,500 LOC and bootstraps from low-level atomic primitives into Cforall locks
and high-level features. In other words, the concurrent core uses itself as
quickly as possible to start using high-level concurrency. There are 12,000+
LOC in the Cforall test suite used to verify language features, which are run
nightly. Of these, there are 2000+ LOC running standard concurrent tests, such
as aggressively testing each language feature, and classical examples such as
bounded buffer, dating service, matrix summation, quicksort, binary insertion
sort, etc. More experience will be available soon, based on ongoing work in
the "future work" section. Specifically, non-blocking I/O is working with the
new Linux io_uring interface and a new high-performance ready-queue is under
construction to take into account this change. With non-blocking I/O, it will
be possible to write applications like high-performance web servers, as is now
done in Rust and Go. Also completed are Java-style executors for work-based
concurrent programming and futures. Under construction is a high-performance
actor system.


It does not seem right to state that a stack is essential to Von Neumann
architectures -- since the earliest Von Neumann machines (and indeed early
Fortran) did not use one.

Reference Manual Fortran II for the IBM 704 Data Processing System, 1958 IBM, page 2
https://archive.computerhistory.org/resources/text/Fortran/102653989.05.01.acc.pdf

Since a subprogram may call for other subprograms to any desired depth, a
particular CALL statement may be defined by a pyramid of multi-level
subprograms.

I think we may be differing on the meaning of stack. You may be imagining a
modern stack that grows and shrinks dynamically. Whereas early Fortran
preallocated a stack frame for each function, like Python allocates a frame for
a generator. Within each preallocated Fortran function is a frame for local
variables and a pointer to store the return location for a call. The Fortran
call/return mechanism uses these frames to build a traditional call stack
linked by the return pointer. The only restriction is that a function stack
frame can only be used once, implying no direct or indirect recursion. Hence,
without a stack mechanism, there can be no call/return to "any desired depth",
where the maximum desired depth is limited by the number of functions. So
call/return requires some form of a stack; virtually all programming languages
have call/return, past and present, and these languages run on Von Neumann
machines that do not distinguish between program and memory space, have mutable
state, and the concept of a pointer to data or code.


To elaborate on something another reviewer commented on: it is a surprise
to find a "Future work" section *after* the "Conclusion" section. A
"Conclusions and future work" section often works well.

Done.



Reviewing: 3

but it remains really difficult to have a good sense of which idea I should
use and when. This applies in different ways to different features from the
language:

* coroutines/generators/threads: here there is some discussion, but it can
be improved.
* internal/external scheduling: I didn't find any direct comparison between
these features, except by way of example.

See changes below.


I would have preferred something more like a table or a few paragraphs
highlighting the key reasons one would pick one construct or the other.

Section 2.1 is changed to address this suggestion.


The discussion of clusters and pre-emption in particular feels quite rushed.

We believe a brief introduction to the Cforall runtime structure is important
because clustering within a user-level versus distributed system is unusual.
Furthermore, the explanation of preemption is important because several new
languages, like Go and Rust tokio, are not preemptive. Rust threads are
preemptive only because Rust uses kernel threads, which UNIX preempts.


* Recommend to shorten the comparison on coroutine/generator/threads in
Section 2 to a paragraph with a few examples, or possibly a table
explaining the trade-offs between the constructs

Not done, see below.


* Recommend to clarify the relationship between internal/external
scheduling -- is one more general but more error-prone or low-level?

Done, see below.


There is obviously a lot of overlap between these features, and in
particular between coroutines and generators. As noted in the previous
review, many languages have chosen to offer *only* generators, and to
build coroutines by stacks of generators invoking one another.

Coroutines built from stacks of generators have problems, such as no symmetric
control-flow. Furthermore, stacks of generators have a problem with the
following programming pattern. logger is a library function called from a
function or a coroutine, where the doit for the coroutine suspends. With stacks
of generators, there has to be a function and generator version of logger to
support these two scenarios. If logger is a library function, it may be
impossible to create the generator logger because the logger function is
opaque.

#include <fstream.hfa>
#include <coroutine.hfa>

forall( otype T | { void doit( T ); } )
void logger( T & t ) {              // library function, opaque to its callers
    doit( t );
}

coroutine C {};
void main( C & c ) with( c ) {
    void doit( C & c ) { suspend; } // coroutine version of doit suspends
    logger( c );
}
void mem( C & c ) {
    resume( c );
}

int main() {
    C c;
    mem( c );
    mem( c );

    struct S {};
    S s;
    void doit( S & s ) {}           // ordinary (non-suspending) version of doit
    logger( s );
}

Additional text has been added to the start of Section 3.2 addressing this
comment.


In fact, the end of Section 2.1 (on page 5) contains a particular paragraph
that embodies this "top down" approach. It starts, "programmers can now
answer three basic questions", and thus gives some practical advice for
which construct you should use and when. I think giving some examples of
specific applications in this paragraph, combined with some examples of
cases where each construct was needed, would be a better approach.

I don't think this comparison needs to be very long. It seems clear enough
that one would

* prefer generators for simple computations that yield up many values,
* prefer coroutines for more complex processes that have significant
internal structure,

We do not believe the number of values yielded is an important factor in
choosing between a generator or coroutine; either form can receive (input) or
return (output) millions of values. As well, simple versus complex computation
is also not an absolute criterion for selection, as both forms can be simple or
complex.

As stated in the paper, a coroutine brings generality to the implementation
because of the additional stack, whereas generators have restrictions on standard
software-engineering practices: variable placement, no helper functions without
creating an explicit generator stack, and no symmetric control-flow. It is
true that simple generators probably suffer less from these restrictions. Also,
simplicity means the amount of work done is small so the cost of the
suspend/resume becomes a factor, where experiments show suspend/resume for a
generator is 10 times faster than for a coroutine.


* prefer threads for cases where parallel execution is desired or needed.

Agreed, but this description does not mention mutual exclusion and
synchronization, which are essential in any meaningful concurrent program.

Our point here is to illustrate that a casual "top down" explanation is
insufficient to explain the complexity of the underlying execution properties.
We presented some rules of thumb at the end of Section 2, but programmers must
understand all the underlying mechanisms and their interactions to exploit the
execution properties to their fullest, and to understand when a programming
language does or does not provide a desired mechanism.


I did appreciate the comparison in Section 2.3 between async-await in
JS/Java and generators/coroutines. I agree with its premise that those
mechanisms are a poor replacement for generators (and, indeed, JS has a
distinct generator mechanism, for example, in part for this reason). I
believe I may have asked for this in a previous review, but having read it,
I wonder if it is really necessary, since those mechanisms are so different
in purpose.

Given that you asked about this before, I believe other readers might also ask
the same question because async-await is very popular. So I think this section
does help to position the work in the paper among other work, and hence, it is
appropriate to keep it in the paper.


I find the motivation for supporting both internal and external scheduling
to be fairly implicit. After several reads through the section, I came to
the conclusion that internal scheduling is more expressive than external
scheduling, but sometimes less convenient or clear. Is this correct? If
not, it'd be useful to clarify where external scheduling is more
expressive.

I would find it very interesting to try and capture some of the properties
that make internal vs external scheduling the better choice.

For example, it seems to me that external scheduling works well if there
are only a few "key" operations, but that internal scheduling might be
better otherwise, simply because it would be useful to have the ability to
name a signal that can be referenced by many methods.

To address this point, the last paragraph on page 22 (now page 23) has been
augmented to the following:

Given external and internal scheduling, what guidelines can a programmer use
to select between them? In general, external scheduling is easier to
understand and code because only the next logical action (mutex function(s))
is stated, and the monitor implicitly handles all the details. Therefore,
there are no condition variables, and hence, no wait and signal, which reduces
coding complexity and synchronization errors. If external scheduling is
simpler than internal, why not use it all the time? Unfortunately, external
scheduling cannot be used if: scheduling depends on parameter value(s) or
scheduling must block across an unknown series of calls on a condition
variable (\ie internal scheduling). For example, the dating service cannot be
written using external scheduling. First, scheduling requires knowledge of
calling parameters to make matching decisions and parameters of calling
threads are unavailable within the monitor. Specifically, a thread within the
monitor cannot examine the @ccode@ of threads waiting on the calling queue to
determine if there is a matching partner. (Similarly, if the bounded buffer
or readers/writer are restructured with a single interface function with a
parameter denoting producer/consumer or reader/writer, they cannot be solved
with external scheduling.) Second, a scheduling decision may be delayed
across an unknown number of calls when there is no immediate match so the
thread in the monitor must block on a condition. Specifically, if a thread
determines there is no opposite calling thread with the same @ccode@, it must
wait an unknown period until a matching thread arrives. For complex
synchronization, both external and internal scheduling can be used to take
advantage of the best properties of each.
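
As a concrete illustration for this response, the bounded-buffer insert can be
written both ways (a rough sketch; the buffer fields and size are illustrative,
and the condition/wait/signal names follow the paper's internal-scheduling
discussion):

monitor buffer { int elems[10]; int cnt; condition space; };

int remove( buffer & mutex buf ) with( buf ) {
    cnt -= 1; int elem = elems[cnt];
    signal( space );              // only needed for the internal-scheduling variant
    return elem;
}
// external scheduling: block by accepting the next call to remove
void insert( buffer & mutex buf, int elem ) with( buf ) {
    if ( cnt == 10 ) waitfor( remove : buf );
    elems[cnt] = elem; cnt += 1;
}
// internal scheduling: block on a named condition inside the monitor
void insert2( buffer & mutex buf, int elem ) with( buf ) {
    if ( cnt == 10 ) wait( space );
    elems[cnt] = elem; cnt += 1;
}

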
Consider the bounded buffer from Figure 13: if it had multiple methods for
removing elements, and not just `remove`, then the `waitfor(remove)` call
in `insert` might not be sufficient.

Section 6.4 Extended waitfor shows the waitfor is very powerful and can handle
your request:

waitfor( remove : buffer ); or waitfor( remove2 : buffer );

and its shorthand form (not shown in the paper)

waitfor( remove, remove2 : buffer );

A call to either of these remove functions satisfies the waitfor (exact
selection details are discussed in Section 6.4).


The same is true, I think, of the `signal_block` function, which I
have not encountered before;

In Tony Hoare's seminal paper on Monitors "Monitors: An Operating System
Structuring Concept", it states on page 551:

When a process signals a condition on which another process is waiting, the
signalling process must wait until the resumed process permits it to
proceed. We therefore introduce for each monitor a second semaphore "urgent"
(initialized to 0), on which signalling processes suspend themselves by the
operation P(urgent).

Hence, the original definition of signal is in fact signal_block, i.e., the
signaller blocks. Later monitor implementations switched to a nonblocking
signaller because most signals occur before returns, which allows the
signaller to continue execution, exit the monitor, and run concurrently with
the signalled thread that restarts in the monitor. When the signaller is not
going to exit immediately, signal_block is appropriate.
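
The difference is easiest to see in a small fragment (a sketch only; the
monitor and condition names are illustrative):

monitor M { condition c; };
void waiter( M & mutex m ) with( m ) {
    wait( c );                  // block until signalled, then continue in the monitor
}
void signal_nb( M & mutex m ) with( m ) {
    signal( c );                // nonblocking: code here runs before the waiter restarts
}
void signal_blk( M & mutex m ) with( m ) {
    signal_block( c );          // blocking: the waiter runs in the monitor first,
                                // then the signaller continues here
}

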
it seems like its behavior can be modeled with multiple condition
variables, but that's clearly more complex.

Yes. Buhr, Fortier and Coffin show in Monitor Classification, ACM Computing
Surveys, 27(1):63-107, that all extant monitors with different signalling
semantics can be transformed into each other. However, some transformations are
complex and runtime expensive.


One question I had about `signal_block`: what happens if one signals
but no other thread is waiting? Does it block until some other thread
waits? Or is that user error?

On page 20, it states:

Signalling is unconditional because signalling an empty condition queue does
nothing.

To the best of our knowledge, all monitors have the same semantics for
signalling an empty condition queue, regardless of the kind of signal, i.e.,
signal or signal_block.


I believe that one difference between the Go program and the Cforall
equivalent is that the Goroutine has an associated queue, so that
multiple messages could be enqueued, whereas the Cforall equivalent is
effectively a "bounded buffer" of length 1. Is that correct?

Actually, the buffer length is 0 for the Cforall call and the Go unbuffered
send, so both are synchronous communication.


I think this should be stated explicitly. (Presumably, one could modify the
Cforall program to include an explicit vector of queued messages if
desired, but you would also be reimplementing the channel abstraction.)

Fixed, by adding the following sentences:

The difference between a call and a channel send occurs for buffered channels,
making the send asynchronous. In \CFA, asynchronous call and multiple
buffers are provided using administrator and worker
threads~\cite{Gentleman81} and/or futures (not discussed).


Also, in Figure 20, I believe that there is a missing `mutex` keyword.

Fixed.


I was glad to see that the paper acknowledged that Cforall still had
low-level atomic operations, even if their use is discouraged in favor of
higher-level alternatives.

There was never an attempt to not acknowledge that Cforall has low-level
atomic operations. The original version of the paper stated:

6.6 Low-level Locks
For completeness and efficiency, Cforall provides a standard set of low-level
locks: recursive mutex, condition, semaphore, barrier, etc., and atomic
instructions: fetchAssign, fetchAdd, testSet, compareSet, etc.

and that section is still in the paper. In fact, we use these low-level
mechanisms to build all of the high-level concurrency constructs in Cforall.


However, I still feel that the conclusion overstates the value of the
contribution here when it says that "Cforall high-level race-free monitors
and threads provide the core mechanisms for mutual exclusion and
synchronization, without the need for volatile and atomics". I feel
confident that Java programmers, for example, would be advised to stick
with synchronized methods whenever possible, and it seems to me that they
offer similar advantages -- but they sometimes wind up using volatiles for
performance reasons.

I think we are agreeing violently. 99.9% of Java/Cforall/Go/Rust concurrent
programs can achieve very good performance without volatile or atomics because
the runtime system has already used these low-level capabilities to build an
efficient set of high-level concurrency constructs.

0.1% of the time programmers need to build their own locks and synchronization
mechanisms. This need also occurs for storage management. Both of these
mechanisms are allowed in Cforall but are fraught with danger and should be
discouraged. Specifically, it takes a 7th dan Black Belt programmer to understand
fencing for a WSO memory model, such as on the ARM. It doesn't help that the
C++ atomics are baroque and incomprehensible. I'm sure Hans Boehm, Doug Lea,
Dave Dice and I would agree that 99% of hand-crafted locks created by
programmers are broken and/or non-portable.


I was also confused by the term "race-free" in that sentence. In
particular, I don't think that Cforall has any mechanisms for preventing
*data races*, and it clearly doesn't prevent "race conditions" (which would
bar all sorts of useful programs). I suppose that "race free" here might be
referring to the improvements such as removing barging behavior.

We use the term "race free" to mean the same as Boehm/Adve's "data-race
freedom" in

Boehm Hans-J., Adve Sarita V., You Don't Know Jack About Shared Variables or
Memory Models. Communications ACM. 2012; 55(2):48-54.
https://queue.acm.org/detail.cfm?id=2088916

which is cited in the paper. Furthermore, we never said that Cforall has
mechanisms for preventing *all* data races. We said Cforall high-level
race-free monitors and threads[, when used with mutex access functions] (added
to the paper), imply no data races within these constructs, unless a
programmer directly publishes shared state. This approach is exactly what
Boehm/Adve advocate for the vast majority of concurrent programming.


It would perhaps be more interesting to see a comparison built using [tokio] or
[async-std], two of the more prominent user-space threading libraries that
build on Rust's async-await feature (which operates quite differently than
Javascript's async-await, in that it doesn't cause every async function call to
schedule a distinct task).

Done.


Several figures used the `with` keyword. I deduced that `with(foo)` permits
one to write `bar` instead of `foo.bar`. It seems worth introducing.
Apologies if this is stated in the paper, if so I missed it.

Page 6, footnote F states:

The Cforall "with" clause opens an aggregate scope making its fields directly
accessible, like Pascal "with", but using parallel semantics; multiple
aggregates may be opened.
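
For quick reference, the effect is (a tiny illustrative sketch, not from the
paper):

struct S { int i, j; };
void f( S & s, int k ) with( s ) {   // open s: its fields are directly visible
    i = k; j = k + 1;                // same as s.i = k; s.j = k + 1;
}

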

On page 20, section 6.3, "external scheduling and vice versus" should be
"external scheduling and vice versa".

Fixed.


On page 5, section 2.3, the paper states "we content" but it should be "we
contend".

Fixed.


Page 1. I don't believe that it is fair to imply that Scala is "research
vehicle" as it is used by major players, Twitter being the most prominent
example.

Fixed. Removed "research".


Page 15. Must Cforall threads start after construction (e.g. see your
example on page 15, line 21)?

Yes. Our experience in Java, uC++ and Cforall is that 95% of the time
programmers want threads to start immediately. (Most Java programs have no code
between a thread declaration and the call to start the thread.) Therefore,
this semantic should be the default because (see page 13):

Alternatives, such as explicitly starting threads as in Java, are repetitive
and forgetting to call start is a common source of errors.

To handle the other 5% of the cases, there is a trivial Cforall pattern
providing Java-style start/join. The additional cost for this pattern is small
in comparison to the work performed between the start and join.

thread T {};
void start( T & mutex ) {}      // any function name
void join( T & mutex ) {}       // any function name
void main( T & t ) {
    sout | "start";
    waitfor( start : t );       // wait to be started
    sout | "restart";           // perform work
    waitfor( join : t );        // wait to be joined
    sout | "join";
}

int main() {
    T t[3];                     // threads start and delay
    sout | "need to start";
    for ( i; 3 ) start( t[i] );
    sout | "need to join";
    for ( i; 3 ) join( t[i] );
    sout | "threads stopped";
}                               // threads deleted from stack

$ a.out
need to start
start
start
start
need to join
restart
restart
restart
threads stopped
join
join
join


Page 18, line 17: is using

Fixed.