Reviewing: 1

I still have a couple of issues --- perhaps the largest is that it's
still not clear at this point in the paper what some of these options
are, or crucially how they would be used. I don't know if it's
possible to give high-level examples or use cases to be clear about
these up front - or if that would duplicate too much information from
later in the paper - either way expanding out the discussion - even if
just a couple of sentences for each row - would help me more.

Section 2.1 is changed to address this suggestion.


* 1st para section 2 begs the question: why not support each
dimension independently, and let the programmer or library designer
combine features?

As seen in Table 1, not all of the combinations work, and having programmers
directly use these low-level mechanisms is error prone. Accessing these
fundamental mechanisms through higher-level constructs has always been the
purpose of a programming language.


* Why must there "be language mechanisms to create, block/unblock, and join
with a thread"? There aren't in Smalltalk (although there are in the
runtime). Especially given in Cforall those mechanisms are *implicit* on
thread creation and destruction?

The best description of Smalltalk concurrency I can find is in J. Hunt,
Smalltalk and Object Orientation, Springer-Verlag London Limited, 1997, Chapter
31 Concurrency in Smalltalk. It states on page 332:

For a process to be spawned from the current process there must be some way
of creating a new process. This is done using one of four messages to a
block. These messages are:

aBlock fork: This creates and schedules a process which will execute the
block. The priority of this process is inherited from the parent process.
...

The Semaphore class provides facilities for achieving simple synchronization,
it is simple because it only allows for two forms of communication signal and
wait.

Hence, "aBlock fork" creates, "Semaphore" blocks/unblocks (as does message send
to an aBlock object), and garbage collection of an aBlock joins with its
thread. The fact that a programmer *implicitly* does "fork", "block"/"unblock",
and "join" does not change their fundamental requirement.
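
For comparison, a minimal Cforall sketch of the corresponding implicit
semantics (the thread type T and its body are illustrative only):

thread T {};
void main( T & t ) {            // thread body, started implicitly
    // ... concurrent work ...
}
int main() {
    T t;                        // implicit "fork": t starts running at its declaration
    // ... main continues concurrently with t ...
}                               // implicit "join": t's destructor waits for t to finish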


* "Case 1 is a function that borrows storage for its state (stack
frame/activation) and a thread from its invoker"

this much makes perfect sense to me, but I don't understand how a
non-stateful, non-threaded function can then retain

"this state across callees, ie, function local-variables are
retained on the stack across calls."

how can it retain function-local values *across calls* when it
doesn't have any functional-local state?

In the following example:

void foo() {
    // local variables and code
}
void bar() {
    // local variables
    foo();
}

bar is the caller and foo is the callee. bar borrows the program stack and
thread to make the call to foo. When foo, the callee, is executing, bar's local
variables (state) are retained on the *borrowed* stack across the call. (Note,
I added *borrowed* to that sentence in the paper to help clarify.) Furthermore,
foo's local variables are also retained on the borrowed stack. When foo and bar
return, all of their local state is gone (not retained). This behaviour is
standard call/return semantics in an imperative language.


I'm not sure if I see two separate cases here - roughly equivalent
to C functions without static storage, and then C functions *with*
static storage.

Yes, but there is only one instance of the static storage across all
activations of the C function. For generators and coroutines, each instance has
its own state, like an object in an OO language.


I assumed that was the distinction between cases 1 & 3; but perhaps the
actual distinction is that 3 has a suspend/resume point, and so the
"state" in figure 1 is this component of execution state (viz figs 1 & 2),
not the state representing the cross-call variables?

So case 3 is like an object with the added ability to retain where it was last
executing. When a generator is resumed, the generator object (structure
instance) is passed as an explicit reference, and within this object is the
restart location in the generator's "main". When the generator main executes,
it uses the borrowed stack for its local variables and any functions it calls,
just like an object member-function borrows the stack for its local variables
but also has an implicit receiver to the object state. A generator can have
static storage, too, which is a single instance across all generator instances
of that type, as for static storage in an object type. All the kinds of storage
are in play, with semantics virtually the same as in other languages.
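
For concreteness, a minimal sketch of this structure (the generator type Gen
and its counter i are hypothetical names):

generator Gen {
    int i;                      // closure state: one copy per generator instance
};
void main( Gen & g ) with( g ) {
    for ( i = 0;; i += 1 ) {
        suspend;                // restart location is saved in the object
    }
}
int main() {
    Gen g;                      // object holds both the data and the restart location
    resume( g );                // generator main runs on the caller's (borrowed) stack
    resume( g );                // continues after the suspend; i is retained in g
}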


> but such evaluation isn't appropriate for garbage-collected or JITTed
> languages like Java or Go.
For JITTed languages in particular, reporting peak performance needs to
"warm up" the JIT with a number of iterations before beginning
measurement. Actually for JITs it's even worse: see Edd Barrett et al
OOPSLA 2017.

Of our testing languages, only Java is JITted. To ensure the Java test-programs
correctly measured the specific feature, we consulted with Dave Dice at Oracle,
who works directly on the development of the Oracle JVM Just-in-Time
Compiler. We modified our test programs based on his advice, and he validated
our programs as correctly measuring the specified language feature. Hence, we
have taken into account all issues related to performing benchmarks in JITted
languages. Dave's help is recognized in the Acknowledgment section. Also, all
the benchmark programs are publicly available for independent verification.


* footnote A - I've looked at various other papers & the website to try to
understand how "object-oriented" Cforall is - I'm still not sure. This
footnote says Cforall has "virtuals" - presumably virtual functions,
i.e. dynamic dispatch - and inheritance: that really is OO as far as I
(and most OO people) are concerned. For example Haskell doesn't have
inheritance, so it's not OO; while CLOS (the Common Lisp *Object* System)
or things like Cecil and Dylan are considered OO even though they have
"multiple function parameters as receivers", lack "lexical binding between
a structure and set of functions", and don't have explicit receiver
invocation syntax. Python has receiver syntax, but unlike Java or
Smalltalk or C++, method declarations still need to have an explicit
"self" receiver parameter. Seems to me that Go, for example, is
more-or-less OO with interfaces, methods, and dynamic dispatch (yes, also
an explicit receiver syntax, but that's not determinative); while Rust
lacks dynamic dispatch built-in. C is not OO as a language, but as you
say given it supports function pointers with structures, it does support
an OO programming style.

This is why I again recommend just not buying into this fight: not making
any claims about whether Cforall is OO or is not - because as I see it,
the rest of the paper doesn't depend on whether Cforall is OO or not.
That said: this is just a recommendation, and I won't quibble over this
any further.

We believe it is important to identify Cforall as a non-OO language because it
heavily influences the syntax and semantics used to build its concurrency.
Since many aspects of Cforall are not OO, the rest of the paper *does* depend
on Cforall being identified as non-OO, otherwise readers would have
significantly different expectations for the design. We believe your definition
of OO is too broad, such as including C. Just because a programming language
can support aspects of the OO programming style does not make it OO. (Just
because a duck can swim does not make it a fish.)

Our definition of non-OO follows directly from the Wikipedia entry:

Object-oriented programming (OOP) is a programming paradigm based on the
concept of "objects", which can contain data, in the form of fields (often
known as attributes or properties), and code, in the form of procedures
(often known as methods). A feature of objects is an object's procedures that
can access and often modify the data fields of the object with which they are
associated (objects have a notion of "this" or "self").
https://en.wikipedia.org/wiki/Object-oriented_programming

Cforall fails this definition as code cannot appear in an "object" and there is
no implicit receiver. As well, Cforall, Go, and Rust do not have nominal
inheritance and they are not considered OO languages, e.g.:

"**Is Go an object-oriented language?** Yes and no. Although Go has types and
methods and allows an object-oriented style of programming, there is no type
hierarchy. The concept of "interface" in Go provides a different approach
that we believe is easy to use and in some ways more general. There are also
ways to embed types in other types to provide something analogous-but not
identical-to subclassing. Moreover, methods in Go are more general than in
C++ or Java: they can be defined for any sort of data, even built-in types
such as plain, "unboxed" integers. They are not restricted to structs (classes)."
https://golang.org/doc/faq#Is_Go_an_object-oriented_language


* is a "monitor function" the same as a "mutex function"?
if so the paper should pick one term; if not, make the distinction clear.

Fixed. Picked "mutex". Changed the language and all places in the paper.


* "As stated on line 1 because state declarations from the generator
type can be moved out of the coroutine type into the coroutine main"

OK sure, but again: *why* would a programmer want to do that?
(Other than, I guess, to show the difference between coroutines &
generators?) Perhaps another way to put this is that the first
para of 3.2 gives the disadvantages of coroutines vis-a-vis
generators, briefly describes the extended semantics, but never
actually says why a programmer may want those extended semantics,
or how they would benefit. I don't mean to belabour the point,
but (generalist?) readers like me would generally benefit from
those kinds of discussions about each feature throughout the
paper: why might a programmer want to use them?

On page 8, it states:

Having to manually create the generator closure by moving local-state
variables into the generator type is an additional programmer burden (removed
by the coroutine in Section 3.2). ...

Also, these variables can now be refactored into a helper function, where the
helper function suspends at arbitrary call depth. So imagine a coroutine helper
function that is only called occasionally within the coroutine but has a
large array that is retained across suspends within the helper function. For a
generator, this large array has to be declared in the generator type, enlarging
each generator instance even though the array is only used occasionally. In
contrast, the coroutine only needs the array allocated when needed. Now a
coroutine has a stack, which occupies storage, but the maximum stack size only
needs to be that of the call chain allocating the most storage, whereas the
generator has a maximum size of all variables that could be created.
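
A sketch of the pattern being described (the names helper and data are
illustrative):

coroutine C {};
void helper( C & c ) {
    int data[10000];            // large array lives on the coroutine's stack
    // ... fill data ...
    suspend;                    // suspend at arbitrary call depth; data is retained
    // ... use data after resumption ...
}
void main( C & c ) {
    for ( ;; ) {
        // ... frequent work needing no array ...
        suspend;
        helper( c );            // array storage exists only while helper is active
    }
}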


> p17 if the multiple-monitor entry procedure really is novel, write a paper
> about that, and only about that.
> We do not believe this is a practical suggestion.
* I'm honestly not trying to be snide here: I'm not an expert on monitor or
concurrent implementations. Brinch Hansen's original monitors were single
acquire; this draft does not cite any other previous work that I could
see. I'm not suggesting that the brief mention of this mechanism
necessarily be removed from this paper, but if this is novel (and a clear
advance over a classical OO monitor a-la Java which only acquires the
distinguished receiver) then that would be worth another paper in itself.

First, to explain multiple-monitor entry in Cforall as a separate paper would
require significant background on Cforall concurrency, which means repeating
large sections of this paper. Second, it feels like a paper just on
multiple-monitor entry would be a little thin, even if the capability is novel
to the best of our knowledge. Third, we feel multiple-monitor entry springs
naturally from the overarching tone in the paper that Cforall is a non-OO
programming language, allowing multiple-mutex receivers.


My typo: the paper's conclusion should come at the end, after the
future work section.

Combined into a Conclusions and Future Work section.



Reviewing: 2

on the Boehm paper and whether code is "all sequential to the compiler": I
now understand the authors' position better and suspect we are in violent
agreement, except for whether it's appropriate to use the rather breezy
phrase "all sequential to the compiler". It would be straightforward to
clarify that code not using the atomics features is optimized *as if* it
were sequential, i.e. on the assumption of a lack of data races.

Fixed, "as inline and library code is compiled as sequential without any
explicit concurrent directive."


on the distinction between "mutual exclusion" and "synchronization": the
added citation does help, in that it makes a coherent case for the
definition the authors prefer. However, the text could usefully clarify
that this is a matter of definition not of fact, given especially that in
my assessment the authors' preferred definition is not the most common
one. (Although the mention of Hoare's apparent use of this definition is
one data point, countervailing ones are found in many contemporaneous or
later papers, e.g. Habermann's 1972 "Synchronization of Communicating
Processes" (CACM 15(3)), Reed & Kanodia's 1979 "Synchronization with
eventcounts and sequencers" (CACM 22(2)) and so on.)

Fixed, "We contend these two properties are independent, ...".

With respect to the two papers, Habermann fundamentally agrees with our
definitions, where the term mutual exclusion is the same but Habermann uses the
term "communication" for our synchronization. However, the term "communication"
is rarely used to mean synchronization. The fact that Habermann collectively
calls these two mechanisms synchronization is the confusion. Reed & Kanodia
state:

By mutual exclusion we mean any mechanism that forces the time ordering of
execution of pieces of code, called critical regions, in a system of
concurrent processes to be a total ordering.

But there is no timing order for a critical region (which I assume means
critical section); threads can arrive at any time in any order, where the
mutual exclusion for the critical section ensures one thread executes at a
time. Interestingly, Reed & Kanodia's mutual exclusion is Habermann's
communication, not mutual exclusion. These papers only buttress our contention
about the confusion of these terms in the literature.


section 2 (an expanded version of what was previously section 5.9) lacks
examples and is generally obscure and allusory ("the most advanced feature"
-- name it! "in triplets" -- there is only one triplet!;

Fixed.

These independent properties can be used to compose different language
features, forming a compositional hierarchy, where the combination of all
three is the most advanced feature, called a thread/task/process. While it is
possible for a language to only provide threads for composing
programs~\cite{Hermes90}, this unnecessarily complicates and makes inefficient
solutions to certain classes of problems.


what are "execution locations"? "initialize" and "de-initialize"

Fixed through an example at the end of the sentence.

\item[\newterm{execution state}:] is the state information needed by a
control-flow feature to initialize, manage compute data and execution
location(s), and de-initialize, \eg calling a function initializes a stack
frame including contained objects with constructors, manages local data in
blocks and return locations during calls, and de-initializes the frame by
running any object destructors and management operations.


what? "borrowed from the invoker" is a concept in need of explaining or at
least a fully explained example -- in what sense does a plain function
"borrow" its stack frame?

A function has no storage except for static storage; it gets its storage from
the program stack. For a function to run it must borrow storage from somewhere.
When the function returns, the borrowed storage is returned. Do you have a more
appropriate word?

"computation only" as opposed to what?

This is a term in concurrency for operations that compute without blocking,
i.e., the operation starts with everything it needs to compute its result and
runs to completion, blocking only when it is done and returns its result.


in 2.2, in what way is a "request" fundamental to "synchronization"?

I assume you are referring to the last bullet.

Synchronization must be able to control the service order of requests
including prioritizing selection from different kinds of outstanding requests,
and postponing a request for an unspecified time while continuing to accept
new requests.

Habermann states on page 173:

Looking at deposit we see that sender and receiver should be synchronized with
respect to buffer overflow: deposit of another message must be delayed if
there is no empty message frame, and such delay should be removed when the
first empty frame becomes available. This is programmed as
synchronization(frame): deposit is preceded by wait(frame), accept is
followed by signal(frame), and the constant C[frame] is set equal to
"bufsize."

Here synchronization is controlling the service order of requests: when the
buffer is full, requests for insert (sender) are postponed until a request to
remove (receiver) occurs. Without the ability to control service order among
requests, the producer/consumer problem with a bounded buffer cannot be
solved. Hence, this capability is fundamental.


and the "implicitly" versus "explicitly" point needs stating as elsewhere,
with a concrete example (e.g. Java built-in mutexes versus
java.util.concurrent).

Fixed.

MES must be available implicitly in language constructs, \eg Java built-in
monitors, as well as explicitly for specialized requirements, \eg
@java.util.concurrent@, because requiring programmers to build MES using
low-level locks often leads to incorrect programs.


section 6: 6.2 omits the most important facts in preference for otherwise
inscrutable detail: "identify the kind of parameter" (first say *that there
are* kinds of parameter, and what "kinds" means!); "mutex parameters are
documentation" is misleading (they are also semantically significant!) and
fails to say *what* they mean; the most important thing is surely that
'mutex' is a language feature for performing lock/unlock operations at
function entry/exit. So say it!

These sections have been rewritten to address the comments.
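
For illustration, a minimal sketch of a mutex function in Cforall (the monitor
M and its counter are hypothetical):

monitor M { int cnt; };
void inc( M & mutex m ) {       // mutex parameter: the monitor lock is acquired at entry
    m.cnt += 1;                 // body executes with mutual exclusion on m
}                               // lock is released at exit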


The meanings of examples f3 and f4 remain unclear.

Rewrote the paragraph.


Meanwhile in 6.3, "urgent" is not introduced (we are supposed to infer its
meaning from Figure 12,

Defined Hoare's urgent list at the start of the paragraph.


but that Figure is incomprehensible to me), and we are told of "external
scheduling"'s long history in Ada but not clearly what it actually means;

We do not know how to address your comment because the description is clear to
us and other non-reviewers who have read the paper. I had forgotten an
important citation to prior work on this topic, which is now referenced:

Buhr Peter A., Fortier Michel, Coffin Michael H. Monitor
Classification. ACM Computing Surveys. 1995; 27(1):63-107.

This citation is a 45-page paper expanding on the topic of internal scheduling.
I also added a citation to Hoare's monitor paper where signal_block is defined:

When a process signals a condition on which another process is waiting, the
signalling process must wait until the resumed process permits it to
proceed.

External scheduling from Ada was subsequently added to uC++ and Cforall.
Furthermore, Figure 20 shows a direct comparison of CFA procedure-call
external-scheduling with Go channel external-scheduling. So external scheduling
exists in languages beyond Ada.


6.4's description of "waitfor" tells us it is different from an if-else
chain but tries to use two *different* inputs to tell us that the behavior
is different; tell us an instance where *the same* values of C1 and C2 give
different behavior (I even wrote out a truth table and still don't see the
semantic difference)

Again, it is unclear what the problem is. For the if-statement, if C1 is true,
it only waits for a call to mem1, even if C2 is true and there is an
outstanding call to mem2. For the waitfor, if both C1 and C2 are true and there
is a call to mem2 but not mem1, it accepts the call to mem2, which cannot
happen with the if-statement. So for all true when clauses, any outstanding
call is immediately accepted. If there are no outstanding calls, the waitfor
blocks until the next call to any of the true when clauses occurs. I added the
following sentence to further clarify.

Hence, the @waitfor@ has parallel semantics, accepting any true @when@ clause.

Note, the same parallel semantics exists for the Go select statement with
respect to waiting for a set of channels to receive data. While Go select does
not have a when clause, it would be trivial to add it, making the select more
expressive.
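
To illustrate, a sketch of the two forms inside a mutex function of monitor m
with mutex members mem1 and mem2 (all names illustrative):

// if-statement: once C1 is true, commits to accepting only mem1,
// even if C2 is true and a call to mem2 is outstanding
if ( C1 ) waitfor( mem1 : m );
else if ( C2 ) waitfor( mem2 : m );

// waitfor: all true when-clauses wait in parallel, so an outstanding
// call to either mem1 or mem2 is accepted immediately
when ( C1 ) waitfor( mem1 : m );
or when ( C2 ) waitfor( mem2 : m );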


The authors frequently use bracketed phrases, and sometimes slashes "/", in
ways that are confusing and/or detrimental to readability. Page 13 line
2's "forward (backward)" is one particularly egregious example. In general
I would recommend the authors try to limit their use of parentheses and
slashes as a means of forcing a clearer wording to emerge.

Many of the slashes and parentheticals have been removed. Some are retained to
express joined concepts: call/return, suspend/resume, resume/resume, I/O.


Also, the use of "eg." is often cursory and does not explain the examples
given, which are frequently a one- or two-word phrase of unclear referent.

A few of these are fixed.


Considering the revision more broadly, none of the more extensive or
creative rewrites I suggested in my previous review have been attempted,
nor any equivalent efforts to improve its readability.

If you reread the previous response, we addressed all of your suggestions except

An expositional idea occurs: start the paper with a strawman
naive/limited realisation of coroutines -- say, Simon Tatham's popular
"Coroutines in C" web page -- and identify point by point what the
limitations are and how C\/ overcomes them. Currently the presentation
is often flat (lacking motivating contrasts) and backwards (stating
solutions before problems). The foregoing approach might fix both of
these.

We prefer the current structure of our paper and believe the paper does
explain basic coding limitations and how they are overcome using
high-level control-flow mechanisms.

and we have addressed readability issues in this version.


The hoisting of the former section 5.9 is a good idea, but the newly added
material accompanying it (around Table 1) suffers fresh deficiencies in
clarity. Overall the paper is longer than before, even though (as my
previous review stated), I believe a shorter paper is required in order to
serve the likely purpose of publication. (Indeed, the authors' letter
implies that a key goal of publication is to build community and gain
external users.)

This comment is the referee's opinion, which we do not agree with. Our
previous 35-page SP&E paper on Cforall:

Moss Aaron, Schluntz Robert, Buhr Peter A. Cforall: Adding Modern
Programming Language Features to C. Softw. Pract. Exper. 2018;
48(12):2111-2146.

has a similar structure and style to this paper, and it received an award from
John Wiley & Sons:

Software: Practice & Experience for articles published between January 2017
and December 2018, "most downloads in the 12 months following online
publication showing the work generated immediate impact and visibility,
contributing significantly to advancement in the field".

So we have demonstrated an ability to build community and gain external users.


Given this trajectory, I no longer see a path to an acceptable revision of
the present submission. Instead I suggest the authors consider splitting
the paper in two: one half about coroutines and stack management, the other
about mutexes, monitors and the runtime. (A briefer presentation of the
runtime may be helpful in the first paper also, and a brief recap of the
generator and coroutine support is obviously needed in the second too.)

Again we disagree with the referee's suggestion to vastly restructure the
paper. What advantage is there in presenting exactly the same material across
two papers, which will likely end up longer than a single paper? The current
paper has a clear theme that fundamental execution properties generate a set of
basic language mechanisms, and we then proceed to show how these mechanisms can
be designed into the programming language Cforall.


I do not buy the authors' defense of the limited practical experience or
"non-micro" benchmarking presented. Yes, gaining external users is hard and
I am sympathetic on that point. But building something at least *somewhat*
substantial with your own system should be within reach, and without it the
"practice and experience" aspects of the work have not been explored.
Clearly C\/ is the product of a lot of work over an extended period, so it
is a surprise that no such experience is readily available for inclusion.

Understood. There are no agreed-upon concurrency benchmarks, which is why
micro-benchmarks are often used. Currently, the entire Cforall runtime is
written in Cforall (10,000+ lines of code (LOC)). This runtime is designed to
be thread safe, automatically detects the use of concurrency features at link
time, and bootstraps into a threaded runtime almost immediately at program
startup so threads can be declared as global variables and may run to
completion before the program main starts. The concurrent core of the runtime
is 3,500 LOC and bootstraps from low-level atomic primitives into Cforall locks
and high-level features. In other words, the concurrent core uses itself as
quickly as possible to start using high-level concurrency. There are 12,000+
LOC in the Cforall test-suite used to verify language features, which are run
nightly. Of these, there are 2,000+ LOC running standard concurrent tests, such
as aggressively testing each language feature, and classical examples such as
bounded buffer, dating service, matrix summation, quickSort, binary insertion
sort, etc. More experience will be available soon, based on ongoing work in
the "future works" section. Specifically, non-blocking I/O is working with the
new Linux io_uring interface and a new high-performance ready-queue is under
construction to take into account this change. With non-blocking I/O, it will
be possible to write applications like high-performance web servers, as is now
done in Rust and Go. Also completed are Java-style executors for work-based
concurrent programming and futures. Under construction is a high-performance
actor system.


It does not seem right to state that a stack is essential to Von Neumann
architectures -- since the earliest Von Neumann machines (and indeed early
Fortran) did not use one.

Reference Manual Fortran II for the IBM 704 Data Processing System, 1958 IBM, page 2
https://archive.computerhistory.org/resources/text/Fortran/102653989.05.01.acc.pdf

Since a subprogram may call for other subprograms to any desired depth, a
particular CALL statement may be defined by a pyramid of multi-level
subprograms.

I think we may be differing on the meaning of stack. You may be imagining a
modern stack that grows and shrinks dynamically. Whereas early Fortran
preallocated a stack frame for each function, like Python allocates a frame for
a generator. Within each preallocated Fortran function is a frame for local
variables and a pointer to store the return value for a call. The Fortran
call/return mechanism then uses these frames to build a traditional call stack
linked by the return pointer. The only restriction is that a function stack
frame can only be used once, implying no direct or indirect recursion. Hence,
without a stack mechanism, there can be no call/return to "any desired depth",
where the maximum desired depth is limited by the number of functions. So
call/return requires some form of a stack, virtually all programming languages
have call/return, past and present, and these languages run on Von Neumann
machines that do not distinguish between program and memory space, have mutable
state, and the concept of a pointer to data or code.


To elaborate on something another reviewer commented on: it is a surprise
to find a "Future work" section *after* the "Conclusion" section. A
"Conclusions and future work" section often works well.

Done.



Reviewing: 3

but it remains really difficult to have a good sense of which idea I should
use and when. This applies in different ways to different features from the
language:

* coroutines/generators/threads: here there is some discussion, but it can
be improved.
* internal/external scheduling: I didn't find any direct comparison between
these features, except by way of example.

See changes below.


I would have preferred something more like a table or a few paragraphs
highlighting the key reasons one would pick one construct or the other.

Section 2.1 is changed to address this suggestion.


The discussion of clusters and pre-emption in particular feels quite rushed.

We believe a brief introduction to the Cforall runtime structure is important
because clustering within a user-level versus distributed system is unusual.
Furthermore, the explanation of preemption is important because several new
languages, like Go and Rust tokio, are not preemptive. Rust threads are
preemptive only because they are kernel threads, which UNIX preempts.


* Recommend to shorten the comparison on coroutine/generator/threads in
Section 2 to a paragraph with a few examples, or possibly a table
explaining the trade-offs between the constructs

Not done, see below.


* Recommend to clarify the relationship between internal/external
scheduling -- is one more general but more error-prone or low-level?

Done, see below.


There is obviously a lot of overlap between these features, and in
particular between coroutines and generators. As noted in the previous
review, many languages have chosen to offer *only* generators, and to
build coroutines by stacks of generators invoking one another.

As we point out, coroutines built from stacks of generators have problems, such
as no symmetric control-flow. Furthermore, stacks of generators have a problem
with the following programming pattern. Here, logger is a library function
called from a function or a coroutine, where the doit for the coroutine
suspends. With stacks of generators, there has to be a function and a generator
version of logger to support these two scenarios. If logger is a library
function, it may be impossible to create the generator logger because the
logger function is opaque.

#include <fstream.hfa>
#include <coroutine.hfa>

forall( otype T | { void doit( T ); } )
void logger( T & t ) {
    doit( t );
}

coroutine C {};
void main( C & c ) with( c ) {
    void doit( C & c ) { suspend; }
    logger( c );
}
void mem( C & c ) {
    resume( c );
}

int main() {
    C c;
    mem( c );
    mem( c );

    struct S {};
    S s;
    void doit( S & s ) {}
    logger( s );
}



In fact, the end of Section 2.1 (on page 5) contains a particular paragraph
that embodies this "top down" approach. It starts, "programmers can now
answer three basic questions", and thus gives some practical advice for
which construct you should use and when. I think giving some examples of
specific applications that this paragraph, combined with some examples of
cases where each construct was needed, would be a better approach.

I don't think this comparison needs to be very long. It seems clear enough
that one would

* prefer generators for simple computations that yield up many values,

This description does not cover output or matching generators that do not yield
many or any values. For example, the output generator Fmt yields no value; the
device driver yields a value occasionally once a message is found. Furthermore,
real device drivers are not simple; they can have hundreds of states and
transitions. Imagine the complex event-engine for a web-server written as a
generator.

* prefer coroutines for more complex processes that have significant
internal structure,

As for generators, complexity is not the criterion for selection. A coroutine
brings generality to the implementation because of the additional stack,
whereas generators have restrictions on standard software-engineering
practices: variable placement, no helper functions without creating an explicit
generator stack, and no symmetric control-flow. In contrast, the
producer/consumer example in Figure 7 uses stack variable placement, helpers,
and simple ping/pong-style symmetric control-flow.

* prefer threads for cases where parallel execution is desired or needed.

Agreed, but this description does not mention mutual exclusion and
synchronization, which are essential in any meaningful concurrent program.

Our point here is to illustrate that a casual "top down" explanation is
insufficient to explain the complexity of the underlying execution properties.
We presented some rules of thumb at the end of Section 2, but programmers must
understand all the underlying mechanisms and their interactions to exploit the
execution properties to their fullest, and to understand when a programming
language does or does not provide a desired mechanism.


I did appreciate the comparison in Section 2.3 between async-await in
JS/Java and generators/coroutines. I agree with its premise that those
mechanisms are a poor replacement for generators (and, indeed, JS has a
distinct generator mechanism, for example, in part for this reason). I
believe I may have asked for this in a previous review, but having read it,
I wonder if it is really necessary, since those mechanisms are so different
in purpose.

Given that you asked about this before, I believe other readers might also
ask the same question because async-await is very popular. So I think this
section does help to position the work in the paper among other work, and
hence, it is appropriate to keep it in the paper.


I find the motivation for supporting both internal and external scheduling
to be fairly implicit. After several reads through the section, I came to
the conclusion that internal scheduling is more expressive than external
scheduling, but sometimes less convenient or clear. Is this correct? If
not, it'd be useful to clarify where external scheduling is more
expressive.

I would find it very interesting to try and capture some of the properties
that make internal vs external scheduling the better choice.

For example, it seems to me that external scheduling works well if there
are only a few "key" operations, but that internal scheduling might be
better otherwise, simply because it would be useful to have the ability to
name a signal that can be referenced by many methods.

To address this point, the last paragraph on page 22 (now page 23) has been
augmented to the following:

Given external and internal scheduling, what guidelines can a programmer use
to select between them? In general, external scheduling is easier to
understand and code because only the next logical action (mutex function(s))
is stated, and the monitor implicitly handles all the details. Therefore,
there are no condition variables, and hence, no wait and signal, which reduces
coding complexity and synchronization errors. If external scheduling is
simpler than internal, why not use it all the time? Unfortunately, external
scheduling cannot be used if: scheduling depends on parameter value(s) or
scheduling must block across an unknown series of calls on a condition
variable (\ie internal scheduling). For example, the dating service cannot be
written using external scheduling. First, scheduling requires knowledge of
calling parameters to make matching decisions and parameters of calling
threads are unavailable within the monitor. Specifically, a thread within the
monitor cannot examine the @ccode@ of threads waiting on the calling queue to
determine if there is a matching partner. (Similarly, if the bounded buffer
or readers/writer are restructured with a single interface function with a
parameter denoting producer/consumer or reader/write, they cannot be solved
with external scheduling.) Second, a scheduling decision may be delayed
across an unknown number of calls when there is no immediate match so the
thread in the monitor must block on a condition. Specifically, if a thread
determines there is no opposite calling thread with the same @ccode@, it must
wait an unknown period until a matching thread arrives. For complex
synchronization, both external and internal scheduling can be used to take
advantage of the best properties of each.


Consider the bounded buffer from Figure 13: if it had multiple methods for
removing elements, and not just `remove`, then the `waitfor(remove)` call
in `insert` might not be sufficient.

Section 6.4 Extended waitfor shows the waitfor is very powerful and can handle
your request:

waitfor( remove : buffer ); or waitfor( remove2 : buffer );

and its shorthand form (not shown in the paper)

waitfor( remove, remove2 : t );

A call to one of these remove functions satisfies the waitfor (exact selection
details are discussed in Section 6.4).


The same is true, I think, of the `signal_block` function, which I
have not encountered before;

In Tony Hoare's seminal paper on Monitors "Monitors: An Operating System
Structuring Concept", it states on page 551:

When a process signals a condition on which another process is waiting, the
signalling process must wait until the resumed process permits it to
proceed. We therefore introduce for each monitor a second semaphore "urgent"
(initialized to 0), on which signalling processes suspend themselves by the
operation P(urgent).

Hence, the original definition of signal is in fact signal_block, i.e., the
signaller blocks. Later monitor implementations switched to nonblocking
signalling because most signals occur before returns, which allows the
signaller to continue execution, exit the monitor, and run concurrently with
the signalled thread that restarts in the monitor. When the signaller is not
going to exit immediately, signal_block is appropriate.
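
A minimal sketch of the two-way handoff (the monitor M and function names are
illustrative):

monitor M { condition c; };
void waiter( M & mutex m ) {
    wait( m.c );                // block until signalled; restarts immediately on signal_block
    // ... complete the cooperation while the signaller waits ...
}
void signaller( M & mutex m ) {
    signal_block( m.c );        // restart the waiter now; the signaller blocks (on urgent)
    // ... resumes only after the waiter exits the monitor or waits again ...
}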


it seems like its behavior can be modeled with multiple condition
variables, but that's clearly more complex.

Yes. Buhr, Fortier and Coffin show in Monitor Classification, ACM Computing
Surveys, 27(1):63-107, that all extant monitors with different signalling
semantics can be transformed into each other. However, some transformations are
complex and runtime expensive.


One question I had about `signal_block`: what happens if one signals
but no other thread is waiting? Does it block until some other thread
waits? Or is that user error?

On page 20, it states:

Signalling is unconditional because signalling an empty condition queue does
nothing.

To the best of our knowledge, all monitors have the same semantics for
signalling an empty condition queue, regardless of the kind of signal, i.e.,
signal or signal_block.

I believe that one difference between the Go program and the Cforall
equivalent is that the Goroutine has an associated queue, so that
multiple messages could be enqueued, whereas the Cforall equivalent is
effectively a "bounded buffer" of length 1. Is that correct?

Actually, the buffer length is 0 for the Cforall call and the Go unbuffered
send, so both are synchronous communication.

I think this should be stated explicitly. (Presumably, one could modify the
Cforall program to include an explicit vector of queued messages if
desired, but you would also be reimplementing the channel abstraction.)

Fixed, by adding the following sentences:

The difference between call and channel send occurs for buffered channels,
making the send asynchronous. In \CFA, asynchronous call and multiple
buffers are provided using an administrator and worker
threads~\cite{Gentleman81} and/or futures (not discussed).


Also, in Figure 20, I believe that there is a missing `mutex` keyword.

Fixed.


I was glad to see that the paper acknowledged that Cforall still had
low-level atomic operations, even if their use is discouraged in favor of
higher-level alternatives.

There was never an attempt to not acknowledge that Cforall had low-level
atomic operations. The original version of the paper stated:

6.6 Low-level Locks
For completeness and efficiency, Cforall provides a standard set of low-level
locks: recursive mutex, condition, semaphore, barrier, etc., and atomic
instructions: fetchAssign, fetchAdd, testSet, compareSet, etc.

and that section is still in the paper. In fact, we use these low-level
mechanisms to build all of the high-level concurrency constructs in Cforall.


However, I still feel that the conclusion overstates the value of the
contribution here when it says that "Cforall high-level race-free monitors
and threads provide the core mechanisms for mutual exclusion and
synchronization, without the need for volatile and atomics". I feel
confident that Java programmers, for example, would be advised to stick
with synchronized methods whenever possible, and it seems to me that they
offer similar advantages -- but they sometimes wind up using volatiles for
performance reasons.

I think we are agreeing violently. 99.9% of Java/Cforall/Go/Rust concurrent
programs can achieve very good performance without volatile or atomics because
the runtime system has already used these low-level capabilities to build an
efficient set of high-level concurrency constructs.

0.1% of the time programmers need to build their own locks and synchronization
mechanisms. This need also occurs for storage management. Both of these
mechanisms are allowed in Cforall but are fraught with danger and should be
discouraged. Specifically, it takes a 7th dan Black Belt programmer to
understand fencing for a WSO memory model, such as on the ARM. It doesn't help
that the C++ atomics are baroque and incomprehensible. I'm sure Hans Boehm,
Doug Lea, Dave Dice, and I would agree that 99% of hand-crafted locks created
by programmers are broken and/or non-portable.


I was also confused by the term "race-free" in that sentence. In
particular, I don't think that Cforall has any mechanisms for preventing
*data races*, and it clearly doesn't prevent "race conditions" (which would
bar all sorts of useful programs). I suppose that "race free" here might be
referring to the improvements such as removing barging behavior.

We use the term "race free" to mean the same as Boehm/Adve's "data-race
freedom" in

Boehm Hans-J., Adve Sarita V. You Don't Know Jack About Shared Variables or
Memory Models. Communications ACM. 2012; 55(2):48-54.
https://queue.acm.org/detail.cfm?id=2088916

which is cited in the paper. Furthermore, we never said that Cforall has
mechanisms for preventing *all* data races. We said Cforall high-level
race-free monitors and threads[, when used with mutex access function] (added
to the paper), imply no data races within these constructs, unless a
programmer directly publishes shared state. This approach is exactly what
Boehm/Adve advocate for the vast majority of concurrent programming.


It would perhaps be more interesting to see a comparison built using [tokio] or
[async-std], two of the more prominent user-space threading libraries that
build on Rust's async-await feature (which operates quite differently than
Javascript's async-await, in that it doesn't cause every async function call to
schedule a distinct task).

Done.


Several figures used the `with` keyword. I deduced that `with(foo)` permits
one to write `bar` instead of `foo.bar`. It seems worth introducing.
Apologies if this is stated in the paper, if so I missed it.

Page 6, footnote F states:

The Cforall "with" clause opens an aggregate scope making its fields directly
accessible, like Pascal "with", but using parallel semantics; multiple
aggregates may be opened.
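
A minimal illustration (the struct S is hypothetical):

struct S { int i, j; };
void f( S & s ) with( s ) {     // open s: its fields are directly accessible
    i = 1; j = 2;               // abbreviates s.i = 1; s.j = 2;
}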


On page 20, section 6.3, "external scheduling and vice versus" should be
"external scheduling and vice versa".

Fixed.


On page 5, section 2.3, the paper states "we content" but it should be "we
contend".

Fixed.


Page 1. I don't believe that it is fair to imply that Scala is a "research
vehicle" as it is used by major players, Twitter being the most prominent
example.

Fixed. Removed "research".


Page 15. Must Cforall threads start after construction (e.g. see your
example on page 15, line 21)?

Yes. Our experience in Java, uC++ and Cforall is that 95% of the time
programmers want threads to start immediately. (Most Java programs have no code
between a thread declaration and the call to start the thread.) Therefore,
this semantic should be the default because (see page 13):

Alternatives, such as explicitly starting threads as in Java, are repetitive
and forgetting to call start is a common source of errors.

To handle the other 5% of the cases, there is a trivial Cforall pattern
providing Java-style start/join. The additional cost for this pattern is 2
light-weight thread context-switches.

thread T {};
void start( T & mutex ) {}      // any function name
void join( T & mutex ) {}       // any function name
void main( T & t ) {
    sout | "start";
    waitfor( start : t );       // wait to be started
    sout | "restart";           // perform work
    waitfor( join : t );        // wait to be joined
    sout | "join";
}
int main() {
    T t[3];                     // threads start and delay
    sout | "need to start";
    for ( i; 3 ) start( t[i] );
    sout | "need to join";
    for ( i; 3 ) join( t[i] );
    sout | "threads stopped";
}                               // threads deleted from stack

$ a.out
need to start
start
start
start
need to join
restart
restart
restart
threads stopped
join
join
join


Page 18, line 17: is using

Fixed.