Changeset ea05f8d
- Timestamp: Jun 17, 2019, 7:15:09 PM (6 years ago)
- Branches: ADT, arm-eh, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
- Children: 07ca4dd
- Parents: 9d5089e (diff), e6faef4 (diff)
- Note: this is a merge changeset; the changes displayed below correspond to the merge itself. Use the (diff) links above to see all the changes relative to each parent.
- Files: 5 added, 9 edited
doc/papers/concurrency/Makefile
  FullCoroutinePhases \
  corlayout \
+ CondSigWait \
  monitor \
  ext_monitor \
doc/papers/concurrency/Paper.tex
  \usepackage{epic,eepic}
  \usepackage{xspace}
+ \usepackage{enumitem}
  \usepackage{comment}
  \usepackage{upquote}		% switch curled `'" to straight
…
  \usepackage{dcolumn}		% align decimal points in tables
  \usepackage{capt-of}
+ \setlength{\multicolsep}{6.0pt plus 2.0pt minus 1.5pt}

  \hypersetup{breaklinks=true}
…
  \abstract[Summary]{
  \CFA is a polymorphic, non-object-oriented, concurrent, backwards-compatible extension of the C programming language.
- This paper discusses the design philosophy and implementation of its advanced control-flow and concurrent/parallel features, along with the supporting runtime.
- These features are created from scratch as ISO C has only low-level and/or unimplemented concurrency, so C programmers continue to rely on library features like C pthreads.
+ This paper discusses the design philosophy and implementation of its advanced control-flow and concurrent/parallel features, along with the supporting runtime written in \CFA.
+ These features are created from scratch as ISO C has only low-level and/or unimplemented concurrency, so C programmers continue to rely on library features like pthreads.
  \CFA introduces modern language-level control-flow mechanisms, like generators, coroutines, user-level threading, and monitors for mutual exclusion and synchronization.
  % Library extension for executors, futures, and actors are built on these basic mechanisms.
- The runtime provides significant programmer simplification and safety by eliminating spurious wakeup and optional monitor barging.
+ The runtime provides significant programmer simplification and safety by eliminating spurious wakeup and monitor barging.
  The runtime also ensures multiple monitors can be safely acquired \emph{simultaneously} (deadlock free), and this feature is fully integrated with all monitor synchronization mechanisms.
  All control-flow features integrate with the \CFA polymorphic type-system and exception handling, while respecting the expectations and style of C programmers.
…
  \section{Introduction}

- This paper discusses the design philosophy and implementation of advanced language-level control-flow and concurrent/parallel features in \CFA~\cite{Moss18} and its runtime.
+ This paper discusses the design philosophy and implementation of advanced language-level control-flow and concurrent/parallel features in \CFA~\cite{Moss18,Cforall} and its runtime, which is written entirely in \CFA.
  \CFA is a modern, polymorphic, non-object-oriented\footnote{
  \CFA has features often associated with object-oriented programming languages, such as constructors, destructors, virtuals and simple inheritance.
  However, functions \emph{cannot} be nested in structures, so there is no lexical binding between a structure and set of functions (member/method) implemented by an implicit \lstinline@this@ (receiver) parameter.},
  backwards-compatible extension of the C programming language.
- In many ways, \CFA is to C as Scala~\cite{Scala} is to Java, providing a \emph{research vehicle} for new typing and control flow capabilities on top of a highly popular programming language allowing immediate dissemination.
- Within the \CFA framework, new control-flow features are created from scratch.
- ISO \Celeven defines only a subset of the \CFA extensions, where the overlapping features are concurrency~\cite[\S~7.26]{C11}.
- However, \Celeven concurrency is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}.
- Furthermore, \Celeven and pthreads concurrency is simple, based on thread fork/join in a function and a few locks, which is low-level and error prone;
+ In many ways, \CFA is to C as Scala~\cite{Scala} is to Java, providing a \emph{research vehicle} for new typing and control-flow capabilities on top of a highly popular programming language allowing immediate dissemination.
+ Within the \CFA framework, new control-flow features are created from scratch because ISO \Celeven defines only a subset of the \CFA extensions, where the overlapping features are concurrency~\cite[\S~7.26]{C11}.
+ However, \Celeven concurrency is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}, and \Celeven and pthreads concurrency is simple, based on thread fork/join in a function and a few locks, which is low-level and error prone;
  no high-level language concurrency features are defined.
  Interestingly, almost a decade after publication of the \Celeven standard, neither gcc-8, clang-8 nor msvc-19 (most recent versions) support the \Celeven include @threads.h@, indicating little interest in the C11 concurrency approach.
…
  The main argument for user-level threading is that they are lighter weight than kernel threads (locking and context switching do not cross the kernel boundary), so there is less restriction on programming styles that encourage large numbers of threads performing medium work-units to facilitate load balancing by the runtime~\cite{Verch12}.
  As well, user-threading facilitates a simpler concurrency approach using thread objects that leverage sequential patterns versus events with call-backs~\cite{vonBehren03}.
- Finally, performant user-threading implementations (both time and space) are largely competitive with direct kernel-threading implementations, while achieving the programming advantages of high concurrency levels and safety.
+ Finally, performant user-threading implementations (both time and space) meet or exceed direct kernel-threading implementations, while achieving the programming advantages of high concurrency levels and safety.

  A further effort over the past two decades is the development of language memory-models to deal with the conflict between language features and compiler/hardware optimizations, i.e., some language features are unsafe in the presence of aggressive sequential optimizations~\cite{Buhr95a,Boehm05}.
…
  The key feature that dovetails with this paper is non-local exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++}
  } and coroutines.
- Finally, solutions in the language allows matching constructs with language paradigm, i.e., imperative and functional languages have different presentations of the same concept.
+ Finally, language solutions allow matching constructs with language paradigm, i.e., imperative and functional languages often have different presentations of the same concept to fit their programming model.

  Finally, it is important for a language to provide safety over performance \emph{as the default}, allowing careful reduction of safety for performance when necessary.
  Two concurrency violations of this philosophy are \emph{spurious wakeup} and \emph{barging}, i.e., random wakeup~\cite[\S~8]{Buhr05a} and signals-as-hints~\cite[\S~8]{Buhr05a}, where one is a consequence of the other, i.e., once there is spurious wakeup, signals-as-hints follows.
  However, spurious wakeup is \emph{not} a foundational concurrency property~\cite[\S~8]{Buhr05a}, it is a performance design choice.
- Similarly, signals-as-hints is also a performance decision.
+ Similarly, signals-as-hints is often a performance decision.
  We argue removing spurious wakeup and signals-as-hints makes concurrent programming significantly safer because it removes local non-determinism and matches with programmer expectation.
- (Experience by the authors teaching concurrency is that students are highly confused by these semantics.)
+ (Authors experience teaching concurrency is that students are highly confused by these semantics.)
  Clawing back performance, when local non-determinism is unimportant, should be an option not the default.

  \begin{comment}
- For example, it is possible to provide exceptions, coroutines, monitors, and tasks as specialized types in an object-oriented language, integrating these constructs to allow leveraging the type-system (static type-checking) and all other object-oriented capabilities~\cite{uC++}.
- It is also possible to leverage call/return for blocking communication via new control structures, versus switching to alternative communication paradigms, like channels or message passing.
- As well, user threading is often a complementary feature, allowing light-weight threading to match with low-cost objects, while hiding the application/kernel boundary.
- User threading also allows layering of implicit concurrency models (no explicit thread creation), such executors, data-flow, actors, into a single language, so programmers can chose the model that best fits an algorithm.\footnote{
- All implicit concurrency models have explicit threading in their implementation, and hence, can be build from explicit threading;
- however, the reverse is seldom true, i.e., given implicit concurrency, e.g., actors, it is virtually impossible to create explicit concurrency, e.g., blocking
- thread objects.}
- Finally, with extended language features and user-level threading it is possible to discretely fold locking and non-blocking I/O multiplexing into the language's I/O libraries, so threading implicitly dovetails with the I/O subsystem.
- \CFA embraces language extensions and user-level threading to provide advanced control-flow (exception handling\footnote{
- \CFA exception handling will be presented in a separate paper.
- The key feature that dovetails with this paper is non-local exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++}
- } and coroutines) and concurrency.
-
  Most augmented traditional (Fortran 18~\cite{Fortran18}, Cobol 14~\cite{Cobol14}, Ada 12~\cite{Ada12}, Java 11~\cite{Java11}) and new languages (Go~\cite{Go}, Rust~\cite{Rust}, and D~\cite{D}), except \CC, diverge from C with different syntax and semantics, only interoperate indirectly with C, and are not systems languages, for those with managed memory.
  As a result, there is a significant learning curve to move to these languages, and C legacy-code must be rewritten.
…
  \CFA embraces user-level threading, language extensions for advanced control-flow, and safety as the default.
- We present comparative examples so the reader can judge if the \CFA control-flow extensions are better and safer than those in or proposed for \Celeven, \CC, and other concurrent, imperative programming-languages, and perform experiments to show the \CFA runtime is competitive with other similar mechanisms.
+ We present comparative examples so the reader can judge if the \CFA control-flow extensions are better and safer than those in other concurrent, imperative programming-languages, and perform experiments to show the \CFA runtime is competitive with other similar mechanisms.
  The main contributions of this work are:
  \begin{itemize}
…
  The name \lstinline|main| has special meaning in C, specifically the function where a program starts execution.
  Hence, overloading this name for other starting points (generator/coroutine/thread) is a logical extension.}
- which takes as its only parameter a reference to the generator type, called a \emph{generator main}.
+ called a \emph{generator main}, which takes as its only parameter a reference to the generator type.
  The generator main contains @suspend@ statements that suspend execution without ending the generator versus @return@.
  For the Fibonacci generator-main,\footnote{
…

  Having to manually create the generator closure by moving local-state variables into the generator type is an additional programmer burden.
- (This restriction is removed by the coroutine, see Section~\ref{s:Coroutine}.)
+ (This restriction is removed by the coroutine in Section~\ref{s:Coroutine}.)
  This requirement follows from the generality of variable-size local-state, \eg local state with a variable-length array requires dynamic allocation because the array size is unknown at compile time.
  However, dynamic allocation significantly increases the cost of generator creation/destruction and is a show-stopper for embedded real-time programming.
…

  Note, the cost of creating and resuming the device-driver generator, @Driver@, is virtually identical to call/return, so performance in an operating-system kernel is excellent.
- As well, the data state is small, where variables @byte@ and @msg@ are communication variables for passing in message bytes and returning the message, and variables @lnth@, @crc@, and @sum@ are local variable that must be retained between calls and are hoisted into the generator type.
- Manually, detecting and hoisting local-state variables is easy when the number is small.
+ As well, the data state is small, where variables @byte@ and @msg@ are communication variables for passing in message bytes and returning the message, and variables @lnth@, @crc@, and @sum@ are local variable that must be retained between calls and are manually hoisted into the generator type.
+ % Manually, detecting and hoisting local-state variables is easy when the number is small.
  Finally, the execution state is large, with one @resume@ and seven @suspend@s.
- Hence, the key benefits of the generator are correctness, safety, and maintenance because the execution states of the FSM are transcribed directly into the programming language rather than using a table-driven approach.
+ Hence, the key benefits of the generator are correctness, safety, and maintenance because the execution states are transcribed directly into the programming language rather than using a table-driven approach.
  Because FSMs can be complex and occur frequently in important domains, direct support of the generator is crucial in a systems programming-language.

…
  Figure~\ref{f:CPingPongSim} shows the implementation of the symmetric generator, where the complexity is the @resume@, which needs an extension to the calling convention to perform a forward rather than backward jump.
  This jump starts at the top of the next generator main to re-execute the normal calling convention to make space on the stack for its local variables.
- However, before the jump, the caller must reset its stack (and any registers) equivalent to a @return@, but subsequently jump forward not backwards.
+ However, before the jump, the caller must reset its stack (and any registers) equivalent to a @return@, but subsequently jump forward.
  This semantics is basically a tail-call optimization, which compilers already perform.
- Hence, assembly code is manually insert in the example to undo the generator's entry code before the direct jump.
- This assembly code is fragile as it depends on what entry code is generated, specifically if there are local variables and the level of optimization.
+ The example shows the assembly code to undo the generator's entry code before the direct jump.
+ This assembly code depends on what entry code is generated, specifically if there are local variables and the level of optimization.
  To provide this new calling convention requires a mechanism built into the compiler, which is beyond the scope of \CFA at this time.
  Nevertheless, it is possible to hand generate any symmetric generators for proof of concept and performance testing.
…
  Finally, part of this generator work was inspired by the recent \CCtwenty generator proposal~\cite{C++20Coroutine19} (which they call coroutines).
  Our work provides the same high-performance asymmetric-generators as \CCtwenty, and extends their work with symmetric generators.
- An additional \CCtwenty generator feature allows @suspend@ and @resume@ to be followed by a restricted compound-statement that is executed after the current generator has reset its stack but before calling the next generator, specified with the following \CFA syntax.
+ An additional \CCtwenty generator feature allows @suspend@ and @resume@ to be followed by a restricted compound-statement that is executed after the current generator has reset its stack but before calling the next generator, specified with \CFA syntax:
  \begin{cfa}
  ... suspend`{ ... }`;
…
  Stackful coroutines extend generator semantics, \ie there is an implicit closure and @suspend@ may appear in a helper function called from the coroutine main.
  A coroutine is specified by replacing @generator@ with @coroutine@ for the type.
- This generality results in higher cost for creation, due to dynamic stack allocation, execution, due to context switching among stacks, and terminating, due to possible stack unwinding and dynamic stack deallocation.
+ Coroutine generality results in higher cost for creation, due to dynamic stack allocation, execution, due to context switching among stacks, and terminating, due to possible stack unwinding and dynamic stack deallocation.
  A series of different kinds of coroutines and their implementation demonstrate how coroutines extend generators.
…
  Hence, if a coroutine's destructor detects the coroutine is not ended, it implicitly raises a cancellation exception (uncatchable exception) at the coroutine and resumes it so the cancellation exception can propagate to the root of the coroutine's stack destroying all local variable on the stack.
  So the \CFA semantics for the generator and coroutine, ensure both can be safely deallocated at any time, regardless of their current state, like any other aggregate object.
  Explicitly raising normal exceptions at another coroutine can replace flag variables, like @stop@, \eg @prod@ raises a @stop@ exception at @cons@ after it finishes generating values and resumes @cons@, which catches the @stop@ exception to terminate its loop.

  Finally, there is an interesting effect for @suspend@ with symmetric coroutines.
…
  \subsection{(Generator) Coroutine Implementation}

- A significant implementation challenge for generators/coroutines (and threads, see Section~\ref{s:threads}) is adding extra fields to the custom types and related functions, \eg inserting code after/before the coroutine constructor/destructor and @main@ to create/initialize/de-initialize/destroy any extra fields, \eg stack.
- There are several solutions to these problem, which follow from the object-oriented-flavoured choice to adopt custom types.
+ A significant implementation challenge for generators/coroutines (and threads in Section~\ref{s:threads}) is adding extra fields to the custom types and related functions, \eg inserting code after/before the coroutine constructor/destructor and @main@ to create/initialize/de-initialize/destroy any extra fields, \eg stack.
+ There are several solutions to these problem, which follow from the object-oriented flavour of adopting custom types.

  For object-oriented languages, inheritance is used to provide extra fields and code via explicit inheritance:
…
  As well, some special properties are not handled by existing language semantics, \eg the execution of constructors/destructors is in the wrong order to implicitly start threads because the thread must start \emph{after} all constructors as it relies on a completely initialized object, but the inherited constructor runs \emph{before} the derived.
  Alternatives, such as explicitly starting threads as in Java, are repetitive and forgetting to call start is a common source of errors.
-
  An alternative is composition:
  \begin{cfa}
…
  The downside of this approach is that it makes custom types a special case in the language.
  Users wanting to extend custom types or build their own can only do so in ways offered by the language.
- Furthermore, implementing custom types without language supports may display the power of a programming language.
- \CFA blends the two approaches, providing custom type for idiomatic \CFA code, extending and building new custom types is still possible, similar to Java approach with builtin and library concurrency.
-
- Part of the mechanism to generalize custom types is the \CFA trait, \eg the definition for custom-type @coroutine@ is anything satisfying the trait @is_coroutine@, and this trait both enforces and restricts the coroutine-interface functions.
+ Furthermore, implementing custom types without language support may display the power of a programming language.
+ \CFA blends the two approaches, providing custom type for idiomatic \CFA code, while extending and building new custom types is still possible, similar to Java concurrency with builtin and library.
+
+ Part of the mechanism to generalize custom types is the \CFA trait~\cite[\S~2.3]{Moss18}, \eg the definition for custom-type @coroutine@ is anything satisfying the trait @is_coroutine@, and this trait both enforces and restricts the coroutine-interface functions.
  \begin{cfa}
  trait is_coroutine( `dtype` T ) {
…
  	coroutine_desc * get_coroutine( T & );
  };
- forall( `dtype` T | is_coroutine(T) ) void $suspend$( T & );
- forall( `dtype` T | is_coroutine(T) ) void resume( T & );
+ forall( `dtype` T | is_coroutine(T) ) void $suspend$( T & ), resume( T & );
  \end{cfa}
  Note, copying generators/coroutines/threads is not meaningful.
…
  Furthermore, two coroutines cannot logically execute on the same stack.
  A deep coroutine copy, which copies the stack, is also meaningless in an unmanaged language (no garbage collection), like C, because the stack may contain pointers to object within it that require updating for the copy.
- Now, the @dtype@ property provides no \emph{implicit} copying operations and the @is_coroutine@ trait provides no \emph{explicit} copying operations, so all coroutines must be passed by reference (pointer).
+ The \CFA @dtype@ property provides no \emph{implicit} copying operations and the @is_coroutine@ trait provides no \emph{explicit} copying operations, so all coroutines must be passed by reference (pointer).
  The function definitions ensures there is a statically-typed @main@ function that is the starting point (first stack frame) of a coroutine, and a mechanism to get (read) the currently executing coroutine handle.
  The @main@ function has no return value or additional parameters because the coroutine type allows an arbitrary number of interface functions with corresponding arbitrary typed input/output values versus fixed ones.
  The advantage of this approach is that users can easily create different types of coroutines, \eg changing the memory layout of a coroutine is trivial when implementing the @get_coroutine@ function, and possibly redefining @suspend@ and @resume@.

- The \CFA custom-type @coroutine@ implicitly implements the getter and forward declarations required for implementing the coroutine main:
+ The \CFA custom-type @coroutine@ implicitly implements the getter and forward declarations for the coroutine main.
  \begin{cquote}
  \begin{tabular}{@{}ccc@{}}
…
  \end{tabular}
  \end{cquote}
- The combination of custom types and fundamental @trait@ description of these types allows an concise specification for programmers and tools, while more advanced programmers can have tighter control over memory layout and initialization.
+ The combination of custom types and fundamental @trait@ description of these types allows a concise specification for programmers and tools, while more advanced programmers can have tighter control over memory layout and initialization.

  Figure~\ref{f:CoroutineMemoryLayout} shows different memory-layout options for a coroutine (where a task is similar).
- The coroutine handle is the @coroutine@ instance containing programmer specified global/communication variables across interface functions.
+ The coroutine handle is the @coroutine@ instance containing programmer specified type global/communication variables across interface functions.
  The coroutine descriptor contains all implicit declarations needed by the runtime, \eg @suspend@/@resume@, and can be part of the coroutine handle or separate.
- The coroutine stack can appear in a number of locations and fixed or variable sized.
+ The coroutine stack can appear in a number of locations and be fixed or variable sized.
  Hence, the coroutine's stack could be a VLS\footnote{
  We are examining variable-sized structures (VLS), where fields can be variable-sized structures or arrays.
…
  For heap stack allocation, allocation/deallocation is an expensive heap allocation (where the heap can be a shared resource), modulo any stack constructor costs.
  With heap stack allocation, it is also possible to use a split (segmented) stack calling-convention, available with gcc and clang, so the stack is variable sized.
- Currently, \CFA supports stack/heap allocated descriptors but only fixed-sized heap allocated stacks;
+ Currently, \CFA supports stack/heap allocated descriptors but only fixed-sized heap allocated stacks.
  In \CFA debug-mode, the fixed-sized stack is terminated with a write-only page, which catches most stack overflows.
  Experience teaching concurrency with \uC~\cite{CS343} shows fixed-sized stacks are rarely an issue for students.
- Split-stack allocation is under development but requires recompilation of existing code, which may be impossible.
+ Split-stack allocation is under development but requires recompilation of legacy code, which may be impossible.

  \begin{figure}
…
  However, coroutines are a stepping stone towards concurrency.

- The transition to concurrency, even for a single thread with multiple stacks, occurs when coroutines context switch to a \newterm{scheduling coroutine}, introducing non-determinism from the coroutine perspective~\cite[\S~3]{Buhr05a}.
+ The transition to concurrency, even for a single thread with multiple stacks, occurs when coroutines context switch to a \newterm{scheduling coroutine}, introducing non-determinism from the coroutine perspective~\cite[\S~3,]{Buhr05a}\cite{Adya02}.
  Therefore, a minimal concurrency system requires coroutines \emph{in conjunction with a nondeterministic scheduler}.
  The resulting execution system now follows a cooperative threading-model, called \newterm{non-preemptive scheduling}.
…
  For stackless, the scheduler performs scheduling on the stack of the current coroutine and switches directly to the next coroutine, so there is one context switch.
  For stackful, the current coroutine switches to the scheduler, which performs scheduling, and it then switches to the next coroutine, so there are two context switches.
- A stackful scheduler is often used for simplicity and security.
+ The \CFA runtime uses a stackful scheduler for uniformity and security.

…
  \begin{tabular}{@{}lll@{}}
  \multicolumn{1}{c}{\textbf{Java}} & \multicolumn{1}{c}{\textbf{\Celeven}} & \multicolumn{1}{c}{\textbf{pthreads}} \\
- \begin{cfa}[aboveskip=0pt,belowskip=0pt]
+ \begin{cfa}
  class MyTask extends Thread {...}
  mytask t = new MyTask(...);
…
  \end{cfa}
  &
- \begin{cfa}[aboveskip=0pt,belowskip=0pt]
+ \begin{cfa}
  class MyTask { ... } // functor
  MyTask mytask;
…
  \end{cfa}
  &
- \begin{cfa}[aboveskip=0pt,belowskip=0pt]
+ \begin{cfa}
  void * rtn( void * arg ) {...}
  pthread_t t; int i = 3;
…
  \subsection{Thread Implementation}

- Threads in \CFA are user level run by runtime kernel threads (see Section~\ref{s:Parallelism}), where user threads provide concurrency and kernel threads provide parallelism.
+ Threads in \CFA are user level run by runtime kernel threads (see Section~\ref{s:CFARuntimeStructure}), where user threads provide concurrency and kernel threads provide parallelism.
  Like coroutines, and for the same design reasons, \CFA provides a custom @thread@ type and a @trait@ to enforce and restrict the task-interface functions.
  \begin{cquote}
…
  Similarly, the function definitions ensures there is a statically-typed @main@ function that is the thread starting point (first stack frame), a mechanism to get (read) the currently executing thread handle, and a special destructor to prevent deallocation while the thread is executing.
  (The qualifier @mutex@ for the destructor parameter is discussed in Section~\ref{s:Monitor}.)
- The difference between the coroutine and thread is that a coroutine borrows a thread from its caller, so the first thread resuming a coroutine creates the coroutine's stack starts running the coroutine main on the stack;
+ The difference between the coroutine and thread is that a coroutine borrows a thread from its caller, so the first thread resuming a coroutine creates the coroutine's stack and starts running the coroutine main on the stack;
  whereas, a thread is scheduling for execution in @main@ immediately after its constructor is run.
  No return value or additional parameters are necessary for this function because the @thread@ type allows an arbitrary number of interface functions with corresponding arbitrary typed input/output values.
- \begin{comment} % put in appendix with coroutine version ???
- As such the @main@ function of a thread can be defined as
- \begin{cfa}
- thread foo {};
-
- void main(foo & this) {
- 	sout | "Hello World!";
- }
- \end{cfa}
-
- In this example, threads of type @foo@ start execution in the @void main(foo &)@ function, which prints @"Hello World!".@ While this paper encourages this approach to enforce strongly typed programming, users may prefer to use the function-based thread semantics for the sake of simplicity.
- With the static semantics it is trivial to write a thread type that takes a function pointer as a parameter and executes it on its stack asynchronously.
- \begin{cfa}
- typedef void (*voidRtn)(int);
-
- thread RtnRunner {
- 	voidRtn func;
- 	int arg;
- };
-
- void ?{}(RtnRunner & this, voidRtn inRtn, int arg) {
- 	this.func = inRtn;
- 	this.arg = arg;
- }
-
- void main(RtnRunner & this) {
- 	// thread starts here and runs the function
- 	this.func( this.arg );
- }
-
- void hello(/*unused*/ int) {
- 	sout | "Hello World!";
- }
-
- int main() {
- 	RtnRunner f = {hello, 42};
- 	return 0?
- }
- \end{cfa}
- A consequence of the strongly typed approach to main is that memory layout of parameters and return values to/from a thread are now explicitly specified in the \textbf{API}.
- \end{comment}
-

  \section{Mutual Exclusion / Synchronization}

- Uncontrolled nondeterministic execution produces meaningless results.
- To produce meaningful execution requires clawing back some determinism using mutual exclusion and synchronization, where mutual exclusion provides access-control for threads using shared data, and synchronization is a timing relationship among threads~\cite[\S~4]{Buhr05a}.
+ Unrestricted nondeterminism is meaningless as there is no way to know when the result is completed without synchronization.
1507 To produce meaningful execution requires clawing back some determinism using mutual exclusion and synchronization, where mutual exclusion provides access control for threads using shared data, and synchronization is a timing relationship among threads~\cite[\S~4]{Buhr05a}. 1564 1508 Some concurrent systems eliminate mutable shared-state by switching to stateless communication like message passing~\cite{Thoth,Harmony,V-Kernel,MPI} (Erlang, MPI), channels~\cite{CSP} (CSP,Go), actors~\cite{Akka} (Akka, Scala), or functional techniques (Haskell). 1565 However, in call/return-based languages, these approaches force a clear distinction, \ie introduce a new programming paradigm between regular and concurrent computation, \eg function call versus message passing.1509 However, these approaches introduce a new communication mechanism for concurrency different from the standard communication using function call/return. 1566 1510 Hence, a programmer must learn and manipulate two sets of design/programming patterns. 1567 1511 While this distinction can be hidden away in library code, effective use of the library still has to take both paradigms into account. … … 1584 1528 The generalization is called a \newterm{group critical-section}~\cite{Joung00}, where multiple tasks with the same session may use the resource simultaneously, but different sessions may not use the resource simultaneously. 1585 1529 The readers/writer problem~\cite{Courtois71} is an instance of a group critical-section, where readers share a session but writers have a unique session. 1586 \newterm{Mutual exclusion} enforces the correct kind and number of threads us ea critical section.1530 \newterm{Mutual exclusion} enforces the correct kind and number of threads using a critical section. 1587 1531 1588 1532 However, many solutions exist for mutual exclusion, which vary in terms of performance, flexibility and ease of use. 
… … 1612 1556 1613 1557 A \textbf{monitor} is a set of functions that ensure mutual exclusion when accessing shared state. 1614 More precisely, a monitor is a programming technique that binds mutual exclusion tofunction scope, as opposed to locks, where mutual-exclusion is defined by acquire/release calls, independent of lexical context (analogous to block and heap storage allocation).1615 The strong association with the call/return paradigmeases programming, comprehension, and maintenance, at a slight cost in flexibility and efficiency.1558 More precisely, a monitor is a programming technique that implicitly binds mutual exclusion to static function scope, as opposed to locks, where mutual-exclusion is defined by acquire/release calls, independent of lexical context (analogous to block and heap storage allocation). 1559 Restricting acquire/release points eases programming, comprehension, and maintenance, at a slight cost in flexibility and efficiency. 1616 1560 \CFA uses a custom @monitor@ type and leverages declaration semantics (deallocation) to protect active or waiting threads in a monitor. 1617 1561 1618 1562 The following is a \CFA monitor implementation of an atomic counter. 
1619 \newpage 1620 1563 \begin{cfa}[morekeywords=nomutex] 1621 1564 `monitor` Aint { int cnt; }; $\C[4.25in]{// atomic integer counter}$ 1622 1565 int ++?( Aint & `mutex`$\(_{opt}\)$ this ) with( this ) { return ++cnt; } $\C{// increment}$ 1623 int ?=?( Aint & `mutex`$\(_{opt}\)$ lhs, int rhs ) with( lhs ) { cnt = rhs; } $\C{// conversions}\CRT$ 1566 int ?=?( Aint & `mutex`$\(_{opt}\)$ lhs, int rhs ) with( lhs ) { cnt = rhs; } $\C{// conversions with int}\CRT$ 1624 1567 int ?=?( int & lhs, Aint & `mutex`$\(_{opt}\)$ rhs ) with( rhs ) { lhs = cnt; } 1625 1568 \end{cfa} … … 1646 1589 } 1647 1590 \end{cfa} 1648 \CFA monitors also ensure the monitor lock is always released regardless of how an acquiring function ends (normal or exceptional), and returning a shared variable is safe via copying before the lock is released. 1649 Similar safety is offered by explicit mechanisms like \CC RAII; 1650 monitor implicit safety ensures no programmer error. 1651 Furthermore, RAII mechanisms cannot handle complex synchronization within a monitor, where the monitor lock may not be released on function exit but passed to an unblocking thread. 1591 \CFA monitors also ensure the monitor lock is released regardless of how an acquiring function ends (normal or exceptional), and returning a shared variable is safe via copying before the lock is released. 1592 Similar safety is offered by \emph{explicit} mechanisms like \CC RAII; 1593 monitor \emph{implicit} safety ensures no programmer usage errors. 1594 Furthermore, RAII mechanisms cannot handle complex synchronization within a monitor, where the monitor lock may not be released on function exit because it is passed to an unblocking thread; 1652 1595 RAII is purely a mutual-exclusion mechanism (see Section~\ref{s:Scheduling}).
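What the @mutex@ qualifier implies can be written out by hand in C with POSIX threads; this sketch (an assumed mapping for illustration, not the \CFA implementation) shows the lock the monitor type inserts and the acquire/release wrapped around a mutex function, including copying the shared result before the lock is released.

```c
#include <pthread.h>
#include <assert.h>

// hand-written monitor: the lock a CFA monitor type inserts implicitly
typedef struct {
    pthread_mutex_t lock;
    int cnt;
} Aint;

void aint_init( Aint * this ) {
    pthread_mutex_init( &this->lock, NULL );
    this->cnt = 0;
}

// analogue of: int ++?( Aint & mutex this )
int aint_inc( Aint * this ) {
    pthread_mutex_lock( &this->lock );    // implicit in CFA on entry
    int v = ++this->cnt;                  // copy shared value under the lock
    pthread_mutex_unlock( &this->lock );  // implicit in CFA on every exit path
    return v;                             // returning the copy is safe
}
```

Unlike the \CFA version, the C form must repeat the lock/unlock pair (on every return path) in each function touching @cnt@, which is exactly the bookkeeping the monitor removes.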
1653 1596 … … 1673 1616 \end{tabular} 1674 1617 \end{cquote} 1675 Like before, the @dtype@ property prevents \emph{implicit} copy operations and the @is_monitor@ trait provides no \emph{explicit} copy operations, so monitors must be passed by reference (pointer). 1618 The @dtype@ property prevents \emph{implicit} copy operations and the @is_monitor@ trait provides no \emph{explicit} copy operations, so monitors must be passed by reference (pointer). 1676 1619 % Copying a lock is insecure because it is possible to copy an open lock and then use the open copy when the original lock is closed to simultaneously access the shared data. 1677 1620 % Copying a monitor is secure because both the lock and shared data are copies, but copying the shared data is meaningless because it no longer represents a unique entity. 1678 1621 Similarly, the function definitions ensure there is a mechanism to get (read) the currently executing monitor handle, and a special destructor to prevent deallocation while a thread is using the shared data. 1679 The custom monitor type also inserts any locking vehicles needed to implement the monitor locking semantics. 1622 The custom monitor type also inserts any locks needed to implement the mutual exclusion semantics. 1680 1623 1681 1624 … … 1688 1631 1689 1632 The benefit of mandatory monitor qualifiers is self-documentation, but requiring both @mutex@ and \lstinline[morekeywords=nomutex]@nomutex@ for all monitor parameters is redundant. 1690 1633 Instead, the semantics have one qualifier as the default and the other required. 1691 1634 For example, make the safe @mutex@ qualifier the default because assuming \lstinline[morekeywords=nomutex]@nomutex@ may cause subtle errors. 1692 1635 Alternatively, make the unsafe \lstinline[morekeywords=nomutex]@nomutex@ qualifier the default because it is the \emph{normal} parameter semantics while @mutex@ parameters are rare.
1693 1636 Providing a default qualifier implies knowing whether a parameter is a monitor. 1694 Since \CFA relies heavily on traits as an abstraction mechanism, t he distinction between a type that is a monitor and a type that looks like a monitor can become blurred.1637 Since \CFA relies heavily on traits as an abstraction mechanism, types can coincidentally match the monitor trait but not be a monitor, similar to inheritance where a shape and playing card can both be drawable. 1695 1638 For this reason, \CFA requires programmers to identify the kind of parameter with the @mutex@ keyword and uses no keyword to mean \lstinline[morekeywords=nomutex]@nomutex@. 1696 1639 1640 \newpage 1697 1641 The next semantic decision is establishing which parameter \emph{types} may be qualified with @mutex@. 1698 1642 The following has monitor parameter types that are composed of multiple objects. … … 1713 1657 A positive consequence of this design decision is the ability to support multi-monitor functions,\footnote{ 1714 1658 While object-oriented monitors can be extended with a mutex qualifier for multiple-monitor members, no prior example of this feature could be found.} 1715 called \newterm{bulk acquire} (see Section~\ref{s:Implementation} for implementation details).1659 called \newterm{bulk acquire}. 1716 1660 \CFA guarantees acquisition order is consistent across calls to @mutex@ functions using the same monitors as arguments, so acquiring multiple monitors is safe from deadlock. 1717 Figure~\ref{f:BankTransfer} shows a trivial solution to the bank -account transfer problemusing \CFA monitors with implicit locking and \CC with explicit locking.1661 Figure~\ref{f:BankTransfer} shows a trivial solution to the bank transfer problem, where two resources must be locked simultaneously, using \CFA monitors with implicit locking and \CC with explicit locking. 
1718 1662 A \CFA programmer only has to manage when to acquire mutual exclusion; 1719 1663 a \CC programmer must select the correct lock and acquisition mechanism from a panoply of locking options. … … 1792 1736 \end{figure} 1793 1737 1794 Users can force the acquiring order, by using @mutex@/\lstinline[morekeywords=nomutex]@nomutex@. 1738 Users can still force the acquiring order by using @mutex@/\lstinline[morekeywords=nomutex]@nomutex@. 1739 \newpage 1795 1740 \begin{cfa} 1796 1741 void foo( M & mutex m1, M & mutex m2 ); $\C{// acquire m1 and m2}$ … … 1803 1748 \end{cfa} 1804 1749 The bulk-acquire semantics allow @bar@ or @baz@ to acquire a monitor lock and reacquire it in @foo@. 1805 In the calls to @bar@ and @baz@, the monitors are acquired in opposite order, resulting in deadlock.1750 In the calls to @bar@ and @baz@, the monitors are acquired in opposite order, possibly resulting in deadlock. 1806 1751 However, this case is the simplest instance of the \emph{nested-monitor problem}~\cite{Lister77}, where monitors are acquired in sequence versus bulk. 1807 1752 Detecting the nested-monitor problem requires dynamic tracking of monitor calls, and dealing with it requires rollback semantics~\cite{Dice10}. 1808 1753 \CFA does not deal with this fundamental problem. 1809 1754 1810 1811 \subsection{\protect\lstinline@mutex@ statement} 1812 \label{mutex-stmt} 1813 1814 The monitor call-semantics associate all locking semantics with functions. 1815 Like Java, \CFA offers an alternative @mutex@ statement to reduce refactoring and naming. 1755 Finally, like Java, \CFA offers an alternative @mutex@ statement to reduce refactoring and naming. 
1816 1756 \begin{cquote} 1757 \renewcommand{\arraystretch}{0.0} 1817 1758 \begin{tabular}{@{}l@{\hspace{3\parindentlnth}}l@{}} 1818 1759 \multicolumn{1}{c}{\textbf{\lstinline@mutex@ call}} & \multicolumn{1}{c}{\lstinline@mutex@ \textbf{statement}} \\ 1819 \begin{cfa} 1760 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 1820 1761 monitor M { ... }; 1821 1762 void foo( M & mutex m1, M & mutex m2 ) { … … 1827 1768 \end{cfa} 1828 1769 & 1829 \begin{cfa} 1770 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 1830 1771 1831 1772 void bar( M & m1, M & m2 ) { … … 1843 1784 \label{s:Scheduling} 1844 1785 1786 % There are many aspects of scheduling in a concurrency system, all related to resource utilization by waiting threads, \ie which thread gets the resource next. 1787 % Different forms of scheduling include access to processors by threads (see Section~\ref{s:RuntimeStructureCluster}), another is access to a shared resource by a lock or monitor. 1788 This section discusses monitor scheduling for waiting threads eligible for entry, \ie which thread gets the shared resource next. (See Section~\ref{s:RuntimeStructureCluster} for scheduling threads on virtual processors.) 1845 1789 While monitor mutual-exclusion provides safe access to shared data, the monitor data may indicate that a thread accessing it cannot proceed, \eg a bounded buffer may be full/empty so producer/consumer threads must block. 1846 1790 Leaving the monitor and trying again (busy waiting) is impractical for high-level programming. 1847 1791 Monitors eliminate busy waiting by providing synchronization to schedule threads needing access to the shared data, where threads block versus spinning. 1848 1792 Synchronization is generally achieved with internal~\cite{Hoare74} or external~\cite[\S~2.9.2]{uC++} scheduling.
1849 1793 \newterm{Internal scheduling} is characterized by each thread entering the monitor and making an individual decision about proceeding or blocking, while \newterm{external scheduling} is characterized by an entering thread making a decision about proceeding for itself and on behalf of other threads attempting entry. 1850 1794 Finally, \CFA monitors do not allow calling threads to barge ahead of signalled threads, which simplifies synchronization among threads in the monitor and increases correctness. 1851 1795 If barging is allowed, synchronization between a signaller and signallee is difficult, often requiring additional flags and multiple unblock/block cycles. 1852 In fact, signals-as-hints is completely opposite from that proposed by Hoare in the seminal paper on monitors :1853 \begin{cquote}1854 However, we decree that a signal operation be followed immediately by resumption of a waiting program, without possibility of an intervening procedure call from yet a third program.1855 It is only in this way that a waiting program has an absolute guarantee that it can acquire the resource just released by the signalling program without any danger that a third program will interpose a monitor entry and seize the resource instead.~\cite[p.~550]{Hoare74}1856 \end{cquote}1796 In fact, signals-as-hints is completely opposite from that proposed by Hoare in the seminal paper on monitors~\cite[p.~550]{Hoare74}. 1797 % \begin{cquote} 1798 % However, we decree that a signal operation be followed immediately by resumption of a waiting program, without possibility of an intervening procedure call from yet a third program. 
1799 % It is only in this way that a waiting program has an absolute guarantee that it can acquire the resource just released by the signalling program without any danger that a third program will interpose a monitor entry and seize the resource instead.~\cite[p.~550]{Hoare74} 1800 % \end{cquote} 1857 1801 Furthermore, \CFA concurrency has no spurious wakeup~\cite[\S~9]{Buhr05a}, which eliminates an implicit form of barging. 1858 Hence, a \CFA @wait@ statement is not enclosed in a @while@ loop that can cause thread starvation. 1859 1860 Figure~\ref{f:BBInt} shows a \CFA generic bounded-buffer with internal scheduling, where producers/consumers enter the monitor, see the buffer is full/empty, and block on an appropriate condition lock, @full@/@empty@. 1802 Hence, a \CFA @wait@ statement is not enclosed in a @while@ loop retesting a blocking predicate, which can cause thread starvation due to barging. 1803 1804 Figure~\ref{f:MonitorScheduling} shows internal/external scheduling (for the bounded-buffer example in Figure~\ref{f:InternalExternalScheduling}). 1805 External calling threads block on the calling queue, if the monitor is occupied, otherwise they enter in FIFO order. 1806 Internal threads block on condition queues via @wait@ and they reenter from the condition in FIFO order. 1807 1808 There are three signalling mechanisms to unblock waiting threads to enter the monitor. 1809 Note, signalling cannot have the signaller and signalled thread in the monitor simultaneously because of the mutual exclusion, so only one can proceed. 1810 For internal scheduling, threads are unblocked from condition queues using @signal@, where the signallee is moved to urgent and the signaller continues (solid line). 1811 Multiple signals move multiple signallees to urgent, until the condition is empty. 1812 When the signaller exits or waits, a thread blocked on urgent is processed before calling threads to prevent barging.
1813 (Java conceptually moves the signalled thread to the calling queue, and hence, allows barging.) 1814 The alternative unblock is in the opposite order using @signal_block@, where the signaller is moved to urgent and the signallee continues (dashed line), and is implicitly unblocked from urgent when the signallee exits or waits. 1815 1816 For external scheduling, the condition queues are not used; 1817 instead threads are unblocked directly from the calling queue using @waitfor@ based on function names requesting mutual exclusion. 1818 (The linear search through the calling queue to locate a particular call can be reduced to $O(1)$.) 1819 The @waitfor@ has the same semantics as @signal_block@, where the signalled thread executes before the signaller, which waits on urgent. 1820 Executing multiple @waitfor@s from different signalled functions causes the calling threads to move to urgent. 1821 External scheduling requires urgent to be a stack, because the signaller expects to execute immediately after the specified monitor call has exited or waited. 1822 Internal scheduling behaves the same for an urgent stack or queue, except for signalling multiple threads, where the threads unblock from urgent in reverse order from signalling. 1823 If the restart order is important, multiple signalling by a signal thread can be transformed into shared signalling among threads, where each thread signals the next thread. 1824 Hence, \CFA uses an urgent stack.
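The cost of barging and spurious wakeup that \CFA eliminates is visible in plain POSIX code: because @pthread_cond_signal@ is only a hint and another thread may barge in between the signal and the waiter's resumption, the waiter must re-test its predicate in a loop. A minimal C sketch (hypothetical names) of this obligatory pattern:

```c
#include <pthread.h>
#include <stdbool.h>
#include <assert.h>

// One-shot event: pthreads permits barging and spurious wakeup, so the
// waiter must re-test its predicate; CFA's barging-free, no-spurious-wakeup
// semantics make the paper's loop-free `wait` correct.
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool ready;
} Event;

void event_init( Event * e ) {
    pthread_mutex_init( &e->lock, NULL );
    pthread_cond_init( &e->cond, NULL );
    e->ready = false;
}

void event_wait( Event * e ) {
    pthread_mutex_lock( &e->lock );
    while ( ! e->ready )                  // loop: another thread may barge in
        pthread_cond_wait( &e->cond, &e->lock );
    pthread_mutex_unlock( &e->lock );
}

void event_set( Event * e ) {
    pthread_mutex_lock( &e->lock );
    e->ready = true;
    pthread_cond_signal( &e->cond );      // hint only: waiter still re-tests
    pthread_mutex_unlock( &e->lock );
}
```

The @while@ loop is exactly the extra unblock/re-test cycle the urgent queue and barging prevention remove.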
1825 1826 \begin{figure} 1827 \centering 1828 % \subfloat[Scheduling Statements] { 1829 % \label{fig:SchedulingStatements} 1830 % {\resizebox{0.45\textwidth}{!}{\input{CondSigWait.pstex_t}}} 1831 \input{CondSigWait.pstex_t} 1832 % }% subfloat 1833 % \quad 1834 % \subfloat[Bulk acquire monitor] { 1835 % \label{fig:BulkMonitor} 1836 % {\resizebox{0.45\textwidth}{!}{\input{ext_monitor.pstex_t}}} 1837 % }% subfloat 1838 \caption{Monitor Scheduling} 1839 \label{f:MonitorScheduling} 1840 \end{figure} 1841 1842 Figure~\ref{f:BBInt} shows a \CFA generic bounded-buffer with internal scheduling, where producers/consumers enter the monitor, see the buffer is full/empty, and block on an appropriate condition variable, @full@/@empty@. 1861 1843 The @wait@ function atomically blocks the calling thread and implicitly releases the monitor lock(s) for all monitors in the function's parameter list. 1862 The appropriate condition lock is signalled to unblock an opposite kind of thread after an element is inserted/removed from the buffer. 1863 Signalling is unconditional, because signalling an empty condition lock does nothing. 1864 1865 Signalling semantics cannot have the signaller and signalled thread in the monitor simultaneously, which means: 1866 \begin{enumerate} 1867 \item 1868 The signalling thread returns immediately, and the signalled thread continues. 1869 \item 1870 The signalling thread continues and the signalled thread is marked for urgent unblocking at the next scheduling point (exit/wait). 1871 \item 1872 The signalling thread blocks but is marked for urgrent unblocking at the next scheduling point and the signalled thread continues. 1873 \end{enumerate} 1874 The first approach is too restrictive, as it precludes solving a reasonable class of problems, \eg dating service (see Figure~\ref{f:DatingService}). 1875 \CFA supports the next two semantics as both are useful. 
1876 Finally, while it is common to store a @condition@ as a field of the monitor, in \CFA, a @condition@ variable can be created/stored independently. 1877 Furthermore, a condition variable is tied to a \emph{group} of monitors on first use, called \newterm{branding}, which means that using internal scheduling with distinct sets of monitors requires one condition variable per set of monitors. 1844 The appropriate condition variable is signalled to unblock an opposite kind of thread after an element is inserted/removed from the buffer. 1845 Signalling is unconditional, because signalling an empty condition variable does nothing. 1846 It is common to declare condition variables as monitor fields to prevent shared access, hence no locking is required for access as the conditions are protected by the monitor lock. 1847 In \CFA, a condition variable can be created/stored independently. 1848 To still prevent expensive locking on access, a condition variable is tied to a \emph{group} of monitors on first use, called \newterm{branding}, resulting in a low-cost boolean test to detect sharing from other monitors. 1849 1850 % Signalling semantics cannot have the signaller and signalled thread in the monitor simultaneously, which means: 1851 % \begin{enumerate} 1852 % \item 1853 % The signalling thread returns immediately and the signalled thread continues. 1854 % \item 1855 % The signalling thread continues and the signalled thread is marked for urgent unblocking at the next scheduling point (exit/wait). 1856 % \item 1857 % The signalling thread blocks but is marked for urgent unblocking at the next scheduling point and the signalled thread continues. 1858 % \end{enumerate} 1859 % The first approach is too restrictive, as it precludes solving a reasonable class of problems, \eg dating service (see Figure~\ref{f:DatingService}). 1860 % \CFA supports the next two semantics as both are useful.
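For comparison, the bounded buffer with internal scheduling can be approximated in C with POSIX mutex/condition variables; note the @while@ loops that pthreads' barging/spurious-wakeup semantics force, where the \CFA monitor version uses a simple @if@ before @wait@. (A sketch with assumed names, not the paper's implementation.)

```c
#include <pthread.h>
#include <assert.h>

#define N 10
typedef struct {
    pthread_mutex_t lock;          // the monitor lock, explicit in C
    pthread_cond_t full, empty;    // two condition queues
    int elements[N];
    int front, back, count;
} Buffer;

void buffer_init( Buffer * b ) {
    pthread_mutex_init( &b->lock, NULL );
    pthread_cond_init( &b->full, NULL );
    pthread_cond_init( &b->empty, NULL );
    b->front = b->back = b->count = 0;
}

void insert( Buffer * b, int elem ) {
    pthread_mutex_lock( &b->lock );
    while ( b->count == N )                       // CFA: if + barging-free wait
        pthread_cond_wait( &b->empty, &b->lock );
    b->elements[b->back] = elem;
    b->back = (b->back + 1) % N;
    b->count += 1;
    pthread_cond_signal( &b->full );              // unconditional: no-op if no waiter
    pthread_mutex_unlock( &b->lock );
}

int remove_( Buffer * b ) {                       // `remove` clashes with stdio.h
    pthread_mutex_lock( &b->lock );
    while ( b->count == 0 )
        pthread_cond_wait( &b->full, &b->lock );
    int elem = b->elements[b->front];
    b->front = (b->front + 1) % N;
    b->count -= 1;
    pthread_cond_signal( &b->empty );
    pthread_mutex_unlock( &b->lock );
    return elem;                                  // copy returned after unlock
}
```

Here the programmer manually associates the two conditions with the one lock; branding performs the analogous association automatically on first use.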
1878 1861 1879 1862 \begin{figure} … … 1889 1872 }; 1890 1873 void ?{}( Buffer(T) & buffer ) with(buffer) { 1891 [front, back, count]= 0;1874 front = back = count = 0; 1892 1875 } 1893 1894 1876 void insert( Buffer(T) & mutex buffer, T elem ) 1895 1877 with(buffer) { … … 1908 1890 \end{lrbox} 1909 1891 1892 % \newbox\myboxB 1893 % \begin{lrbox}{\myboxB} 1894 % \begin{cfa}[aboveskip=0pt,belowskip=0pt] 1895 % forall( otype T ) { // distribute forall 1896 % monitor Buffer { 1897 % 1898 % int front, back, count; 1899 % T elements[10]; 1900 % }; 1901 % void ?{}( Buffer(T) & buffer ) with(buffer) { 1902 % [front, back, count] = 0; 1903 % } 1904 % T remove( Buffer(T) & mutex buffer ); // forward 1905 % void insert( Buffer(T) & mutex buffer, T elem ) 1906 % with(buffer) { 1907 % if ( count == 10 ) `waitfor( remove, buffer )`; 1908 % // insert elem into buffer 1909 % 1910 % } 1911 % T remove( Buffer(T) & mutex buffer ) with(buffer) { 1912 % if ( count == 0 ) `waitfor( insert, buffer )`; 1913 % // remove elem from buffer 1914 % 1915 % return elem; 1916 % } 1917 % } 1918 % \end{cfa} 1919 % \end{lrbox} 1920 1910 1921 \newbox\myboxB 1911 1922 \begin{lrbox}{\myboxB} 1912 1923 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 1913 forall( otype T ) { // distribute forall 1914 monitor Buffer { 1915 1916 int front, back, count; 1917 T elements[10]; 1918 }; 1919 void ?{}( Buffer(T) & buffer ) with(buffer) { 1920 [front, back, count] = 0; 1921 } 1922 T remove( Buffer(T) & mutex buffer ); // forward 1923 void insert( Buffer(T) & mutex buffer, T elem ) 1924 with(buffer) { 1925 if ( count == 10 ) `waitfor( remove, buffer )`; 1926 // insert elem into buffer 1927 1928 } 1929 T remove( Buffer(T) & mutex buffer ) with(buffer) { 1930 if ( count == 0 ) `waitfor( insert, buffer )`; 1931 // remove elem from buffer 1932 1933 return elem; 1934 } 1935 } 1924 monitor ReadersWriter { 1925 int rcnt, wcnt; // readers/writer using resource 1926 }; 1927 void ?{}( ReadersWriter & rw ) with(rw) { 1928 rcnt = wcnt 
= 0; 1929 } 1930 void EndRead( ReadersWriter & mutex rw ) with(rw) { 1931 rcnt -= 1; 1932 } 1933 void EndWrite( ReadersWriter & mutex rw ) with(rw) { 1934 wcnt = 0; 1935 } 1936 void StartRead( ReadersWriter & mutex rw ) with(rw) { 1937 if ( wcnt > 0 ) `waitfor( EndWrite, rw );` 1938 rcnt += 1; 1939 } 1940 void StartWrite( ReadersWriter & mutex rw ) with(rw) { 1941 if ( wcnt > 0 ) `waitfor( EndWrite, rw );` 1942 else while ( rcnt > 0 ) `waitfor( EndRead, rw );` 1943 wcnt = 1; 1944 } 1945 1936 1946 \end{cfa} 1937 1947 \end{lrbox} 1938 1948 1939 \subfloat[Internal scheduling]{\label{f:BBInt}\usebox\myboxA} 1940 %\qquad 1941 \subfloat[External scheduling]{\label{f:BBExt}\usebox\myboxB} 1942 \caption{Generic bounded-buffer} 1943 \label{f:GenericBoundedBuffer} 1949 \subfloat[Generic bounded buffer, internal scheduling]{\label{f:BBInt}\usebox\myboxA} 1950 \hspace{3pt} 1951 \vrule 1952 \hspace{3pt} 1953 \subfloat[Readers / writer lock, external scheduling]{\label{f:RWExt}\usebox\myboxB} 1954 1955 \caption{Internal / external scheduling} 1956 \label{f:InternalExternalScheduling} 1944 1957 \end{figure} 1945 1958 1946 Figure~\ref{f:BBExt} shows a \CFA generic bounded-buffer with external scheduling, where producers/consumers detecting a full/\-empty buffer block and prevent more producers/consumers from entering the monitor until there is a free/empty slot in the buffer. 1959 Figure~\ref{f:BBInt} can be transformed into external scheduling by removing the condition variables and signals/waits, and adding the following lines at the locations of the current @wait@s in @insert@/@remove@, respectively. 1960 \begin{cfa}[aboveskip=2pt,belowskip=1pt] 1961 if ( count == 10 ) `waitfor( remove, buffer )`; | if ( count == 0 ) `waitfor( insert, buffer )`; 1962 \end{cfa} 1963 Here, the producers/consumers detect a full/\-empty buffer and prevent more producers/consumers from entering the monitor until there is a free/empty slot in the buffer.
1947 1964 External scheduling is controlled by the @waitfor@ statement, which atomically blocks the calling thread, releases the monitor lock, and restricts the function calls that can next acquire mutual exclusion. 1948 1965 If the buffer is full, only calls to @remove@ can acquire the buffer, and if the buffer is empty, only calls to @insert@ can acquire the buffer. 1949 Threads making calls to functions that are currently excluded, block outside of (external to) the monitor on a calling queue, versus blocking on condition queues inside of (internal to) the monitor. 1950 External scheduling allows users to wait for events from other threads without concern of unrelated events occurring. 1966 Threads making calls to functions that are currently excluded block outside of (external to) the monitor on the calling queue, versus blocking on condition queues inside of (internal to) the monitor. 1967 Figure~\ref{f:RWExt} shows a readers/writer lock written using external scheduling, where a waiting reader detects a writer using the resource and restricts further calls until the writer exits by calling @EndWrite@. 1968 The writer does a similar action for each reader or writer using the resource. 1969 Note, no new calls to @StartRead@/@StartWrite@ may occur when waiting for the call to @EndRead@/@EndWrite@. 1970 External scheduling allows waiting for events from other threads while restricting unrelated events. 1951 1971 The mechanism can be done in terms of control flow, \eg Ada @accept@ or \uC @_Accept@, or in terms of data, \eg Go @select@ on channels. 1952 While both mechanisms have strengths and weaknesses, this project uses a control-flow mechanism to stay consistent with other language semantics. 1953 Two challenges specific to \CFA for external scheduling are loose object-definitions (see Section~\ref{s:LooseObjectDefinitions}) and multiple-monitor functions (see Section~\ref{s:Multi-MonitorScheduling}).
1954 1955 For internal scheduling, non-blocking signalling (as in the producer/consumer example) is used when the signaller is providing the cooperation for a waiting thread; 1956 the signaller enters the monitor and changes state, detects a waiting threads that can use the state, performs a non-blocking signal on the condition queue for the waiting thread, and exits the monitor to run concurrently. 1957 The waiter unblocks next from the urgent queue, uses/takes the state, and exits the monitor. 1958 Blocking signalling is the reverse, where the waiter is providing the cooperation for the signalling thread; 1959 the signaller enters the monitor, detects a waiting thread providing the necessary state, performs a blocking signal to place it on the urgent queue and unblock the waiter. 1960 The waiter changes state and exits the monitor, and the signaller unblocks next from the urgent queue to use/take the state. 1972 While both mechanisms have strengths and weaknesses, this project uses the control-flow mechanism to be consistent with other language features. 1973 % Two challenges specific to \CFA for external scheduling are loose object-definitions (see Section~\ref{s:LooseObjectDefinitions}) and multiple-monitor functions (see Section~\ref{s:Multi-MonitorScheduling}). 1961 1974 1962 1975 Figure~\ref{f:DatingService} shows a dating service demonstrating non-blocking and blocking signalling. 
… … 1981 1994 int GirlPhNo, BoyPhNo; 1982 1995 condition Girls[CCodes], Boys[CCodes]; 1983 condition exchange; 1996 `condition exchange;` 1984 1997 }; 1985 1998 int girl( DS & mutex ds, int phNo, int ccode ) { … … 1992 2005 `signal( Boys[ccode] );` 1993 2006 `wait( exchange );` 1994 } // if 2007 } 1995 2008 return BoyPhNo; 1996 2009 } … … 2035 2048 \end{figure} 2036 2050 In summation, for internal scheduling, non-blocking signalling (as in the producer/consumer example) is used when the signaller is providing the cooperation for a waiting thread; 2051 the signaller enters the monitor and changes state, detects a waiting thread that can use the state, performs a non-blocking signal on the condition queue for the waiting thread, and exits the monitor to run concurrently. 2052 The waiter unblocks next from the urgent queue, uses/takes the state, and exits the monitor. 2053 Blocking signalling is the reverse, where the waiter is providing the cooperation for the signalling thread; 2054 the signaller enters the monitor, detects a waiting thread providing the necessary state, performs a blocking signal to place it on the urgent queue and unblock the waiter. 2055 The waiter changes state and exits the monitor, and the signaller unblocks next from the urgent queue to use/take the state. 2056 2037 2057 Both internal and external scheduling extend to multiple monitors in a natural way. 2038 2058 \begin{cquote} … … 2057 2077 \end{tabular} 2058 2078 \end{cquote} 2059 For @wait( e )@, the default semantics is to atomically block the signaller and release all acquired mutex types in the parameter list, \ie @wait( e, m1, m2 )@. 2079 For @wait( e )@, the default semantics is to atomically block the signaller and release all acquired mutex parameters, \ie @wait( e, m1, m2 )@. 2060 2080 To override the implicit multi-monitor wait, specific mutex parameter(s) can be specified, \eg @wait( e, m1 )@.
2061 2081 Wait statically verifies the released monitors are the acquired mutex-parameters so unconditional release is safe. 2062 2082 While \CC supports bulk locking, @wait@ only accepts a single lock for a condition variable, so bulk locking with condition variables is asymmetric. 2063 2083 Finally, a signaller, 2084 \newpage 2064 2085 \begin{cfa} 2065 2086 void baz( M & mutex m1, M & mutex m2 ) { … … 2069 2090 must have acquired at least the same locks as the waiting thread signalled from the condition queue. 2070 2091 2071 Similarly, for @waitfor( rtn )@, the default semantics is to atomically block the acceptor and release all acquired mutex types in the parameter list, \ie @waitfor( rtn, m1, m2 )@. 2092 Similarly, for @waitfor( rtn )@, the default semantics is to atomically block the acceptor and release all acquired mutex parameters, \ie @waitfor( rtn, m1, m2 )@. 2072 2093 To override the implicit multi-monitor wait, specific mutex parameter(s) can be specified, \eg @waitfor( rtn, m1 )@. 2073 2094 @waitfor@ statically verifies the released monitors are the same as the acquired mutex-parameters of the given function or function pointer. … … 2092 2113 While deadlock can occur with multiple/nesting acquisition, this is a consequence of locks, and by extension monitors, not being perfectly composable. 2093 2114 2094 % Supporting barging prevention as well as extending internal scheduling to multiple monitors is the main source of complexity in the design and implementation of \CFA concurrency. 2095 2096 2097 \subsection{Barging Prevention} 2098 2099 Figure~\ref{f:BargingPrevention} shows \CFA code where bulk acquire adds complexity to the internal-signalling semantics. 2115 2116 2117 \subsection{Extended \protect\lstinline@waitfor@} 2118 2119 Figure~\ref{f:ExtendedWaitfor} shows the extended form of the @waitfor@ statement to conditionally accept one of a group of mutex functions, with an optional statement to be performed \emph{after} the mutex function finishes.
2120 For a @waitfor@ clause to be executed, its @when@ must be true and an outstanding call to its corresponding member(s) must exist. 2121 The \emph{conditional-expression} of a @when@ may call a function, but the function must not block or context switch. 2122 If there are multiple acceptable mutex calls, selection occurs top-to-bottom (prioritized) among the @waitfor@ clauses, whereas some programming languages with similar mechanisms accept nondeterministically for this case, \eg Go \lstinline[morekeywords=select]@select@. 2123 If some accept guards are true and there are no outstanding calls to these members, the acceptor is accept-blocked until a call to one of these members is made. 2124 If there is a @timeout@ clause, it provides an upper bound on waiting. 2125 If all the accept guards are false, the statement does nothing, unless there is a terminating @else@ clause with a true guard, which is executed instead. 2126 Hence, the terminating @else@ clause allows a conditional attempt to accept a call without blocking. 2127 If both @timeout@ and @else@ clauses are present, the @else@ must be conditional, or the @timeout@ is never triggered. 2128 2129 \begin{figure} 2130 \centering 2131 \begin{cfa} 2132 `when` ( $\emph{conditional-expression}$ ) $\C{// optional guard}$ 2133 waitfor( $\emph{mutex-member-name}$ ) $\emph{statement}$ $\C{// action after call}$ 2134 `or` `when` ( $\emph{conditional-expression}$ ) $\C{// any number of functions}$ 2135 waitfor( $\emph{mutex-member-name}$ ) $\emph{statement}$ 2136 `or` ...
2137 `when` ( $\emph{conditional-expression}$ ) $\C{// optional guard}$ 2138 `timeout` $\emph{statement}$ $\C{// optional terminating timeout clause}$ 2139 `when` ( $\emph{conditional-expression}$ ) $\C{// optional guard}$ 2140 `else` $\emph{statement}$ $\C{// optional terminating clause}$ 2141 \end{cfa} 2142 \caption{Extended \protect\lstinline@waitfor@} 2143 \label{f:ExtendedWaitfor} 2144 \end{figure} 2145 2146 Note, a group of conditional @waitfor@ clauses is \emph{not} the same as a group of @if@ statements, e.g.: 2147 \begin{cfa} 2148 if ( C1 ) waitfor( mem1 ); when ( C1 ) waitfor( mem1 ); 2149 else if ( C2 ) waitfor( mem2 ); or when ( C2 ) waitfor( mem2 ); 2150 \end{cfa} 2151 The left example only accepts @mem1@ if @C1@ is true or only @mem2@ if @C2@ is true. 2152 The right example accepts either @mem1@ or @mem2@ if @C1@ and @C2@ are true. 2153 2154 An interesting use of @waitfor@ is accepting the @mutex@ destructor to know when an object is deallocated. 2155 \begin{cfa} 2156 void insert( Buffer(T) & mutex buffer, T elem ) with( buffer ) { 2157 if ( count == 10 ) 2158 waitfor( remove, buffer ) { 2159 // insert elem into buffer 2160 } or `waitfor( ^?{}, buffer )` throw insertFail; 2161 } 2162 \end{cfa} 2163 When the buffer is deallocated, the current waiter is unblocked and informed, so it can perform an appropriate action. 2164 However, the basic @waitfor@ semantics do not support this functionality, since using an object after its destructor is called is undefined. 2165 Therefore, to make this useful capability work, the semantics for accepting the destructor is the same as @signal@, \ie the call to the destructor is placed on the urgent queue and the acceptor continues execution, which throws an exception to the acceptor and then the caller is unblocked from the urgent queue to deallocate the object. 2166 Accepting the destructor is an idiomatic way to terminate a thread in \CFA. 
2167 2168 2169 \subsection{Bulk Barging Prevention} 2170 2171 Figure~\ref{f:BulkBargingPrevention} shows \CFA code where bulk acquire adds complexity to the internal-signalling semantics. 2100 2172 The complexity begins at the end of the inner @mutex@ statement, where the semantics of internal scheduling need to be extended for multiple monitors. 2101 2173 The problem is that bulk acquire is used in the inner @mutex@ statement where one of the monitors is already acquired. 2102 2174 When the signalling thread reaches the end of the inner @mutex@ statement, it should transfer ownership of @m1@ and @m2@ to the waiting threads to prevent barging into the outer @mutex@ statement by another thread. 2103 However, both the signalling and waiting thread W1 still need monitor @m1@. 2175 However, both the signalling and waiting threads W1 and W2 need some subset of monitors @m1@ and @m2@. 2176 \begin{cquote} 2177 condition c: (order 1) W2(@m2@), W1(@m1@,@m2@)\ \ \ or\ \ \ (order 2) W1(@m1@,@m2@), W2(@m2@) \\ 2178 S: acq. @m1@ $\rightarrow$ acq. @m1,m2@ $\rightarrow$ @signal(c)@ $\rightarrow$ rel. @m2@ $\rightarrow$ pass @m2@ unblock W2 (order 2) $\rightarrow$ rel. @m1@ $\rightarrow$ pass @m1,m2@ unblock W1 \\ 2179 \hspace*{2.75in}$\rightarrow$ rel. @m1@ $\rightarrow$ pass @m1,m2@ unblock W1 (order 1) 2180 \end{cquote} 2104 2181 2105 2182 \begin{figure} … … 2113 2190 mutex( m1, m2 ) { // $\LstCommentStyle{\color{red}inner}$ 2114 2191 ... `signal( c )`; ... 2115 // m1, m2 acquired2192 // m1, m2 still acquired 2116 2193 } // $\LstCommentStyle{\color{red}release m2}$ 2117 2194 // m1 acquired … … 2128 2205 ... 2129 2206 mutex( m1, m2 ) { 2130 ... `wait( c )`; // block andrelease m1, m22131 // m1, m2 acquired2207 ... `wait( c )`; // release m1, m2 2208 // m1, m2 reacquired 2132 2209 } // $\LstCommentStyle{\color{red}release m2}$ 2133 2210 // m1 acquired … … 2142 2219 2143 2220 mutex( m2 ) { 2144 ... `wait( c )`; ...2145 // m2 acquired2221 ... 
`wait( c )`; // release m2 2222 // m2 reacquired 2146 2223 } // $\LstCommentStyle{\color{red}release m2}$ 2147 2224 … … 2153 2230 2154 2231 \begin{cquote} 2155 \subfloat[Signalling Thread ]{\label{f:SignallingThread}\usebox\myboxA}2232 \subfloat[Signalling Thread (S)]{\label{f:SignallingThread}\usebox\myboxA} 2156 2233 \hspace{3\parindentlnth} 2157 2234 \subfloat[Waiting Thread (W1)]{\label{f:WaitingThread}\usebox\myboxB} … … 2159 2236 \subfloat[Waiting Thread (W2)]{\label{f:OtherWaitingThread}\usebox\myboxC} 2160 2237 \end{cquote} 2161 \caption{B arging Prevention}2162 \label{f:B argingPrevention}2238 \caption{Bulk Barging Prevention} 2239 \label{f:BulkBargingPrevention} 2163 2240 \end{figure} 2164 2241 2165 One scheduling solution is for the signaller to keep ownership of all locks until the last lock is ready to be transferred, because this semantics fits most closely to the behaviour of single-monitor scheduling. 2166 However, Figure~\ref{f:OtherWaitingThread} shows this solution is complex depending on other waiters, resulting in options when the signaller finishes the inner mutex-statement. 2167 The signaller can retain @m2@ until completion of the outer mutex statement and pass the locks to waiter W1, or it can pass @m2@ to waiter W2 after completing the inner mutex-statement, while continuing to hold @m1@. 2168 In the latter case, waiter W2 must eventually pass @m2@ to waiter W1, which is complex because W1 may have waited before W2, so W2 is unaware of it. 2169 Furthermore, there is an execution sequence where the signaller always finds waiter W2, and hence, waiter W1 starves. 2170 2171 While a number of approaches were examined~\cite[\S~4.3]{Delisle18}, the solution chosen for \CFA is a novel techique called \newterm{partial signalling}. 2172 Signalled threads are moved to the urgent queue and the waiter at the front defines the set of monitors necessary for it to unblock. 2173 Partial signalling transfers ownership of monitors to the front waiter. 
2174 When the signaller thread exits or waits in the monitor, the front waiter is unblocked if all its monitors are released. 2175 The benefit is encapsulating complexity into only two actions: passing monitors to the next owner when they should be released and conditionally waking threads if all conditions are met. 2242 One scheduling solution is for the signaller S to keep ownership of all locks until the last lock is ready to be transferred, because this semantics fits most closely to the behaviour of single-monitor scheduling. 2243 However, this solution is inefficient if W2 waited first and can be immediately passed @m2@ when released, while S retains @m1@ until completion of the outer mutex statement. 2244 If W1 waited first, the signaller must retain @m1@ and @m2@ until completion of the outer mutex statement and then pass both to W1. 2245 % Furthermore, there is an execution sequence where the signaller always finds waiter W2, and hence, waiter W1 starves. 2246 To support this efficient semantics (and prevent barging), the implementation maintains a list of monitors acquired for each blocked thread. 2247 When a signaller exits or waits in a monitor function/statement, the front waiter on urgent is unblocked if all its monitors are released. 2248 Implementing a fast subset check for the necessary released monitors is important. 2249 % The benefit is encapsulating complexity into only two actions: passing monitors to the next owner when they should be released and conditionally waking threads if all conditions are met. 2176 2250 2177 2251 … … 2180 2254 2181 2255 In an object-oriented programming-language, a class includes an exhaustive list of operations. 2182 However, new members can be added via static inheritance or dynamic members, \eg JavaScript~\cite{JavaScript}. 2183 Similarly, monitor functions can be added at any time in \CFA, making it less clear for programmers and more difficult to implement. 2184 \begin{cfa} 2185 monitor M { ... 
}; 2186 void `f`( M & mutex m ); 2187 void g( M & mutex m ) { waitfor( `f` ); } $\C{// clear which f}$ 2188 void `f`( M & mutex m, int ); $\C{// different f}$ 2189 void h( M & mutex m ) { waitfor( `f` ); } $\C{// unclear which f}$ 2190 \end{cfa} 2191 Hence, the \CFA code for entering a monitor looks like: 2192 \begin{cfa} 2193 if ( $\textrm{\textit{monitor is free}}$ ) $\C{// enter}$ 2194 else if ( $\textrm{\textit{already own monitor}}$ ) $\C{// continue}$ 2195 else if ( $\textrm{\textit{monitor accepts me}}$ ) $\C{// enter}$ 2196 else $\C{// block}$ 2197 \end{cfa} 2198 For the first two conditions, it is easy to implement a check that can evaluate the condition in a few instructions. 2199 However, a fast check for \emph{monitor accepts me} is much harder to implement depending on the constraints put on the monitors. 2200 Figure~\ref{fig:ClassicalMonitor} shows monitors are often expressed as an entry (calling) queue, some acceptor queues, and an urgent stack/queue. 2201 2202 \begin{figure} 2203 \centering 2204 \subfloat[Classical monitor] { 2205 \label{fig:ClassicalMonitor} 2206 {\resizebox{0.45\textwidth}{!}{\input{monitor.pstex_t}}} 2207 }% subfloat 2208 \quad 2209 \subfloat[Bulk acquire monitor] { 2210 \label{fig:BulkMonitor} 2211 {\resizebox{0.45\textwidth}{!}{\input{ext_monitor.pstex_t}}} 2212 }% subfloat 2213 \caption{Monitor implementation} 2214 \label{f:MonitorImplementation} 2215 \end{figure} 2216 2217 For a fixed (small) number of mutex functions (\eg 128), the accept check reduces to a bitmask of allowed callers, which can be checked with a single instruction. 2218 This approach requires a unique dense ordering of functions with a small upper-bound and the ordering must be consistent across translation units. 2219 For object-oriented languages these constraints are common, but \CFA mutex functions can be added in any scope and are only visible in certain translation unit, precluding program-wide dense-ordering among mutex functions. 
2220 2221 Figure~\ref{fig:BulkMonitor} shows the \CFA monitor implementation. 2222 The mutex function called is associated with each thread on the entry queue, while a list of acceptable functions is kept separately. 2223 The accepted list is a variable-sized array of accepted function pointers, so the single instruction bitmask comparison is replaced by dereferencing a pointer followed by a (usually short) linear search. 2256 A new class can add members via static inheritance but the subclass still has an exhaustive list of operations. 2257 (Dynamic member adding, \eg JavaScript~\cite{JavaScript}, is not considered.) 2258 In the object-oriented scenario, the type and all its operators are always present at compilation (even separate compilation), so it is possible to number the operations in a bit mask and use an $O(1)$ compare with a similar bit mask created for the operations specified in a @waitfor@. 2259 2260 In \CFA, monitor functions can be statically added/removed in translation units, so it is impossible to apply an $O(1)$ approach. 2261 \begin{cfa} 2262 monitor M { ... }; // common type, included in .h file 2263 translation unit 1 2264 void `f`( M & mutex m ); 2265 void g( M & mutex m ) { waitfor( `f`, m ); } 2266 translation unit 2 2267 void `f`( M & mutex m ); 2268 void `g`( M & mutex m ); 2269 void h( M & mutex m ) { waitfor( `f`, m ) or waitfor( `g`, m ); } 2270 \end{cfa} 2271 The @waitfor@ statements in each translation unit cannot form a unique bit-mask because the monitor type does not carry that information. 2272 Hence, function pointers are used to identify the functions listed in the @waitfor@ statement, stored in a variable-sized array. 2273 Then, the same implementation approach used for the urgent stack is used for the calling queue.
2274 Each caller has a list of monitors acquired, and the @waitfor@ statement performs a (usually short) linear search matching functions in the @waitfor@ list with called functions, and then verifies the associated mutex locks can be transferred. 2275 (A possible way to construct a dense mapping is at link or load-time.) 2224 2276 2225 2277 … … 2292 2344 \label{f:UnmatchedMutexSets} 2293 2345 \end{figure} 2294 2295 2296 \subsection{Extended \protect\lstinline@waitfor@} 2297 2298 Figure~\ref{f:ExtendedWaitfor} show the extended form of the @waitfor@ statement to conditionally accept one of a group of mutex functions, with a specific action to be performed \emph{after} the mutex function finishes. 2299 For a @waitfor@ clause to be executed, its @when@ must be true and an outstanding call to its corresponding member(s) must exist. 2300 The \emph{conditional-expression} of a @when@ may call a function, but the function must not block or context switch. 2301 If there are multiple acceptable mutex calls, selection occurs top-to-bottom (prioritized) in the @waitfor@ clauses, whereas some programming languages with similar mechanisms accept nondeterministically for this case, \eg Go \lstinline[morekeywords=select]@select@. 2302 If some accept guards are true and there are no outstanding calls to these members, the acceptor is accept-blocked until a call to one of these members is made. 2303 If all the accept guards are false, the statement does nothing, unless there is a terminating @else@ clause with a true guard, which is executed instead. 2304 Hence, the terminating @else@ clause allows a conditional attempt to accept a call without blocking. 2305 If there is a @timeout@ clause, it provides an upper bound on waiting. 2306 If both a @timeout@ clause and an @else@ clause are present, the @else@ must be conditional, or the @timeout@ is never triggered. 2307 In all cases, the statement following is executed \emph{after} a clause is executed to know which of the clauses executed. 2308 
2309 \begin{figure}2310 \centering2311 \begin{cfa}2312 `when` ( $\emph{conditional-expression}$ ) $\C{// optional guard}$2313 waitfor( $\emph{mutex-member-name}$ )2314 $\emph{statement}$ $\C{// action after call}$2315 `or` `when` ( $\emph{conditional-expression}$ ) $\C{// optional guard}$2316 waitfor( $\emph{mutex-member-name}$ )2317 $\emph{statement}$ $\C{// action after call}$2318 `or` ... $\C{// list of waitfor clauses}$2319 `when` ( $\emph{conditional-expression}$ ) $\C{// optional guard}$2320 `timeout` $\C{// optional terminating timeout clause}$2321 $\emph{statement}$ $\C{// action after timeout}$2322 `when` ( $\emph{conditional-expression}$ ) $\C{// optional guard}$2323 `else` $\C{// optional terminating clause}$2324 $\emph{statement}$ $\C{// action when no immediate calls}$2325 \end{cfa}2326 \caption{Extended \protect\lstinline@waitfor@}2327 \label{f:ExtendedWaitfor}2328 \end{figure}2329 2330 Note, a group of conditional @waitfor@ clauses is \emph{not} the same as a group of @if@ statements, e.g.:2331 \begin{cfa}2332 if ( C1 ) waitfor( mem1 ); when ( C1 ) waitfor( mem1 );2333 else if ( C2 ) waitfor( mem2 ); or when ( C2 ) waitfor( mem2 );2334 \end{cfa}2335 The left example only accepts @mem1@ if @C1@ is true or only @mem2@ if @C2@ is true.2336 The right example accepts either @mem1@ or @mem2@ if @C1@ and @C2@ are true.2337 2338 An interesting use of @waitfor@ is accepting the @mutex@ destructor to know when an object is deallocated.2339 \begin{cfa}2340 void insert( Buffer(T) & mutex buffer, T elem ) with( buffer ) {2341 if ( count == 10 )2342 waitfor( remove, buffer ) {2343 // insert elem into buffer2344 } or `waitfor( ^?{}, buffer )` throw insertFail;2345 }2346 \end{cfa}2347 When the buffer is deallocated, the current waiter is unblocked and informed, so it can perform an appropriate action.2348 However, the basic @waitfor@ semantics do not support this functionality, since using an object after its destructor is called is undefined.2349 Therefore, to make 
this useful capability work, the semantics for accepting the destructor is the same as @signal@, \ie the call to the destructor is placed on the urgent queue and the acceptor continues execution, which throws an exception to the acceptor and then the caller is unblocked from the urgent queue to deallocate the object.2350 Accepting the destructor is an idiomatic way to terminate a thread in \CFA.2351 2346 2352 2347 … … 2489 2484 For situations where threads do not require direct communication, case 9 provides faster creation/destruction by eliminating @mutex@ setup. 2490 2485 2491 \begin{table} [b]2486 \begin{table} 2492 2487 \caption{Object property composition} 2493 2488 \centering … … 2503 2498 No & No & \textbf{1}\ \ \ aggregate type & \textbf{2}\ \ \ @monitor@ aggregate type \\ 2504 2499 \hline 2505 No & Yes (stackless) & \textbf{3}\ \ \ @generator@ & \textbf{4}\ \ \ @monitor@ generator\\2500 No & Yes (stackless) & \textbf{3}\ \ \ @generator@ & \textbf{4}\ \ \ @monitor@ @generator@ \\ 2506 2501 \hline 2507 2502 No & Yes (stackful) & \textbf{5}\ \ \ @coroutine@ & \textbf{6}\ \ \ @monitor@ @coroutine@ \\ … … 2520 2515 2521 2516 2522 \section{Parallelism} 2523 \label{s:Parallelism} 2524 2525 Historically, computer performance was about processor speeds. 2526 However, with heat dissipation being a direct consequence of speed increase, parallelism is the new source for increased performance~\cite{Sutter05, Sutter05b}. 2527 Therefore, high-performance applications must care about parallelism, which requires concurrency. 2528 The lowest-level approach of parallelism is to use \newterm{kernel threads} in combination with semantics like @fork@, @join@, \etc. 2529 However, kernel threads are better as an implementation tool because of complexity and higher cost. 2530 Therefore, different abstractions are often layered onto kernel threads to simplify them, \eg pthreads. 
2531 2532 2533 \subsection{User Threads} 2534 2535 A direct improvement on kernel threads is user threads, \eg Erlang~\cite{Erlang} and \uC~\cite{uC++book}. 2536 This approach provides an interface that matches the language paradigms, gives more control over concurrency by the language runtime, and an abstract (and portable) interface to the underlying kernel threads across operating systems. 2537 In many cases, user threads can be used on a much larger scale (100,000 threads). 2538 Like kernel threads, user threads support preemption, which maximizes nondeterminism, but increases the potential for concurrency errors: race, livelock, starvation, and deadlock. 2539 \CFA adopts user-threads as they represent the truest realization of concurrency and can build any other concurrency approach, \eg thread pools and actors~\cite{Actors}. 2540 2541 A variant of user thread is \newterm{fibres}, which removes preemption, \eg Go~\cite{Go} @goroutine@s. 2542 Like functional programming, which removes mutation and its associated problems, removing preemption from concurrency reduces nondeterminism, making race and deadlock errors more difficult to generate. 2543 However, preemption is necessary for concurrency that relies on spinning, so there are a class of problems that cannot be programmed with fibres. 2544 2545 2546 \subsection{Thread Pools} 2547 2548 In contrast to direct threading is indirect \newterm{thread pools}, \eg Java @executor@, where small jobs (work units) are inserted into a work pool for execution. 2549 If the jobs are dependent, \ie interact, there is an implicit/explicit dependency graph that ties them together. 2550 While removing direct concurrency, and hence the amount of context switching, thread pools significantly limit the interaction that can occur among jobs. 2551 Indeed, jobs should not block because that also blocks the underlying thread, which effectively means the CPU utilization, and therefore throughput, suffers. 
2552 While it is possible to tune the thread pool with sufficient threads, it becomes difficult to obtain high throughput and good core utilization as job interaction increases. 2553 As well, concurrency errors return, which threads pools are suppose to mitigate. 2517 % \section{Parallelism} 2518 % \label{s:Parallelism} 2519 % 2520 % Historically, computer performance was about processor speeds. 2521 % However, with heat dissipation being a direct consequence of speed increase, parallelism is the new source for increased performance~\cite{Sutter05, Sutter05b}. 2522 % Therefore, high-performance applications must care about parallelism, which requires concurrency. 2523 % The lowest-level approach of parallelism is to use \newterm{kernel threads} in combination with semantics like @fork@, @join@, \etc. 2524 % However, kernel threads are better as an implementation tool because of complexity and higher cost. 2525 % Therefore, different abstractions are often layered onto kernel threads to simplify them, \eg pthreads. 2526 % 2527 % 2528 % \subsection{User Threads} 2529 % 2530 % A direct improvement on kernel threads is user threads, \eg Erlang~\cite{Erlang} and \uC~\cite{uC++book}. 2531 % This approach provides an interface that matches the language paradigms, gives more control over concurrency by the language runtime, and an abstract (and portable) interface to the underlying kernel threads across operating systems. 2532 % In many cases, user threads can be used on a much larger scale (100,000 threads). 2533 % Like kernel threads, user threads support preemption, which maximizes nondeterminism, but increases the potential for concurrency errors: race, livelock, starvation, and deadlock. 2534 % \CFA adopts user-threads to provide more flexibility and a low-cost mechanism to build any other concurrency approach, \eg thread pools and actors~\cite{Actors}. 2535 % 2536 % A variant of user thread is \newterm{fibres}, which removes preemption, \eg Go~\cite{Go} @goroutine@s. 
2537 % Like functional programming, which removes mutation and its associated problems, removing preemption from concurrency reduces nondeterminism, making race and deadlock errors more difficult to generate. 2538 % However, preemption is necessary for fairness and to reduce tail-latency. 2539 % For concurrency that relies on spinning, if all cores spin the system is livelocked, whereas preemption breaks the livelock. 2540 % 2541 % 2542 % \subsection{Thread Pools} 2543 % 2544 % In contrast to direct threading is indirect \newterm{thread pools}, \eg Java @executor@, where small jobs (work units) are inserted into a work pool for execution. 2545 % If the jobs are dependent, \ie interact, there is an implicit/explicit dependency graph that ties them together. 2546 % While removing direct concurrency, and hence the amount of context switching, thread pools significantly limit the interaction that can occur among jobs. 2547 % Indeed, jobs should not block because that also blocks the underlying thread, which effectively means the CPU utilization, and therefore throughput, suffers. 2548 % While it is possible to tune the thread pool with sufficient threads, it becomes difficult to obtain high throughput and good core utilization as job interaction increases. 2549 % As well, concurrency errors return, which threads pools are suppose to mitigate. 2554 2550 2555 2551 2556 2552 \section{\protect\CFA Runtime Structure} 2553 \label{s:CFARuntimeStructure} 2557 2554 2558 2555 Figure~\ref{f:RunTimeStructure} illustrates the runtime structure of a \CFA program. … … 2571 2568 \label{s:RuntimeStructureCluster} 2572 2569 2573 A \newterm{cluster} is a collection of threads and virtual processors (abstract kernel-thread) that execute the threads (like a virtual machine).2570 A \newterm{cluster} is a collection of threads and virtual processors (abstract kernel-thread) that execute the threads from its own ready queue (like an OS). 
2574 2571 The purpose of a cluster is to control the amount of parallelism that is possible among threads, plus scheduling and other execution defaults. 2575 2572 The default cluster-scheduler is single-queue multi-server, which provides automatic load-balancing of threads on processors. 2576 However, the scheduler is pluggable, supporting alternative schedulers .2573 However, the scheduler is pluggable, supporting alternative schedulers, such as multi-queue multi-server, with work-stealing/sharing. 2577 2574 If several clusters exist, both threads and virtual processors can be explicitly migrated from one cluster to another. 2578 2575 No automatic load balancing among clusters is performed by \CFA. … … 2598 2595 2600 2597 \subsection{Debug Kernel} 2601 2602 There are two versions of the \CFA runtime kernel: debug and non-debug. 2603 The debugging version has many runtime checks and internal assertions, \eg stack (non-writable) guard page, and checks for stack overflow whenever context switches occur among coroutines and threads, which catches most stack overflows. 2604 After a program is debugged, the non-debugging version can be used to decrease space and increase performance. 2605 2606 2597 \begin{comment} 2607 2598 \section{Implementation} 2608 2599 \label{s:Implementation} … … 2626 2617 The exception to this rule is the program main, \ie the initial kernel thread that is given to any program. 2627 2618 In order to respect C expectations, the stack of the initial kernel thread is used by program main rather than the main processor, allowing it to grow dynamically as in a normal C program. 2628 2629 Finally, an important aspect for a complete threading system is preemption, which introduces extra non-determinism via transparent interleaving, rather than cooperation among threads for proper scheduling and processor fairness from long-running threads. 2630 Because preemption frequency is usually long (1 millisecond) performance cost is negligible.
2619 \end{comment} 2620 2621 2622 \subsection{Preemption} 2623 2624 Nondeterministic preemption provides fairness from long-running threads, and forces concurrent programmers to write more robust programs, rather than relying on sections of code between cooperative scheduling being atomic. 2625 A separate reason for not supporting preemption is that it significantly complicates the runtime system. 2631 2626 Preemption is normally handled by setting a count-down timer on each virtual processor. 2632 2627 When the timer expires, an interrupt is delivered, and the interrupt handler resets the count-down timer, and if the virtual processor is executing in user code, the signal handler performs a user-level context-switch, or if executing in the language runtime-kernel, the preemption is ignored or rolled forward to the point where the runtime kernel context switches back to user code. … … 2635 2630 The only issue with this approach is that signal masks from one kernel thread may be restored on another as part of returning from the signal handler; 2636 2631 therefore, the same signal mask is required for all virtual processors in a cluster. 2632 Because preemption frequency is usually long (1 millisecond), the performance cost is negligible. 2633 2634 However, on current UNIX systems: 2632 2638 However, on current Linux systems: 2639 2635 \begin{cquote} 2640 2636 A process-directed signal may be delivered to any one of the threads that does not currently have the signal blocked. 2641 2637 SIGNAL(7) - Linux Programmer's Manual 2642 2638 \end{cquote} 2643 2639 Hence, the timer-expiry signal, which is generated \emph{externally} by the UNIX kernel to the UNIX process, is delivered to any of its UNIXsubprocesses (kernel threads).2640 Hence, the timer-expiry signal, which is generated \emph{externally} by the Linux kernel to the Linux process, is delivered to any of its Linux subprocesses (kernel threads).
2645 2641 To ensure each virtual processor receives its own preemption signals, a discrete-event simulation is run on a special virtual processor, and only it sets and receives timer events. 2646 2642 Virtual processors register an expiration time with the discrete-event simulator, which is inserted in sorted order. 2647 2643 The simulation sets the count-down timer to the value at the head of the event list, and when the timer expires, all events less than or equal to the current time are processed. 2648 2644 Processing a preemption event sends an \emph{internal} @SIGUSR1@ signal to the registered virtual processor, which is always delivered to that processor. 2645 2646 2647 \subsection{Debug Kernel} 2648 2649 There are two versions of the \CFA runtime kernel: debug and non-debug. 2650 The debugging version has many runtime checks and internal assertions, \eg stack (non-writable) guard page, and checks for stack overflow whenever context switches occur among coroutines and threads, which catches most stack overflows. 2651 After a program is debugged, the non-debugging version can be used to decrease space and increase performance. 2649 2652 2650 2653 … … 2689 2692 2690 2693 All benchmarks are run using the following harness. 2691 \newpage2692 2694 \begin{cfa} 2693 2695 unsigned int N = 10_000_000; … … 2697 2699 Each benchmark is performed @N@ times, where @N@ varies depending on the benchmark; 2698 2700 the total time is divided by @N@ to obtain the average time for a benchmark. 2699 All omitted tests for other languages are functionally identical to the shown \CFA test. 2701 All omitted tests for other languages are functionally identical to the \CFA tests and available online~\cite{CforallBenchMarks}. 2702 2703 2704 \paragraph{Object Creation} 2705 2706 Object creation is measured by creating/deleting the specific kind of concurrent object. 2707 Figure~\ref{f:creation} shows the code for \CFA, with results in Table~\ref{tab:creation}. 
2708 The only note here is that the call stacks of \CFA coroutines are lazily created; therefore, without priming the coroutine to force stack creation, the creation cost is artificially low. 2709 2710 \newpage 2711 \begin{multicols}{2} 2712 \lstset{language=CFA,moredelim=**[is][\color{red}]{@}{@},deletedelim=**[is][]{`}{`}} 2713 \begin{cfa} 2714 thread MyThread {}; 2715 void main( MyThread & ) {} 2716 int main() { 2717 BENCH( for ( N ) { @MyThread m;@ } ) 2718 sout | result`ns; 2719 } 2720 \end{cfa} 2721 \captionof{figure}{\CFA object-creation benchmark} 2722 \label{f:creation} 2723 2724 \columnbreak 2725 2726 \vspace*{-16pt} 2727 \captionof{table}{Object creation comparison (nanoseconds)} 2728 \label{tab:creation} 2729 2730 \begin{tabular}[t]{@{}r*{3}{D{.}{.}{5.2}}@{}} 2731 \multicolumn{1}{@{}c}{} & \multicolumn{1}{c}{Median} & \multicolumn{1}{c}{Average} & \multicolumn{1}{c@{}}{Std Dev} \\ 2732 Pthreads & 28091 & 28073.39 & 163.1 \\ 2733 \CFA Coroutine Lazy & 6 & 6.07 & 0.26 \\ 2734 \CFA Coroutine Eager & 520 & 520.61 & 2.04 \\ 2735 \CFA Thread & 2032 & 2016.29 & 112.07 \\ 2736 \uC Coroutine & 106 & 107.36 & 1.47 \\ 2737 \uC Thread & 536.5 & 537.07 & 4.64 \\ 2738 Goroutine & 3103 & 3086.29 & 90.25 \\ 2739 Java Thread & 103416.5 & 103732.29 & 1137 \\ 2740 \end{tabular} 2741 \end{multicols} 2700 2742 2701 2743 … … 2714 2756 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 2715 2757 coroutine C {} c; 2716 void main( C & ) { for ( ;; ) { @suspend ();@ } }2758 void main( C & ) { for ( ;; ) { @suspend;@ } } 2717 2759 int main() { // coroutine test 2718 2760 BENCH( for ( N ) { @resume( c );@ } ) … … 2747 2789 \paragraph{Mutual-Exclusion} 2748 2790 2749 Mutual exclusionis measured by entering/leaving a critical section.2791 Uncontended mutual exclusion, which occurs frequently, is measured by entering/leaving a critical section. 2750 2792 For monitors, entering and leaving a monitor function is measured.
2751 2793 To put the results in context, the cost of entering a non-inline function and the cost of acquiring and releasing a @pthread_mutex@ lock is also measured. … … 2790 2832 \paragraph{Internal Scheduling} 2791 2833 2792 Internal scheduling is measured by waiting on and signalling a condition variable.2834 Internal scheduling is measured using a cycle of two threads signalling and waiting. 2793 2835 Figure~\ref{f:int-sched} shows the code for \CFA, with results in Table~\ref{tab:int-sched}. 2794 2836 Note, the incremental cost of bulk acquire for \CFA, which is largely a fixed cost for small numbers of mutex objects. … … 2801 2843 monitor M { condition c; } m; 2802 2844 void __attribute__((noinline)) 2803 do_call( M & mutex a1 ) { signal( c );}2845 do_call( M & mutex a1 ) { @signal( c );@ } 2804 2846 thread T {}; 2805 2847 void main( T & this ) { 2806 2848 while ( go == 0 ) { yield(); } 2807 while ( go == 1 ) { @do_call( m );@}2849 while ( go == 1 ) { do_call( m ); } 2808 2850 } 2809 2851 int __attribute__((noinline)) … … 2843 2885 \paragraph{External Scheduling} 2844 2886 2845 External scheduling is measured by accepting a call using the @waitfor@ statement (@_Accept@ in \uC).2887 External scheduling is measured using a cycle of two threads calling and accepting the call using the @waitfor@ statement. 2846 2888 Figure~\ref{f:ext-sched} shows the code for \CFA, with results in Table~\ref{tab:ext-sched}. 2847 2889 Note, the incremental cost of bulk acquire for \CFA, which is largely a fixed cost for small numbers of mutex objects. 
… … 2890 2932 2891 2933 2892 2893 \paragraph{Object Creation}2894 2895 Object creation is measured by creating/deleting the specific kind of concurrent object.2896 Figure~\ref{f:creation} shows the code for \CFA, with results in Table~\ref{tab:creation}.2897 The only note here is that the call stacks of \CFA coroutines are lazily created, therefore without priming the coroutine to force stack creation, the creation cost is artificially low.2898 2899 \begin{multicols}{2}2900 \lstset{language=CFA,moredelim=**[is][\color{red}]{@}{@},deletedelim=**[is][]{`}{`}}2901 \begin{cfa}2902 thread MyThread {};2903 void main( MyThread & ) {}2904 int main() {2905 BENCH( for ( N ) { @MyThread m;@ } )2906 sout | result`ns;2907 }2908 \end{cfa}2909 \captionof{figure}{\CFA object-creation benchmark}2910 \label{f:creation}2911 2912 \columnbreak2913 2914 \vspace*{-16pt}2915 \captionof{table}{Object creation comparison (nanoseconds)}2916 \label{tab:creation}2917 2918 \begin{tabular}[t]{@{}r*{3}{D{.}{.}{5.2}}@{}}2919 \multicolumn{1}{@{}c}{} & \multicolumn{1}{c}{Median} & \multicolumn{1}{c}{Average} & \multicolumn{1}{c@{}}{Std Dev} \\2920 Pthreads & 28091 & 28073.39 & 163.1 \\2921 \CFA Coroutine Lazy & 6 & 6.07 & 0.26 \\2922 \CFA Coroutine Eager & 520 & 520.61 & 2.04 \\2923 \CFA Thread & 2032 & 2016.29 & 112.07 \\2924 \uC Coroutine & 106 & 107.36 & 1.47 \\2925 \uC Thread & 536.5 & 537.07 & 4.64 \\2926 Goroutine & 3103 & 3086.29 & 90.25 \\2927 Java Thread & 103416.5 & 103732.29 & 1137 \\2928 \end{tabular}2929 \end{multicols}2930 2931 2932 2934 \section{Conclusion} 2933 2935 … … 2940 2942 The \CFA runtime provides concurrency based on a preemptive M:N user-level threading-system, executing in clusters, which encapsulate scheduling of work on multiple kernel threads providing parallelism. 2941 2943 The M:N model is judged to be efficient and provide greater flexibility than a 1:1 threading model. 
2942 These concepts and the entire\CFA runtime-system are written in the \CFA language, extensively leveraging the \CFA type-system, which demonstrates the expressiveness of the \CFA language.2944 These concepts and the \CFA runtime-system are written in the \CFA language, extensively leveraging the \CFA type-system, which demonstrates the expressiveness of the \CFA language. 2943 2945 Performance comparisons with other concurrent systems/languages show the \CFA approach is competitive across all low-level operations, which translates directly into good performance in well-written concurrent applications. 2944 2946 C programmers should feel comfortable using these mechanisms for developing complex control-flow in applications, with the ability to obtain maximum available performance by selecting mechanisms at the appropriate level of need. … … 2991 2993 2992 2994 The authors would like to recognize the design assistance of Aaron Moss, Rob Schluntz and Andrew Beach on the features described in this paper. 2993 Funding for this project has been provided by Huawei Ltd.\ (\url{http://www.huawei.com}) , and Peter Buhr is partially funded by the Natural Sciences and Engineering Research Council of Canada.2995 Funding for this project has been provided by Huawei Ltd.\ (\url{http://www.huawei.com}). %, and Peter Buhr is partially funded by the Natural Sciences and Engineering Research Council of Canada. 2994 2996 2995 2997 {% -
doc/papers/concurrency/figures/monitor.fig
r9d5089e rea05f8d 8 8 -2 9 9 1200 2 10 5 1 0 1 -1 -1 0 0 -1 0.000 0 1 0 0 1500.000 3600.000 1500 3300 1200 3600 1500 3900 11 5 1 0 1 -1 -1 0 0 -1 0.000 0 1 0 0 1500.000 4500.000 1500 4200 1200 4500 1500 4800 12 6 2400 2400 2700 2700 13 1 3 0 1 -1 -1 1 0 4 0.000 1 0.0000 2550 2550 105 105 2550 2550 2655 2550 14 4 1 -1 0 0 0 10 0.0000 2 105 90 2550 2610 b\001 15 -6 16 6 2400 2700 2700 3000 17 1 3 0 1 -1 -1 1 0 4 0.000 1 0.0000 2550 2850 105 105 2550 2850 2655 2850 18 4 1 -1 0 0 0 10 0.0000 2 75 75 2550 2895 a\001 19 -6 20 6 3300 2400 3600 2700 21 1 3 0 1 -1 -1 1 0 4 0.000 1 0.0000 3450 2550 105 105 3450 2550 3555 2550 22 4 1 -1 0 0 0 10 0.0000 2 105 90 3450 2610 d\001 23 -6 24 6 1350 5550 5325 5850 25 1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 1500 5700 80 80 1500 5700 1580 5780 26 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 2850 5700 105 105 2850 5700 2955 5805 27 1 3 0 1 -1 -1 0 0 4 0.000 1 0.0000 4350 5700 105 105 4350 5700 4455 5805 28 4 0 -1 0 0 0 12 0.0000 2 180 765 4575 5775 duplicate\001 29 4 0 -1 0 0 0 12 0.0000 2 135 1035 3075 5775 blocked task\001 30 4 0 -1 0 0 0 12 0.0000 2 135 870 1650 5775 active task\001 31 -6 32 6 4200 2100 4500 2400 33 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4350 2250 105 105 4350 2250 4455 2355 34 4 1 -1 0 0 0 10 0.0000 2 105 90 4350 2310 d\001 10 5 1 0 1 -1 -1 0 0 -1 0.000 0 1 0 0 1500.000 3300.000 1500 3000 1200 3300 1500 3600 11 5 1 0 1 -1 -1 0 0 -1 0.000 0 1 0 0 1500.000 4200.000 1500 3900 1200 4200 1500 4500 12 6 1350 5250 5325 5550 13 1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 1500 5400 80 80 1500 5400 1580 5480 14 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 2850 5400 105 105 2850 5400 2955 5505 15 1 3 0 1 -1 -1 0 0 4 0.000 1 0.0000 4350 5400 105 105 4350 5400 4455 5505 16 4 0 -1 0 0 0 12 0.0000 2 180 765 4575 5475 duplicate\001 17 4 0 -1 0 0 0 12 0.0000 2 135 1035 3075 5475 blocked task\001 18 4 0 -1 0 0 0 12 0.0000 2 135 870 1650 5475 active task\001 35 19 -6 36 20 6 4200 1800 4500 2100 37 21 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4350 1950 105 105 4350 1950 
4455 2055 38 4 1 -1 0 0 0 10 0.0000 2 105 90 4350 2010 b\00122 4 1 -1 0 0 0 10 0.0000 2 105 90 4350 2010 d\001 39 23 -6 40 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 1650 3750 105 105 1650 3750 1755 3855 41 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 1950 3750 105 105 1950 3750 2055 3855 42 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4950 4050 105 105 4950 4050 5055 4155 43 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 5250 4050 105 105 5250 4050 5355 4155 44 1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 3450 4725 80 80 3450 4725 3530 4805 45 1 3 0 1 -1 -1 1 0 4 0.000 1 0.0000 3450 2850 105 105 3450 2850 3555 2850 46 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4350 2850 105 105 4350 2850 4455 2955 24 6 4200 1500 4500 1800 25 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4350 1650 105 105 4350 1650 4455 1755 26 4 1 -1 0 0 0 10 0.0000 2 105 90 4350 1710 b\001 27 -6 28 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 1650 3450 105 105 1650 3450 1755 3555 29 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 1950 3450 105 105 1950 3450 2055 3555 30 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4950 3750 105 105 4950 3750 5055 3855 31 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 5250 3750 105 105 5250 3750 5355 3855 32 1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 3450 4425 80 80 3450 4425 3530 4505 47 33 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4350 2550 105 105 4350 2550 4455 2655 34 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4350 2250 105 105 4350 2250 4455 2355 35 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 5 36 1500 3000 2100 3000 2100 2700 2400 2700 2400 2100 37 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 4 38 1500 3600 2100 3600 2100 3900 1500 3900 39 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 3 40 1500 3300 2100 3300 2250 3525 48 41 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2 49 2400 3000 2625 3150 42 2100 3000 1950 3225 43 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 3 44 1500 4200 2100 4200 2250 4425 50 45 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2 51 3300 3000 3525 3150 52 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 5 53 1500 3300 2100 3300 2100 3000 2400 3000 2400 2400 46 2100 3900 1950 4125 54 47 2 1 0 1 -1 -1 0 0 -1 
0.000 0 0 -1 0 0 4 55 1500 3900 2100 3900 2100 4200 1500 4200 56 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 3 57 1500 3600 2100 3600 2250 3825 48 1500 4500 2100 4500 2100 4800 3300 4800 58 49 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2 59 2100 3300 1950 3525 60 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 3 61 1500 4500 2100 4500 2250 4725 50 4800 3600 4650 3825 62 51 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2 63 2100 4200 1950 4425 52 3300 4800 3525 4950 53 2 2 1 1 -1 -1 0 0 -1 4.000 0 0 0 0 0 5 54 4200 4050 4200 3150 2700 3150 2700 4050 4200 4050 64 55 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 4 65 1500 4800 2100 4800 2100 5100 3300 5100 66 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2 67 4800 3900 4650 4125 68 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2 69 3300 5100 3525 5250 70 2 2 1 1 -1 -1 0 0 -1 4.000 0 0 0 0 0 5 71 4200 4350 4200 3450 2700 3450 2700 4350 4200 4350 72 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 4 73 2700 2400 2700 3000 3300 3000 3300 2400 74 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 4 75 3600 2400 3600 3000 4050 3000 4050 1800 56 3600 2100 3600 2700 4050 2700 4050 1500 76 57 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 9 77 3600 5100 4800 5100 4800 4200 5400 4200 5400 3900 4800 390078 4800 3000 4500 3000 4500 180058 3600 4800 4800 4800 4800 3900 5400 3900 5400 3600 4800 3600 59 4800 2700 4500 2700 4500 1500 79 60 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 80 4050 3000 4500 3150 81 4 1 -1 0 0 0 12 0.0000 2 135 315 3450 5475 exit\001 82 4 1 -1 0 0 0 12 0.0000 2 135 135 1650 3225 A\001 83 4 1 -1 0 0 0 12 0.0000 2 135 795 1650 5025 condition\001 84 4 1 -1 0 0 0 12 0.0000 2 135 135 1650 5250 B\001 85 4 0 -1 0 0 0 12 0.0000 2 135 420 4950 3825 stack\001 86 4 0 -1 0 0 0 12 0.0000 2 180 750 4950 3375 acceptor/\001 87 4 0 -1 0 0 0 12 0.0000 2 180 750 4950 3600 signalled\001 88 4 1 -1 0 0 0 12 0.0000 2 135 795 1650 3000 condition\001 89 4 1 4 0 0 0 12 0.0000 2 135 135 2550 2325 X\001 90 4 1 4 0 0 0 12 0.0000 2 135 135 3450 2325 Y\001 91 4 1 -1 0 0 0 12 0.0000 2 135 525 3450 3825 shared\001 92 4 1 -1 0 0 
0 12 0.0000 2 135 735 3450 4125 variables\001 93 4 1 -1 0 0 0 10 0.0000 2 75 75 3450 2895 c\001 94 4 1 -1 0 0 0 12 0.0000 2 165 1125 3000 2100 mutex queues\001 95 4 0 -1 0 0 3 12 0.0000 2 150 540 4950 4425 urgent\001 96 4 1 -1 0 0 0 10 0.0000 2 75 75 4350 2895 a\001 97 4 1 -1 0 0 0 10 0.0000 2 75 75 4350 2595 c\001 98 4 0 -1 0 0 0 12 0.0000 2 135 525 4650 2550 arrival\001 99 4 0 -1 0 0 0 12 0.0000 2 135 630 4650 2325 order of\001 100 4 0 4 50 -1 0 11 0.0000 2 120 135 4075 2025 X\001 61 4050 2700 4500 2850 62 4 1 -1 0 0 0 12 0.0000 2 135 315 3450 5175 exit\001 63 4 1 -1 0 0 0 12 0.0000 2 135 795 1650 4725 condition\001 64 4 1 -1 0 0 0 12 0.0000 2 135 135 1650 4950 B\001 65 4 0 -1 0 0 0 12 0.0000 2 135 420 4950 3525 stack\001 66 4 0 -1 0 0 0 12 0.0000 2 180 750 4950 3075 acceptor/\001 67 4 0 -1 0 0 0 12 0.0000 2 180 750 4950 3300 signalled\001 68 4 1 -1 0 0 0 12 0.0000 2 135 525 3450 3525 shared\001 69 4 1 -1 0 0 0 12 0.0000 2 135 735 3450 3825 variables\001 70 4 0 -1 0 0 3 12 0.0000 2 150 540 4950 4125 urgent\001 71 4 1 -1 0 0 0 10 0.0000 2 75 75 4350 2595 a\001 72 4 1 -1 0 0 0 10 0.0000 2 75 75 4350 2295 c\001 73 4 0 -1 0 0 0 12 0.0000 2 135 525 4650 2250 arrival\001 74 4 0 -1 0 0 0 12 0.0000 2 135 630 4650 2025 order of\001 75 4 0 4 50 -1 0 11 0.0000 2 120 135 4075 1725 X\001 76 4 0 4 50 -1 0 11 0.0000 2 120 135 4075 2025 Y\001 101 77 4 0 4 50 -1 0 11 0.0000 2 120 135 4075 2325 Y\001 102 4 0 4 50 -1 0 11 0.0000 2 120 135 4075 2625 Y\001 103 4 0 4 50 -1 0 11 0.0000 2 120 135 4075 2925 X\001 104 4 1 -1 0 0 0 12 0.0000 2 165 960 4275 1725 entry queue\001 78 4 0 4 50 -1 0 11 0.0000 2 120 135 4075 2625 X\001 79 4 1 -1 0 0 0 12 0.0000 2 165 960 4275 1425 entry queue\001 -
src/AST/Convert.cpp
r9d5089e rea05f8d 10 10 // Created On : Thu May 09 15::37::05 2019 11 11 // Last Modified By : Andrew Beach 12 // Last Modified On : M an Jun 10 11:51:00 201913 // Update Count : 1 112 // Last Modified On : Mon Jun 17 16:44:00 2019 13 // Update Count : 12 14 14 // 15 15 … … 949 949 const ast::Expr * visit( const ast::TupleExpr * node ) override final { 950 950 auto expr = visitBaseExpr( node, 951 new UntypedTupleExpr(951 new TupleExpr( 952 952 get<Expression>().acceptL(node->exprs) 953 953 ) … … 1470 1470 decl->mangleName = old->mangleName; 1471 1471 decl->isDeleted = old->isDeleted; 1472 decl->asmName = GET_ACCEPT_1(asmName, Expr); 1472 1473 decl->uniqueId = old->uniqueId; 1473 1474 decl->extension = old->extension; … … 1494 1495 decl->mangleName = old->mangleName; 1495 1496 decl->isDeleted = old->isDeleted; 1497 decl->asmName = GET_ACCEPT_1(asmName, Expr); 1496 1498 decl->uniqueId = old->uniqueId; 1497 1499 decl->extension = old->extension; -
src/Makefile.in
r9d5089e rea05f8d 239 239 $(am__objects_7) Tuples/TupleAssignment.$(OBJEXT) \ 240 240 Tuples/TupleExpansion.$(OBJEXT) Tuples/Explode.$(OBJEXT) \ 241 Validate/HandleAttributes.$(OBJEXT) \241 Tuples/Tuples.$(OBJEXT) Validate/HandleAttributes.$(OBJEXT) \ 242 242 Validate/FindSpecialDecls.$(OBJEXT) 243 243 am_libdemangle_a_OBJECTS = $(am__objects_8) … … 270 270 Tuples/TupleAssignment.$(OBJEXT) \ 271 271 Tuples/TupleExpansion.$(OBJEXT) Tuples/Explode.$(OBJEXT) \ 272 Validate/HandleAttributes.$(OBJEXT) \272 Tuples/Tuples.$(OBJEXT) Validate/HandleAttributes.$(OBJEXT) \ 273 273 Validate/FindSpecialDecls.$(OBJEXT) \ 274 274 Virtual/ExpandCasts.$(OBJEXT) … … 561 561 $(SRC_RESOLVEXPR) ResolvExpr/AlternativePrinter.cc \ 562 562 $(SRC_SYMTAB) $(SRC_SYNTREE) Tuples/TupleAssignment.cc \ 563 Tuples/TupleExpansion.cc Tuples/Explode.cc \563 Tuples/TupleExpansion.cc Tuples/Explode.cc Tuples/Tuples.cc \ 564 564 Validate/HandleAttributes.cc Validate/FindSpecialDecls.cc \ 565 565 Virtual/ExpandCasts.cc … … 570 570 $(SRC_SYMTAB) SymTab/Demangle.cc $(SRC_SYNTREE) \ 571 571 Tuples/TupleAssignment.cc Tuples/TupleExpansion.cc \ 572 Tuples/Explode.cc Validate/HandleAttributes.cc \573 Validate/ FindSpecialDecls.cc572 Tuples/Explode.cc Tuples/Tuples.cc \ 573 Validate/HandleAttributes.cc Validate/FindSpecialDecls.cc 574 574 MAINTAINERCLEANFILES = ${libdir}/${notdir ${cfa_cpplib_PROGRAMS}} 575 575 MOSTLYCLEANFILES = Parser/lex.cc Parser/parser.cc Parser/parser.hh \ … … 1030 1030 Tuples/$(DEPDIR)/$(am__dirstamp) 1031 1031 Tuples/Explode.$(OBJEXT): Tuples/$(am__dirstamp) \ 1032 Tuples/$(DEPDIR)/$(am__dirstamp) 1033 Tuples/Tuples.$(OBJEXT): Tuples/$(am__dirstamp) \ 1032 1034 Tuples/$(DEPDIR)/$(am__dirstamp) 1033 1035 Validate/$(am__dirstamp): … … 1340 1342 @AMDEP_TRUE@@am__include@ @am__quote@Tuples/$(DEPDIR)/TupleAssignment.Po@am__quote@ 1341 1343 @AMDEP_TRUE@@am__include@ @am__quote@Tuples/$(DEPDIR)/TupleExpansion.Po@am__quote@ 1344 @AMDEP_TRUE@@am__include@ 
@am__quote@Tuples/$(DEPDIR)/Tuples.Po@am__quote@ 1342 1345 @AMDEP_TRUE@@am__include@ @am__quote@Validate/$(DEPDIR)/FindSpecialDecls.Po@am__quote@ 1343 1346 @AMDEP_TRUE@@am__include@ @am__quote@Validate/$(DEPDIR)/HandleAttributes.Po@am__quote@ -
src/Tuples/Explode.cc
r9d5089e rea05f8d 9 9 // Author : Rob Schluntz 10 10 // Created On : Wed Nov 9 13:12:24 2016 11 // Last Modified By : Rob Schluntz12 // Last Modified On : Wed Nov 9 13:20:24201613 // Update Count : 211 // Last Modified By : Andrew Beach 12 // Last Modified On : Wed Jun 12 16:40:00 2016 13 // Update Count : 3 14 14 // 15 15 … … 106 106 return expr; 107 107 } 108 109 namespace { 110 111 // Remove one level of reference from a reference type. 112 const ast::Type * getReferenceBase( const ast::Type * t ) { 113 if ( const ast::ReferenceType * ref = dynamic_cast< const ast::ReferenceType * >( t ) ) { 114 return ref->base; 115 } else { 116 assertf( false, "getReferenceBase for non-ref: %s", toString( t ).c_str() ); 117 return nullptr; 118 } 119 } 120 121 struct CastExploderCore { 122 bool castAdded = false; 123 bool foundUniqueExpr = false; 124 const ast::Expr * applyCast( const ast::Expr * expr, bool first = true ) { 125 // On tuple push the cast down. 126 if ( const ast::TupleExpr * tupleExpr = dynamic_cast< const ast::TupleExpr * >( expr ) ) { 127 foundUniqueExpr = true; 128 std::vector< ast::ptr< ast::Expr > > exprs; 129 for ( const ast::Expr * expr : tupleExpr->exprs ) { 130 exprs.emplace_back( applyCast( expr, false ) ); 131 //exprs.emplace_back( ast::ptr< ast::Expr >( applyCast( expr, false ) ) ); 132 } 133 if ( first ) { 134 castAdded = true; 135 const ast::Expr * tuple = new ast::TupleExpr( 136 tupleExpr->location, std::move( exprs ) ); 137 return new ast::CastExpr( tuple->location, 138 tuple, new ast::ReferenceType( tuple->result.get(), ast::CV::Qualifiers() ) ); 139 } else { 140 return new ast::TupleExpr( tupleExpr->location, std::move( exprs ) ); 141 } 142 } 143 if ( dynamic_cast< const ast::ReferenceType * >( expr->result.get() ) ) { 144 return expr; 145 } else { 146 castAdded = true; 147 return new ast::CastExpr( expr->location, expr, 148 new ast::ReferenceType( expr->result, ast::CV::Qualifiers() ) ); 149 } 150 } 151 152 const ast::Expr * postmutate( const 
ast::UniqueExpr * node ) { 153 // move cast into unique expr so that the unique expr has type T& rather than 154 // type T. In particular, this transformation helps with generating the 155 // correct code for reference-cast member tuple expressions, since the result 156 // should now be a tuple of references rather than a reference to a tuple. 157 // Still, this code is a bit awkward, and could use some improvement. 158 const ast::UniqueExpr * newNode = new ast::UniqueExpr( node->location, 159 applyCast( node->expr ), node->id ); 160 if ( castAdded ) { 161 // if a cast was added by applyCast, then unique expr now has one more layer of reference 162 // than it had coming into this function. To ensure types still match correctly, need to cast 163 // to reference base so that outer expressions are still correct. 164 castAdded = false; 165 const ast::Type * newType = getReferenceBase( newNode->result ); 166 return new ast::CastExpr( newNode->location, node, newType ); 167 } 168 return newNode; 169 } 170 171 const ast::Expr * postmutate( const ast::TupleIndexExpr * tupleExpr ) { 172 // tuple index expr needs to be rebuilt to ensure that the type of the 173 // field is consistent with the type of the tuple expr, since the field 174 // may have changed from type T to T&. 175 return new ast::TupleIndexExpr( tupleExpr->location, tupleExpr->tuple, tupleExpr->index ); 176 } 177 }; 178 179 } // namespace 180 181 const ast::Expr * distributeReference( const ast::Expr * expr ) { 182 ast::Pass<CastExploderCore> exploder; 183 expr = expr->accept( exploder ); 184 if ( ! exploder.pass.foundUniqueExpr ) { 185 expr = new ast::CastExpr( expr->location, expr, 186 new ast::ReferenceType( expr->result, ast::CV::Qualifiers() ) ); 187 } 188 return expr; 189 } 190 108 191 } // namespace Tuples 109 192 -
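The `distributeReference`/`CastExploderCore` pass added in the Explode.cc hunk above pushes a cast-to-reference down through a tuple expression so the result is a tuple of references rather than a reference to a tuple. A toy model of that transformation, using hypothetical stand-in types rather than the real CFA AST (`Expr`, `isTuple`, and `refCast` here are illustrative only):

```cpp
#include <vector>

// Toy expression model (hypothetical, not the CFA AST) illustrating what
// applyCast does conceptually: a reference cast applied to a tuple is
// distributed onto each component, turning "(T, U)&" into "(T&, U&)".
struct Expr {
    bool isTuple = false;
    bool refCast = false;           // component is wrapped in a cast to T&
    std::vector<Expr> components;   // non-empty only for tuple expressions
};

// Push the reference cast down through nested tuples onto the atomic leaves.
Expr distributeReference(Expr e) {
    if (e.isTuple) {
        for (Expr &c : e.components)
            c = distributeReference(c);  // recurse into each component
        e.refCast = false;               // the tuple itself carries no cast
    } else {
        e.refCast = true;                // atomic leaf: the cast lands here
    }
    return e;
}
```

The real pass is more careful — it skips components that already have reference type and wraps the result in a `UniqueExpr` when needed — but the distribution over components is the core idea.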
src/Tuples/Explode.h
r9d5089e rea05f8d 9 9 // Author : Rob Schluntz 10 10 // Created On : Wed Nov 9 13:12:24 2016 11 // Last Modified By : Peter A. Buhr12 // Last Modified On : Sat Jul 22 09:55:16 201713 // Update Count : 311 // Last Modified By : Andrew Beach 12 // Last Modified On : Mon Jun 17 14:36:00 2019 13 // Update Count : 4 14 14 // 15 15 … … 138 138 } 139 139 140 const ast::Expr * distributeReference( const ast::Expr * ); 141 142 /// Append candidate to an OutputIterator of Candidates. 143 template<typename OutputIterator> 144 void append( OutputIterator out, const ast::Expr * expr, const ast::TypeEnvironment & env, 145 const ast::OpenVarSet & open, const ast::AssertionList & need, 146 const ResolvExpr::Cost & cost, const ResolvExpr::Cost & cvtCost ) { 147 ast::TypeEnvironment copyEnv = env; 148 ast::OpenVarSet copyOpen = open; 149 ast::AssertionSet set; 150 mergeAssertionSet( set, need ); 151 *out++ = std::make_shared<ResolvExpr::Candidate>( expr, std::move( copyEnv ), 152 std::move( copyOpen ), std::move( set ), cost, cvtCost ); 153 } 154 155 /// Append candidate to an ExplodedArg. 156 static inline void append( ResolvExpr::ExplodedArg& ea, const ast::Expr * expr, 157 const ast::TypeEnvironment&, const ast::OpenVarSet&, 158 const ast::AssertionList&, const ResolvExpr::Cost&, const ResolvExpr::Cost& ) { 159 // I'm not sure why most of the arguments are unused. But they were in the old version. 160 ea.exprs.emplace_back( expr ); 161 } 162 163 /// Check if the expression is a cast to a reference type, return it if it is. 
164 static inline const ast::CastExpr * isReferenceCast( const ast::Expr * expr ) { 165 if ( const ast::CastExpr * cast = dynamic_cast< const ast::CastExpr * >( expr ) ) { 166 if ( dynamic_cast< const ast::ReferenceType * >( cast->result.get() ) ) { 167 return cast; 168 } 169 } 170 return nullptr; 171 } 172 173 /// helper function (indirectely) used by explode 174 template< typename Output > 175 void explodeRecursive( 176 const ast::CastExpr * expr, const ResolvExpr::Candidate & arg, 177 const ast::SymbolTable & symtab, Output && out 178 ) { 179 } 180 140 181 /// helper function used by explode 141 182 template< typename Output > 142 void explodeUnique( 143 const ast:: Expr * expr, const ResolvExpr::Candidate & arg, const ast::SymbolTable & symtab,144 Output && out, bool isTupleAssign183 void explodeUnique( 184 const ast::ptr< ast::Expr > & expr, const ResolvExpr::Candidate & arg, 185 const ast::SymbolTable & symtab, Output && out, bool isTupleAssign 145 186 ) { 146 #warning unimplemented 147 (void)expr; (void)arg; (void)symtab; (void)out; (void)isTupleAssign; 148 assert(false); 187 // Tuple assignment can use a faster method if it is cast. Uses recursive exploding. 188 if ( isTupleAssign ) if ( const ast::CastExpr * castExpr = isReferenceCast( expr ) ) { 189 ResolvExpr::CandidateList candidates; 190 explodeUnique( castExpr->arg, arg, symtab, back_inserter( candidates ), true ); 191 for ( ResolvExpr::CandidateRef & cand : candidates ) { 192 // Distribute the reference cast over all components of the candidate. 
193 append( std::forward<Output>(out), distributeReference( cand->expr ), cand->env, 194 cand->open, cand->need, cand->cost, cand->cvtCost ); 195 } 196 return; 197 } 198 const ast::Type * res = expr->result->stripReferences(); 199 if ( const ast::TupleType * tupleType = dynamic_cast< const ast::TupleType * >( res ) ) { 200 if ( const ast::ptr< ast::TupleExpr > & tupleExpr = expr.as< ast::TupleExpr >() ) { 201 // Open the tuple expr and continue on its components. 202 for ( const ast::Expr * expr : tupleExpr->exprs ) { 203 explodeUnique( expr, arg, symtab, std::forward<Output>(out), isTupleAssign ); 204 } 205 } else { 206 ast::ptr< ast::Expr > local = expr; 207 // Expressions which may have side effects require a single unique instance. 208 if ( Tuples::maybeImpureIgnoreUnique( local ) ) { 209 local = new ast::UniqueExpr( local->location, local ); 210 } 211 // Cast a reference away to a value-type to allow further explosion. 212 if ( dynamic_cast< const ast::ReferenceType *>( local->result.get() ) ) { 213 local = new ast::CastExpr( local->location, local, tupleType ); 214 } 215 // Now we have to go across the tuple via indexing. 216 for ( unsigned int i = 0 ; i < tupleType->size() ; ++i ) { 217 ast::TupleIndexExpr * idx = new ast::TupleIndexExpr( local->location, local, i ); 218 explodeUnique( idx, arg, symtab, std::forward<Output>(out), isTupleAssign ); 219 // TODO: We need more input to figure out the exact lifetimes of these types. 220 // delete idx; 221 } 222 // delete local; 223 } 224 } else { 225 // For atomic/non-tuple types, no explosion is used. 
226 append( std::forward<Output>(out), expr, arg.env, arg.open, arg.need, arg.cost, 227 arg.cvtCost ); 228 } 149 229 } 150 230 151 231 /// expands a tuple-valued candidate into multiple candidates, each with a non-tuple type 152 232 template< typename Output > 153 void explode( 154 const ResolvExpr::Candidate & arg, const ast::SymbolTable & symtab, Output && out, 233 void explode( 234 const ResolvExpr::Candidate & arg, const ast::SymbolTable & symtab, Output && out, 155 235 bool isTupleAssign = false 156 236 ) { -
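The `explodeUnique` template filled in above walks a (possibly nested) tuple-typed expression and emits one candidate per atomic component, indexing through sub-tuples. A simplified, self-contained sketch of that flattening idea — the `Val`/`Tuple` types here are hypothetical, not the compiler's candidate machinery:

```cpp
#include <string>
#include <variant>
#include <vector>

// Simplified model (hypothetical types) of what explode/explodeUnique do:
// a value whose type is a nested tuple is flattened, in order, into the
// list of its atomic components. (Requires C++17 for std::variant and for
// std::vector of an incomplete element type.)
struct Val;
using Tuple = std::vector<Val>;
struct Val { std::variant<std::string, Tuple> v; };

void explodeInto(const Val &val, std::vector<std::string> &out) {
    if (const Tuple *t = std::get_if<Tuple>(&val.v)) {
        for (const Val &c : *t)
            explodeInto(c, out);                      // recurse per component
    } else {
        out.push_back(std::get<std::string>(val.v));  // atomic: emit as-is
    }
}
```

In the compiler the recursion additionally threads the type environment, assertions, and costs through each emitted candidate, and wraps possibly-impure sub-expressions in a `UniqueExpr` so side effects are evaluated once — the sketch keeps only the traversal shape.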
src/Tuples/Tuples.h
r9d5089e rea05f8d 9 9 // Author : Rodolfo G. Esteves 10 10 // Created On : Mon May 18 07:44:20 2015 11 // Last Modified By : Peter A. Buhr12 // Last Modified On : Sat Jul 22 09:55:00 201713 // Update Count : 1 611 // Last Modified By : Andrew Beach 12 // Last Modified On : Wed Jun 12 10:39:00 2017 13 // Update Count : 17 14 14 // 15 15 … … 57 57 bool maybeImpure( Expression * expr ); 58 58 59 /// returns true if the expression may contain side-effect, ignoring the presence of unique expressions. 59 /// Returns true if the expression may contain side-effect, 60 /// ignoring the presence of unique expressions. 60 61 bool maybeImpureIgnoreUnique( Expression * expr ); 62 bool maybeImpureIgnoreUnique( const ast::Expr * expr ); 61 63 } // namespace Tuples 62 64 -
src/Tuples/module.mk
r9d5089e rea05f8d 15 15 ############################################################################### 16 16 17 SRC += Tuples/TupleAssignment.cc Tuples/TupleExpansion.cc Tuples/Explode.cc 18 SRCDEMANGLE += Tuples/TupleAssignment.cc Tuples/TupleExpansion.cc Tuples/Explode.cc 17 SRC += Tuples/TupleAssignment.cc Tuples/TupleExpansion.cc Tuples/Explode.cc \ 18 Tuples/Tuples.cc 19 SRCDEMANGLE += Tuples/TupleAssignment.cc Tuples/TupleExpansion.cc Tuples/Explode.cc \ 20 Tuples/Tuples.cc