% doc/papers/concurrency/Paper.tex (changeset r48b9b36 -> r6f9bc09)
The coroutine main's stack holds the state for the next generation, @f1@ and @f2@, and the code has the three suspend points, representing the three states in the Fibonacci formula, to context switch back to the caller's resume.
The interface function, @next@, takes a Fibonacci instance and context switches to it using @resume@;
on restart, the Fibonacci field, @fn@, contains the next value in the sequence, which is returned.
The first @resume@ is special because it cocalls the coroutine at its coroutine main and allocates the stack;
when the coroutine main returns, its stack is deallocated.
Hence, @Fib@ is an object at creation, transitions to a coroutine on its first resume, and transitions back to an object when the coroutine main finishes.
Figure~\ref{f:Coroutine1State} shows the coroutine version of the C version in Figure~\ref{f:ExternalState}.
Coroutine generators are called \newterm{output coroutines} because values are only returned.
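For reference, the following is a minimal sketch of this generator consistent with the description above; the full version appears in Figure~\ref{f:Coroutine1State}, so treat the details here (names and exact statements) as illustrative only.
\begin{cfa}
`coroutine` Fib { int fn; };					$\C{// communication variable}$
void main( Fib & fib ) with( fib ) {			$\C{// coroutine main, state in f1/f2}$
	int f1, f2;
	fn = 0;  f1 = fn;  `suspend();`				$\C{// 1st suspend point, return 0}$
	fn = 1;  f2 = f1;  f1 = fn;  `suspend();`	$\C{// 2nd suspend point, return 1}$
	for ( ;; ) {
		fn = f1 + f2;  f2 = f1;  f1 = fn;  `suspend();`	$\C{// 3rd suspend point, general case}$
	}
}
int next( Fib & fib ) {
	`resume( fib );`							$\C{// restart coroutine after its last suspend}$
	return fib.fn;								$\C{// on restart, fn holds the next value}$
}
\end{cfa}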

Figure~\ref{f:CFAFmt} shows an \newterm{input coroutine}, @Format@, for restructuring text into groups of fixed-size character blocks.
For example, the input on the left is reformatted into the output on the right.
\begin{quote}
% ...
\end{tabular}
\end{quote}
The example takes advantage of resuming a coroutine in the constructor to prime the loops so the first character sent for formatting appears inside the nested loops.
The destructor provides a newline if formatted text ends with a full line.
Figure~\ref{f:CFmt} shows the C equivalent formatter, where the loops of the coroutine are flattened (linearized) and rechecked on each call because execution location is not retained between calls.
% ...
void main( Format & fmt ) with( fmt ) {
	for ( ;; ) {
		for ( g = 0; g < 5; g += 1 ) {      // group
			for ( b = 0; b < 4; b += 1 ) { // block
				`suspend();`
% ...
};
void format( struct Format * fmt ) {
	if ( fmt->ch != -1 ) {      // not EOF ?
		printf( "%c", fmt->ch );
		fmt->b += 1;
		// ...
		}
		if ( fmt->g == 5 ) {  // group
			printf( "\n" );     // separator
			fmt->g = 0;
		}
% ...

The previous examples are \newterm{asymmetric (semi) coroutine}s because one coroutine always calls a resuming function for another coroutine, and the resumed coroutine always suspends back to its last resumer, similar to call/return for normal functions.
However, there is no stack growth because @resume@/@suspend@ context switch to existing stack-frames rather than create new ones.
\newterm{Symmetric (full) coroutine}s have a coroutine call a resuming function for another coroutine, which eventually forms a resuming-call cycle.
(The trivial cycle is a coroutine resuming itself.)
This control flow is similar to recursion for normal routines, but again there is no stack growth from the context switch.
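To make the cycle concrete before the detailed walk-through, the following sketch shows the shape of the two interface functions in the producer/consumer example discussed next; the field names are assumptions based on that discussion.
\begin{cfa}
coroutine Prod { int money, receipt; };			$\C{// communication variables}$
coroutine Cons { int p1, p2, status; };			$\C{// communication variables}$
int delivery( Cons & cons, int p1, int p2 ) {	$\C{// called by producer}$
	cons.p1 = p1;  cons.p2 = p2;				$\C{// transfer values to consumer}$
	`resume( cons );`							$\C{// restart consumer}$
	return cons.status;
}
int payment( Prod & prod, int money ) {			$\C{// called by consumer, completing the cycle}$
	prod.money = money;							$\C{// transfer value to producer}$
	`resume( prod );`							$\C{// restart producer, which continues in delivery}$
	return prod.receipt;
}
\end{cfa}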
     
The @start@ function communicates both the number of elements to be produced and the consumer into the producer's coroutine structure.
Then the @resume@ to @prod@ creates @prod@'s stack with a frame for @prod@'s coroutine main at the top, and context switches to it.
@prod@'s coroutine main starts, creates local variables that are retained between coroutine activations, and executes $N$ iterations, each generating two random values, calling the consumer to deliver the values, and printing the status returned from the consumer.

The producer call to @delivery@ transfers values into the consumer's communication variables, resumes the consumer, and returns the consumer status.
For the first resume, @cons@'s stack is initialized, creating local variables retained between subsequent activations of the coroutine.
The consumer iterates until the @done@ flag is set, prints, increments status, and calls back to the producer via @payment@; on return from @payment@, it prints the receipt from the producer and increments @money@ (inflation).
The call from the consumer to @payment@ introduces the cycle between producer and consumer.
When @payment@ is called, the consumer copies values into the producer's communication variable and a resume is executed.
The context switch restarts the producer at the point where it was last context switched, so it continues in @delivery@ after the resume.

@delivery@ returns the status value in @prod@'s coroutine main, where the status is printed.
The loop then repeats calling @delivery@, where each call resumes the consumer coroutine.
The context switch to the consumer continues in @payment@.
The consumer increments and returns the receipt to the call in @cons@'s coroutine main.
The loop then repeats calling @payment@, where each call resumes the producer coroutine.

% ...
The context switch restarts @cons@ in @payment@ and it returns with the last receipt.
The consumer terminates its loops because @done@ is true, its @main@ terminates, so @cons@ transitions from a coroutine back to an object, and @prod@ reactivates after the resume in @stop@.
@stop@ returns and @prod@'s coroutine main terminates.
The program main restarts after the resume in @start@.
@start@ returns and the program main terminates.


\subsection{Coroutine Implementation}

A significant implementation challenge for coroutines (and threads, see Section~\ref{threads}) is adding extra fields and executing code after/before the coroutine constructor/destructor and coroutine main to create/initialize/de-initialize/destroy extra fields and the stack.
There are several solutions to this problem and the chosen option forced the \CFA coroutine design.

Object-oriented inheritance provides extra fields and code in a restricted context, but it requires programmers to explicitly perform the inheritance:
\begin{cfa}
struct mycoroutine $\textbf{\textsf{inherits}}$ baseCoroutine { ... }
\end{cfa}
and the programming language (and possibly its tool set, \eg debugger) may need to understand @baseCoroutine@ because of the stack.
Furthermore, constructors/destructors execute in the wrong order for certain operations;
\eg, if a thread is implicitly started, it must start \emph{after} all constructors, because the thread relies on a completely initialized object, but the inherited constructor runs \emph{before} the derived.

An alternative is composition:
\begin{cfa}
struct mycoroutine {
	... // declarations
	baseCoroutine dummy; // composition, last declaration
};
\end{cfa}
which also requires an explicit declaration that must be the last one to ensure correct initialization order.
However, nothing prevents incorrect placement or multiple declarations.
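For example, nothing stops the following hypothetical misplacement, which compiles but constructs (and possibly primes) the coroutine before the remaining fields are initialized:
\begin{cfa}
struct mycoroutine {
	baseCoroutine dummy; // wrong: must be the last declaration
	... // declarations
};
\end{cfa}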

For coroutines as for threads, many implementations are based on routine pointers or function objects~\cite{Butenhof97, C++14, MS:VisualC++, BoostCoroutines15}.
For example, Boost implements coroutines in terms of four functor object-types:
\begin{cfa}
asymmetric_coroutine<>::pull_type
asymmetric_coroutine<>::push_type
symmetric_coroutine<>::call_type
symmetric_coroutine<>::yield_type
\end{cfa}
Similarly, the canonical threading paradigm is often based on function pointers, \eg @pthread@~\cite{pthreads}, \Csharp~\cite{Csharp}, Go~\cite{Go}, and Scala~\cite{Scala}.
However, the generic thread-handle (identifier) is limited (few operations), unless it is wrapped in a custom type.
\begin{cfa}
void mycor( coroutine_t cid, void * arg ) {
	int * value = (int *)arg;				$\C{// type unsafe, pointer-size only}$
	// Coroutine body
}
int main() {
	int input = 0, output;
	coroutine_t cid = coroutine_create( &mycor, (void *)&input );	$\C{// type unsafe, pointer-size only}$
	coroutine_resume( cid, (void *)input, (void **)&output );		$\C{// type unsafe, pointer-size only}$
}
\end{cfa}
Since the custom type is simple to write in \CFA and solves several issues, added support for routine/lambda-based coroutines adds very little.

The selected approach is to use language support by introducing a new kind of aggregate (structure):
\begin{cfa}
coroutine Fibonacci {
	int fn; // communication variable
};
\end{cfa}
The @coroutine@ keyword means the compiler (and tool set) can find and inject code where needed.
The downside of this approach is that it makes coroutines a special case in the language.
Users wanting to extend coroutines or build their own for various reasons can only do so in ways offered by the language.
Furthermore, implementing coroutines without language support also displays the power of a programming language.
While this is ultimately the option used for idiomatic \CFA code, coroutines and threads can still be constructed without using the language support.
The reserved keyword eases use for the common cases.

Part of the mechanism to generalize coroutines is using a \CFA trait, which defines a coroutine as anything satisfying the trait @is_coroutine@, and this trait is used to restrict coroutine-manipulation functions:
\begin{cfa}
trait is_coroutine( dtype T ) {
      void main( T & this );
      coroutine_desc * get_coroutine( T & this );
};
forall( dtype T | is_coroutine(T) ) coroutine_desc * get_coroutine( T & );
forall( dtype T | is_coroutine(T) ) void suspend( T & );
forall( dtype T | is_coroutine(T) ) void resume( T & );
\end{cfa}
This definition ensures there is a statically-typed @main@ function that is the starting point (first stack frame) of a coroutine.
No return value or additional parameters are necessary for this function, because the coroutine type allows an arbitrary number of interface functions with corresponding arbitrarily typed input/output values.
As well, any object passed to @suspend@ and @resume@ is a coroutine since it must satisfy the @is_coroutine@ trait to compile.
The advantage of this approach is that users can easily create different types of coroutines; for example, changing the memory layout of a coroutine is trivial when implementing the @get_coroutine@ routine.
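For example, a user-defined coroutine type, written without the @coroutine@ keyword, need only supply the trait functions; this hypothetical variation places the descriptor first instead of last:
\begin{cfa}
struct MyLayout {
	coroutine_desc cor;							$\C{// descriptor placed first}$
	int data;									$\C{// communication variable}$
};
coroutine_desc * get_coroutine( MyLayout & this ) { return &this.cor; }
void main( MyLayout & this );					$\C{// coroutine main defined by the user}$
\end{cfa}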
The \CFA keyword @coroutine@ implicitly implements the getter and forward declarations required for implementing the coroutine main:
\begin{cquote}
\begin{tabular}{@{}ccc@{}}
\begin{cfa}
coroutine MyCor {
	int value;

};
\end{cfa}
& {\Large $\Rightarrow$} &
\begin{tabular}{@{}ccc@{}}
\begin{cfa}
struct MyCor {
	int value;
	coroutine_desc cor;
};
\end{cfa}
&
\begin{cfa}
static inline coroutine_desc *
get_coroutine( MyCor & this ) {
	return &this.cor;
}
\end{cfa}
&
\begin{cfa}
void main( MyCor & this );



\end{cfa}
\end{tabular}
\end{tabular}
\end{cquote}
The combination of these two approaches allows an easy and concise specification of coroutining (and concurrency) for normal users, while more advanced users have tighter control on memory layout and initialization.


\subsection{Thread Interface}
\label{threads}

Both user and kernel threads are supported, where user threads provide concurrency and kernel threads provide parallelism.
Like coroutines and for the same design reasons, the selected approach for user threads is to use language support by introducing a new kind of aggregate (structure) and a \CFA trait:
\begin{cquote}
\begin{tabular}{@{}c@{\hspace{2\parindentlnth}}c@{}}
\begin{cfa}
thread myThread {
	// communication variables
};


\end{cfa}
&
\begin{cfa}
trait is_thread( dtype T ) {
      void main( T & this );
      thread_desc * get_thread( T & this );
      void ^?{}( T & `mutex` this );
};
\end{cfa}
\end{tabular}
\end{cquote}
(The qualifier @mutex@ for the destructor parameter is discussed in Section~\ref{s:Monitors}.)
Like a coroutine, the statically-typed @main@ function is the starting point (first stack frame) of a user thread.
The difference is that a coroutine borrows a thread from its caller, so the first thread resuming a coroutine creates an instance of @main@;
whereas, a user thread receives its own thread from the runtime system, which starts in @main@ at some point after the thread constructor is run.\footnote{
The \lstinline@main@ function is already a special routine in C (where the program begins), so it is a natural extension of the semantics to use overloading to declare mains for different coroutines/threads (the normal main being the main of the initial thread).}
No return value or additional parameters are necessary for this function, because the task type allows an arbitrary number of interface functions with corresponding arbitrarily typed input/output values.
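For example, each thread type declares its own overloaded @main@ (a small sketch in the style of the examples below):
\begin{cfa}
thread T1 {};
thread T2 {};
void main( T1 & ) { sout | "T1" | endl; }		$\C{// main for T1 threads}$
void main( T2 & ) { sout | "T2" | endl; }		$\C{// main for T2 threads}$
\end{cfa}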

\begin{comment} % put in appendix with coroutine version ???
As such the @main@ routine of a thread can be defined as
\begin{cfa}
	...
}
\end{cfa}
A consequence of the strongly typed approach to main is that memory layout of parameters and return values to/from a thread are now explicitly specified in the \textbf{api}.
\end{comment}

For user threads to be useful, it must be possible to start and stop the underlying thread, and wait for it to complete execution.
While using an API such as @fork@ and @join@ is relatively common, such an interface is awkward and unnecessary.
A simple approach is to use allocation/deallocation principles, and have threads implicitly @fork@ after construction and @join@ before destruction.
\begin{cfa}
thread World {};
void main( World & this ) {
	sout | "World!" | endl;
}
int main() {
	World w`[10]`;						$\C{// implicit forks after creation}$
	sout | "Hello " | endl;				$\C{// "Hello " and 10 "World!" printed concurrently}$
}										$\C{// implicit joins before destruction}$
\end{cfa}
This semantics ensures a thread is started and stopped exactly once, eliminating some programming errors, and scales to multiple threads for basic (termination) synchronization.
The tree-structured create/delete from C block-structure is generalized (to a lattice) by using dynamic allocation, so threads can outlive the scope in which they are created, much like dynamically allocating memory lets objects outlive the scope in which they are created.
\begin{cfa}
thread MyThread {};						$\C{// thread type, declaration elided in the original}$
int main() {
	MyThread * heapLived;
	{
		MyThread blockLived;				$\C{// fork block-based thread}$
		heapLived = `new`( MyThread );		$\C{// fork heap-based thread}$
		...
	}										$\C{// join block-based thread}$
	...
	`delete`( heapLived );					$\C{// join heap-based thread}$
}
\end{cfa}
The heap-based approach allows arbitrary thread-creation topologies with respect to fork/join-style concurrency.
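For example, fork and join can occur in different functions (hypothetical helpers):
\begin{cfa}
MyThread * fork_one() { return `new`( MyThread ); }	$\C{// forked in one function ...}$
void join_one( MyThread * t ) { `delete`( t ); }	$\C{// ... joined in another}$
\end{cfa}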


\section{Synchronization / Mutual Exclusion}

Uncontrolled non-deterministic execution is meaningless.
To reestablish meaningful execution requires mechanisms to reintroduce determinism (control non-determinism), called synchronization and mutual exclusion, where \newterm{synchronization} is a timing relationship among threads and \newterm{mutual exclusion} is an access-control mechanism on data shared by threads.
Since many determinism challenges appear with the use of mutable shared state, some languages/libraries disallow it (Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, Akka (Scala)~\cite{Akka}).
In these paradigms, interaction among concurrent objects is performed by stateless message-passing~\cite{Thoth,Harmony,V-Kernel} or other paradigms closely related to networking concepts (\eg channels~\cite{CSP,Go}).
However, in call/return-based languages, these approaches force a clear distinction (\ie introduce a new programming paradigm) between non-concurrent and concurrent computation (\ie function call versus message passing).
This distinction means a programmer needs to learn two sets of design patterns.
While this distinction can be hidden away in library code, effective use of the library still has to take both paradigms into account.
In contrast, approaches based on stateful models more closely resemble the standard call/return programming-model, resulting in a single programming paradigm.
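As a minimal illustration of the two mechanisms just defined, consider this hypothetical shared-counter sketch in the style of the earlier threading examples: joining the threads is (termination) synchronization, while making the increments safe requires mutual exclusion.
\begin{cfa}
int counter = 0;								$\C{// mutable shared state}$
thread Adder {};
void main( Adder & ) {
	for ( int i = 0; i < 1000; i += 1 ) counter += 1;	$\C{// unprotected: data race}$
}
int main() {
	{
		Adder adders[4];						$\C{// implicit forks}$
	}											$\C{// implicit joins: termination synchronization}$
	sout | counter | endl;						$\C{// without mutual exclusion, likely not 4000}$
}
\end{cfa}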

At the lowest level, concurrent control is implemented as atomic operations, upon which different kinds of locks/approaches are constructed, \eg semaphores~\cite{Dijkstra68b} and path expressions~\cite{Campbell74}.
However, for productivity it is always desirable to use the highest-level construct that provides the necessary efficiency~\cite{Hochstein05}.
A newer approach worth mentioning is transactional memory~\cite{Herlihy93}.
While this approach is pursued in hardware~\cite{} and system languages like \CC~\cite{Cpp-Transactions}, the performance and feature set is still too restrictive to be the main concurrency paradigm for system languages, which is why it was rejected as the core paradigm for concurrency in \CFA.

One of the most natural, elegant, and efficient mechanisms for synchronization and mutual exclusion for shared-memory systems is the \emph{monitor}.
Monitors were first proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}.
Many programming languages---\eg Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Modula~\cite{Modula-2}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, NeWS~\cite{NeWS}, Emerald~\cite{Emerald}, \uC~\cite{Buhr92a} and Java~\cite{Java}---provide monitors as explicit language constructs.
In addition, operating-system kernels and device drivers have a monitor-like structure, although they often use lower-level primitives such as semaphores or locks to simulate monitors.
For these reasons, this project proposes monitors as the core concurrency construct, upon which even higher-level approaches can be easily constructed.


% ...


\section{Monitors}
\label{s:Monitors}

A \textbf{monitor} is a set of routines that ensure mutual-exclusion when accessing shared state.
More precisely, a monitor is a programming technique that associates mutual exclusion with routine scopes, as opposed to mutex locks, where mutual exclusion is defined by lock/release calls independently of any scoping of the calling routine.
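A minimal sketch of this technique in \CFA, using the @mutex@ parameter qualifier mentioned in Section~\ref{threads} (type and routine names are hypothetical):
\begin{cfa}
monitor Account { int balance; };				$\C{// shared state with an implicit lock}$
void deposit( Account & `mutex` acc, int amount ) {	$\C{// mutual exclusion for the routine's scope}$
	acc.balance += amount;
}
\end{cfa}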
     
% ...

Given these building blocks, it is possible to reproduce all three of the popular paradigms.
Indeed, \textbf{uthread} is the default paradigm in \CFA.
However, disabling \textbf{preemption} on a cluster means threads effectively become fibers.
Since several clusters with different scheduling policies can coexist in the same application, user threads and fibers can coexist in the runtime of an application.
Finally, it is possible to build executors for thread pools from user threads or fibers, including specialized jobs like actors~\cite{Actors}.