Changes in / [03bd407:b9da9585]


  • doc/papers/concurrency/Paper.tex

    r03bd407 rb9da9585  
    741741The coroutine main's stack holds the state for the next generation, @f1@ and @f2@, and the code has the three suspend points, representing the three states in the Fibonacci formula, to context switch back to the caller's resume.
    742742The interface function, @next@, takes a Fibonacci instance and context switches to it using @resume@;
    743 on restart, the Fibonacci field, @fn@, contains the next value in the sequence, which is returned.
     743on return, the Fibonacci field, @fn@, contains the next value in the sequence, which is returned.
    744744The first @resume@ is special because it cocalls the coroutine at its coroutine main and allocates the stack;
    745745when the coroutine main returns, its stack is deallocated.
    746746Hence, @Fib@ is an object at creation, transitions to a coroutine on its first resume, and transitions back to an object when the coroutine main finishes.
    747747Figure~\ref{f:Coroutine1State} shows the coroutine version of the C version in Figure~\ref{f:ExternalState}.
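For readers without the figures at hand, a minimal sketch consistent with this description follows; it is an illustration only, and the exact declarations in Figure~\ref{f:Coroutine1State} may differ.
\begin{cfa}
coroutine Fib { int fn; };  // communication variable
void main( Fib & fib ) with( fib ) {  // coroutine main; stack holds f1 and f2
	int f1, f2;
	fn = 0;  f1 = fn;  `suspend();`  // first state
	fn = 1;  f2 = fn;  `suspend();`  // second state
	for ( ;; ) {
		fn = f1 + f2;  f1 = f2;  f2 = fn;  `suspend();`  // general state
	}
}
int next( Fib & fib ) {
	`resume( fib );`  // context switch to last suspend point
	return fib.fn;  // fn holds the next value on return
}
\end{cfa}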
    748 Coroutine generators are called \newterm{output coroutines} because values are only returned.
    749 
    750 Figure~\ref{f:CFAFmt} shows an \newterm{input coroutine}, @Format@, for restructuring text into groups of characters of fixed-size blocks.
     748Coroutine generators are called \newterm{output coroutines} because values are returned by the coroutine.
     749
     750Figure~\ref{f:CFAFmt} shows an \newterm{input coroutine}, @Format@, for restructuring text into groups of character blocks of fixed size.
    751751For example, the input on the left is reformatted into the output on the right.
    752752\begin{quote}
     
    763763\end{tabular}
    764764\end{quote}
    765 The example takes advantage of resuming a coroutine in the constructor to prime the loops so the first character sent for formatting appears inside the nested loops.
     765The example takes advantage of resuming coroutines in the constructor to prime the coroutine loops so the first character sent for formatting appears inside the nested loops.
    766766The destructor provides a newline if the formatted text ends with a full line.
    767767Figure~\ref{f:CFmt} shows the C equivalent formatter, where the loops of the coroutine are flattened (linearized) and rechecked on each call because execution location is not retained between calls.
     
    778778void main( Format & fmt ) with( fmt ) {
    779779        for ( ;; ) {   
    780                 for ( g = 0; g < 5; g += 1 ) {      // group
     780                for ( g = 0; g < 5; g += 1 ) {  // group
    781781                        for ( b = 0; b < 4; b += 1 ) { // block
    782782                                `suspend();`
     
    814814};
    815815void format( struct Format * fmt ) {
    816         if ( fmt->ch != -1 ) {      // not EOF ?
     816        if ( fmt->ch != -1 ) { // not EOF
    817817                printf( "%c", fmt->ch );
    818818                fmt->b += 1;
     
    823823                }
    824824                if ( fmt->g == 5 ) {  // group
    825                         printf( "\n" );     // separator
     825                        printf( "\n" );      // separator
    826826                        fmt->g = 0;
    827827                }
     
    850850
    851851The previous examples are \newterm{asymmetric (semi) coroutine}s because one coroutine always calls a resuming function for another coroutine, and the resumed coroutine always suspends back to its last resumer, similar to call/return for normal functions.
    852 However, there is no stack growth because @resume@/@suspend@ context switch to existing stack-frames rather than create new ones.
    853 \newterm{Symmetric (full) coroutine}s have a coroutine call a resuming function for another coroutine, which eventually forms a resuming-call cycle.
     852However, there is no stack growth because @resume@/@suspend@ context switch to an existing stack frame rather than create a new one.
     853\newterm{Symmetric (full) coroutine}s have a coroutine call a resuming function for another coroutine, which eventually forms a cycle.
    854854(The trivial cycle is a coroutine resuming itself.)
    855855This control flow is similar to recursion for normal routines, but again there is no stack growth from the context switch.
     
    935935The @start@ function communicates both the number of elements to be produced and the consumer into the producer's coroutine structure.
    936936Then the @resume@ to @prod@ creates @prod@'s stack with a frame for @prod@'s coroutine main at the top, and context switches to it.
    937 @prod@'s coroutine main starts, creates local variables that are retained between coroutine activations, and executes $N$ iterations, each generating two random values, calling the consumer to deliver the values, and printing the status returned from the consumer.
     937@prod@'s coroutine main starts, creates local variables that are retained between coroutine activations, and executes $N$ iterations, each generating two random values, calling the consumer to deliver the values, and printing the status returned from the consumer.
    938938
    939939The producer call to @delivery@ transfers values into the consumer's communication variables, resumes the consumer, and returns the consumer status.
    940940For the first resume, @cons@'s stack is initialized, creating local variables retained between subsequent activations of the coroutine.
    941 The consumer iterates until the @done@ flag is set, prints, increments status, and calls back to the producer via @payment@, and on return from @payment@, prints the receipt from the producer and increments @money@ (inflation).
    942 The call from the consumer to the @payment@ introduces the cycle between producer and consumer.
     941The consumer iterates until the @done@ flag is set, prints, increments status, and calls back to the producer's @payment@ member; on return, it prints the receipt from the producer and increments the money for the next payment.
     942The call from the consumer to the producer's @payment@ member introduces the cycle between producer and consumer.
    943943When @payment@ is called, the consumer copies values into the producer's communication variable and a resume is executed.
    944 The context switch restarts the producer at the point where it was last context switched, so it continues in @delivery@ after the resume.
    945 
    946 @delivery@ returns the status value in @prod@'s coroutine main, where the status is printed.
     944The context switch restarts the producer at the point where it was last context switched, so it continues in member @delivery@ after the resume.
     945
     946The @delivery@ member returns the status value in @prod@'s @main@ member, where the status is printed.
    947947The loop then repeats calling @delivery@, where each call resumes the consumer coroutine.
    948948The context switch to the consumer continues in @payment@.
    949 The consumer increments and returns the receipt to the call in @cons@'s coroutine main.
     949The consumer increments and returns the receipt to the call in @cons@'s @main@ member.
    950950The loop then repeats calling @payment@, where each call resumes the producer coroutine.
    951951
     
    954954The context switch restarts @cons@ in @payment@ and it returns with the last receipt.
    955955The consumer terminates its loops because @done@ is true, its @main@ terminates, so @cons@ transitions from a coroutine back to an object, and @prod@ reactivates after the resume in @stop@.
    956 @stop@ returns and @prod@'s coroutine main terminates.
     956The @stop@ member returns and @prod@'s @main@ member terminates.
    957957The program main restarts after the resume in @start@.
    958 @start@ returns and the program main terminates.
    959 
    960 
    961 \subsection{Coroutine Implementation}
    962 
    963 A significant implementation challenge for coroutines (and threads, see section \ref{threads}) is adding extra fields and executing code after/before the coroutine constructor/destructor and coroutine main to create/initialize/de-initialize/destroy extra fields and the stack.
    964 There are several solutions to this problem and the chosen option forced the \CFA coroutine design.
    965 
    966 Object-oriented inheritance provides extra fields and code in a restricted context, but it requires programmers to explicitly perform the inheritance:
    967 \begin{cfa}
    968 struct mycoroutine $\textbf{\textsf{inherits}}$ baseCoroutine { ... }
    969 \end{cfa}
    970 and the programming language (and possibly its tool set, \eg debugger) may need to understand @baseCoroutine@ because of the stack.
    971 Furthermore, the execution of constructors/destructors is in the wrong order for certain operations, \eg for threads;
    972 \eg, if the thread is implicitly started, it must start \emph{after} all constructors, because the thread relies on a completely initialized object, but the inherited constructor runs \emph{before} the derived.
    973 
    974 An alternative is composition:
    975 \begin{cfa}
    976 struct mycoroutine {
    977         ... // declarations
    978         baseCoroutine dummy; // composition, last declaration
    979 }
    980 \end{cfa}
    981 which also requires an explicit declaration that must be the last one to ensure correct initialization order.
    982 However, there is nothing preventing wrong placement or multiple declarations.
     958The @start@ member returns and the program main terminates.
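To make the resuming cycle concrete, the following is a sketch of the two interface members reconstructed from this walkthrough; the type and field names (@Prod@, @Cons@, @p1@, @p2@, @status@, @money@, @receipt@) are assumptions, not necessarily the figure's exact code.
\begin{cfa}
int delivery( Cons & cons, int p1, int p2 ) {
	cons.p1 = p1;  cons.p2 = p2;  // transfer into consumer's communication variables
	`resume( cons );`  // consumer runs until its next call to payment (or termination)
	return cons.status;  // status set by the consumer
}
int payment( Prod & prod, int money ) {
	prod.money = money;  // transfer into producer's communication variable
	`resume( prod );`  // producer continues in delivery (or stop) after its resume
	return prod.receipt;  // receipt set by the producer
}
\end{cfa}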
     959
     960
     961\subsubsection{Construction}
     962
     963One important design challenge for implementing coroutines and threads (shown in section \ref{threads}) is that the runtime system needs to run code after the user-constructor runs to connect the fully constructed object into the system.
     964In the case of coroutines, this challenge is simpler since there is no non-determinism from preemption or scheduling.
     965However, the underlying challenge remains the same for coroutines and threads.
     966
     967The runtime system needs to create the coroutine's stack and, more importantly, prepare it for the first resumption.
     968The timing of the creation is non-trivial since users expect both to have fully constructed objects once execution enters the coroutine main and to be able to resume the coroutine from the constructor.
     969There are several solutions to this problem, but the chosen option effectively forces the design of the coroutine.
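As a sketch of why the timing matters, a constructor in the style of the @Format@ example primes its coroutine with a resume, so the stack must already exist and be resumable before user construction completes (illustrative only; the figure's constructor may also initialize fields):
\begin{cfa}
void ?{}( Format & fmt ) {
	`resume( fmt );`  // priming resume: run coroutine main to its first suspend
}
\end{cfa}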
     970
     971Furthermore, \CFA faces an extra challenge as polymorphic routines create invisible thunks when cast to non-polymorphic routines and these thunks have function scope.
     972For example, the following code, while looking benign, can run into undefined behaviour because of thunks:
     973
     974\begin{cfa}
     975// async: Runs function asynchronously on another thread
     976forall(otype T)
     977extern void async(void (*func)(T*), T* obj);
     978
     979forall(otype T)
     980void noop(T*) {}
     981
     982void bar() {
     983        int a;
     984        async(noop, &a); // start thread running noop with argument a
     985}
     986\end{cfa}
     987
     988The generated C code\footnote{Code trimmed down for brevity} creates a local thunk to hold type information:
     989
     990\begin{cfa}
     991extern void async(/* omitted */, void (*func)(void*), void* obj);
     992
     993void noop(/* omitted */, void* obj){}
     994
     995void bar(){
     996        int a;
     997        void _thunk0(int* _p0){
     998                /* omitted */
     999                noop(/* omitted */, _p0);
     1000        }
     1001        /* omitted */
     1002        async(/* omitted */, ((void (*)(void*))(&_thunk0)), (&a));
     1003}
     1004\end{cfa}
      1005The problem in this example is a storage management issue: the function pointer @_thunk0@ is only valid until the end of the block, which limits the viable solutions because storing the function pointer for too long causes undefined behaviour; \ie the stack-based thunk is destroyed before it can be used.
     1006This challenge is an extension of challenges that come with second-class routines.
      1007Indeed, GCC nested routines also have the limitation that a nested routine cannot be passed outside of its declaration scope.
     1008The case of coroutines and threads is simply an extension of this problem to multiple call stacks.
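As a sketch of that underlying limitation (hypothetical names), a nested routine must not escape its declaration scope:
\begin{cfa}
int (* escape( void ))( int ) {
	int base = 10;
	int add_base( int x ) { return x + base; }  // nested routine
	return add_base;  // escapes the declaration scope: undefined behaviour
}
\end{cfa}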
     1009
     1010
     1011\subsubsection{Alternative: Composition}
     1012
     1013One solution to this challenge is to use composition/containment, where coroutine fields are added to manage the coroutine.
     1014
     1015\begin{cfa}
     1016struct Fibonacci {
     1017        int fn; // used for communication
     1018        coroutine c; // composition
     1019};
     1020
     1021void FibMain(void*) {
     1022        //...
     1023}
     1024
     1025void ?{}(Fibonacci& this) {
     1026        this.fn = 0;
     1027        // Call constructor to initialize coroutine
      1028	(this.c){FibMain};
     1029}
     1030\end{cfa}
      1031The downside of this approach is that users need to correctly construct the coroutine handle before using it.
      1032Like any other object, the user must carefully choose construction order to prevent use of objects not yet constructed.
      1033However, in the case of coroutines, users must also pass information about the coroutine main to the coroutine, as in the previous example.
      1034This opens the door to user errors and requires extra runtime storage to pass information at runtime that is known statically.
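For example, a hypothetical mis-ordered constructor illustrates the hazard:
\begin{cfa}
void ?{}( Fibonacci & this ) {
	(this.c){ FibMain };  // coroutine constructed, and possibly resumed, first
	this.fn = 0;  // BUG: communication field initialized after the coroutine may run
}
\end{cfa}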
     1035
     1036
      1037\subsubsection{Alternative: Reserved Keyword}
     1038
     1039The next alternative is to use language support to annotate coroutines as follows:
     1040\begin{cfa}
     1041coroutine Fibonacci {
     1042        int fn; // used for communication
     1043};
     1044\end{cfa}
     1045The @coroutine@ keyword means the compiler can find and inject code where needed.
      1046The downside of this approach is that it makes coroutines a special case in the language.
      1047Users wanting to extend coroutines or build their own for various reasons can only do so in ways offered by the language.
      1048Furthermore, implementing coroutines without language support also demonstrates the power of the programming language used.
     1049While this is ultimately the option used for idiomatic \CFA code, coroutines and threads can still be constructed by users without using the language support.
     1050The reserved keywords are only present to improve ease of use for the common cases.
     1051
     1052
     1053\subsubsection{Alternative: Lambda Objects}
    9831054
    9841055For coroutines as for threads, many implementations are based on routine pointers or function objects~\cite{Butenhof97, C++14, MS:VisualC++, BoostCoroutines15}.
    985 For example, Boost implements coroutines in terms of four functor object-types:
     1056For example, Boost implements coroutines in terms of four functor object types:
    9861057\begin{cfa}
    9871058asymmetric_coroutine<>::pull_type
     
    9901061symmetric_coroutine<>::yield_type
    9911062\end{cfa}
    992 Similarly, the canonical threading paradigm is often based on function pointers, \eg @pthread@~\cite{pthreads}, \Csharp~\cite{Csharp}, Go~\cite{Go}, and Scala~\cite{Scala}.
    993 However, the generic thread-handle (identifier) is limited (few operations), unless it is wrapped in a custom type.
    994 \begin{cfa}
    995 void mycor( coroutine_t cid, void * arg ) {
    996         int * value = (int *)arg;                               $\C{// type unsafe, pointer-size only}$
     1063Often, the canonical threading paradigm in languages is based on function pointers, @pthread@ being one of the most well-known examples.
      1064The main problem with this approach is that thread usage is limited to a generic handle, which must be wrapped in a custom type to be useful.
      1065Since the custom type is simple to write in \CFA and solves several issues, added support for routine/lambda-based coroutines adds very little.
     1066
     1067A variation of this would be to use a simple function pointer in the same way @pthread@ does for threads:
     1068\begin{cfa}
     1069void foo( coroutine_t cid, void* arg ) {
     1070        int* value = (int*)arg;
    9971071        // Coroutine body
    9981072}
     1073
    9991074int main() {
    1000         int input = 0, output;
    1001         coroutine_t cid = coroutine_create( &mycor, (void *)&input ); $\C{// type unsafe, pointer-size only}$
    1002         coroutine_resume( cid, (void *)input, (void **)&output ); $\C{// type unsafe, pointer-size only}$
    1003 }
    1004 \end{cfa}
    1005 Since the custom type is simple to write in \CFA and solves several issues, added support for routine/lambda-based coroutines adds very little.
    1006 
    1007 The selected approach is to use language support by introducing a new kind of aggregate (structure):
    1008 \begin{cfa}
    1009 coroutine Fibonacci {
    1010         int fn; // communication variables
     1075        int value = 0;
     1076        coroutine_t cid = coroutine_create( &foo, (void*)&value );
     1077        coroutine_resume( &cid );
     1078}
     1079\end{cfa}
      1080This semantics is more common for thread interfaces, but coroutines work equally well.
     1081As discussed in section \ref{threads}, this approach is superseded by static approaches in terms of expressivity.
     1082
     1083
     1084\subsubsection{Alternative: Trait-Based Coroutines}
     1085
     1086Finally, the underlying approach, which is the one closest to \CFA idioms, is to use trait-based lazy coroutines.
     1087This approach defines a coroutine as anything that satisfies the trait @is_coroutine@ (as defined below) and is used as a coroutine.
     1088
     1089\begin{cfa}
     1090trait is_coroutine(dtype T) {
     1091      void main(T& this);
     1092      coroutine_desc* get_coroutine(T& this);
    10111093};
    1012 \end{cfa}
    1013 The @coroutine@ keyword means the compiler (and tool set) can find and inject code where needed.
    1014 The downside of this approach is that it makes coroutines a special case in the language.
    1015 Users wanting to extend coroutines or build their own for various reasons can only do so in ways offered by the language.
    1016 Furthermore, implementing coroutines without language support also displays the power of a programming language.
    1017 While this is ultimately the option used for idiomatic \CFA code, coroutines and threads can still be constructed without using the language support.
    1018 The reserved keyword eases use for the common cases.
    1019 
    1020 Part of the mechanism to generalize coroutines is using a \CFA trait, which defines a coroutine as anything satisfying the trait @is_coroutine@, and this trait is used to restrict coroutine-manipulation functions:
    1021 \begin{cfa}
    1022 trait is_coroutine( dtype T ) {
    1023       void main( T & this );
    1024       coroutine_desc * get_coroutine( T & this );
     1094
     1095forall( dtype T | is_coroutine(T) ) void suspend(T&);
     1096forall( dtype T | is_coroutine(T) ) void resume (T&);
     1097\end{cfa}
     1098This ensures that an object is not a coroutine until @resume@ is called on the object.
     1099Correspondingly, any object that is passed to @resume@ is a coroutine since it must satisfy the @is_coroutine@ trait to compile.
      1100The advantage of this approach is that users can easily create different types of coroutines; for example, changing the memory layout of a coroutine is trivial when implementing the @get_coroutine@ routine.
     1101The \CFA keyword @coroutine@ simply has the effect of implementing the getter and forward declarations required for users to implement the main routine.
     1102
     1103\begin{center}
     1104\begin{tabular}{c c c}
     1105\begin{cfa}[tabsize=3]
     1106coroutine MyCoroutine {
     1107        int someValue;
    10251108};
    1026 forall( dtype T | is_coroutine(T) ) void get_coroutine( T & );
    1027 forall( dtype T | is_coroutine(T) ) void suspend( T & );
    1028 forall( dtype T | is_coroutine(T) ) void resume( T & );
    1029 \end{cfa}
    1030 This definition ensures there is a statically-typed @main@ function that is the starting point (first stack frame) of a coroutine.
    1031 No return value or additional parameters are necessary for this function, because the coroutine type allows an arbitrary number of interface functions with corresponding arbitrary typed input/output values.
    1032 As well, any object passed to @suspend@ and @resume@ is a coroutine since it must satisfy the @is_coroutine@ trait to compile.
    1033 The advantage of this approach is that users can easily create different types of coroutines, for example, changing the memory layout of a coroutine is trivial when implementing the @get_coroutine@ routine.
    1034 The \CFA keyword @coroutine@ implicitly implements the getter and forward declarations required for implementing the coroutine main:
    1035 \begin{cquote}
    1036 \begin{tabular}{@{}ccc@{}}
    1037 \begin{cfa}
    1038 coroutine MyCor {
    1039         int value;
    1040 
     1109\end{cfa} & == & \begin{cfa}[tabsize=3]
     1110struct MyCoroutine {
     1111        int someValue;
     1112        coroutine_desc __cor;
    10411113};
    1042 \end{cfa}
    1043 & {\Large $\Rightarrow$} &
    1044 \begin{tabular}{@{}ccc@{}}
    1045 \begin{cfa}
    1046 struct MyCor {
    1047         int value;
    1048         coroutine_desc cor;
     1114
     1115static inline
     1116coroutine_desc* get_coroutine(
     1117        struct MyCoroutine& this
     1118) {
     1119        return &this.__cor;
     1120}
     1121
      1122void main(struct MyCoroutine& this);
     1123\end{cfa}
     1124\end{tabular}
     1125\end{center}
     1126
     1127The combination of these two approaches allows users new to coroutining and concurrency to have an easy and concise specification, while more advanced users have tighter control on memory layout and initialization.
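For instance, an advanced user could satisfy the trait by hand to control layout; the following sketch uses illustrative names:
\begin{cfa}
struct CustomCor {
	coroutine_desc cor;  // descriptor placed first, a deliberate layout choice
	int result;
};
coroutine_desc * get_coroutine( CustomCor & this ) {
	return &this.cor;  // satisfies is_coroutine without the coroutine keyword
}
void main( CustomCor & this ) {
	this.result = 42;  // illustrative body
	suspend();
}
\end{cfa}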
     1128
     1129\subsection{Thread Interface}\label{threads}
      1130The basic building block of multithreading in \CFA is the \textbf{cfathread}.
      1131Both user and kernel threads are supported, where user threads are the concurrency mechanism and kernel threads are the parallelism mechanism.
      1132User threads offer a flexible and lightweight interface.
      1133A thread is declared like a structure, using the @thread@ keyword, as follows:
     1134
     1135\begin{cfa}
     1136thread foo {};
     1137\end{cfa}
     1138
     1139As for coroutines, the keyword is a thin wrapper around a \CFA trait:
     1140
     1141\begin{cfa}
     1142trait is_thread(dtype T) {
     1143      void ^?{}(T & mutex this);
     1144      void main(T & this);
     1145      thread_desc* get_thread(T & this);
    10491146};
    10501147\end{cfa}
    1051 &
    1052 \begin{cfa}
    1053 static inline coroutine_desc *
    1054 get_coroutine( MyCor & this ) {
    1055         return &this.cor;
    1056 }
    1057 \end{cfa}
    1058 &
    1059 \begin{cfa}
    1060 void main( MyCor & this );
    1061 
    1062 
    1063 
    1064 \end{cfa}
    1065 \end{tabular}
    1066 \end{tabular}
    1067 \end{cquote}
    1068 The combination of these two approaches allows an easy and concise specification to coroutining (and concurrency) for normal users, while more advanced users have tighter control on memory layout and initialization.
    1069 
    1070 
    1071 \subsection{Thread Interface}
    1072 \label{threads}
    1073 
    1074 Both user and kernel threads are supported, where user threads provide concurrency and kernel threads provide parallelism.
    1075 Like coroutines and for the same design reasons, the selected approach for user threads is to use language support by introducing a new kind of aggregate (structure) and a \CFA trait:
    1076 \begin{cquote}
    1077 \begin{tabular}{@{}c@{\hspace{2\parindentlnth}}c@{}}
    1078 \begin{cfa}
    1079 thread myThread {
    1080         // communication variables
    1081 };
    1082 
    1083 
    1084 \end{cfa}
    1085 &
    1086 \begin{cfa}
    1087 trait is_thread( dtype T ) {
    1088       void main( T & this );
    1089       thread_desc * get_thread( T & this );
    1090       void ^?{}( T & `mutex` this );
    1091 };
    1092 \end{cfa}
    1093 \end{tabular}
    1094 \end{cquote}
    1095 (The qualifier @mutex@ for the destructor parameter is discussed in Section~\ref{s:Monitors}.)
    1096 Like a coroutine, the statically-typed @main@ function is the starting point (first stack frame) of a user thread.
    1097 The difference is that a coroutine borrows a thread from its caller, so the first thread resuming a coroutine creates an instance of @main@;
    1098 whereas, a user thread receives its own thread from the runtime system, which starts in @main@ at some point after the thread constructor is run.\footnote{
    1099 The \lstinline@main@ function is already a special routine in C (where the program begins), so it is a natural extension of the semantics to use overloading to declare mains for different coroutines/threads (the normal main being the main of the initial thread).}
    1100 No return value or additional parameters are necessary for this function, because the task type allows an arbitrary number of interface functions with corresponding arbitrary typed input/output values.
    1101 
    1102 \begin{comment} % put in appendix with coroutine version ???
     1148
      1149Obviously, for this thread implementation to be useful, it must run some user code.
     1150Several other threading interfaces use a function-pointer representation as the interface of threads (for example \Csharp~\cite{Csharp} and Scala~\cite{Scala}).
     1151However, this proposal considers that statically tying a @main@ routine to a thread supersedes this approach.
     1152Since the @main@ routine is already a special routine in \CFA (where the program begins), it is a natural extension of the semantics to use overloading to declare mains for different threads (the normal main being the main of the initial thread).
    11031153As such, the @main@ routine of a thread can be defined as
    11041154\begin{cfa}
     
    11391189}
    11401190\end{cfa}
     1191
    11411192A consequence of the strongly typed approach to main is that the memory layout of parameters and return values to/from a thread is now explicitly specified in the \textbf{api}.
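As a sketch of this point (hypothetical names), parameters and results are ordinary fields of the thread type, so their layout is part of the type itself:
\begin{cfa}
thread Adder { int * row; int cols; int subtotal; };  // parameters and result are fields
void main( Adder & this ) with( this ) {
	subtotal = 0;
	for ( int c = 0; c < cols; c += 1 ) subtotal += row[c];  // result left in subtotal
}
\end{cfa}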
    1142 \end{comment}
    1143 
    1144 For user threads to be useful, it must be possible to start and stop the underlying thread, and wait for it to complete execution.
    1145 While using an API such as @fork@ and @join@ is relatively common, such an interface is awkward and unnecessary.
    1146 A simple approach is to use allocation/deallocation principles, and have threads implicitly @fork@ after construction and @join@ before destruction.
    1147 \begin{cfa}
    1148 thread World {};
    1149 void main( World & this ) {
     1193
     1194Of course, for threads to be useful, it must be possible to start and stop threads and wait for them to complete execution.
     1195While using an \textbf{api} such as @fork@ and @join@ is relatively common in the literature, such an interface is unnecessary.
     1196Indeed, the simplest approach is to use \textbf{raii} principles and have threads @fork@ after the constructor has completed and @join@ before the destructor runs.
     1197\begin{cfa}
      1198thread World {};
     1199
     1200void main(World & this) {
    11501201        sout | "World!" | endl;
    11511202}
    1152 int main() {
    1153         World w`[10]`;                                                  $\C{// implicit forks after creation}$
    1154         sout | "Hello " | endl;                                 $\C{// "Hello " and 10 "World!" printed concurrently}$
    1155 }                                                                                       $\C{// implicit joins before destruction}$
    1156 \end{cfa}
    1157 This semantics ensures a thread is started and stopped exactly once, eliminating some programming errors, and scales to multiple threads for basic (termination) synchronization.
    1158 This tree-structured (lattice) create/delete from C block-structure is generalized by using dynamic allocation, so threads can outlive the scope in which they are created, much like dynamically allocating memory lets objects outlive the scope in which they are created.
    1159 \begin{cfa}
    1160 int main() {
    1161         MyThread * heapLived;
     1203
      1204int main() {
     1205        World w;
     1206        // Thread forks here
     1207
      1208	// "Hello " and "World!" print concurrently
     1209        sout | "Hello " | endl;
     1210
     1211        // Implicit join at end of scope
     1212}
     1213\end{cfa}
     1214
      1215This semantics has several advantages over an explicit approach: a thread is always started and stopped exactly once, which eliminates a class of programming errors, and it naturally scales to multiple threads, making basic synchronization very simple.
     1216
     1217\begin{cfa}
     1218thread MyThread {
     1219        //...
     1220};
     1221
     1222// main
     1223void main(MyThread& this) {
     1224        //...
     1225}
     1226
     1227void foo() {
     1228        MyThread thrds[10];
     1229        // Start 10 threads at the beginning of the scope
     1230
     1231        DoStuff();
     1232
     1233        // Wait for the 10 threads to finish
     1234}
     1235\end{cfa}
     1236
      1237However, a drawback of this approach is that threads always form a tree in which nodes must outlive their children, \ie threads are destroyed in the reverse order of construction because of C scoping rules.
     1238This restriction is relaxed by using dynamic allocation, so threads can outlive the scope in which they are created, much like dynamically allocating memory lets objects outlive the scope in which they are created.
     1239
     1240\begin{cfa}
     1241thread MyThread {
     1242        //...
     1243};
     1244
     1245void main(MyThread& this) {
     1246        //...
     1247}
     1248
     1249void foo() {
     1250        MyThread* long_lived;
    11621251        {
    1163                 MyThread blockLived;                            $\C{// fork block-based thread}$
    1164                 heapLived = `new`( MyThread );          $\C{// fork heap-based thread}$
    1165                 ...
    1166         }                                                                               $\C{// join block-based thread}$
    1167         ...
    1168         `delete`( heapLived );                                  $\C{// join heap-based thread}$
    1169 }
    1170 \end{cfa}
    1171 The heap-based approach allows arbitrary thread-creation topologies, with respect to fork/join-style concurrency.
    1172 
    1173 
    1174 \section{Synchronization / Mutual Exclusion}
    1175 
    1176 Uncontrolled non-deterministic execution is meaningless.
    1177 To reestablish meaningful execution requires mechanisms to reintroduce determinism (control non-determinism), called synchronization and mutual exclusion, where \newterm{synchronization} is a timing relationship among threads and \newterm{mutual exclusion} is an access-control mechanism on data shared by threads.
    1178 Since many deterministic challenges appear with the use of mutable shared state, some languages/libraries disallow it (Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, Akka~\cite{Akka} (Scala)).
    1179 In these paradigms, interaction among concurrent objects is performed by stateless message-passing~\cite{Thoth,Harmony,V-Kernel} or other paradigms closely related to networking concepts (\eg channels~\cite{CSP,Go}).
    1180 However, in call/return-based languages, these approaches force a clear distinction (\ie introduce a new programming paradigm) between non-concurrent and concurrent computation (\ie function call versus message passing).
    1181 This distinction means a programmer needs to learn two sets of design patterns.
     1252                // Start a thread at the beginning of the scope
     1253                MyThread short_lived;
     1254
     1255                // create another thread that will outlive the thread in this scope
      1256		long_lived = new( MyThread );
     1257
     1258                DoStuff();
     1259
     1260                // Wait for the thread short_lived to finish
     1261        }
     1262        DoMoreStuff();
     1263
     1264        // Now wait for the long_lived to finish
      1265	delete( long_lived );
     1266}
     1267\end{cfa}
     1268
     1269
     1270% ======================================================================
     1271% ======================================================================
     1272\section{Concurrency}
     1273% ======================================================================
     1274% ======================================================================
     1275Several tools can be used to solve concurrency challenges.
     1276Since many of these challenges appear with the use of mutable shared state, some languages and libraries simply disallow mutable shared state (Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, Akka (Scala)~\cite{Akka}).
      1277In these paradigms, interaction among concurrent objects relies on message passing~\cite{Thoth,Harmony,V-Kernel} or other paradigms closely related to networking concepts (channels~\cite{CSP,Go} for example).
     1278However, in languages that use routine calls as their core abstraction mechanism, these approaches force a clear distinction between concurrent and non-concurrent paradigms (\ie message passing versus routine calls).
     1279This distinction in turn means that, in order to be effective, programmers need to learn two sets of design patterns.
    11821280While this distinction can be hidden away in library code, effective use of the library still has to take both paradigms into account.
    1183 In contrast, approaches based on stateful models more closely resemble the standard call/return programming-model, resulting in a single programming paradigm.
    1184 
    1185 At the lowest level, concurrent control is implemented as atomic operations, upon which different kinds of locks/approaches are constructed, \eg semaphores~\cite{Dijkstra68b} and path expressions~\cite{Campbell74}.
    1186 However, for productivity it is always desirable to use the highest-level construct that provides the necessary efficiency~\cite{Hochstein05}.
    1187 A newer approach worth mentioning is transactional memory~\cite{Herlihy93}.
    1188 While this approach is pursued in hardware~\cite{} and system languages, like \CC~\cite{Cpp-Transactions}, the performance and feature set is still too restrictive to be the main concurrency paradigm for system languages, which is why it was rejected as the core paradigm for concurrency in \CFA.
    1189 
    1190 One of the most natural, elegant, and efficient mechanisms for synchronization and mutual exclusion for shared-memory systems is the \emph{monitor}.
     1281
     1282Approaches based on shared memory are more closely related to non-concurrent paradigms since they often rely on basic constructs like routine calls and shared objects.
     1283At the lowest level, concurrent paradigms are implemented as atomic operations and locks.
     1284Many such mechanisms have been proposed, including semaphores~\cite{Dijkstra68b} and path expressions~\cite{Campbell74}.
     1285However, for productivity reasons it is desirable to have a higher-level construct be the core concurrency paradigm~\cite{Hochstein05}.
     1286
     1287An approach that is worth mentioning because it is gaining in popularity is transactional memory~\cite{Herlihy93}.
     1288While this approach is even pursued by system languages like \CC~\cite{Cpp-Transactions}, the performance and feature set is currently too restrictive to be the main concurrency paradigm for system languages, which is why it was rejected as the core paradigm for concurrency in \CFA.
     1289
     1290One of the most natural, elegant, and efficient mechanisms for synchronization and communication, especially for shared-memory systems, is the \emph{monitor}.
    11911291Monitors were first proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}.
    11921292Many programming languages---\eg Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Modula~\cite{Modula-2}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, NeWS~\cite{NeWS}, Emerald~\cite{Emerald}, \uC~\cite{Buhr92a} and Java~\cite{Java}---provide monitors as explicit language constructs.
    11931293In addition, operating-system kernels and device drivers have a monitor-like structure, although they often use lower-level primitives such as semaphores or locks to simulate monitors.
    1194 For these reasons, this project proposes monitors as the core concurrency construct, upon which even higher-level approaches can be easily constructed.
     1294For these reasons, this project proposes monitors as the core concurrency construct.
    11951295
    11961296
     
    12291329
    12301330
     1331% ======================================================================
     1332% ======================================================================
    12311333\section{Monitors}
    1232 \label{s:Monitors}
    1233 
     1334% ======================================================================
     1335% ======================================================================
    12341336A \textbf{monitor} is a set of routines that ensure mutual-exclusion when accessing shared state.
    12351337More precisely, a monitor is a programming technique that associates mutual-exclusion with routine scopes, as opposed to mutex locks, where mutual-exclusion is defined by lock/release calls independently of any scoping of the calling routine.
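As a sketch of the distinction (illustrative names, using the @mutex@ parameter qualifier shown earlier for threads), mutual exclusion attaches to the routine's scope rather than to explicit acquire/release calls:
\begin{cfa}
monitor Account { int balance; };
int withdraw( Account & `mutex` acc, int amount ) {
	// mutual exclusion held for the duration of the routine's scope;
	// no explicit lock/release calls appear in user code
	return acc.balance -= amount;
}
\end{cfa}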
     
    23992501Given these building blocks, it is possible to reproduce all three of the popular paradigms.
    24002502Indeed, \textbf{uthread} is the default paradigm in \CFA.
    2401 However, disabling \textbf{preemption} on a cluster means threads effectively become fibers.
     2503However, disabling \textbf{preemption} on a \textbf{cfacluster} means each \textbf{cfathread} effectively becomes a \textbf{fiber}.
    24022504Since several \textbf{cfacluster}s with different scheduling policies can coexist in the same application, this allows \textbf{fiber} and \textbf{uthread} to coexist in the runtime of an application.
    24032505Finally, it is possible to build executors for thread pools from \textbf{uthread} or \textbf{fiber}, which includes specialized jobs like actors~\cite{Actors}.