Changeset e982385 for doc/papers/concurrency/Paper.tex
- Timestamp:
- May 24, 2018, 1:36:39 PM
- Branches:
- aaron-thesis, arm-eh, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, with_gc
- Children:
- b4f0dde
- Parents:
- dd226e3 (diff), 1dc58fd (diff)
Note: this is a merge changeset; the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
- File:
- 1 edited
doc/papers/concurrency/Paper.tex
%\DeclareTextCommandDefault{\textunderscore}{\leavevmode\makebox[1.2ex][c]{\rule{1ex}{0.1ex}}}
\renewcommand{\textunderscore}{\leavevmode\makebox[1.2ex][c]{\rule{1ex}{0.075ex}}}

\renewcommand*{\thefootnote}{\Alph{footnote}} % hack because fnsymbol does not work
…

The coroutine main's stack holds the state for the next generation, @f1@ and @f2@, and the code has the three suspend points, representing the three states in the Fibonacci formula, to context switch back to the caller's resume.
The interface function, @next@, takes a Fibonacci instance and context switches to it using @resume@;
on restart, the Fibonacci field, @fn@, contains the next value in the sequence, which is returned.
The first @resume@ is special because it cocalls the coroutine at its coroutine main and allocates the stack;
when the coroutine main returns, its stack is deallocated.
Hence, @Fib@ is an object at creation, transitions to a coroutine on its first resume, and transitions back to an object when the coroutine main finishes.
Figure~\ref{f:Coroutine1State} shows the coroutine version of the C version in Figure~\ref{f:ExternalState}.
Coroutine generators are called \newterm{output coroutines} because values are only returned.
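For concreteness, a minimal sketch of the interface function just described, assuming the coroutine type @Fib@ and field @fn@ from Figure~\ref{f:Coroutine1State}:
\begin{cfa}
int next( Fib & fib ) with( fib ) {
	resume( fib );	$\C{// restart coroutine main at its last suspend}$
	return fn;	$\C{// fn is set by the coroutine main before suspending}$
}
\end{cfa}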
Figure~\ref{f:CFAFmt} shows an \newterm{input coroutine}, @Format@, for restructuring text into fixed-size blocks of characters.
For example, the input on the left is reformatted into the output on the right.
\begin{quote}
…
\end{tabular}
\end{quote}
The example takes advantage of resuming a coroutine in the constructor to prime the loops so the first character sent for formatting appears inside the nested loops.
The destruction provides a newline if the formatted text ends with a full line.
Figure~\ref{f:CFmt} shows the equivalent C formatter, where the loops of the coroutine are flattened (linearized) and rechecked on each call because the execution location is not retained between calls.
…
void main( Format & fmt ) with( fmt ) {
	for ( ;; ) {
		for ( g = 0; g < 5; g += 1 ) {	// group
			for ( b = 0; b < 4; b += 1 ) {	// block
				`suspend();`
…
};
void format( struct Format * fmt ) {
	if ( fmt->ch != -1 ) {	// not EOF ?
		printf( "%c", fmt->ch );
		fmt->b += 1;
…
	}
	if ( fmt->g == 5 ) {	// group
		printf( "\n" );	// separator
		fmt->g = 0;
	}
…

The previous examples are \newterm{asymmetric (semi) coroutine}s because one coroutine always calls a resuming function for another coroutine, and the resumed coroutine always suspends back to its last resumer, similar to call/return for normal functions.
However, there is no stack growth because @resume@/@suspend@ context switch to existing stack-frames rather than create new ones.
\newterm{Symmetric (full) coroutine}s have a coroutine call a resuming function for another coroutine, which eventually forms a resuming-call cycle.
(The trivial cycle is a coroutine resuming itself.)
This control flow is similar to recursion for normal routines, but again there is no stack growth from the context switch.
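As an illustration only, a skeleton of such a resuming-call cycle; initialization of the partner pointers and the termination protocol are elided, and all names are hypothetical:
\begin{cfa}
coroutine Pong;	$\C{// forward declaration}$
coroutine Ping { Pong * partner; };
coroutine Pong { Ping * partner; };
void main( Ping & ping ) {
	for ( ;; ) {
		sout | "ping" | endl;
		resume( *ping.partner );	$\C{// resuming call, not a suspend back to a resumer}$
	}
}
void main( Pong & pong ) {
	for ( ;; ) {
		sout | "pong" | endl;
		resume( *pong.partner );	$\C{// completes the cycle; control re-enters the resume above}$
	}
}
\end{cfa}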
…
The @start@ function communicates both the number of elements to be produced and the consumer into the producer's coroutine structure.
Then the @resume@ to @prod@ creates @prod@'s stack with a frame for @prod@'s coroutine main at the top, and context switches to it.
@prod@'s coroutine main starts, creates local variables that are retained between coroutine activations, and executes $N$ iterations, each generating two random values, calling the consumer to deliver the values, and printing the status returned from the consumer.

The producer call to @delivery@ transfers values into the consumer's communication variables, resumes the consumer, and returns the consumer status.
For the first resume, @cons@'s stack is initialized, creating local variables retained between subsequent activations of the coroutine.
The consumer iterates until the @done@ flag is set, prints, increments status, and calls back to the producer via @payment@, and on return from @payment@, prints the receipt from the producer and increments @money@ (inflation).
The call from the consumer to @payment@ introduces the cycle between producer and consumer.
When @payment@ is called, the consumer copies values into the producer's communication variables and a resume is executed.
The context switch restarts the producer at the point where it was last context switched, so it continues in @delivery@ after the resume.

@delivery@ returns the status value in @prod@'s coroutine main, where the status is printed.
The loop then repeats calling @delivery@, where each call resumes the consumer coroutine.
The context switch to the consumer continues in @payment@.
The consumer increments and returns the receipt to the call in @cons@'s coroutine main.
The loop then repeats calling @payment@, where each call resumes the producer coroutine.
…
The context switch restarts @cons@ in @payment@ and it returns with the last receipt.
The consumer terminates its loops because @done@ is true, its @main@ terminates, so @cons@ transitions from a coroutine back to an object, and @prod@ reactivates after the resume in @stop@.
@stop@ returns and @prod@'s coroutine main terminates.
The program main restarts after the resume in @start@.
@start@ returns and the program main terminates.
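Summarizing the cycle, a hedged sketch of the two interface functions, with coroutine and field names assumed from the example:
\begin{cfa}
int delivery( Cons & cons, int p1, int p2 ) {
	cons.p1 = p1;  cons.p2 = p2;	$\C{// transfer into consumer's communication variables}$
	resume( cons );	$\C{// restart consumer, which eventually resumes the producer via payment}$
	return cons.status;	$\C{// status incremented by the consumer}$
}
int payment( Prod & prod, int money ) {
	prod.money = money;	$\C{// copy into producer's communication variable}$
	resume( prod );	$\C{// restart producer after the resume in delivery}$
	return prod.receipt;	$\C{// receipt set by the producer}$
}
\end{cfa}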
\subsection{Coroutine Implementation}

A significant implementation challenge for coroutines (and threads, see Section~\ref{threads}) is adding extra fields and executing code after/before the coroutine constructor/destructor and coroutine main to create/initialize/de-initialize/destroy the extra fields and the stack.
There are several solutions to this problem and the chosen option forced the \CFA coroutine design.

Object-oriented inheritance provides extra fields and code in a restricted context, but it requires programmers to explicitly perform the inheritance:
\begin{cfa}
struct mycoroutine $\textbf{\textsf{inherits}}$ baseCoroutine { ... }
\end{cfa}
and the programming language (and possibly its tool set, \eg debugger) may need to understand @baseCoroutine@ because of the stack.
Furthermore, the execution of constructors/destructors is in the wrong order for certain operations, \eg threads:
if the thread is implicitly started, it must start \emph{after} all constructors, because the thread relies on a completely initialized object, but the inherited constructor runs \emph{before} the derived one.

An alternative is composition:
\begin{cfa}
struct mycoroutine {
	...	// declarations
	baseCoroutine dummy;	// composition, last declaration
}
\end{cfa}
which also requires an explicit declaration that must be the last one to ensure correct initialization order.
However, there is nothing preventing wrong placement or multiple declarations.
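For example, a sketch of the placement pitfall, reusing the hypothetical @baseCoroutine@ from above:
\begin{cfa}
struct mycoroutine {
	baseCoroutine dummy;	$\C{// WRONG: initialized before the declarations below,}$
	int fn;	$\C{// breaking the required initialization order}$
};
\end{cfa}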
For coroutines as for threads, many implementations are based on routine pointers or function objects~\cite{Butenhof97, C++14, MS:VisualC++, BoostCoroutines15}.
For example, Boost implements coroutines in terms of four functor object-types:
\begin{cfa}
asymmetric_coroutine<>::pull_type
asymmetric_coroutine<>::push_type
symmetric_coroutine<>::call_type
symmetric_coroutine<>::yield_type
\end{cfa}
Similarly, the canonical threading paradigm is often based on function pointers, \eg @pthread@~\cite{pthreads}, \Csharp~\cite{Csharp}, Go~\cite{Go}, and Scala~\cite{Scala}.
However, the generic thread-handle (identifier) is limited (few operations), unless it is wrapped in a custom type.
\begin{cfa}
void mycor( coroutine_t cid, void * arg ) {
	int * value = (int *)arg;	$\C{// type unsafe, pointer-size only}$
	// Coroutine body
}
int main() {
	int input = 0, output;
	coroutine_t cid = coroutine_create( &mycor, (void *)&input );	$\C{// type unsafe, pointer-size only}$
	coroutine_resume( cid, (void *)input, (void **)&output );	$\C{// type unsafe, pointer-size only}$
}
\end{cfa}
Since the custom type is simple to write in \CFA and solves several issues, added support for routine/lambda-based coroutines adds very little.
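For instance, a hypothetical type-safe wrapper over the generic handle (all names and signatures assumed from the sketch above):
\begin{cfa}
struct IntCor {
	coroutine_t cid;	$\C{// generic handle}$
	int input, output;	$\C{// strongly-typed communication}$
};
int resume( IntCor & c ) {	$\C{// type-safe interface function}$
	coroutine_resume( c.cid, (void *)&c.input, (void **)&c.output );
	return c.output;
}
\end{cfa}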
The selected approach is to use language support by introducing a new kind of aggregate (structure):
\begin{cfa}
coroutine Fibonacci {
	int fn;	// communication variables
};
\end{cfa}
The @coroutine@ keyword means the compiler (and tool set) can find and inject code where needed.
The downside of this approach is that it makes coroutine a special case in the language.
Users wanting to extend coroutines or build their own for various reasons can only do so in ways offered by the language.
Furthermore, implementing coroutines without language support also displays the power of a programming language.
While this is ultimately the option used for idiomatic \CFA code, coroutines and threads can still be constructed without using the language support.
The reserved keyword eases use for the common cases.

Part of the mechanism to generalize coroutines is using a \CFA trait, which defines a coroutine as anything satisfying the trait @is_coroutine@, and this trait is used to restrict coroutine-manipulation functions:
\begin{cfa}
trait is_coroutine( dtype T ) {
	void main( T & this );
	coroutine_desc * get_coroutine( T & this );
};
forall( dtype T | is_coroutine(T) ) coroutine_desc * get_coroutine( T & );
forall( dtype T | is_coroutine(T) ) void suspend( T & );
forall( dtype T | is_coroutine(T) ) void resume( T & );
\end{cfa}
This definition ensures there is a statically-typed @main@ function that is the starting point (first stack frame) of a coroutine.
No return value or additional parameters are necessary for this function, because the coroutine type allows an arbitrary number of interface functions with corresponding arbitrarily typed input/output values.
As well, any object passed to @suspend@ and @resume@ is a coroutine since it must satisfy the @is_coroutine@ trait to compile.
The advantage of this approach is that users can easily create different types of coroutines; for example, changing the memory layout of a coroutine is trivial when implementing the @get_coroutine@ routine.
The \CFA keyword @coroutine@ implicitly implements the getter and forward declarations required for implementing the coroutine main:
\begin{cquote}
\begin{tabular}{@{}ccc@{}}
\begin{cfa}
coroutine MyCor {
	int value;

};
\end{cfa}
& {\Large $\Rightarrow$} &
\begin{tabular}{@{}ccc@{}}
\begin{cfa}
struct MyCor {
	int value;
	coroutine_desc cor;
};
\end{cfa}
&
\begin{cfa}
static inline coroutine_desc *
get_coroutine( MyCor & this ) {
	return &this.cor;
}
\end{cfa}
&
\begin{cfa}
void main( MyCor & this );



\end{cfa}
\end{tabular}
\end{tabular}
\end{cquote}
The combination of these two approaches allows an easy and concise specification of coroutining (and concurrency) for normal users, while more advanced users retain tighter control over memory layout and initialization.
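To illustrate the non-keyword route, a sketch of a hand-written coroutine satisfying @is_coroutine@, assuming the runtime type @coroutine_desc@ shown above:
\begin{cfa}
struct ManualCor {
	int value;	$\C{// communication variable}$
	coroutine_desc cor;	$\C{// explicitly embedded descriptor}$
};
coroutine_desc * get_coroutine( ManualCor & this ) { return &this.cor; }
void main( ManualCor & this ) {	$\C{// coroutine main}$
	this.value = 42;
	suspend();	$\C{// context switch back to last resumer}$
}
\end{cfa}
With these two definitions, calling @resume@ on a @ManualCor@ type-checks because the trait is satisfied.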
\subsection{Thread Interface}
\label{threads}

Both user and kernel threads are supported, where user threads provide concurrency and kernel threads provide parallelism.
Like coroutines and for the same design reasons, the selected approach for user threads is to use language support by introducing a new kind of aggregate (structure) and a \CFA trait:
\begin{cquote}
\begin{tabular}{@{}c@{\hspace{2\parindentlnth}}c@{}}
\begin{cfa}
thread myThread {
	// communication variables
};


\end{cfa}
&
\begin{cfa}
trait is_thread( dtype T ) {
	void main( T & this );
	thread_desc * get_thread( T & this );
	void ^?{}( T & `mutex` this );
};
\end{cfa}
\end{tabular}
\end{cquote}
(The qualifier @mutex@ for the destructor parameter is discussed in Section~\ref{s:Monitors}.)
Like a coroutine, the statically-typed @main@ function is the starting point (first stack frame) of a user thread.
The difference is that a coroutine borrows a thread from its caller, so the first thread resuming a coroutine creates an instance of @main@;
whereas, a user thread receives its own thread from the runtime system, which starts in @main@ at some point after the thread constructor is run.\footnote{
The \lstinline@main@ function is already a special routine in C (where the program begins), so it is a natural extension of the semantics to use overloading to declare mains for different coroutines/threads (the normal main being the main of the initial thread).}
No return value or additional parameters are necessary for this function, because the task type allows an arbitrary number of interface functions with corresponding arbitrarily typed input/output values.
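As a sketch of this ordering, constructor arguments can seed the communication variables before the thread implicitly starts (type and field names hypothetical):
\begin{cfa}
thread Greeter {
	const char * name;	$\C{// communication variable}$
};
void ?{}( Greeter & g, const char * name ) {	$\C{// runs before the thread starts}$
	g.name = name;
}
void main( Greeter & g ) {	$\C{// started by the runtime after construction}$
	sout | "Hello " | g.name | endl;
}
\end{cfa}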
\begin{comment} % put in appendix with coroutine version ???
As such the @main@ routine of a thread can be defined as
\begin{cfa}
…
}
\end{cfa}
A consequence of the strongly typed approach to main is that the memory layout of parameters and return values to/from a thread is now explicitly specified in the \textbf{api}.
\end{comment}

For user threads to be useful, it must be possible to start and stop the underlying thread, and wait for it to complete execution.
While using an API such as @fork@ and @join@ is relatively common, such an interface is awkward and unnecessary.
A simple approach is to use allocation/deallocation principles, and have threads implicitly @fork@ after construction and @join@ before destruction.
\begin{cfa}
thread World {};
void main( World & this ) {
	sout | "World!" | endl;
}
int main() {
	World w`[10]`;	$\C{// implicit forks after creation}$
	sout | "Hello " | endl;	$\C{// "Hello " and 10 "World!" printed concurrently}$
}	$\C{// implicit joins before destruction}$
\end{cfa}
This semantics ensures a thread is started and stopped exactly once, eliminating some programming errors, and scales to multiple threads for basic (termination) synchronization.
This tree-structured (lattice) create/delete pattern arising from C block-structure is generalized by using dynamic allocation, so threads can outlive the scope in which they are created, much like dynamically allocating memory lets objects outlive the scope in which they are created.
\begin{cfa}
int main() {
	MyThread * heapLived;
	{
		MyThread blockLived;	$\C{// fork block-based thread}$
		heapLived = `new`( MyThread );	$\C{// fork heap-based thread}$
		...
	}	$\C{// join block-based thread}$
	...
	`delete`( heapLived );	$\C{// join heap-based thread}$
}
\end{cfa}
The heap-based approach allows arbitrary thread-creation topologies, with respect to fork/join-style concurrency.
Figure~\ref{s:ConcurrentMatrixSummation} shows concurrently adding the rows of a matrix and then totalling the subtotals sequentially, after all the row threads have terminated.
The program uses heap-based threads because each thread needs different constructor values.
(Python provides a simple iteration mechanism to initialize array elements to different values allowing stack allocation.)
The allocation/deallocation pattern appears unusual because allocated objects are immediately deleted without any intervening code.
However, for threads, the deletion provides implicit synchronization, which is the intervening code.
While the subtotals are added in linear order rather than completion order, which slightly inhibits concurrency, the computation is restricted by the critical-path thread (\ie the thread that takes the longest), so any inhibited concurrency is very small as totalling the subtotals is trivial.
\begin{figure}
\begin{cfa}
thread Adder {
	int * row, cols, * subtotal;	$\C{// communication}$
};
void ?{}( Adder & adder, int row[], int cols, int & subtotal ) {
	adder.[ row, cols, subtotal ] = [ row, cols, &subtotal ];
}
void main( Adder & adder ) with( adder ) {
	*subtotal = 0;
	for ( int c = 0; c < cols; c += 1 ) {
		*subtotal += row[c];
	}
}
int main() {
	const int rows = 10, cols = 1000;
	int matrix[rows][cols], subtotals[rows], total = 0;
	// read matrix
	Adder * adders[rows];
	for ( int r = 0; r < rows; r += 1 ) {	$\C{// start threads to sum rows}$
		adders[r] = new( matrix[r], cols, &subtotals[r] );
	}
	for ( int r = 0; r < rows; r += 1 ) {	$\C{// wait for threads to finish}$
		delete( adders[r] );	$\C{// termination join}$
		total += subtotals[r];	$\C{// add subtotal}$
	}
	sout | total | endl;
}
\end{cfa}
\caption{Concurrent Matrix Summation}
\label{s:ConcurrentMatrixSummation}
\end{figure}


\section{Synchronization / Mutual Exclusion}

Uncontrolled non-deterministic execution is meaningless.
To reestablish meaningful execution requires mechanisms to reintroduce determinism (\ie control non-determinism), called synchronization and mutual exclusion, where \newterm{synchronization} is a timing relationship among threads and \newterm{mutual exclusion} is an access-control mechanism on data shared by threads.
Since many determinism challenges appear with the use of mutable shared state, some languages/libraries disallow it, \eg Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, and Akka~\cite{Akka} (Scala).
In these paradigms, interaction among concurrent objects is performed by stateless message-passing~\cite{Thoth,Harmony,V-Kernel} or other paradigms closely related to networking concepts (\eg channels~\cite{CSP,Go}).
However, in call/return-based languages, these approaches force a clear distinction (\ie introduce a new programming paradigm) between non-concurrent and concurrent computation (\ie function call versus message passing).
This distinction means a programmer needs to learn two sets of design patterns.
While this distinction can be hidden away in library code, effective use of the library still has to take both paradigms into account.
In contrast, approaches based on stateful models more closely resemble the standard call/return programming model, resulting in a single programming paradigm.
At the lowest level, concurrent control is implemented by atomic operations, upon which different kinds of locking mechanisms are constructed, \eg semaphores~\cite{Dijkstra68b} and path expressions~\cite{Campbell74}.
However, for productivity it is always desirable to use the highest-level construct that provides the necessary efficiency~\cite{Hochstein05}.
A newer approach is transactional memory~\cite{Herlihy93}.
While this approach is pursued in hardware~\cite{Nakaike15} and system languages, like \CC~\cite{Cpp-Transactions}, the performance and feature set is still too restrictive to be the main concurrency paradigm for system languages, which is why it was rejected as the core paradigm for concurrency in \CFA.

One of the most natural, elegant, and efficient mechanisms for synchronization and mutual exclusion for shared-memory systems is the \emph{monitor}.
Monitors were first proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}.
Many programming languages -- \eg Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Modula~\cite{Modula-2}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, NeWS~\cite{NeWS}, Emerald~\cite{Emerald}, \uC~\cite{Buhr92a} and Java~\cite{Java} -- provide monitors as explicit language constructs.
In addition, operating-system kernels and device drivers have a monitor-like structure, although they often use lower-level primitives such as semaphores or locks to simulate monitors.
For these reasons, this project proposes monitors as the core concurrency construct, upon which even higher-level approaches can be easily constructed.
…

\section{Monitors}
\label{s:Monitors}

A \textbf{monitor} is a set of routines that ensure mutual exclusion when accessing shared state.
More precisely, a monitor is a programming technique that associates mutual exclusion with routine scopes, as opposed to mutex locks, where mutual exclusion is defined by lock/release calls independently of any scoping of the calling routine.
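As a preview of the construct, a minimal sketch assuming the @monitor@ keyword and the @mutex@ parameter qualifier described in this section:
\begin{cfa}
monitor Counter { int cnt; };	$\C{// shared state}$
void inc( Counter & mutex c ) { c.cnt += 1; }	$\C{// call acquires/releases mutual exclusion}$
int get( Counter & mutex c ) { return c.cnt; }
\end{cfa}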
…

Given these building blocks, it is possible to reproduce all three of the popular paradigms.
Indeed, \textbf{uthread} is the default paradigm in \CFA.
However, disabling \textbf{preemption} on a cluster means threads effectively become fibers.
Since several clusters with different scheduling policies can coexist in the same application, this allows \textbf{fiber} and \textbf{uthread} to coexist in the runtime of an application.
Finally, it is possible to build executors for thread pools from \textbf{uthread} or \textbf{fiber}, which includes specialized jobs like actors~\cite{Actors}.