- Timestamp: Mar 23, 2018, 9:27:42 AM
- Branches: ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, with_gc
- Children: 7e419e7
- Parents: 84832d87
- Files: 1 edited
Legend:
- Unmodified: context lines, prefixed with a space
- Added: lines prefixed with +
- Removed: lines prefixed with -
doc/papers/concurrency/Paper.tex
--- doc/papers/concurrency/Paper.tex (r84832d87)
+++ doc/papers/concurrency/Paper.tex (r766309d)
@@ -484 +484 @@
 One of the important features that are missing in C is threading\footnote{While the C11 standard defines a ``threads.h'' header, it is minimal and defined as optional.
 As such, library support for threading is far from widespread.
-At the time of writing the paper, neither \texttt{gcc} nor \texttt{clang} support ``threads.h'' in their respective standard libraries.}.
+At the time of writing the paper, neither \protect\lstinline|gcc| nor \protect\lstinline|clang| support ``threads.h'' in their standard libraries.}.
 On modern architectures, a lack of threading is unacceptable~\cite{Sutter05, Sutter05b}, and therefore modern programming languages must have the proper tools to allow users to write efficient concurrent programs to take advantage of parallelism.
 As an extension of C, \CFA needs to express these concepts in a way that is as natural as possible to programmers familiar with imperative languages.
@@ -500 +500 @@
 \begin{figure}
 \begin{center}
-\begin{tabular}{c|c|c}
-\begin{cfa}
-// Using callback
-void fibonacci_func(
-   int n,
-   void (* callback)( int )
+\begin{tabular}{@{}lll@{}}
+\multicolumn{1}{c}{\textbf{callback}} & \multicolumn{1}{c}{\textbf{output array}} & \multicolumn{1}{c}{\textbf{external state}} \\
+\begin{cfa}
+void fib_func(
+   int n, void (* callback)( int )
 ) {
    int fn, f1 = 0, f2 = 1;
@@ -518 +517 @@
       printf( "%d\n", n );
    }
-   fibonacci_func( 10, print_fib );
+   fib_func( 10, print_fib );
 }
 
@@ -524 +523 @@
 &
 \begin{cfa}
-// Using output array
-void fibonacci_array(
-   int n,
-   int * array
+void fib_array(
+   int n, int * array
 ) {
    int fn, f1 = 0, f2 = 1;
@@ -538 +535 @@
 int main() {
    int a[10];
-   fibonacci_array( 10, a );
+   fib_array( 10, a );
    for ( int i = 0; i < 10; i++ ) {
       printf( "%d\n", a[i] );
@@ -546 +543 @@
 &
 \begin{cfa}
-// Using external state
-typedef struct {
-   int f1, f2;
-} Iterator_t;
-int fibonacci_state(
-   Iterator_t * it
+
+typedef struct { int f1, f2; } Fib;
+int fib_state(
+   Fib * fib
 ) {
-   int ret = it->f1;
-   int fn = it->f1 + it->f2;
-   it->f2 = it->f1; it->f1 = fn;
+   int ret = fib->f1;
+   int fn = fib->f1 + fib->f2;
+   fib->f2 = fib->f1; fib->f1 = fn;
    return ret;
 }
 int main() {
-   Iterator_t it = { 0, 1 };
+   Fib fib = { 0, 1 };
+
    for ( int i = 0; i < 10; i++ ) {
-      printf( "%d\n",
-         fibonacci_state( &it ) );
+      printf( "%d\n", fib_state( &fib ) );
    }
 }
@@ -568 +563 @@
 \end{tabular}
 \end{center}
-\caption{C Fibonacci Implementations}
-\label{lst:fibonacci-c}
+\caption{Fibonacci Implementations in C}
+\label{lst:fib-c}
 \end{figure}
 
@@ -605 +600 @@
 int main() {
    Fibonacci f1, f2;
-   for ( int i = 1; i <= 10; i += 1 ) {
+   for ( int i = 1; i <= 10; i++ ) {
       sout | next( f1 ) | next( f2 ) | endl;
    }
@@ -761 +756 @@
 symmetric_coroutine<>::yield_type
 \end{cfa}
-Often, the canonical threading paradigm in languages is based on function pointers, \texttt{pthread} being one of the most well-known examples.
+Often, the canonical threading paradigm in languages is based on function pointers, @pthread@ being one of the most well-known examples.
 The main problem of this approach is that the thread usage is limited to a generic handle that must otherwise be wrapped in a custom type.
 Since the custom type is simple to write in \CFA and solves several issues, added support for routine/lambda based coroutines adds very little.
 
-A variation of this would be to use a simple function pointer in the same way \texttt{pthread} does for threads:
+A variation of this would be to use a simple function pointer in the same way @pthread@ does for threads:
 \begin{cfa}
 void foo( coroutine_t cid, void* arg ) {
@@ -1439 +1434 @@
 For the purpose of translating the given cfa-code into \CFA-code, any method of introducing a monitor is acceptable, e.g., @mutex@ parameters, global variables, pointer parameters, or using locals with the @mutex@ statement.
 
-\begin{figure}[!t]
+\begin{figure}
 \begin{multicols}{2}
 Waiting thread
@@ -1664 +1659 @@
 \hline
 \begin{cfa}[tabsize=3]
-monitor DatingService
-{
+monitor DatingService {
    // compatibility codes
    enum{ CCodes = 20 };
@@ -1677 +1671 @@
    condition exchange;
 
-   int girl(int phoneNo, int cfa)
-   {
+   int girl(int phoneNo, int cfa) {
       // no compatible boy ?
-      if(empty(boys[cfa]))
-      {
-         // wait for boy
-         wait(girls[cfa]);
-
-         // make phone number available
-         girlPhoneNo = phoneNo;
-
-         // wake boy from chair
-         signal(exchange);
-      }
-      else
-      {
-         // make phone number available
-         girlPhoneNo = phoneNo;
-
-         // wake boy
-         signal(boys[cfa]);
-
-         // sit in chair
-         wait(exchange);
+      if(empty(boys[cfa])) {
+         wait(girls[cfa]);       // wait for boy
+         girlPhoneNo = phoneNo;  // make phone number available
+         signal(exchange);       // wake boy from chair
+      } else {
+         girlPhoneNo = phoneNo;  // make phone number available
+         signal(boys[cfa]);      // wake boy
+         wait(exchange);         // sit in chair
       }
       return boyPhoneNo;
    }
-
-   int boy(int phoneNo, int cfa)
-   {
+   int boy(int phoneNo, int cfa) {
       // same as above
       // with boy/girl interchanged
    }
 \end{cfa}&\begin{cfa}[tabsize=3]
-monitor DatingService
-{
-   // compatibility codes
-   enum{ CCodes = 20 };
+monitor DatingService {
+
+   enum{ CCodes = 20 };   // compatibility codes
 
    int girlPhoneNo;
@@ -1724 +1701 @@
    // exchange is not needed
 
-   int girl(int phoneNo, int cfa)
-   {
+   int girl(int phoneNo, int cfa) {
       // no compatible boy ?
-      if(empty(boys[cfa]))
-      {
-         // wait for boy
-         wait(girls[cfa]);
-
-         // make phone number available
-         girlPhoneNo = phoneNo;
-
-         // wake boy from chair
-         signal(exchange);
-      }
-      else
-      {
-         // make phone number available
-         girlPhoneNo = phoneNo;
-
-         // wake boy
-         signal_block(boys[cfa]);
+      if(empty(boys[cfa])) {
+         wait(girls[cfa]);          // wait for boy
+         girlPhoneNo = phoneNo;     // make phone number available
+         signal(exchange);          // wake boy from chair
+      } else {
+         girlPhoneNo = phoneNo;     // make phone number available
+         signal_block(boys[cfa]);   // wake boy
 
          // second handshake unnecessary
@@ -1752 +1717 @@
    }
 
-   int boy(int phoneNo, int cfa)
-   {
+   int boy(int phoneNo, int cfa) {
       // same as above
       // with boy/girl interchanged
@@ -2071 +2035 @@
 
 \begin{figure}
-\begin{cfa}[caption={Various correct and incorrect uses of the or, else, and timeout clause around a waitfor statement},label={lst:waitfor2}]
+\lstset{language=CFA,deletedelim=**[is][]{`}{`}}
+\begin{cfa}
 monitor A{};
 
@@ -2078 +2043 @@
 
 void foo( A & mutex a, bool b, int t ) {
-   // Correct : blocking case
-   waitfor(f1, a);
-
-   // Correct : block with statement
-   waitfor(f1, a) {
+   waitfor(f1, a);   $\C{// Correct : blocking case}$
+
+   waitfor(f1, a) {   $\C{// Correct : block with statement}$
       sout | "f1" | endl;
    }
-
-   // Correct : block waiting for f1 or f2
-   waitfor(f1, a) {
+   waitfor(f1, a) {   $\C{// Correct : block waiting for f1 or f2}$
       sout | "f1" | endl;
    } or waitfor(f2, a) {
       sout | "f2" | endl;
    }
-
-   // Correct : non-blocking case
-   waitfor(f1, a); or else;
-
-   // Correct : non-blocking case
-   waitfor(f1, a) {
+   waitfor(f1, a); or else;   $\C{// Correct : non-blocking case}$
+
+   waitfor(f1, a) {   $\C{// Correct : non-blocking case}$
       sout | "blocked" | endl;
    } or else {
       sout | "didn't block" | endl;
    }
-
-   // Correct : block at most 10 seconds
-   waitfor(f1, a) {
+   waitfor(f1, a) {   $\C{// Correct : block at most 10 seconds}$
       sout | "blocked" | endl;
    } or timeout( 10`s) {
       sout | "didn't block" | endl;
    }
-
-   // Correct : block only if b == true
-   // if b == false, don't even make the call
+   // Correct : block only if b == true if b == false, don't even make the call
    when(b) waitfor(f1, a);
 
-   // Correct : block only if b == true
-   // if b == false, make non-blocking call
+   // Correct : block only if b == true if b == false, make non-blocking call
    waitfor(f1, a); or when(!b) else;
 
@@ -2124 +2077 @@
    waitfor(f1, a); or timeout(t); or else;
 
-   // Incorrect : order must be
-   // waitfor [or waitfor... [or timeout] [or else]]
+   // Incorrect : order must be waitfor [or waitfor... [or timeout] [or else]]
    timeout(t); or waitfor(f1, a); or else;
 }
 \end{cfa}
+\caption{Correct and incorrect uses of the or, else, and timeout clause around a waitfor statement}
+\label{lst:waitfor2}
 \end{figure}
 
@@ -2304 +2258 @@
 It is important to present the difference between the two acquiring options: \textbf{callsite-locking} and entry-point locking, i.e., acquiring the monitors before making a mutex routine-call or as the first operation of the mutex routine-call.
 For example:
-\begin{table}[H]
+\begin{table}
 \begin{center}
 \begin{tabular}{|c|c|c|}
@@ -2394 +2348 @@
 
 \subsection{Processors}
-Parallelism in \CFA is built around using processors to specify how much parallelism is desired. \CFA processors are object wrappers around kernel threads, specifically \texttt{pthread}s in the current implementation of \CFA.
+Parallelism in \CFA is built around using processors to specify how much parallelism is desired. \CFA processors are object wrappers around kernel threads, specifically @pthread@s in the current implementation of \CFA.
 Indeed, any parallelism must go through operating-system libraries.
 However, \textbf{uthread} are still the main source of concurrency, processors are simply the underlying source of parallelism.
@@ -2403 +2357 @@
 \subsection{Stack Management}
 One of the challenges of this system is to reduce the footprint as much as possible.
-Specifically, all \texttt{pthread}s created also have a stack created with them, which should be used as much as possible.
+Specifically, all @pthread@s created also have a stack created with them, which should be used as much as possible.
 Normally, coroutines also create their own stack to run on, however, in the case of the coroutines used for processors, these coroutines run directly on the \textbf{kthread} stack, effectively stealing the processor stack.
 The exception to this rule is the Main Processor, i.e., the initial \textbf{kthread} that is given to any program.
@@ -2468 +2422 @@
 The following figure is the traditional illustration of a monitor (repeated from page~\pageref{fig:ClassicalMonitor} for convenience):
 
-\begin{figure}[H]
+\begin{figure}
 \begin{center}
 {\resizebox{0.4\textwidth}{!}{\input{monitor}}}
@@ -2483 +2437 @@
 Secondly, the thread waiting on the condition has to be separated across multiple monitors, seen in figure \ref{fig:monitor_cfa}.
 
-\begin{figure}[H]
+\begin{figure}
 \begin{center}
 {\resizebox{0.8\textwidth}{!}{\input{int_monitor}}}
@@ -2497 +2451 @@
 For a specific signalling operation every monitor needs a piece of thread on its AS-stack.
 
-\begin{figure}[b]
+\begin{figure}
 \begin{multicols}{2}
 Entry
@@ -2533 +2487 @@
 The data structures used for the AS-stack are reused extensively for external scheduling, but in the case of internal scheduling, the data is allocated using variable-length arrays on the call stack of the @wait@ and @signal_block@ routines.
 
-\begin{figure}[H]
+\begin{figure}
 \begin{center}
 {\resizebox{0.8\textwidth}{!}{\input{monitor_structs.pstex_t}}}
@@ -2682 +2636 @@
 As it was subtly alluded in section \ref{threads}, @thread@s in \CFA are in fact monitors, which means that all monitor features are available when using threads.
 For example, here is a very simple two thread pipeline that could be used for a simulator of a game engine:
-\begin{figure}[H]
+\begin{figure}
 \begin{cfa}[caption={Toy simulator using \protect\lstinline|thread|s and \protect\lstinline|monitor|s.},label={lst:engine-v1}]
 // Visualization declaration
@@ -2714 +2668 @@
 One of the obvious complaints of the previous code snippet (other than its toy-like simplicity) is that it does not handle exit conditions and just goes on forever.
 Luckily, the monitor semantics can also be used to clearly enforce a shutdown order in a concise manner:
-\begin{figure}[H]
+\begin{figure}
 \begin{cfa}[caption={Same toy simulator with proper termination condition.},label={lst:engine-v2}]
 // Visualization declaration
@@ -2767 +2721 @@
 However, once clusters are fully implemented, it will be possible to create fibers and \textbf{uthread} in the same system, as in listing \ref{lst:fiber-uthread}
 \begin{figure}
+\lstset{language=CFA,deletedelim=**[is][]{`}{`}}
 \begin{cfa}[caption={Using fibers and \textbf{uthread} side-by-side in \CFA},label={lst:fiber-uthread}]
 // Cluster forward declaration
@@ -2826 +2781 @@
 Table \ref{tab:machine} shows the characteristics of the machine used to run the benchmarks.
 All tests were made on this machine.
-\begin{table}[H]
+\begin{table}
 \begin{center}
 \begin{tabular}{| l | r | l | r |}
@@ -3108 +3063 @@
 \subsection{Object Creation}
 Finally, the last benchmark measures the cost of creation for concurrent objects.
-Listing \ref{lst:creation} shows the code for \texttt{pthread}s and \CFA threads, with results shown in table \ref{tab:creation}.
+Listing \ref{lst:creation} shows the code for @pthread@s and \CFA threads, with results shown in table \ref{tab:creation}.
 As with all other benchmarks, all omitted tests are functionally identical to one of these tests.
 The only note here is that the call stacks of \CFA coroutines are lazily created, therefore without priming the coroutine, the creation cost is very low.
@@ -3114 +3069 @@
 \begin{figure}
 \begin{center}
-\texttt{pthread}
+@pthread@
 \begin{cfa}
 int main() {
@@ -3151 +3106 @@
 \end{cfa}
 \end{center}
-\begin{cfa}[caption={Benchmark code for \texttt{pthread}s and \CFA to measure object creation},label={lst:creation}]
-\end{cfa}
+\caption{Benchmark code for \protect\lstinline|pthread|s and \CFA to measure object creation}
+\label{lst:creation}
 \end{figure}
 
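The last hunk only shows the opening lines of the pthread half of the object-creation benchmark that listing lst:creation refers to. As a rough illustration of what that kind of benchmark measures, the minimal C sketch below repeatedly creates and joins a kernel thread and reports the average cost per iteration; it is an assumed reconstruction, not the paper's actual listing, and the iteration count N, the clock_gettime timing harness, and the empty thread body foo are all assumptions.

    #define _POSIX_C_SOURCE 199309L
    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    #define N 10000   /* number of create/join pairs to time (assumed) */

    /* Empty thread body: the benchmark isolates creation and join overhead. */
    static void * foo( void * arg ) {
        (void)arg;
        return NULL;
    }

    int main( void ) {
        struct timespec start, end;
        clock_gettime( CLOCK_MONOTONIC, &start );
        for ( int i = 0; i < N; i += 1 ) {
            pthread_t thread;
            if ( pthread_create( &thread, NULL, foo, NULL ) != 0 ) {
                perror( "pthread_create" );
                return 1;
            }
            if ( pthread_join( thread, NULL ) != 0 ) {
                perror( "pthread_join" );
                return 1;
            }
        }
        clock_gettime( CLOCK_MONOTONIC, &end );
        double ns = ( end.tv_sec - start.tv_sec ) * 1e9 + ( end.tv_nsec - start.tv_nsec );
        printf( "average creation + join cost: %.1f ns\n", ns / N );
        return 0;
    }

Built with -pthread, this measures the kernel-thread case; the \CFA variants presumably replace the create/join pair with construction and destruction of a thread or coroutine object, and, as the changed text above notes, \CFA coroutine stacks are created lazily, which keeps their creation cost very low.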