% ======================================================================
% ======================================================================
\chapter{Conclusion}\label{s:conclusion}
% ======================================================================
% ======================================================================

The goal of this thesis is to expand concurrent support in \CFA to fill in gaps and increase support for writing safe and efficient concurrent programs.
The presented features achieve this goal and provide users with the means to write scalable concurrent programs in \CFA through multiple avenues.
Additionally, the tools presented provide safety and productivity features, including detection of deadlock and other common concurrency errors, easy concurrent shutdown, and toggleable performance statistics.
For locking, the mutex statement provides a safe and easy-to-use interface for mutual exclusion.
If programmers prefer the message-passing paradigm, \CFA now supports it in the form of channels and actors.
The @waituntil@ statement simplifies writing concurrent programs in both the message-passing and shared-memory paradigms of concurrency.
Finally, no other programming language provides a synchronous multiplexing tool that, like \CFA's @waituntil@, is polymorphic over resources.
This work successfully provides users with familiar concurrent-language support, with added value over similar utilities in other popular languages.

An overview of the contributions of this thesis follows:
\begin{enumerate}
\item The mutex statement, which provides performant and deadlock-free multi-lock acquisition.
\item Channels with performance comparable to Go's, which have safety and productivity features including deadlock detection and an easy-to-use exception-based channel @close@ routine.
\item An in-memory actor system, which, due to the novel copy-queue data structure, achieves the lowest-latency message send among the systems tested.
\item Built-in detection of six common actor errors in the actor system, which also has excellent performance compared to other systems across all benchmarks.
\item A @waituntil@ statement, which tackles the hard problem of allowing a thread to safely wait synchronously for an arbitrary set of concurrent resources.
\end{enumerate}

The added features are now commonly used to solve concurrent problems in \CFA.
The @mutex@ statement is used across almost all concurrent code in \CFA, as it is the simplest mechanism for providing thread-safe input and output.
Channels and the @waituntil@ statement are used in programs where a thread operates as a server or administrator that accepts work and distributes it among channels based on some shared state.
Once implemented, polymorphic @waituntil@ support for the actor system will allow user threads outside the actor system to wait for work to be done or for actors to finish.
Finally, the new features are often combined, \eg channels may pass pointers to shared memory that still requires mutual exclusion via the @mutex@ statement.
From the novel copy-queue data structure in the actor system to the plethora of user-supporting safety features, all these utilities build upon existing concurrent tooling with added value.
Performance results show that each new feature is comparable to or better than similar features in other programming languages.
All in all, this suite of concurrent tools expands a \CFA programmer's ability to easily write safe and performant multi-threaded programs.

\section{Future Work}

\subsection{Further Implicit Concurrency}

This thesis only scratches the surface of implicit concurrency by providing an actor system.
There is room for more implicit concurrency tools in \CFA.
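As a point of comparison for the annotated-loop style of implicit concurrency found in existing libraries, the following C sketch uses an OpenMP @parallel for@ pragma with a @reduction@ clause.
The function @sum_of_squares@ is a hypothetical example and is not part of \CFA.
When compiled with OpenMP enabled (\eg @gcc -fopenmp@), the runtime divides the loop iterations among threads and combines the per-thread partial sums; otherwise, the pragma is ignored and the loop runs sequentially with the same result.

```c
/* Hypothetical example: sum of i*i for i in [1, n].
 * With OpenMP enabled, the pragma splits the iterations among threads and
 * the reduction clause gives each thread a private partial sum that is
 * added into sum when the loop ends. Without OpenMP, the pragma is ignored
 * and the loop runs sequentially. */
long long sum_of_squares(long long n) {
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (long long i = 1; i <= n; i++)
        sum += i * i;
    return sum;
}
```

The programmer only annotates the loop; thread creation, work division, and combining results are left to the runtime, which is the kind of convenience a similar \CFA mechanism could aim to offer.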
User-defined implicit concurrency in the form of annotated loops or recursive concurrent functions exists in other languages and libraries~\cite{uC++,OpenMP}.
Similar implicit concurrency mechanisms could be implemented and expanded on in \CFA.
Additionally, automatic parallelization of sequential programs by the compiler is an interesting research space that other languages have approached~\cite{wilson94,haskell:parallel} and that could be explored in \CFA.

\subsection{Advanced Actor Stealing Heuristics}

In this thesis, two basic victim-selection heuristics are chosen when implementing the work-stealing actor system.
Good victim selection is an active area of work-stealing research, especially when taking into account NUMA effects and cache locality~\cite{barghi18,wolke17}.
The \CFA actor system is modular, so other victim-selection heuristics for queue stealing could be explored by providing them as pluggable modules.
Another open question in work stealing is when a worker thread should steal.
Work-stealing systems can be too aggressive when stealing, causing the cost of a steal to have a negative rather than positive effect on performance.
Given that thief threads often have cycles to spare, there is room for more nuanced approaches to stealing.
Finally, there is the very difficult problem of blocking and unblocking idle threads for workloads with extreme oscillations in CPU needs.

\subsection{Synchronously Multiplexing System Calls}

There are many tools that try to synchronously wait for or asynchronously check I/O~\cite{linux:select,linux:poll,linux:epoll,linux:iouring}.
Improvements in this area pay dividends in many areas of I/O-based programming.
Research on improving user-space tools to synchronize over I/O and other system calls is an interesting area to explore in the world of concurrent tooling.
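For context, a traditional I/O event loop typically multiplexes over file descriptors with a call such as POSIX @poll@.
The C sketch below shows one iteration of such a loop: it waits on several file descriptors at once and reads from the first one that becomes readable.
The helper name @wait_and_read@ and its fixed limit of 16 descriptors are illustrative assumptions, not part of \CFA or its runtime.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <unistd.h>

#define MAX_FDS 16  /* illustrative fixed limit */

/* Hypothetical helper: block until one of fds[0..nfds) is readable (or
 * timeout_ms elapses), then read from the first ready descriptor.
 * Returns the index of that descriptor, or -1 on error or timeout. */
static int wait_and_read(const int *fds, int nfds, char *buf, size_t len,
                         ssize_t *nread, int timeout_ms) {
    struct pollfd pfds[MAX_FDS];
    if (nfds <= 0 || nfds > MAX_FDS) return -1;
    for (int i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;  /* only interested in readability */
        pfds[i].revents = 0;
    }
    /* Single system call that waits on all descriptors at once. */
    int ready = poll(pfds, (nfds_t)nfds, timeout_ms);
    if (ready <= 0) return -1;    /* error (-1) or timeout (0) */
    for (int i = 0; i < nfds; i++) {
        if (pfds[i].revents & POLLIN) {
            *nread = read(pfds[i].fd, buf, len);
            return *nread < 0 ? -1 : i;
        }
    }
    return -1;                    /* e.g. only POLLHUP or POLLERR reported */
}
```

A real server repeats this step in a loop and dispatches on whichever descriptor became ready.
Every waited-on resource must be expressed as a file descriptor, which is one limitation a @waituntil@-based design could aim to avoid.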
One concrete direction is incorporating I/O into the @waituntil@ statement, which would allow a network server to work with multiple kinds of asynchronous I/O interconnects without using traditional event loops.

\subsection{Better Atomic Operations}

When writing low-level concurrent programs, especially lock-free or wait-free programs, low-level atomic instructions must be used.
In C, the GCC built-in atomics~\cite{gcc:atomics} are commonly used, but leave much to be desired.
Some of the problems are as follows.
Archaic and opaque macros often have to be used to ensure that atomic assembly is generated instead of locks.
The builtins are polymorphic, but not type safe since they use void pointers.
The semantics and safety of these builtins require careful navigation since they require the user to have a deep understanding of concurrent memory-ordering models.
Furthermore, these atomics often require a user to understand how to fence appropriately to ensure correctness.
All these problems and more would benefit from language support in \CFA.
Adding good language support for atomics is a difficult problem, which, if solved well, would allow for easier and safer writing of low-level concurrent code.
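To illustrate the memory-ordering burden described above, the following C sketch uses the GCC @__atomic@ builtins for two common patterns: a compare-and-swap loop that atomically raises a value to a maximum, and a release/acquire flag that publishes data to another thread.
The function names @fetch_max@, @publish@, and @try_consume@ are hypothetical.
The caller must choose a memory order for every operation, including separate orders for the success and failure paths of @__atomic_compare_exchange_n@; an order that is too weak for the algorithm still compiles, but can silently break correctness.

```c
#include <stdbool.h>

/* Hypothetical example: atomically raise *target to at least value.
 * Returns the last value observed in *target before this call's update. */
static long fetch_max(long *target, long value) {
    long expected = __atomic_load_n(target, __ATOMIC_RELAXED);
    while (expected < value) {
        /* On failure, expected is overwritten with the current *target,
         * so the loop re-checks against the fresh value. */
        if (__atomic_compare_exchange_n(target, &expected, value,
                                        true,             /* weak: may fail spuriously */
                                        __ATOMIC_ACQ_REL, /* order if the swap succeeds */
                                        __ATOMIC_RELAXED))/* order if it fails */
            break;
    }
    return expected;
}

/* Hypothetical publish/consume through a flag: the release store orders the
 * write to payload before the flag, and the acquire load pairs with it. */
static int payload;
static bool ready;

static void publish(int v) {
    payload = v;                                      /* plain store */
    __atomic_store_n(&ready, true, __ATOMIC_RELEASE); /* publish after payload */
}

static bool try_consume(int *out) {
    if (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE))   /* pairs with the release */
        return false;
    *out = payload;  /* sees the value written before the release store */
    return true;
}
```

A single-threaded run only checks the arithmetic; the ordering guarantees matter when @publish@ and @try_consume@ run on different threads, which is exactly where an incorrect choice of memory order goes unnoticed by the compiler.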