Changeset 9f10d1f2 for doc/proposals/concurrency/text/internals.tex
- Timestamp:
- Nov 23, 2017, 1:31:43 PM (7 years ago)
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
- Children:
- 88ef2af
- Parents:
- 07c1e595
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/proposals/concurrency/text/internals.tex
r07c1e595 r9f10d1f2 1 1 2 2 \chapter{Behind the scene} 3 There are several challenges specific to \CFA when implementing concurrency. These challenges are a direct result of \gls{bulk-acq} and loose object-definitions. These two constraints are the root cause of most design decisions in the implementation. Furthermore, to avoid contention from dynamically allocating memory in a concurrent environment, the internal-scheduling design is (almost) entirely free of mallocs. This is to avoid the chicken and egg problem 4 5 The main memory concern for concurrency is queues. All blocking operations are made by parking threads onto queues. The queue design needs to be intrusive \cite{IntrusiveData} to avoid the need for memory allocation, which entails that all the nodes need specific fields to keep track of all needed information. Since many concurrency operations can use an unbound amount of memory (depending on \gls{bulk-acq}), statically defining information in the intrusive fields of threads is insufficient. The only variable sized container that does not require memory allocation is the call-stack, which is heavily used in the implementation of internal scheduling. Particularly variable length arrays, which are used extensively.3 There are several challenges specific to \CFA when implementing concurrency. These challenges are a direct result of \gls{bulk-acq} and loose object-definitions. These two constraints are the root cause of most design decisions in the implementation. Furthermore, to avoid contention from dynamically allocating memory in a concurrent environment, the internal-scheduling design is (almost) entirely free of mallocs. This is to avoid the chicken and egg problem~\cite{Chicken} of having a memory allocator that relies on the threading system and a threading system that relies on the runtime. This extra goal, means that memory management is a constant concern in the design of the system. 4 5 The main memory concern for concurrency is queues. All blocking operations are made by parking threads onto queues. The queue design needs to be intrusive~\cite{IntrusiveData} to avoid the need for memory allocation, which entails that all the nodes need specific fields to keep track of all needed information. Since many concurrency operations can use an unbound amount of memory (depending on \gls{bulk-acq}), statically defining information in the intrusive fields of threads is insufficient. The only variable sized container that does not require memory allocation is the call-stack, which is heavily used in the implementation of internal scheduling. Particularly variable length arrays, which are used extensively. 6 6 7 7 Since stack allocation is based around scope, the first step of the implementation is to identify the scopes that are available to store the information, and which of these can have a variable length. The threads and the condition both allow a fixed amount of memory to be stored, while mutex-routines and the actual blocking call allow for an unbound amount (though the later is preferable in terms of performance). … … 15 15 % ====================================================================== 16 16 17 The first step towards the monitor implementation is simple mutex-routines using monitors. In the single monitor case, this is done using the entry/exit procedure highlighted in listing \ref{lst:entry1}. This entry/exit procedure does not actually have to be extended to support multiple monitors, indeed it is sufficient to enter/leave monitors one-by-one as long as the order is correct to prevent deadlocks \cite{Havender68}. In \CFA, ordering of monitor relies on memory ordering, this is sufficient because all objects are guaranteed to have distinct non-overlapping memory layouts and mutual-exclusion for a monitor is only defined for its lifetime, meaning that destroying a monitor while it is acquired is undefined behavior. When a mutex call is made, the concerned monitors are aggregated into a variable-length pointer array and sorted based on pointer values. This array persists for the entire duration of the mutual-exclusion and its ordering reused extensively.17 The first step towards the monitor implementation is simple mutex-routines using monitors. In the single monitor case, this is done using the entry/exit procedure highlighted in listing \ref{lst:entry1}. This entry/exit procedure does not actually have to be extended to support multiple monitors, indeed it is sufficient to enter/leave monitors one-by-one as long as the order is correct to prevent deadlocks~\cite{Havender68}. In \CFA, ordering of monitor relies on memory ordering, this is sufficient because all objects are guaranteed to have distinct non-overlapping memory layouts and mutual-exclusion for a monitor is only defined for its lifetime, meaning that destroying a monitor while it is acquired is Undefined Behavior. When a mutex call is made, the concerned monitors are aggregated into a variable-length pointer array and sorted based on pointer values. This array persists for the entire duration of the mutual-exclusion and its ordering reused extensively. 18 18 \begin{figure} 19 19 \begin{multicols}{2} … … 170 170 This picture has several components, the two most important being the entry-queue and the AS-stack. The entry-queue is an (almost) FIFO list where threads waiting to enter are parked, while the acceptor-signaler (AS) stack is a FILO list used for threads that have been signalled or otherwise marked as running next. 171 171 172 For \CFA, this picture does not have support for blocking multiple monitors on a single condition. To support \gls{bulk-acq} two changes to this picture are required. First, it is non longer helpful to attach the condition to a singlemonitor. Secondly, the thread waiting on the conditions has to be separated multiple monitors, which yields :172 For \CFA, this picture does not have support for blocking multiple monitors on a single condition. To support \gls{bulk-acq} two changes to this picture are required. First, it is non longer helpful to attach the condition to \emph{a single} monitor. Secondly, the thread waiting on the conditions has to be separated multiple monitors, which yields : 173 173 174 174 \begin{figure}[H]
Note: See TracChangeset
for help on using the changeset viewer.