- Timestamp: Mar 23, 2020, 2:19:13 PM (5 years ago)
- Branches: ADT, arm-eh, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
- Children: 3a3b2b9
- Parents: 9640813
- Location: doc/theses/thierry_delisle_PhD/comp_II
- Files: 2 added, 4 edited
Legend: unmodified lines are shown unprefixed, added lines are prefixed with +, removed lines are prefixed with -, and … marks elided lines.
doc/theses/thierry_delisle_PhD/comp_II/Makefile
r9640813 r7a0f1d25
  	emptybit \
  	emptytree \
+	emptytls \
  	resize \
+	system \
  }
doc/theses/thierry_delisle_PhD/comp_II/comp_II.tex
r9640813 r7a0f1d25
  \begin{figure}
  \begin{center}
- % {\resizebox{0.8\textwidth}{!}{\input{base}}}
  \input{base}
  \end{center}
- \caption{Relaxed FIFO list at the base of the scheduler: an array of strictly FIFO lists.}
+ \caption{Relaxed FIFO list at the base of the scheduler: an array of strictly FIFO lists. The timestamp is in all nodes and cell arrays.}
  \label{fig:base}
  \end{figure}
…
  \begin{figure}
  \begin{center}
- % {\resizebox{0.8\textwidth}{!}{\input{empty}}}
  \input{empty}
  \end{center}
…
  Finally, a third approach is to use dense information, similar to the bitmap, but have each thread keep its own independent copy of it. While this approach can offer good scalability \emph{and} low latency, the liveliness of the information can become a problem. In the simple cases, local copies of which underlying queues are empty can become stale and end up not being useful for the pop operation. A more serious problem is that reliable information is necessary for some parts of this algorithm to be correct. As mentioned in this section, processors must know \emph{reliably} whether the list is empty or not to decide if they can return \texttt{NULL} or if they must keep looking during a pop operation. Section~\ref{sec:sleep} discusses another case where reliable information is required for the algorithm to be correct.

+ \begin{figure}
+ \begin{center}
+ {\resizebox{0.8\textwidth}{!}{\input{emptytls}}}
+ \end{center}
+ \caption{``More empty'' queue with an added per-processor bitmask to indicate which array cells have items.}
+ \label{fig:emptytls}
+ \end{figure}
+
  There is a fundamental tradeoff among these approaches. Dense global information about empty underlying queues helps zero-contention cases at the cost of high-contention cases. Sparse global information helps high-contention cases but increases latency in zero-contention cases, to read and ``aggregate'' the information\footnote{Hierarchical structures, e.g., a binary search tree, effectively aggregate information, but require following pointer chains and reading information at each node. Similarly, other sparse schemes would need to read multiple cachelines to acquire all the information needed.}. Finally, dense local information has both the advantage of low latency in zero-contention cases and scalability in high-contention cases; however, the information can become stale, making it difficult to use to ensure correctness. The fact that these solutions have these fundamental limits suggests to me that a better solution combines these properties in interesting ways. The lock discussed in Section~\ref{sec:resize} also allows for solutions that adapt to the number of processors, which could also prove useful.

  \paragraph{Objectives and Existing Work}

- How much scalability is actually needed is highly debatable, libfibre\cit has compared favorably to other schedulers in webserver tests\cit and uses a single atomic counter in its scheduling algorithm similarly to the proposed bitmask. As such, the single atomic instruction on a shared cacheline may be sufficiently performant.
-
- I have built a prototype of this ready-queue (including the bitmask and BMI2 usage, but not the sharded bitmask) and ran performance experiments on it but it is difficult to compare this prototype to a thread scheduler as the prototype is used as a data-queue.
-
- I have also integrated this prototype into the \CFA runtime, but have not yet created performance experiments to compare results. I believe that the bitmask approach is currently one of the larger risks of the proposal, early tests lead me to believe it may work but it is not clear that the contention problem can be overcome. The worst-case scenario is a case where the number of processors and the number of ready threads are similar, yet scheduling events are very frequent. Fewer threads should lead to the Idle Sleep mechanism discussed in Section~\ref{sec:sleep} to reduce contention while having many threads ready leads to optimal performance. It is difficult to evaluate the likeliness of this worst-case scenario in real workloads. I believe, frequent scheduling events suggest a more ``bursty'' workload where new work is finely divided among many threads which race to completion. This type of workload would only see a peek of contention close to the end of the work, but no sustained contention. Very fine-grained pipelines are less ``bursty'', these may lead to more sustained contention. However, they could also easily benefit from a direct hand-off strategy which would circumvent the problem entirely.
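
For illustration only, the following sketch shows the shape of the bitmask-guided pop discussed above. This is not the thesis prototype: the queue count, the helper try_pop, and the maintenance of the mask are assumptions of this sketch; __builtin_ctzll is the GCC/Clang builtin that compiles to a single tzcnt instruction when BMI is available.

    #include <stdatomic.h>
    #include <stdbool.h>

    #define NQUEUES 64                         /* assumed: one bit per underlying queue */

    extern bool try_pop(int i, void ** item);  /* hypothetical per-queue pop */
    extern _Atomic unsigned long long mask;    /* bit i set => queue i may have items */

    /* Search the bitmask for a non-empty queue. Returning false requires
       the mask to be reliable: a zero mask must really mean the whole
       list is empty, which is exactly the correctness concern raised
       above. */
    bool pop(void ** item) {
        for (;;) {
            unsigned long long snap = atomic_load(&mask);
            if (snap == 0) return false;       /* list empty at the snapshot */
            int i = __builtin_ctzll(snap);     /* lowest set bit = candidate queue */
            if (try_pop(i, item)) return true;
            /* raced and found the queue empty: reload the mask and retry */
        }
    }
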
+ How much scalability is actually needed is highly debatable; \emph{libfibre}\cit has compared favorably to other schedulers in webserver tests\cit and uses a single atomic counter in its scheduling algorithm, similar to the proposed bitmask. As such, the single atomic instruction on a shared cacheline may be sufficiently performant.
+
+ I have built a prototype of this ready-queue in the shape of a data-queue, i.e., nodes on the queue are structures with a single int and the intrusive data fields. Using this prototype I ran preliminary performance experiments which confirm the expected performance in Table~\ref{tab:perfcases}. However, these experiments only offer a hint at the actual performance of the scheduler, since threads perform more complex operations than simple integer nodes; e.g., threads are not independent of each other, and when a thread blocks some other thread must intervene to wake it.
+
+ I have also integrated this prototype into the \CFA runtime, but have not yet created performance experiments to compare results, as creating one-to-one comparisons with the prototype will be complex.
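
To make the prototype's shape concrete, here is a sketch of what a node and a push onto the array of strictly FIFO lists might look like. The random cell selection, the mark_nonempty helper, and the locking discipline are assumptions of this sketch, not a description of the actual prototype.

    #include <pthread.h>
    #include <stdlib.h>
    #include <time.h>

    #define NQUEUES 64                    /* assumed size of the dense array */

    struct node {
        struct node * next;               /* intrusive link field */
        unsigned long long ts;            /* timestamp, kept in every node */
        int value;                        /* the single int payload */
    };

    struct fifo {
        pthread_mutex_t lock;
        struct node * head, ** tail;      /* strictly FIFO within one cell */
    };

    extern struct fifo queues[NQUEUES];   /* the dense array of queues */
    extern void mark_nonempty(int i);     /* hypothetical bitmask update */

    void push(struct node * n) {
        struct timespec t;
        clock_gettime(CLOCK_MONOTONIC, &t);
        n->ts = t.tv_sec * 1000000000ull + t.tv_nsec;
        n->next = NULL;
        int i = rand() % NQUEUES;         /* spread pushes across the cells */
        pthread_mutex_lock(&queues[i].lock);
        *queues[i].tail = n;              /* append at the tail */
        queues[i].tail = &n->next;
        pthread_mutex_unlock(&queues[i].lock);
        mark_nonempty(i);                 /* publish for the pop-side search */
    }
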
  
  \subsection{Dynamic Resizing} \label{sec:resize}
- The \CFA runtime system currently handles dynamically adding and removing processors from clusters at any time. Since this is part of the existing design, the proposed scheduler must also support this behaviour. However, dynamicly resizing the clusters is considered a rare event associated with setup, teardown and major configuration changes. This assumptions is made both in the design of the proposed scheduler as well as in the original design of the \CFA runtime system. As such, the proposed scheduler must honor the correctness of these behaviour but does not have any performance objectives with regards to resizing a cluster. How long adding or removing processors take and how much this disrupts the performance of other threads is considered a secondary concern since it should be amortized over long period of times. However, as mentionned in Section~\ref{sec:queue}, contention on the underlying queues can have a direct impact on performance, the number of underlying queues must therefore be adjusted as the number of processors grows or shrinks. Since the underlying queues are stored in a dense array, changing the number of queues requires resizing the array and therefore moving it. This can introduce memory reclamation problems if not done correctly.
-
- \begin{figure}
- \begin{center}
- % {\resizebox{0.8\textwidth}{!}{\input{resize}}}
+ \begin{figure}
+ \begin{center}
+ \input{system}
+ \end{center}
+ \caption{Global structure of the \CFA runtime system.}
+ \label{fig:system}
+ \end{figure}
+
+ The \CFA runtime system groups processors together as clusters. Threads on a cluster are always scheduled on one of the processors of the cluster. Currently, the runtime handles dynamically adding and removing processors from clusters at any time. Since this is part of the existing design, the proposed scheduler must also support this behaviour. However, dynamically resizing a cluster is considered a rare event associated with setup, teardown and major configuration changes. This assumption is made both in the design of the proposed scheduler as well as in the original design of the \CFA runtime system. As such, the proposed scheduler must honor the correctness of this behaviour but does not have any performance objectives with regard to resizing a cluster. How long adding or removing processors takes and how much this disrupts the performance of other threads is considered a secondary concern, since it should be amortized over long periods of time. However, as mentioned in Section~\ref{sec:queue}, contention on the underlying queues can have a direct impact on performance. The number of underlying queues must therefore be adjusted as the number of processors grows or shrinks. Since the underlying queues are stored in a dense array, changing the number of queues requires resizing the array, and expanding the array requires moving it. This can introduce memory reclamation problems if not done correctly.
+
+ \begin{figure}
+ \begin{center}
  \input{resize}
  \end{center}
- \caption{Copy of data structure shown in Figure~\ref{fig:base}. The cells of the array can be modified concurrently but resizing the array, which requires moving it, is not safe to do concurrently. This can also be true of the accompanying data structures used to find non-empty queues.}
+ \caption{Copy of the data structure shown in Figure~\ref{fig:base}.}
  \label{fig:base2}
  \end{figure}
  
- It is important to note that how the array is used in this case. While the array cells are modified by every push and pop operation, the array itself, i.e., the pointer that would change when resized, is only read during these operations. Therefore the use is this pointer can be described as frequent reads and in frequent writes. This description effectively matches with the description of a Reader-Writer lock, infrequent but invasive updates among frequent read operations. In the case of the Ready-Queue described above, read operations are operations that push or pop from the ready-queue but do not invalidate any references to the ready queue data structures. Writes on the other-hand would add or remove inner queues, invalidating references to the array of inner queues in the process. Therefore, the current proposed approach to this problem is theadd a per-cluster Reader Writer lock around the ready queue to prevent restructuring of the ready-queue data structure while threads are being pushed or popped.
-
- There are possible alternatives to the Reader Writer lock solution. This problem is effectively a memory reclamation problem and as such there is a large body of research on the subject. However, the RWlock solution is simple and can be leveraged to solve other problems (e.g. processor ordering and memory reclamation of threads)which makes it an attractive solution.
+ It is important to note how the array is used in this case. While the array cells are modified by every push and pop operation, the array itself, i.e., the pointer that would change when resized, is only read during these operations. Therefore the use of this pointer can be described as frequent reads and infrequent writes. This description effectively matches the description of a Reader-Writer lock: infrequent but invasive updates among frequent read operations. In the case of the Ready-Queue described above, read operations are operations that push or pop from the ready-queue but do not invalidate any references to the ready-queue data structures. Writes, on the other hand, would add or remove inner queues, invalidating references to the array of inner queues in the process. Therefore, the current proposed approach to this problem is to add a per-cluster Reader-Writer lock around the ready queue to prevent restructuring of the ready-queue data structure while threads are being pushed or popped.
+
+ There are possible alternatives to the Reader-Writer lock solution. This problem is effectively a memory reclamation problem and as such there is a large body of research on the subject\cit. However, the RWlock solution is simple and can be leveraged to solve other problems (e.g., processor ordering and memory reclamation of threads), which makes it an attractive solution.
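
For concreteness, this is how the per-cluster lock could wrap the hot paths, sketched here with pthread's Reader-Writer lock; the real runtime would likely need a custom lock (for instance, one that also gives each processor the unique index reused in Section~\ref{sec:sleep}), and do_push, grow_or_shrink_array and thread_desc are placeholders.

    #include <pthread.h>

    struct thread_desc;                              /* opaque thread descriptor */
    extern void do_push(struct thread_desc * t);     /* hypothetical push on current array */
    extern void grow_or_shrink_array(int nprocs);    /* hypothetical resize + reclamation */

    /* Pushes and pops only read the array pointer, so they take the read
       side; resizing replaces the array, so it takes the write side,
       excluding all concurrent pushes and pops while the array moves. */
    static pthread_rwlock_t ready_lock = PTHREAD_RWLOCK_INITIALIZER;

    void ready_push(struct thread_desc * t) {
        pthread_rwlock_rdlock(&ready_lock);          /* many readers in parallel */
        do_push(t);
        pthread_rwlock_unlock(&ready_lock);
    }

    void cluster_resize(int nprocs) {
        pthread_rwlock_wrlock(&ready_lock);          /* exclusive: safe to move the array */
        grow_or_shrink_array(nprocs);
        pthread_rwlock_unlock(&ready_lock);
    }
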
  
  \paragraph{Objectives and Existing Work}
…
  
  \subsection{Idle Sleep} \label{sec:sleep}
- As mentionned above, idle sleep is the process of putting processors to sleep while they do not have threads to execute. In this context, processors are kernel-threads and sleeping refers to asking the kernel to block a thread. This can be achieved with either thread synchronization operations like pthread\_cond\_wait or using signal operations like sigsuspend. The goal of putting idle processors to sleep is two-fold, it reduces energy consumption in cases where more idle kernel-threads translate to idle hardware threads, and reduces contention on the ready queue, since the otherwise idle processors generally contend trying to pop items from the queue. Since energy efficiency is a growing concern in many industry sectors\cit, there is not real need to solve the contention problem without using idle sleep.
-
- Support for idle sleep broadly involves calling the operating system to block the kernel thread but also handling the race between the sleeping and the waking up, and handling which kernel thread should sleep or wake-up.
-
- When a processor decides to sleep, there is a race that occurs between it signalling that it will go to sleep (so other processors can find sleeping processors) and actually blocking the kernel thread. This is equivalent to the classic problem of missing signals when using condition variables, the ``sleepy'' processor indicates that it will sleep but has not yet gone to sleep, if another processor attempts to wake it up, the waking-up operation may claim nothing needs to be done and the signal will have been missed. In cases where threads are scheduled from processors on the current cluster, loosing signals is not necessarily critical, because at least some processors on the cluster are awake. Individual processors always finish scheduling threads before looking for new work, which means that the last processor to go to sleep cannot miss threads scheduled from inside the cluster (if they do, that demonstrates the ready-queue is not linearizable). However, this guarantee does not hold if threads are scheduled from outside the cluster, either due to an external event like timers and I/O, or due to a thread migrating from a different cluster. In this case, missed signals can lead to the cluster deadlocking where it should not\footnote{Clusters ``should'' never deadlock, but for this proposal, cases where \CFA users \emph{actually} wrote \CFA code that leads to a deadlock it is considered as a deadlock that ``should'' happen.}. Therefore, it is important that the scheduling of threads include a mechanism where signals \emph{cannot} be missed. For performance reasons, it can be advantageous to have a secondary mechanism that allows signals to be missed in cases where it cannot lead to a deadlock. To be safe, this process must include a ``handshake'' where it is guaranteed that either~: the sleepy processor notices that a thread was scheduled after it signalled its intent to block or code scheduling threads sees the intent to sleep before scheduling and be able to wake-up the processor. This matter is complicated by the fact that pthread offers few tools to implement this solution and offers no guarantee of ordering of threads waking up for most of these tools.
-
- Another issues is trying to avoid kernel sleeping and waking frequently. A possible partial solution is to order the processors so that the one which most recently went to sleep is woken up. This allows other sleeping processors to reach deeper sleep state (when these are available) while keeping ``hot'' processors warmer. Note that while this generally means organising the processors in a stack, I believe that the unique index provided by the ReaderWriter lock can be reused to strictly order the waking order of processors, causing a LIFO like waking order. While a strict LIFO stack is probably better, using the processor index could prove useful and offer a sufficiently LIFO ordering.
-
- Finally, another important aspect of Idle Sleep is when should processors make the decision to sleep and when it is appropriate for sleeping processors to be woken up. Processors that are unnecessarily awake lead to unnecessary contention and power consumption, while too many sleeping processors can lead to sub-optimal throughput. Furthermore, transitions from sleeping to awake and vice-versa also add unnecessary latency. There is already a wealth of research on the subject and I do not plan to implement a novel idea for the Idle Sleep heuristic in this project.
+ As mentioned, idle sleep is the process of putting processors to sleep when they have no threads to execute. In this context, processors are kernel-threads and sleeping refers to asking the kernel to block a thread. This can be achieved either with thread synchronization operations like pthread\_cond\_wait or with signal operations like sigsuspend. The goal of putting idle processors to sleep is two-fold: it reduces energy consumption in cases where more idle kernel-threads translate to idle hardware threads, and it reduces contention on the ready queue, since the otherwise idle processors generally contend trying to pop items from the queue.
+
+ Support for idle sleep broadly involves calling the operating system to block the kernel thread, handling the race between a blocking thread and a waking thread, and deciding which kernel thread should sleep or wake up.
+
+ When a processor decides to sleep, there is a race between it signalling that it is going to sleep (so other processors can find sleeping processors) and actually blocking the kernel thread. This is equivalent to the classic problem of missed signals when using condition variables: the ``sleepy'' processor indicates that it will sleep but has not yet gone to sleep; when another processor attempts to wake it up, the waking-up operation may claim nothing needs to be done and the signal is missed. In cases where threads are scheduled from processors on the current cluster, losing signals is not necessarily critical, because at least some processors on the cluster are awake and may eventually check for more work. Individual processors always finish scheduling threads before looking for new work, which means that the last processor to go to sleep cannot miss threads scheduled from inside the cluster (if it does, that demonstrates the ready-queue is not linearizable). However, this guarantee does not hold if threads are scheduled from outside the cluster, either due to an external event like timers and I/O, or due to a thread migrating from a different cluster. In this case, missed signals can lead to the cluster deadlocking when it should not\footnote{Clusters ``should'' never deadlock, but for this proposal, cases where \CFA users \emph{actually} write \CFA code that leads to a deadlock are considered deadlocks that ``should'' happen.}. Therefore, it is important that the scheduling of threads include a mechanism where signals \emph{cannot} be missed. For performance reasons, it can be advantageous to have a secondary mechanism that allows signals to be missed in cases where this cannot lead to a deadlock. To be safe, this process must include a ``handshake'' where it is guaranteed that either the sleepy processor notices that a thread was scheduled after it signalled its intent to block, or the code scheduling the thread sees the intent to sleep before scheduling and is able to wake up the processor. This matter is complicated by the fact that pthreads offers few tools to implement this solution and offers no guarantee of the order in which threads wake up for most of these tools.
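
One way to realize such a handshake is sketched below with a pthread mutex and condition variable; the processor structure and ready_queue_nonempty are assumptions of this sketch, and the hard part in the real runtime is making the emptiness check reliable.

    #include <pthread.h>
    #include <stdbool.h>

    extern bool ready_queue_nonempty(void);   /* hypothetical reliable check */

    struct processor {
        pthread_mutex_t lock;
        pthread_cond_t  wake;
        bool            sleepy;               /* published intent to block */
    };

    /* Sleeper: publish intent, then re-check for work before blocking. */
    void idle_sleep(struct processor * p) {
        pthread_mutex_lock(&p->lock);
        p->sleepy = true;
        while (!ready_queue_nonempty())
            pthread_cond_wait(&p->wake, &p->lock);
        p->sleepy = false;
        pthread_mutex_unlock(&p->lock);
    }

    /* Waker: called after a thread is pushed. Because it takes the same
       mutex, the signal cannot fall between the sleeper's check and its
       wait, so the wake-up cannot be missed. */
    void notify(struct processor * p) {
        pthread_mutex_lock(&p->lock);
        if (p->sleepy) pthread_cond_signal(&p->wake);
        pthread_mutex_unlock(&p->lock);
    }
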
+
+ Another issue is trying to avoid kernel threads sleeping and waking frequently. A possible partial solution is to order the processors so that the one that most recently went to sleep is woken up first. This allows other sleeping processors to reach deeper sleep states (when these are available) while keeping ``hot'' processors warmer. Note that while this generally means organising the processors in a stack, I believe the unique index provided by the Reader-Writer lock can be reused to strictly order the waking of processors, giving a LIFO-like waking order. While a strict LIFO stack is probably better, using the processor index could prove useful and offer a sufficiently LIFO ordering.
+
+ Finally, another important aspect of Idle Sleep is deciding when processors should sleep and when sleeping processors should be woken up. Processors that are unnecessarily awake lead to unnecessary contention and power consumption, while too many sleeping processors can lead to sub-optimal throughput. Furthermore, transitions from sleeping to awake and vice versa add unnecessary latency. There is already a wealth of research on the subject and I may use an existing approach for the Idle Sleep heuristic in this project.
  
  \subsection{Asynchronous I/O}
- The final aspect of this proposal is asynchronous I/O. Without it, user threads that execute I/O operations will block the underlying kernel thread. This leads to poor throughput, it would be preferrable to block the user-thread and reuse the underlying kernel-thread to run other ready threads. This requires intercepting the user-threads' calls to I/O operations, redirecting them to an asynchronous I/O interface and handling the multiplexing between the synchronous and asynchronous API. As such, these are the three components needed to implemented to support asynchronous I/O : an OS abstraction layer over the asynchronous interface, an event-engine to (de)multiplex the operations and a synchronous interface for users to use. None of these components currently exist in \CFA and I will need to build all three for this project.
+ The final aspect of this proposal is asynchronous I/O. Without it, user threads that execute I/O operations block the underlying kernel thread, which leads to poor throughput. It would be preferable to block the user-thread and reuse the underlying kernel-thread to run other ready threads. This approach requires intercepting the user-threads' calls to I/O operations, redirecting them to an asynchronous I/O interface, and handling the multiplexing between the synchronous and asynchronous API. As such, three components are needed to implement support for asynchronous I/O: an OS abstraction layer over the asynchronous interface, an event-engine to (de)multiplex the operations, and a synchronous interface for users to use. None of these components currently exist in \CFA and I will need to build all three for this project.
  
  \paragraph{OS Abstraction}
- One of the fundamental part of converting blocking I/O operations into non-blocking ones is having an underlying asynchronous I/O interface to direct the I/O operations. While there exists many different APIs for asynchronous I/O, it is not part of this proposal to create a novel API, simply to use an existing one that is sufficient. uC++ uses the \texttt{select} as its interface, which handles ttys, pipes and sockets, but not disk. It entails significant complexity and is being replaced which make it a less interesting alternative. Another interface which is becoming popular recently\cit is \texttt{epoll}, which is supposed to be cheaper than \texttt{select}. However, epoll also does not handle file system and seems to have problem to linux pipes and \texttt{TTY}s\cit. A very recent alternative that must still be investigated is \texttt{io\_uring}. It claims to address some of the issues with \texttt{epoll} but is too recent to be confident that it does. Finally, a popular cross-platform alternative is \texttt{libuv}, which offers asynchronous sockets and asynchronous file system operations (among other features). However, as a full-featured library it includes much more than what is needed and could conflict with other features of \CFA unless significant efforts are made to merge them together.
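
For a sense of what the abstraction layer hides, a minimal readiness-based sketch over epoll follows; epoll is used purely for illustration, and io_register, io_drain, unpark and the poller setup are assumptions of this sketch.

    #include <sys/epoll.h>

    extern void unpark(void * waiter);    /* hypothetical: make a thread ready */

    static int poller;                    /* assumed initialized with epoll_create1(0) */

    /* Register interest in an fd on behalf of a parked user thread. */
    int io_register(int fd, void * waiter) {
        struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.ptr = waiter };
        return epoll_ctl(poller, EPOLL_CTL_ADD, fd, &ev);
    }

    /* Drain readiness events and unpark the threads waiting on them. */
    void io_drain(void) {
        struct epoll_event evs[64];
        int n = epoll_wait(poller, evs, 64, -1);
        for (int i = 0; i < n; i++)
            unpark(evs[i].data.ptr);
    }
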
+ One fundamental part of converting blocking I/O operations into non-blocking ones is having an underlying asynchronous I/O interface to direct the I/O operations. While there exist many different APIs for asynchronous I/O, it is not part of this proposal to create a novel API; I plan to use an existing one that is sufficient. uC++ uses \texttt{select} as its interface, which handles ttys, pipes and sockets, but not disk. It entails significant complexity and is being replaced, which makes it a less interesting alternative. Another popular interface is \texttt{epoll}\cit, which is supposed to be cheaper than \texttt{select}. However, epoll also does not handle the file system and seems to have problems with Linux pipes and \texttt{TTY}s\cit. A very recent alternative that must still be investigated is \texttt{io\_uring}. It claims to address some of the issues with \texttt{epoll} but is too recent to be confident that it does. Finally, a popular cross-platform alternative is \texttt{libuv}, which offers asynchronous sockets and asynchronous file-system operations (among other features). However, as a full-featured library it includes much more than what is needed and could conflict with other features of \CFA unless significant effort is made to merge them together.
  
  \paragraph{Event-Engine}
- Laying on top of the asynchronous interface layer is the event-engine. This engine is responsible for multiplexing (batching) the synchronous I/O requests into an asynchronous I/O request and demultiplexing the results onto appropriate blocked threads. This can be straightforward for the simple cases, but can become quite complex. Decisions that will need to be made include : whether to poll from a seperate kernel thread or a regularly scheduled user thread, what should be the ordering used when results satisfy many requests, how to handle threads waiting for multiple operations, etc.
+ Laying on top of the asynchronous interface layer is the event-engine. This engine is responsible for multiplexing (batching) the synchronous I/O requests into an asynchronous I/O request and demultiplexing the results onto the appropriate blocked threads. This step can be straightforward for the simple cases, but can become quite complex. Decisions that need to be made include: whether to poll from a separate kernel thread or a regularly scheduled user thread, what ordering to use when results satisfy many requests, how to handle threads waiting for multiple operations, etc.
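
To illustrate the multiplexing step, a synchronous read might flow through the engine as sketched below; the request structure and the self, submit and park primitives are assumptions of this sketch.

    #include <stddef.h>
    #include <sys/types.h>

    struct io_req {
        int     fd;
        void  * buf;
        size_t  len;
        ssize_t result;                    /* filled in on completion */
        void  * waiter;                    /* parked thread to demultiplex back to */
    };

    extern void * self(void);              /* hypothetical: current user thread */
    extern void submit(struct io_req * r); /* hypothetical: batch into the async API */
    extern void park(void);                /* hypothetical: block the user thread */

    /* Only the user thread blocks; the kernel thread runs other ready
       threads until the poller demultiplexes the result back here. */
    ssize_t engine_read(int fd, void * buf, size_t len) {
        struct io_req req = { .fd = fd, .buf = buf, .len = len, .waiter = self() };
        submit(&req);
        park();
        return req.result;
    }
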
  
  \paragraph{Interface}
- Finally, for these components to be available, it is necessary to expose them through a synchronous interface. This can be a novel interface but it is preferrable to attempt to intercept the existing POSIX interface in order to be compatible with existing code. This allows C programs written using this interface to be transparently converted to \CFA with minimal effeort. Where this is not applicable, a novel interface will be created to fill the gaps.
+ Finally, for these components to be available, it is necessary to expose them through a synchronous interface. The interface can be novel, but it is preferable to match the existing POSIX interface in order to be compatible with existing code. Matching allows C programs written using this interface to be transparently converted to \CFA with minimal effort. Where this is not applicable, a novel interface will be created to fill the gaps.
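
As an example of matching the POSIX interface, the engine_read from the previous sketch could shadow the libc symbol so that existing C code picks it up at link time; whether shadowing is the right interception mechanism is left open here.

    #include <stddef.h>
    #include <sys/types.h>

    extern ssize_t engine_read(int fd, void * buf, size_t len);

    /* Same signature as POSIX read: unmodified C code now blocks only the
       user thread. Other interception points (linker --wrap, LD_PRELOAD,
       or a CFA overload) would work as well. */
    ssize_t read(int fd, void * buf, size_t len) {
        return engine_read(fd, buf, len);
    }
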
doc/theses/thierry_delisle_PhD/comp_II/img/base.fig
r9640813 r7a0f1d25
- #FIG 3.2 Produced by xfig version 3.2.5c
+ #FIG 3.2 Produced by xfig version 3.2.7a
  Landscape
  Center
  Inches
- Letter 
+ Letter
  100.00
  Single
…
  2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
  2400 3900 7200 3900 7200 4500 2400 4500 2400 3900
- 4 2 0 50 -1 0 12 0.0000 2 180 660 2100 4200 Array of\001
- 4 2 0 50 -1 0 12 0.0000 2 165 600 2100 4425 Queues\001
- 4 2 0 50 -1 0 12 0.0000 2 135 645 2100 3075 Threads\001
- 4 2 0 50 -1 0 12 0.0000 2 180 525 2100 2850 Ready\001
+ 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
+ 2454 2520 2957 2520
+ 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
+ 2454 3417 2957 3417
+ 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
+ 2400 4350 3000 4350
+ 4 2 0 50 -1 0 12 0.0000 2 165 720 2100 4200 Array of\001
+ 4 2 0 50 -1 0 12 0.0000 2 150 540 2100 4425 Queues\001
+ 4 2 0 50 -1 0 12 0.0000 2 135 630 2100 3075 Threads\001
+ 4 2 0 50 -1 0 12 0.0000 2 165 450 2100 2850 Ready\001
+ 4 0 0 50 -1 0 11 0.0000 2 135 180 2595 3561 TS\001
+ 4 0 0 50 -1 0 11 0.0000 2 135 180 2595 2665 TS\001
+ 4 0 0 50 -1 0 11 0.0000 2 135 180 2595 4479 TS\001
doc/theses/thierry_delisle_PhD/comp_II/img/resize.fig
r9640813 r7a0f1d25
- #FIG 3.2 Produced by xfig version 3.2.5c
+ #FIG 3.2 Produced by xfig version 3.2.7a
  Landscape
  Center
  Inches
- Letter 
+ Letter
  100.00
  Single
…
  1 1 1.00 45.00 90.00
  7500 4200 7950 4200
- 4 0 0 50 -1 0 12 0.0000 2 135 915 7500 3825 Grows with\001
- 4 0 0 50 -1 0 12 0.0000 2 135 840 7500 4050 additional\001
- 4 0 0 50 -1 0 12 0.0000 2 135 840 7500 4425 processors\001
+ 4 0 0 50 -1 0 12 0.0000 2 135 900 7500 3825 Grows with\001
+ 4 0 0 50 -1 0 12 0.0000 2 135 900 7500 4050 additional\001
+ 4 0 0 50 -1 0 12 0.0000 2 120 900 7500 4425 processors\001
  -6
  2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
…
  2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
  2400 3900 7200 3900 7200 4500 2400 4500 2400 3900
- 4 2 0 50 -1 0 12 0.0000 2 180 660 2100 4200 Array of\001
- 4 2 0 50 -1 0 12 0.0000 2 165 600 2100 4425 Queues\001
- 4 2 0 50 -1 0 12 0.0000 2 135 645 2100 3075 Threads\001
- 4 2 0 50 -1 0 12 0.0000 2 180 525 2100 2850 Ready\001
+ 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
+ 3600 5100 4500 5100 4500 6300 3600 6300 3600 5100
+ 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 3
+ 1 1 1.00 60.00 120.00
+ 3750 5550 2400 5550 2400 4500
+ 4 2 0 50 -1 0 12 0.0000 2 165 720 2100 4200 Array of\001
+ 4 2 0 50 -1 0 12 0.0000 2 150 540 2100 4425 Queues\001
+ 4 2 0 50 -1 0 12 0.0000 2 135 630 2100 3075 Threads\001
+ 4 2 0 50 -1 0 12 0.0000 2 165 450 2100 2850 Ready\001
+ 4 0 0 50 -1 0 13 0.0000 2 135 1980 3600 5025 Cluster Data Structure\001
+ 4 0 0 50 -1 0 13 0.0000 2 165 1170 2400 5775 Array Pointer\001