Changes in / [25a1cb0:d3aa64f1]
- Location:
- doc/theses/thierry_delisle_PhD/comp_II
- Files:
-
- 3 deleted
- 5 edited
-
Makefile (modified) (5 diffs)
-
comp_II.tex (modified) (30 diffs)
-
img/base.fig (modified) (5 diffs)
-
img/empty.fig (modified) (2 diffs)
-
img/system.fig (modified) (2 diffs)
-
presentation.pdf (deleted)
-
presentation.tex (deleted)
-
presentationstyle.sty (deleted)
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/thierry_delisle_PhD/comp_II/Makefile
r25a1cb0 rd3aa64f1 12 12 13 13 ## Define the text source files. 14 15 SOURCES = ${addsuffix .tex, \ 16 comp_II \ 17 } 18 14 19 FIGURES = ${addsuffix .tex, \ 20 base \ 21 empty \ 15 22 emptybit \ 16 23 emptytree \ 17 24 emptytls \ 18 25 resize \ 26 system \ 19 27 } 20 28 21 29 PICTURES = ${addsuffix .pstex, \ 22 base \23 empty \24 system \25 30 } 26 31 … … 32 37 33 38 ## Define the documents that need to be made. 34 all: comp_II.pdf presentation.pdf35 comp_II.pdf: ${FIGURES} ${PICTURES}36 presentation.pdf: presentationstyle.sty base.dark.pstex empty.dark.pstex system.dark.pstex37 39 38 DOCUMENT = comp_II.pdf presentation.pdf40 DOCUMENT = comp_II.pdf 39 41 BASE = ${basename ${DOCUMENT}} 40 42 … … 50 52 # File Dependencies # 51 53 52 %.pdf : build/%.ps | ${Build} 54 ${DOCUMENT} : ${BASE}.ps 53 55 ps2pdf $< 54 56 55 build/%.ps : build/%.dvi | ${Build} 56 dvips $ < -o $@57 ${BASE}.ps : ${BASE}.dvi 58 dvips ${Build}/$< -o $@ 57 59 58 build/%.dvi : %.tex Makefile | ${Build} 60 ${BASE}.dvi : Makefile ${GRAPHS} ${PROGRAMS} ${PICTURES} ${FIGURES} ${SOURCES} \ 61 ${Macros}/common.tex ${Macros}/indexstyle ../../../bibliography/pl.bib \ 62 local.bib glossary.tex | ${Build} 59 63 # Must have *.aux file containing citations for bibtex 60 if [ ! -r ${basename $@}.aux ] ; then ${LaTeX} $ <; fi61 -${BibTeX} ${ basename $@}64 if [ ! -r ${basename $@}.aux ] ; then ${LaTeX} ${basename $@}.tex ; fi 65 -${BibTeX} ${Build}/${basename $@} 62 66 # Some citations reference others so run again to resolve these citations 63 ${LaTeX} $ <64 -${BibTeX} ${ basename $@}67 ${LaTeX} ${basename $@}.tex 68 -${BibTeX} ${Build}/${basename $@} 65 69 # Make index from *.aux entries and input index at end of document 66 -makeglossaries -q -s ${basename $@}.ist${basename $@}70 makeglossaries -q -s ${Build}/${basename $@}.ist ${Build}/${basename $@} 67 71 # Run again to finish citations 68 ${LaTeX} $ <72 ${LaTeX} ${basename $@}.tex 69 73 70 74 ## Define the default recipes. … … 73 77 mkdir -p ${Build} 74 78 75 %.tex : img/%.fig |${Build}79 %.tex : img/%.fig ${Build} 76 80 fig2dev -L eepic $< > ${Build}/$@ 77 81 … … 83 87 fig2dev -L pstex_t -p ${Build}/$@ $< > ${Build}/$@_t 84 88 85 ## pstex with inverted colors86 %.dark.pstex : img/%.fig Makefile | ${Build}87 fig2dev -L pstex $< > ${Build}/$@88 sed -i 's/\/col-1 {0 setgray} bind def/\/col-1 {1 setgray} bind def/g' ${Build}/$@89 sed -i 's/\/col0 {0.000 0.000 0.000 srgb} bind def/\/col0 {1.000 1.000 1.000 srgb} bind def/g' ${Build}/$@90 sed -i 's/\/col7 {1.000 1.000 1.000 srgb} bind def/\/col7 {0.000 0.000 0.000 srgb} bind def/g' ${Build}/$@91 fig2dev -L pstex_t -p ${Build}/$@ $< > ${Build}/$@_t92 93 89 # Local Variables: # 94 90 # compile-command: "make" # -
doc/theses/thierry_delisle_PhD/comp_II/comp_II.tex
r25a1cb0 rd3aa64f1 63 63 It aims to add high-productivity features while maintaining the predictable performance of C. 64 64 As such, concurrency in \CFA\cite{Delisle19} aims to offer simple and safe high-level tools while still allowing performant code. 65 \CFA concurrent code is written in the synchronous programming paradigm but uses \glspl{uthrd} in order to achieve the simplicity and maintainability of synchronous programming without sacrificing the efficiency of asynchronous program ming.65 \CFA concurrent code is written in the synchronous programming paradigm but uses \glspl{uthrd} in order to achieve the simplicity and maintainability of synchronous programming without sacrificing the efficiency of asynchronous programing. 66 66 As such, the \CFA \newterm{scheduler} is a preemptive user-level scheduler that maps \glspl{uthrd} onto \glspl{kthrd}. 67 67 … … 75 75 and the cost of scheduling, \ie deciding which thread to run next among all the threads ready to run. 76 76 \end{enumerate} 77 The first cost is generally constant and fixed\footnote{Affecting the constant context-switch cost is whether it is done in one step, after the scheduling, or in two steps, context-switching to a third fixed thread before scheduling.}, while the scheduling cost can vary based on the system state. 78 Adding multiple \glspl{kthrd} does not fundamentally change the scheduler semantics or requirements, it simply adds new correctness requirements, \ie \newterm{linearizability}\footnote{Meaning, however fast the CPU threads run, there is an equivalent sequential order that gives the same result.}, and a new dimension to performance: scalability, where scheduling cost now also depends on contention. 77 The first cost is generally constant and fixed\footnote{Affecting the constant context-switch cost is whether it is done in one step, after the scheduling, or in two steps, context-switching to a fixed third-thread before scheduling.}, while the scheduling cost can vary based on the system state. 78 Adding multiple \glspl{kthrd} does not fundamentally change the scheduler semantics or requirements, it simply adds new correctness requirements, \ie \newterm{linearizability}\footnote{Meaning however fast the CPU threads run, there is an equivalent sequential order that gives the same result.}, and a new dimension to performance: scalability, where scheduling cost now also depends on contention. 79 79 80 The more threads switch, the more the administration cost of scheduling becomes noticeable. 80 81 It is therefore important to build a scheduler with the lowest possible cost and latency. … … 83 84 While the illusion of simultaneity is easier to reason about, it can break down if the scheduler allows too much unfairness. 84 85 Therefore, the scheduler should offer as much fairness as needed to guarantee eventual progress, but use unfairness to help performance. 85 In practice, threads must wait in turn but there can be advantages to unfair scheduling, similar to the express cashregister at a grocery store.86 In practice, threads must wait in turn but there can be advantages to unfair scheduling, similar to the the express cash-register at a grocery store. 86 87 87 88 The goal of this research is to produce a scheduler that is simple for programmers to understand and offers good performance. … … 92 93 \end{quote} 93 94 94 For a general -purpose scheduler, it is impossible to produce an optimal algorithm as it would require knowledge of the future behaviour of threads.95 As such, scheduling performance is generally either defined by the best -case scenario, \ie a workload to which the scheduler is tailored, or the worst-case scenario, \ie the scheduler behaves no worsethan \emph{X}.95 For a general purpose scheduler, it is impossible to produce an optimal algorithm as it would require knowledge of the future behaviour of threads. 96 As such, scheduling performance is generally either defined by the best case scenario, \ie a workload to which the scheduler is tailored, or the worst case scenario, \ie the scheduler behaves no worst than \emph{X}. 96 97 For this proposal, the performance is evaluated using the second approach to allow \CFA programmers to rely on scheduling performance. 97 98 Because there is no optimal scheduler, ultimately \CFA may allow programmers to write their own scheduler; but that is not the subject of this proposal, which considers only the default scheduler. … … 100 101 To achieve the \CFA scheduling goal includes: 101 102 \begin{enumerate} 102 \item producing a scheduling strategy with sufficient fairness guarantees, 103 \item creating an abstraction layer over the operating system to handle kernel-threads spinning unnecessarily, 104 \item scheduling blocking I/O operations, 105 \item and writing sufficient library tools to allow developers to indirectly use the scheduler, either through tuning knobs or replacing the default scheduler. 103 \item 104 producing a scheduling strategy with sufficient fairness guarantees, 105 \item 106 creating an abstraction layer over the operating system to handle kernel-threads spinning unnecessarily, 107 \item 108 scheduling blocking I/O operations, 109 \item 110 and writing sufficient library tools to allow developers to indirectly use the scheduler, either through tuning knobs or replacing the default scheduler. 106 111 \end{enumerate} 107 112 … … 114 119 \paragraph{Correctness} As with any other concurrent data structure or algorithm, the correctness requirement is paramount. 115 120 The scheduler cannot allow threads to be dropped from the ready queue, \ie scheduled but never run, or be executed multiple times when only being scheduled once. 116 Since \CFA concurrency has no spurious wake up, this definition of correctness also means the scheduler should have no spurious wakeup.121 Since \CFA concurrency has no spurious wakeup, this definition of correctness also means the scheduler should have no spurious wakeup. 117 122 The \CFA scheduler must be correct. 118 123 … … 125 130 The \CFA scheduler should offer good performance for all three metrics. 126 131 127 \paragraph{Fairness} Like performance, this requirement has several aspect s: eventual progress, predictability and performance reliability.132 \paragraph{Fairness} Like performance, this requirement has several aspect : eventual progress, predictability and performance reliability. 128 133 \newterm{Eventual progress} guarantees every scheduled thread is eventually run, \ie prevent starvation. 129 As a hard requirement, the \CFA scheduler must guarantee eventual progress, otherwise the above -mentioned illusion of simultaneous execution is broken and the scheduler becomes much more complex to reason about.130 \newterm{Predictability} and \newterm{reliability} mean similar workloads achieve similar performance and programmer execution intuition is respected.131 For example, a thread that yields aggressively should not run more often th an other tasks.134 As a hard requirement, the \CFA scheduler must guarantee eventual progress, otherwise the above mentioned illusion of simultaneous execution is broken and the scheduler becomes much more complex to reason about. 135 \newterm{Predictability} and \newterm{reliability} means similar workloads achieve similar performance and programmer execution intuition is respected. 136 For example, a thread that yields aggressively should not run more often then other tasks. 132 137 While this is intuitive, it does not hold true for many work-stealing or feedback based schedulers. 133 138 The \CFA scheduler must guarantee eventual progress and should be predictable and offer reliable performance. … … 145 150 146 151 \begin{enumerate} 147 \item Threads live long enough for useful feedback information to be gathered.152 \item Threads live long enough for useful feedback information to be to gathered. 148 153 \item Threads belong to multiple users so fairness across threads is insufficient. 149 154 \end{enumerate} … … 152 157 Since \CFA has the explicit goal of allowing many smaller threads, this can naturally lead to threads with much shorter lifetimes that are only scheduled a few times. 153 158 Scheduling strategies based on feedback cannot be effective in these cases because there is no opportunity to measure the metrics that underlie the algorithm. 154 Note, the problem of \newterm{feedback convergence} (reacting too slowly to scheduling events) is not specific to short -lived threads but can also occur with threads that show drastic changes in scheduling, \eg threads running for long periods of time and then suddenly blocking and unblocking quickly and repeatedly.159 Note, the problem of \newterm{feedback convergence} (reacting too slowly to scheduling events) is not specific to short lived threads but can also occur with threads that show drastic changes in scheduling, \eg threads running for long periods of time and then suddenly blocking and unblocking quickly and repeatedly. 155 160 156 161 In the context of operating systems, these concerns can be overshadowed by a more pressing concern : security. … … 159 164 In the case of the \CFA scheduler, every thread runs in the same user space and is controlled by the same user. 160 165 Fairness across users is therefore a given and it is then possible to safely ignore the possibility that threads are malevolent. 161 This approach allows for a much simpler fairness metric and in this proposal \emph{fairness} is defined as: when multiple threads are cycling through the system, the total ordering of threads being scheduled, \ie pushed onto the ready queue, should not differ much from the total ordering of threads being executed, \ie popped from the readyqueue.166 This approach allows for a much simpler fairness metric and in this proposal \emph{fairness} is defined as: when multiple threads are cycling through the system, the total ordering of threads being scheduled, \ie pushed onto the ready-queue, should not differ much from the total ordering of threads being executed, \ie popped from the ready-queue. 162 167 163 168 Since feedback is not necessarily feasible within the lifetime of all threads and a simple fairness metric can be used, the scheduling strategy proposed for the \CFA runtime does not use per-threads feedback. … … 167 172 Another broad category of schedulers are priority schedulers. 168 173 In these scheduling strategies, threads have priorities and the runtime schedules the threads with the highest priority before scheduling other threads. 169 Threads with equal priority are scheduled using a secondary strategy, often something simple like round robin or FIFO.174 Threads with equal priority are scheduled using a secondary strategy, often something simple like round-robin or FIFO. 170 175 A consequence of priority is that, as long as there is a thread with a higher priority that desires to run, a thread with a lower priority does not run. 171 176 This possible starving of threads can dramatically increase programming complexity since starving threads and priority inversion (prioritizing a lower priority thread) can both lead to serious problems. 172 177 173 178 An important observation is that threads do not need to have explicit priorities for problems to occur. 174 Indeed, any system with multiple ready queues that attempts to exhaust one queue before accessing the other queues, essentially provide implicit priority, which can encounter starvation problems.179 Indeed, any system with multiple ready-queues that attempts to exhaust one queue before accessing the other queues, essentially provide implicit priority, which can encounter starvation problems. 175 180 For example, a popular scheduling strategy that suffers from implicit priorities is work stealing. 176 181 \newterm{Work stealing} is generally presented as follows: … … 186 191 187 192 \subsection{Schedulers without feedback or priorities} 188 This proposal conjectures that i tis possible to construct a default scheduler for the \CFA runtime that offers good scalability and a simple fairness guarantee that is easy for programmers to reason about.193 This proposal conjectures that is is possible to construct a default scheduler for the \CFA runtime that offers good scalability and a simple fairness guarantee that is easy for programmers to reason about. 189 194 The simplest fairness guarantee is FIFO ordering, \ie threads scheduled first run first. 190 195 However, enforcing FIFO ordering generally conflicts with scalability across multiple processors because of the additional synchronization. … … 212 217 Pushing new data is done by selecting one of these underlying queues at random, recording a timestamp for the operation and pushing to the selected queue. 213 218 Popping is done by selecting two queues at random and popping from the queue with the oldest timestamp. 214 A higher number of underlying queues lead to less contention on each queue and therefore better performance.219 A higher number of underlying queues leads to less contention on each queue and therefore better performance. 215 220 In a loaded system, it is highly likely the queues are non-empty, \ie several tasks are on each of the underlying queues. 216 221 This means that selecting a queue at random to pop from is highly likely to yield a queue with available items. … … 219 224 \begin{figure} 220 225 \begin{center} 221 \input{base .pstex_t}226 \input{base} 222 227 \end{center} 223 228 \caption{Relaxed FIFO list at the base of the scheduler: an array of strictly FIFO lists. 224 The timestamp is in all nodes and cell arrays.}229 The timestamp is in all nodes and cell arrays.} 225 230 \label{fig:base} 226 231 \end{figure} … … 228 233 \begin{figure} 229 234 \begin{center} 230 \input{empty .pstex_t}235 \input{empty} 231 236 \end{center} 232 237 \caption{``More empty'' state of the queue: the array contains many empty cells.} … … 240 245 Overall performance is therefore influenced by the contention on the underlying queues and pop performance is influenced by the item density. 241 246 242 This leads to four performance cases for the centralized ready queue, as depicted in Table~\ref{tab:perfcases}.247 This leads to four performance cases for the centralized ready-queue, as depicted in Table~\ref{tab:perfcases}. 243 248 The number of processors (many or few) refers to the number of kernel threads \emph{actively} attempting to pop user threads from the queues, not the total number of kernel threads. 244 249 The number of threads (many or few) refers to the number of user threads ready to be run. … … 316 321 Sparse global information helps high-contention cases but increases latency in zero-contention-cases, to read and ``aggregate'' the information\footnote{Hierarchical structures, \eg binary search tree, effectively aggregate information but follow pointer chains, learning information at each node. 317 322 Similarly, other sparse schemes need to read multiple cachelines to acquire all the information needed.}. 318 Finally, dense local information has both the advantages of low latency in zero-contention cases and scalability in high-contention cases . However the information can become stale making it difficult to use to ensure correctness.319 The fact that these solutions have these fundamental limits suggest to me a better solution that attempts to combine these properties in an interesting way .323 Finally, dense local information has both the advantages of low latency in zero-contention cases and scalability in high-contention cases, however the information can become stale making it difficult to use to ensure correctness. 324 The fact that these solutions have these fundamental limits suggest to me a better solution that attempts to combine these properties in an interesting ways. 320 325 Also, the lock discussed in Section~\ref{sec:resize} allows for solutions that adapt to the number of processors, which could also prove useful. 321 326 … … 323 328 324 329 How much scalability is actually needed is highly debatable. 325 \emph{libfibre}\cite{libfibre} has compared favo urably to other schedulers in webserver tests\cite{Karsten20} and uses a single atomic counter in its scheduling algorithm similarly to the proposed bitmask.330 \emph{libfibre}\cite{libfibre} has compared favorably to other schedulers in webserver tests\cite{Karsten20} and uses a single atomic counter in its scheduling algorithm similarly to the proposed bitmask. 326 331 As such, the single atomic instruction on a shared cacheline may be sufficiently performant. 327 332 328 333 I have built a prototype of this ready queue in the shape of a data queue, \ie nodes on the queue are structures with a single int representing a thread and intrusive data fields. 329 Using this prototype ,I ran preliminary performance experiments that confirm the expected performance in Table~\ref{tab:perfcases}.334 Using this prototype I ran preliminary performance experiments that confirm the expected performance in Table~\ref{tab:perfcases}. 330 335 However, these experiments only offer a hint at the actual performance of the scheduler since threads form more complex operations than simple integer nodes, \eg threads are not independent of each other, when a thread blocks some other thread must intervene to wake it. 331 336 … … 336 341 \begin{figure} 337 342 \begin{center} 338 \input{system .pstex_t}343 \input{system} 339 344 \end{center} 340 345 \caption{Global structure of the \CFA runtime system.} … … 349 354 This assumption is made both in the design of the proposed scheduler as well as in the original design of the \CFA runtime system. 350 355 As such, the proposed scheduler must honour the correctness of this behaviour but does not have any performance objectives with regard to resizing a cluster. 351 How long adding or removing processors take and how much this disrupts the performance of other threads is considered a secondary concern since it should be amortized over long period sof times.356 How long adding or removing processors take and how much this disrupts the performance of other threads is considered a secondary concern since it should be amortized over long period of times. 352 357 However, as mentioned in Section~\ref{sec:queue}, contention on the underlying queues can have a direct impact on performance. 353 358 The number of underlying queues must therefore be adjusted as the number of processors grows or shrinks. … … 367 372 This description effectively matches with the description of a reader-writer lock, infrequent but invasive updates among frequent read operations. 368 373 In the case of the ready queue described above, read operations are operations that push or pop from the ready queue but do not invalidate any references to the ready queue data structures. 369 Writes , on the other hand,would add or remove inner queues, invalidating references to the array of inner queues in a process.374 Writes on the other hand would add or remove inner queues, invalidating references to the array of inner queues in a process. 370 375 Therefore, the current proposed approach to this problem is to add a per-cluster reader-writer lock around the ready queue to prevent restructuring of the ready-queue data-structure while threads are being pushed or popped. 371 376 … … 375 380 376 381 \paragraph{Objectives and Existing Work} 377 The lock must offer scalability and performance on par with the actual ready queue in order not to introduce a new bottleneck.382 The lock must offer scalability and performance on par with the actual ready-queue in order not to introduce a new bottleneck. 378 383 I have already built a lock that fits the desired requirements and preliminary testing show scalability and performance that exceed the target. 379 384 As such, I do not consider this lock to be a risk for this project. … … 391 396 give back unneeded CPU time associated with a process to other user processors executing on the computer, 392 397 \item 393 and reduce energy consumption in cases where more idle kernel-threads translate into idle CPUs, which can cycle down.398 and reduce energy consumption in cases where more idle kernel-threads translate to idle CPUs, which can cycle down. 394 399 \end{enumerate} 395 400 Support for idle sleep broadly involves calling the operating system to block the kernel thread and handling the race between a blocking thread and the waking thread, and handling which kernel thread should sleep or wake up. … … 398 403 This operation is equivalent to the classic problem of missing signals when using condition variables: the ``sleepy'' processor indicates its intention to block but has not yet gone to sleep when another processor attempts to wake it up. 399 404 The waking-up operation sees the blocked process and signals it, but the blocking process is racing to sleep so the signal is missed. 400 In cases where kernel threads are managed as processors on the current cluster, lo sing signals is not necessarily critical, because at least some processors on the cluster are awake and may check for more processors eventually.405 In cases where kernel threads are managed as processors on the current cluster, loosing signals is not necessarily critical, because at least some processors on the cluster are awake and may check for more processors eventually. 401 406 Individual processors always finish scheduling user threads before looking for new work, which means that the last processor to go to sleep cannot miss threads scheduled from inside the cluster (if they do, that demonstrates the ready queue is not linearizable). 402 407 However, this guarantee does not hold if threads are scheduled from outside the cluster, either due to an external event like timers and I/O, or due to a user (or kernel) thread migrating from a different cluster. … … 408 413 409 414 Another important issue is avoiding kernel threads sleeping and waking frequently because there is a significant operating-system cost. 410 This scenario happens when a program oscillates between high and low activity, needing most and then fewerprocessors.415 This scenario happens when a program oscillates between high and low activity, needing most and then less processors. 411 416 A possible partial solution is to order the processors so that the one which most recently went to sleep is woken up. 412 417 This allows other sleeping processors to reach deeper sleep state (when these are available) while keeping ``hot'' processors warmer. … … 415 420 416 421 A final important aspect of idle sleep is when should processors make the decision to sleep and when is it appropriate for sleeping processors to be woken up. 417 Processors that are unnecessarily unblocked lead to unnecessary contention, CPU usage, and power consumption, while too many sleeping processors can lead to sub optimal throughput.418 Furthermore, transitions from sleeping to awake and vice versa also add unnecessary latency.422 Processors that are unnecessarily unblocked lead to unnecessary contention, CPU usage, and power consumption, while too many sleeping processors can lead to sub-optimal throughput. 423 Furthermore, transitions from sleeping to awake and vice-versa also add unnecessary latency. 419 424 There is already a wealth of research on the subject\cite{schillings1996engineering, wiki:thunderherd} and I may use an existing approach for the idle-sleep heuristic in this project, \eg\cite{Karsten20}. 420 425 … … 425 430 It is preferable to block the user thread performing the I/O and reuse the underlying kernel-thread to run other ready user threads. 426 431 This approach requires intercepting user-thread calls to I/O operations, redirecting them to an asynchronous I/O interface, and handling the multiplexing/demultiplexing between the synchronous and asynchronous API. 427 As such, there are three components needed to implement support for asynchronous I/O:432 As such, there are three components needed to implemented support for asynchronous I/O: 428 433 \begin{enumerate} 429 434 \item … … 441 446 It is sufficient to make one work in the complex context of the \CFA runtime. 442 447 \uC uses the $select$\cite{select} as its interface, which handles ttys, pipes and sockets, but not disk. 443 $select$ entails significant complexity and is being replaced in UNIX operating systems, which make it a less interesting alternative.448 $select$ entails significant complexity and is being replaced in UNIX operating-systems, which make it a less interesting alternative. 444 449 Another popular interface is $epoll$\cite{epoll}, which is supposed to be cheaper than $select$. 445 However, $epoll$ also does not handle the file system and anec dotal evidence suggest it has problems with Linux pipes and $TTY$s.450 However, $epoll$ also does not handle the file system and anectodal evidence suggest it has problem with linux pipes and $TTY$s. 446 451 A popular cross-platform alternative is $libuv$\cite{libuv}, which offers asynchronous sockets and asynchronous file system operations (among other features). 447 452 However, as a full-featured library it includes much more than I need and could conflict with other features of \CFA unless significant effort is made to merge them together. 448 453 A very recent alternative that I am investigating is $io_uring$\cite{io_uring}. 449 It claims to address some of the issues with $epoll$ and my early investigating suggest sthat the claim is accurate.450 $io_uring$ uses a much more general approach where system calls are register edto a queue and later executed by the kernel, rather than relying on system calls to return an error instead of blocking and subsequently waiting for changes on file descriptors.454 It claims to address some of the issues with $epoll$ and my early investigating suggest that the claim is accurate. 455 $io_uring$ uses a much more general approach where system calls are register to a queue and later executed by the kernel, rather than relying on system calls to return an error instead of blocking and subsequently waiting for changes on file descriptors. 451 456 I believe this approach allows for fewer problems, \eg the manpage for $open$\cite{open} states: 452 457 \begin{quote} 453 Note that [the $O_NONBLOCK$ flag] has no effect for regular files and block devices;454 that is, I/O operations will (briefly) block when device activity is required, regardless of whether $O_NONBLOCK$ is set.455 Since $O_NONBLOCK$ semantics might eventually be implemented, applications should not depend upon blocking behaviour when specifying this flag for regular files and block devices.458 Note that [the $O_NONBLOCK$ flag] has no effect for regular files and block devices; 459 that is, I/O operations will (briefly) block when device activity is required, regardless of whether $O_NONBLOCK$ is set. 460 Since $O_NONBLOCK$ semantics might eventually be implemented, applications should not depend upon blocking behavior when specifying this flag for regular files and block devices. 456 461 \end{quote} 457 462 This makes approach based on $epoll$/$select$ less reliable since they may not work for every file descriptors. 458 For this reason, I plan to use $io_uring$ as the OS abstraction for the \CFA runtime unless further work shows problems I haven't encountered yet.463 For this reason, I plan to use $io_uring$ as the OS abstraction for the \CFA runtime, unless further work shows problems I haven't encountered yet. 459 464 However, only a small subset of the features are available in Ubuntu as of April 2020\cite{wiki:ubuntu-linux}, which will limit performance comparisons. 460 465 I do not believe this will affect the comparison result. … … 485 490 \section{Discussion} 486 491 I believe that runtime system and scheduling are still open topics. 487 Many ``state of the art'' production frameworks still use single -threaded event loops because of performance considerations, \eg \cite{nginx-design}, and, to my knowledge, no widelyavailable system language offers modern threading facilities.492 Many ``state of the art'' production frameworks still use single threaded event-loops because of performance considerations, \eg \cite{nginx-design}, and, to my knowledge, no wideyl available system language offers modern threading facilities. 488 493 I believe the proposed work offers a novel runtime and scheduling package, where existing work only offers fragments that users must assemble themselves when possible. 489 494 … … 496 501 \hline November 2020 & March 2021 & Completion of the implementation. \\ 497 502 \hline March 2021 & April 2021 & Final Performance experiments. \\ 498 \hline May 2021 & August 2021 & Thesis writing and defen ce. \\503 \hline May 2021 & August 2021 & Thesis writing and defense. \\ 499 504 \hline 500 505 \end{tabular} -
doc/theses/thierry_delisle_PhD/comp_II/img/base.fig
r25a1cb0 rd3aa64f1 1 #FIG 3.2 Produced by xfig version 3.2. 7b1 #FIG 3.2 Produced by xfig version 3.2.5c 2 2 Landscape 3 3 Center 4 4 Inches 5 Letter 5 Letter 6 6 100.00 7 7 Single … … 13 13 1 3 0 1 0 0 50 -1 20 0.000 1 0.0000 6975 4200 20 20 6975 4200 6995 4200 14 14 -6 15 6 2400 2100 3000 2700 16 1 3 0 1 0 7 50 -1 -1 0.000 1 0.0000 2700 2400 300 300 2700 2400 3000 2400 17 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 18 2400 2475 3000 2475 19 4 1 0 50 -1 0 11 0.0000 2 120 210 2700 2650 TS\001 20 -6 21 6 2400 3000 3000 3600 22 1 3 0 1 0 7 50 -1 -1 0.000 1 0.0000 2700 3300 300 300 2700 3300 3000 3300 23 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 24 2400 3375 3000 3375 25 4 1 0 50 -1 0 11 0.0000 2 120 210 2700 3550 TS\001 26 -6 15 27 1 3 0 1 0 7 50 -1 -1 0.000 1 0.0000 3900 2400 300 300 3900 2400 4200 2400 16 28 1 3 0 1 0 7 50 -1 -1 0.000 1 0.0000 3900 3300 300 300 3900 3300 4200 3300 … … 21 33 1 3 0 1 0 7 50 -1 -1 0.000 1 0.0000 6300 3300 300 300 6300 3300 6600 3300 22 34 1 3 0 1 0 7 50 -1 -1 0.000 1 0.0000 4509 3302 300 300 4509 3302 4809 3302 23 1 3 0 1 0 7 50 -1 -1 0.000 1 0.0000 2700 3300 300 300 2700 3300 3000 330024 1 3 0 1 0 7 50 -1 -1 0.000 1 0.0000 2700 2400 300 300 2700 2400 3000 240025 35 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 26 36 3000 3900 3000 4500 … … 63 73 1 1 1.00 45.00 90.00 64 74 6300 4200 6300 3600 65 2 1 0 1 -17 50 -1 -1 0.000 0 0 -1 1 0 275 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2 66 76 1 1 1.00 45.00 90.00 67 77 2700 4200 2700 3600 … … 71 81 1 1 1.00 45.00 90.00 72 82 4500 4200 4500 3600 73 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 74 2400 3375 3000 3375 75 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 76 2400 2475 3000 2475 77 4 2 -1 50 -1 0 12 0.0000 2 135 630 2100 3075 Threads\001 78 4 2 -1 50 -1 0 12 0.0000 2 165 450 2100 2850 Ready\001 79 4 1 -1 50 -1 0 11 0.0000 2 135 180 2700 4450 TS\001 80 4 2 -1 50 -1 0 12 0.0000 2 165 720 2100 4200 Array of\001 81 4 2 -1 50 -1 0 12 0.0000 2 150 540 2100 4425 Queues\001 82 4 1 -1 50 -1 0 11 0.0000 2 135 180 2700 3550 TS\001 83 4 1 -1 50 -1 0 11 0.0000 2 135 180 2700 2650 TS\001 83 4 2 0 50 -1 0 12 0.0000 2 180 660 2100 4200 Array of\001 84 4 2 0 50 -1 0 12 0.0000 2 165 600 2100 4425 Queues\001 85 4 2 0 50 -1 0 12 0.0000 2 135 645 2100 3075 Threads\001 86 4 2 0 50 -1 0 12 0.0000 2 180 525 2100 2850 Ready\001 87 4 1 0 50 -1 0 11 0.0000 2 120 210 2700 4450 TS\001 -
doc/theses/thierry_delisle_PhD/comp_II/img/empty.fig
r25a1cb0 rd3aa64f1 1 #FIG 3.2 Produced by xfig version 3.2. 7b1 #FIG 3.2 Produced by xfig version 3.2.5c 2 2 Landscape 3 3 Center 4 4 Inches 5 Letter 5 Letter 6 6 100.00 7 7 Single … … 41 41 1 1 1.00 45.00 90.00 42 42 2700 4200 2700 3600 43 4 2 -1 50 -1 0 12 0.0000 2 165 720 2100 4200 Array of\00144 4 2 -1 50 -1 0 12 0.0000 2 150 540 2100 4425 Queues\00145 4 2 -1 50 -1 0 12 0.0000 2 135 6302100 3075 Threads\00146 4 2 -1 50 -1 0 12 0.0000 2 165 4502100 2850 Ready\00143 4 2 0 50 -1 0 12 0.0000 2 180 660 2100 4200 Array of\001 44 4 2 0 50 -1 0 12 0.0000 2 165 600 2100 4425 Queues\001 45 4 2 0 50 -1 0 12 0.0000 2 135 645 2100 3075 Threads\001 46 4 2 0 50 -1 0 12 0.0000 2 180 525 2100 2850 Ready\001 -
doc/theses/thierry_delisle_PhD/comp_II/img/system.fig
r25a1cb0 rd3aa64f1 3 3 Center 4 4 Inches 5 Letter 5 Letter 6 6 100.00 7 7 Single … … 165 165 4 1 -1 0 0 0 10 0.0000 2 105 990 2175 3525 Discrete-event\001 166 166 4 1 -1 0 0 0 10 0.0000 2 135 795 2175 4350 preemption\001 167 4 0 -1 0 0 0 10 0.0000 2 150 1290 2325 4875 gen erator/coroutine\001167 4 0 -1 0 0 0 10 0.0000 2 150 1290 2325 4875 genrator/coroutine\001 168 168 4 0 -1 0 0 0 10 0.0000 2 120 270 4050 4875 task\001 169 169 4 0 -1 0 0 0 10 0.0000 2 105 450 7050 4875 cluster\001
Note:
See TracChangeset
for help on using the changeset viewer.