Ignore:
Timestamp:
Feb 16, 2021, 1:32:24 PM (3 years ago)
Author:
Thierry Delisle <tdelisle@…>
Branches:
ADT, arm-eh, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, master, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
feacef9
Parents:
14533d4 (diff), 1830a86 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

Location:
doc/theses/thierry_delisle_PhD/thesis
Files:
3 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/thierry_delisle_PhD/thesis/Makefile

    r14533d4 rf6664bf2  
    88BibTeX = BIBINPUTS=${TeXLIB} && export BIBINPUTS && bibtex
    99
    10 MAKEFLAGS = --no-print-directory --silent
     10MAKEFLAGS = --no-print-directory # --silent
    1111VPATH = ${Build} ${Figures}
    1212
     
    5252# Directives #
    5353
     54.NOTPARALLEL:                                           # cannot make in parallel
     55
    5456.PHONY : all clean                                      # not file names
    5557
     
    8385        ${LaTeX} $<
    8486
    85 build/fairness.svg : fig/fairness.py | ${Build}
    86         python3 $< $@
    87 
    8887## Define the default recipes.
    8988
     
    107106        sed -i 's/$@/${Build}\/$@/g' ${Build}/$@_t
    108107
    109 build/fairness.svg: fig/fairness.py | ${Build}
    110         python3 fig/fairness.py build/fairness.svg
     108build/fairness.svg : fig/fairness.py | ${Build}
     109        python3 $< $@
    111110
    112111## pstex with inverted colors
  • doc/theses/thierry_delisle_PhD/thesis/text/io.tex

    r14533d4 rf6664bf2  
    11\chapter{User Level \io}
    2 As mentionned in Section~\ref{prev:io}, User-Level \io requires multiplexing the \io operations of many \glspl{thrd} onto fewer \glspl{proc} using asynchronous \io operations. Various operating systems offer various forms of asynchronous operations and as mentioned in Chapter~\ref{intro}, this work is exclusively focuesd on Linux.
     2As mentioned in Section~\ref{prev:io}, User-Level \io requires multiplexing the \io operations of many \glspl{thrd} onto fewer \glspl{proc} using asynchronous \io operations. Different operating systems offer various forms of asynchronous operations and as mentioned in Chapter~\ref{intro}, this work is exclusively focused on the Linux operating-system.
    33
    44\section{Kernel Interface}
    5 Since this work fundamentally depends on operating system support, the first step of any design is to discuss the available interfaces and pick one (or more) as the foundations of the \io subsystem.
    6 
    7 \subsection{\lstinline|O_NONBLOCK|}
    8 In Linux, files can be opened with the flag @O_NONBLOCK@~\cite{MAN:open} (or @SO_NONBLOCK@~\cite{MAN:accept}, the equivalent for sockets) to use the file descriptors in ``nonblocking mode''. In this mode, ``Neither the open() nor any subsequent \io operations on the [opened file descriptor] will cause the calling
    9 process to wait.'' This feature can be used as the foundation for the \io subsystem. However, for the subsystem to be able to block \glspl{thrd} until an operation completes, @O_NONBLOCK@ must be use in conjunction with a system call that monitors when a file descriptor becomes ready, \ie, the next \io operation on it will not cause the process to wait\footnote{In this context, ready means to \emph{some} operation can be performed without blocking. It does not mean that the last operation that return \lstinline|EAGAIN| will succeed on the next try. A file that is ready to read but has only 1 byte available would be an example of this distinction.}.
    10 
    11 There are three options to monitor file descriptors in Linux\footnote{For simplicity, this section omits to mention \lstinline|pselect| and \lstinline|ppoll|. The difference between these system calls and \lstinline|select| and \lstinline|poll| respectively is not relevant for this discussion.}, @select@~\cite{MAN:select}, @poll@~\cite{MAN:poll} and @epoll@~\cite{MAN:epoll}. All three of these options offer a system call that blocks a \gls{kthrd} until at least one of many file descriptor becomes ready. The group of file descriptors being waited on is often referred to as the \newterm{interest set}.
    12 
    13 \paragraph{\lstinline|select|} is the oldest of these options, it takes as an input a contiguous array of bits, where each bits represent a file descriptor of interest. On return, it modifies the set in place to identify which of the file descriptors changed status. This means that calling select in a loop requires re-initializing the array each time and the number of file descriptors supported has a hard limit. Another limit of @select@ is that once the call is started, the interest set can no longer be modified. Monitoring a new file descriptor generally requires aborting any in progress call to @select@\footnote{Starting a new call to \lstinline|select| in this case is possible but requires a distinct kernel thread, and as a result is not a acceptable multiplexing solution when the interest set is large and highly dynamic unless the number of parallel calls to select can be strictly bounded.}.
    14 
    15 \paragraph{\lstinline|poll|} is an improvement over select, which removes the hard limit on the number of file descriptors and the need to re-initialize the input on every call. It works using an array of structures as an input rather than an array of bits, thus allowing a more compact input for small interest sets. Like @select@, @poll@ suffers from the limitation that the interest set cannot be changed while the call is blocked.
    16 
    17 \paragraph{\lstinline|epoll|} further improves on these two functions, by allowing the interest set to be dynamically added to and removed from while a \gls{kthrd} is blocked on a call to @epoll@. This is done by creating an \emph{epoll instance} with a persistent intereset set and that is used across multiple calls. This advantage significantly reduces synchronization overhead on the part of the caller (in this case the \io subsystem) since the interest set can be modified when adding or removing file descriptors without having to synchronize with other \glspl{kthrd} potentially calling @epoll@.
    18 
    19 However, all three of these system calls suffer from generality problems to some extent. The man page for @O_NONBLOCK@ mentions that ``[@O_NONBLOCK@] has no effect for regular files and block devices'', which means none of these three system calls are viable multiplexing strategies for these types of \io operations. Furthermore, @epoll@ has been shown to have some problems with pipes and ttys\cit{Peter's examples in some fashion}. Finally, none of these are useful solutions for multiplexing \io operations that do not have a corresponding file descriptor and can be awkward for operations using multiple file descriptors.
    20 
    21 \subsection{The POSIX asynchronous I/O (AIO)}
    22 An alternative to using @O_NONBLOCK@ is to use the AIO interface. Its interface lets programmers enqueue operations to be performed asynchronously by the kernel. Completions of these operations can be communicated in various ways, either by sending a Linux signal, spawning a new \gls{kthrd} or by polling for completion of one or more operation. For the purpose multiplexing operations, spawning a new \gls{kthrd} is counter-productive but a related solution is discussed in Section~\ref{io:morethreads}. Since using interrupts handlers can also lead to fairly complicated interactions between subsystems, I will concentrate on the different polling methods. AIO only supports read and write operations to file descriptors and those do not have the same limitation as @O_NONBLOCK@, \ie, the file descriptors can be regular files and blocked devices. It also supports batching more than one of these operations in a single system call.
    23 
    24 AIO offers two different approach to polling. @aio_error@ can be used as a spinning form of polling, returning @EINPROGRESS@ until the operation is completed, and @aio_suspend@ can be used similarly to @select@, @poll@ or @epoll@, to wait until one or more requests have completed. For the purpose of \io multiplexing, @aio_suspend@ is the intended interface. Even if AIO requests can be submitted concurrently, @aio_suspend@ suffers from the same limitation as @select@ and @poll@, \ie, the interest set cannot be dynamically changed while a call to @aio_suspend@ is in progress. Unlike @select@ and @poll@ however, it also suffers from the limitation that it does not specify which requests have completed, meaning programmers then have to poll each request in the interest set using @aio_error@ to identify which requests have completed. This means that, like @select@ and @poll@ but not @epoll@, the time needed to examine polling results increases based in the total number of requests monitored, not the number of completed requests.
    25 
    26 AIO does not seem to be a particularly popular interface, which I believe is in part due to this less than ideal polling interface. Linus Torvalds talks about this interface as follows :
     5Since this work fundamentally depends on operating-system support, the first step of any design is to discuss the available interfaces and pick one (or more) as the foundations of the non-blocking \io subsystem.
     6
     7\subsection{\lstinline{O_NONBLOCK}}
     8In Linux, files can be opened with the flag @O_NONBLOCK@~\cite{MAN:open} (or @SO_NONBLOCK@~\cite{MAN:accept}, the equivalent for sockets) to use the file descriptors in ``nonblocking mode''. In this mode, ``Neither the @open()@ nor any subsequent \io operations on the [opened file descriptor] will cause the calling
     9process to wait''~\cite{MAN:open}. This feature can be used as the foundation for the non-blocking \io subsystem. However, for the subsystem to know when an \io operation completes, @O_NONBLOCK@ must be use in conjunction with a system call that monitors when a file descriptor becomes ready, \ie, the next \io operation on it does not cause the process to wait\footnote{In this context, ready means \emph{some} operation can be performed without blocking. It does not mean an operation returning \lstinline{EAGAIN} succeeds on the next try. For example, a ready read may only return a subset of bytes and the read must be issues again for the remaining bytes, at which point it may return \lstinline{EAGAIN}.}.
     10This mechanism is also crucial in determining when all \glspl{thrd} are blocked and the application \glspl{kthrd} can now block.
     11
     12There are three options to monitor file descriptors in Linux\footnote{For simplicity, this section omits \lstinline{pselect} and \lstinline{ppoll}. The difference between these system calls and \lstinline{select} and \lstinline{poll}, respectively, is not relevant for this discussion.}, @select@~\cite{MAN:select}, @poll@~\cite{MAN:poll} and @epoll@~\cite{MAN:epoll}. All three of these options offer a system call that blocks a \gls{kthrd} until at least one of many file descriptors becomes ready. The group of file descriptors being waited is called the \newterm{interest set}.
     13
     14\paragraph{\lstinline{select}} is the oldest of these options, it takes as an input a contiguous array of bits, where each bits represent a file descriptor of interest. On return, it modifies the set in place to identify which of the file descriptors changed status. This destructive change means that calling select in a loop requires re-initializing the array each time and the number of file descriptors supported has a hard limit. Another limit of @select@ is that once the call is started, the interest set can no longer be modified. Monitoring a new file descriptor generally requires aborting any in progress call to @select@\footnote{Starting a new call to \lstinline{select} is possible but requires a distinct kernel thread, and as a result is not an acceptable multiplexing solution when the interest set is large and highly dynamic unless the number of parallel calls to \lstinline{select} can be strictly bounded.}.
     15
     16\paragraph{\lstinline{poll}} is an improvement over select, which removes the hard limit on the number of file descriptors and the need to re-initialize the input on every call. It works using an array of structures as an input rather than an array of bits, thus allowing a more compact input for small interest sets. Like @select@, @poll@ suffers from the limitation that the interest set cannot be changed while the call is blocked.
     17
     18\paragraph{\lstinline{epoll}} further improves these two functions by allowing the interest set to be dynamically added to and removed from while a \gls{kthrd} is blocked on an @epoll@ call. This dynamic capability is accomplished by creating an \emph{epoll instance} with a persistent interest set, which is used across multiple calls. This capability significantly reduces synchronization overhead on the part of the caller (in this case the \io subsystem), since the interest set can be modified when adding or removing file descriptors without having to synchronize with other \glspl{kthrd} potentially calling @epoll@.
     19
     20However, all three of these system calls have limitations. The @man@ page for @O_NONBLOCK@ mentions that ``[@O_NONBLOCK@] has no effect for regular files and block devices'', which means none of these three system calls are viable multiplexing strategies for these types of \io operations. Furthermore, @epoll@ has been shown to have problems with pipes and ttys~\cit{Peter's examples in some fashion}. Finally, none of these are useful solutions for multiplexing \io operations that do not have a corresponding file descriptor and can be awkward for operations using multiple file descriptors.
     21
     22\subsection{POSIX asynchronous I/O (AIO)}
     23An alternative to @O_NONBLOCK@ is the AIO interface. Its interface lets programmers enqueue operations to be performed asynchronously by the kernel. Completions of these operations can be communicated in various ways: either by spawning a new \gls{kthrd}, sending a Linux signal, or by polling for completion of one or more operation. For this work, spawning a new \gls{kthrd} is counter-productive but a related solution is discussed in Section~\ref{io:morethreads}. Using interrupts handlers can also lead to fairly complicated interactions between subsystems. Leaving polling for completion, which is similar to the previous system calls. While AIO only supports read and write operations to file descriptors, it does not have the same limitation as @O_NONBLOCK@, \ie, the file descriptors can be regular files and blocked devices. It also supports batching multiple operations in a single system call.
     24
     25AIO offers two different approach to polling: @aio_error@ can be used as a spinning form of polling, returning @EINPROGRESS@ until the operation is completed, and @aio_suspend@ can be used similarly to @select@, @poll@ or @epoll@, to wait until one or more requests have completed. For the purpose of \io multiplexing, @aio_suspend@ is the best interface. However, even if AIO requests can be submitted concurrently, @aio_suspend@ suffers from the same limitation as @select@ and @poll@, \ie, the interest set cannot be dynamically changed while a call to @aio_suspend@ is in progress. AIO also suffers from the limitation of specifying which requests have completed, \ie programmers have to poll each request in the interest set using @aio_error@ to identify the completed requests. This limitation means that, like @select@ and @poll@ but not @epoll@, the time needed to examine polling results increases based on the total number of requests monitored, not the number of completed requests.
     26Finally, AIO does not seem to be a popular interface, which I believe is due in part to this poor polling interface. Linus Torvalds talks about this interface as follows:
    2727
    2828\begin{displayquote}
    29         AIO is a horrible ad-hoc design, with the main excuse being "other,
     29        AIO is a horrible ad-hoc design, with the main excuse being ``other,
    3030        less gifted people, made that design, and we are implementing it for
    3131        compatibility because database people - who seldom have any shred of
    32         taste - actually use it".
     32        taste - actually use it''.
    3333
    3434        But AIO was always really really ugly.
     
    3939\end{displayquote}
    4040
    41 Interestingly, in this e-mail answer, Linus goes on to describe
     41Interestingly, in this e-mail, Linus goes on to describe
    4242``a true \textit{asynchronous system call} interface''
    4343that does
     
    4747This description is actually quite close to the interface described in the next section.
    4848
    49 \subsection{\lstinline|io_uring|}
    50 A very recent addition to Linux, @io_uring@\cite{MAN:io_uring} is a framework that aims to solve many of the problems listed with the above mentioned interfaces. Like AIO, it represents \io operations as entries added on a queue. But like @epoll@, new requests can be submitted while a blocking call waiting for requests to complete is already in progress. The @io_uring@ interface uses two ring buffers (referred to simply as rings) as its core, a submit ring to which programmers push \io requests and a completion buffer which programmers poll for completion.
    51 
    52 One of the big advantages over the interfaces listed above is that it also supports a much wider range of operations. In addition to supporting reads and writes to any file descriptor like AIO, it supports other operations like @open@, @close@, @fsync@, @accept@, @connect@, @send@, @recv@, @splice@, \etc.
    53 
    54 On top of these, @io_uring@ adds many ``bells and whistles'' like avoiding copies between the kernel and user-space with shared memory, allowing different mechanisms to communicate with device drivers and supporting chains of requests, \ie, requests that automatically trigger followup requests on completion.
     49\subsection{\lstinline{io_uring}}
     50A very recent addition to Linux, @io_uring@~\cite{MAN:io_uring}, is a framework that aims to solve many of the problems listed in the above interfaces. Like AIO, it represents \io operations as entries added to a queue. But like @epoll@, new requests can be submitted while a blocking call waiting for requests to complete is already in progress. The @io_uring@ interface uses two ring buffers (referred to simply as rings) at its core: a submit ring to which programmers push \io requests and a completion ring from which programmers poll for completion.
     51
     52One of the big advantages over the prior interfaces is that @io_uring@ also supports a much wider range of operations. In addition to supporting reads and writes to any file descriptor like AIO, it supports other operations like @open@, @close@, @fsync@, @accept@, @connect@, @send@, @recv@, @splice@, \etc.
     53
     54On top of these, @io_uring@ adds many extras like avoiding copies between the kernel and user-space using shared memory, allowing different mechanisms to communicate with device drivers, and supporting chains of requests, \ie, requests that automatically trigger followup requests on completion.
    5555
    5656\subsection{Extra Kernel Threads}\label{io:morethreads}
    57 Finally, if the operating system does not offer any satisfying forms of asynchronous \io operations, a solution is to fake it by creating a pool of \glspl{kthrd} and delegating operations to them in order to avoid blocking \glspl{proc}. The is a compromise on multiplexing. In the worst case, where all \glspl{thrd} are consistently blocking on \io, it devolves into 1-to-1 threading. However, regardless of the frequency of \io operations, it achieves the fundamental goal of not blocking \glspl{proc} when \glspl{thrd} are ready to run. This approach is used by languages like Go\cit{Go} and frameworks like libuv\cit{libuv}, since it has the advantage that it can easily be used across multiple operating systems. This advantage is especially relevant for languages like Go, which offer an homogenous \glsxtrshort{api} across all platforms. As opposed to C, which has a very limited standard api for \io, \eg, the C standard library has no networking.
     57Finally, if the operating system does not offer a satisfactory form of asynchronous \io operations, an ad-hoc solution is to create a pool of \glspl{kthrd} and delegate operations to it to avoid blocking \glspl{proc}, which is a compromise for multiplexing. In the worst case, where all \glspl{thrd} are consistently blocking on \io, it devolves into 1-to-1 threading. However, regardless of the frequency of \io operations, it achieves the fundamental goal of not blocking \glspl{proc} when \glspl{thrd} are ready to run. This approach is used by languages like Go\cit{Go} and frameworks like libuv\cit{libuv}, since it has the advantage that it can easily be used across multiple operating systems. This advantage is especially relevant for languages like Go, which offer a homogeneous \glsxtrshort{api} across all platforms. As opposed to C, which has a very limited standard api for \io, \eg, the C standard library has no networking.
    5858
    5959\subsection{Discussion}
    60 These options effectively fall into two broad camps of solutions, waiting for \io to be ready versus waiting for \io to be completed. All operating systems that support asynchronous \io must offer an interface along one of these lines, but the details can vary drastically. For example, Free BSD offers @kqueue@~\cite{MAN:bsd/kqueue} which behaves similarly to @epoll@ but with some small quality of life improvements, while Windows (Win32)~\cit{https://docs.microsoft.com/en-us/windows/win32/fileio/synchronous-and-asynchronous-i-o} offers ``overlapped I/O'' which handles submissions similarly to @O_NONBLOCK@, with extra flags on the synchronous system call, but waits for completion events, similarly to @io_uring@.
    61 
    62 For this project, I have chosen to use @io_uring@, in large parts due to its generality. While @epoll@ has been shown to be a good solution to socket \io (\cite{DBLP:journals/pomacs/KarstenB20}), @io_uring@'s transparent support for files, pipes and more complex operations, like @splice@ and @tee@, make it a better choice as the foundation for a general \io subsystem.
     60These options effectively fall into two broad camps: waiting for \io to be ready versus waiting for \io to complete. All operating systems that support asynchronous \io must offer an interface along one of these lines, but the details vary drastically. For example, Free BSD offers @kqueue@~\cite{MAN:bsd/kqueue}, which behaves similarly to @epoll@, but with some small quality of use improvements, while Windows (Win32)~\cit{https://docs.microsoft.com/en-us/windows/win32/fileio/synchronous-and-asynchronous-i-o} offers ``overlapped I/O'', which handles submissions similarly to @O_NONBLOCK@ with extra flags on the synchronous system call, but waits for completion events, similarly to @io_uring@.
     61
     62For this project, I selected @io_uring@, in large parts because to its generality. While @epoll@ has been shown to be a good solution for socket \io (\cite{DBLP:journals/pomacs/KarstenB20}), @io_uring@'s transparent support for files, pipes, and more complex operations, like @splice@ and @tee@, make it a better choice as the foundation for a general \io subsystem.
    6363
    6464\section{Event-Engine}
    65 
    66 The event engines reponsibility is to use the kernel interface to multiplex many \io operations onto few \glspl{kthrd}. In concrete terms, this means that \glspl{thrd} enter the engine through an interface, the event engines then starts the operation and parks the calling \glspl{thrd}, returning control to the \gls{proc}. The parked \glspl{thrd} are then rescheduled by the event engine once the desired operation has completed.
    67 
    68 \subsection{\lstinline|io_uring| in depth}
    69 Before going into details on the design of the event engine, I will present some more details on the usage of @io_uring@ which are important for the design of the engine.
     65An event engine's responsibility is to use the kernel interface to multiplex many \io operations onto few \glspl{kthrd}. In concrete terms, this means \glspl{thrd} enter the engine through an interface, the event engines then starts the operation and parks the calling \glspl{thrd}, returning control to the \gls{proc}. The parked \glspl{thrd} are then rescheduled by the event engine once the desired operation has completed.
     66
     67\subsection{\lstinline{io_uring} in depth}
     68Before going into details on the design of my event engine, more details on @io_uring@ usage are presented, each important in the design of the engine.
     69Figure~\ref{fig:iouring} shows an overview of an @io_uring@ instance.
     70Two ring buffers are used to communicate with the kernel: one for submissions~(left) and one for completions~(right).
     71The submission ring contains entries, \newterm{Submit Queue Entries} (SQE), produced (appended) by the application when an operation starts and then consumed by the kernel.
     72The completion ring contains entries, \newterm{Completion Queue Entries} (CQE), produced (appended) by the kernel when an operation completes and then consumed by the application.
     73The submission ring contains indexes into the SQE array (denoted \emph{S}) containing entries describing the I/O operation to start;
     74the completion ring contains entries for the completed I/O operation.
     75Multiple @io_uring@ instances can be created, in which case they each have a copy of the data structures in the figure.
    7076
    7177\begin{figure}
    7278        \centering
    7379        \input{io_uring.pstex_t}
    74         \caption[Overview of \lstinline|io_uring|]{Overview of \lstinline|io_uring| \smallskip\newline Two ring buffer are used to communicate with the kernel, one for completions~(right) and one for submissions~(left). The completion ring contains entries, \newterm{CQE}s: Completion Queue Entries, that are produced by the kernel when an operation completes and then consumed by the application. On the other hand, the application produces \newterm{SQE}s: Submit Queue Entries, which it appends to the submission ring for the kernel to consume. Unlike the completion ring, the submission ring does not contain the entries directly, it indexes into the SQE array (denoted \emph{S}) instead.}
     80        \caption{Overview of \lstinline{io_uring}}
     81%       \caption[Overview of \lstinline{io_uring}]{Overview of \lstinline{io_uring} \smallskip\newline Two ring buffer are used to communicate with the kernel, one for completions~(right) and one for submissions~(left). The completion ring contains entries, \newterm{CQE}s: Completion Queue Entries, that are produced by the kernel when an operation completes and then consumed by the application. On the other hand, the application produces \newterm{SQE}s: Submit Queue Entries, which it appends to the submission ring for the kernel to consume. Unlike the completion ring, the submission ring does not contain the entries directly, it indexes into the SQE array (denoted \emph{S}) instead.}
    7582        \label{fig:iouring}
    7683\end{figure}
    7784
    78 Figure~\ref{fig:iouring} shows an overview of an @io_uring@ instance. Multiple @io_uring@ instances can be created, in which case they each have a copy of the data structures in the figure. New \io operations are submitted to the kernel following 4 steps which use the components shown in the figure.
    79 
    80 \paragraph{First} an @sqe@ must be allocated from the pre-allocated array (denoted \emph{S} in Figure~\ref{fig:iouring}). This array is created at the same time as the @io_uring@ instance, is in kernel-locked memory, which means it is both visible by the kernel and the application, and has a fixed size determined at creation. How these entries are allocated is not important for the functionning of @io_uring@, the only requirement is that no entry is reused before the kernel has consumed it.
    81 
    82 \paragraph{Secondly} the @sqe@ must be filled according to the desired operation. This step is straight forward, the only detail worth mentionning is that @sqe@s have a @user_data@ field that must be filled in order to match submission and completion entries.
    83 
    84 \paragraph{Thirdly} the @sqe@ must be submitted to the submission ring, this requires appending the index of the @sqe@ to the ring following regular ring buffer steps: \lstinline|{ buffer[head] = item; head++ }|. Since the head is visible to the kernel, some memory barriers may be required to prevent the compiler from reordering these operations. Since the submission ring is a regular ring buffer, more than one @sqe@ can be added at once and the head can be updated only after the entire batch has been updated.
    85 
    86 \paragraph{Finally} the kernel must be notified of the change to the ring using the system call @io_uring_enter@. The number of elements appended to the submission ring is passed as a parameter and the number of elements consumed is returned. The @io_uring@ instance can be constructed so that this step is not required, but this requires elevated privilege and early version of @io_uring@ had additionnal restrictions.
    87 
    88 The completion side is simpler, applications call @io_uring_enter@ with the flag @IORING_ENTER_GETEVENTS@ to wait on a desired number of operations to complete. The same call can be used to both submit @sqe@s and wait for operations to complete. When operations do complete the kernel appends a @cqe@ to the completion ring and advances the head of the ring. Each @cqe@ contains the result of the operation as well as a copy of the @user_data@ field of the @sqe@ that triggered the operation. It is not necessary to call @io_uring_enter@ to get new events, the kernel can directly modify the completion ring, the system call is only needed if the application wants to block waiting on operations to complete.
    89 
    90 The @io_uring_enter@ system call is protected by a lock inside the kernel. This means that concurrent call to @io_uring_enter@ using the same instance are possible, but there is can be no performance gained from parallel calls to @io_uring_enter@. It is possible to do the first three submission steps in parallel, however, doing so requires careful synchronization.
    91 
    92 @io_uring@ also introduces some constraints on what the number of operations that can be ``in flight'' at the same time. Obviously, @sqe@s are allocated from a fixed-size array, meaning that there is a hard limit to how many @sqe@s can be submitted at once. In addition, the @io_uring_enter@ system call can fail because ``The  kernel [...] ran out of resources to handle [a request]'' or ``The application is attempting to overcommit the number of requests it can  have  pending.''. This requirement means that it can be required to handle bursts of \io requests by holding back some of the requests so they can be submitted at a later time.
     85New \io operations are submitted to the kernel following 4 steps, which use the components shown in the figure.
     86\begin{enumerate}
     87\item
     88An SQE is allocated from the pre-allocated array (denoted \emph{S} in Figure~\ref{fig:iouring}). This array is created at the same time as the @io_uring@ instance, is in kernel-locked memory visible by both the kernel and the application, and has a fixed size determined at creation. How these entries are allocated is not important for the functioning of @io_uring@, the only requirement is that no entry is reused before the kernel has consumed it.
     89\item
     90The SQE is filled according to the desired operation. This step is straight forward, the only detail worth mentioning is that SQEs have a @user_data@ field that must be filled in order to match submission and completion entries.
     91\item
     92The SQE is submitted to the submission ring by appending the index of the SQE to the ring following regular ring buffer steps: \lstinline{buffer[head] = item; head++}. Since the head is visible to the kernel, some memory barriers may be required to prevent the compiler from reordering these operations. Since the submission ring is a regular ring buffer, more than one SQE can be added at once and the head is updated only after all entries are updated.
     93\item
     94The kernel is notified of the change to the ring using the system call @io_uring_enter@. The number of elements appended to the submission ring is passed as a parameter and the number of elements consumed is returned. The @io_uring@ instance can be constructed so this step is not required, but this requires elevated privilege.% and an early version of @io_uring@ had additional restrictions.
     95\end{enumerate}
     96
     97\begin{sloppypar}
     98The completion side is simpler: applications call @io_uring_enter@ with the flag @IORING_ENTER_GETEVENTS@ to wait on a desired number of operations to complete. The same call can be used to both submit SQEs and wait for operations to complete. When operations do complete, the kernel appends a CQE to the completion ring and advances the head of the ring. Each CQE contains the result of the operation as well as a copy of the @user_data@ field of the SQE that triggered the operation. It is not necessary to call @io_uring_enter@ to get new events because the kernel can directly modify the completion ring. The system call is only needed if the application wants to block waiting for operations to complete.
     99\end{sloppypar}
     100
     101The @io_uring_enter@ system call is protected by a lock inside the kernel. This protection means that concurrent call to @io_uring_enter@ using the same instance are possible, but there is no performance gained from parallel calls to @io_uring_enter@. It is possible to do the first three submission steps in parallel, however, doing so requires careful synchronization.
     102
     103@io_uring@ also introduces constraints on the number of simultaneous operations that can be ``in flight''. Obviously, SQEs are allocated from a fixed-size array, meaning that there is a hard limit to how many SQEs can be submitted at once. In addition, the @io_uring_enter@ system call can fail because ``The  kernel [...] ran out of resources to handle [a request]'' or ``The application is attempting to overcommit the number of requests it can  have  pending.''. This restriction means \io request bursts may have to be subdivided and submitted in chunks at a later time.
    93104
    94105\subsection{Multiplexing \io: Submission}
    95 The submission side is the most complicated aspect of @io_uring@ and the completion side effectively follows from the design decisions made in the submission side.
    96 
    97 While it is possible to do the first steps of submission in parallel, the duration of the system call scales with number of entries submitted. The consequence of this is that how much parallelism can be used to prepare submissions for the next system call is limited. Beyond this limit, the length of the system call will be the throughput limiting factor. I have concluded from early experiments that preparing submissions seems to take about as long as the system call itself, which means that with a single @io_uring@ instance, there is no benefit in terms of \io throughput to having more than two \glspl{hthrd}. Therefore the design of the submission engine must manage multiple instances of @io_uring@ running in parallel, effectively sharding @io_uring@ instances. Similarly to scheduling, this sharding can be done privately, \ie, one instance per \glspl{proc}, in decoupled pools, \ie, a pool of \glspl{proc} use a pool of @io_uring@ instances without one-to-one coupling between any given instance and any given \gls{proc}, or some mix of the two. Since completions are sent to the instance where requests were submitted, all instances with pending operations must be polled continously\footnote{As will be described in Chapter~\ref{practice}, this does not translate into constant cpu usage.}.
     106The submission side is the most complicated aspect of @io_uring@ and the completion side effectively follows from the design decisions made in the submission side. While it is possible to do the first steps of submission in parallel, the duration of the system call scales with number of entries submitted. The consequence is that the amount of parallelism used to prepare submissions for the next system call is limited.
     107Beyond this limit, the length of the system call is the throughput limiting factor. I concluded from early experiments that preparing submissions seems to take about as long as the system call itself, which means that with a single @io_uring@ instance, there is no benefit in terms of \io throughput to having more than two \glspl{hthrd}. Therefore the design of the submission engine must manage multiple instances of @io_uring@ running in parallel, effectively sharding @io_uring@ instances. Similarly to scheduling, this sharding can be done privately, \ie, one instance per \glspl{proc}, in decoupled pools, \ie, a pool of \glspl{proc} use a pool of @io_uring@ instances without one-to-one coupling between any given instance and any given \gls{proc}, or some mix of the two. Since completions are sent to the instance where requests were submitted, all instances with pending operations must be polled continously\footnote{As will be described in Chapter~\ref{practice}, this does not translate into constant cpu usage.}.
    98108
    99109\subsubsection{Shared Instances}
     
    104114Allocation failures need to be pushed up to the routing algorithm: \glspl{thrd} attempting \io operations must not be directed to @io_uring@ instances without sufficient @sqe@s available. Furthermore, the routing algorithm should block operations up-front if none of the instances have available @sqe@s.
    105115
    106 Once an @sqe@ is allocated, \glspl{thrd} can fill them normally, they simply need to keep trac of the @sqe@ index and which instance it belongs to.
    107 
    108 Once an @sqe@ is filled in, what needs to happen is that the @sqe@ must be added to the submission ring buffer, an operation that is not thread-safe on itself, and the kernel must be notified using the @io_uring_enter@ system call. The submission ring buffer is the same size as the pre-allocated @sqe@ buffer, therefore pushing to the ring buffer cannot fail\footnote{This is because it is invalid to have the same \lstinline|sqe| multiple times in the ring buffer.}. However, as mentioned, the system call itself can fail with the expectation that it will be retried once some of the already submitted operations complete. Since multiple @sqe@s can be submitted to the kernel at once, it is important to strike a balance between batching and latency. Operations that are ready to be submitted should be batched together in few system calls, but at the same time, operations should not be left pending for long period of times before being submitted. This can be handled by either designating one of the submitting \glspl{thrd} as the being responsible for the system call for the current batch of @sqe@s or by having some other party regularly submitting all ready @sqe@s, \eg, the poller \gls{thrd} mentionned later in this section.
    109 
    110 In the case of designating a \gls{thrd}, ideally, when multiple \glspl{thrd} attempt to submit operations to the same @io_uring@ instance, all requests would be batched together and one of the \glspl{thrd} would do the system call on behalf of the others, referred to as the \newterm{submitter}. In practice however, it is important that the \io requests are not left pending indefinately and as such, it may be required to have a current submitter and a next submitter. Indeed, as long as there is a ``next'' submitter, \glspl{thrd} submitting new \io requests can move on, knowing that some future system call will include their request. Once the system call is done, the submitter must also free @sqe@s so that the allocator can reused them.
    111 
    112 Finally, the completion side is much simpler since the @io_uring@ system call enforces a natural synchronization point. Polling simply needs to regularly do the system call, go through the produced @cqe@s and communicate the result back to the originating \glspl{thrd}. Since @cqe@s only own a signed 32 bit result, in addition to the copy of the @user_data@ field, all that is needed to communicate the result is a simple future~\cite{wiki:future}. If the submission side does not designate submitters, polling can also submit all @sqe@s as it is polling events.  A simple approach to polling is to allocate a \gls{thrd} per @io_uring@ instance and simply let the poller \glspl{thrd} poll their respective instances when scheduled. This design is especially convinient for reasons explained in Chapter~\ref{practice}.
    113 
     116Once an SQE is allocated, \glspl{thrd} can fill them normally, they simply need to keep track of the SQE index and which instance it belongs to.
     117
     118Once an SQE is filled in, what needs to happen is that the SQE must be added to the submission ring buffer, an operation that is not thread-safe on itself, and the kernel must be notified using the @io_uring_enter@ system call. The submission ring buffer is the same size as the pre-allocated SQE buffer, therefore pushing to the ring buffer cannot fail\footnote{This is because it is invalid to have the same \lstinline{sqe} multiple times in the ring buffer.}. However, as mentioned, the system call itself can fail with the expectation that it will be retried once some of the already submitted operations complete. Since multiple SQEs can be submitted to the kernel at once, it is important to strike a balance between batching and latency. Operations that are ready to be submitted should be batched together in few system calls, but at the same time, operations should not be left pending for long period of times before being submitted. This can be handled by either designating one of the submitting \glspl{thrd} as the being responsible for the system call for the current batch of SQEs or by having some other party regularly submitting all ready SQEs, \eg, the poller \gls{thrd} mentioned later in this section.
     119
     120In the case of designating a \gls{thrd}, ideally, when multiple \glspl{thrd} attempt to submit operations to the same @io_uring@ instance, all requests would be batched together and one of the \glspl{thrd} would do the system call on behalf of the others, referred to as the \newterm{submitter}. In practice however, it is important that the \io requests are not left pending indefinitely and as such, it may be required to have a current submitter and a next submitter. Indeed, as long as there is a ``next'' submitter, \glspl{thrd} submitting new \io requests can move on, knowing that some future system call will include their request. Once the system call is done, the submitter must also free SQEs so that the allocator can reused them.
     121
     122Finally, the completion side is much simpler since the @io_uring@ system call enforces a natural synchronization point. Polling simply needs to regularly do the system call, go through the produced CQEs and communicate the result back to the originating \glspl{thrd}. Since CQEs only own a signed 32 bit result, in addition to the copy of the @user_data@ field, all that is needed to communicate the result is a simple future~\cite{wiki:future}. If the submission side does not designate submitters, polling can also submit all SQEs as it is polling events.  A simple approach to polling is to allocate a \gls{thrd} per @io_uring@ instance and simply let the poller \glspl{thrd} poll their respective instances when scheduled. This design is especially convenient for reasons explained in Chapter~\ref{practice}.
     123
     124<<<<<<< HEAD
    114125With this pool of instances approach, the big advantage is that it is fairly flexible. It does not impose restrictions on what \glspl{thrd} submitting \io operations can and cannot do between allocations and submissions. It also can gracefully handles running out of ressources, @sqe@s or the kernel returning @EBUSY@. The down side to this is that many of the steps used for submitting need complex synchronization to work properly. The routing and allocation algorithm needs to keep track of which ring instances have available @sqe@s, block incoming requests if no instance is available, prevent barging if \glspl{thrd} are already queued up waiting for @sqe@s and handle @sqe@s being freed. The submission side needs to safely append @sqe@s to the ring buffer, make sure no @sqe@ is dropped or left pending forever, notify the allocation side when @sqe@s can be reused and handle the kernel returning @EBUSY@. All this synchronization may have a significant cost and, compare to the next approach presented, this synchronization is entirely overhead.
    115126
    116127\subsubsection{Private Instances}
    117128Another approach is to simply create one ring instance per \gls{proc}. This alleviate the need for synchronization on the submissions, requiring only that \glspl{thrd} are not interrupted in between two submission steps. This is effectively the same requirement as using @thread_local@ variables. Since @sqe@s that are allocated must be submitted to the same ring, on the same \gls{proc}, this effectively forces the application to submit @sqe@s in allocation order\footnote{The actual requirement is that \glspl{thrd} cannot context switch between allocation and submission. This requirement means that from the subsystem's point of view, the allocation and submission are sequential. To remove this requirement, a \gls{thrd} would need the ability to ``yield to a specific \gls{proc}'', \ie, park with the promise that it will be run next on a specific \gls{proc}, the \gls{proc} attached to the correct ring.}, greatly simplifying both allocation and submission. In this design, allocation and submission form a ring partitionned ring buffer as shown in Figure~\ref{fig:pring}. Once added to the ring buffer, the attached \gls{proc} has a significant amount of flexibility with regards to when to do the system call. Possible options are: when the \gls{proc} runs out of \glspl{thrd} to run, after running a given number of threads \glspl{thrd}, etc.
     129=======
     130With this pool of instances approach, the big advantage is that it is fairly flexible. It does not impose restrictions on what \glspl{thrd} submitting \io operations can and cannot do between allocations and submissions. It also can gracefully handle running out of resources, SQEs or the kernel returning @EBUSY@. The down side to this is that many of the steps used for submitting need complex synchronization to work properly. The routing and allocation algorithm needs to keep track of which ring instances have available SQEs, block incoming requests if no instance is available, prevent barging if \glspl{thrd} are already queued up waiting for SQEs and handle SQEs being freed. The submission side needs to safely append SQEs to the ring buffer, make sure no SQE is dropped or left pending forever, notify the allocation side when SQEs can be reused and handle the kernel returning @EBUSY@. Sharding the @io_uring@ instances should alleviate much of the contention caused by this, but all this synchronization may still have non-zero cost.
     131
     132\subsubsection{Private Instances}
     133Another approach is to simply create one ring instance per \gls{proc}. This alleviate the need for synchronization on the submissions, requiring only that \glspl{thrd} are not interrupted in between two submission steps. This is effectively the same requirement as using @thread_local@ variables. Since SQEs that are allocated must be submitted to the same ring, on the same \gls{proc}, this effectively forces the application to submit SQEs in allocation order\footnote{The actual requirement is that \glspl{thrd} cannot context switch between allocation and submission. This requirement means that from the subsystem's point of view, the allocation and submission are sequential. To remove this requirement, a \gls{thrd} would need the ability to ``yield to a specific \gls{proc}'', \ie, park with the promise that it will be run next on a specific \gls{proc}, the \gls{proc} attached to the correct ring. This is not a current or planned feature of \CFA.}, greatly simplifying both allocation and submission. In this design, allocation and submission form a ring partitioned ring buffer as shown in Figure~\ref{fig:pring}. Once added to the ring buffer, the attached \gls{proc} has a significant amount of flexibility with regards to when to do the system call. Possible options are: when the \gls{proc} runs out of \glspl{thrd} to run, after running a given number of threads \glspl{thrd}, etc.
     134>>>>>>> 1830a8657cb302a89a7ca045bee06baa48b18101
    118135
    119136\begin{figure}
    120137        \centering
    121138        \input{pivot_ring.pstex_t}
    122         \caption[Partitionned ring buffer]{Partitionned ring buffer \smallskip\newline Allocated sqes are appending to the first partition. When submitting, the partition is simply advanced to include all the sqes that should be submitted. The kernel considers the partition as the head of the ring.}
     139        \caption[Partitioned ring buffer]{Partitioned ring buffer \smallskip\newline Allocated sqes are appending to the first partition. When submitting, the partition is simply advanced to include all the sqes that should be submitted. The kernel considers the partition as the head of the ring.}
    123140        \label{fig:pring}
    124141\end{figure}
    125142
     143<<<<<<< HEAD
    126144This approach has the advantage that it does not require much of the synchronization needed in the shared approach. This comes at the cost that \glspl{thrd} submitting \io operations have less flexibility, they cannot park or yield, and several exceptional cases are handled poorly. Instances running out of @sqe@s cannot run \glspl{thrd} wanting to do \io operations, in such a case the \gls{thrd} needs to be moved to a different \gls{proc}, the only current way of achieving this would be to @yield()@ hoping to be scheduled on a different \gls{proc}, which is not guaranteed.
    127145
     
    190208%       if cltr.io.flag || proc.io != alloc.io || proc.io->flag:
    191209%               return submit_slow(cltr.io)
     210=======
     211This approach has the advantage that it does not require much of the synchronization needed in the shared approach. This comes at the cost that \glspl{thrd} submitting \io operations have less flexibility, they cannot park or yield, and several exceptional cases are handled poorly. Instances running out of SQEs cannot run \glspl{thrd} wanting to do \io operations, in such a case the \gls{thrd} needs to be moved to a different \gls{proc}, the only current way of achieving this would be to @yield()@ hoping to be scheduled on a different \gls{proc}, which is not guaranteed. Another problematic case is that \glspl{thrd} that do not park for long periods of time will delay the submission of any SQE not already submitted. This issue is similar to fairness issues which schedulers that use work-stealing mentioned in the previous chapter.
     212>>>>>>> 1830a8657cb302a89a7ca045bee06baa48b18101
    192213
    193214%       submit_fast(proc.io, a)
     
    214235\subsection{Asynchronous Extension}
    215236
    216 \subsection{Interface directly to \lstinline|io_uring|}
     237\subsection{Interface directly to \lstinline{io_uring}}
  • doc/theses/thierry_delisle_PhD/thesis/thesis.tex

    r14533d4 rf6664bf2  
    1 % uWaterloo Thesis Template for LaTeX
    2 % Last Updated June 14, 2017 by Stephen Carr, IST Client Services
    3 % FOR ASSISTANCE, please send mail to rt-IST-CSmathsci@ist.uwaterloo.ca
    4 
    5 % Effective October 2006, the University of Waterloo
    6 % requires electronic thesis submission. See the uWaterloo thesis regulations at
     1%======================================================================
     2% University of Waterloo Thesis Template for LaTeX
     3% Last Updated November, 2020
     4% by Stephen Carr, IST Client Services,
     5% University of Waterloo, 200 University Ave. W., Waterloo, Ontario, Canada
     6% FOR ASSISTANCE, please send mail to request@uwaterloo.ca
     7
     8% DISCLAIMER
     9% To the best of our knowledge, this template satisfies the current uWaterloo thesis requirements.
     10% However, it is your responsibility to assure that you have met all requirements of the University and your particular department.
     11
     12% Many thanks for the feedback from many graduates who assisted the development of this template.
     13% Also note that there are explanatory comments and tips throughout this template.
     14%======================================================================
     15% Some important notes on using this template and making it your own...
     16
     17% The University of Waterloo has required electronic thesis submission since October 2006.
     18% See the uWaterloo thesis regulations at
    719% https://uwaterloo.ca/graduate-studies/thesis.
    8 
    9 % DON'T FORGET TO ADD YOUR OWN NAME AND TITLE in the "hyperref" package
    10 % configuration below. THIS INFORMATION GETS EMBEDDED IN THE PDF FINAL PDF DOCUMENT.
    11 % You can view the information if you view Properties of the PDF document.
    12 
    13 % Many faculties/departments also require one or more printed
    14 % copies. This template attempts to satisfy both types of output.
    15 % It is based on the standard "book" document class which provides all necessary
    16 % sectioning structures and allows multi-part theses.
    17 
    18 % DISCLAIMER
    19 % To the best of our knowledge, this template satisfies the current uWaterloo requirements.
    20 % However, it is your responsibility to assure that you have met all
    21 % requirements of the University and your particular department.
    22 % Many thanks for the feedback from many graduates that assisted the development of this template.
    23 
    24 % -----------------------------------------------------------------------
    25 
    26 % By default, output is produced that is geared toward generating a PDF
    27 % version optimized for viewing on an electronic display, including
    28 % hyperlinks within the PDF.
    29 
     20% This thesis template is geared towards generating a PDF version optimized for viewing on an electronic display, including hyperlinks within the PDF.
     21
     22% DON'T FORGET TO ADD YOUR OWN NAME AND TITLE in the "hyperref" package configuration below.
     23% THIS INFORMATION GETS EMBEDDED IN THE PDF FINAL PDF DOCUMENT.
     24% You can view the information if you view properties of the PDF document.
     25
     26% Many faculties/departments also require one or more printed copies.
     27% This template attempts to satisfy both types of output.
     28% See additional notes below.
     29% It is based on the standard "book" document class which provides all necessary sectioning structures and allows multi-part theses.
     30
     31% If you are using this template in Overleaf (cloud-based collaboration service), then it is automatically processed and previewed for you as you edit.
     32
     33% For people who prefer to install their own LaTeX distributions on their own computers, and process the source files manually, the following notes provide the sequence of tasks:
     34 
    3035% E.g. to process a thesis called "mythesis.tex" based on this template, run:
    3136
    3237% pdflatex mythesis     -- first pass of the pdflatex processor
    3338% bibtex mythesis       -- generates bibliography from .bib data file(s)
    34 % makeindex         -- should be run only if an index is used
     39% makeindex         -- should be run only if an index is used 
    3540% pdflatex mythesis     -- fixes numbering in cross-references, bibliographic references, glossaries, index, etc.
    36 % pdflatex mythesis     -- fixes numbering in cross-references, bibliographic references, glossaries, index, etc.
    37 
    38 % If you use the recommended LaTeX editor, Texmaker, you would open the mythesis.tex
    39 % file, then click the PDFLaTeX button. Then run BibTeX (under the Tools menu).
    40 % Then click the PDFLaTeX button two more times. If you have an index as well,
    41 % you'll need to run MakeIndex from the Tools menu as well, before running pdflatex
     41% pdflatex mythesis     -- it takes a couple of passes to completely process all cross-references
     42
     43% If you use the recommended LaTeX editor, Texmaker, you would open the mythesis.tex file, then click the PDFLaTeX button. Then run BibTeX (under the Tools menu).
     44% Then click the PDFLaTeX button two more times.
     45% If you have an index as well,you'll need to run MakeIndex from the Tools menu as well, before running pdflatex
    4246% the last two times.
    4347
    44 % N.B. The "pdftex" program allows graphics in the following formats to be
    45 % included with the "\includegraphics" command: PNG, PDF, JPEG, TIFF
    46 % Tip 1: Generate your figures and photos in the size you want them to appear
    47 % in your thesis, rather than scaling them with \includegraphics options.
    48 % Tip 2: Any drawings you do should be in scalable vector graphic formats:
    49 % SVG, PNG, WMF, EPS and then converted to PNG or PDF, so they are scalable in
    50 % the final PDF as well.
    51 % Tip 3: Photographs should be cropped and compressed so as not to be too large.
    52 
    53 % To create a PDF output that is optimized for double-sided printing:
    54 %
    55 % 1) comment-out the \documentclass statement in the preamble below, and
    56 % un-comment the second \documentclass line.
    57 %
    58 % 2) change the value assigned below to the boolean variable
    59 % "PrintVersion" from "false" to "true".
    60 
    61 % --------------------- Start of Document Preamble -----------------------
    62 
    63 % Specify the document class, default style attributes, and page dimensions
     48% N.B. The "pdftex" program allows graphics in the following formats to be included with the "\includegraphics" command: PNG, PDF, JPEG, TIFF
     49% Tip: Generate your figures and photos in the size you want them to appear in your thesis, rather than scaling them with \includegraphics options.
     50% Tip: Any drawings you do should be in scalable vector graphic formats: SVG, PNG, WMF, EPS and then converted to PNG or PDF, so they are scalable in the final PDF as well.
     51% Tip: Photographs should be cropped and compressed so as not to be too large.
     52
     53% To create a PDF output that is optimized for double-sided printing:
     54% 1) comment-out the \documentclass statement in the preamble below, and un-comment the second \documentclass line.
     55% 2) change the value assigned below to the boolean variable "PrintVersion" from " false" to "true".
     56
     57%======================================================================
     58%   D O C U M E N T   P R E A M B L E
     59% Specify the document class, default style attributes, and page dimensions, etc.
    6460% For hyperlinked PDF, suitable for viewing on a computer, use this:
    6561\documentclass[letterpaper,12pt,titlepage,oneside,final]{book}
    6662
    67 % For PDF, suitable for double-sided printing, change the PrintVersion variable below
    68 % to "true" and use this \documentclass line instead of the one above:
     63% For PDF, suitable for double-sided printing, change the PrintVersion variable below to "true" and use this \documentclass line instead of the one above:
    6964%\documentclass[letterpaper,12pt,titlepage,openright,twoside,final]{book}
    7065
    71 \newcommand{\href}[1]{#1} % does nothing, but defines the command so the
    72     % print-optimized version will ignore \href tags (redefined by hyperref pkg).
     66% Some LaTeX commands I define for my own nomenclature.
     67% If you have to, it's easier to make changes to nomenclature once here than in a million places throughout your thesis!
     68\newcommand{\package}[1]{\textbf{#1}} % package names in bold text
     69\newcommand{\cmmd}[1]{\textbackslash\texttt{#1}} % command name in tt font
     70\newcommand{\href}[1]{#1} % does nothing, but defines the command so the print-optimized version will ignore \href tags (redefined by hyperref pkg).
     71%\newcommand{\texorpdfstring}[2]{#1} % does nothing, but defines the command
     72% Anything defined here may be redefined by packages added below...
    7373
    7474% This package allows if-then-else control structures.
     
    7676\newboolean{PrintVersion}
    7777\setboolean{PrintVersion}{false}
    78 % CHANGE THIS VALUE TO "true" as necessary, to improve printed results for hard copies
    79 % by overriding some options of the hyperref package below.
     78% CHANGE THIS VALUE TO "true" as necessary, to improve printed results for hard copies by overriding some options of the hyperref package, called below.
    8079
    8180%\usepackage{nomencl} % For a nomenclature (optional; available from ctan.org)
     
    8584
    8685% Hyperlinks make it very easy to navigate an electronic document.
    87 % In addition, this is where you should specify the thesis title
    88 % and author as they appear in the properties of the PDF document.
     86% In addition, this is where you should specify the thesis title and author as they appear in the properties of the PDF document.
    8987% Use the "hyperref" package
    9088% N.B. HYPERREF MUST BE THE LAST PACKAGE LOADED; ADD ADDITIONAL PKGS ABOVE
    9189\usepackage[pagebackref=false]{hyperref} % with basic options
    92                 % N.B. pagebackref=true provides links back from the References to the body text. This can cause trouble for printing.
     90%\usepackage[pdftex,pagebackref=true]{hyperref}
     91% N.B. pagebackref=true provides links back from the References to the body text. This can cause trouble for printing.
    9392\hypersetup{
    9493        plainpages=false,       % needed if Roman numbers in frontpages
    95         unicode=false,          % non-Latin characters in Acrobats bookmarks
    96         pdftoolbar=true,        % show Acrobats toolbar?
    97         pdfmenubar=true,        % show Acrobats menu?
     94        unicode=false,          % non-Latin characters in Acrobat's bookmarks
     95        pdftoolbar=true,        % show Acrobat's toolbar?
     96        pdfmenubar=true,        % show Acrobat's menu?
    9897        pdffitwindow=false,     % window fit to page when opened
    9998        pdfstartview={FitH},    % fits the width of the page to the window
     
    111110\ifthenelse{\boolean{PrintVersion}}{   % for improved print quality, change some hyperref options
    112111\hypersetup{    % override some previously defined hyperref options
    113         citecolor=black,
    114         filecolor=black,
    115         linkcolor=black,
     112        citecolor=black,%
     113        filecolor=black,%
     114        linkcolor=black,%
    116115        urlcolor=black
    117116}}{} % end of ifthenelse (no else)
     
    136135
    137136% Setting up the page margins...
    138 \setlength{\textheight}{9in}\setlength{\topmargin}{-0.45in}\setlength{\headsep}{0.25in}
     137\setlength{\textheight}{9in}
     138\setlength{\topmargin}{-0.45in}
     139\setlength{\headsep}{0.25in}
    139140% uWaterloo thesis requirements specify a minimum of 1 inch (72pt) margin at the
    140 % top, bottom, and outside page edges and a 1.125 in. (81pt) gutter
    141 % margin (on binding side). While this is not an issue for electronic
    142 % viewing, a PDF may be printed, and so we have the same page layout for
    143 % both printed and electronic versions, we leave the gutter margin in.
     141% top, bottom, and outside page edges and a 1.125 in. (81pt) gutter margin (on binding side).
     142% While this is not an issue for electronic viewing, a PDF may be printed, and so we have the same page layout for both printed and electronic versions, we leave the gutter margin in.
    144143% Set margins to minimum permitted by uWaterloo thesis regulations:
    145144\setlength{\marginparwidth}{0pt} % width of margin notes
     
    150149\setlength{\evensidemargin}{0.125in} % Adds 1/8 in. to binding side of all
    151150% even-numbered pages when the "twoside" printing option is selected
    152 \setlength{\oddsidemargin}{0.125in} % Adds 1/8 in. to the left of all pages
    153 % when "oneside" printing is selected, and to the left of all odd-numbered
    154 % pages when "twoside" printing is selected
    155 \setlength{\textwidth}{6.375in} % assuming US letter paper (8.5 in. x 11 in.) and
    156 % side margins as above
     151\setlength{\oddsidemargin}{0.125in} % Adds 1/8 in. to the left of all pages when "oneside" printing is selected, and to the left of all odd-numbered pages when "twoside" printing is selected
     152\setlength{\textwidth}{6.375in} % assuming US letter paper (8.5 in. x 11 in.) and side margins as above
    157153\raggedbottom
    158154
    159 % The following statement specifies the amount of space between
    160 % paragraphs. Other reasonable specifications are \bigskipamount and \smallskipamount.
     155% The following statement specifies the amount of space between paragraphs. Other reasonable specifications are \bigskipamount and \smallskipamount.
    161156\setlength{\parskip}{\medskipamount}
    162157
    163 % The following statement controls the line spacing.  The default
    164 % spacing corresponds to good typographic conventions and only slight
    165 % changes (e.g., perhaps "1.2"), if any, should be made.
     158% The following statement controls the line spacing.
     159% The default spacing corresponds to good typographic conventions and only slight changes (e.g., perhaps "1.2"), if any, should be made.
    166160\renewcommand{\baselinestretch}{1} % this is the default line space setting
    167161
    168 % By default, each chapter will start on a recto (right-hand side)
    169 % page.  We also force each section of the front pages to start on
    170 % a recto page by inserting \cleardoublepage commands.
    171 % In many cases, this will require that the verso page be
    172 % blank and, while it should be counted, a page number should not be
    173 % printed.  The following statements ensure a page number is not
    174 % printed on an otherwise blank verso page.
     162% By default, each chapter will start on a recto (right-hand side) page.
     163% We also force each section of the front pages to start on a recto page by inserting \cleardoublepage commands.
     164% In many cases, this will require that the verso (left-hand) page be blank, and while it should be counted, a page number should not be printed.
     165% The following statements ensure a page number is not printed on an otherwise blank verso page.
    175166\let\origdoublepage\cleardoublepage
    176167\newcommand{\clearemptydoublepage}{%
     
    204195\input{common}
    205196\CFAStyle                                               % CFA code-style for all languages
    206 \lstset{basicstyle=\linespread{0.9}\tt}
     197\lstset{language=CFA,basicstyle=\linespread{0.9}\tt}    % CFA default language
    207198
    208199% glossary of terms to use
     
    210201\makeindex
    211202
    212 \newcommand\io{\glsxtrshort{io}}%
    213 
    214 %======================================================================
    215 %   L O G I C A L    D O C U M E N T -- the content of your thesis
     203\newcommand\io{\glsxtrshort{io}\xspace}%
     204
     205%======================================================================
     206%   L O G I C A L    D O C U M E N T
     207% The logical document contains the main content of your thesis.
     208% Being a large document, it is a good idea to divide your thesis into several files, each one containing one chapter or other significant chunk of content, so you can easily shuffle things around later if desired.
    216209%======================================================================
    217210\begin{document}
    218211
    219 % For a large document, it is a good idea to divide your thesis
    220 % into several files, each one containing one chapter.
    221 % To illustrate this idea, the "front pages" (i.e., title page,
    222 % declaration, borrowers' page, abstract, acknowledgements,
    223 % dedication, table of contents, list of tables, list of figures,
    224 % nomenclature) are contained within the file "uw-ethesis-frontpgs.tex" which is
    225 % included into the document by the following statement.
    226212%----------------------------------------------------------------------
    227213% FRONT MATERIAL
     214% title page,declaration, borrowers' page, abstract, acknowledgements,
     215% dedication, table of contents, list of tables, list of figures, nomenclature, etc.
    228216%----------------------------------------------------------------------
    229217\input{text/front.tex}
    230218
    231 
    232219%----------------------------------------------------------------------
    233220% MAIN BODY
    234 %----------------------------------------------------------------------
    235 % Because this is a short document, and to reduce the number of files
    236 % needed for this template, the chapters are not separate
    237 % documents as suggested above, but you get the idea. If they were
    238 % separate documents, they would each start with the \chapter command, i.e,
    239 % do not contain \documentclass or \begin{document} and \end{document} commands.
     221% We suggest using a separate file for each chapter of your thesis.
     222% Start each chapter file with the \chapter command.
     223% Only use \documentclass or \begin{document} and \end{document} commands in this master document.
     224% Tip: Putting each sentence on a new line is a way to simplify later editing.
     225%----------------------------------------------------------------------
     226
    240227\part{Introduction}
    241228\input{text/intro.tex}
     
    255242%----------------------------------------------------------------------
    256243% END MATERIAL
    257 %----------------------------------------------------------------------
    258 
    259 % B I B L I O G R A P H Y
    260 % -----------------------
    261 
    262 % The following statement selects the style to use for references.  It controls the sort order of the entries in the bibliography and also the formatting for the in-text labels.
     244% Bibliography, Appendices, Index, etc.
     245%----------------------------------------------------------------------
     246
     247% Bibliography
     248
     249% The following statement selects the style to use for references.
     250% It controls the sort order of the entries in the bibliography and also the formatting for the in-text labels.
    263251\bibliographystyle{plain}
    264252% This specifies the location of the file containing the bibliographic information.
    265 % It assumes you're using BibTeX (if not, why not?).
    266 \cleardoublepage % This is needed if the book class is used, to place the anchor in the correct page,
    267                  % because the bibliography will start on its own page.
    268                  % Use \clearpage instead if the document class uses the "oneside" argument
     253% It assumes you're using BibTeX to manage your references (if not, why not?).
     254\cleardoublepage % This is needed if the "book" document class is used, to place the anchor in the correct page, because the bibliography will start on its own page.
     255% Use \clearpage instead if the document class uses the "oneside" argument
    269256\phantomsection  % With hyperref package, enables hyperlinking from the table of contents to bibliography
    270257% The following statement causes the title "References" to be used for the bibliography section:
     
    275262
    276263\bibliography{local,pl}
    277 % Tip 5: You can create multiple .bib files to organize your references.
     264% Tip: You can create multiple .bib files to organize your references.
    278265% Just list them all in the \bibliogaphy command, separated by commas (no spaces).
    279266
    280 % % The following statement causes the specified references to be added to the bibliography% even if they were not
    281 % % cited in the text. The asterisk is a wildcard that causes all entries in the bibliographic database to be included (optional).
     267% The following statement causes the specified references to be added to the bibliography even if they were not cited in the text.
     268% The asterisk is a wildcard that causes all entries in the bibliographic database to be included (optional).
    282269% \nocite{*}
     270%----------------------------------------------------------------------
     271
     272% Appendices
    283273
    284274% The \appendix statement indicates the beginning of the appendices.
    285275\appendix
    286 % Add a title page before the appendices and a line in the Table of Contents
     276% Add an un-numbered title page before the appendices and a line in the Table of Contents
    287277\chapter*{APPENDICES}
    288278\addcontentsline{toc}{chapter}{APPENDICES}
     279% Appendices are just more chapters, with different labeling (letters instead of numbers).
    289280%======================================================================
    290281\chapter[PDF Plots From Matlab]{Matlab Code for Making a PDF Plot}
     
    324315%\input{thesis.ind}                             % index
    325316
    326 \phantomsection
    327 
    328 \end{document}
     317\phantomsection         % allows hyperref to link to the correct page
     318
     319%----------------------------------------------------------------------
     320\end{document} % end of logical document
Note: See TracChangeset for help on using the changeset viewer.