Ignore:
Timestamp:
Apr 10, 2022, 2:53:18 PM (4 years ago)
Author:
JiadaL <j82liang@…>
Branches:
ADT, ast-experimental, enum, master, pthread-emulation, qualifiedEnum
Children:
d8e2a09
Parents:
4559b34 (diff), 6256891 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Resolve conflict

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/thierry_delisle_PhD/thesis/text/io.tex

    r4559b34 r92538ab  
    11\chapter{User Level \io}
    22As mentioned in Section~\ref{prev:io}, User-Level \io requires multiplexing the \io operations of many \glspl{thrd} onto fewer \glspl{proc} using asynchronous \io operations.
    3 Different operating systems offer various forms of asynchronous operations and as mentioned in Chapter~\ref{intro}, this work is exclusively focused on the Linux operating-system.
     3Different operating systems offer various forms of asynchronous operations and, as mentioned in Chapter~\ref{intro}, this work is exclusively focused on the Linux operating-system.
    44
    55\section{Kernel Interface}
     
    173173The consequence is that the amount of parallelism used to prepare submissions for the next system call is limited.
    174174Beyond this limit, the length of the system call is the throughput limiting factor.
    175 I concluded from early experiments that preparing submissions seems to take about as long as the system call itself, which means that with a single @io_uring@ instance, there is no benefit in terms of \io throughput to having more than two \glspl{hthrd}.
     175I concluded from early experiments that preparing submissions seems to take at most as long as the system call itself, which means that with a single @io_uring@ instance, there is no benefit in terms of \io throughput to having more than two \glspl{hthrd}.
    176176Therefore the design of the submission engine must manage multiple instances of @io_uring@ running in parallel, effectively sharding @io_uring@ instances.
    177177Similarly to scheduling, this sharding can be done privately, \ie, one instance per \glspl{proc}, in decoupled pools, \ie, a pool of \glspl{proc} use a pool of @io_uring@ instances without one-to-one coupling between any given instance and any given \gls{proc}, or some mix of the two.
    178178Since completions are sent to the instance where requests were submitted, all instances with pending operations must be polled continously
    179179\footnote{As will be described in Chapter~\ref{practice}, this does not translate into constant cpu usage.}.
     180Note that once an operation completes, there is nothing that ties it to the @io_uring@ instance that handled it.
     181There is nothing preventing a new operation with, for example, the same file descriptors to a different @io_uring@ instance.
    180182
    181183A complicating aspect of submission is @io_uring@'s support for chains of operations, where the completion of an operation triggers the submission of the next operation on the link.
     
    198200The only added complexity is that the number of SQEs is fixed, which means allocation can fail.
    199201
    200 Allocation failures need to be pushed up to the routing algorithm: \glspl{thrd} attempting \io operations must not be directed to @io_uring@ instances without sufficient SQEs available.
     202Allocation failures need to be pushed up to a routing algorithm: \glspl{thrd} attempting \io operations must not be directed to @io_uring@ instances without sufficient SQEs available.
    201203Furthermore, the routing algorithm should block operations up-front if none of the instances have available SQEs.
    202204
     
    212214
    213215In the case of designating a \gls{thrd}, ideally, when multiple \glspl{thrd} attempt to submit operations to the same @io_uring@ instance, all requests would be batched together and one of the \glspl{thrd} would do the system call on behalf of the others, referred to as the \newterm{submitter}.
    214 In practice however, it is important that the \io requests are not left pending indefinitely and as such, it may be required to have a current submitter and a next submitter.
     216In practice however, it is important that the \io requests are not left pending indefinitely and as such, it may be required to have a ``next submitter'' that guarentees everything that is missed by the current submitter is seen by the next one.
    215217Indeed, as long as there is a ``next'' submitter, \glspl{thrd} submitting new \io requests can move on, knowing that some future system call will include their request.
    216218Once the system call is done, the submitter must also free SQEs so that the allocator can reused them.
     
    221223If the submission side does not designate submitters, polling can also submit all SQEs as it is polling events.
    222224A simple approach to polling is to allocate a \gls{thrd} per @io_uring@ instance and simply let the poller \glspl{thrd} poll their respective instances when scheduled.
    223 This design is especially convenient for reasons explained in Chapter~\ref{practice}.
    224225
    225226With this pool of instances approach, the big advantage is that it is fairly flexible.
    226227It does not impose restrictions on what \glspl{thrd} submitting \io operations can and cannot do between allocations and submissions.
    227 It also can gracefully handles running out of ressources, SQEs or the kernel returning @EBUSY@.
     228It also can gracefully handle running out of ressources, SQEs or the kernel returning @EBUSY@.
    228229The down side to this is that many of the steps used for submitting need complex synchronization to work properly.
    229230The routing and allocation algorithm needs to keep track of which ring instances have available SQEs, block incoming requests if no instance is available, prevent barging if \glspl{thrd} are already queued up waiting for SQEs and handle SQEs being freed.
    230231The submission side needs to safely append SQEs to the ring buffer, correctly handle chains, make sure no SQE is dropped or left pending forever, notify the allocation side when SQEs can be reused and handle the kernel returning @EBUSY@.
    231 All this synchronization may have a significant cost and, compare to the next approach presented, this synchronization is entirely overhead.
     232All this synchronization may have a significant cost and, compared to the next approach presented, this synchronization is entirely overhead.
    232233
    233234\subsubsection{Private Instances}
    234235Another approach is to simply create one ring instance per \gls{proc}.
    235 This alleviate the need for synchronization on the submissions, requiring only that \glspl{thrd} are not interrupted in between two submission steps.
     236This alleviates the need for synchronization on the submissions, requiring only that \glspl{thrd} are not interrupted in between two submission steps.
    236237This is effectively the same requirement as using @thread_local@ variables.
    237238Since SQEs that are allocated must be submitted to the same ring, on the same \gls{proc}, this effectively forces the application to submit SQEs in allocation order
     
    240241To remove this requirement, a \gls{thrd} would need the ability to ``yield to a specific \gls{proc}'', \ie, park with the promise that it will be run next on a specific \gls{proc}, the \gls{proc} attached to the correct ring.}
    241242, greatly simplifying both allocation and submission.
    242 In this design, allocation and submission form a ring partitionned ring buffer as shown in Figure~\ref{fig:pring}.
     243In this design, allocation and submission form a partitionned ring buffer as shown in Figure~\ref{fig:pring}.
    243244Once added to the ring buffer, the attached \gls{proc} has a significant amount of flexibility with regards to when to do the system call.
    244 Possible options are: when the \gls{proc} runs out of \glspl{thrd} to run, after running a given number of threads \glspl{thrd}, etc.
     245Possible options are: when the \gls{proc} runs out of \glspl{thrd} to run, after running a given number of \glspl{thrd}, etc.
    245246
    246247\begin{figure}
     
    329330\paragraph{Pending Allocations} can be more complicated to handle.
    330331If the arbiter has available instances, the arbiter can attempt to directly hand over the instance and satisfy the request.
    331 Otherwise
     332Otherwise it must hold onto the list of threads until SQEs are made available again.
     333This handling becomes that much more complex if pending allocation require more than one SQE, since the arbiter must make a decision between statisfying requests in FIFO ordering or satisfy requests for fewer SQEs first.
     334
     335While this arbiter has the potential to solve many of the problems mentionned in above, it also introduces a significant amount of complexity.
     336Tracking which processors are borrowing which instances and which instances have SQEs available ends-up adding a significant synchronization prelude to any I/O operation.
     337Any submission must start with a handshake that pins the currently borrowed instance, if available.
     338An attempt to allocate is then made, but the arbiter can concurrently be attempting to allocate from the same instance from a different \gls{hthrd}.
     339Once the allocation is completed, the submission must still check that the instance is still burrowed before attempt to flush.
     340These extra synchronization steps end-up having a similar cost to the multiple shared instances approach.
     341Furthermore, if the number of instances does not match the number of processors actively submitting I/O, the system can fall into a state where instances are constantly being revoked and end-up cycling the processors, which leads to significant cache deterioration.
     342Because of these reasons, this approach, which sounds promising on paper, does not improve on the private instance approach in practice.
     343
     344\subsubsection{Private Instances V2}
     345
    332346
    333347
Note: See TracChangeset for help on using the changeset viewer.