Context Navigation

-              rf4eb705
+              r52f6250
 Once an @sqe@ is allocated, \glspl{thrd} can fill them normally, they simply need to keep trac of the @sqe@ index and which instance it belongs to.
 Once an @sqe@ is filled in, what needs to happen is that the @sqe@ must be added to the submission ring buffer, an operation that is not thread-safe on itself, and the kernel must be notified using the @io_uring_enter@ system call. The submission ring buffer is the same size as the pre-allocated @sqe@ buffer, therefore pushing to the ring buffer cannot fail\footnote{This is because it is invalid to have the same \lstinline|sqe| multiple times in the ring buffer.}. However, as mentioned, the system call itself can fail with the expectation that it will be retried once some of the already submitted operations complete. Ideally, when multiple \glspl{thrd} attempt to submit operations to the same @io_uring@ instance, all requests would be batched together and one of the \glspl{thrd} would do the system call on behalf of the others, referred to as the \newterm{submitter}. In practice however, it is important that the \io requests are not left pending indefinately and as such, it may be required to have a current submitter and a next submitter. Indeed, as long as there is a ``next'' submitter, \glspl{thrd} submitting new \io requests can move on, knowing that some future system call will include their request. Once the system call is done, the submitter must also free @sqe@s so that the allocator can reused them.
+Once an @sqe@ is filled in, what needs to happen is that the @sqe@ must be added to the submission ring buffer, an operation that is not thread-safe on itself, and the kernel must be notified using the @io_uring_enter@ system call. The submission ring buffer is the same size as the pre-allocated @sqe@ buffer, therefore pushing to the ring buffer cannot fail\footnote{This is because it is invalid to have the same \lstinline|sqe| multiple times in the ring buffer.}. However, as mentioned, the system call itself can fail with the expectation that it will be retried once some of the already submitted operations complete. Since multiple @sqe@s can be submitted to the kernel at once, it is important to strike a balance between batching and latency. Operations that are ready to be submitted should be batched together in few system calls, but at the same time, operations should not be left pending for long period of times before being submitted. This can be handled by either designating one of the submitting \glspl{thrd} as the being responsible for the system call for the current batch of @sqe@s or by having some other party regularly submitting all ready @sqe@s, \eg, the poller \gls{thrd} mentionned later in this section.
 Finally, the completion side is much simpler since the @io_uring@ system call enforces a natural synchronization point. Polling simply needs to regularly do the system call and go through the produced @cqe@s and communicate the result back to the original \gls{thrd}. Since @cqe@s only own a signed 32 bit result, in addition to the copy of the @user_data@ field, all that is needed to communicate the result is a simple future~\cite{wiki:future}. As mentioned in the last paragraph, the polling is also responsible for unlocking submitters that are blocked because the system failed due to being busy. A simple approach to polling is to allocate a \gls{thrd} per @io_uring@ instance and simply let the poller \glspl{thrd} poll their respective instances when scheduled. This design is especially convinient for reasons explained in Chapter~\ref{practice}.
+In the case of designating a \gls{thrd}, ideally, when multiple \glspl{thrd} attempt to submit operations to the same @io_uring@ instance, all requests would be batched together and one of the \glspl{thrd} would do the system call on behalf of the others, referred to as the \newterm{submitter}. In practice however, it is important that the \io requests are not left pending indefinately and as such, it may be required to have a current submitter and a next submitter. Indeed, as long as there is a ``next'' submitter, \glspl{thrd} submitting new \io requests can move on, knowing that some future system call will include their request. Once the system call is done, the submitter must also free @sqe@s so that the allocator can reused them.
+Finally, the completion side is much simpler since the @io_uring@ system call enforces a natural synchronization point. Polling simply needs to regularly do the system call, go through the produced @cqe@s and communicate the result back to the originating \glspl{thrd}. Since @cqe@s only own a signed 32 bit result, in addition to the copy of the @user_data@ field, all that is needed to communicate the result is a simple future~\cite{wiki:future}. If the submission side does not designate submitters, polling can also submit all @sqe@s as it is polling events.  A simple approach to polling is to allocate a \gls{thrd} per @io_uring@ instance and simply let the poller \glspl{thrd} poll their respective instances when scheduled. This design is especially convinient for reasons explained in Chapter~\ref{practice}.
+\subsubsection*{Private Instances}
+Another approach is to simply create one ring instance per \gls{proc}. This alleviate the need for synchronization on the submissions, requiring only that \glspl{thrd} are not interrupted in between two submission steps. This is effectively the same requirement as using @thread_local@ variables. Since @sqe@s that are allocated must be submitted to the same ring, on the same \gls{proc}, this effectively forces the application to submit @sqe@s in allocation order\footnote{The actual requirement is that \glspl{thrd} cannot context switch between allocation and submission. This requirement means that from the subsystem's point of view, the allocation and submission are sequential. To remove this requirement, a \gls{thrd} would need the ability to ``yield to a specific \gls{proc}'', \ie, park with the promise that it will be run next on a specific \gls{proc}, the \gls{proc} attached to the correct ring. This is not a current or planned feature of \CFA.}, greatly simplifying both allocation and submission. In this design, allocation and submission form a ring partitionned ring buffer as shown in Figure~\ref{fig:pring}.
+With this pool of instances approach, the big advantage is that it is fairly flexible. It does not impose restrictions on what \glspl{thrd} submitting \io operations can and cannot do between allocations and submissions. It also can gracefully handle running out of ressources, @sqe@s or the kernel returning @EBUSY@. The down side to this is that many of the steps used for submitting need complex synchronization to work properly. The routing and allocation algorithm needs to keep track of which ring instances have available @sqe@s, block incoming requests if no instance is available, prevent barging if \glspl{thrd} are already queued up waiting for @sqe@s and handle @sqe@s being freed. The submission side needs to safely append @sqe@s to the ring buffer, make sure no @sqe@ is dropped or left pending forever, notify the allocation side when @sqe@s can be reused and handle the kernel returning @EBUSY@. Sharding the @io_uring@ instances should alleviate much of the contention caused by this, but all this synchronization may still have non-zero cost.
+\subsubsection{Private Instances}
+Another approach is to simply create one ring instance per \gls{proc}. This alleviate the need for synchronization on the submissions, requiring only that \glspl{thrd} are not interrupted in between two submission steps. This is effectively the same requirement as using @thread_local@ variables. Since @sqe@s that are allocated must be submitted to the same ring, on the same \gls{proc}, this effectively forces the application to submit @sqe@s in allocation order\footnote{The actual requirement is that \glspl{thrd} cannot context switch between allocation and submission. This requirement means that from the subsystem's point of view, the allocation and submission are sequential. To remove this requirement, a \gls{thrd} would need the ability to ``yield to a specific \gls{proc}'', \ie, park with the promise that it will be run next on a specific \gls{proc}, the \gls{proc} attached to the correct ring. This is not a current or planned feature of \CFA.}, greatly simplifying both allocation and submission. In this design, allocation and submission form a ring partitionned ring buffer as shown in Figure~\ref{fig:pring}. Once added to the ring buffer, the attached \gls{proc} has a significant amount of flexibility with regards to when to do the system call. Possible options are: when the \gls{proc} runs out of \glspl{thrd} to run, after running a given number of threads \glspl{thrd}, etc.
 \begin{figure}
 …
 \end{figure}
+\subsection{Multiplexing \io: Completion}
+This approach has the advantage that it does not require much of the synchronization needed in the shared approach. This comes at the cost that \glspl{thrd} submitting \io operations have less flexibility, they cannot park or yield, and several exceptional cases are handled poorly. Instances running out of @sqe@s cannot run \glspl{thrd} wanting to do \io operations, in such a case the \gls{thrd} needs to be moved to a different \gls{proc}, the only current way of achieving this would be to @yield()@ hoping to be scheduled on a different \gls{proc}, which is not guaranteed. Another problematic case is that \glspl{thrd} that do not park for long periods of time will delay the submission of any @sqe@ not already submitted. This issue is similar to fairness issues which schedulers that use work-stealing mentioned in the previous chapter.
 \section{Interface}

Note: See TracChangeset for help on using the changeset viewer.

Context Navigation

Changeset 52f6250

Legend:

doc/theses/thierry_delisle_PhD/thesis/text/io.tex

Download in other formats: