Changes in / [e6cfa8ff:5b544a6]
- Files:
-
- 11 added
- 14 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/thierry_delisle_PhD/.gitignore
re6cfa8ff r5b544a6 8 8 9 9 comp_II/build/ 10 comp_II/img/*.fig.bak 10 11 comp_II/comp_II.pdf 11 12 comp_II/comp_II.ps -
doc/theses/thierry_delisle_PhD/comp_II/Makefile
re6cfa8ff r5b544a6 18 18 19 19 FIGURES = ${addsuffix .tex, \ 20 base \ 21 empty \ 22 emptybit \ 23 emptytree \ 24 resize \ 20 25 } 21 26 … … 70 75 mkdir -p ${Build} 71 76 72 %.tex : %.fig ${Build}77 %.tex : img/%.fig ${Build} 73 78 fig2dev -L eepic $< > ${Build}/$@ 74 79 75 %.ps : %.fig | ${Build}80 %.ps : img/%.fig | ${Build} 76 81 fig2dev -L ps $< > ${Build}/$@ 77 82 78 %.pstex : %.fig | ${Build}83 %.pstex : img/%.fig | ${Build} 79 84 fig2dev -L pstex $< > ${Build}/$@ 80 85 fig2dev -L pstex_t -p ${Build}/$@ $< > ${Build}/$@_t -
doc/theses/thierry_delisle_PhD/comp_II/comp_II.tex
re6cfa8ff r5b544a6 1 \documentclass[11pt,fullpage]{article} 1 \documentclass[11pt]{article} 2 \usepackage{fullpage} 2 3 \usepackage[T1]{fontenc} 3 4 \usepackage[utf8]{inputenc} … … 6 7 \usepackage{xcolor} 7 8 \usepackage{graphicx} 8 \usepackage [hidelinks]{hyperref}9 \usepackage{epic,eepic} 9 10 \usepackage{glossaries} 10 11 \usepackage{textcomp} 11 \usepackage{geometry} 12 \usepackage[hidelinks]{hyperref} 13 %\usepackage[margin=1in]{geometry} 14 %\usepackage{float} 12 15 13 16 % cfa macros used in the document … … 51 54 \section{Introduction} 52 55 \subsection{\CFA and the \CFA concurrency package} 53 \CFA\cit is a modern, polymorphic, non-object-oriented, backwards-compatible extension of the C programming language. It aims to add high productivity features while maintaning the predictible performance of C. As such concurrency in \CFA\cit aims to offer simple and safe high-level tools while still allowing performant code. Concurrent code is written in the syncrhonous programming paradigm but uses \glspl{uthrd} in order to achieve the simplicity and maintainability of synchronous programming without sacrificing the efficiency of asynchronous programing. As such the \CFA scheduler is a user-level scheduler that maps \glspl{uthrd} onto \glspl{kthrd}. 54 55 The goal of this research is to produce a scheduler that is simple to use and offers acceptable performance in all cases. Here simplicity does not refer to the API but to how much scheduling concerns programmers need to take into account when using the \CFA concurrency package. Therefore, the main goal of this proposal is as follows : 56 \CFA\cit is a modern, polymorphic, non-object-oriented, backwards-compatible extension of the C programming language. It aims to add high-productivity features while maintaning the predictible performance of C. As such, concurrency in \CFA\cit aims to offer simple and safe high-level tools while still allowing performant code. \CFA concurrrent code is written in the synchronous programming paradigm but uses \glspl{uthrd} in order to achieve the simplicity and maintainability of synchronous programming without sacrificing the efficiency of asynchronous programing. As such, the \CFA \emph{scheduler} is a preemptive user-level scheduler that maps \glspl{uthrd} onto \glspl{kthrd}. 57 58 Scheduling occurs when execution switches from one thread to another, where the second thread is implicitly chosen by the scheduler. This scheduling is an indirect handoff, as opposed to generators and coroutines which explicitly switch to the next generator and coroutine respectively. The cost of switching between two threads for an indirect handoff has two components : the cost of actually context-switching, i.e., changing the relevant registers to move execution from one thread to the other, and the cost of scheduling, i.e., deciding which thread to run next among all the threads ready to run. The first cost is generally constant and fixed, while the scheduling cost can vary based on the system state\footnote{Affecting the context-switch cost is whether it is done in one step, after the scheduling, or in two steps, context-switching to a fixed third-thread before scheduling.}. Adding multiple \glspl{kthrd} does not fundamentally change the scheduler semantics or requirements, it simply adds new correctness requirements, i.e. \textit{linearizability}, and a new dimension to performance: scalability, where scheduling cost now also depends on contention. 59 60 The more threads switch, the more the administrating cost of scheduling becomes noticeable. It is therefore important to build a scheduler with the lowest possible cost and latency. Another important consideration is \emph{fairness}. In principle, scheduling should give the illusion of perfect fairness, where all threads ready to run are running \emph{simultaneously}. While the illusion of simultaneity is easier to reason about, it can break down if the scheduler allows to much unfairness. Therefore, the scheduler should offer as much fairness as needed to guarantee eventual progress, but use unfairness to help performance. In practice, threads must wait in turn but there can be advantages to unfair scheduling, e.g., the express cash register at a grocery store. 61 62 The goal of this research is to produce a scheduler that is simple for programmers to understand and offers good performance. Here understandability does not refer to the API but to how much scheduling concerns programmers need to take into account when writing a \CFA concurrent package. Therefore, the main goal of this proposal is : 56 63 \begin{quote} 57 The \CFA scheduler should be \emph{viable} for anyworkload.64 The \CFA scheduler should be \emph{viable} for \emph{any} workload. 58 65 \end{quote} 59 66 60 This objective includes producing a scheduling strategy with minimal fairness guarantees, creating an abstraction layer over the operating system to handle kernel-threads spinning unnecessarily and hide blocking I/O operations and, writing sufficient library tools to allow developpers to properly use the scheduler. 67 For a general purpose scheduler, it is impossible to produce an optimal algorithm as it would require knowledge of the future behaviour of threads. As such, scheduling performance is generally either defined by the best case scenario, a workload to which the scheduler is tailored, or the worst case scenario, i.e., the scheduler behaves no worst than \emph{X}. For this proposal, the performance is evaluated using the second approach to allow \CFA programmers to rely on scheduling performance. A solution to this impossibility is to allow programmers to write their own scheduler, that is not the subject of this proposal, which considers only the default scheduler. As such, it is important that only programmers with exceptionally high performance requirements should need to write their own scheduler and replace the scheduler in this proposal. 68 69 This objective includes producing a scheduling strategy with sufficient fairness guarantees, creating an abstraction layer over the operating system to handle kernel-threads spinning unnecessarily and hide blocking I/O operations, and writing sufficient library tools to allow developers to indirectly use the scheduler. 61 70 62 71 % =============================================================================== … … 64 73 65 74 \section{Scheduling for \CFA} 66 While the \CFA concurrency package doesn't have any particular scheduling needs beyond those of any concurrency package which uses \glspl{uthrd}, it is important that the default \CFA Scheduler be viable in general. Indeed, since the \CFA Scheduler does not target any specific workloads, it is unrealistic to demand that it use the best scheduling strategy in all cases. However, it should offer a viable ``out of the box'' solution for most scheduling problems so that programmers can quickly write performant concurrent without needed to think about which scheduling strategy is more appropriate for their workload. Indeed, only programmers with exceptionnaly high performance requirements should need to write their own scheduler. More specifically, two broad types of schedulering strategies should be avoided in order to avoid penalizing certain types of workloads : feedback-based and priority schedulers. 75 While the \CFA concurrency package does not have any particular scheduling requirements beyond supporting \glspl{uthrd}. Therefore, the detailed requirements of the \CFA scheduler are : 76 77 \paragraph{Correctness} As with any other concurrent data structure or algorithm, the correctness requirement is paramount. The scheduler cannot allow threads to be dropped from the ready-queue, i.e., scheduled but never run, or be executed multiple times when only being scheduled once. Since \CFA concurrency has no spurious wakeup, this definition of correctness also means the scheduler should have no spurious wakeup. The \CFA scheduler must be correct. 78 79 \paragraph{Performance} The performance of a scheduler can generally be mesured in terms of scheduling cost, scalability and latency. Scheduling cost is the cost to switch from one thread to another, as mentioned above. For simple applications where a single kernel thread does most of the scheduling, it is generally the dominating cost. When adding many kernel threads, scalability can become an issue, effectively increasing the cost of context-switching when contention is high. Finally, a third axis of performance is tail latency. This measurement is related to fairness and mesures how long is needed for a thread to be run once scheduled but evaluated in the worst cases. The \CFA scheduler should offer good performance in all three metrics. 80 81 \paragraph{Fairness} Like performance, this requirements has several aspect : eventual progress, predictability and performance reliablility. As a hard requirement, the \CFA scheduler must guarantee eventual progress, i.e., prevent starvation, otherwise the above mentioned illusion of simultaneous execution is broken and the scheduler becomes much more complext to reason about. Beyond this requirement, performance should be predictible and reliable, which means similar workloads achieve similar performance and programmer intuition is respected. An example of this is : a thread that yields agressively should not run more often then other tasks. While this is intuitive, it does not hold true for many work-stealing or feedback based schedulers. The \CFA scheduler must guarantee eventual progress and should be predictible and offer reliable performance. 82 83 \paragraph{Efficiency} Finally, efficient usage of CPU resources is also an important requirement. This issue is discussed more in depth towards the end of this proposal. It effectively refers to avoiding using CPU power when there are no threads to run, and conversely, use all CPUs available when the workload can benefit from it. Balancing these two states is where the complexity lies. The \CFA scheduler should be efficient with respect to the underlying (shared) computer. 84 85 To achieve these requirements, I can reject two broad types of scheduling strategies : feedback-based and priority schedulers. 67 86 68 87 \subsection{Feedback-Based Schedulers} 69 Many operating systems use schedulers based on fe adback loops in some form, they measure how much CPU a particular thread has used\footnote{Different metrics can be used tohere but it is not relevant to the discussion.} and schedule threads based on this metric. These strategies are sensible for operating systems but rely on two assumptions on the workload :88 Many operating systems use schedulers based on feedback in some form, e.g., measuring how much CPU a particular thread has used\footnote{Different metrics can measured here but it is not relevant to the discussion.} and schedule threads based on this metric. These strategies are sensible for operating systems but rely on two assumptions on the workload : 70 89 71 90 \begin{enumerate} 72 \item Threads live long enough to be scheduled many times.73 \item Cooperation among all threads is not simply infeasible, it is a security risk.91 \item Threads live long enough for useful feedback information to be to gathered. 92 \item Threads belong to multiple users so fairness across threads is insufficient. 74 93 \end{enumerate} 75 94 76 While these two assumptions generally hold for operating systems, they may not for \CFA programs. In fact, \CFA uses \glspl{uthrd} which have the explicit goal of reducing the cost of threading primitives to allow many smaller threads. This can naturally lead to have threads with much shorter lifetime and only being scheduled a few times. Scheduling strategies based on feadback loops cannot be effective in these cases because they will not have the opportunity to measure the metrics that underlay the algorithm. Note that the problem of feadback loop convergence (reacting too slowly to scheduling events) is not specific to short lived threads but can also occur with threads that show drastic changes in scheduling event, e.g., threads running for long periods of time and then suddenly blocking and unblocking quickly and repeatedly.77 78 In the context of operating systems, these concerns can be overshadowed by a more pressing concern : security. When multiple users are involved, it is possible that some users are malevolent and try to exploit the scheduling strategy in order to achieve some nefarious objective. Security concerns mean that more precise and robust fairness metrics must be used . In the case of the \CFA scheduler, every thread runs in the same user-space and are controlled from the same user. It is then possible to safely ignore the possibility that threads are malevolent and assume that all threads will ignore or cooperate with each other. This allows for a much simpler fairness metric and in this proposal ``fairness'' will be considered as equal opportunities to run once scheduled.79 80 Since fe adback is not necessarily feasible within the lifetime of all threads and a simple fairness metric can be used, the scheduling strategy proposed for the \CFA runtime does not user per-threads feedback. Feedback loops in general are not rejected for secondary concerns like idle sleep, but no feedback loopis used to decide which thread to run next.95 While these two assumptions generally hold for operating systems, they may not for user-level threading. Since \CFA has the explicit goal of allowing many smaller threads, this can naturally lead to threads with much shorter lifetime, only being scheduled a few times. Scheduling strategies based on feedback cannot be effective in these cases because they do not have the opportunity to measure the metrics that underlie the algorithm. Note that the problem of feedback convergence (reacting too slowly to scheduling events) is not specific to short lived threads but can also occur with threads that show drastic changes in scheduling, e.g., threads running for long periods of time and then suddenly blocking and unblocking quickly and repeatedly. 96 97 In the context of operating systems, these concerns can be overshadowed by a more pressing concern : security. When multiple users are involved, it is possible that some users are malevolent and try to exploit the scheduling strategy in order to achieve some nefarious objective. Security concerns mean that more precise and robust fairness metrics must be used to guarantee fairness across processes created by users as well as threads created within a process. In the case of the \CFA scheduler, every thread runs in the same user-space and are controlled by the same user. Fairness across users is therefore a given and it is then possible to safely ignore the possibility that threads are malevolent. This approach allows for a much simpler fairness metric and in this proposal ``fairness'' is considered as follows : when multiple threads are cycling through the system, the total ordering of threads being scheduled, i.e., pushed onto the ready-queue, should not differ much from the total ordering of threads being executed, i.e., popped from the ready-queue. 98 99 Since feedback is not necessarily feasible within the lifetime of all threads and a simple fairness metric can be used, the scheduling strategy proposed for the \CFA runtime does not use per-threads feedback. Feedback in general is not rejected for secondary concerns like idle sleep, but no feedback is used to decide which thread to run next. 81 100 82 101 \subsection{Priority Schedulers} 83 Another broad category of schedulers are priority schedulers. In these scheduling strategies threads have priorities and the runtime schedules the threads with the highest priority before scheduling other threads. Threads with equal priority are scheduled using a secondary strategy, often something simple like round-robin or FIFO. These priority mean that, as long as there is a thread with a higher priority that desires to run, a thread with a lower priority will not run. This possible starving of threads can dramatically increase programming complexity since starving threads and priority inversion (prioritising a lower priority thread) can both lead to serious problems, leaving programmers between a rock and a hard place. 84 85 An important observation to make is that threads do not need to have explicit priorities for problems to be possible. Indeed, any system with multiple ready-queues and attempts to exhaust one queue before accessing the other queues, could encounter starvation problems. A popular scheduling strategy that suffers from implicit priorities is work-stealing. Work-stealing is generally presented as follows : 86 87 \begin{itemize} 88 \item Each processor has a list of threads. 89 \end{itemize} 102 Another broad category of schedulers are priority schedulers. In these scheduling strategies, threads have priorities and the runtime schedules the threads with the highest priority before scheduling other threads. Threads with equal priority are scheduled using a secondary strategy, often something simple like round-robin or FIFO. These priority mean that, as long as there is a thread with a higher priority that desires to run, a thread with a lower priority does not run. This possible starving of threads can dramatically increase programming complexity since starving threads and priority inversion (prioritizing a lower priority thread) can both lead to serious problems. 103 104 An important observation to make is that threads do not need to have explicit priorities for problems to occur. Indeed, any system with multiple ready-queues and attempts to exhaust one queue before accessing the other queues, can encounter starvation problems. A popular scheduling strategy that suffers from implicit priorities is work-stealing. Work-stealing is generally presented as follows, each processor has a list of ready threads. 90 105 \begin{enumerate} 91 106 \item Run threads from ``this'' processor's list. … … 93 108 \end{enumerate} 94 109 95 In a loaded system\footnote{A loaded system is a system where threads are being run at the same rate they are scheduled }, if a thread does not yield or block for an extended period of time, threads on the same processor list will starve if no other processors canexhaust their list.110 In a loaded system\footnote{A loaded system is a system where threads are being run at the same rate they are scheduled.}, if a thread does not yield, block or preempt for an extended period of time, threads on the same processor list starve if no other processors exhaust their list. 96 111 97 112 Since priorities can be complex to handle for programmers, the scheduling strategy proposed for the \CFA runtime does not use a strategy with either implicit or explicit thread priorities. 98 113 99 \subsection{Schedulers without feadback or priorities} 100 I claim that the ideal default scheduler for the \CFA runtime is a scheduler that offers good scalability and a simple fairness guarantee that is easy for programmers to reason about. The simplest fairness guarantee is to guarantee FIFO ordering, i.e., threads scheduled first will run first. However, enforcing FIFO ordering generally conflicts with scalability across multiple processors because of the additionnal synchronization. Thankfully, strict FIFO is not needed for scheduling. Since concurrency is inherently non-deterministic, fairness concerns in scheduling are only a problem if a thread repeatedly runs before another thread can run\footnote{This is because the non-determinism means that programmers must already handle ordering problems in order to produce correct code and already must rely on weak guarantees, for example that a specific thread will \emph{eventually} run.}. This need for unfairness to persist before problems occur means that the FIFO fairness guarantee can be significantly relaxed without causing problems. For this proposal, the target guarantee is that the \CFA scheduler guarantees \emph{probable} FIFO ordering, which is defined as follows : 114 \subsection{Schedulers without feedback or priorities} 115 This proposal conjectures that is is possible to construct a default scheduler for the \CFA runtime that offers good scalability and a simple fairness guarantee that is easy for programmers to reason about. The simplest fairness guarantee is FIFO ordering, i.e., threads scheduled first run first. However, enforcing FIFO ordering generally conflicts with scalability across multiple processors because of the additionnal synchronization. Thankfully, strict FIFO is not needed for sufficient fairness. Since concurrency is inherently non-deterministic, fairness concerns in scheduling are only a problem if a thread repeatedly runs before another thread can run. This is because the non-determinism means that programmers must already handle ordering problems in order to produce correct code and already must rely on weak guarantees, for example that a specific thread will \emph{eventually} run. Since some reordering does not break correctness, the FIFO fairness guarantee can be significantly relaxed without causing problems. For this proposal, the target guarantee is that the \CFA scheduler provides \emph{probable} FIFO ordering, which allows reordering but makes it improbable that threads are reordered far from their position in total ordering. 116 117 Scheduling is defined as follows : 101 118 \begin{itemize} 102 \item Given two threads $X$ and $Y$, the odds that thread $X$ runs $N$ times \emph{after} thread $Y$ is scheduled but \emph{before} it is run, decreases exponentially with regard sto $N$.119 \item Given two threads $X$ and $Y$, the odds that thread $X$ runs $N$ times \emph{after} thread $Y$ is scheduled but \emph{before} it is run, decreases exponentially with regard to $N$. 103 120 \end{itemize} 104 121 105 While this is not a strong guarantee, the probability that problems persist for long period of times decreases exponentially, making persisting problems virtually impossible. 106 107 \subsection{Real-Time} 108 While the objective of this proposed scheduler is similar to the objective of real-time scheduling, this proposal is not a proposal for real-time scheduler and as such makes no attempt to offer either soft or hard guarantees on scheduling delays. 122 While this is not a bounded guarantee, the probability that unfairness persist for long periods of times decreases exponentially, making persisting unfairness virtually impossible. 109 123 110 124 % =============================================================================== … … 112 126 \section{Proposal} 113 127 114 \subsection{Ready-Queue} 115 Using trevor's paper\cit as basis, it is simple to build a relaxed FIFO list that is fast and scalable for loaded or overloaded systems. The described queue uses an array of underlying strictly FIFO queue. Pushing new data is done by selecting one of these underlying queues at random, recording a timestamp for the push and pushing to the selected queue. Popping is done by selecting two queues at random and popping from the queue for which the head has the oldest timestamp. In loaded or overloaded systems, it is higly likely that the queues is far from empty, e.i., several tasks are on each of the underlying queues. This means that selecting a queue at random to pop from is higly likely to yield a queue that is not empty. 116 117 When the ready queue is "more empty", i.e., several of the inner queues are empty, selecting a random queue for popping is less likely to yield a valid selection and more attempts need to be made, resulting in a performance degradation. In cases, with few elements on the ready queue and few processors running, performance can be improved by adding information to help processors find which inner queues are used. Preliminary performance tests indicate that with few processors, a bitmask can be used to identify which inner queues are currently in use. This is especially effective in the single-thread case, where the bitmask will always be up-to-date. Furthermore, modern x86 CPUs have a BMI2 extension which allow using the bitmask with very little overhead over directly accessing the readyqueue offerring decent performance even in cases with many empty inner queues. This technique does not solve the problem completely, it randomly attempts to find a block of 64 queues where at least one is used, instead of attempting to find a used queue. For systems with a large number of cores this does not completely solve the problem, but it is a fixed improvement. The size of the blocks are limited by the maximum size atomic instruction can operate on, therefore atomic instructions on large words would increase the 64 queues per block limit. 118 119 \TODO double check the next sentence 120 Preliminary result indicate that the bitmask approach with the BMI2 extension can lead to multi-threaded performance that is contention agnostic in the worst case. 121 This result suggests that the contention penalty and the increase performance for additionnal thread cancel each other exactly. This may indicate that a relatively small reduction in contention may tip the performance into positive scalling even for the worst case. It can be noted that in cases of high-contention, the use of the bitmask to find queues that are not empty is much less reliable. Indeed, if contention on the bitmask is high, it means it probably changes significantly between the moment it is read and the actual operation on the queues it represents. Furthermore, the objective of the bitmask is to avoid probing queues that are empty. Therefore, in cases where the bitmask is highly contented, it may be preferrable to probe queues randomly, either until contention decreases or until a prior prefetch of the bitmask completes. Ideally, the scheduler would be able to observe that the bitmask is highly contented and adjust its behaviour appropriately. However, I am not aware of any mechanism to query whether a cacheline is in cache or to run other instructions until a cacheline is fetch without blocking on the cacheline. As such, an alternative that may have a similar impact would be for each thread to have their own bitmask, which would be updated both after each scheduler action and after a certain number of failed probing. If the bitmask has little contention, the local bitmask will be mostly up-to-date and several threads won't need to contend as much on the global bitmask. If the bitmask has significant contention, then fetching it becomes more expensive and threads may as well probe randomly. This solution claims that probing randomly or against an out-of-date bitmask is equivalent. 122 123 In cases where this is insufficient, another approach is to use a hiearchical data structure. Creating a tree of nodes to reduce contention has been shown to work in similar cases\cit(SNZI: Scalable NonZero Indicators)\footnote{This particular paper seems to be patented in the US. How does that affect \CFA? Can I use it in my work?}. However, this approach may lead to poorer single-threaded performance due to the inherent pointer chasing, as such, it was not considered as the first approach but as a fallback in case the bitmask approach does not satisfy the performance goals. 124 125 Part of this performance relies on contention being low when there are few threads on the readyqueue. However, this can be assumed reliably if the system handles putting idle processors to sleep, which is addressed in section \ref{sleep}. 128 \subsection{Ready-Queue} \label{sec:queue} 129 A simple ready-queue can be built from a FIFO queue, user-threads are pushed onto the queue when they are ready to run and processors (kernel-threads acting as virtual processors) pop the user-threads from the queue and execute them. Using Trevor's paper\cit as basis, it is simple to build a relaxed FIFO list that is fast and scalable for loaded or overloaded systems. The described queue uses an array of underlying strictly FIFO queues as shown in Figure~\ref{fig:base}\footnote{For this section, the number of underlying queues is assumed to be constant, Section~\ref{sec:resize} will discuss resizing the array.}. Pushing new data is done by selecting one of these underlying queues at random, recording a timestamp for the push and pushing to the selected queue. Popping is done by selecting two queues at random and popping from the queue for which the head has the oldest timestamp. A higher number of underlying queues leads to less contention on each queue and therefore better performance. In a loaded system, it is higly likely the queues are non-empty, i.e., several tasks are on each of the underlying queues. This means that selecting a queue at random to pop from is higly likely to yield a queue with available items. In Figure~\ref{fig:base}, ignoring the ellipsis, the chances of getting an empty queue is 2/7 per pick, meaning two randoms pick will yield an item approximately 9 times out of 10. 130 131 \begin{figure} 132 \begin{center} 133 % {\resizebox{0.8\textwidth}{!}{\input{base}}} 134 \input{base} 135 \end{center} 136 \caption{Relaxed FIFO list at the base of the scheduler: an array of strictly FIFO lists. } 137 \label{fig:base} 138 \end{figure} 139 140 \begin{figure} 141 \begin{center} 142 % {\resizebox{0.8\textwidth}{!}{\input{empty}}} 143 \input{empty} 144 \end{center} 145 \caption{``More empty'' state of the queue: the array contains many empty cells.} 146 \label{fig:empty} 147 \end{figure} 148 149 When the ready queue is "more empty", i.e., several of the inner queues are empty, selecting a random queue for popping is less likely to yield a valid selection and more attempts need to be made, resulting in a performance degradation. Figure~\ref{fig:empty} shows an example with fewer elements where the chances of getting an empty queue is 5/7 per pick, meaning two randoms pick will yield an item only half the time. Since the overarching queue is not empty, the pop operation \emph{must} find an element before returning and therefore must retry. Overall performance is therefore influenced by the contention on the underlying queues and pop performance is influenced by the items density. This leads to four performance cases, as depicted in Table~\ref{tab:perfcases}. 150 151 \begin{table} 152 \begin{center} 153 \begin{tabular}{|r|l|l|} 154 \cline{2-3} 155 \multicolumn{1}{r|}{} & \multicolumn{1}{c|}{Many Processors} & \multicolumn{1}{c|}{Few Processors} \\ 156 \hline 157 Many Threads & A: good performance & B: good performance \\ 158 \hline 159 Few Threads & C: poor performance & D: poor performance \\ 160 \hline 161 \end{tabular} 162 \end{center} 163 \caption{Performance of the relaxed FIFO list in different cases. The number of processors (many or few) refers to the number of kernel-threads \emph{actively} attempting to pop user-threads from the queues, not the total number of kernel-threads. The number of threads (many of few) refers to the number of user-threads ready to be run. Many threads means they outnumber processors significantly and most underlying queues have items, few threads means there are barely more threads than processors and most underlying queues are empty. Cases with fewer threads than processors are discussed in Section~\ref{sec:sleep}.} 164 \label{tab:perfcases} 165 \end{table} 166 167 Performance can be improved in case~D (Table~\ref{tab:perfcases}) by adding information to help processors find which inner queues are used. This aims to avoid the cost of retrying the pop operation but does not affect contention on the underlying queues and can incur some management cost for both push and pop operations. 168 169 A bitmask can be used to identify which inner queues are currently in use, as shown in Figure~\ref{fig:emptybit}. This means that processors can often find user-threads in constant time, regardless of how many underlying queues are empty. Furthermore, modern x86 CPUs have an extension (BMI2) which allow using the bitmask with very little overhead compared to a filled readyqueue, offerring decent performance even in cases with many empty inner queues. However, this technique has its limits, with a single word\footnote{Word refers here to however many bits can be written atomicly.} bitmask, the total number of underlying queues in the overarching queue is limited to the number of bits in the word. With a multi-word bitmask, this maximum limit can be increased arbitrarily, but it is not possible to check if the queue is empty by reading the bitmask atomicly. A dense bitmap, i.e., either a single word bitmask or a multi word bitmask where all words are densely packed, also causes additionnal problems in case~C (Table~\ref{tab:perfcases}), which the increased contention on the bitmask both causes new performance degradation and means the accuracy of the bitmask is less reliable due to more hardware threads potentially racing to read and/or update that information. 170 171 \begin{figure} 172 \begin{center} 173 {\resizebox{0.8\textwidth}{!}{\input{emptybit}}} 174 \end{center} 175 \caption{``More empty'' queue with added bitmask to indicate which array cells have items.} 176 \label{fig:emptybit} 177 \end{figure} 178 179 Another approach is to use a hiearchical data structure, for example Figure~\ref{fig:emptytree}. Creating a tree of nodes to reduce contention has been shown to work in similar cases\cit(SNZI: Scalable NonZero Indicators)\footnote{This particular paper seems to be patented in the US. How does that affect \CFA? Can I use it in my work?}. However, this approach may lead to poorer performance in case~B (Table~\ref{tab:perfcases}) due to the inherent pointer chasing cost and already low contention cost in that case. 180 181 \begin{figure} 182 \begin{center} 183 {\resizebox{0.8\textwidth}{!}{\input{emptytree}}} 184 \end{center} 185 \caption{``More empty'' queue with added binary search tree indicate which array cells have items.} 186 \label{fig:emptytree} 187 \end{figure} 188 189 Finally, a third approach is to use dense information, similar to the bitmap, but have each thread keep its own independant copies of it. While this approach can offer good scalability \emph{and} low latency, the livelyness of the information can become a problem. In the simple cases, local copies of which underlying queues are empty can become stale and end-up not be useful when for the pop operation. A more serious problem is that reliable information is necessary for some parts of this algorithm to be correct. As mentionned in this section, processors must know \emph{reliably} whether the list is empty or not to decide if they can return \texttt{NULL} or if they must keep looking during a pop operation. Section~\ref{sec:sleep} discusses an other case where reliable information is required for the algorithm to be correct. 190 191 There is a fundamental tradeoff among these approach. Dense global information about empty underlying queues will help zero-contention cases at the cost of high-contention case. Sparse global information will help high-contention cases but increase latency in zero-contention-cases, to read and ``aggregate'' the information\footnote{Hiearchical structures, e.g., binary search tree, effectively aggregate information but following pointer chains, learning information for each node. Similarly, other sparse schemes would need to read multiple cachelines to acquire all the information needed.}. Finally, dense local information has both the advantages of low latency in zero-contention cases and scalability in high-contention cases, however the information can become stale making it difficult to use to ensure correctness. The fact that these solutions have these fundamental limits suggest that that a more solution that combines these solutions in an interesting ways. The lock discussed in Section~\ref{sec:resize} also allows for solutions that adapt to the number of processors, which couls also prove useful. 126 192 127 193 \paragraph{Objectives and Existing Work} 128 How much scalability is actually needed is highly debatable, libfibre\cit is has compared favorably to other schedulers in webserver tests\cit and uses a single atomic counter in its scheduling algorithm similarly to the proposed bitmask. As such the single atomic instruction on a shared cacheline may be sufficiently performant. 129 130 I have built a prototype of this ready-queue (including the bitmask and BMI2 usage, but not the sharded bitmask) and ran performance experiments on it but it is difficult to compare this prototype to a thread scheduler as the prototype is used as a data-queue. I have also integrated this prototype into the \CFA runtime, but have not yet created performance experiments to compare results. I believe that the bitmask approach is currently one of the larger risks of the proposal, early tests lead me to believe it may work but it is not clear that the contention problem can be overcome. The worst-case scenario is a case where the number of processors and the number of ready threads are similar, yet scheduling events are very frequent. Fewer threads should lead to the Idle Sleep mechanism reducing contention while having many threads ready leads to optimal performance. It is difficult to evaluate the likeliness of this worst-case scenario in real workloads. I believe, frequent scheduling events suggest a more ``bursty'' workload where new work is finely divided among many threads which race to completion. This type of workload would only see a peek of contention close to the end of the work, but no sustained contention. Very fine-grained pipelines are less ``bursty'', these may lead to more sustained contention. However, they could also easily benefit from a direct hand-off strategy which would circumvent the problem entirely. 131 132 \subsection{Dynamic Resizing} 133 The \CFA runtime system currently handles dynamically adding and removing processors from clusters at any time. Since this is part of the existing design, the proposed scheduler must also support this behaviour. However, dynamicly resizing the clusters is considered a rare event associated with setup, teardown and major configuration changes. This assumptions is made both in the design of the proposed scheduler as well as in the original design of the \CFA runtime system. As such, the proposed scheduler must honor the correctness of these behaviour but does not have any performance objectives with regards to resizing a cluster. How long adding or removing processors take and how much this disrupts the performance of other threads is considered a secondary concern since it should be amortized over long period of times. This description effectively matches with te description of a Reader-Writer lock, in frequent but invasive updates among frequent (mostly) read operations. In the case of the Ready-Queue described above, read operations are operations that push or pop from the ready-queue but do not invalidate any references to the ready queue data structures. Writes on the other-hand would add or remove inner queues, invalidating references to the array of inner queues in the process. Therefore, the current proposed approach to this problem is the add a per-cluster Reader Writer lock around the ready queue to prevent restructuring of the ready-queue data structure while threads are being pushed or popped. 194 195 How much scalability is actually needed is highly debatable, libfibre\cit has compared favorably to other schedulers in webserver tests\cit and uses a single atomic counter in its scheduling algorithm similarly to the proposed bitmask. As such, the single atomic instruction on a shared cacheline may be sufficiently performant. 196 197 I have built a prototype of this ready-queue (including the bitmask and BMI2 usage, but not the sharded bitmask) and ran performance experiments on it but it is difficult to compare this prototype to a thread scheduler as the prototype is used as a data-queue. I have also integrated this prototype into the \CFA runtime, but have not yet created performance experiments to compare results. I believe that the bitmask approach is currently one of the larger risks of the proposal, early tests lead me to believe it may work but it is not clear that the contention problem can be overcome. The worst-case scenario is a case where the number of processors and the number of ready threads are similar, yet scheduling events are very frequent. Fewer threads should lead to the Idle Sleep mechanism discussed in Section~\ref{sec:sleep} to reduce contention while having many threads ready leads to optimal performance. It is difficult to evaluate the likeliness of this worst-case scenario in real workloads. I believe, frequent scheduling events suggest a more ``bursty'' workload where new work is finely divided among many threads which race to completion. This type of workload would only see a peek of contention close to the end of the work, but no sustained contention. Very fine-grained pipelines are less ``bursty'', these may lead to more sustained contention. However, they could also easily benefit from a direct hand-off strategy which would circumvent the problem entirely. 198 199 \subsection{Dynamic Resizing} \label{sec:resize} 200 The \CFA runtime system currently handles dynamically adding and removing processors from clusters at any time. Since this is part of the existing design, the proposed scheduler must also support this behaviour. However, dynamicly resizing the clusters is considered a rare event associated with setup, teardown and major configuration changes. This assumptions is made both in the design of the proposed scheduler as well as in the original design of the \CFA runtime system. As such, the proposed scheduler must honor the correctness of these behaviour but does not have any performance objectives with regards to resizing a cluster. How long adding or removing processors take and how much this disrupts the performance of other threads is considered a secondary concern since it should be amortized over long period of times. However, as mentionned in Section~\ref{sec:queue}, contention on the underlying queues can have a direct impact on performance, the number of underlying queues must therefore be adjusted as the number of processors grows or shrinks. Since the underlying queues are stored in a dense array, changing the number of queues requires resizing the array and therefore moving it. This can introduce memory reclamation problems if not done correctly. 201 202 \begin{figure} 203 \begin{center} 204 % {\resizebox{0.8\textwidth}{!}{\input{resize}}} 205 \input{resize} 206 \end{center} 207 \caption{Copy of data structure shown in Figure~\ref{fig:base}. The cells of the array can be modified concurrently but resizing the array, which requires moving it, is not safe to do concurrently. This can also be true of the accompanying data structures used to find non-empty queues.} 208 \label{fig:base2} 209 \end{figure} 210 211 It is important to note that how the array is used in this case. While the array cells are modified by every push and pop operation, the array itself, i.e., the pointer that would change when resized, is only read during these operations. Therefore the use is this pointer can be described as frequent reads and in frequent writes. This description effectively matches with the description of a Reader-Writer lock, infrequent but invasive updates among frequent read operations. In the case of the Ready-Queue described above, read operations are operations that push or pop from the ready-queue but do not invalidate any references to the ready queue data structures. Writes on the other-hand would add or remove inner queues, invalidating references to the array of inner queues in the process. Therefore, the current proposed approach to this problem is the add a per-cluster Reader Writer lock around the ready queue to prevent restructuring of the ready-queue data structure while threads are being pushed or popped. 134 212 135 213 There are possible alternatives to the Reader Writer lock solution. This problem is effectively a memory reclamation problem and as such there is a large body of research on the subject. However, the RWlock solution is simple and can be leveraged to solve other problems (e.g. processor ordering and memory reclamation of threads) which makes it an attractive solution. 136 214 137 215 \paragraph{Objectives and Existing Work} 138 The lock must offer scalability and performance on par with the actual ready-queue in order not to introduce a new bottle 139 140 \subsection{Idle Sleep} \label{s leep}141 As mentionned above, idle sleep is the process of putting processors to sleep while they do not have threads to execute. In this context processors are kernel-threads and sleeping refers to asking the kernel to block a thread. This can be achieved with either thread synchronization operations like pthread\_cond\_wait or using signal operations like sigsuspend.216 The lock must offer scalability and performance on par with the actual ready-queue in order not to introduce a new bottleneck. I have already built a lock that fits the desired requirements and preliminary testing show scalability and performance that exceed the target. As such, I do not consider this lock to be a risk on this project. 217 218 \subsection{Idle Sleep} \label{sec:sleep} 219 As mentionned above, idle sleep is the process of putting processors to sleep while they do not have threads to execute. In this context, processors are kernel-threads and sleeping refers to asking the kernel to block a thread. This can be achieved with either thread synchronization operations like pthread\_cond\_wait or using signal operations like sigsuspend. The goal of putting idle processors to sleep is two-fold, it reduces energy consumption in cases where more idle kernel-threads translate to idle hardware threads, and reduces contention on the ready queue, since the otherwise idle processors generally contend trying to pop items from the queue. Since energy efficiency is a growing concern in many industry sectors\cit, there is not real need to solve the contention problem without using idle sleep. 142 220 143 221 Support for idle sleep broadly involves calling the operating system to block the kernel thread but also handling the race between the sleeping and the waking up, and handling which kernel thread should sleep or wake-up. 144 222 145 When a processor decides to sleep, there is a race that occurs between it signalling that it will go to sleep (so other processors can find sleeping processors) and actually blocking the kernel thread. This is equivalent to the classic problem of missing signals when using condition variables, the ``sleepy'' processor indicates that it will sleep but has not yet gone to sleep, if another processor attempts to wake it up, the waking-up operation may claim nothing needs to be done and the signal will have been missed. In cases where threads are scheduled from processors on the current cluster, loosing signals is not necessarily critical, because at least some processors on the cluster are awake. Individual processors always finish s hceduling threads before looking for new work, which means that the last processor to go to sleep cannot miss threads scheduled from inside the cluster (if they do, that demonstrates the ready-queue is not linearizable). However, this guarantee does not hold if threads are shceduled from outside the cluster, either due to an external event like timers and I/O, or due to a thread migrating from a different cluster. In this case, missed signals can lead to the cluster deadlocking where it should not\footnote{Clusters ``should'' never deadlock, but for this proposal, cases where \CFA users \emph{actually} wrote \CFA code that leads to a deadlock it is considered as a deadlock that ``should'' happen. }. Therefore, it is important that the scheduling of threads include a mechanism where signals \emph{cannot} be missed. For performance reasons, it can be advantageous to have a secondary mechanism that allows signals to be missed in cases where it cannot lead to a deadlock. To be safe, this process must include a ``handshake'' where it is guaranteed that either~: the sleepy processor notices that a thread was scheduled after it signalled its intent to block or code scheduling threads well seethe intent to sleep before scheduling and be able to wake-up the processor. This matter is complicated by the fact that pthread offers few tools to implement this solution and offers no guarantee of ordering of threads waking up for most of these tools.146 147 Another issues is trying to avoid kernel sleeping and waking frequently. A possible partial solution is to order the processors so that the one which most recently went to sleep is woken up. This allows other sleeping processors to reach deeper sleep state (when these are available) while keeping ``hot'' processors warmer. Note that while this generally means organising the processors in a stack, I believe that the unique index provided by the ReaderWriter lock can be reused to strictly order the waking order of processors, causing a LIFO like waking order. While a strict LIFO stack is probably better, using the processor index could pro ove useful and offer a sufficiently LIFO ordering.223 When a processor decides to sleep, there is a race that occurs between it signalling that it will go to sleep (so other processors can find sleeping processors) and actually blocking the kernel thread. This is equivalent to the classic problem of missing signals when using condition variables, the ``sleepy'' processor indicates that it will sleep but has not yet gone to sleep, if another processor attempts to wake it up, the waking-up operation may claim nothing needs to be done and the signal will have been missed. In cases where threads are scheduled from processors on the current cluster, loosing signals is not necessarily critical, because at least some processors on the cluster are awake. Individual processors always finish scheduling threads before looking for new work, which means that the last processor to go to sleep cannot miss threads scheduled from inside the cluster (if they do, that demonstrates the ready-queue is not linearizable). However, this guarantee does not hold if threads are scheduled from outside the cluster, either due to an external event like timers and I/O, or due to a thread migrating from a different cluster. In this case, missed signals can lead to the cluster deadlocking where it should not\footnote{Clusters ``should'' never deadlock, but for this proposal, cases where \CFA users \emph{actually} wrote \CFA code that leads to a deadlock it is considered as a deadlock that ``should'' happen. }. Therefore, it is important that the scheduling of threads include a mechanism where signals \emph{cannot} be missed. For performance reasons, it can be advantageous to have a secondary mechanism that allows signals to be missed in cases where it cannot lead to a deadlock. To be safe, this process must include a ``handshake'' where it is guaranteed that either~: the sleepy processor notices that a thread was scheduled after it signalled its intent to block or code scheduling threads sees the intent to sleep before scheduling and be able to wake-up the processor. This matter is complicated by the fact that pthread offers few tools to implement this solution and offers no guarantee of ordering of threads waking up for most of these tools. 224 225 Another issues is trying to avoid kernel sleeping and waking frequently. A possible partial solution is to order the processors so that the one which most recently went to sleep is woken up. This allows other sleeping processors to reach deeper sleep state (when these are available) while keeping ``hot'' processors warmer. Note that while this generally means organising the processors in a stack, I believe that the unique index provided by the ReaderWriter lock can be reused to strictly order the waking order of processors, causing a LIFO like waking order. While a strict LIFO stack is probably better, using the processor index could prove useful and offer a sufficiently LIFO ordering. 148 226 149 227 Finally, another important aspect of Idle Sleep is when should processors make the decision to sleep and when it is appropriate for sleeping processors to be woken up. Processors that are unnecessarily awake lead to unnecessary contention and power consumption, while too many sleeping processors can lead to sub-optimal throughput. Furthermore, transitions from sleeping to awake and vice-versa also add unnecessary latency. There is already a wealth of research on the subject and I do not plan to implement a novel idea for the Idle Sleep heuristic in this project. … … 153 231 154 232 \paragraph{OS Abstraction} 155 One of the fundamental part of this converting blocking I/O operations into non-blocking ones. This relies on having an underlying asynchronous I/O interface to which to direct the I/O operations. While there exists many different APIs for asynchronous I/O, it is not part of this proposal to create a novel API, simply to use an existing one that is sufficient. uC++ uses the \texttt{select} as its interface, which handles pipes and sockets. It entails significant complexity and has performances problems which make it a less interesting alternative. Another interface which is becoming popular recently\cit is \texttt{epoll}. However, epoll also does not handle file system and seems to have problem to linux pipes and \texttt{TTY}s\cit. A very recent alternative that must still be investigated is \texttt{io\_uring}. It claims to address some of the issues with \texttt{epoll} but is too recent to be confident that it does. Finally, a popular cross-platform alternative is \texttt{libuv}, which offers asynchronous sockets and asynchronous file system operations (among other features). However, as a full-featured library it includes much more than what is needed and could conflict with other features of \CFA unless significant efforts are made to merge them together.233 One of the fundamental part of converting blocking I/O operations into non-blocking ones is having an underlying asynchronous I/O interface to direct the I/O operations. While there exists many different APIs for asynchronous I/O, it is not part of this proposal to create a novel API, simply to use an existing one that is sufficient. uC++ uses the \texttt{select} as its interface, which handles ttys, pipes and sockets, but not disk. It entails significant complexity and is being replaced which make it a less interesting alternative. Another interface which is becoming popular recently\cit is \texttt{epoll}, which is supposed to be cheaper than \texttt{select}. However, epoll also does not handle file system and seems to have problem to linux pipes and \texttt{TTY}s\cit. A very recent alternative that must still be investigated is \texttt{io\_uring}. It claims to address some of the issues with \texttt{epoll} but is too recent to be confident that it does. Finally, a popular cross-platform alternative is \texttt{libuv}, which offers asynchronous sockets and asynchronous file system operations (among other features). However, as a full-featured library it includes much more than what is needed and could conflict with other features of \CFA unless significant efforts are made to merge them together. 156 234 157 235 \paragraph{Event-Engine} … … 159 237 160 238 \paragraph{Interface} 161 Finally, for these components to be available, it is necessary to expose them through a synchronous interface. This can be a novel interface but it is preferrable to attempt to intercept the existing POSIX interface in order to be compatible with existing code. This will allowC programs written using this interface to be transparently converted to \CFA with minimal effeort. Where this is not applicable, a novel interface will be created to fill the gaps.239 Finally, for these components to be available, it is necessary to expose them through a synchronous interface. This can be a novel interface but it is preferrable to attempt to intercept the existing POSIX interface in order to be compatible with existing code. This allows C programs written using this interface to be transparently converted to \CFA with minimal effeort. Where this is not applicable, a novel interface will be created to fill the gaps. 162 240 163 241 … … 171 249 \section{Timeline} 172 250 173 174 \cleardoublepage175 251 176 252 % B I B L I O G R A P H Y 177 253 % ----------------------------- 178 \addcontentsline{toc}{chapter}{Bibliography} 254 \cleardoublepage 255 \phantomsection % allows hyperref to link to the correct page 256 \addcontentsline{toc}{section}{\refname} 179 257 \bibliographystyle{plain} 180 258 \bibliography{pl,local} 259 260 % G L O S S A R Y 261 % ----------------------------- 181 262 \cleardoublepage 182 263 \phantomsection % allows hyperref to link to the correct page 183 184 % G L O S S A R Y 185 % ----------------------------- 186 \addcontentsline{toc}{chapter}{Glossary} 264 \addcontentsline{toc}{section}{Glossary} 187 265 \printglossary 188 \cleardoublepage189 \phantomsection % allows hyperref to link to the correct page190 266 191 267 \end{document} -
doc/user/user.tex
re6cfa8ff r5b544a6 11 11 %% Created On : Wed Apr 6 14:53:29 2016 12 12 %% Last Modified By : Peter A. Buhr 13 %% Last Modified On : Sat Jul 13 18:36:18 201914 %% Update Count : 3 87613 %% Last Modified On : Fri Mar 6 13:34:52 2020 14 %% Update Count : 3924 15 15 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 16 16 … … 211 211 Even with all its problems, C continues to be popular because it allows writing software at virtually any level in a computer system without restriction. 212 212 For system programming, where direct access to hardware, storage management, and real-time issues are a requirement, C is usually the only language of choice. 213 The TIOBE index~\cite{TIOBE} for July 2018 ranks the top five most \emph{popular} programming languages as \Index*{Java} 16\%, C 14\%, \Index*[C++]{\CC{}} 7.5\%, Python 6\%, Visual Basic 4\% = 47.5\%, where the next 50 languages are less than 4\% each, with a long tail.214 The top 3 rankings over the past 30years are:213 The TIOBE index~\cite{TIOBE} for February 2020 ranks the top six most \emph{popular} programming languages as \Index*{Java} 17.4\%, C 16.8\%, Python 9.3\%, \Index*[C++]{\CC{}} 6.2\%, \Csharp 5.9\%, Visual Basic 5.9\% = 61.5\%, where the next 50 languages are less than 2\% each, with a long tail. 214 The top 4 rankings over the past 35 years are: 215 215 \begin{center} 216 216 \setlength{\tabcolsep}{10pt} 217 \begin{tabular}{@{}rccccccc@{}} 218 & 2018 & 2013 & 2008 & 2003 & 1998 & 1993 & 1988 \\ \hline 219 Java & 1 & 2 & 1 & 1 & 16 & - & - \\ 220 \R{C} & \R{2} & \R{1} & \R{2} & \R{2} & \R{1} & \R{1} & \R{1} \\ 221 \CC & 3 & 4 & 3 & 3 & 2 & 2 & 5 \\ 217 \begin{tabular}{@{}rcccccccc@{}} 218 & 2020 & 2015 & 2010 & 2005 & 2000 & 1995 & 1990 & 1985 \\ \hline 219 Java & 1 & 2 & 1 & 2 & 3 & - & - & - \\ 220 \R{C} & \R{2} & \R{1} & \R{2} & \R{1} & \R{1} & \R{2} & \R{1} & \R{1} \\ 221 Python & 3 & 7 & 6 & 6 & 22 & 21 & - & - \\ 222 \CC & 4 & 4 & 4 & 3 & 2 & 1 & 2 & 12 \\ 222 223 \end{tabular} 223 224 \end{center} … … 512 513 Keyword clashes are accommodated by syntactic transformations using the \CFA backquote escape-mechanism: 513 514 \begin{cfa} 514 int ®` ®otype®`®= 3; §\C{// make keyword an identifier}§515 double ®` ®forall®`®= 3.5;515 int ®``®otype = 3; §\C{// make keyword an identifier}§ 516 double ®``®forall = 3.5; 516 517 \end{cfa} 517 518 … … 524 525 // include file uses the CFA keyword "with". 525 526 #if ! defined( with ) §\C{// nesting ?}§ 526 #define with ®` ®with®`®§\C{// make keyword an identifier}§527 #define with ®``®with §\C{// make keyword an identifier}§ 527 528 #define __CFA_BFD_H__ 528 529 #endif 529 530 ®#include_next <bfdlink.h> §\C{// must have internal check for multiple expansion}§ 531 ® 530 §{\color{red}\#\textbf{include\_next} <bfdlink.h>}§ §\C{// must have internal check for multiple expansion}§ 532 531 #if defined( with ) && defined( __CFA_BFD_H__ ) §\C{// reset only if set}§ 533 532 #undef with … … 576 575 \section{Exponentiation Operator} 577 576 578 C, \CC, and Java (and many other programming languages) have no exponentiation operator\index{exponentiation!operator}\index{operator!exponentiation}, \ie $x^y$, and instead use a routine, like \Indexc{pow }, to perform the exponentiation operation.579 \CFA extends the basic operators with the exponentiation operator ©? \?©\index{?\\?@©?\?©} and ©?\=?©\index{?\\=?@©\=?©}, as in, ©x \ y© and ©x \= y©, which means $x^y$ and $x \leftarrow x^y$.577 C, \CC, and Java (and many other programming languages) have no exponentiation operator\index{exponentiation!operator}\index{operator!exponentiation}, \ie $x^y$, and instead use a routine, like \Indexc{pow(x,y)}, to perform the exponentiation operation. 578 \CFA extends the basic operators with the exponentiation operator ©?®\®?©\index{?\\?@©?®\®?©} and ©?\=?©\index{?\\=?@©®\®=?©}, as in, ©x ®\® y© and ©x ®\®= y©, which means $x^y$ and $x \leftarrow x^y$. 580 579 The priority of the exponentiation operator is between the cast and multiplicative operators, so that ©w * (int)x \ (int)y * z© is parenthesized as ©((w * (((int)x) \ ((int)y))) * z)©. 581 580 582 As for \Index{division}, there are exponentiation operators for integral and floating types, including the builtin \Index{complex} types.581 There are exponentiation operators for integral and floating types, including the builtin \Index{complex} types. 583 582 Integral exponentiation\index{exponentiation!unsigned integral} is performed with repeated multiplication\footnote{The multiplication computation is $O(\log y)$.} (or shifting if the exponent is 2). 584 Overflow f rom large exponents or negative exponents returnzero.583 Overflow for a large exponent or negative exponent returns zero. 585 584 Floating exponentiation\index{exponentiation!floating} is performed using \Index{logarithm}s\index{exponentiation!logarithm}, so the exponent cannot be negative. 586 585 \begin{cfa} … … 589 588 1 1 256 -64 125 ®0® 3273344365508751233 ®0® ®0® -0.015625 18.3791736799526 0.264715-1.1922i 590 589 \end{cfa} 591 Note, ©5 ®\® 32© and ©5L ®\® 64© overflow, and ©-4 ®\®-3© is a fraction but stored in an integer so all three computations generate an integral zero.590 Note, ©5 \ 32© and ©5L \ 64© overflow, and ©-4 \ -3© is a fraction but stored in an integer so all three computations generate an integral zero. 592 591 Parenthesis are necessary for complex constants or the expression is parsed as ©1.0f+®(®2.0fi \ 3.0f®)®+2.0fi©. 593 592 The exponentiation operator is available for all the basic types, but for user-defined types, only the integral-computation version is available. … … 598 597 OT ?®\®?( OT ep, unsigned long int y ); 599 598 \end{cfa} 600 The user type ©T© must define multiplication, one , ©1©, and,©*©.599 The user type ©T© must define multiplication, one (©1©), and ©*©. 601 600 602 601 … … 626 625 627 626 628 \subsection{Loop Control} 629 630 The ©for©/©while©/©do-while© loop-control allows empty or simplified ranges (see Figure~\ref{f:LoopControlExamples}). 631 \begin{itemize} 632 \item 633 An empty conditional implies ©1©. 634 \item 635 The up-to range ©~©\index{~@©~©} means exclusive range [M,N). 636 \item 637 The up-to range ©~=©\index{~=@©~=©} means inclusive range [M,N]. 638 \item 639 The down-to range ©-~©\index{-~@©-~©} means exclusive range [N,M). 640 \item 641 The down-to range ©-~=©\index{-~=@©-~=©} means inclusive range [N,M]. 642 \item 643 ©@© means put nothing in this field. 644 \item 645 ©0© is the implicit start value; 646 \item 647 ©1© is the implicit increment value. 648 \item 649 The up-to range uses ©+=© for increment; 650 \item 651 The down-to range uses ©-=© for decrement. 652 \item 653 The loop index is polymorphic in the type of the start value or comparison value when start is implicitly ©0©. 654 \end{itemize} 655 656 \begin{figure} 627 %\section{\texorpdfstring{\protect\lstinline@case@ Clause}{case Clause}} 628 \subsection{\texorpdfstring{\LstKeywordStyle{case} Clause}{case Clause}} 629 630 C restricts the ©case© clause of a ©switch© statement to a single value. 631 For multiple ©case© clauses associated with the same statement, it is necessary to have multiple ©case© clauses rather than multiple values. 632 Requiring a ©case© clause for each value does not seem to be in the spirit of brevity normally associated with C. 633 Therefore, the ©case© clause is extended with a list of values, as in: 657 634 \begin{cquote} 658 \begin{tabular}{@{}l|l@{}} 659 \multicolumn{1}{c|}{loop control} & \multicolumn{1}{c}{output} \\ 660 \hline 661 \begin{cfa} 662 sout | nlOff; 663 while ®()® { sout | "empty"; break; } sout | nl; 664 do { sout | "empty"; break; } while ®()®; sout | nl; 665 for ®()® { sout | "empty"; break; } sout | nl; 666 for ( ®0® ) { sout | "A"; } sout | "zero" | nl; 667 for ( ®1® ) { sout | "A"; } sout | nl; 668 for ( ®10® ) { sout | "A"; } sout | nl; 669 for ( ®1 ~= 10 ~ 2® ) { sout | "B"; } sout | nl; 670 for ( ®10 -~= 1 ~ 2® ) { sout | "C"; } sout | nl; 671 for ( ®0.5 ~ 5.5® ) { sout | "D"; } sout | nl; 672 for ( ®5.5 -~ 0.5® ) { sout | "E"; } sout | nl; 673 for ( ®i; 10® ) { sout | i; } sout | nl; 674 for ( ®i; 1 ~= 10 ~ 2® ) { sout | i; } sout | nl; 675 for ( ®i; 10 -~= 1 ~ 2® ) { sout | i; } sout | nl; 676 for ( ®i; 0.5 ~ 5.5® ) { sout | i; } sout | nl; 677 for ( ®i; 5.5 -~ 0.5® ) { sout | i; } sout | nl; 678 for ( ®ui; 2u ~= 10u ~ 2u® ) { sout | ui; } sout | nl; 679 for ( ®ui; 10u -~= 2u ~ 2u® ) { sout | ui; } sout | nl; 680 enum { N = 10 }; 681 for ( ®N® ) { sout | "N"; } sout | nl; 682 for ( ®i; N® ) { sout | i; } sout | nl; 683 for ( ®i; N -~ 0® ) { sout | i; } sout | nl; 684 const int start = 3, comp = 10, inc = 2; 685 for ( ®i; start ~ comp ~ inc + 1® ) { sout | i; } sout | nl; 686 for ( ®i; 1 ~ @® ) { if ( i > 10 ) break; 687 sout | i; } sout | nl; 688 for ( ®i; 10 -~ @® ) { if ( i < 0 ) break; 689 sout | i; } sout | nl; 690 for ( ®i; 2 ~ @ ~ 2® ) { if ( i > 10 ) break; 691 sout | i; } sout | nl; 692 for ( ®i; 2.1 ~ @ ~ @® ) { if ( i > 10.5 ) break; 693 sout | i; i += 1.7; } sout | nl; 694 for ( ®i; 10 -~ @ ~ 2® ) { if ( i < 0 ) break; 695 sout | i; } sout | nl; 696 for ( ®i; 12.1 ~ @ ~ @® ) { if ( i < 2.5 ) break; 697 sout | i; i -= 1.7; } sout | nl; 698 for ( ®i; 5 : j; -5 ~ @® ) { sout | i | j; } sout | nl; 699 for ( ®i; 5 : j; -5 -~ @® ) { sout | i | j; } sout | nl; 700 for ( ®i; 5 : j; -5 ~ @ ~ 2® ) { sout | i | j; } sout | nl; 701 for ( ®i; 5 : j; -5 -~ @ ~ 2® ) { sout | i | j; } sout | nl; 702 for ( ®j; -5 ~ @ : i; 5® ) { sout | i | j; } sout | nl; 703 for ( ®j; -5 -~ @ : i; 5® ) { sout | i | j; } sout | nl; 704 for ( ®j; -5 ~ @ ~ 2 : i; 5® ) { sout | i | j; } sout | nl; 705 for ( ®j; -5 -~ @ ~ 2 : i; 5® ) { sout | i | j; } sout | nl; 706 for ( ®j; -5 -~ @ ~ 2 : i; 5 : k; 1.5 ~ @® ) { 707 sout | i | j | k; } sout | nl; 708 for ( ®j; -5 -~ @ ~ 2 : k; 1.5 ~ @ : i; 5® ) { 709 sout | i | j | k; } sout | nl; 710 for ( ®k; 1.5 ~ @ : j; -5 -~ @ ~ 2 : i; 5® ) { 711 sout | i | j | k; } sout | nl; 635 \begin{tabular}{@{}l@{\hspace{3em}}l@{\hspace{2em}}l@{}} 636 \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c@{\hspace{2em}}}{\textbf{C}} \\ 637 \begin{cfa} 638 switch ( i ) { 639 case ®1, 3, 5®: 640 ... 641 case ®2, 4, 6®: 642 ... 643 } 712 644 \end{cfa} 713 645 & 714 646 \begin{cfa} 715 716 empty 717 empty 718 empty 719 zero 720 A 721 A A A A A A A A A A 722 B B B B B 723 C C C C C 724 D D D D D 725 E E E E E 726 0 1 2 3 4 5 6 7 8 9 727 1 3 5 7 9 728 10 8 6 4 2 729 0.5 1.5 2.5 3.5 4.5 730 5.5 4.5 3.5 2.5 1.5 731 2 4 6 8 10 732 10 8 6 4 2 733 734 N N N N N N N N N N 735 0 1 2 3 4 5 6 7 8 9 736 10 9 8 7 6 5 4 3 2 1 737 738 3 6 9 739 740 1 2 3 4 5 6 7 8 9 10 741 742 10 9 8 7 6 5 4 3 2 1 0 743 744 2 4 6 8 10 745 746 2.1 3.8 5.5 7.2 8.9 747 748 10 8 6 4 2 0 749 750 12.1 10.4 8.7 7 5.3 3.6 751 0 -5 1 -4 2 -3 3 -2 4 -1 752 0 -5 1 -6 2 -7 3 -8 4 -9 753 0 -5 1 -3 2 -1 3 1 4 3 754 0 -5 1 -7 2 -9 3 -11 4 -13 755 0 -5 1 -4 2 -3 3 -2 4 -1 756 0 -5 1 -6 2 -7 3 -8 4 -9 757 0 -5 1 -3 2 -1 3 1 4 3 758 0 -5 1 -7 2 -9 3 -11 4 -13 759 760 0 -5 1.5 1 -7 2.5 2 -9 3.5 3 -11 4.5 4 -13 5.5 761 762 0 -5 1.5 1 -7 2.5 2 -9 3.5 3 -11 4.5 4 -13 5.5 763 764 0 -5 1.5 1 -7 2.5 2 -9 3.5 3 -11 4.5 4 -13 5.5 647 switch ( i ) { 648 case 1: case 3 : case 5: 649 ... 650 case 2: case 4 : case 6: 651 ... 652 } 653 \end{cfa} 654 & 655 \begin{cfa} 656 657 // odd values 658 659 // even values 660 661 765 662 \end{cfa} 766 663 \end{tabular} 767 664 \end{cquote} 768 \caption{Loop Control Examples} 769 \label{f:LoopControlExamples} 770 \end{figure} 665 In addition, subranges are allowed to specify case values.\footnote{ 666 gcc has the same mechanism but awkward syntax, \lstinline@2 ...42@, because a space is required after a number, otherwise the period is a decimal point.} 667 \begin{cfa} 668 switch ( i ) { 669 case ®1~5:® §\C{// 1, 2, 3, 4, 5}§ 670 ... 671 case ®10~15:® §\C{// 10, 11, 12, 13, 14, 15}§ 672 ... 673 } 674 \end{cfa} 675 Lists of subranges are also allowed. 676 \begin{cfa} 677 case ®1~5, 12~21, 35~42®: 678 \end{cfa} 771 679 772 680 … … 977 885 978 886 979 %\section{\texorpdfstring{\protect\lstinline@case@ Clause}{case Clause}} 980 \subsection{\texorpdfstring{\LstKeywordStyle{case} Statement}{case Statement}} 981 982 C restricts the ©case© clause of a ©switch© statement to a single value. 983 For multiple ©case© clauses associated with the same statement, it is necessary to have multiple ©case© clauses rather than multiple values. 984 Requiring a ©case© clause for each value does not seem to be in the spirit of brevity normally associated with C. 985 Therefore, the ©case© clause is extended with a list of values, as in: 986 \begin{cquote} 987 \begin{tabular}{@{}l@{\hspace{3em}}l@{\hspace{2em}}l@{}} 988 \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}} & \multicolumn{1}{c@{\hspace{2em}}}{\textbf{C}} \\ 989 \begin{cfa} 990 switch ( i ) { 991 case ®1, 3, 5®: 887 \subsection{Non-terminating and Labelled \texorpdfstring{\LstKeywordStyle{fallthrough}}{Non-terminating and Labelled fallthrough}} 888 889 The ©fallthrough© clause may be non-terminating within a ©case© clause or have a target label to common code from multiple case clauses. 890 \begin{center} 891 \begin{tabular}{@{}lll@{}} 892 \begin{cfa} 893 choose ( ... ) { 894 case 3: 895 if ( ... ) { 896 ... ®fallthru;® // goto case 4 897 } else { 898 ... 899 } 900 // implicit break 901 case 4: 902 903 904 905 906 \end{cfa} 907 & 908 \begin{cfa} 909 choose ( ... ) { 910 case 3: 911 ... ®fallthrough common;® 912 case 4: 913 ... ®fallthrough common;® 914 915 ®common:® // below fallthrough 916 // at case-clause level 917 ... // common code for cases 3/4 918 // implicit break 919 case 4: 920 921 922 \end{cfa} 923 & 924 \begin{cfa} 925 choose ( ... ) { 926 case 3: 927 choose ( ... ) { 928 case 4: 929 for ( ... ) { 930 // multi-level transfer 931 ... ®fallthru common;® 932 } 933 ... 934 } 992 935 ... 993 case ®2, 4, 6®: 994 ... 995 } 936 ®common:® // below fallthrough 937 // at case-clause level 938 \end{cfa} 939 \end{tabular} 940 \end{center} 941 The target label must be below the ©fallthrough© and may not be nested in a control structure, and 942 the target label must be at the same or higher level as the containing ©case© clause and located at 943 the same level as a ©case© clause; the target label may be case ©default©, but only associated 944 with the current ©switch©/©choose© statement. 945 946 947 \subsection{Loop Control} 948 949 The ©for©/©while©/©do-while© loop-control allows empty or simplified ranges (see Figure~\ref{f:LoopControlExamples}). 950 \begin{itemize} 951 \item 952 The loop index is polymorphic in the type of the comparison value N (when the start value is implicit) or the start value M. 953 \item 954 An empty conditional implies comparison value of ©1© (true). 955 \item 956 A comparison N is implicit up-to exclusive range [0,N©®)®©. 957 \item 958 A comparison ©=© N is implicit up-to inclusive range [0,N©®]®©. 959 \item 960 The up-to range M ©~©\index{~@©~©} N means exclusive range [M,N©®)®©. 961 \item 962 The up-to range M ©~=©\index{~=@©~=©} N means inclusive range [M,N©®]®©. 963 \item 964 The down-to range M ©-~©\index{-~@©-~©} N means exclusive range [N,M©®)®©. 965 \item 966 The down-to range M ©-~=©\index{-~=@©-~=©} N means inclusive range [N,M©®]®©. 967 \item 968 ©0© is the implicit start value; 969 \item 970 ©1© is the implicit increment value. 971 \item 972 The up-to range uses operator ©+=© for increment; 973 \item 974 The down-to range uses operator ©-=© for decrement. 975 \item 976 ©@© means put nothing in this field. 977 \item 978 ©:© means start another index. 979 \end{itemize} 980 981 \begin{figure} 982 \begin{tabular}{@{}l|l@{}} 983 \multicolumn{1}{c|}{loop control} & \multicolumn{1}{c}{output} \\ 984 \hline 985 \begin{cfa}[xleftmargin=0pt] 986 while ®()® { sout | "empty"; break; } 987 do { sout | "empty"; break; } while ®()®; 988 for ®()® { sout | "empty"; break; } 989 for ( ®0® ) { sout | "A"; } sout | "zero"; 990 for ( ®1® ) { sout | "A"; } 991 for ( ®10® ) { sout | "A"; } 992 for ( ®= 10® ) { sout | "A"; } 993 for ( ®1 ~= 10 ~ 2® ) { sout | "B"; } 994 for ( ®10 -~= 1 ~ 2® ) { sout | "C"; } 995 for ( ®0.5 ~ 5.5® ) { sout | "D"; } 996 for ( ®5.5 -~ 0.5® ) { sout | "E"; } 997 for ( ®i; 10® ) { sout | i; } 998 for ( ®i; = 10® ) { sout | i; } 999 for ( ®i; 1 ~= 10 ~ 2® ) { sout | i; } 1000 for ( ®i; 10 -~= 1 ~ 2® ) { sout | i; } 1001 for ( ®i; 0.5 ~ 5.5® ) { sout | i; } 1002 for ( ®i; 5.5 -~ 0.5® ) { sout | i; } 1003 for ( ®ui; 2u ~= 10u ~ 2u® ) { sout | ui; } 1004 for ( ®ui; 10u -~= 2u ~ 2u® ) { sout | ui; } 1005 enum { N = 10 }; 1006 for ( ®N® ) { sout | "N"; } 1007 for ( ®i; N® ) { sout | i; } 1008 for ( ®i; N -~ 0® ) { sout | i; } 1009 const int start = 3, comp = 10, inc = 2; 1010 for ( ®i; start ~ comp ~ inc + 1® ) { sout | i; } 1011 for ( i; 1 ~ ®@® ) { if ( i > 10 ) break; sout | i; } 1012 for ( i; 10 -~ ®@® ) { if ( i < 0 ) break; sout | i; } 1013 for ( i; 2 ~ ®@® ~ 2 ) { if ( i > 10 ) break; sout | i; } 1014 for ( i; 2.1 ~ ®@® ~ ®@® ) { if ( i > 10.5 ) break; sout | i; i += 1.7; } 1015 for ( i; 10 -~ ®@® ~ 2 ) { if ( i < 0 ) break; sout | i; } 1016 for ( i; 12.1 ~ ®@® ~ ®@® ) { if ( i < 2.5 ) break; sout | i; i -= 1.7; } 1017 for ( i; 5 ®:® j; -5 ~ @ ) { sout | i | j; } 1018 for ( i; 5 ®:® j; -5 -~ @ ) { sout | i | j; } 1019 for ( i; 5 ®:® j; -5 ~ @ ~ 2 ) { sout | i | j; } 1020 for ( i; 5 ®:® j; -5 -~ @ ~ 2 ) { sout | i | j; } 1021 for ( i; 5 ®:® j; -5 ~ @ ) { sout | i | j; } 1022 for ( i; 5 ®:® j; -5 -~ @ ) { sout | i | j; } 1023 for ( i; 5 ®:® j; -5 ~ @ ~ 2 ) { sout | i | j; } 1024 for ( i; 5 ®:® j; -5 -~ @ ~ 2 ) { sout | i | j; } 1025 for ( i; 5 ®:® j; -5 -~ @ ~ 2 ®:® k; 1.5 ~ @ ) { sout | i | j | k; } 1026 for ( i; 5 ®:® j; -5 -~ @ ~ 2 ®:® k; 1.5 ~ @ ) { sout | i | j | k; } 1027 for ( i; 5 ®:® k; 1.5 ~ @ ®:® j; -5 -~ @ ~ 2 ) { sout | i | j | k; } 996 1028 \end{cfa} 997 1029 & 998 1030 \begin{cfa} 999 switch ( i ) { 1000 case 1: case 3 : case 5: 1001 ... 1002 case 2: case 4 : case 6: 1003 ... 1004 } 1005 \end{cfa} 1006 & 1007 \begin{cfa} 1008 1009 // odd values 1010 1011 // even values 1012 1013 1031 empty 1032 empty 1033 empty 1034 zero 1035 A 1036 A A A A A A A A A A 1037 A A A A A A A A A A A 1038 B B B B B 1039 C C C C C 1040 D D D D D 1041 E E E E E 1042 0 1 2 3 4 5 6 7 8 9 1043 0 1 2 3 4 5 6 7 8 9 10 1044 1 3 5 7 9 1045 10 8 6 4 2 1046 0.5 1.5 2.5 3.5 4.5 1047 5.5 4.5 3.5 2.5 1.5 1048 2 4 6 8 10 1049 10 8 6 4 2 1050 1051 N N N N N N N N N N 1052 0 1 2 3 4 5 6 7 8 9 1053 10 9 8 7 6 5 4 3 2 1 1054 1055 3 6 9 1056 1 2 3 4 5 6 7 8 9 10 1057 10 9 8 7 6 5 4 3 2 1 0 1058 2 4 6 8 10 1059 2.1 3.8 5.5 7.2 8.9 1060 10 8 6 4 2 0 1061 12.1 10.4 8.7 7. 5.3 3.6 1062 0 -5 1 -4 2 -3 3 -2 4 -1 1063 0 -5 1 -6 2 -7 3 -8 4 -9 1064 0 -5 1 -3 2 -1 3 1 4 3 1065 0 -5 1 -7 2 -9 3 -11 4 -13 1066 0 -5 1 -4 2 -3 3 -2 4 -1 1067 0 -5 1 -6 2 -7 3 -8 4 -9 1068 0 -5 1 -3 2 -1 3 1 4 3 1069 0 -5 1 -7 2 -9 3 -11 4 -13 1070 0 -5 1.5 1 -7 2.5 2 -9 3.5 3 -11 4.5 4 -13 5.5 1071 0 -5 1.5 1 -7 2.5 2 -9 3.5 3 -11 4.5 4 -13 5.5 1072 0 -5 1.5 1 -7 2.5 2 -9 3.5 3 -11 4.5 4 -13 5.5 1014 1073 \end{cfa} 1015 1074 \end{tabular} 1016 \end{cquote} 1017 In addition, subranges are allowed to specify case values.\footnote{ 1018 gcc has the same mechanism but awkward syntax, \lstinline@2 ...42@, because a space is required after a number, otherwise the period is a decimal point.} 1019 \begin{cfa} 1020 switch ( i ) { 1021 case ®1~5:® §\C{// 1, 2, 3, 4, 5}§ 1022 ... 1023 case ®10~15:® §\C{// 10, 11, 12, 13, 14, 15}§ 1024 ... 1025 } 1026 \end{cfa} 1027 Lists of subranges are also allowed. 1028 \begin{cfa} 1029 case ®1~5, 12~21, 35~42®: 1030 \end{cfa} 1031 1075 \caption{Loop Control Examples} 1076 \label{f:LoopControlExamples} 1077 \end{figure} 1032 1078 1033 1079 % for () => for ( ;; ) … … 6547 6593 hence, names in these include files are not mangled\index{mangling!name} (see~\VRef{s:Interoperability}). 6548 6594 All other C header files must be explicitly wrapped in ©extern "C"© to prevent name mangling. 6549 For \Index*[C++]{\CC{}}, the name-mangling issue is often handled internally in manyC header-files through checks for preprocessor variable ©__cplusplus©, which adds appropriate ©extern "C"© qualifiers.6595 This approach is different from \Index*[C++]{\CC{}} where the name-mangling issue is handled internally in C header-files through checks for preprocessor variable ©__cplusplus©, which adds appropriate ©extern "C"© qualifiers. 6550 6596 6551 6597 … … 6561 6607 The storage-management routines extend their C equivalents by overloading, alternate names, providing shallow type-safety, and removing the need to specify the allocation size for non-array types. 6562 6608 6563 Storage management provides the following capabilities:6609 C storage management provides the following capabilities: 6564 6610 \begin{description} 6565 \item[fill ]6566 after allocation the storage is filled with a specified character.6611 \item[filled] 6612 after allocation with a specified character or value. 6567 6613 \item[resize] 6568 an existing allocation is decreased or increased insize.6569 In either case, new storage may or may not be allocated and, if there is a new allocation, as much data from the existing allocation is copied .6614 an existing allocation to decreased or increased its size. 6615 In either case, new storage may or may not be allocated and, if there is a new allocation, as much data from the existing allocation is copied into the new allocation. 6570 6616 For an increase in storage size, new storage after the copied data may be filled. 6571 \item[align ment]6572 an allocation startson a specified memory boundary, \eg, an address multiple of 64 or 128 for cache-line purposes.6617 \item[align] 6618 an allocation on a specified memory boundary, \eg, an address multiple of 64 or 128 for cache-line purposes. 6573 6619 \item[array] 6574 6620 the allocation size is scaled to the specified number of array elements. 6575 6621 An array may be filled, resized, or aligned. 6576 6622 \end{description} 6577 The table shows allocation routines supporting different combinations of storage-management capabilities: 6578 \begin{center} 6579 \begin{tabular}{@{}r|r|l|l|l|l@{}} 6623 \VRef[Table]{t:AllocationVersusCapabilities} shows allocation routines supporting different combinations of storage-management capabilities. 6624 \begin{table} 6625 \centering 6626 \begin{minipage}{0.75\textwidth} 6627 \begin{tabular}{@{}r|l|l|l|l|l@{}} 6580 6628 \multicolumn{1}{c}{}& & \multicolumn{1}{c|}{fill} & resize & alignment & array \\ 6581 6629 \hline 6582 6630 C & ©malloc© & no & no & no & no \\ 6583 6631 & ©calloc© & yes (0 only) & no & no & yes \\ 6584 & ©realloc© & no/copy& yes & no & no \\6632 & ©realloc© & copy & yes & no & no \\ 6585 6633 & ©memalign© & no & no & yes & no \\ 6634 & ©aligned_alloc©\footnote{Same as ©memalign© but size is an integral multiple of alignment, which is universally ignored.} 6635 & no & no & yes & no \\ 6586 6636 & ©posix_memalign© & no & no & yes & no \\ 6637 & ©valloc© & no & no & yes (page size)& no \\ 6638 & ©pvalloc©\footnote{Same as ©valloc© but rounds size to multiple of page size.} 6639 & no & no & yes (page size)& no \\ 6587 6640 \hline 6588 C11 & ©aligned_alloc© & no & no & yes & no \\ 6589 \hline 6590 \CFA & ©alloc© & no/copy/yes & no/yes & no & yes \\ 6591 & ©align_alloc© & no/yes & no & yes & yes \\ 6641 \CFA & ©cmemalign© & yes (0 only) & no & yes & yes \\ 6642 & ©realloc© & copy & yes & yes & no \\ 6643 & ©alloc© & no & yes & no & yes \\ 6644 & ©alloc_set© & yes & yes & no & yes \\ 6645 & ©alloc_align© & no & yes & yes & yes \\ 6646 & ©alloc_align_set© & yes & yes & yes & yes \\ 6592 6647 \end{tabular} 6593 \end{center} 6594 It is impossible to resize with alignment because the underlying ©realloc© allocates storage if more space is needed, and it does not honour alignment from the original allocation. 6648 \end{minipage} 6649 \caption{Allocation Routines versus Storage-Management Capabilities} 6650 \label{t:AllocationVersusCapabilities} 6651 \end{table} 6652 6653 \CFA memory management extends the type safety of all allocations by using the type of the left-hand-side type to determine the allocation size and return a matching type for the new storage. 6654 Type-safe allocation is provided for all C allocation routines and new \CFA allocation routines, \eg in 6655 \begin{cfa} 6656 int * ip = (int *)malloc( sizeof(int) ); §\C{// C}§ 6657 int * ip = malloc(); §\C{// \CFA type-safe version of C malloc}§ 6658 int * ip = alloc(); §\C{// \CFA type-safe uniform alloc}§ 6659 \end{cfa} 6660 the latter two allocations determine the allocation size from the type of ©p© (©int©) and cast the pointer to the allocated storage to ©int *©. 6661 6662 \CFA memory management extends allocation safety by implicitly honouring all alignment requirements, \eg in 6663 \begin{cfa} 6664 struct S { int i; } __attribute__(( aligned( 128 ) )); // cache-line alignment 6665 S * sp = malloc(); §\C{// honour type alignment}§ 6666 \end{cfa} 6667 the storage allocation is implicitly aligned to 128 rather than the default 16. 6668 The alignment check is performed at compile time so there is no runtime cost. 6669 6670 \CFA memory management extends the resize capability with the notion of \newterm{sticky properties}. 6671 Hence, initial allocation capabilities are remembered and maintained when resize requires copying. 6672 For example, an initial alignment and fill capability are preserved during a resize copy so the copy has the same alignment and extended storage is filled. 6673 Without sticky properties it is dangerous to use ©realloc©, resulting in an idiom of manually performing the reallocation to maintain correctness. 6674 6675 \CFA memory management extends allocation to support constructors for initialization of allocated storage, \eg in 6676 \begin{cfa} 6677 struct S { int i; }; §\C{// cache-line aglinment}§ 6678 void ?{}( S & s, int i ) { s.i = i; } 6679 // assume ?|? operator for printing an S 6680 6681 S & sp = *®new®( 3 ); §\C{// call constructor after allocation}§ 6682 sout | sp.i; 6683 ®delete®( &sp ); 6684 6685 S * spa = ®anew®( 10, 5 ); §\C{// allocate array and initialize each array element}§ 6686 for ( i; 10 ) sout | spa[i] | nonl; 6687 sout | nl; 6688 ®adelete®( 10, spa ); 6689 \end{cfa} 6690 Allocation routines ©new©/©anew© allocate a variable/array and initialize storage using the allocated type's constructor. 6691 Note, the matching deallocation routines ©delete©/©adelete©. 6595 6692 6596 6693 \leavevmode 6597 6694 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 6598 // C unsafe allocation6599 6695 extern "C" { 6600 void * malloc( size_t size );§\indexc{memset}§ 6601 void * calloc( size_t dim, size_t size );§\indexc{calloc}§ 6602 void * realloc( void * ptr, size_t size );§\indexc{realloc}§ 6603 void * memalign( size_t align, size_t size );§\indexc{memalign}§ 6604 int posix_memalign( void ** ptr, size_t align, size_t size );§\indexc{posix_memalign}§ 6605 6606 // C unsafe initialization/copy 6607 void * memset( void * dest, int c, size_t size ); 6608 void * memcpy( void * dest, const void * src, size_t size ); 6609 } 6696 // C unsafe allocation 6697 void * malloc( size_t size );§\indexc{malloc}§ 6698 void * calloc( size_t dim, size_t size );§\indexc{calloc}§ 6699 void * realloc( void * ptr, size_t size );§\indexc{realloc}§ 6700 void * memalign( size_t align, size_t size );§\indexc{memalign}§ 6701 void * aligned_alloc( size_t align, size_t size );§\indexc{aligned_alloc}§ 6702 int posix_memalign( void ** ptr, size_t align, size_t size );§\indexc{posix_memalign}§ 6703 void * cmemalign( size_t alignment, size_t noOfElems, size_t elemSize );§\indexc{cmemalign}§ // CFA 6704 6705 // C unsafe initialization/copy 6706 void * memset( void * dest, int c, size_t size );§\indexc{memset}§ 6707 void * memcpy( void * dest, const void * src, size_t size );§\indexc{memcpy}§ 6708 } 6709 6710 void * realloc( void * oaddr, size_t nalign, size_t size ); // CFA heap 6610 6711 6611 6712 forall( dtype T | sized(T) ) { 6612 // §\CFA§ safe equivalents, i.e., implicit size specification6713 // §\CFA§ safe equivalents, i.e., implicit size specification 6613 6714 T * malloc( void ); 6614 6715 T * calloc( size_t dim ); 6615 6716 T * realloc( T * ptr, size_t size ); 6616 6717 T * memalign( size_t align ); 6718 T * cmemalign( size_t align, size_t dim ); 6617 6719 T * aligned_alloc( size_t align ); 6618 6720 int posix_memalign( T ** ptr, size_t align ); 6619 6721 6620 // §\CFA§ safe general allocation, fill, resize, array6722 // §\CFA§ safe general allocation, fill, resize, alignment, array 6621 6723 T * alloc( void );§\indexc{alloc}§ 6622 T * alloc( char fill );6623 6724 T * alloc( size_t dim ); 6624 T * alloc( size_t dim, char fill );6625 6725 T * alloc( T ptr[], size_t dim ); 6626 T * alloc( T ptr[], size_t dim, char fill ); 6627 6628 // §\CFA§ safe general allocation, align, fill, array 6629 T * align_alloc( size_t align ); 6630 T * align_alloc( size_t align, char fill ); 6631 T * align_alloc( size_t align, size_t dim ); 6632 T * align_alloc( size_t align, size_t dim, char fill ); 6633 6634 // §\CFA§ safe initialization/copy, i.e., implicit size specification 6635 T * memset( T * dest, char c );§\indexc{memset}§ 6726 T * alloc_set( char fill );§\indexc{alloc_set}§ 6727 T * alloc_set( T fill ); 6728 T * alloc_set( size_t dim, char fill ); 6729 T * alloc_set( size_t dim, T fill ); 6730 T * alloc_set( size_t dim, const T fill[] ); 6731 T * alloc_set( T ptr[], size_t dim, char fill ); 6732 6733 T * alloc_align( size_t align ); 6734 T * alloc_align( size_t align, size_t dim ); 6735 T * alloc_align( T ptr[], size_t align ); // aligned realloc array 6736 T * alloc_align( T ptr[], size_t align, size_t dim ); // aligned realloc array 6737 T * alloc_align_set( size_t align, char fill ); 6738 T * alloc_align_set( size_t align, T fill ); 6739 T * alloc_align_set( size_t align, size_t dim, char fill ); 6740 T * alloc_align_set( size_t align, size_t dim, T fill ); 6741 T * alloc_align_set( size_t align, size_t dim, const T fill[] ); 6742 T * alloc_align_set( T ptr[], size_t align, size_t dim, char fill ); 6743 6744 // §\CFA§ safe initialization/copy, i.e., implicit size specification 6745 T * memset( T * dest, char fill );§\indexc{memset}§ 6636 6746 T * memcpy( T * dest, const T * src );§\indexc{memcpy}§ 6637 6747 6638 // §\CFA§ safe initialization/copy array 6639 T * amemset( T dest[], char c, size_t dim );6748 // §\CFA§ safe initialization/copy, i.e., implicit size specification, array types 6749 T * amemset( T dest[], char fill, size_t dim ); 6640 6750 T * amemcpy( T dest[], const T src[], size_t dim ); 6641 6751 } 6642 6752 6643 // §\CFA§ allocation/deallocation and constructor/destructor 6644 forall( dtype T | sized(T), ttype Params | { void ?{}( T *, Params ); } ) T * new( Params p );§\indexc{new}§6645 forall( dtype T | { void ^?{}( T *); } ) void delete( T * ptr );§\indexc{delete}§6646 forall( dtype T, ttype Params | { void ^?{}( T *); void delete( Params ); } )6753 // §\CFA§ allocation/deallocation and constructor/destructor, non-array types 6754 forall( dtype T | sized(T), ttype Params | { void ?{}( T &, Params ); } ) T * new( Params p );§\indexc{new}§ 6755 forall( dtype T | sized(T) | { void ^?{}( T & ); } ) void delete( T * ptr );§\indexc{delete}§ 6756 forall( dtype T, ttype Params | sized(T) | { void ^?{}( T & ); void delete( Params ); } ) 6647 6757 void delete( T * ptr, Params rest ); 6648 6758 6649 // §\CFA§ allocation/deallocation and constructor/destructor, array 6650 forall( dtype T | sized(T), ttype Params | { void ?{}( T *, Params ); } ) T * anew( size_t dim, Params p );§\indexc{anew}§6651 forall( dtype T | sized(T) | { void ^?{}( T *); } ) void adelete( size_t dim, T arr[] );§\indexc{adelete}§6652 forall( dtype T | sized(T) | { void ^?{}( T *); }, ttype Params | { void adelete( Params ); } )6759 // §\CFA§ allocation/deallocation and constructor/destructor, array types 6760 forall( dtype T | sized(T), ttype Params | { void ?{}( T &, Params ); } ) T * anew( size_t dim, Params p );§\indexc{anew}§ 6761 forall( dtype T | sized(T) | { void ^?{}( T & ); } ) void adelete( size_t dim, T arr[] );§\indexc{adelete}§ 6762 forall( dtype T | sized(T) | { void ^?{}( T & ); }, ttype Params | { void adelete( Params ); } ) 6653 6763 void adelete( size_t dim, T arr[], Params rest ); 6654 6764 \end{cfa} -
libcfa/src/bits/containers.hfa
re6cfa8ff r5b544a6 169 169 get_next( *head ) = 0p; 170 170 verify(*this.tail == 1p); 171 verify( get_next(*head) == 0p ); 171 172 return head; 172 173 } -
libcfa/src/concurrency/monitor.cfa
re6cfa8ff r5b544a6 534 534 //Find the thread to run 535 535 $thread * signallee = pop_head( this.blocked )->waiting_thread; 536 /* paranoid */ verify( signallee->next == 0p );537 536 __set_owner( monitors, count, signallee ); 538 537 -
libcfa/src/exception.c
re6cfa8ff r5b544a6 72 72 // Used in the personality function, way down in termination. 73 73 // struct _Unwind_Context * -> _Unwind_Reason_Code(*)(exception_t *) 74 #if defined( __x86_64 ) 74 75 #define MATCHER_FROM_CONTEXT(ptr_to_context) \ 75 76 (*(_Unwind_Reason_Code(**)(exception_t *))(_Unwind_GetCFA(ptr_to_context) + 8)) 76 77 #elif defined( __i386 ) 78 #define MATCHER_FROM_CONTEXT(ptr_to_context) \ 79 (*(_Unwind_Reason_Code(**)(exception_t *))(_Unwind_GetCFA(ptr_to_context) + 24)) 80 #endif 77 81 78 82 // RESUMPTION ================================================================ -
libcfa/src/heap.cfa
re6cfa8ff r5b544a6 10 10 // Created On : Tue Dec 19 21:58:35 2017 11 11 // Last Modified By : Peter A. Buhr 12 // Last Modified On : Tue Feb 4 10:04:51202013 // Update Count : 6 4812 // Last Modified On : Fri Mar 6 10:14:52 2020 13 // Update Count : 650 14 14 // 15 15 … … 819 819 820 820 extern "C" { 821 // The malloc() function allocates size bytes and returns a pointer to the allocated memory. The memory is not 822 // initialized. If size is 0, then malloc() returns either 0p, or a unique pointer value that can later be 823 // successfully passed to free(). 821 // Allocates size bytes and returns a pointer to the allocated memory. The memory is not initialized. If size is 0, 822 // then malloc() returns either 0p, or a unique pointer value that can later be successfully passed to free(). 824 823 void * malloc( size_t size ) { 825 824 #ifdef __STATISTICS__ … … 831 830 } // malloc 832 831 833 // The calloc() function allocates memory for an array of nmemb elements of size bytes each and returns a pointer to834 // the allocated memory. The memory is set to zero. If nmemb or size is 0, then calloc() returns either 0p, or a835 // unique pointervalue that can later be successfully passed to free().832 // Allocate memory for an array of nmemb elements of size bytes each and returns a pointer to the allocated 833 // memory. The memory is set to zero. If nmemb or size is 0, then calloc() returns either 0p, or a unique pointer 834 // value that can later be successfully passed to free(). 836 835 void * calloc( size_t noOfElems, size_t elemSize ) { 837 836 #ifdef __STATISTICS__ … … 843 842 } // calloc 844 843 845 // The realloc() function changes the size of the memory block pointed to by ptr to size bytes. The contents will be846 // unchanged in the range from the start of the region up to the minimum of the old and new sizes. If the new size847 // is larger than the old size, the added memory will not be initialized. If ptr is 0p, then the call is848 // equivalent to malloc(size), for all values of size; if size is equal to zero, and ptr is not 0p, then the call849 // is equivalent to free(ptr). Unless ptr is 0p, it must have been returned by an earlier call to malloc(),850 // calloc() or realloc(). If the area pointedto was moved, a free(ptr) is done.844 // Change the size of the memory block pointed to by ptr to size bytes. The contents shall be unchanged in the range 845 // from the start of the region up to the minimum of the old and new sizes. If the new size is larger than the old 846 // size, the added memory shall not be initialized. If ptr is 0p, then the call is equivalent to malloc(size), for 847 // all values of size; if size is equal to zero, and ptr is not 0p, then the call is equivalent to free(ptr). Unless 848 // ptr is 0p, it must have been returned by an earlier call to malloc(), calloc() or realloc(). If the area pointed 849 // to was moved, a free(ptr) is done. 851 850 void * realloc( void * oaddr, size_t size ) { 852 851 #ifdef __STATISTICS__ … … 903 902 } // realloc 904 903 905 // The obsolete function memalign() allocates size bytes and returns a pointer to the allocated memory. The memory906 // a ddress will be a multiple of alignment, which must be a power of two.904 // Allocates size bytes and returns a pointer to the allocated memory. The memory address shall be a multiple of 905 // alignment, which must be a power of two. (obsolete) 907 906 void * memalign( size_t alignment, size_t size ) { 908 907 #ifdef __STATISTICS__ … … 915 914 916 915 917 // The cmemalign() function is the same as calloc() with memory alignment.916 // Same as calloc() with memory alignment. 918 917 void * cmemalign( size_t alignment, size_t noOfElems, size_t elemSize ) { 919 918 #ifdef __STATISTICS__ … … 925 924 } // cmemalign 926 925 927 // The function aligned_alloc() is the same as memalign(), except for the added restriction that size should be a928 // multiple of alignment.926 // Same as memalign(), but ISO/IEC 2011 C11 Section 7.22.2 states: the value of size shall be an integral multiple 927 // of alignment. This requirement is universally ignored. 929 928 void * aligned_alloc( size_t alignment, size_t size ) { 930 929 return memalign( alignment, size ); … … 932 931 933 932 934 // The function posix_memalign() allocates size bytes and places the address of the allocated memory in *memptr. The935 // address of the allocated memory will be a multiple of alignment, which must be a power of two and a multiple of936 // sizeof(void *). If size is 0, then posix_memalign() returns either 0p, or a unique pointer value that can later937 // be successfully passed tofree(3).933 // Allocates size bytes and places the address of the allocated memory in *memptr. The address of the allocated 934 // memory shall be a multiple of alignment, which must be a power of two and a multiple of sizeof(void *). If size 935 // is 0, then posix_memalign() returns either 0p, or a unique pointer value that can later be successfully passed to 936 // free(3). 938 937 int posix_memalign( void ** memptr, size_t alignment, size_t size ) { 939 938 if ( alignment < sizeof(void *) || ! libPow2( alignment ) ) return EINVAL; // check alignment … … 943 942 } // posix_memalign 944 943 945 // The obsolete function valloc() allocates size bytes and returns a pointer to the allocated memory. The memory946 // address will be a multiple of thepage size. It is equivalent to memalign(sysconf(_SC_PAGESIZE),size).944 // Allocates size bytes and returns a pointer to the allocated memory. The memory address shall be a multiple of the 945 // page size. It is equivalent to memalign(sysconf(_SC_PAGESIZE),size). 947 946 void * valloc( size_t size ) { 948 947 return memalign( pageSize, size ); … … 950 949 951 950 952 // The free() function frees the memory space pointed to by ptr, which must have been returned by a previous call to 953 // malloc(), calloc() or realloc(). Otherwise, or if free(ptr) has already been called before, undefined behavior 954 // occurs. If ptr is 0p, no operation is performed. 951 // Same as valloc but rounds size to multiple of page size. 952 void * pvalloc( size_t size ) { 953 return memalign( pageSize, libCeiling( size, pageSize ) ); 954 } // pvalloc 955 956 957 // Frees the memory space pointed to by ptr, which must have been returned by a previous call to malloc(), calloc() 958 // or realloc(). Otherwise, or if free(ptr) has already been called before, undefined behavior occurs. If ptr is 959 // 0p, no operation is performed. 955 960 void free( void * addr ) { 956 961 #ifdef __STATISTICS__ … … 973 978 974 979 975 // The malloc_alignment() function returns the alignment of the allocation.980 // Returns the alignment of the allocation. 976 981 size_t malloc_alignment( void * addr ) { 977 982 if ( unlikely( addr == 0p ) ) return libAlign(); // minimum alignment … … 985 990 986 991 987 // The malloc_zero_fill() function returns true if the allocation is zero filled, i.e., initially allocated by calloc().992 // Returns true if the allocation is zero filled, i.e., initially allocated by calloc(). 988 993 bool malloc_zero_fill( void * addr ) { 989 994 if ( unlikely( addr == 0p ) ) return false; // null allocation is not zero fill … … 996 1001 997 1002 998 // The malloc_usable_size() function returns the number of usable bytes in the block pointed to by ptr, a pointer to999 // a block of memory allocated by malloc(3)or a related function.1003 // Returns the number of usable bytes in the block pointed to by ptr, a pointer to a block of memory allocated by 1004 // malloc or a related function. 1000 1005 size_t malloc_usable_size( void * addr ) { 1001 1006 if ( unlikely( addr == 0p ) ) return 0; // null allocation has 0 size … … 1009 1014 1010 1015 1011 // The malloc_stats() function prints (on default standard error) statistics about memory allocated by malloc(3) and 1012 // related functions. 1016 // Prints (on default standard error) statistics about memory allocated by malloc and related functions. 1013 1017 void malloc_stats( void ) { 1014 1018 #ifdef __STATISTICS__ … … 1018 1022 } // malloc_stats 1019 1023 1020 // The malloc_stats_fd() function changes the file descripter where malloc_stats() writes thestatistics.1024 // Changes the file descripter where malloc_stats() writes statistics. 1021 1025 int malloc_stats_fd( int fd __attribute__(( unused )) ) { 1022 1026 #ifdef __STATISTICS__ … … 1030 1034 1031 1035 1032 // The mallopt() function adjusts parameters that control the behavior of the memory-allocation functions (see 1033 // malloc(3)). The param argument specifies the parameter to be modified, and value specifies the new value for that 1034 // parameter. 1036 // Adjusts parameters that control the behavior of the memory-allocation functions (see malloc). The param argument 1037 // specifies the parameter to be modified, and value specifies the new value for that parameter. 1035 1038 int mallopt( int option, int value ) { 1036 1039 choose( option ) { … … 1043 1046 } // mallopt 1044 1047 1045 // The malloc_trim() function attempts to release free memory at the top of the heap (by calling sbrk(2) with a 1046 // suitable argument). 1048 // Attempt to release free memory at the top of the heap (by calling sbrk with a suitable argument). 1047 1049 int malloc_trim( size_t ) { 1048 1050 return 0; // => impossible to release memory … … 1050 1052 1051 1053 1052 // The malloc_info() function exports an XML string that describes the current state of the memory-allocation1053 // implementation in the caller. The string is printed on the file stream stream. The exported string includes1054 // information about all arenas (see malloc(3)).1054 // Exports an XML string that describes the current state of the memory-allocation implementation in the caller. 1055 // The string is printed on the file stream stream. The exported string includes information about all arenas (see 1056 // malloc). 1055 1057 int malloc_info( int options, FILE * stream ) { 1056 1058 if ( options != 0 ) { errno = EINVAL; return -1; } … … 1059 1061 1060 1062 1061 // The malloc_get_state() function records the current state of all malloc(3) internal bookkeeping variables (but1062 // not the actual contents of the heap or the state of malloc_hook(3) functions pointers). The state is recorded in1063 // a system-dependent opaque data structure dynamically allocated via malloc(3), and a pointer to that data1064 // structure is returned as the function result. (It is the caller's responsibility to free(3)this memory.)1063 // Records the current state of all malloc internal bookkeeping variables (but not the actual contents of the heap 1064 // or the state of malloc_hook functions pointers). The state is recorded in a system-dependent opaque data 1065 // structure dynamically allocated via malloc, and a pointer to that data structure is returned as the function 1066 // result. (The caller must free this memory.) 1065 1067 void * malloc_get_state( void ) { 1066 1068 return 0p; // unsupported … … 1068 1070 1069 1071 1070 // The malloc_set_state() function restores the state of all malloc(3) internal bookkeeping variables to the values1071 // recorded in the opaque datastructure pointed to by state.1072 // Restores the state of all malloc internal bookkeeping variables to the values recorded in the opaque data 1073 // structure pointed to by state. 1072 1074 int malloc_set_state( void * ptr ) { 1073 1075 return 0; // unsupported -
libcfa/src/interpose.cfa
re6cfa8ff r5b544a6 10 10 // Created On : Wed Mar 29 16:10:31 2017 11 11 // Last Modified By : Peter A. Buhr 12 // Last Modified On : Mon Feb 17 10:18:53202013 // Update Count : 1 6612 // Last Modified On : Mon Mar 2 17:37:00 2020 13 // Update Count : 176 14 14 // 15 15 … … 143 143 void abort( const char fmt[], ... ) __attribute__(( format(printf, 1, 2), __nothrow__, __leaf__, __noreturn__ )); 144 144 void abort( bool signalAbort, const char fmt[], ... ) __attribute__(( format(printf, 2, 3), __nothrow__, __leaf__, __noreturn__ )); 145 void __abort( bool signalAbort, const char fmt[], va_list args ) __attribute__(( __nothrow__, __leaf__, __noreturn__ )); 145 146 146 147 extern "C" { … … 152 153 va_list argp; 153 154 va_start( argp, fmt ); 154 abort( false, fmt, argp );155 __abort( false, fmt, argp ); 155 156 va_end( argp ); 156 157 } … … 218 219 } 219 220 220 void abort( bool signalAbort, const char fmt[], ... ) { 221 // Cannot forward va_list. 222 void __abort( bool signalAbort, const char fmt[], va_list args ) { 221 223 void * kernel_data = kernel_abort(); // must be done here to lock down kernel 222 224 int len; … … 228 230 229 231 assert( fmt ); 230 va_list args;231 va_start( args, fmt );232 233 232 len = vsnprintf( abort_text, abort_text_size, fmt, args ); 234 va_end( args );235 233 __cfaabi_bits_write( STDERR_FILENO, abort_text, len ); 236 234 … … 248 246 va_list args; 249 247 va_start( args, fmt ); 250 abort( false, fmt, args ); 248 __abort( false, fmt, args ); 249 // CONTROL NEVER REACHES HERE! 251 250 va_end( args ); 251 } 252 253 void abort( bool signalAbort, const char fmt[], ... ) { 254 va_list args; 255 va_start( args, fmt ); 256 __abort( signalAbort, fmt, args ); 257 // CONTROL NEVER REACHES HERE! 258 va_end( args ); 252 259 } 253 260 -
libcfa/src/iostream.cfa
re6cfa8ff r5b544a6 10 10 // Created On : Wed May 27 17:56:53 2015 11 11 // Last Modified By : Peter A. Buhr 12 // Last Modified On : Thu Feb 20 15:53:23202013 // Update Count : 8 2912 // Last Modified On : Thu Mar 5 15:56:16 2020 13 // Update Count : 834 14 14 // 15 15 … … 536 536 static void base10_128( ostype & os, _Ostream_Manip(T) fmt ) { \ 537 537 if ( fmt.val > UINT64_MAX ) { \ 538 fmt.val /= P10_UINT64; \ 539 base10_128( os, fmt ); /* recursive */ \ 538 base10_128( os, fmt.val / P10_UINT64 ); /* recursive */ \ 540 539 _Ostream_Manip(unsigned long long int) fmt2 @= { (uint64_t)(fmt.val % P10_UINT64), 0, 19, 'u', { .all : 0 } }; \ 541 540 fmt2.flags.nobsdp = true; \ … … 544 543 (ostype &)(os | fmt2); \ 545 544 } else { \ 546 printf( "fmt %c %lld %d\n", fmt.base, fmt.val, fmt.all ); \ 547 (ostype &)(os | fmt); \ 545 printf( "fmt %c %lld %d\n", fmt.base, (unsigned long long int)fmt.val, fmt.all ); \ 546 _Ostream_Manip(SIGNED long long int) x @= { (unsigned long long int)fmt.val, fmt.wd, fmt.pc, fmt.base, { .all : fmt.all } }; \ 547 (ostype &)(os | x); \ 548 548 } /* if */ \ 549 549 } /* base10_128 */ \ … … 552 552 if ( $sepPrt( os ) ) fmt( os, "%s", $sepGetCur( os ) ); \ 553 553 \ 554 if ( f.base == 'b' | f.base == ' o' | f.base == 'x' | f.base == 'X' ) { \554 if ( f.base == 'b' | f.base == 'B' | f.base == 'o' | f.base == 'x' | f.base == 'X' ) { \ 555 555 unsigned long long int msig = (unsigned long long int)(f.val >> 64); \ 556 556 unsigned long long int lsig = (unsigned long long int)(f.val); \ … … 562 562 } else { \ 563 563 fmt2.flags.pad0 = fmt2.flags.nobsdp = true; \ 564 if ( f.base == 'b' ) { \564 if ( f.base == 'b' | f.base == 'B' ) { \ 565 565 if ( f.wd > 64 ) fmt.wd = f.wd - 64; \ 566 566 fmt2.wd = 64; \ -
libcfa/src/stdlib.hfa
re6cfa8ff r5b544a6 10 10 // Created On : Thu Jan 28 17:12:35 2016 11 11 // Last Modified By : Peter A. Buhr 12 // Last Modified On : T ue Feb 4 08:27:01202013 // Update Count : 40 112 // Last Modified On : Thu Mar 5 11:29:06 2020 13 // Update Count : 407 14 14 // 15 15 … … 21 21 #include <stdlib.h> // *alloc, strto*, ato* 22 22 23 // Reduce includes by explicitly defining these routines. 23 24 extern "C" { 24 25 void * memalign( size_t align, size_t size ); // malloc.h 26 void * cmemalign( size_t alignment, size_t noOfElems, size_t elemSize ); // CFA heap 25 27 void * memset( void * dest, int fill, size_t size ); // string.h 26 28 void * memcpy( void * dest, const void * src, size_t size ); // string.h 27 void * cmemalign( size_t alignment, size_t noOfElems, size_t elemSize ); // CFA heap28 29 } // extern "C" 29 30 … … 40 41 41 42 static inline forall( dtype T | sized(T) ) { 42 // C dynamic allocation43 // Cforall safe equivalents, i.e., implicit size specification 43 44 44 45 T * malloc( void ) { … … 72 73 } // posix_memalign 73 74 74 // Cforall dynamic allocation75 // Cforall safe general allocation, fill, resize, array 75 76 76 77 T * alloc( void ) { … … 159 160 160 161 static inline forall( dtype T | sized(T) ) { 161 // data, non-array types162 // Cforall safe initialization/copy, i.e., implicit size specification, non-array types 162 163 T * memset( T * dest, char fill ) { 163 164 return (T *)memset( dest, fill, sizeof(T) ); … … 170 171 171 172 static inline forall( dtype T | sized(T) ) { 172 // data, array types173 // Cforall safe initialization/copy, i.e., implicit size specification, array types 173 174 T * amemset( T dest[], char fill, size_t dim ) { 174 175 return (T *)(void *)memset( dest, fill, dim * sizeof(T) ); // C memset … … 180 181 } // distribution 181 182 182 // allocation/deallocation and constructor/destructor, non-array types183 // Cforall allocation/deallocation and constructor/destructor, non-array types 183 184 forall( dtype T | sized(T), ttype Params | { void ?{}( T &, Params ); } ) T * new( Params p ); 184 185 forall( dtype T | sized(T) | { void ^?{}( T & ); } ) void delete( T * ptr ); 185 186 forall( dtype T, ttype Params | sized(T) | { void ^?{}( T & ); void delete( Params ); } ) void delete( T * ptr, Params rest ); 186 187 187 // allocation/deallocation and constructor/destructor, array types188 // Cforall allocation/deallocation and constructor/destructor, array types 188 189 forall( dtype T | sized(T), ttype Params | { void ?{}( T &, Params ); } ) T * anew( size_t dim, Params p ); 189 190 forall( dtype T | sized(T) | { void ^?{}( T & ); } ) void adelete( size_t dim, T arr[] ); -
src/SynTree/LinkageSpec.cc
re6cfa8ff r5b544a6 9 9 // Author : Rodolfo G. Esteves 10 10 // Created On : Sat May 16 13:22:09 2015 11 // Last Modified By : Peter A. Buhr12 // Last Modified On : Mon Dec 16 15:02:29 201913 // Update Count : 2 811 // Last Modified By : Andrew Beach 12 // Last Modified On : Mon Mar 2 16:13:00 2020 13 // Update Count : 29 14 14 // 15 15 … … 20 20 21 21 #include "LinkageSpec.h" 22 #include "Common/CodeLocation.h" 22 23 #include "Common/SemanticError.h" 23 24 -
src/SynTree/LinkageSpec.h
re6cfa8ff r5b544a6 9 9 // Author : Rodolfo G. Esteves 10 10 // Created On : Sat May 16 13:24:28 2015 11 // Last Modified By : Peter A. Buhr12 // Last Modified On : Mon Dec 16 15:03:43 201913 // Update Count : 2 011 // Last Modified By : Andrew Beach 12 // Last Modified On : Mon Mar 2 16:13:00 2020 13 // Update Count : 21 14 14 // 15 15 … … 18 18 #include <string> 19 19 20 #include "Common/CodeLocation.h" 20 struct CodeLocation; 21 21 22 22 namespace LinkageSpec { -
tools/cfa.nanorc
re6cfa8ff r5b544a6 26 26 27 27 # Escaped Keywords, now Identifiers. 28 color white "` \w+`"28 color white "``\w+" 29 29 30 30 # Operator Names
Note: See TracChangeset
for help on using the changeset viewer.