Changeset f93c50a

doc/theses/andrew_beach_MMath/Makefile

-              r7e7a076
+              rf93c50a
 # The main rule, it does all the tex/latex processing.
 ${BUILD}/${BASE}.dvi: ${RAWSRC} ${FIGTEX} Makefile | ${BUILD}
+${BUILD}/${BASE}.dvi: ${RAWSRC} ${FIGTEX} termhandle.pstex resumhandle.pstex Makefile | ${BUILD}
         ${LATEX} ${BASE}
         ${BIBTEX} ${BUILD}/${BASE}
 …
 ${FIGTEX}: ${BUILD}/%.tex: %.fig | ${BUILD}
         fig2dev -L eepic $< > $@
+%.pstex : %.fig | ${BUILD}
+        fig2dev -L pstex $< > ${BUILD}/$@
+        fig2dev -L pstex_t -p ${BUILD}/$@ $< > ${BUILD}/$@_t
 # Step through dvi & postscript to handle xfig specials.

doc/theses/andrew_beach_MMath/existing.tex

r7e7a076	rf93c50a
49	49	asterisk (@*@) is replaced with an ampersand (@&@);
50	50	this includes cv-qualifiers (\snake{const} and \snake{volatile})
51		~~%\todo{Should I go into even more detail on cv-qualifiers.}~~
52	51	and multiple levels of reference.
53	52

doc/theses/andrew_beach_MMath/features.tex

-              r7e7a076
+              rf93c50a
 \section{Virtuals}
 \label{s:virtuals}
+%\todo{Maybe explain what "virtual" actually means.}
+A common feature in many programming languages is a tool to pair code
+(behaviour) with data.
+In \CFA this is done with the virtual system,
+which allows type information to be abstracted away and recovered, and allows
+operations to be performed on the abstract objects.
 Virtual types and casts are not part of \CFA's EHM nor are they required for
 an EHM.
 …
 Since it is so general, a more specific handler can be defined,
 overriding the default behaviour for the specific exception types.
+%\todo{Examples?}
+For example, consider an error reading a configuration file.
+This is most likely a problem with the configuration file (@config_error@),
+but the function could have been passed the wrong file name (@arg_error@).
+In this case the function could raise one exception and then, if it is
+unhandled, raise the other.
+This is not the usual behaviour for either exception, so the
+default handler is changed locally:
+\begin{cfa}
+{
+        void defaultTerminationHandler(config_error &) {
+                throw (arg_error){arg_vt};
+        }
+        throw (config_error){config_vt};
+}
+\end{cfa}
 \subsection{Resumption}
 …
 the just handled exception came from, and continues executing after it,
 not after the try statement.
+%\todo{Examples?}
+For instance, a resumption used to send messages to the logger may not
+need to be handled at all. Putting the following default handler
+at the global scope can make handling that exception optional by default.
+\begin{cfa}
+void defaultResumptionHandler(log_message &) {
+    // Nothing, it is fine not to handle logging.
+}
+// ... No change at raise sites. ...
+throwResume (log_message){strlit_log, "Begin event processing."};
+\end{cfa}
 \subsubsection{Resumption Marking}
 …
 After a coroutine stack is unwound, control returns to the @resume@ function
 that most recently resumed it. @resume@ reports a
 @CoroutineCancelled@ exception, which contains a references to the cancelled
+@CoroutineCancelled@ exception, which contains a reference to the cancelled
 coroutine and the exception used to cancel it.
 The @resume@ function also takes the \defaultResumptionHandler{} from the

doc/theses/andrew_beach_MMath/implement.tex

-              r7e7a076
+              rf93c50a
 The problem is that a type ID may appear in multiple TUs that compose a
 program (see \autoref{ss:VirtualTable}), so the initial solution would seem
 to be make it external in each translation unit. Hovever, the type ID must
+to be make it external in each translation unit. However, the type ID must
 have a declaration in (exactly) one of the TUs to create the storage.
 No other declaration related to the virtual type has this property, so doing
 …
 \subsection{Virtual Table}
 \label{ss:VirtualTable}
-%\todo{Clarify virtual table type vs. virtual table instance.}
 Each virtual type has a virtual table type that stores its type ID and
 virtual members.
+Each virtual type instance is bound to a table instance that is filled with
+the values of virtual members.
+Both the layout of the fields and their value are decided by the rules given
+An instance of a virtual type is bound to a virtual table instance,
+which holds the values of the virtual members.
+Both the layout of the fields (in the virtual table type)
+and their values (in the virtual table instance) are decided by the rules given
 below.
 …
 of a function's state with @setjmp@ and restoring that snapshot with
 @longjmp@. This approach bypasses the need to know stack details by simply
 reseting to a snapshot of an arbitrary but existing function frame on the
+resetting to a snapshot of an arbitrary but existing function frame on the
 stack. It is up to the programmer to ensure the snapshot is valid when it is
 reset and that all required cleanup from the unwound stacks is performed.
+This approach is fragile and requires extra work in the surrounding code.
+Because it does not automate or check any of this cleanup,
+it is easy to make mistakes, and the cleanup must always be handled manually.
 With respect to the extra work in the surrounding code,
 …
 library that provides tools for stack walking, handler execution, and
 unwinding. What follows is an overview of all the relevant features of
+libunwind needed for this work, and how \CFA uses them to implement exception
+handling.
+libunwind needed for this work.
+Following that is the description of the \CFA code that uses libunwind
+to implement termination.
 \subsection{libunwind Usage}
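
Note on the @setjmp@/@longjmp@ paragraph in the implement.tex hunk above: the fragility it describes can be seen in a small C sketch. All names here (handler_frame, raise_error, worker, run_with_handler) are hypothetical and are not taken from the \CFA runtime. The sketch only shows that the jump restores a saved frame and that any cleanup between the raise and the snapshot has to be written by hand.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the CFA runtime's code. */
static jmp_buf handler_frame;  /* snapshot of the handling function's frame */
static int raised_code = 0;    /* error value passed from raise to handler */
static int cleanups_run = 0;   /* counts manual cleanups, for illustration */

/* Raise: record the error and jump back to the snapshot.
 * Nothing between here and the snapshot is cleaned up automatically. */
static void raise_error(int code) {
    assert(code != 0);
    raised_code = code;
    longjmp(handler_frame, 1);
}

static void worker(int fail) {
    char *buffer = malloc(64);
    if (buffer == NULL) return;
    /* ... use buffer ... */
    /* Cleanup has to happen by hand before the jump; longjmp skips it. */
    free(buffer);
    cleanups_run++;
    if (fail) raise_error(42);
}

/* Returns 0 if worker finished normally, otherwise the raised code. */
int run_with_handler(int fail) {
    if (setjmp(handler_frame) == 0) {
        worker(fail);   /* direct path: setjmp returned 0 */
        return 0;
    }
    return raised_code; /* reached only through longjmp */
}
```

Calling run_with_handler(1) returns 42 through the jump, while run_with_handler(0) returns 0. If the free call in worker were placed after the raise, the buffer would leak silently, which is the kind of unchecked mistake the paragraph warns about.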

doc/theses/andrew_beach_MMath/intro.tex

-              r7e7a076
+              rf93c50a
 it returns control to that function.
 \begin{center}
+\input{termination}
+%\input{termination}
+%
+%\medskip
+\input{termhandle.pstex_t}
+% I hate these diagrams, but I can't access xfig to fix them and they are
+% better than the alternative.
 \end{center}
-%\todo{What does the right half of termination.fig mean?}
 Resumption exception handling searches the stack for a handler and then calls
 …
 that preformed the raise, usually starting after the raise.
 \begin{center}
+\input{resumption}
+%\input{resumption}
+%
+%\medskip
+\input{resumhandle.pstex_t}
+% The other one.
 \end{center}
 …
 unwinding the stack like in termination exception
 handling.\cite{RustPanicMacro}\cite{RustPanicModule}
 Go's panic through is very similar to a termination, except it only supports
+Go's panic though is very similar to a termination, except it only supports
 a catch-all by calling \code{Go}{recover()}, simplifying the interface at
 the cost of flexibility.\cite{Go:2021}
 …
 through multiple functions before it is addressed.
+Here is an example of the pattern in Bash, where commands can only ``return''
+numbers and most output is done through streams of text.
+\begin{lstlisting}[language=bash,escapechar={}]
+# Immediately after running a command:
+case $? in
+0)
+        # Success
+        ;;
+1)
+        # Error Code 1
+        ;;
+2|3)
+        # Error Code 2 or Error Code 3
+        ;;
+# Add more cases as needed.
+esac
+\end{lstlisting}
 \item\emph{Special Return with Global Store}:
 Similar to the error codes pattern but the function itself only returns
 …
 This approach avoids the multiple results issue encountered with straight
+error codes but otherwise has the same disadvantages and more.
+error codes as only a single error value has to be returned,
+but otherwise has the same disadvantages and more.
 Every function that reads or writes to the global store must agree on all
 possible errors and managing it becomes more complex with concurrency.
+This example shows some of what has to be done to robustly handle a C
+standard library function that reports errors this way.
+\begin{lstlisting}[language=C]
+// Make sure to clear the store.
+errno = 0;
+// Now a library function can set the error.
+int handle = open(path_name, flags);
+if (-1 == handle) {
+        switch (errno) {
+        case ENAMETOOLONG:
+                // path_name is a bad argument.
+                break;
+        case ENFILE:
+                // A system resource has been exhausted.
+                break;
+        // And many more...
+        }
+}
+\end{lstlisting}
+% cite open man page?
 \item\emph{Return Union}:
 …
 This pattern is very popular in any functional or semi-functional language
 with primitive support for tagged unions (or algebraic data types).
+% We need listing Rust/rust to format code snippets from it.
+% Rust's \code{rust}{Result<T, E>}
+Return unions can also be expressed as monads (evaluation in a context)
+and often are in languages with special syntax for monadic evaluation,
+such as Haskell's \code{haskell}{do} blocks.
 The main advantage is that an arbitrary object can be used to represent an
 error, so it can include a lot more information than a simple error code.
 …
 execution, and if there aren't primitive tagged unions proper, usage can be
 hard to enforce.
+% We need listing Rust/rust to format code snippets from it.
+% Rust's \code{rust}{Result<T, E>}
+This is a simple example of examining the result of a failing function in
+Haskell, using its \code{haskell}{Either} type.
+Examining \code{haskell}{error} further would likely involve more matching,
+but the type of \code{haskell}{error} is user defined so there are no
+general cases.
+\begin{lstlisting}[language=haskell]
+case failingFunction argA argB of
+    Right value -> -- Use the successful computed value.
+    Left error -> -- Handle the produced error.
+\end{lstlisting}
+Return unions as monads will result in the same code, but can hide most
+of the work to propagate errors in simple cases. The code to actually handle
+the errors, or to interact with other monads (a common case in these
+languages) still has to be written by hand.
+If \code{haskell}{failingFunction} is implemented with two helpers that
+use the same error type, then it can be implemented with a \code{haskell}{do}
+block.
+\begin{lstlisting}[language=haskell]
+failingFunction x y = do
+        z <- helperOne x
+        helperTwo y z
+\end{lstlisting}
 \item\emph{Handler Functions}:
 …
 function calls, but cheaper (constant time) to call,
 they are more suited to more frequent (less exceptional) situations.
+However, in \Cpp and other languages that do not have checked exceptions,
+handler functions can actually be enforced by the type system, making them more reliable.
+This is a more local example in \Cpp, using a function to provide
+a default value for a mapping.
+\begin{lstlisting}[language=C++]
+ValueT Map::key_or_default(KeyT key, ValueT(*make_default)(KeyT)) {
+        ValueT * value = find_value(key);
+        if (nullptr != value) {
+                return *value;
+        } else {
+                return make_default(key);
+        }
+}
+\end{lstlisting}
 \end{itemize}
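
Note on the \emph{Return Union} item in the intro.tex hunk above: the Haskell example relies on primitive tagged unions. As a rough comparison, the same pattern can be sketched in C, where the tag has to be maintained and checked by hand. The types and the function parse_int below are hypothetical illustrations, not part of the thesis or of \CFA.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical types for illustration only. */
enum parse_error { PARSE_EMPTY, PARSE_NOT_A_NUMBER, PARSE_OUT_OF_RANGE };

struct parse_result {
    bool ok;                     /* tag: says which union member is valid */
    union {
        int value;               /* valid when ok is true */
        enum parse_error error;  /* valid when ok is false */
    } as;
};

/* Parse a whole string as a base-10 int, returning value or error. */
struct parse_result parse_int(const char *text) {
    struct parse_result result = { .ok = false };
    if (text == NULL || *text == '\0') {
        result.as.error = PARSE_EMPTY;
        return result;
    }
    char *end = NULL;
    errno = 0;
    long parsed = strtol(text, &end, 10);
    assert(end != NULL);
    if (end == text || *end != '\0') {
        result.as.error = PARSE_NOT_A_NUMBER;
    } else if (errno == ERANGE || parsed < INT_MIN || parsed > INT_MAX) {
        result.as.error = PARSE_OUT_OF_RANGE;
    } else {
        result.ok = true;
        result.as.value = (int)parsed;
    }
    return result;
}
```

A caller is expected to test result.ok before reading either member, but nothing in C stops it from reading as.value on a failed result. This is the enforcement problem the item mentions for languages without proper tagged unions.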

doc/theses/andrew_beach_MMath/performance.tex

-              r7e7a076
+              rf93c50a
 resumption exceptions. Even the older programming languages with resumption
 seem to be notable only for having resumption.
+On the other hand, the functional equivalents to resumption are too new.
+There do not seem to be any standard implementations in well-known
+languages; so far they seem confined to extensions and research languages.
+% There was some maybe interesting comparison to an OCaml extension
+% but I'm not sure how to get that working if it is interesting.
 Instead, resumption is compared to its simulation in other programming
 languages: fixup functions that are explicitly passed into a function.
 …
 \CFA, \Cpp and Java.
 % To be exact, the Match All and Match None cases.
+%\todo{Not true in Python.}
+The most likely explanation is that, since exceptions
+are rarely considered to be the common case, the more optimized languages
+make that case expensive to improve other cases.
+The most likely explanation is that
+the generally faster languages have made ``common cases fast'' at the expense
+of the rarer cases. Since exceptions are considered rare, they are made
+expensive to help speed up common actions, such as entering and leaving try
+statements.
+Python, on the other hand, while generally slower than the other languages,
+uses exceptions more and has not sacrificed their performance.
 In addition, languages with high-level representations have a much
 easier time scanning the stack as there is less to decode.

doc/theses/andrew_beach_MMath/uw-ethesis.bib

r7e7a076	rf93c50a
50	50	author={The Rust Team},
51	51	key={Rust Panic Macro},
52		howpublished={\href{https://doc.rust-lang.org/std/~~panic/index.html}{https://\-doc.rust-lang.org/\-std/\-panic/\-index~~.html}},
	52	howpublished={\href{https://doc.rust-lang.org/std/macro.panic.html}{https://\-doc.rust-lang.org/\-std/\-macro.panic.html}},
53	53	addendum={Accessed 2021-08-31},
54	54	}

libcfa/src/concurrency/clib/cfathread.cfa

-              r7e7a076
+              rf93c50a
 //
+// #define EPOLL_FOR_SOCKETS
 #include "fstream.hfa"
 #include "locks.hfa"
 …
 #include "cfathread.h"
+extern "C" {
+                #include <string.h>
+                #include <errno.h>
+}
 extern void ?{}(processor &, const char[], cluster &, thread$ *);
 extern "C" {
       extern void __cfactx_invoke_thread(void (*main)(void *), void * this);
+        extern int accept4(int sockfd, struct sockaddr *addr, socklen_t *addrlen, int flags);
+}
 extern Time __kernel_get_time();
+extern unsigned register_proc_id( void );
 //================================================================================
+// Thread run y the C Interface
+// Epoll support for sockets
+#if defined(EPOLL_FOR_SOCKETS)
+        extern "C" {
+                #include <sys/epoll.h>
+                #include <sys/resource.h>
+        }
+        static pthread_t master_poller;
+        static int master_epollfd = 0;
+        static size_t poller_cnt = 0;
+        static int * poller_fds = 0p;
+        static struct leaf_poller * pollers = 0p;
+        struct __attribute__((aligned)) fd_info_t {
+                int pollid;
+                size_t rearms;
+        };
+        rlim_t fd_limit = 0;
+        static fd_info_t * volatile * fd_map = 0p;
+        void * master_epoll( __attribute__((unused)) void * args ) {
+                unsigned id = register_proc_id();
+                enum { MAX_EVENTS = 5 };
+                struct epoll_event events[MAX_EVENTS];
+                for() {
+                        int ret = epoll_wait(master_epollfd, events, MAX_EVENTS, -1);
+                        if ( ret < 0 ) {
+                                abort | "Master epoll error: " | strerror(errno);
+                        }
+                        for(i; ret) {
+                                thread$ * thrd = (thread$ *)events[i].data.u64;
+                                unpark( thrd );
+                        }
+                }
+                return 0p;
+        }
+        static inline int epoll_rearm(int epollfd, int fd, uint32_t event) {
+                struct epoll_event eevent;
+                eevent.events = event | EPOLLET | EPOLLONESHOT;
+                eevent.data.u64 = (uint64_t)active_thread();
+                if(0 != epoll_ctl(epollfd, EPOLL_CTL_MOD, fd, &eevent))
+                {
+                        if(errno == ENOENT) return -1;
+                        abort | acquire | "epoll" | epollfd | "ctl rearm" | fd | "error: " | errno | strerror(errno);
+                }
+                park();
+                return 0;
+        }
+        thread leaf_poller {
+                int epollfd;
+        };
+        void ?{}(leaf_poller & this, int fd) { this.epollfd = fd; }
+        void main(leaf_poller & this) {
+                enum { MAX_EVENTS = 1024 };
+                struct epoll_event events[MAX_EVENTS];
+                const int max_retries = 5;
+                int retries = max_retries;
+                struct epoll_event event;
+                event.events = EPOLLIN | EPOLLET | EPOLLONESHOT;
+                event.data.u64 = (uint64_t)&(thread&)this;
+                if(0 != epoll_ctl(master_epollfd, EPOLL_CTL_ADD, this.epollfd, &event))
+                {
+                        abort | "master epoll ctl add leaf: " | errno | strerror(errno);
+                }
+                park();
+                for() {
+                        yield();
+                        int ret = epoll_wait(this.epollfd, events, MAX_EVENTS, 0);
+                        if ( ret < 0 ) {
+                                abort | "Leaf epoll error: " | errno | strerror(errno);
+                        }
+                        if(ret) {
+                                for(i; ret) {
+                                        thread$ * thrd = (thread$ *)events[i].data.u64;
+                                        unpark( thrd, UNPARK_REMOTE );
+                                }
+                        }
+                        else if(0 >= --retries) {
+                                epoll_rearm(master_epollfd, this.epollfd, EPOLLIN);
+                        }
+                }
+        }
+        void setup_epoll( void ) __attribute__(( constructor ));
+        void setup_epoll( void ) {
+                if(master_epollfd) abort | "Master epoll already setup";
+                master_epollfd = epoll_create1(0);
+                if(master_epollfd == -1) {
+                        abort | "failed to create master epoll: " | errno | strerror(errno);
+                }
+                struct rlimit rlim;
+                if(int ret = getrlimit(RLIMIT_NOFILE, &rlim); 0 != ret) {
+                        abort | "failed to get nofile limit: " | errno | strerror(errno);
+                }
+                fd_limit = rlim.rlim_cur;
+                fd_map = alloc(fd_limit);
+                for(i;fd_limit) {
+                        fd_map[i] = 0p;
+                }
+                poller_cnt = 2;
+                poller_fds = alloc(poller_cnt);
+                pollers    = alloc(poller_cnt);
+                for(i; poller_cnt) {
+                        poller_fds[i] = epoll_create1(0);
+                        if(poller_fds[i] == -1) {
+                                abort | "failed to create leaf epoll [" | i | "]: " | errno | strerror(errno);
+                        }
+                        (pollers[i]){ poller_fds[i] };
+                }
+                pthread_attr_t attr;
+                if (int ret = pthread_attr_init(&attr); 0 != ret) {
+                        abort | "failed to create master epoll thread attr: " | ret | strerror(ret);
+                }
+                if (int ret = pthread_create(&master_poller, &attr, master_epoll, 0p); 0 != ret) {
+                        abort | "failed to create master epoll thread: " | ret | strerror(ret);
+                }
+        }
+        static inline int epoll_wait(int fd, uint32_t event) {
+                if(fd_map[fd] >= 1p) {
+                        fd_map[fd]->rearms++;
+                        epoll_rearm(poller_fds[fd_map[fd]->pollid], fd, event);
+                        return 0;
+                }
+                for() {
+                        fd_info_t * expected = 0p;
+                        fd_info_t * sentinel = 1p;
+                        if(__atomic_compare_exchange_n( &(fd_map[fd]), &expected, sentinel, true, __ATOMIC_SEQ_CST, __ATOMIC_RELAXED)) {
+                                struct epoll_event eevent;
+                                eevent.events = event | EPOLLET | EPOLLONESHOT;
+                                eevent.data.u64 = (uint64_t)active_thread();
+                                int id = thread_rand() % poller_cnt;
+                                if(0 != epoll_ctl(poller_fds[id], EPOLL_CTL_ADD, fd, &eevent))
+                                {
+                                        abort | "epoll ctl add" | poller_fds[id] | fd | fd_map[fd] | expected | "error: " | errno | strerror(errno);
+                                }
+                                fd_info_t * ninfo = alloc();
+                                ninfo->pollid = id;
+                                ninfo->rearms = 0;
+                                __atomic_store_n( &fd_map[fd], ninfo, __ATOMIC_SEQ_CST);
+                                park();
+                                return 0;
+                        }
+                        if(expected >= 0) {
+                                fd_map[fd]->rearms++;
+                                epoll_rearm(poller_fds[fd_map[fd]->pollid], fd, event);
+                                return 0;
+                        }
+                        Pause();
+                }
+        }
+#endif
+//================================================================================
+// Thread run by the C Interface
 struct cfathread_object {
 …
         // Mutex
         struct cfathread_mutex {
                 fast_lock impl;
+                linear_backoff_then_block_lock impl;
         };
         int cfathread_mutex_init(cfathread_mutex_t *restrict mut, const cfathread_mutexattr_t *restrict) __attribute__((nonnull (1))) { *mut = new(); return 0; }
 …
         // Condition
         struct cfathread_condition {
                 condition_variable(fast_lock) impl;
+                condition_variable(linear_backoff_then_block_lock) impl;
         };
         int cfathread_cond_init(cfathread_cond_t *restrict cond, const cfathread_condattr_t *restrict) __attribute__((nonnull (1))) { *cond = new(); return 0; }
 …
         // IO operations
         int cfathread_socket(int domain, int type, int protocol) {
+                return socket(domain, type, protocol);
+                return socket(domain, type
+                #if defined(EPOLL_FOR_SOCKETS)
+                        | SOCK_NONBLOCK
+                #endif
+                , protocol);
+        }
         int cfathread_bind(int socket, const struct sockaddr *address, socklen_t address_len) {
 …
         int cfathread_accept(int socket, struct sockaddr *restrict address, socklen_t *restrict address_len) {
+                return cfa_accept4(socket, address, address_len, 0, CFA_IO_LAZY);
+                #if defined(EPOLL_FOR_SOCKETS)
+                        int ret;
+                        for() {
+                                yield();
+                                ret = accept4(socket, address, address_len, SOCK_NONBLOCK);
+                                if(ret >= 0) break;
+                                if(errno != EAGAIN && errno != EWOULDBLOCK) break;
+                                epoll_wait(socket, EPOLLIN);
+                        }
+                        return ret;
+                #else
+                        return cfa_accept4(socket, address, address_len, 0, CFA_IO_LAZY);
+                #endif
+        }
         int cfathread_connect(int socket, const struct sockaddr *address, socklen_t address_len) {
+                return cfa_connect(socket, address, address_len, CFA_IO_LAZY);
+                #if defined(EPOLL_FOR_SOCKETS)
+                        int ret;
+                        for() {
+                                ret = connect(socket, address, address_len);
+                                if(ret >= 0) break;
+                                if(errno != EAGAIN && errno != EWOULDBLOCK) break;
+                                epoll_wait(socket, EPOLLIN);
+                        }
+                        return ret;
+                #else
+                        return cfa_connect(socket, address, address_len, CFA_IO_LAZY);
+                #endif
+        }
 …
         ssize_t cfathread_sendmsg(int socket, const struct msghdr *message, int flags) {
+                return cfa_sendmsg(socket, message, flags, CFA_IO_LAZY);
+                #if defined(EPOLL_FOR_SOCKETS)
+                        ssize_t ret;
+                        __STATS__( false, io.ops.sockwrite++; )
+                        for() {
+                                ret = sendmsg(socket, message, flags);
+                                if(ret >= 0) break;
+                                if(errno != EAGAIN && errno != EWOULDBLOCK) break;
+                                __STATS__( false, io.ops.epllwrite++; )
+                                epoll_wait(socket, EPOLLOUT);
+                        }
+                #else
+                        ssize_t ret = cfa_sendmsg(socket, message, flags, CFA_IO_LAZY);
+                #endif
+                return ret;
+        }
         ssize_t cfathread_write(int fildes, const void *buf, size_t nbyte) {
                 // Use send rather than write for socket since it's faster
+                return cfa_send(fildes, buf, nbyte, 0, CFA_IO_LAZY);
+                #if defined(EPOLL_FOR_SOCKETS)
+                        ssize_t ret;
+                        // __STATS__( false, io.ops.sockwrite++; )
+                        for() {
+                                ret = send(fildes, buf, nbyte, 0);
+                                if(ret >= 0) break;
+                                if(errno != EAGAIN && errno != EWOULDBLOCK) break;
+                                // __STATS__( false, io.ops.epllwrite++; )
+                                epoll_wait(fildes, EPOLLOUT);
+                        }
+                #else
+                        ssize_t ret = cfa_send(fildes, buf, nbyte, 0, CFA_IO_LAZY);
+                #endif
+                return ret;
+        }
 …
                 msg.msg_controllen = 0;
+                ssize_t ret = cfa_recvmsg(socket, &msg, flags, CFA_IO_LAZY);
+                #if defined(EPOLL_FOR_SOCKETS)
+                        ssize_t ret;
+                        yield();
+                        for() {
+                                ret = recvmsg(socket, &msg, flags);
+                                if(ret >= 0) break;
+                                if(errno != EAGAIN && errno != EWOULDBLOCK) break;
+                                epoll_wait(socket, EPOLLIN);
+                        }
+                #else
+                        ssize_t ret = cfa_recvmsg(socket, &msg, flags, CFA_IO_LAZY);
+                #endif
                 if(address_len) *address_len = msg.msg_namelen;
 …
         ssize_t cfathread_read(int fildes, void *buf, size_t nbyte) {
                 // Use recv rather than read for socket since it's faster
+                return cfa_recv(fildes, buf, nbyte, 0, CFA_IO_LAZY);
+        }
+}
+                #if defined(EPOLL_FOR_SOCKETS)
+                        ssize_t ret;
+                        __STATS__( false, io.ops.sockread++; )
+                        yield();
+                        for() {
+                                ret = recv(fildes, buf, nbyte, 0);
+                                if(ret >= 0) break;
+                                if(errno != EAGAIN && errno != EWOULDBLOCK) break;
+                                __STATS__( false, io.ops.epllread++; )
+                                epoll_wait(fildes, EPOLLIN);
+                        }
+                #else
+                        ssize_t ret = cfa_recv(fildes, buf, nbyte, 0, CFA_IO_LAZY);
+                #endif
+                return ret;
+        }
+}
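
Note on the EPOLL_FOR_SOCKETS paths in the cfathread.cfa hunk above: they all share one shape. Each attempts the non-blocking call, and on EAGAIN or EWOULDBLOCK waits for readiness and retries. The plain C sketch below (Linux only) illustrates that shape under several assumptions. The helper names arm_readable, wait_readable and read_retry are hypothetical. The sketch blocks the calling OS thread in epoll_wait instead of parking a user-level thread for a poller to unpark. It also omits EPOLLET and keeps only the one-shot re-arm.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: (re-)arm a one-shot read registration for fd.
 * Tries EPOLL_CTL_MOD first and falls back to EPOLL_CTL_ADD when the
 * fd has not been registered with this epoll instance yet. */
static int arm_readable(int epollfd, int fd) {
    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = fd;
    if (epoll_ctl(epollfd, EPOLL_CTL_MOD, fd, &ev) == 0) return 0;
    if (errno != ENOENT) return -1;
    return epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: block this OS thread until fd is readable. */
static int wait_readable(int epollfd, int fd) {
    if (arm_readable(epollfd, fd) != 0) return -1;
    struct epoll_event out;
    int n;
    do {
        n = epoll_wait(epollfd, &out, 1, -1);
    } while (n < 0 && errno == EINTR);
    return n == 1 ? 0 : -1;
}

/* Try the non-blocking read; on "would block", wait and retry. */
ssize_t read_retry(int epollfd, int fd, void *buf, size_t len) {
    assert(buf != NULL);
    for (;;) {
        ssize_t ret = read(fd, buf, len);
        if (ret >= 0) return ret;
        if (errno != EAGAIN && errno != EWOULDBLOCK) return -1;
        if (wait_readable(epollfd, fd) != 0) return -1;
    }
}
```

The descriptor passed to read_retry is assumed to be in non-blocking mode already, as the sockets in the hunk are when created with SOCK_NONBLOCK. Falling back from MOD to ADD on ENOENT is a design choice of this sketch. The changeset instead tracks registration per descriptor in fd_map.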

libcfa/src/concurrency/invoke.h

-              r7e7a076
+              rf93c50a
                 bool corctx_flag;
-                int last_cpu;
                 //SKULLDUGGERY errno is not save in the thread data structure because returnToKernel appears to be the only function to require saving and restoring it
 …
                 struct cluster * curr_cluster;
                 // preferred ready-queue
+                // preferred ready-queue or CPU
                 unsigned preferred;

libcfa/src/concurrency/io.cfa

-              r7e7a076
+              rf93c50a
         static inline unsigned __flush( struct $io_context & );
         static inline __u32 __release_sqes( struct $io_context & );
         extern void __kernel_unpark( thread$ * thrd );
+        extern void __kernel_unpark( thread$ * thrd, unpark_hint );
         bool __cfa_io_drain( processor * proc ) {
 …
                         __cfadbg_print_safe( io, "Kernel I/O : Syscall completed : cqe %p, result %d for %p\n", &cqe, cqe.res, future );
                         __kernel_unpark( fulfil( *future, cqe.res, false ) );
+                        __kernel_unpark( fulfil( *future, cqe.res, false ), UNPARK_LOCAL );
+                }

libcfa/src/concurrency/kernel.cfa

-              r7e7a076
+              rf93c50a
+                                }
                                         __STATS( if(this->print_halts) __cfaabi_bits_print_safe( STDOUT_FILENO, "PH:%d - %lld 0\n", this->unique_id, rdtscl()); )
+                                __STATS( if(this->print_halts) __cfaabi_bits_print_safe( STDOUT_FILENO, "PH:%d - %lld 0\n", this->unique_id, rdtscl()); )
                                 __cfadbg_print_safe(runtime_core, "Kernel : core %p waiting on eventfd %d\n", this, this->idle);
+                                // __disable_interrupts_hard();
+                                eventfd_t val;
+                                eventfd_read( this->idle, &val );
+                                // __enable_interrupts_hard();
+                                {
+                                        eventfd_t val;
+                                        ssize_t ret = read( this->idle, &val, sizeof(val) );
+                                        if(ret < 0) {
+                                                switch((int)errno) {
+                                                case EAGAIN:
+                                                #if EAGAIN != EWOULDBLOCK
+                                                        case EWOULDBLOCK:
+                                                #endif
+                                                case EINTR:
+                                                        // No need to do anything special here, just assume it's a legitimate wake-up
+                                                        break;
+                                                default:
+                                                        abort( "KERNEL : internal error, read failure on idle eventfd, error(%d) %s.", (int)errno, strerror( (int)errno ) );
+                                                }
+                                        }
+                                }
                                         __STATS( if(this->print_halts) __cfaabi_bits_print_safe( STDOUT_FILENO, "PH:%d - %lld 1\n", this->unique_id, rdtscl()); )
 …
         /* paranoid */ verifyf( thrd_dst->link.next == 0p, "Expected null got %p", thrd_dst->link.next );
         __builtin_prefetch( thrd_dst->context.SP );
-        int curr = __kernel_getcpu();
-        if(thrd_dst->last_cpu != curr) {
-                int64_t l = thrd_dst->last_cpu;
-                int64_t c = curr;
-                int64_t v = (l << 32) | c;
-                __push_stat( __tls_stats(), v, false, "Processor", this );
+        }
-        thrd_dst->last_cpu = curr;
         __cfadbg_print_safe(runtime_core, "Kernel : core %p running thread %p (%s)\n", this, thrd_dst, thrd_dst->self_cor.name);
 …
                 if(unlikely(thrd_dst->preempted != __NO_PREEMPTION)) {
                         // The thread was preempted, reschedule it and reset the flag
                         schedule_thread$( thrd_dst );
+                        schedule_thread$( thrd_dst, UNPARK_LOCAL );
                         break RUNNING;
+                }
 …
 // Scheduler routines
 // KERNEL ONLY
 static void __schedule_thread( thread$ * thrd ) {
+static void __schedule_thread( thread$ * thrd, unpark_hint hint ) {
         /* paranoid */ verify( ! __preemption_enabled() );
         /* paranoid */ verify( ready_schedule_islocked());
 …
         // Dereference the thread now because once we push it, there is not guaranteed it's still valid.
         struct cluster * cl = thrd->curr_cluster;
-        __STATS(bool outside = thrd->last_proc && thrd->last_proc != kernelTLS().this_processor; )
+        __STATS(bool outside = hint == UNPARK_LOCAL && thrd->last_proc && thrd->last_proc != kernelTLS().this_processor; )
         // push the thread to the cluster ready-queue
-        push( cl, thrd, local );
+        push( cl, thrd, hint );
         // variable thrd is no longer safe to use
 …
 }
-void schedule_thread$( thread$ * thrd ) {
+void schedule_thread$( thread$ * thrd, unpark_hint hint ) {
         ready_schedule_lock();
-                __schedule_thread( thrd );
+                __schedule_thread( thrd, hint );
         ready_schedule_unlock();
 }
 …
 }
-void __kernel_unpark( thread$ * thrd ) {
+void __kernel_unpark( thread$ * thrd, unpark_hint hint ) {
         /* paranoid */ verify( ! __preemption_enabled() );
         /* paranoid */ verify( ready_schedule_islocked());
 …
         if(__must_unpark(thrd)) {
                 // Wake lost the race,
-                __schedule_thread( thrd );
+                __schedule_thread( thrd, hint );
         }
 …
 }
-void unpark( thread$ * thrd ) {
+void unpark( thread$ * thrd, unpark_hint hint ) {
         if( !thrd ) return;
 …
                 disable_interrupts();
                         // Wake lost the race,
-                        schedule_thread$( thrd );
+                        schedule_thread$( thrd, hint );
                 enable_interrupts(false);
         }

libcfa/src/concurrency/kernel.hfa

-              r7e7a076
+              rf93c50a
 struct __attribute__((aligned(128))) __timestamp_t {
         volatile unsigned long long tv;
-};
-static inline void  ?{}(__timestamp_t & this) { this.tv = 0; }
+        volatile unsigned long long ma;
+};
+// Aligned timestamps which are used by the relaxed ready queue
+struct __attribute__((aligned(128))) __help_cnts_t {
+        volatile unsigned long long src;
+        volatile unsigned long long dst;
+        volatile unsigned long long tri;
+};
+static inline void  ?{}(__timestamp_t & this) { this.tv = 0; this.ma = 0; }
 static inline void ^?{}(__timestamp_t & this) {}
 …
                 // Array of times
                 __timestamp_t * volatile tscs;
+                // Array of stats
+                __help_cnts_t * volatile help;
                 // Number of lanes (empty or not)

libcfa/src/concurrency/kernel/fwd.hfa

-              r7e7a076
+              rf93c50a
         extern "Cforall" {
+                enum unpark_hint { UNPARK_LOCAL, UNPARK_REMOTE };
                 extern void park( void );
-                extern void unpark( struct thread$ * this );
+                extern void unpark( struct thread$ *, unpark_hint );
+                static inline void unpark( struct thread$ * thrd ) { unpark(thrd, UNPARK_LOCAL); }
                 static inline struct thread$ * active_thread () {
                         struct thread$ * t = publicTLS_get( this_thread );
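
Illustrative note on the new `unpark_hint` parameter: the sketch below is plain C, not Cforall, and is only a rough model of how the hint appears to select a ready-queue lane in the `USE_CPU_WORK_STEALING` push path of this changeset. `thread_desc`, `pick_lane`, `pick_lane_default`, and the value of `SHARD_FACTOR` are hypothetical stand-ins; only the enumerator names come from the diff. Because C has no overloading, the one-argument `unpark` wrapper from `fwd.hfa` is modelled as a separately named function that defaults to `UNPARK_LOCAL`.

```c
#include <assert.h>

/* Enumerators as declared in kernel/fwd.hfa in this changeset. */
enum unpark_hint { UNPARK_LOCAL, UNPARK_REMOTE };

/* Hypothetical value; the real READYQ_SHARD_FACTOR is not shown in the diff. */
#define SHARD_FACTOR 2u

/* Hypothetical stand-in for thread$; only the field used here is modelled. */
struct thread_desc { unsigned preferred; };

/* Choose a lane: a local hint stays near the caller's lanes (local_start),
 * a remote hint goes to the lanes the thread last ran on (preferred). */
static unsigned pick_lane(const struct thread_desc *thrd, enum unpark_hint hint,
                          unsigned local_start, unsigned r) {
    unsigned base = local_start;
    if (hint == UNPARK_REMOTE) base = thrd->preferred * SHARD_FACTOR;
    return base + (r % SHARD_FACTOR);
}

/* Models the inline one-argument wrapper: the hint defaults to UNPARK_LOCAL. */
static unsigned pick_lane_default(const struct thread_desc *thrd,
                                  unsigned local_start, unsigned r) {
    return pick_lane(thrd, UNPARK_LOCAL, local_start, r);
}
```

Keeping the hint as an explicit enum rather than a `bool` makes call sites such as `schedule_thread$( thrd, UNPARK_LOCAL )` self-describing, which seems to be the motivation for replacing the old `bool local` argument.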

libcfa/src/concurrency/kernel/startup.cfa

-              r7e7a076
+              rf93c50a
         __cfadbg_print_safe(runtime_core, "Kernel : Main cluster ready\n");
+        // Construct the processor context of the main processor
+        void ?{}(processorCtx_t & this, processor * proc) {
+                (this.__cor){ "Processor" };
+                this.__cor.starter = 0p;
+                this.proc = proc;
+        }
+        void ?{}(processor & this) with( this ) {
+                ( this.terminated ){};
+                ( this.runner ){};
+                init( this, "Main Processor", *mainCluster, 0p );
+                kernel_thread = pthread_self();
+                runner{ &this };
+                __cfadbg_print_safe(runtime_core, "Kernel : constructed main processor context %p\n", &runner);
+        }
+        // Initialize the main processor and the main processor ctx
+        // (the coroutine that contains the processing control flow)
+        mainProcessor = (processor *)&storage_mainProcessor;
+        (*mainProcessor){};
+        register_tls( mainProcessor );
         // Start by initializing the main thread
         // SKULLDUGGERY: the mainThread steals the process main thread
 …
         __cfadbg_print_safe(runtime_core, "Kernel : Main thread ready\n");
-        // Construct the processor context of the main processor
-        void ?{}(processorCtx_t & this, processor * proc) {
-                (this.__cor){ "Processor" };
-                this.__cor.starter = 0p;
-                this.proc = proc;
-        }
-        void ?{}(processor & this) with( this ) {
-                ( this.terminated ){};
-                ( this.runner ){};
-                init( this, "Main Processor", *mainCluster, 0p );
-                kernel_thread = pthread_self();
-                runner{ &this };
-                __cfadbg_print_safe(runtime_core, "Kernel : constructed main processor context %p\n", &runner);
-        }
-        // Initialize the main processor and the main processor ctx
-        // (the coroutine that contains the processing control flow)
-        mainProcessor = (processor *)&storage_mainProcessor;
-        (*mainProcessor){};
-        register_tls( mainProcessor );
-        mainThread->last_cpu = __kernel_getcpu();
         //initialize the global state variables
         __cfaabi_tls.this_processor = mainProcessor;
 …
         // Add the main thread to the ready queue
         // once resume is called on mainProcessor->runner the mainThread needs to be scheduled like any normal thread
-        schedule_thread$(mainThread);
+        schedule_thread$(mainThread, UNPARK_LOCAL);
         // SKULLDUGGERY: Force a context switch to the main processor to set the main thread's context to the current UNIX
 …
         link.next = 0p;
         link.ts   = -1llu;
-        preferred = -1u;
+        preferred = ready_queue_new_preferred();
         last_proc = 0p;
         #if defined( __CFA_WITH_VERIFY__ )

libcfa/src/concurrency/kernel_private.hfa

-              r7e7a076
+              rf93c50a
 }
-void schedule_thread$( thread$ * ) __attribute__((nonnull (1)));
+void schedule_thread$( thread$ *, unpark_hint hint ) __attribute__((nonnull (1)));
 extern bool __preemption_enabled();
 …
 // push thread onto a ready queue for a cluster
 // returns true if the list was previously empty, false otherwise
-__attribute__((hot)) void push(struct cluster * cltr, struct thread$ * thrd, bool local);
+__attribute__((hot)) void push(struct cluster * cltr, struct thread$ * thrd, unpark_hint hint);
 //-----------------------------------------------------------------------
 …
 //-----------------------------------------------------------------------
+// get preferred ready for new thread
+unsigned ready_queue_new_preferred();
+//-----------------------------------------------------------------------
 // Increase the width of the ready queue (number of lanes) by 4
 void ready_queue_grow  (struct cluster * cltr);

libcfa/src/concurrency/ready_queue.cfa

-              r7e7a076
+              rf93c50a
         #define __kernel_rseq_unregister rseq_unregister_current_thread
 #elif defined(CFA_HAVE_LINUX_RSEQ_H)
-        void __kernel_raw_rseq_register  (void);
-        void __kernel_raw_rseq_unregister(void);
+        static void __kernel_raw_rseq_register  (void);
+        static void __kernel_raw_rseq_unregister(void);
         #define __kernel_rseq_register __kernel_raw_rseq_register
 …
 // Cforall Ready Queue used for scheduling
 //=======================================================================
+unsigned long long moving_average(unsigned long long nval, unsigned long long oval) {
+        const unsigned long long tw = 16;
+        const unsigned long long nw = 4;
+        const unsigned long long ow = tw - nw;
+        return ((nw * nval) + (ow * oval)) / tw;
+}
 void ?{}(__ready_queue_t & this) with (this) {
         #if defined(USE_CPU_WORK_STEALING)
 …
                 lanes.data = alloc( lanes.count );
                 lanes.tscs = alloc( lanes.count );
+                lanes.help = alloc( cpu_info.hthrd_count );
                 for( idx; (size_t)lanes.count ) {
                         (lanes.data[idx]){};
                         lanes.tscs[idx].tv = rdtscl();
+                        lanes.tscs[idx].ma = rdtscl();
+                }
+                for( idx; (size_t)cpu_info.hthrd_count ) {
+                        lanes.help[idx].src = 0;
+                        lanes.help[idx].dst = 0;
+                        lanes.help[idx].tri = 0;
+                }
         #else
                 lanes.data  = 0p;
                 lanes.tscs  = 0p;
+                lanes.help  = 0p;
                 lanes.count = 0;
         #endif
 …
         free(lanes.data);
         free(lanes.tscs);
+        free(lanes.help);
 }
 //-----------------------------------------------------------------------
 #if defined(USE_CPU_WORK_STEALING)
-        __attribute__((hot)) void push(struct cluster * cltr, struct thread$ * thrd, bool push_local) with (cltr->ready_queue) {
+        __attribute__((hot)) void push(struct cluster * cltr, struct thread$ * thrd, unpark_hint hint) with (cltr->ready_queue) {
                 __cfadbg_print_safe(ready_queue, "Kernel : Pushing %p on cluster %p\n", thrd, cltr);
                 processor * const proc = kernelTLS().this_processor;
-                const bool external = !push_local || (!proc) || (cltr != proc->cltr);
+                const bool external = (!proc) || (cltr != proc->cltr);
+                // Figure out the current cpu and make sure it is valid
                 const int cpu = __kernel_getcpu();
                 /* paranoid */ verify(cpu >= 0);
 …
                 /* paranoid */ verify(cpu * READYQ_SHARD_FACTOR < lanes.count);
-                const cpu_map_entry_t & map = cpu_info.llc_map[cpu];
+                // Figure out where thread was last time and make sure it's
+                /* paranoid */ verify(thrd->preferred >= 0);
+                /* paranoid */ verify(thrd->preferred < cpu_info.hthrd_count);
+                /* paranoid */ verify(thrd->preferred * READYQ_SHARD_FACTOR < lanes.count);
+                const int prf = thrd->preferred * READYQ_SHARD_FACTOR;
+                const cpu_map_entry_t & map;
+                choose(hint) {
+                        case UNPARK_LOCAL : &map = &cpu_info.llc_map[cpu];
+                        case UNPARK_REMOTE: &map = &cpu_info.llc_map[prf];
+                }
                 /* paranoid */ verify(map.start * READYQ_SHARD_FACTOR < lanes.count);
                 /* paranoid */ verify(map.self * READYQ_SHARD_FACTOR < lanes.count);
 …
                         if(unlikely(external)) { r = __tls_rand(); }
                         else { r = proc->rdq.its++; }
-                        i = start + (r % READYQ_SHARD_FACTOR);
+                        choose(hint) {
+                                case UNPARK_LOCAL : i = start + (r % READYQ_SHARD_FACTOR);
+                                case UNPARK_REMOTE: i = prf   + (r % READYQ_SHARD_FACTOR);
+                        }
                         // If we can't lock it retry
                 } while( !__atomic_try_acquire( &lanes.data[i].lock ) );
 …
                 processor * const proc = kernelTLS().this_processor;
                 const int start = map.self * READYQ_SHARD_FACTOR;
+                const unsigned long long ctsc = rdtscl();
                 // Did we already have a help target
                 if(proc->rdq.target == -1u) {
+                        // if We don't have a
-                        unsigned long long min = ts(lanes.data[start]);
+                        unsigned long long max = 0;
                         for(i; READYQ_SHARD_FACTOR) {
-                                unsigned long long tsc = ts(lanes.data[start + i]);
-                                if(tsc < min) min = tsc;
-                        }
-                        proc->rdq.cutoff = min;
+                                unsigned long long tsc = moving_average(ctsc - ts(lanes.data[start + i]), lanes.tscs[start + i].ma);
+                                if(tsc > max) max = tsc;
+                        }
+                         proc->rdq.cutoff = (max + 2 * max) / 2;
                         /* paranoid */ verify(lanes.count < 65536); // The following code assumes max 65536 cores.
                         /* paranoid */ verify(map.count < 65536); // The following code assumes max 65536 cores.
-                        if(0 == (__tls_rand() % 10_000)) {
+                        if(0 == (__tls_rand() % 100)) {
                                 proc->rdq.target = __tls_rand() % lanes.count;
                         } else {
 …
+                }
                 else {
-                        const unsigned long long bias = 0; //2_500_000_000;
-                        const unsigned long long cutoff = proc->rdq.cutoff > bias ? proc->rdq.cutoff - bias : proc->rdq.cutoff;
+                        unsigned long long max = 0;
+                        for(i; READYQ_SHARD_FACTOR) {
+                                unsigned long long tsc = moving_average(ctsc - ts(lanes.data[start + i]), lanes.tscs[start + i].ma);
+                                if(tsc > max) max = tsc;
+                        }
+                        const unsigned long long cutoff = (max + 2 * max) / 2;
+                        {
                                 unsigned target = proc->rdq.target;
                                 proc->rdq.target = -1u;
-                                if(lanes.tscs[target].tv < cutoff && ts(lanes.data[target]) < cutoff) {
+                                lanes.help[target / READYQ_SHARD_FACTOR].tri++;
+                                if(moving_average(ctsc - lanes.tscs[target].tv, lanes.tscs[target].ma) > cutoff) {
                                         thread$ * t = try_pop(cltr, target __STATS(, __tls_stats()->ready.pop.help));
                                         proc->rdq.last = target;
                                         if(t) return t;
+                                        else proc->rdq.target = -1u;
+                                }
+                                else proc->rdq.target = -1u;
+                        }
 …
         }
-        __attribute__((hot)) void push(struct cluster * cltr, struct thread$ * thrd, bool push_local) with (cltr->ready_queue) {
+        __attribute__((hot)) void push(struct cluster * cltr, struct thread$ * thrd, unpark_hint hint) with (cltr->ready_queue) {
                 __cfadbg_print_safe(ready_queue, "Kernel : Pushing %p on cluster %p\n", thrd, cltr);
-                const bool external = !push_local || (!kernelTLS().this_processor) || (cltr != kernelTLS().this_processor->cltr);
+                const bool external = (hint != UNPARK_LOCAL) || (!kernelTLS().this_processor) || (cltr != kernelTLS().this_processor->cltr);
                 /* paranoid */ verify(external || kernelTLS().this_processor->rdq.id < lanes.count );
 …
 #endif
 #if defined(USE_WORK_STEALING)
-        __attribute__((hot)) void push(struct cluster * cltr, struct thread$ * thrd, bool push_local) with (cltr->ready_queue) {
+        __attribute__((hot)) void push(struct cluster * cltr, struct thread$ * thrd, unpark_hint hint) with (cltr->ready_queue) {
                 __cfadbg_print_safe(ready_queue, "Kernel : Pushing %p on cluster %p\n", thrd, cltr);
                 // #define USE_PREFERRED
                 #if !defined(USE_PREFERRED)
-                const bool external = !push_local || (!kernelTLS().this_processor) || (cltr != kernelTLS().this_processor->cltr);
+                const bool external = (hint != UNPARK_LOCAL) || (!kernelTLS().this_processor) || (cltr != kernelTLS().this_processor->cltr);
                 /* paranoid */ verify(external || kernelTLS().this_processor->rdq.id < lanes.count );
                 #else
                         unsigned preferred = thrd->preferred;
-                        const bool external = push_local || (!kernelTLS().this_processor) || preferred == -1u || thrd->curr_cluster != cltr;
+                        const bool external = (hint != UNPARK_LOCAL) || (!kernelTLS().this_processor) || preferred == -1u || thrd->curr_cluster != cltr;
                         /* paranoid */ verifyf(external || preferred < lanes.count, "Invalid preferred queue %u for %u lanes", preferred, lanes.count );
 …
         // Actually pop the list
         struct thread$ * thrd;
+        unsigned long long tsc_before = ts(lane);
         unsigned long long tsv;
         [thrd, tsv] = pop(lane);
 …
         __STATS( stats.success++; )
-        #if defined(USE_WORK_STEALING)
+        #if defined(USE_WORK_STEALING) || defined(USE_CPU_WORK_STEALING)
+                unsigned long long now = rdtscl();
                 lanes.tscs[w].tv = tsv;
+                lanes.tscs[w].ma = moving_average(now > tsc_before ? now - tsc_before : 0, lanes.tscs[w].ma);
         #endif
-        thrd->preferred = w;
+        #if defined(USE_CPU_WORK_STEALING)
+                thrd->preferred = w / READYQ_SHARD_FACTOR;
+        #else
+                thrd->preferred = w;
+        #endif
         // return the popped thread
 …
 //-----------------------------------------------------------------------
+// get preferred ready for new thread
+unsigned ready_queue_new_preferred() {
+        unsigned pref = 0;
+        if(struct thread$ * thrd = publicTLS_get( this_thread )) {
+                pref = thrd->preferred;
+        }
+        else {
+                #if defined(USE_CPU_WORK_STEALING)
+                        pref = __kernel_getcpu();
+                #endif
+        }
+        #if defined(USE_CPU_WORK_STEALING)
+                /* paranoid */ verify(pref >= 0);
+                /* paranoid */ verify(pref < cpu_info.hthrd_count);
+        #endif
+        return pref;
+}
+//-----------------------------------------------------------------------
 // Check that all the intrusive queues in the data structure are still consistent
 static void check( __ready_queue_t & q ) with (q) {
 …
         extern void __enable_interrupts_hard();
-        void __kernel_raw_rseq_register  (void) {
+        static void __kernel_raw_rseq_register  (void) {
                 /* paranoid */ verify( __cfaabi_rseq.cpu_id == RSEQ_CPU_ID_UNINITIALIZED );
 …
         }
-        void __kernel_raw_rseq_unregister(void) {
+        static void __kernel_raw_rseq_unregister(void) {
                 /* paranoid */ verify( __cfaabi_rseq.cpu_id >= 0 );
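
Illustrative note on the new helping heuristic: the `moving_average` routine added in `ready_queue.cfa` gives the new sample a weight of 4/16 and the previous average a weight of 12/16, using integer division. The plain-C sketch below restates that arithmetic so it can be checked in isolation. `help_cutoff` is a hypothetical name for the expression `(max + 2 * max) / 2` that the changeset writes inline; as written in the diff it evaluates to 1.5 times the largest averaged age, which is an observation about the code as shown rather than a statement of the author's intent.

```c
#include <assert.h>

/* Same weights as the diff: total weight 16, new sample 4, old average 12. */
static unsigned long long moving_average(unsigned long long nval,
                                         unsigned long long oval) {
    const unsigned long long tw = 16;      /* total weight */
    const unsigned long long nw = 4;       /* weight of the new value */
    const unsigned long long ow = tw - nw; /* weight of the old average */
    return ((nw * nval) + (ow * oval)) / tw;
}

/* Hypothetical helper naming the inline cutoff expression from the diff. */
static unsigned long long help_cutoff(unsigned long long max_age) {
    return (max_age + 2 * max_age) / 2;
}
```

For example, a new sample of 1600 against a previous average of 800 gives (4 * 1600 + 12 * 800) / 16 = 1000, and a largest averaged age of 1000 gives a cutoff of 1500. Small samples can be lost to truncation: a new sample of 3 against an average of 0 yields 0.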

libcfa/src/concurrency/ready_subqueue.hfa

-              r7e7a076
+              rf93c50a
         // Get the relevant nodes locally
-        unsigned long long ts = this.anchor.ts;
         thread$ * node = this.anchor.next;
         this.anchor.next = node->link.next;
 …
         /* paranoid */ verify( node->link.ts   != 0  );
         /* paranoid */ verify( this.anchor.ts  != 0  );
-        return [node, ts];
+        return [node, this.anchor.ts];
 }

libcfa/src/concurrency/stats.cfa

-              r7e7a076
+              rf93c50a
                         stats->io.calls.completed   = 0;
                         stats->io.calls.errors.busy = 0;
+                        stats->io.ops.sockread      = 0;
+                        stats->io.ops.epllread      = 0;
+                        stats->io.ops.sockwrite     = 0;
+                        stats->io.ops.epllwrite     = 0;
                 #endif
 …
                         tally_one( &cltr->io.calls.completed  , &proc->io.calls.completed   );
                         tally_one( &cltr->io.calls.errors.busy, &proc->io.calls.errors.busy );
+                        tally_one( &cltr->io.ops.sockread     , &proc->io.ops.sockread      );
+                        tally_one( &cltr->io.ops.epllread     , &proc->io.ops.epllread      );
+                        tally_one( &cltr->io.ops.sockwrite    , &proc->io.ops.sockwrite     );
+                        tally_one( &cltr->io.ops.epllwrite    , &proc->io.ops.epllwrite     );
                 #endif
         }
 …
                                      | " - cmp " | eng3(io.calls.drain) | "/" | eng3(io.calls.completed) | "(" | ws(3, 3, avgcomp) | "/drain)"
                                      | " - " | eng3(io.calls.errors.busy) | " EBUSY";
+                                sstr | "- ops blk: "
+                                     |   " sk rd: " | eng3(io.ops.sockread)  | "epll: " | eng3(io.ops.epllread)
+                                     |   " sk wr: " | eng3(io.ops.sockwrite) | "epll: " | eng3(io.ops.epllwrite);
                                 sstr | nl;
                         }

libcfa/src/concurrency/stats.hfa

-              r7e7a076
+              rf93c50a
                                 volatile uint64_t sleeps;
                         } poller;
+                        struct {
+                                volatile uint64_t sockread;
+                                volatile uint64_t epllread;
+                                volatile uint64_t sockwrite;
+                                volatile uint64_t epllwrite;
+                        } ops;
                 };
         #endif

libcfa/src/concurrency/thread.cfa

-              r7e7a076
+              rf93c50a
 #include "invoke.h"
+uint64_t thread_rand();
 //-----------------------------------------------------------------------------
 // Thread ctors and dtors
 …
         preempted = __NO_PREEMPTION;
         corctx_flag = false;
-        disable_interrupts();
-        last_cpu = __kernel_getcpu();
-        enable_interrupts();
         curr_cor = &self_cor;
         self_mon.owner = &this;
 …
         link.next = 0p;
         link.ts   = -1llu;
-        preferred = -1u;
+        preferred = ready_queue_new_preferred();
         last_proc = 0p;
         #if defined( __CFA_WITH_VERIFY__ )
 …
         /* paranoid */ verify( this_thrd->context.SP );
-        schedule_thread$( this_thrd );
+        schedule_thread$( this_thrd, UNPARK_LOCAL );
         enable_interrupts();
 }

libcfa/src/containers/string_res.cfa

-              r7e7a076
+              rf93c50a
-#ifdef VbyteDebug
-extern HandleNode *HeaderPtr;
+// DON'T COMMIT:
+// #define VbyteDebug
+#ifdef VbyteDebug
+HandleNode *HeaderPtr;
 #endif // VbyteDebug
 …
 VbyteHeap HeapArea;
+VbyteHeap * DEBUG_string_heap = & HeapArea;
+size_t DEBUG_string_bytes_avail_until_gc( VbyteHeap * heap ) {
+    return ((char*)heap->ExtVbyte) - heap->EndVbyte;
+}
+const char * DEBUG_string_heap_start( VbyteHeap * heap ) {
+    return heap->StartVbyte;
+}
 // Returns the size of the string in bytes
 …
 void assign(string_res &this, const char* buffer, size_t bsize) {
+    // traverse the incumbent share-edit set (SES) to recover the range of a base string to which `this` belongs
+    string_res * shareEditSetStartPeer = & this;
+    string_res * shareEditSetEndPeer = & this;
+    for (string_res * editPeer = this.shareEditSet_next; editPeer != &this; editPeer = editPeer->shareEditSet_next) {
+        if ( editPeer->Handle.s < shareEditSetStartPeer->Handle.s ) {
+            shareEditSetStartPeer = editPeer;
+        }
+        if ( shareEditSetEndPeer->Handle.s + shareEditSetEndPeer->Handle.lnth < editPeer->Handle.s + editPeer->Handle.lnth) {
+            shareEditSetEndPeer = editPeer;
+        }
+    }
+    // full string is from start of shareEditSetStartPeer thru end of shareEditSetEndPeer
+    // `this` occurs in the middle of it, to be replaced
+    // build up the new text in `pasting`
+    string_res pasting = {
+        shareEditSetStartPeer->Handle.s,                   // start of SES
+        this.Handle.s - shareEditSetStartPeer->Handle.s }; // length of SES, before this
+    append( pasting,
+        buffer,                                            // start of replacement for this
+        bsize );                                           // length of replacement for this
+    append( pasting,
+        this.Handle.s + this.Handle.lnth,                  // start of SES after this
+        shareEditSetEndPeer->Handle.s + shareEditSetEndPeer->Handle.lnth -
+        (this.Handle.s + this.Handle.lnth) );              // length of SES, after this
+    // The above string building can trigger compaction.
+    // The reference points (that are arguments of the string building) may move during that building.
+    // From this point on, they are stable.
+    // So now, capture their values for use in the overlap cases, below.
+    // Do not factor these definitions with the arguments used above.
+    char * beforeBegin = shareEditSetStartPeer->Handle.s;
+    size_t beforeLen = this.Handle.s - beforeBegin;
     char * afterBegin = this.Handle.s + this.Handle.lnth;
-    char * shareEditSetStart = this.Handle.s;
-    char * shareEditSetEnd = afterBegin;
-    for (string_res * editPeer = this.shareEditSet_next; editPeer != &this; editPeer = editPeer->shareEditSet_next) {
-        shareEditSetStart = min( shareEditSetStart, editPeer->Handle.s );
-        shareEditSetEnd = max( shareEditSetStart, editPeer->Handle.s + editPeer->Handle.lnth);
-    }
-    char * beforeBegin = shareEditSetStart;
-    size_t beforeLen = this.Handle.s - shareEditSetStart;
-    size_t afterLen = shareEditSetEnd - afterBegin;
-    string_res pasting = { beforeBegin, beforeLen };
-    append(pasting, buffer, bsize);
-    string_res after = { afterBegin, afterLen }; // juxtaposed with in-progress pasting
-    pasting += after;                        // optimized case
+    size_t afterLen = shareEditSetEndPeer->Handle.s + shareEditSetEndPeer->Handle.lnth - afterBegin;
     size_t oldLnth = this.Handle.lnth;
 …
     for (string_res * p = this.shareEditSet_next; p != &this; p = p->shareEditSet_next) {
         assert (p->Handle.s >= beforeBegin);
-        if ( p->Handle.s < beforeBegin + beforeLen ) {
-            // p starts before the edit
-            if ( p->Handle.s + p->Handle.lnth < beforeBegin + beforeLen ) {
+        if ( p->Handle.s >= afterBegin ) {
+            assert ( p->Handle.s <= afterBegin + afterLen );
+            assert ( p->Handle.s + p->Handle.lnth <= afterBegin + afterLen );
+            // p starts after the edit
+            // take start and end as end-anchored
+            size_t startOffsetFromEnd = afterBegin + afterLen - p->Handle.s;
+            p->Handle.s = limit - startOffsetFromEnd;
+            // p->Handle.lnth unaffected
+        } else if ( p->Handle.s <= beforeBegin + beforeLen ) {
+            // p starts before, or at the start of, the edit
+            if ( p->Handle.s + p->Handle.lnth <= beforeBegin + beforeLen ) {
                 // p ends before the edit
                 // take end as start-anchored too
                 // p->Handle.lnth unaffected
             } else if ( p->Handle.s + p->Handle.lnth < afterBegin ) {
-                // p ends during the edit
+                // p ends during the edit; p does not include the last character replaced
                 // clip end of p to end at start of edit
                 p->Handle.lnth = beforeLen - ( p->Handle.s - beforeBegin );
 …
             size_t startOffsetFromStart = p->Handle.s - beforeBegin;
             p->Handle.s = pasting.Handle.s + startOffsetFromStart;
-        } else if ( p->Handle.s < afterBegin ) {
+        } else {
+            assert ( p->Handle.s < afterBegin );
             // p starts during the edit
             assert( p->Handle.s + p->Handle.lnth >= beforeBegin + beforeLen );
             if ( p->Handle.s + p->Handle.lnth < afterBegin ) {
-                // p ends during the edit
+                // p ends during the edit; p does not include the last character replaced
                 // set p to empty string at start of edit
                 p->Handle.s = this.Handle.s;
                 p->Handle.lnth = 0;
             } else {
-                // p ends after the edit
+                // p includes the end of the edit
                 // clip start of p to start at end of edit
+                int charsToClip = afterBegin - p->Handle.s;
                 p->Handle.s = this.Handle.s + this.Handle.lnth;
+                p->Handle.lnth += this.Handle.lnth;
+                p->Handle.lnth -= oldLnth;
+                p->Handle.lnth -= charsToClip;
+            }
-        } else {
-            assert ( p->Handle.s <= afterBegin + afterLen );
-            assert ( p->Handle.s + p->Handle.lnth <= afterBegin + afterLen );
-            // p starts after the edit
-            // take start and end as end-anchored
-            size_t startOffsetFromEnd = afterBegin + afterLen - p->Handle.s;
-            p->Handle.s = limit - startOffsetFromEnd;
-            // p->Handle.lnth unaffected
+        }
         MoveThisAfter( p->Handle, pasting.Handle );     // move substring handle to maintain sorted order by string position
 …
     } // if
 #ifdef VbyteDebug
-    serr | "exit:MoveThisAfter";
+    {
         serr | "HandleList:";
 …
                 serr | n->s[i];
             } // for
-            serr | "\" flink:" | n->flink | " blink:" | n->blink;
+            serr | "\" flink:" | n->flink | " blink:" | n->blink | nl;
         } // for
         serr | nlOn;
+    }
+    serr | "exit:MoveThisAfter";
 #endif // VbyteDebug
 } // MoveThisAfter
 …
 //######################### VbyteHeap #########################
-#ifdef VbyteDebug
-HandleNode *HeaderPtr = 0p;
-#endif // VbyteDebug
 // Move characters from one location in the byte-string area to another. The routine handles the following situations:
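
Illustrative note on the share-edit-set traversal in `assign`: the new code walks the circular `shareEditSet_next` list to find the peer with the lowest start and the peer with the highest end, which together bound the base string. The plain-C sketch below shows that traversal under simplifying assumptions: `handle`, `ses_range`, and `ses_extent` are hypothetical names, and only the `s`, `lnth`, and next-pointer fields are modelled. It keeps the running maximum in the end variable itself, whereas the removed lines computed `max( shareEditSetStart, ... )`, which looks like it discarded the running end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced model of a string handle in a circular peer list. */
struct handle {
    const char *s;        /* start of this handle's text */
    size_t lnth;          /* length of this handle's text */
    struct handle *next;  /* next peer in the circular share-edit set */
};

/* Half-open range [start, end) covered by the whole share-edit set. */
struct ses_range { const char *start; const char *end; };

/* Walk every peer once, tracking the lowest start and the highest end.
 * All handles are assumed to point into the same underlying buffer. */
static struct ses_range ses_extent(const struct handle *self) {
    struct ses_range r = { self->s, self->s + self->lnth };
    for (const struct handle *p = self->next; p != self; p = p->next) {
        if (p->s < r.start) r.start = p->s;
        if (p->s + p->lnth > r.end) r.end = p->s + p->lnth;
    }
    return r;
}
```

With the extent known, the text before the edited handle is `[start, self->s)` and the text after it is `[self->s + self->lnth, end)`, which matches the two pieces the changeset appends around the replacement buffer.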

libcfa/src/containers/string_res.hfa

-              r7e7a076
+              rf93c50a
 void ?{}( HandleNode &, VbyteHeap & );          // constructor for nodes in the handle list
 void ^?{}( HandleNode & );                      // destructor for handle nodes
+extern VbyteHeap * DEBUG_string_heap;
+size_t DEBUG_string_bytes_avail_until_gc( VbyteHeap * heap );
+const char * DEBUG_string_heap_start( VbyteHeap * heap );

libcfa/src/fstream.cfa

-              r7e7a076
+              rf93c50a
 // Created On       : Wed May 27 17:56:53 2015
 // Last Modified By : Peter A. Buhr
-// Last Modified On : Thu Jul 29 22:34:10 2021
-// Update Count     : 454
+// Last Modified On : Tue Sep 21 21:51:38 2021
+// Update Count     : 460
 //
 …
 #define IO_MSG "I/O error: "
-void ?{}( ofstream & os, void * file ) {
-        os.file$ = file;
-        os.sepDefault$ = true;
-        os.sepOnOff$ = false;
-        os.nlOnOff$ = true;
-        os.prt$ = false;
-        os.sawNL$ = false;
-        os.acquired$ = false;
+void ?{}( ofstream & os, void * file ) with(os) {
+        file$ = file;
+        sepDefault$ = true;
+        sepOnOff$ = false;
+        nlOnOff$ = true;
+        prt$ = false;
+        sawNL$ = false;
+        acquired$ = false;
         sepSetCur$( os, sepGet( os ) );
         sepSet( os, " " );
 …
 void open( ofstream & os, const char name[], const char mode[] ) {
         FILE * file = fopen( name, mode );
-        // #ifdef __CFA_DEBUG__
         if ( file == 0p ) {
                 throw (Open_Failure){ os };
                 // abort | IO_MSG "open output file \"" | name | "\"" | nl | strerror( errno );
         } // if
-        // #endif // __CFA_DEBUG__
-        (os){ file };
+        (os){ file };                                                                           // initialize
 } // open
 …
 } // open
-void close( ofstream & os ) {
-  if ( (FILE *)(os.file$) == 0p ) return;
-  if ( (FILE *)(os.file$) == (FILE *)stdout || (FILE *)(os.file$) == (FILE *)stderr ) return;
-        if ( fclose( (FILE *)(os.file$) ) == EOF ) {
+void close( ofstream & os ) with(os) {
+  if ( (FILE *)(file$) == 0p ) return;
+  if ( (FILE *)(file$) == (FILE *)stdout || (FILE *)(file$) == (FILE *)stderr ) return;
+        if ( fclose( (FILE *)(file$) ) == EOF ) {
                 throw (Close_Failure){ os };
                 // abort | IO_MSG "close output" | nl | strerror( errno );
         } // if
-        os.file$ = 0p;
+        file$ = 0p;
 } // close
 …
 } // fmt
-inline void acquire( ofstream & os ) {
-        lock( os.lock$ );
-        if ( ! os.acquired$ ) os.acquired$ = true;
-        else unlock( os.lock$ );
+inline void acquire( ofstream & os ) with(os) {
+        lock( lock$ );                                                                          // may increase recursive lock
+        if ( ! acquired$ ) acquired$ = true;                            // not locked ?
+        else unlock( lock$ );                                                           // unwind recursive lock at start
 } // acquire
 …
 } // release
+inline void lock( ofstream & os ) { acquire( os ); }
+inline void unlock( ofstream & os ) { release( os ); }
-void ?{}( osacquire & acq, ofstream & os ) { &acq.os = &os; lock( os.lock$ ); }
+void ?{}( osacquire & acq, ofstream & os ) { lock( os.lock$ ); &acq.os = &os; }
 void ^?{}( osacquire & acq ) { release( acq.os ); }
 …
 // private
 void ?{}( ifstream & is, void * file ) {
         is.file$ = file;
         is.nlOnOff$ = false;
         is.acquired$ = false;
+void ?{}( ifstream & is, void * file ) with(is) {
+        file$ = file;
+        nlOnOff$ = false;
+        acquired$ = false;
 } // ?{}
 …
 void open( ifstream & is, const char name[], const char mode[] ) {
         FILE * file = fopen( name, mode );
-        // #ifdef __CFA_DEBUG__
         if ( file == 0p ) {
                 throw (Open_Failure){ is };
                 // abort | IO_MSG "open input file \"" | name | "\"" | nl | strerror( errno );
         } // if
-        // #endif // __CFA_DEBUG__
-        is.file$ = file;
+        (is){ file };                                                                           // initialize
 } // open
 …
 } // open
 void close( ifstream & is ) {
   if ( (FILE *)(is.file$) == 0p ) return;
   if ( (FILE *)(is.file$) == (FILE *)stdin ) return;
         if ( fclose( (FILE *)(is.file$) ) == EOF ) {
+void close( ifstream & is ) with(is) {
+  if ( (FILE *)(file$) == 0p ) return;
+  if ( (FILE *)(file$) == (FILE *)stdin ) return;
+        if ( fclose( (FILE *)(file$) ) == EOF ) {
                 throw (Close_Failure){ is };
                 // abort | IO_MSG "close input" | nl | strerror( errno );
         } // if
         is.file$ = 0p;
+        file$ = 0p;
 } // close
 …
 } // fmt
 inline void acquire( ifstream & is ) {
         lock( is.lock$ );
         if ( ! is.acquired$ ) is.acquired$ = true;
         else unlock( is.lock$ );
+inline void acquire( ifstream & is ) with(is) {
+        lock( lock$ );                                                                          // may increase recursive lock
+        if ( ! acquired$ ) acquired$ = true;                            // not locked ?
+        else unlock( lock$ );                                                           // unwind recursive lock at start
 } // acquire
 …
 } // release
 void ?{}( isacquire & acq, ifstream & is ) { &acq.is = &is; lock( is.lock$ ); }
+void ?{}( isacquire & acq, ifstream & is ) { lock( is.lock$ ); &acq.is = &is; }
 void ^?{}( isacquire & acq ) { release( acq.is ); }
 …
 // exception I/O constructors
 void ?{}( Open_Failure & this, ofstream & ostream ) {
         this.virtual_table = &Open_Failure_vt;
         this.ostream = &ostream;
         this.tag = 1;
 } // ?{}
 void ?{}( Open_Failure & this, ifstream & istream ) {
         this.virtual_table = &Open_Failure_vt;
         this.istream = &istream;
         this.tag = 0;
+void ?{}( Open_Failure & ex, ofstream & ostream ) with(ex) {
+        virtual_table = &Open_Failure_vt;
+        ostream = &ostream;
+        tag = 1;
+} // ?{}
+void ?{}( Open_Failure & ex, ifstream & istream ) with(ex) {
+        virtual_table = &Open_Failure_vt;
+        istream = &istream;
+        tag = 0;
 } // ?{}
 …
 // exception I/O constructors
 void ?{}( Close_Failure & this, ofstream & ostream ) {
         this.virtual_table = &Close_Failure_vt;
         this.ostream = &ostream;
         this.tag = 1;
 } // ?{}
 void ?{}( Close_Failure & this, ifstream & istream ) {
         this.virtual_table = &Close_Failure_vt;
         this.istream = &istream;
         this.tag = 0;
+void ?{}( Close_Failure & ex, ofstream & ostream ) with(ex) {
+        virtual_table = &Close_Failure_vt;
+        ostream = &ostream;
+        tag = 1;
+} // ?{}
+void ?{}( Close_Failure & ex, ifstream & istream ) with(ex) {
+        virtual_table = &Close_Failure_vt;
+        istream = &istream;
+        tag = 0;
 } // ?{}
 …
 // exception I/O constructors
 void ?{}( Write_Failure & this, ofstream & ostream ) {
         this.virtual_table = &Write_Failure_vt;
         this.ostream = &ostream;
         this.tag = 1;
 } // ?{}
 void ?{}( Write_Failure & this, ifstream & istream ) {
         this.virtual_table = &Write_Failure_vt;
         this.istream = &istream;
         this.tag = 0;
+void ?{}( Write_Failure & ex, ofstream & ostream ) with(ex) {
+        virtual_table = &Write_Failure_vt;
+        ostream = &ostream;
+        tag = 1;
+} // ?{}
+void ?{}( Write_Failure & ex, ifstream & istream ) with(ex) {
+        virtual_table = &Write_Failure_vt;
+        istream = &istream;
+        tag = 0;
 } // ?{}
 …
 // exception I/O constructors
 void ?{}( Read_Failure & this, ofstream & ostream ) {
         this.virtual_table = &Read_Failure_vt;
         this.ostream = &ostream;
         this.tag = 1;
 } // ?{}
 void ?{}( Read_Failure & this, ifstream & istream ) {
         this.virtual_table = &Read_Failure_vt;
         this.istream = &istream;
         this.tag = 0;
+void ?{}( Read_Failure & ex, ofstream & ostream ) with(ex) {
+        virtual_table = &Read_Failure_vt;
+        ostream = &ostream;
+        tag = 1;
+} // ?{}
+void ?{}( Read_Failure & ex, ifstream & istream ) with(ex) {
+        virtual_table = &Read_Failure_vt;
+        istream = &istream;
+        tag = 0;
 } // ?{}

tools/perf/process_stat_array.py

-              r7e7a076
+              rf93c50a
 #!/usr/bin/python3
-import argparse, os, sys, re
+import argparse, json, math, os, sys, re
+from PIL import Image
+import numpy as np
 def dir_path(string):
 …
 parser = argparse.ArgumentParser()
 parser.add_argument('--path', type=dir_path, default=".cfadata", help= 'paste path to biog.txt file')
+parser.add_argument('--out', type=argparse.FileType('w'), default=sys.stdout)
 try :
 …
 counters = {}
+max_cpu = 0
+min_cpu = 1000000
+max_tsc = 0
+min_tsc = 18446744073709551615
 #open the files
 for filename in filenames:
 …
                 with open(os.path.join(root, filename), 'r') as file:
                         for line in file:
+                                # data = [int(x.strip()) for x in line.split(',')]
-                                data = [int(line.strip())]
-                                data = [me, *data]
+                                raw = [int(x.strip()) for x in line.split(',')]
+                                ## from/to
+                                high = (raw[1] >> 32)
+                                low  = (raw[1] & 0xffffffff)
+                                data = [me, raw[0], high, low]
+                                max_cpu = max(max_cpu, high, low)
+                                min_cpu = min(min_cpu, high, low)
+                                ## number
+                                # high = (raw[1] >> 8)
+                                # low  = (raw[1] & 0xff)
+                                # data = [me, raw[0], high, low]
+                                # max_cpu = max(max_cpu, low)
+                                # min_cpu = min(min_cpu, low)
+                                max_tsc = max(max_tsc, raw[0])
+                                min_tsc = min(min_tsc, raw[0])
                                 merged.append(data)
-        except:
+        except Exception as e:
+                print(e)
                 pass
+print({"max-cpu": max_cpu, "min-cpu": min_cpu, "max-tsc": max_tsc, "min-tsc": min_tsc})
 # Sort by timestamp (the second element)
 …
 merged.sort(key=takeSecond)
+# for m in merged:
+#       print(m)
+json.dump({"values":merged, "max-cpu": max_cpu, "min-cpu": min_cpu, "max-tsc": max_tsc, "min-tsc": min_tsc}, args.out)
-single = []
-curr = 0
+# vmin = merged[ 0][1]
+# vmax = float(merged[-1][1] - vmin) / 2500000000.0
+# # print(vmax)
-# merge the data
-# for (me, time, value) in merged:
-for (me, value) in merged:
-        # check now much this changes
-        old = counters[me]
-        change = value - old
-        counters[me] = value
+# bins = []
+# for _ in range(0, int(math.ceil(vmax * 10))):
+#       bins.append([0] * (32 * 32))
-        # add change to the current
-        curr = curr + change
-        single.append( value )
+# # print(len(bins))
+# bins = np.array(bins)
-        pass
+# rejected = 0
+# highest  = 0
-print(single)
+# for x in merged:
+#       b = int(float(x[1] - vmin) / 250000000.0)
+#       from_ = x[2]
+#       if from_ < 0 or from_ > 32:
+#               rejected += 1
+#               continue;
+#       to_   = x[3]
+#       if to_ < 0 or to_ > 32:
+#               rejected += 1
+#               continue;
+#       idx = (to_ * 32) + from_
+#       bins[b][idx] = bins[b][idx] + 1
+#       highest = max(highest, bins[b][idx])
+# bins = np.array(map(lambda x: np.int8(x * 255.0 / float(highest)), bins))
+# print([highest, rejected])
+# print(bins.shape)
+# im = Image.fromarray(bins)
+# im.save('test.png')
+# vmax = merged[-1][1]
+# diff = float(vmax - vmin) / 2500000000.0
+# print([vmin, vmax])
+# print([vmax - vmin, diff])
+# print(len(merged))
+# for b in bins:
+#       print(b)
+# single = []
+# curr = 0
+# # merge the data
+# # for (me, time, value) in merged:
+# for (me, value) in merged:
+#       # check now much this changes
+#       old = counters[me]
+#       change = value - old
+#       counters[me] = value
+#       # add change to the current
+#       curr = curr + change
+#       single.append( value )
+#       pass
+# print(single)
 # single = sorted(single)[:len(single)-100]
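
Note on the from/to decoding added to process_stat_array.py above: the new loop reads each line as two comma-separated integers, keeps the first as the timestamp, and splits the second into its upper and lower 32 bits. The sketch below isolates that step so it can be checked on its own. The function names `unpack_from_to` and `parse_line` are hypothetical and do not appear in the script. Reading the high half as "from" and the low half as "to" is an assumption based on the `## from/to` comment and the commented-out binning code.

```python
def unpack_from_to(packed):
    """Split a packed 64-bit value into (high 32 bits, low 32 bits)."""
    high = packed >> 32          # upper half; assumed to be the "from" id
    low = packed & 0xffffffff    # lower half; assumed to be the "to" id
    return high, low


def parse_line(line):
    """Parse one 'timestamp, packed' record into (timestamp, high, low)."""
    raw = [int(x.strip()) for x in line.split(',')]
    high, low = unpack_from_to(raw[1])
    return raw[0], high, low


if __name__ == "__main__":
    # Build a sample record with from=3, to=7 and round-trip it.
    packed = (3 << 32) | 7
    print(parse_line("100, {}\n".format(packed)))
```

Keeping the split in a small helper makes it easy to switch to the alternative 8-bit layout that the script leaves commented out under `## number`.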
