

Changeset bf8b77e

Timestamp:

Mar 3, 2022, 1:37:31 PM

Author:

m3zulfiq <m3zulfiq@…>

Branches:

ADT, ast-experimental, enum, master, pthread-emulation, qualifiedEnum

Children:

40a606d2, ba897d21

Parents:

9c5aef9 (diff), b0d0285 (diff)
Note: this is a merge changeset; the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.

Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

Files:

4 added, 11 edited

benchmark/io/http/protocol.cfa (modified) (1 diff)
doc/theses/mubeen_zulfiqar_MMath/benchmarks.tex (modified) (1 diff)
doc/theses/mubeen_zulfiqar_MMath/performance.tex (modified) (1 diff)
libcfa/src/concurrency/io.cfa (modified) (7 diffs)
libcfa/src/concurrency/io/setup.cfa (modified) (2 diffs)
libcfa/src/concurrency/iofwd.hfa (modified) (2 diffs)
libcfa/src/concurrency/kernel/fwd.hfa (modified) (1 diff)
libcfa/src/concurrency/kernel_private.hfa (modified) (1 diff)
libcfa/src/concurrency/preemption.cfa (modified) (2 diffs)
src/Concurrency/Keywords.cc (modified) (1 diff)
tests/concurrent/.expect/mainError.txt (added)
tests/concurrent/mainError.cfa (added)
tests/io/.expect/away_fair.txt (added)
tests/io/away_fair.cfa (added)
tests/io/many_read.cfa (modified) (1 diff)

Legend:

(no prefix): Unmodified
+: Added
-: Removed

benchmark/io/http/protocol.cfa

-              r9c5aef9
+              rbf8b77e
+}
-static void zero_sqe(struct io_uring_sqe * sqe) {
-        sqe->flags = 0;
-        sqe->ioprio = 0;
-        sqe->fd = 0;
-        sqe->off = 0;
-        sqe->addr = 0;
-        sqe->len = 0;
-        sqe->fsync_flags = 0;
-        sqe->__pad2[0] = 0;
-        sqe->__pad2[1] = 0;
-        sqe->__pad2[2] = 0;
-        sqe->fd = 0;
-        sqe->off = 0;
-        sqe->addr = 0;
-        sqe->len = 0;
+}
 enum FSM_STATE {
         Initial,

doc/theses/mubeen_zulfiqar_MMath/benchmarks.tex

-              r9c5aef9
+              rbf8b77e
 \paragraph{Relevant Knobs}
 *** FIX ME: Insert Relevant Knobs
-\section{Existing Memory Allocators}
-With dynamic allocation being an important feature of C, there are many stand-alone memory allocators that have been designed for different purposes. For this thesis, we chose 7 of the most popular and widely used memory allocators.
-\paragraph{dlmalloc}
-dlmalloc (FIX ME: cite allocator) is a thread-safe allocator that is single threaded and single heap. dlmalloc maintains free-lists of different sizes to store freed dynamic memory. (FIX ME: cite wasik)
-\paragraph{hoard}
-Hoard (FIX ME: cite allocator) is a thread-safe allocator that is multi-threaded and using a heap layer framework. It has per-thread heaps that have thread-local free-lists, and a global shared heap. (FIX ME: cite wasik)
-\paragraph{jemalloc}
-jemalloc (FIX ME: cite allocator) is a thread-safe allocator that uses multiple arenas. Each thread is assigned an arena. Each arena has chunks that contain contagious memory regions of same size. An arena has multiple chunks that contain regions of multiple sizes.
-\paragraph{ptmalloc}
-ptmalloc (FIX ME: cite allocator) is a modification of dlmalloc. It is a thread-safe multi-threaded memory allocator that uses multiple heaps. ptmalloc heap has similar design to dlmalloc's heap.
-\paragraph{rpmalloc}
-rpmalloc (FIX ME: cite allocator) is a thread-safe allocator that is multi-threaded and uses per-thread heap. Each heap has multiple size-classes and each size-class contains memory regions of the relevant size.
-\paragraph{tbb malloc}
-tbb malloc (FIX ME: cite allocator) is a thread-safe allocator that is multi-threaded and uses private heap for each thread. Each private-heap has multiple bins of different sizes. Each bin contains free regions of the same size.
-\paragraph{tc malloc}
-tcmalloc (FIX ME: cite allocator) is a thread-safe allocator. It uses per-thread cache to store free objects that prevents contention on shared resources in multi-threaded application. A central free-list is used to refill per-thread cache when it gets empty.

doc/theses/mubeen_zulfiqar_MMath/performance.tex

-              r9c5aef9
+              rbf8b77e
 \noindent
 ====================
+\section{Machine Specification}
+The performance experiments were run on three different multicore systems to determine if there is consistency across platforms:
+\begin{itemize}
+\item
+AMD EPYC 7662, 64-core socket $\times$ 2, 2.0 GHz
+\item
+Huawei ARM TaiShan 2280 V2 Kunpeng 920, 24-core socket $\times$ 4, 2.6 GHz
+\item
+Intel Xeon Gold 5220R, 48-core socket $\times$ 2, 2.20GHz
+\end{itemize}
+\section{Existing Memory Allocators}
+With dynamic allocation being an important feature of C, there are many stand-alone memory allocators that have been designed for different purposes. For this thesis, we chose 7 of the most popular and widely used memory allocators.
+\paragraph{dlmalloc}
+dlmalloc (FIX ME: cite allocator) is a thread-safe allocator that is single threaded and single heap. dlmalloc maintains free-lists of different sizes to store freed dynamic memory. (FIX ME: cite wasik)
+\paragraph{hoard}
+Hoard (FIX ME: cite allocator) is a thread-safe allocator that is multi-threaded and using a heap layer framework. It has per-thread heaps that have thread-local free-lists, and a global shared heap. (FIX ME: cite wasik)
+\paragraph{jemalloc}
+jemalloc (FIX ME: cite allocator) is a thread-safe allocator that uses multiple arenas. Each thread is assigned an arena. Each arena has chunks that contain contagious memory regions of same size. An arena has multiple chunks that contain regions of multiple sizes.
+\paragraph{ptmalloc}
+ptmalloc (FIX ME: cite allocator) is a modification of dlmalloc. It is a thread-safe multi-threaded memory allocator that uses multiple heaps. ptmalloc heap has similar design to dlmalloc's heap.
+\paragraph{rpmalloc}
+rpmalloc (FIX ME: cite allocator) is a thread-safe allocator that is multi-threaded and uses per-thread heap. Each heap has multiple size-classes and each size-class contains memory regions of the relevant size.
+\paragraph{tbb malloc}
+tbb malloc (FIX ME: cite allocator) is a thread-safe allocator that is multi-threaded and uses private heap for each thread. Each private-heap has multiple bins of different sizes. Each bin contains free regions of the same size.
+\paragraph{tc malloc}
+tcmalloc (FIX ME: cite allocator) is a thread-safe allocator. It uses per-thread cache to store free objects that prevents contention on shared resources in multi-threaded application. A central free-list is used to refill per-thread cache when it gets empty.
 \section{Memory Allocators}

libcfa/src/concurrency/io.cfa

-              r9c5aef9
+              rbf8b77e
         //=============================================================================================
         // submission
-        static inline void __submit( struct $io_context * ctx, __u32 idxs[], __u32 have, bool lazy) {
+        static inline void __submit_only( struct $io_context * ctx, __u32 idxs[], __u32 have) {
                 // We can proceed to the fast path
                 // Get the right objects
 …
                 ctx->proc->io.pending = true;
                 ctx->proc->io.dirty   = true;
+        }
+        static inline void __submit( struct $io_context * ctx, __u32 idxs[], __u32 have, bool lazy) {
+                __sub_ring_t & sq = ctx->sq;
+                __submit_only(ctx, idxs, have);
                 if(sq.to_submit > 30) {
                         __tls_stats()->io.flush.full++;
 …
 // I/O Arbiter
 //=============================================================================================
-        static inline void block(__outstanding_io_queue & queue, __outstanding_io & item) {
+        static inline bool enqueue(__outstanding_io_queue & queue, __outstanding_io & item) {
+                bool was_empty;
                 // Lock the list, it's not thread safe
                 lock( queue.lock __cfaabi_dbg_ctx2 );
+                {
+                        was_empty = empty(queue.queue);
                         // Add our request to the list
                         add( queue.queue, item );
 …
                 unlock( queue.lock );
-                wait( item.sem );
+                return was_empty;
+        }
 …
                 pa.want = want;
-                block(this.pending, (__outstanding_io&)pa);
+                enqueue(this.pending, (__outstanding_io&)pa);
+                wait( pa.sem );
                 return pa.ctx;
 …
                 ei.lazy = lazy;
-                block(ctx->ext_sq, (__outstanding_io&)ei);
+                bool we = enqueue(ctx->ext_sq, (__outstanding_io&)ei);
+                ctx->proc->io.pending = true;
+                if( we ) {
+                        sigval_t value = { PREEMPT_IO };
+                        pthread_sigqueue(ctx->proc->kernel_thread, SIGUSR1, value);
+                }
+                wait( ei.sem );
                 __cfadbg_print_safe(io, "Kernel I/O : %u submitted from arbiter\n", have);
 …
                                         __external_io & ei = (__external_io&)drop( ctx.ext_sq.queue );
-                                        __submit(&ctx, ei.idxs, ei.have, ei.lazy);
+                                        __submit_only(&ctx, ei.idxs, ei.have);
                                         post( ei.sem );
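In the merged io.cfa, the arbiter's block() is replaced by enqueue(), which returns whether the queue was empty before the item was added. The external-submission path then sets io.pending and sends PREEMPT_IO with pthread_sigqueue only when that flag is true, and each caller now waits on its own semaphore after enqueueing. The C sketch below models only the empty-to-non-empty detection, assuming a plain mutex-protected singly linked FIFO; pending_queue, pending_enqueue and pending_dequeue are hypothetical names, not libcfa API, and the semaphore and signal are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical intrusive FIFO node (stands in for __outstanding_io). */
struct pending_item {
    struct pending_item *next;
};

/* Hypothetical lock-protected FIFO (stands in for __outstanding_io_queue). */
struct pending_queue {
    pthread_mutex_t lock;
    struct pending_item *head;
    struct pending_item *tail;
};

#define PENDING_QUEUE_INIT { PTHREAD_MUTEX_INITIALIZER, NULL, NULL }

/* Append item and report whether the queue was empty beforehand, i.e. whether
 * this caller made it non-empty and should be the one to wake the consumer.
 * The emptiness test and the insertion happen under the same lock. */
static bool pending_enqueue(struct pending_queue *q, struct pending_item *item) {
    bool was_empty;
    item->next = NULL;
    pthread_mutex_lock(&q->lock);
    assert((q->head == NULL) == (q->tail == NULL)); /* queue invariant */
    was_empty = (q->head == NULL);
    if (was_empty)
        q->head = item;
    else
        q->tail->next = item;
    q->tail = item;
    pthread_mutex_unlock(&q->lock);
    return was_empty;
}

/* Remove and return the oldest item, or NULL if the queue is empty. */
static struct pending_item *pending_dequeue(struct pending_queue *q) {
    pthread_mutex_lock(&q->lock);
    struct pending_item *item = q->head;
    if (item != NULL) {
        q->head = item->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);
    return item;
}
```

One reading of the design, not stated in the commit: because only the enqueuer that finds the queue empty sends the signal, a burst of external submissions costs a single wake-up, provided the processor drains the whole queue once it stops spinning.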

libcfa/src/concurrency/io/setup.cfa

-              r9c5aef9
+              rbf8b77e
         #include "bitmanip.hfa"
+        #include "fstream.hfa"
         #include "kernel_private.hfa"
         #include "thread.hfa"
 …
                 struct __sub_ring_t & sq = this.sq;
                 struct __cmp_ring_t & cq = this.cq;
+                {
+                        __u32 fhead = sq.free_ring.head;
+                        __u32 ftail = sq.free_ring.tail;
+                        __u32 total = *sq.num;
+                        __u32 avail = ftail - fhead;
+                        if(avail != total) abort | "Processor (" | (void*)this.proc | ") tearing down ring with" | (total - avail) | "entries allocated but not submitted, out of" | total;
+                }
                 // unmap the submit queue entries
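The teardown check added to setup.cfa computes the number of free submission entries as ftail - fhead on __u32 values and aborts if that differs from *sq.num. The sketch below assumes the free-ring head and tail are free-running 32-bit counters (the __u32 types suggest this, but the excerpt does not confirm it) and shows why plain unsigned subtraction stays correct across wrap-around; ring_avail and ring_outstanding are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Free entries between head and tail when both are free-running 32-bit
 * counters. Unsigned subtraction is modulo 2^32, so the result stays
 * correct after tail wraps past UINT32_MAX. */
static uint32_t ring_avail(uint32_t head, uint32_t tail) {
    return tail - head;
}

/* Hypothetical teardown helper mirroring the setup.cfa check: returns the
 * number of entries still allocated but not submitted (0 means safe). */
static uint32_t ring_outstanding(uint32_t head, uint32_t tail, uint32_t total) {
    uint32_t avail = ring_avail(head, tail);
    assert(avail <= total); /* more free entries than exist means corrupted indices */
    return total - avail;
}
```

For example, with head = 0xFFFFFFF0 and tail = 0x00000010 the difference is 32, even though tail is numerically smaller than head.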

libcfa/src/concurrency/iofwd.hfa

-              r9c5aef9
+              rbf8b77e
 extern "C" {
         #include <asm/types.h>
+        #include <sys/stat.h> // needed for mode_t
         #if CFA_HAVE_LINUX_IO_URING_H
                 #include <linux/io_uring.h>
 …
 // Check if a function is blocks a only the user thread
 bool has_user_level_blocking( fptr_t func );
+#if CFA_HAVE_LINUX_IO_URING_H
+        static inline void zero_sqe(struct io_uring_sqe * sqe) {
+                sqe->flags = 0;
+                sqe->ioprio = 0;
+                sqe->fd = 0;
+                sqe->off = 0;
+                sqe->addr = 0;
+                sqe->len = 0;
+                sqe->fsync_flags = 0;
+                sqe->__pad2[0] = 0;
+                sqe->__pad2[1] = 0;
+                sqe->__pad2[2] = 0;
+                sqe->fd = 0;
+                sqe->off = 0;
+                sqe->addr = 0;
+                sqe->len = 0;
+        }
+#endif

libcfa/src/concurrency/kernel/fwd.hfa

r9c5aef9	rbf8b77e
347	347	struct oneshot * want = expected == 0p ? 1p : 2p;
348	348	if(__atomic_compare_exchange_n(&this.ptr, &expected, want, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)) {
349		if( expected == 0p ) { ~~/* paranoid */ verify( this.ptr == 1p);~~ return 0p; }
	349	if( expected == 0p ) { return 0p; }
350	350	thread$ * ret = post( *expected, do_unpark );
351	351	__atomic_store_n( &this.ptr, 1p, __ATOMIC_SEQ_CST);
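The kernel/fwd.hfa hunk drops a paranoid verify from the fast path of a compare-and-swap on this.ptr. Reading the excerpt, ptr appears to encode 0p as "nobody waiting", 1p as "fulfilled", 2p as "fulfilment in progress", and any other value as a pointer to a waiting oneshot; that reading is an assumption. The C11 sketch below models the step shown: install 1p directly when nobody waits, otherwise claim the waiter with 2p, wake it, then publish 1p. The names future_sketch, waiter and future_fulfil are hypothetical; the retry loop and the early return for already-fulfilled states are additions not visible in the excerpt, and setting a flag stands in for post(*expected, do_unpark).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a blocked waiter (libcfa's oneshot). The int
 * member keeps it at least 4-byte aligned, so its address never equals
 * the sentinels 1 or 2. */
struct waiter {
    int woken;
};

/* Sentinel encodings assumed from the fwd.hfa excerpt. */
#define F_EMPTY     ((uintptr_t)0) /* nobody waiting, not fulfilled */
#define F_FULFILLED ((uintptr_t)1) /* fulfilled */
#define F_POSTING   ((uintptr_t)2) /* fulfilment in progress */

struct future_sketch {
    _Atomic uintptr_t ptr; /* F_* sentinel or address of a struct waiter */
};

/* Fulfil the future; return the waiter that was woken, or NULL if none. */
static struct waiter *future_fulfil(struct future_sketch *f) {
    for (;;) {
        uintptr_t expected = atomic_load(&f->ptr);
        if (expected == F_FULFILLED || expected == F_POSTING)
            return NULL; /* assumption: already fulfilled or being fulfilled */
        uintptr_t want = (expected == F_EMPTY) ? F_FULFILLED : F_POSTING;
        if (atomic_compare_exchange_strong(&f->ptr, &expected, want)) {
            if (expected == F_EMPTY)
                return NULL;                    /* no waiter: ptr is now F_FULFILLED */
            struct waiter *w = (struct waiter *)expected;
            w->woken = 1;                       /* stands in for post(*expected, do_unpark) */
            atomic_store(&f->ptr, F_FULFILLED); /* publish the final state */
            return w;
        }
        /* CAS failed: ptr changed between the load and the CAS; retry. */
    }
}
```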

libcfa/src/concurrency/kernel_private.hfa

-              r9c5aef9
+              rbf8b77e
 extern bool __preemption_enabled();
+enum {
+        PREEMPT_NORMAL    = 0,
+        PREEMPT_TERMINATE = 1,
+        PREEMPT_IO = 2,
+};
 static inline void __disable_interrupts_checked() {
         /* paranoid */ verify( __preemption_enabled() );

libcfa/src/concurrency/preemption.cfa

-              r9c5aef9
+              rbf8b77e
         lock{};
+}
-enum {
-        PREEMPT_NORMAL    = 0,
-        PREEMPT_TERMINATE = 1,
-};
 //=============================================================================================
 …
         choose(sfp->si_value.sival_int) {
                 case PREEMPT_NORMAL   : ;// Normal case, nothing to do here
+                case PREEMPT_IO       : ;// I/O asked to stop spinning, nothing to do here
                 case PREEMPT_TERMINATE: verify( __atomic_load_n( &__cfaabi_tls.this_processor->do_terminate, __ATOMIC_SEQ_CST ) );
                 default:
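The merge moves the PREEMPT_* enum from preemption.cfa to kernel_private.hfa so that io.cfa can tag a SIGUSR1 with PREEMPT_IO through pthread_sigqueue, while the preemption handler dispatches on sfp->si_value.sival_int. The sketch below shows the same tagging round trip in C. As a deliberate simplification it receives the signal synchronously with sigwaitinfo instead of in an SA_SIGINFO handler, so the result is deterministic. It assumes glibc (pthread_sigqueue is a GNU extension); roundtrip_tag and describe_tag are hypothetical.

```c
#define _GNU_SOURCE /* pthread_sigqueue is a GNU extension */
#include <assert.h>
#include <pthread.h>
#include <signal.h>

/* Tags mirroring the enum moved into kernel_private.hfa. */
enum { PREEMPT_NORMAL = 0, PREEMPT_TERMINATE = 1, PREEMPT_IO = 2 };

/* Send SIGUSR1 carrying `tag` to the calling thread, then collect it
 * synchronously and return the tag found in si_value. SIGUSR1 is blocked
 * first so the signal stays pending until sigwaitinfo picks it up. */
static int roundtrip_tag(int tag) {
    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &set, &old);

    union sigval value = { .sival_int = tag };
    int rc = pthread_sigqueue(pthread_self(), SIGUSR1, value);
    assert(rc == 0);

    siginfo_t info;
    int sig = sigwaitinfo(&set, &info);
    pthread_sigmask(SIG_SETMASK, &old, NULL); /* restore the caller's mask */
    assert(sig == SIGUSR1);
    return info.si_value.sival_int;
}

/* Dispatch on the tag in the style of the handler's choose statement. */
static const char *describe_tag(int tag) {
    switch (tag) {
    case PREEMPT_NORMAL:    return "normal";
    case PREEMPT_IO:        return "io"; /* I/O asked the processor to stop spinning */
    case PREEMPT_TERMINATE: return "terminate";
    default:                return "unknown";
    }
}
```

Unlike Cforall's choose, which ends each case with an implicit break, a C switch falls through by default, so each case in describe_tag returns explicitly.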

src/Concurrency/Keywords.cc

-              r9c5aef9
+              rbf8b77e
+                        ;
                 else if ( auto param = isMainFor( decl, cast_target ) ) {
+                        // This should never trigger.
+                        assert( vtable_decl );
+                        if ( !vtable_decl ) {
+                                SemanticError( decl, context_error );
+                        }
                         // Should be safe because of isMainFor.
                         StructInstType * struct_type = static_cast<StructInstType *>(

tests/io/many_read.cfa

r9c5aef9	rbf8b77e
5	5	// file "LICENCE" distributed with Cforall.
6	6	//
7		// many_read.cfa -- Make sure that multiple concurrent reads to mess up.
	7	// many_read.cfa -- Make sure that multiple concurrent reads don't mess up.
8	8	//
9	9	// Author : Thierry Delisle
