\chapter{\CFA Demangler} \label{demangler}

\section{Introduction}
Although \CFA adds features that C does not support, all of its extensions are compiled down to C code. As a result, the executable file sets the DWARF attribute \verb|DW_AT_language| to the fixed hexadecimal value for the C language. Because it is possible to have one frame in C code and another frame in assembly code, GDB encodes a language flag for each frame. \CFA adds to this list, as it is essential to know when a stack frame contains mangled names versus C and assembler unmangled names. Thus, GDB must be told that \CFA is a distinct source language.

\section{Design Constraints}
Most GDB targets use the DWARF format. The GDB DWARF reader initializes all the appropriate information read from the DIE structures in object or executable files, as mentioned in Chapter~\ref{GDB}. However, GDB currently does not understand the new DWARF language code assigned to \CFA, so the DWARF reader must be updated to recognize \CFA. Additionally, when a user enters a name into GDB, GDB needs to look up whether the name exists in the program. Because different languages may have different hierarchical structures for dynamic scope, an appropriate name-lookup routine for non-local symbols must be added.

\section{Implementation}
Most of the implementation work discussed below is derived from GDB's internals wiki page and from studying how other languages are supported in GDB \cite{Reference5}. First, a new entry for \CFA is added to GDB's list of source languages (\verb|enum language|) in \verb|gdb/defs.h|. Next, a new instance of type \verb|struct language_defn| must be created to provide the language definition for \CFA. This instance is the entry point for new functions that are only applicable to \CFA. GDB invokes these functions during debugging whenever an operation is specific to \CFA.
For instance, \CFA can implement its own symbol-lookup function for non-local variables because \CFA can have a different scope hierarchy. The final step for registering \CFA in GDB as a new source language is declaring the instance of type \verb|struct language_defn| in \verb|gdb/language.h| and adding it to the list of language definitions in \verb|gdb/language.c|. This setup is shown in listing~\ref{cfa-lang-def}.

\begin{figure}
\begin{lstlisting}[style=C++, caption={Language definition declaration for \CFA}, label={cfa-lang-def}, basicstyle=\small]
// In gdb/language.h
extern const struct language_defn cforall_language_defn;

// In gdb/language.c
static const struct language_defn *languages[] = {
    &unknown_language_defn,
    &auto_language_defn,
    &c_language_defn,
    ...
    &cforall_language_defn,
    ...
};

// In gdb/cforall-lang.c
extern const struct language_defn cforall_language_defn = {
    "cforall",                      /* Language name */
    "CForAll",                      /* Natural name */
    language_cforall,
    range_check_off,
    case_sensitive_on,
    ...
    cp_lookup_symbol_nonlocal,      /* lookup_symbol_nonlocal */
    ...
    cforall_demangle,               /* Language specific demangler */
    cforall_sniff_from_mangled_name,
    ...
};
\end{lstlisting}
\end{figure}

The next step is updating the DWARF reader so that it can translate the DWARF language code to the enum value defined above. However, this assumes that the language has an assigned language code. The language code is a hexadecimal literal value assigned to a particular language, which is maintained by GCC. For \CFA, the hexadecimal value \verb|0x0025| is added to \verb|include/dwarf2.h| to denote \CFA, as shown in listing~\ref{cfa-dwarf}.

\begin{lstlisting}[style=C++, caption={DWARF language code for \CFA}, label={cfa-dwarf}, basicstyle=\small]
// In include/dwarf2.h
enum dwarf_source_language {
    // Source language names and codes.
    DW_LANG_C89 = 0x0001,
    ...
    DW_LANG_CForAll = 0x0025,
};
\end{lstlisting}

Once the demangler implementation is placed in the \verb|libiberty| directory alongside the other demanglers, it can be called by updating the language definition in listing~\ref{cfa-lang-def} with the entry point of the \CFA demangler, and by adding a check for whether the current demangling style is \verb|CFORALL_DEMANGLING|, as seen in listing~\ref{cfa-demangler}. GDB then automatically invokes the \CFA demangler when it detects that the source language is \CFA.

In addition to the automatic invocation of the demangler, GDB provides an option to manually set the demangling style from the command-line interface. This option is enabled for \CFA by adding a new enum value for \CFA to the list of demangling styles and setting the appropriate mask for this style in \verb|include/demangle.h|. After this step, users can select the \CFA demangler in GDB by calling \verb|set demangle-style <language>|, where the language name is defined by the preprocessor macro \verb|CFORALL_DEMANGLING_STYLE_STRING| in listing~\ref{cfa-demangler-style}.

\begin{figure}
\begin{lstlisting}[style=C++, caption={libiberty setup for the \CFA demangler}, label={cfa-demangler}, basicstyle=\small]
// In libiberty/cplus-dem.c
const struct demangler_engine libiberty_demanglers[] = {
    {
        NO_DEMANGLING_STYLE_STRING,
        no_demangling,
        "Demangling disabled"
    },
    ...
    {
        CFORALL_DEMANGLING_STYLE_STRING,
        cforall_demangling,
        "Cforall style demangling"
    },
};
...
char * cplus_demangle(const char *mangled, int options) {
    ...
    /* The V3 ABI demangling is implemented elsewhere. */
    if (GNU_V3_DEMANGLING || RUST_DEMANGLING || AUTO_DEMANGLING) {
        ...
    }
    ...
    if (CFORALL_DEMANGLING) {
        ret = cforall_demangle (mangled, options);
        if (ret) {
            return ret;
        }
    }
}
\end{lstlisting}
\begin{lstlisting}[style=C++, caption={Setup \CFA demangler style}, label={cfa-demangler-style}, basicstyle=\small]
// In include/demangle.h
#define DMGL_CFORALL (1 << 18)
...
/* If none are set, use 'current_demangling_style' as the default. */
#define DMGL_STYLE_MASK \
    (DMGL_AUTO|DMGL_GNU|DMGL_LUCID|DMGL_ARM|DMGL_HP|DMGL_EDG|DMGL_GNU_V3 \
     |DMGL_JAVA|DMGL_GNAT|DMGL_DLANG|DMGL_RUST|DMGL_CFORALL)
...
extern enum demangling_styles {
    no_demangling = -1,
    unknown_demangling = 0,
    ...
    cforall_demangling = DMGL_CFORALL
} current_demangling_style;
...
#define CFORALL_DEMANGLING_STYLE_STRING "cforall"
...
#define CFORALL_DEMANGLING (((int)CURRENT_DEMANGLING_STYLE)&DMGL_CFORALL)
\end{lstlisting}
\end{figure}

However, the setup for the \CFA demangler above does not demangle mangled symbols during symbol-table lookup while the program is running. Therefore, additional work is needed in \verb|gdb/symtab.c|. Prior to looking up a symbol, GDB attempts to demangle its name, which can be either mangled or unmangled, to detect the language and select the appropriate demangler. This work, shown in listing~\ref{cfa-symstab-setup}, enables invocation of the \CFA demangler during symbol lookup.

\begin{lstlisting}[style=C++, caption={\CFA demangler setup for symbol lookup}, label={cfa-symstab-setup}, basicstyle=\small]
// In gdb/symtab.c
const char * demangle_for_lookup (
    const char *name, enum language lang,
    demangle_result_storage &storage
) {
    /* When using C++, D, or Go, demangle the name before doing a
       lookup to use the binary search. */
    if (lang == language_cplus) {
        char *demangled_name = gdb_demangle(name, DMGL_ANSI|DMGL_PARAMS);
        if (demangled_name != NULL)
            return storage.set_malloc_ptr (demangled_name);
    }
    ...
    else if (lang == language_cforall) {
        char *demangled_name = cforall_demangle (name, 0);
        if (demangled_name != NULL)
            return storage.set_malloc_ptr (demangled_name);
    }
    ...
}
\end{lstlisting}

\section{Result}
The addition of hooks throughout GDB enables the invocation of the new \CFA demangler during symbol lookup and during the usage of \verb|binutils| tools such as \verb|objdump| and \verb|nm|.
These \verb|binutils| tools also understand \CFA because of the added \CFA language code. However, as the language develops, symbol lookup for non-local variables must be implemented to produce the correct output.
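
The style-selection mechanism of listings~\ref{cfa-demangler} and~\ref{cfa-demangler-style} can be illustrated in isolation. The following is a minimal, self-contained C sketch and not the actual \verb|libiberty| code: every identifier prefixed with \verb|demo_| or \verb|DEMO_| is hypothetical, the bit chosen for the automatic style is illustrative only, and only the value \verb|(1 << 18)| and the string \verb|"cforall"| are taken from the listings. The sketch assumes that a style name is mapped to an enum value through a table, and that a demangler is considered selected when its mask bit is set in the current style.

\begin{lstlisting}[style=C++, caption={Sketch of name-to-style lookup and style-mask test (hypothetical names)}, label={cfa-style-sketch}, basicstyle=\small]
#include <stddef.h>
#include <string.h>

/* Hypothetical style bits. Only (1 << 18) mirrors the listing for
   the Cforall style; the bit for the automatic style is illustrative. */
#define DEMO_DMGL_AUTO    (1 << 8)
#define DEMO_DMGL_CFORALL (1 << 18)

enum demo_style {
    demo_no_demangling = -1,
    demo_unknown_demangling = 0,
    demo_auto_demangling = DEMO_DMGL_AUTO,
    demo_cforall_demangling = DEMO_DMGL_CFORALL
};

/* One table row per selectable style: name, enum value, description. */
struct demo_engine {
    const char *name;
    enum demo_style style;
    const char *doc;
};

static const struct demo_engine demo_engines[] = {
    { "none",    demo_no_demangling,      "Demangling disabled" },
    { "auto",    demo_auto_demangling,    "Automatic selection" },
    { "cforall", demo_cforall_demangling, "Cforall style demangling" },
    { NULL,      demo_unknown_demangling, NULL }  /* end marker */
};

/* Map a user-supplied style name to its enum value; names that are
   not in the table yield demo_unknown_demangling. */
enum demo_style demo_name_to_style(const char *name) {
    for (const struct demo_engine *e = demo_engines; e->name != NULL; e++) {
        if (strcmp(name, e->name) == 0) {
            return e->style;
        }
    }
    return demo_unknown_demangling;
}

/* Test whether the Cforall bit is set in the current style. The
   disabled style is -1 (all bits set), so it is excluded first. */
int demo_is_cforall(enum demo_style current) {
    if (current == demo_no_demangling) {
        return 0;
    }
    return (((int) current) & DEMO_DMGL_CFORALL) != 0;
}
\end{lstlisting}

In this sketch, \verb|demo_name_to_style("cforall")| yields \verb|demo_cforall_demangling|, and \verb|demo_is_cforall| accepts only that style. The explicit test for the disabled style is a design choice of the sketch: because that style is encoded as $-1$, a bare mask test would otherwise report every style bit as set.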