% source: doc/theses/lynn_tran_SE499/Chapters/Demangler.tex @ b797d978
% Last change on this file since b797d978 was 1b34b87, checked in by Peter A. Buhr <pabuhr@…>, 6 years ago: Lynn's GDB essay

\chapter{\CFA Demangler} \label{demangler}

\section{Introduction}
While \CFA is a translator that adds features C does not support, all of its
extensions compile down to C code.  As a result, the executable file marks the
DWARF attribute \verb|DW_AT_language| with the fixed hexadecimal value for the
C language. Because one frame can contain C code while another contains
assembly code, GDB records a language for each frame. \CFA must be added to
this set of languages, as it is essential to know whether a stack frame
contains mangled \CFA names or unmangled C and assembler names. Thus, GDB must
be told that \CFA is a distinct source language.

\section{Design Constraints}
Most GDB targets use the DWARF debugging format.  As mentioned in Chapter
\ref{GDB}, the GDB DWARF reader initializes GDB's internal data structures from
the debugging information entries (DIEs) in object or executable files.
However, GDB does not currently understand the new DWARF language code assigned
to \CFA, so the DWARF reader must be updated to recognize \CFA.

Additionally, when a user enters a name, GDB needs to look up whether the name
exists in the program. However, different languages may organize their scopes
into different hierarchies, so a \CFA implementation of nonlocal symbol lookup
is required, and an appropriate name-lookup routine must be added.

\section{Implementation}
Most of the implementation work discussed below is based on GDB's internals
wiki page and on studying how other languages are supported in GDB \cite{Reference5}.

A new enumerator, \verb|language_cforall|, is added to GDB's list of languages,
the \verb|enum language| in \verb|gdb/defs.h|. A new instance of type
\verb|struct language_defn| must then be created to hold the language
definition for \CFA. This instance is the entry point for functions that apply
only to \CFA: GDB invokes them during debugging whenever an operation is
specific to \CFA. For instance, \CFA can implement its own symbol-lookup
function for non-local variables, because \CFA can have a different scope
hierarchy. The final step in registering \CFA as a new source language in GDB
is adding this instance to the list of language definitions: the instance is
declared in \verb|gdb/language.h| and added to the \verb|languages| array in
\verb|gdb/language.c|. This setup is shown in listing~\ref{cfa-lang-def}.

\begin{figure}
\begin{lstlisting}[style=C++, caption={Language definition declaration for
\CFA}, label={cfa-lang-def}, basicstyle=\small]
// In gdb/language.h
extern const struct language_defn cforall_language_defn;

// In gdb/language.c
static const struct language_defn *languages[] = {
    &unknown_language_defn,
    &auto_language_defn,
    &c_language_defn,
    ...
    &cforall_language_defn,
    ...
};

// In gdb/cforall-lang.c
extern const struct language_defn cforall_language_defn = {
    "cforall",                      /* Language name */
    "CForAll",                      /* Natural name */
    language_cforall,
    range_check_off,
    case_sensitive_on,
    ...
    cp_lookup_symbol_nonlocal,      /* lookup_symbol_nonlocal */
    ...
    cforall_demangle,               /* Language specific demangler */
    cforall_sniff_from_mangled_name,
    ...
};
\end{lstlisting}
\end{figure}
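
The role of the \verb|struct language_defn| instance as an entry point can be
illustrated with a much-reduced model. The sketch below is hypothetical:
\verb|mini_language_defn|, \verb|find_language|, and
\verb|toy_cforall_demangle| are names invented for illustration, and the
three-field record does not match the real \verb|struct language_defn|, whose
fields are elided in listing~\ref{cfa-lang-def}. Under that assumption, it
shows only the general mechanism: a table of per-language records carrying
function pointers, so that \CFA-specific code runs only when the selected
record is the \CFA one.

\begin{lstlisting}[style=C++, caption={Sketch: a reduced language table with a per-language demangler hook (hypothetical)}, basicstyle=\small]
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for GDB's enum language and
   struct language_defn; the real structure has many more fields. */
enum language { language_unknown, language_c, language_cforall };

struct mini_language_defn {
    const char *name;               /* e.g. "cforall" */
    enum language la_language;      /* enumerator identifying the language */
    /* Language-specific demangler; NULL when the language has none. */
    char *(*la_demangle)(const char *mangled, int options);
};

/* Hypothetical placeholder: a real Cforall demangler lives in libiberty.
   This one always reports "not demangled" by returning NULL. */
static char *toy_cforall_demangle(const char *mangled, int options)
{
    (void)mangled;
    (void)options;
    return NULL;
}

static const struct mini_language_defn unknown_defn =
    { "unknown", language_unknown, NULL };
static const struct mini_language_defn c_defn =
    { "c", language_c, NULL };
static const struct mini_language_defn cforall_defn =
    { "cforall", language_cforall, toy_cforall_demangle };

/* Registration: the table of known languages. */
static const struct mini_language_defn *const languages[] = {
    &unknown_defn, &c_defn, &cforall_defn,
};

/* Find the record for a language enumerator; fall back to "unknown". */
static const struct mini_language_defn *find_language(enum language la)
{
    for (size_t i = 0; i < sizeof languages / sizeof languages[0]; i++)
        if (languages[i]->la_language == la)
            return languages[i];
    return &unknown_defn;
}
\end{lstlisting}

With this layout, supporting another language means defining one more record
and adding its address to the table, which is the pattern
listing~\ref{cfa-lang-def} follows for \verb|cforall_language_defn|.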

The next step is updating the DWARF reader so that it translates the DWARF
language code into the \verb|language_cforall| enumerator defined above. This
step assumes that the language has an assigned language code.  A language code
is a hexadecimal value assigned to a particular language; the list of codes,
\verb|enum dwarf_source_language|, is maintained by GCC. For \CFA, the value
\verb|0x0025| is added to \verb|include/dwarf2.h|, as shown in listing
\ref{cfa-dwarf}.
\begin{lstlisting}[style=C++, caption={DWARF language code for \CFA},
label={cfa-dwarf}, basicstyle=\small]
// In include/dwarf2.h
enum dwarf_source_language { // Source language names and codes.
    DW_LANG_C89 = 0x0001,
    ...
    DW_LANG_CForAll = 0x0025,
};
\end{lstlisting}
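
The translation performed by the updated DWARF reader can be sketched as a
switch from a compilation unit's \verb|DW_AT_language| value to a GDB language
enumerator. The function \verb|dwarf_lang_to_gdb_lang| and the trimmed
enumerations below are hypothetical stand-ins, not GDB's DWARF-reader code.
The values of \verb|DW_LANG_C89|, \verb|DW_LANG_C|, \verb|DW_LANG_C_plus_plus|,
and \verb|DW_LANG_C99| are the standard DWARF codes, and
\verb|DW_LANG_CForAll| uses the value \verb|0x0025| from
listing~\ref{cfa-dwarf}.

\begin{lstlisting}[style=C++, caption={Sketch: mapping DWARF language codes to GDB languages (hypothetical)}, basicstyle=\small]
/* Trimmed copy of the DWARF language codes; only a few entries shown. */
enum dwarf_source_language {
    DW_LANG_C89 = 0x0001,
    DW_LANG_C = 0x0002,
    DW_LANG_C_plus_plus = 0x0004,
    DW_LANG_C99 = 0x000c,
    DW_LANG_CForAll = 0x0025        /* code used for Cforall in this chapter */
};

/* Hypothetical, trimmed stand-in for GDB's enum language. */
enum language { language_unknown, language_c, language_cplus, language_cforall };

/* Map the DW_AT_language value of a compilation unit to a GDB language. */
static enum language dwarf_lang_to_gdb_lang(unsigned int dw_lang)
{
    switch (dw_lang) {
    case DW_LANG_C89:
    case DW_LANG_C:
    case DW_LANG_C99:
        return language_c;
    case DW_LANG_C_plus_plus:
        return language_cplus;
    case DW_LANG_CForAll:
        return language_cforall;    /* the new case added for Cforall */
    default:
        return language_unknown;    /* fallback chosen for this sketch */
    }
}
\end{lstlisting}

In this sketch an unrecognized code maps to \verb|language_unknown|; the
fallback GDB itself uses is not modeled here.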

Once the demangler implementation is placed in the \verb|libiberty| directory
alongside the other demanglers, GDB can call it after two changes: the language
definition in listing~\ref{cfa-lang-def} is given the entry point of the \CFA
demangler, and a check for whether the current demangling style is
\verb|CFORALL_DEMANGLING| is added, as seen in listing~\ref{cfa-demangler}. GDB
then invokes the \CFA demangler automatically when it detects that the source
language is \CFA. In addition to this automatic invocation, GDB's command-line
interface provides an option to set the demangling style manually.  This
option is enabled for \CFA by adding a new enumerator for \CFA to the list of
demangling styles, along with the appropriate mask for this style, in
\verb|include/demangle.h|. Users can then choose the \CFA demangler in GDB
with the command \verb|set demangle-style <style>|, where the style name for
\CFA is defined by the preprocessor macro
\verb|CFORALL_DEMANGLING_STYLE_STRING| in listing~\ref{cfa-demangler-style}.
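
Choosing a style by name, as \verb|set demangle-style| does, amounts to
searching a table of demangler engines for an entry whose name matches the
user's string. The following self-contained model assumes a reduced table:
\verb|mini_demangler_engine|, \verb|engines|, and \verb|name_to_style| are
hypothetical, and the real \verb|libiberty_demanglers| table in
listing~\ref{cfa-demangler} has more entries. Only the \verb|"cforall"| string
and the \verb|1 << 18| bit mirror \verb|CFORALL_DEMANGLING_STYLE_STRING| and
\verb|DMGL_CFORALL| from listing~\ref{cfa-demangler-style}; the \verb|"none"|
entry is illustrative.

\begin{lstlisting}[style=C++, caption={Sketch: selecting a demangling style by name (hypothetical)}, basicstyle=\small]
#include <stddef.h>
#include <string.h>

/* Reduced model of the demangling-style enumeration. */
enum mini_demangling_styles {
    mini_no_demangling = -1,
    mini_unknown_demangling = 0,
    mini_cforall_demangling = 1 << 18   /* mirrors DMGL_CFORALL */
};

struct mini_demangler_engine {
    const char *style_name;             /* string accepted by the command */
    enum mini_demangling_styles style;  /* style selected by that string */
    const char *doc;                    /* short description */
};

static const struct mini_demangler_engine engines[] = {
    { "none",    mini_no_demangling,      "Demangling disabled" },
    { "cforall", mini_cforall_demangling, "Cforall style demangling" },
    { NULL,      mini_unknown_demangling, NULL }    /* end-of-table marker */
};

/* Return the style registered under NAME, or the unknown style. */
static enum mini_demangling_styles name_to_style(const char *name)
{
    for (const struct mini_demangler_engine *e = engines;
         e->style_name != NULL; e++)
        if (strcmp(e->style_name, name) == 0)
            return e->style;
    return mini_unknown_demangling;
}
\end{lstlisting}

For example, \verb|name_to_style("cforall")| yields the \CFA style, while an
unrecognized name yields the unknown style, which a caller can report as an
error.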

\begin{figure}
\begin{lstlisting}[style=C++, caption={libiberty setup for the \CFA demangler},
label={cfa-demangler}, basicstyle=\small]
// In libiberty/cplus-dem.c
const struct demangler_engine libiberty_demanglers[] = {
    {
        NO_DEMANGLING_STYLE_STRING,
        no_demangling,
        "Demangling disabled"
    },
    ...
    {
        CFORALL_DEMANGLING_STYLE_STRING,
        cforall_demangling,
        "Cforall style demangling"
    },
};
...
char * cplus_demangle(const char *mangled, int options) {
    ...
    /* The V3 ABI demangling is implemented elsewhere.  */
    if (GNU_V3_DEMANGLING || RUST_DEMANGLING || AUTO_DEMANGLING) { ... }
    ...
    if (CFORALL_DEMANGLING) {
        ret = cforall_demangle (mangled, options);
        if (ret) { return ret; }
    }
    ...
}
\end{lstlisting}

\begin{lstlisting}[style=C++, caption={Setup \CFA demangler style},
label={cfa-demangler-style}, basicstyle=\small]
// In include/demangle.h
#define DMGL_CFORALL (1 << 18)
...
/* If none are set, use 'current_demangling_style' as the default. */
#define DMGL_STYLE_MASK (DMGL_AUTO|DMGL_GNU|DMGL_LUCID|DMGL_ARM|DMGL_HP \
    |DMGL_EDG|DMGL_GNU_V3|DMGL_JAVA|DMGL_GNAT|DMGL_DLANG|DMGL_RUST \
    |DMGL_CFORALL)
...
extern enum demangling_styles {
    no_demangling = -1,
    unknown_demangling = 0,
    ...
    cforall_demangling = DMGL_CFORALL
} current_demangling_style;
...
#define CFORALL_DEMANGLING_STYLE_STRING  "cforall"
...
#define CFORALL_DEMANGLING (((int)CURRENT_DEMANGLING_STYLE)&DMGL_CFORALL)
\end{lstlisting}
\end{figure}
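
The comment on \verb|DMGL_STYLE_MASK| in listing~\ref{cfa-demangler-style}
describes a simple rule: style bits passed in the options take precedence, and
if none are set, \verb|current_demangling_style| is used. A hedged sketch of
that rule follows; \verb|effective_style|, \verb|wants_cforall|,
\verb|MINI_OTHER_STYLE|, and \verb|MINI_STYLE_MASK| are hypothetical, and only
the \verb|DMGL_CFORALL| bit is taken from the listing, so this is not
libiberty's code.

\begin{lstlisting}[style=C++, caption={Sketch: choosing the effective demangling style (hypothetical)}, basicstyle=\small]
/* DMGL_CFORALL uses the bit from the chapter; MINI_OTHER_STYLE is a
   hypothetical second style bit, included only so the mask has two bits. */
#define MINI_OTHER_STYLE (1 << 17)
#define DMGL_CFORALL     (1 << 18)
#define MINI_STYLE_MASK  (MINI_OTHER_STYLE | DMGL_CFORALL)

/* Style bits present in OPTIONS win; if there are none, fall back to the
   globally selected style (the role of current_demangling_style). */
static int effective_style(int options, int current_style)
{
    int requested = options & MINI_STYLE_MASK;
    return requested != 0 ? requested : current_style;
}

/* Nonzero when the effective style selects the Cforall demangler,
   analogous to testing CFORALL_DEMANGLING. */
static int wants_cforall(int options, int current_style)
{
    return (effective_style(options, current_style) & DMGL_CFORALL) != 0;
}
\end{lstlisting}

For example, \verb|wants_cforall(0, DMGL_CFORALL)| is true because no style
bit is requested and the global style is \CFA, whereas an explicit request for
another style overrides the global setting.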

However, the setup above does not demangle mangled symbols during
symbol-table lookup while the program is running, so additional work is needed
in \verb|gdb/symtab.c|. Before looking up a symbol, GDB passes its name, which
may be mangled or unmangled, to \verb|demangle_for_lookup|, which selects the
appropriate demangler according to the language passed to it. Adding a \CFA
case to this function enables invocation of the \CFA demangler during symbol
lookup.
\begin{lstlisting}[style=C++, caption={\CFA demangler setup for symbol lookup},
label={cfa-symstab-setup}, basicstyle=\small]
// In gdb/symtab.c
const char * demangle_for_lookup ( const char *name, enum language lang,
                                   demangle_result_storage &storage ) {
    /* When using C++, D, or Go, demangle the name before doing a
       lookup to use the binary search. */
    if (lang == language_cplus) {
        char *demangled_name = gdb_demangle(name, DMGL_ANSI|DMGL_PARAMS);
        if (demangled_name != NULL)
            return storage.set_malloc_ptr (demangled_name);
    }
    ...
    else if (lang == language_cforall) {
        char *demangled_name = cforall_demangle (name, 0);
        if (demangled_name != NULL)
            return storage.set_malloc_ptr (demangled_name);
    }
    ...
}
\end{lstlisting}
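
The comment in listing~\ref{cfa-symstab-setup} gives the reason for
demangling first: the subsequent lookup can then use a binary search over
demangled names. The sketch below models that order of operations under stated
assumptions: \verb|lookup_symbol|, \verb|toy_cforall_demangle|, and the sorted
string table are hypothetical, and \verb|toy_cforall_demangle| merely strips an
invented \verb|_TOY_| prefix rather than decoding real \CFA mangled names. If
demangling fails, the sketch searches for the original name unchanged.

\begin{lstlisting}[style=C++, caption={Sketch: demangling a name before a binary-search lookup (hypothetical)}, basicstyle=\small]
#include <stdlib.h>
#include <string.h>

enum language { language_c, language_cplus, language_cforall };

/* Hypothetical stand-in for cforall_demangle: it only strips an invented
   "_TOY_" prefix and does NOT implement the real Cforall mangling.
   Returns a malloc'ed string, or NULL when NAME lacks the prefix. */
static char *toy_cforall_demangle(const char *name, int options)
{
    static const char prefix[] = "_TOY_";
    size_t n = sizeof prefix - 1;
    char *out;

    (void)options;
    if (strncmp(name, prefix, n) != 0)
        return NULL;
    out = malloc(strlen(name + n) + 1);
    if (out != NULL)
        strcpy(out, name + n);
    return out;
}

/* bsearch comparator: KEY is a string, ELEM points at a table entry. */
static int compare_names(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Demangle NAME when LANG calls for it, then binary-search TABLE, which
   must hold demangled names sorted by strcmp.  Returns index or -1. */
static int lookup_symbol(const char *name, enum language lang,
                         const char *const *table, size_t count)
{
    char *demangled = NULL;
    const char *key = name;
    const char *const *hit;
    int index;

    if (lang == language_cforall) {
        demangled = toy_cforall_demangle(name, 0);
        if (demangled != NULL)
            key = demangled;        /* otherwise keep the original NAME */
    }
    hit = bsearch(key, table, count, sizeof *table, compare_names);
    index = hit != NULL ? (int)(hit - table) : -1;
    free(demangled);                /* free(NULL) is a no-op */
    return index;
}
\end{lstlisting}

Given a table holding \verb|"bar"|, \verb|"foo"|, and \verb|"main"| in sorted
order, looking up \verb|_TOY_foo| as a \CFA name finds \verb|foo| at index 1,
whereas the same string looked up as a C name is not found.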

\section{Result}
The hooks added throughout GDB enable the new \CFA demangler to be invoked
during symbol lookup and by \verb|binutils| tools such as \verb|objdump| and
\verb|nm|, which use the demanglers in \verb|libiberty|. These \verb|binutils|
tools also understand \CFA because of the new \CFA language code. However, the
language definition in listing~\ref{cfa-lang-def} still reuses the C++ routine
\verb|cp_lookup_symbol_nonlocal| for non-local symbol lookup; as the language
develops, a \CFA-specific version must be implemented to produce correct
output.