The alias Attribute - Some things I learned about the alias attribute
Recently, I had to read and modify some code that was responsible for handling alias attributes in C sources. Although I knew the alias attribute existed, I had never taken a closer look and thus needed to do some research. This escalated quite a bit, and as documentation on attributes in general is rather scarce and the alias attribute is no exception, I’d like to share my findings with you.
These notes begin with some probably rather dull remarks on the syntax of alias attributes and a subsequent explanation of how the compiler handles these. With the basics under our belt, we have a closer look at some syntactic and semantic peculiarities. The remainder of these notes is then concerned with some cool or fun examples of what one might actually do with aliases. Readers who are easily bored or prefer to learn by example are well-advised to jump directly to these examples and return to the earlier paragraphs only when needed.
So, who’s the intended target audience? To be honest, I don’t exactly know. Obviously, these notes apply to C only and involve some rather low-level hackery. Moreover, if you’re not using GCC or clang on some more or less common platform like x86_64, ARM or RISC-V, chances are that your toolchain does not support the alias attribute at all. Now, if that didn’t put you off already, I can think of two possible ways of reading these notes:
- If you’ve got some familiarity with C and are simply looking to use the alias attribute, it’s advisable to skim through the syntax and semantics sections before diving into some more complicated examples.
- If you enjoy torturing your compiler, you’ll probably find some engaging content in the section on testing the limits and some of the later examples.
Before delving in, it’s crucial to acknowledge (yet again) the portability issues associated with the alias attribute. While it offers some quite powerful capabilities, its usage should be approached with caution, especially in cross-platform projects.
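One hedged way to act on this caution is to hide the alias behind a feature test and fall back to a plain macro elsewhere. The sketch below assumes that __has_attribute (available in recent GCC and clang) combined with the __ELF__ macro is a good enough proxy for alias support, which may not hold for every toolchain. The names HAVE_ALIAS_ATTR, counter_impl, counter and bump_counter are made up for illustration.

```c
#include <assert.h>

// Assumption: alias support is detected via __has_attribute plus an ELF target.
#if defined(__has_attribute)
#  if __has_attribute(alias) && defined(__ELF__)
#    define HAVE_ALIAS_ATTR 1
#  endif
#endif
#ifndef HAVE_ALIAS_ATTR
#  define HAVE_ALIAS_ATTR 0
#endif

int counter_impl = 0;

#if HAVE_ALIAS_ATTR
// A real second symbol that shares its storage with counter_impl.
__attribute__((alias("counter_impl"))) extern int counter;
#else
// Fallback: no second symbol, only a second spelling within this file.
#define counter counter_impl
#endif

// Increments through the second name and reads back through the first.
int bump_counter(void) {
    counter += 1;
    assert(counter == counter_impl);
    return counter_impl;
}
```

Note that the fallback only helps within one translation unit; other modules would not see a symbol called counter.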
Syntax
Owing to the fact that attributes like the alias attribute have their origin in vendor-specific extensions to the C language, there are several ways to actually define an alias. We’ll only deal with two variants but the interested reader may find a third variant by looking here. Moreover, not every syntax is compatible with every compiler or even compiler settings, and different compilers do not necessarily agree upon the way in which certain attributes interoperate with standard C features. We’ll see examples below, so stay tuned.
GNU C
The GCC compiler originally introduced the alias attribute in GNU C using its __attribute__ syntax. According to GCC’s documentation, an attribute specifier is of the form
__attribute__(( attribute specifier list ))
where the attribute specifier list is going to consist of the single attribute alias("alias_tgt") for the greater part of these notes. In principle, such attributes could be attached to anything in your sources. However, the alias attribute applies only to variable and function declarations. In the example
__attribute__((alias("alias_tgt1")))
extern int alias_var;
__attribute__((alias("alias_tgt2")))
extern void alias_fun();
the attribute alias("alias_tgt1") applies to the declaration of alias_var whereas the attribute alias("alias_tgt2") applies to the declaration of alias_fun().
We’ll look into what exactly these alias attributes mean, but for the moment it suffices to think of these declarations as introducing an additional name alias_var for the previously defined variable alias_tgt1 and an additional name alias_fun for the previously defined function alias_tgt2.
C2x
Now, the reader might know very well that the attribute syntax
[[ namespace::attribute ]]
introduced in C++11 is to be included in the upcoming C2x standard. If you want to have a look, the draft on attributes can be found here.
As the alias attribute originated in GNU C, the namespace is going to be gnu in our case of interest. Hence, in C2x our first example of aliases from above could also be written as
[[ gnu::alias("alias_tgt1") ]]
extern int alias_var;
[[ gnu::alias("alias_tgt2") ]]
extern void alias_fun();
and both clang and GCC happily accept this syntax, as witnessed by compiler explorer. Just make sure that you don't omit the flag -std=c23 when invoking clang.
What is more, the proposal comes with clear and precise rules on where attributes can appear. It tells us that C2x will
[...] allow an attribute specifier to appear to the left of a declaration so that the attributes appertain to all of the declarators in the declaration list, or to appear to the right of all declaration specifiers so that the attributes appertain to the type determined by the specifier sequence. [...] Similarly, an attribute specifier can appear to the right of a type in a declarator to appertain to the type, or to the right of an identifier in a declarator to appertain to the identifier declared.
So, our above example might have also been written as
extern int alias_var [[ gnu::alias("alias_tgt1") ]];
extern void alias_fun [[ gnu::alias("alias_tgt2") ]] ();
and, again, both clang and GCC happily accept this syntax. This can also be checked on compiler explorer.
Semantics
In this section, we explain what the compiler makes of an alias attribute and give some first working examples. However, in order to explain how the alias attribute really works, we first need to recall some simple facts about symbol tables in object files.
Symbol tables
Disclaimer: While preparing these notes, the author was working on a GNU/Linux machine and while the specifics given below do not necessarily apply verbatim to your toolchain, the underlying mechanisms are almost certainly the same.
Whenever you define a global variable or a function in one of your C sources, it ends up as an entry in the corresponding object file’s symbol table, so that other parts of your program may use it. Each entry names the section (or segment, as we will also call it) of the object file that holds the symbol: executable code, i.e. functions, lives in .text, while initialized and uninitialized data live in .data and .bss, respectively.
An entry in the symbol table simply tells the linker what a specific name in your program means. That is, an entry either maps an identifier to a specific address in one of the segments of your object file, or is marked as UNDEFINED to tell the linker that this symbol must be defined in some other object file or library.
If you have one module a.c defining a global variable global and another module b.c using that global variable like in the example
// a.c
int global = 23;
// b.c
extern int global;
then the symbol table of a.o will contain an entry mapping global to some address in the .data segment and the symbol table of b.o will contain an UNDEFINED entry for global. The linker is then responsible for merging the symbol tables and making the code in b.c actually use the address of global in the .data segment of a.o.
The main point I want to make here is that names of functions and global variables are merely entries in some symbol tables that eventually map to specific addresses in memory. This means in particular that two names mapping to the same address will be indistinguishable after compilation and linking because a processor does not know of any names but works with memory addresses only.
Actually, this last paragraph fully explains how the alias attribute works.
A simple example with variables
Let’s get our hands dirty and try to understand alias attributes by investigating the following simple example:
// simple.c
int alias_tgt;
extern int alias_var [[gnu::alias("alias_tgt")]];
Compiling these two lines and looking at the symbol table of the resulting object file by invoking objdump or nm reveals what happens inside your compiler:
> objdump --syms simple.o
simple.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 simple.c
0000000000000000 g O .bss 0000000000000004 alias_tgt
0000000000000000 g O .bss 0000000000000004 alias_var
Don’t worry if you don’t know how to read the output of objdump. For our purposes, you only need to know that
- the first column contains the address of a symbol in its segment,
- the fourth column contains the segment that contains the symbol,
- the last column contains the name of the symbol.
The columns are also explained in objdump's manpage and reading through nm's manpage might give some further information if you're really curious.
In our simple example, we thus find two symbols alias_tgt and alias_var at address 0 of the segment .bss for uninitialized data. Once loaded into memory by your operating system or a boot loader, the variables alias_tgt and alias_var from simple.c will hence refer to the same actual address in memory and are indistinguishable from your computer’s point of view.
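If you prefer a run-time check over reading symbol tables, a minimal sketch along the following lines may help. It uses the GNU spelling of the attribute so that it compiles without any -std flag, and it initializes alias_tgt explicitly; the helper name write_through_alias is hypothetical.

```c
#include <assert.h>

int alias_tgt = 0;
__attribute__((alias("alias_tgt"))) extern int alias_var;

// Stores through the alias and loads through the target.
int write_through_alias(int value) {
    alias_var = value;            // write via the alias ...
    assert(alias_tgt == value);   // ... and the target has changed, too
    return alias_tgt;
}
```

Since both names resolve to the same storage, the value written through alias_var is the one read back through alias_tgt.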
A simple example with functions
The alias attribute does not only apply to variable symbols but also to function symbols. Compiling the example
// fun.c
int fn(int a) {
    return a + a;
}
[[gnu::alias("fn")]] int twice(int);
and investigating the resulting object file with objdump, we get the result
> objdump --syms fun.o
fun.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 fun.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 g F .text 000000000000000e fn
0000000000000000 g F .text 000000000000000e twice
Again, we have two symbols fn and twice in the .text section of our object file that share the single address 0.
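The same observation can be made at run time. The following sketch, written with the GNU attribute spelling, adds a hypothetical counter calls_to_fn to the body of fn and calls the function once under each name; the helper call_both_names is likewise made up for illustration.

```c
#include <assert.h>

static int calls_to_fn = 0;   // counts executions of fn's body

int fn(int a) {
    calls_to_fn++;
    return a + a;
}

__attribute__((alias("fn"))) int twice(int);

// Calls the function once under each name and reports how often the body ran.
int call_both_names(void) {
    int via_fn = fn(2);
    int via_twice = twice(21);
    assert(via_fn == 4 && via_twice == 42);
    return calls_to_fn;
}
```

There is no separate definition of twice anywhere, so the program only links because twice resolves to the code of fn.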
Testing out the limits
What exactly is an alias?
We now know more or less what the compiler does when it encounters an alias attribute. However, it is not so clear what an alias attribute means in terms of the C language. Essentially the only official documentation on the alias attribute is GCC’s documentation of which I’ll quote the relevant first half:
The alias variable attribute causes the declaration to be emitted as an alias for another symbol known as an alias target. Except for top-level qualifiers the alias target must have the same type as the alias. For instance, the following
int var_target;
extern int __attribute__ ((alias ("var_target"))) var_alias;
defines var_alias to be an alias for the var_target variable. It is an error if the alias target is not defined in the same translation unit as the alias.
One question that this explanation leaves unanswered is whether an alias is a definition in the standard committee’s sense. In fact, this is the question that originally initiated my investigations into the alias attribute.
Without further ado, let’s see what GCC and clang have to say. The snippet
int target;
[[ gnu::alias("target")]]
extern int alias_var;
int alias_var;
is happily accepted by both GCC and clang. However, if we turn the second declaration of alias_var from a tentative definition into an actual definition with an initializer by changing it to
int alias_var = 0;
the two compilers start to disagree. In fact, clang complains about a redefinition of the symbol alias_var while GCC tells us nothing. See for yourself. While you’re at it, you may check that GCC also accepts the alias after a definition of alias_var.
Now, it seems that clang treats the alias as a symbol definition and thus rightly complains about a symbol redefinition. But what does GCC do? Well, a quick inspection of the symbol table tells us that the alias seems to override any other variable definition that exists. There’s also this old bug report for GCC that seems related but never got any activity.
By the way, when it comes to functions, both clang and GCC seem to treat an alias attribute as a function definition and therefore complain about redefinitions.
Returning to variables, there’s one last surprise. Above, we noted that clang seems to treat the alias as a symbol definition. However, that’s not completely true. Whereas a definition may appear after any number of tentative definitions, clang does not allow an alias definition to follow a tentative definition, as in
int alias_var;
[[gnu::alias("alias_tgt")]]
extern int alias_var;
as can also be seen on godbolt. Funnily enough, turning the tentative definition int alias_var into a plain declaration with external linkage (in the standard’s sense, see §6.9.2 in the latest working draft) as in
extern int alias_var;
[[gnu::alias("alias_tgt")]]
extern int alias_var;
reconciles clang with the code snippet. As already mentioned above, GCC is more forgiving and accepts both variants.
So, what’s the upshot? I guess there are two points:
- GCC and clang don’t necessarily agree on what an alias actually is.
- If you’re using clang, an alias attribute on some declaration is very close to a usual definition. Just make sure that either the alias is the first declaration or all other declarations are explicitly extern.
Syntactic limits
The reader may have observed that while the clear rules for the attribute syntax in C2x were highlighted above, there was no mention of any rules in the case of GCC’s original syntax. There’s a reason for that and in order to set the stage, let me quote from GCC’s documentation on its attribute syntax:
For compatibility with existing code written for compiler versions that did not implement attributes on nested declarators, some laxity is allowed in the placing of attributes.
Even though we’re not dealing with any nested declarators here, let’s see what GCC and clang are able to swallow. In fact, all four declarations in the example
__attribute__((alias("alias_tgt"))) extern int alias_var1;
extern __attribute__((alias("alias_tgt"))) int alias_var2;
extern int __attribute__((alias("alias_tgt"))) alias_var3;
extern int alias_var4 __attribute__((alias("alias_tgt")));
are happily accepted by both GCC and clang and turn out to be semantically equivalent. For function aliases there are even more positions where one could place the __attribute__. Of these, the variant that puts the attribute where C2x would expect it,
extern void alias_fun __attribute__((alias("alias_tgt"))) ();
is rejected by both GCC and clang, and the variant
extern void alias_fun (__attribute__((alias("alias_tgt"))));
is of course interpreted as an attribute for parameter declarations. Thus, just as in the variable case, we are left with four valid and semantically equivalent variants to define a function alias:
__attribute__((alias("alias_tgt"))) extern void alias_fun1();
extern __attribute__((alias("alias_tgt"))) void alias_fun2();
extern void __attribute__((alias("alias_tgt"))) alias_fun3();
extern void alias_fun4() __attribute__((alias("alias_tgt")));
Note that for functions like int* fn() returning a pointer, there are even more variations of a function alias definition. Rest assured that in this case, too, GCC and clang swallow whatever you may come up with except for the two cases already excluded above.
However, one step further, GCC and clang start to disagree on what monstrosities are still acceptable. The alias inbetween in the example
int** ptr = 0;
int** target() {
    return ptr;
}
extern int * __attribute__((alias("target"))) * inbetween();
is still accepted by clang while GCC has problems parsing the corresponding line and consequently does not export a function called inbetween. You can see the warning in compiler explorer and either investigate the symbol tables on your own or have a look at the output on my machine:
> clang -c -std=c2x between.c
> objdump --syms between.o
between.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 between.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 g F .text 000000000000000e target
0000000000000000 g O .bss 0000000000000008 ptr
0000000000000000 g F .text 000000000000000e inbetween
and
> gcc -c -std=c2x between.c
between.c:6:1: warning: ‘alias’ attribute does not apply to types [-Wattributes]
6 | extern int * __attribute__((alias("target"))) * inbetween();
| ^~~~~~
> objdump --syms between.o
between.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 between.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 g O .bss 0000000000000008 ptr
0000000000000000 g F .text 000000000000000d target
A Word of Warning
Even though the beginning of the preceding paragraph might have left you with the impression that we may place __attribute__((alias("target"))) essentially wherever we want, the placement does matter for a declaration list. In
extern int a, b __attribute__((alias("target"))) ;
only b is an alias for target while in
__attribute__((alias("target")))
extern int a, b;
both a and b are aliases for target. If you don’t believe me, just have a look at the symbol tables on your own and keep in mind that this remark applies to the new C2x syntax, too.
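The difference can also be observed at run time. In the sketch below, the variable names are changed to a1, b1, a2 and b2 so that both declaration lists fit into one file, a1 receives its own definition, and the helper declaration_list_demo is hypothetical.

```c
#include <assert.h>

int target = 0;

// Trailing attribute: only b1 becomes an alias, a1 is merely declared here.
extern int a1, b1 __attribute__((alias("target")));
int a1 = 10;   // a1 still needs a definition of its own

// Leading attribute: both a2 and b2 become aliases of target.
__attribute__((alias("target"))) extern int a2, b2;

int declaration_list_demo(void) {
    b1 = 1;
    assert(target == 1 && a1 == 10);   // b1 shares storage with target, a1 does not
    a2 = 2;
    assert(target == 2 && b2 == 2);    // a2 and b2 both share storage with target
    return target;
}
```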
So, if I were pressed to write guidelines for alias attributes, I’d suggest the following simple rules:
- An alias declaration should consist of a single declarator to avoid any confusion.
- If you can, stick to the syntax introduced in C2x.
- Place the attribute in front of the declaration, as in our very first examples for the GNU and C2x syntax, since this placement seems to work reliably with both GCC and clang.
Examples
We’ll give several slightly more involved examples of the alias attribute. The first two of these examples are real-world examples. The other examples again more or less classify as testing the limits.
Aliases as portability and linking hacks
One problem where aliases might come in handy is offering a specific API without renaming all your functions, or offering different implementations of an API and switching between these implementations by changing aliases, e.g. with guarding #ifdefs. Among other examples for this use case, searching on GitHub for __attribute__((alias yields for instance Espressif’s implementation of VFS.
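To make the idea of switching implementations concrete, here is a minimal sketch. It is not taken from the VFS code mentioned above; the functions checksum_sum and checksum_xor, the public name checksum, the macro USE_XOR_CHECKSUM and the helper checksum_selftest are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

// Two interchangeable implementations (hypothetical names).
unsigned checksum_sum(const unsigned char *p, size_t n) {
    unsigned acc = 0;
    for (size_t i = 0; i < n; i++) acc += p[i];
    return acc;
}

unsigned checksum_xor(const unsigned char *p, size_t n) {
    unsigned acc = 0;
    for (size_t i = 0; i < n; i++) acc ^= p[i];
    return acc;
}

// The public name is bound to one implementation at compile time.
#ifdef USE_XOR_CHECKSUM
__attribute__((alias("checksum_xor")))
#else
__attribute__((alias("checksum_sum")))
#endif
extern unsigned checksum(const unsigned char *p, size_t n);

// Calls the public name and checks that the selected variant answered.
unsigned checksum_selftest(void) {
    const unsigned char data[] = {1, 2, 3};
    unsigned result = checksum(data, sizeof data);
#ifdef USE_XOR_CHECKSUM
    assert(result == (1u ^ 2u ^ 3u));
#else
    assert(result == 1u + 2u + 3u);
#endif
    return result;
}
```

Callers only ever see checksum; compiling with -DUSE_XOR_CHECKSUM rebinds that name without touching any call site.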
Software engineering with aliases
Another nice trick that crucially employs aliases is more related to software engineering than to linking problems: Imagine a situation where one module of your software owns a global variable essentially_const that is computed once during startup but constant afterwards. If essentially_const is exposed as a global variable, then, sooner rather than later, some rogue developer on your team will introduce a function that modifies essentially_const and thereby introduce a hard-to-find bug[1]. After all, essentially_const is a non-constant global variable and modifying it should be OK, right?
Let us indicate a neat solution for this problem that makes good use of aliases:
// a.c
static int essentially_const;
[[gnu::alias("essentially_const")]]
extern const int const_view;
void init() {
    essentially_const = 42;
}
// a.h
extern const int const_view;
void init();
Let’s see what’s happening here:
- The variable essentially_const in a.c is declared non-constant and the function init() at the bottom of a.c may consequently modify it as its author pleases. Moreover, essentially_const is declared static and therefore has internal linkage. That means in particular that our rogue developer cannot modify essentially_const as the symbol is not exposed to her[2].
- In addition to essentially_const, the module a.c also defines a variable const_view of type const int as an alias of essentially_const. This global symbol is then exposed in a.h to all other modules of the project. What have we gained? The interface a.h of the module a.c now clearly states that const_view is constant and any compiler will complain if said rogue developer tries to modify its value.
In fact, trying to compile
// rogue.c
#include "a.h"
void fn() {
    init();
    const_view = 13;
}
results in an error message such as
rogue.c: In function ‘fn’:
rogue.c:6:14: error: assignment of read-only variable ‘const_view’
6 | const_view = 13;
| ^
That clearly tells our rogue developer that modifying const_view is an evil thing to do.
Actually, this trick is not something the author came up with by himself. Rather, while doing some preliminary research for these notes, the author learned this technique from a source file in the qemu repositories, where the reader may also find some complementary explanations.
Doing weird stuff with aliases
As mentioned earlier, aliases are essentially just a way to have distinct names for the same memory address. As already exploited in the previous example, this means in particular that the aliasing mechanism bypasses C’s already quite weak type system and we might very well have symbols of different types sharing a single memory address.
In the example
// double1.c
#include <stdio.h>
#include <stdint.h>
double a;
[[gnu::alias("a")]]
extern uint64_t b;
#define SIGN_MASK 0x8000000000000000UL
int main(int argc, char **argv) {
    a = 0.5;
    b |= SIGN_MASK;
    printf("%f\n", a);
    return 0;
}
we have a symbol a of type double and a symbol b of type uint64_t sharing the same memory address. Running this example on a computer where a double occupies 64 bits and where the sign of doubles is stored in the most significant bit will result in an output like
> ./double1
-0.500000
The reason is that the line
b |= SIGN_MASK;
sets the most significant bit of the value stored at the address referred to by b, i.e. the very same address that a refers to. This most significant bit of b thus happens to be the sign bit of our double a.
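Strictly speaking, accessing a double through a name of type uint64_t is outside of what ISO C defines, so an optimizing compiler is not obliged to preserve the result shown above. For comparison, here is a sketch of the same bit manipulation using memcpy instead of an alias, which is a different technique from the one in these notes but well-defined. It assumes a 64-bit double with the sign in the most significant bit, and the function name set_sign_bit is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Assumption: double is a 64-bit value, as in the example above.
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits wide");

#define SIGN_MASK UINT64_C(0x8000000000000000)

// Sets the sign bit of x by copying its bytes into an integer and back.
double set_sign_bit(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   // double -> raw bits
    bits |= SIGN_MASK;
    memcpy(&x, &bits, sizeof x);      // raw bits -> double
    return x;
}
```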
As the reader might guess and as witnessed by the following example, one is of course not confined to messing around with the sign bit of floating point numbers. If you know the format of your floating point types, you might freely modify the exponent or the mantissa:
// double2.c
#include <stdio.h>
#include <stdint.h>
double a;
[[gnu::alias("a")]]
extern uint64_t b;
#define EXPONENT_MASK 0x7FF0000000000000UL
#define EXPONENT_SHIFT 52
int main() {
    // messing with the exponent: clear the exponent field, then write it back increased by 2
    a = 1;
    b = (b & ~EXPONENT_MASK) | ((b & EXPONENT_MASK) + (2UL << EXPONENT_SHIFT));
    printf("%f\n", a);
    // messing with the mantissa: set the second-highest mantissa bit
    a = 1;
    b = b | (1UL << (EXPONENT_SHIFT - 2));
    printf("%f\n", a);
    return 0;
}
On common consumer hardware, the output of the program looks as follows:
> ./double2
4.000000
1.250000
If you don’t know why, the Wikipedia article on double precision floating point numbers might be a good starting point for finding an explanation on your own.
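As a hint in that direction: in the common 64-bit format, 1.0 is stored with a biased exponent field of 1023 and an all-zero mantissa field. Adding 2 to the exponent field yields 1025, i.e. a factor of 2^(1025-1023) = 4, and setting mantissa bit 50 adds 1/4. The sketch below decodes both fields using memcpy rather than an alias; the function names are hypothetical and the layout is an assumption about your platform.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits wide");

#define EXPONENT_MASK  UINT64_C(0x7FF0000000000000)
#define MANTISSA_MASK  UINT64_C(0x000FFFFFFFFFFFFF)
#define EXPONENT_SHIFT 52

// Copies the bytes of x into an integer of the same size.
static uint64_t raw_bits(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

// Biased exponent field; the represented power of two is this value minus 1023.
unsigned exponent_field(double x) {
    return (unsigned)((raw_bits(x) & EXPONENT_MASK) >> EXPONENT_SHIFT);
}

// Stored fraction bits; bit 51 is worth 1/2, bit 50 is worth 1/4, and so on.
uint64_t mantissa_field(double x) {
    return raw_bits(x) & MANTISSA_MASK;
}
```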
Doing more weird stuff with aliases
As you might have guessed by now, there are very few safeguards in place to prevent us from defining function aliases with different signatures. In combination with other attributes like gnu::packed or gnu::aligned this may be used for tricks like the following:
// signatures.c
#include <stdio.h>
#include <stdint.h>
typedef struct [[gnu::packed]] {
    uint32_t a;
    uint32_t b;
} S;
void f(S s) {
    printf("a = %u, b = %u\n", s.a, s.b);
}
[[gnu::alias("f")]]
extern void g(uint64_t ab);
int main(int argc, char **argv) {
    S s = {2,3};
    printf("Calling f: ");
    f(s);
    printf("Calling g: ");
    g(0x0000000200000001UL);
    return 0;
}
Before showing you what this program does, let us examine it a bit more closely:
- The file begins with the declaration of a struct type S with two members a and b of type uint32_t. The attribute gnu::packed tells the compiler not to insert any padding bytes, so that the layout of any instance of the struct S in memory is exactly as we as C programmers see it: 4 bytes of memory occupied by the member a followed by 4 bytes of memory occupied by the member b.
- The function f takes an instance s of the struct type S and simply prints its members to the standard output.
- What follows is an alias g of f with a different signature. The function g takes a single argument ab of type uint64_t. Being an alias of f, however, any call of g will execute the exact same code that f compiles to. Luckily, values of type uint64_t occupy 8 bytes of memory, which is the exact same amount of memory occupied by instances of the struct type S. The machine code that f compiles to will thus interpret the lower and higher 4 bytes of g’s argument ab as members a and b of f’s argument s, respectively.
Now, let’s see what this program does:
> ./signatures
Calling f: a = 2, b = 3
Calling g: a = 1, b = 2
As explained above, the function f indeed interprets the 8 bytes 0x0000000200000001UL given as argument to g as an instance of type S with member a being the lower four bytes 0x00000001 and member b being the upper four bytes 0x00000002.
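The byte-level mapping itself can be reproduced without relying on calling conventions. The following sketch copies the 8 bytes of a uint64_t into an S using memcpy, which is roughly what f gets to see when it is entered through g on a platform that passes both argument types the same way. The helper view_as_S is hypothetical, and which half ends up in which member depends on the byte order of your machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct __attribute__((packed)) {
    uint32_t a;
    uint32_t b;
} S;

// The trick only has a chance of working if both types have the same size.
static_assert(sizeof(S) == sizeof(uint64_t), "S must occupy exactly 8 bytes");

// Reinterprets the 8 bytes of ab as an instance of S.
S view_as_S(uint64_t ab) {
    S s;
    memcpy(&s, &ab, sizeof s);
    return s;
}
```

On a little-endian machine the lower four bytes come first in memory and therefore land in member a, in line with the output shown above.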
Let me finish this example with a word of warning: Although being great fun, tricks like this one are highly non-portable as they depend among other things on memory alignment, padding, calling conventions and the concrete hardware that your binary is going to be deployed on. So, if you don’t know exactly what you’re doing, they should never be used in any production code.
An invitation
If you’ve come this far, there’s not much more I want to tell you for now. Get a cup of coffee, fire up your favourite editor and have some fun with aliases. If you don’t know where to start, let me give you one last hint: One may use the alias attribute for variables of struct type, too. This can be seen in the following program that is merely a slight variation of our previous example of function aliases:
#include <stdio.h>
#include <stdint.h>
typedef struct [[gnu::packed]] {
    uint32_t a;
    uint32_t b;
} S;
S s;
[[gnu::alias("s")]]
extern uint64_t c;
int main(int argc, char **argv) {
    s.b = 0x42;
    printf("%lx\n", c);
    return 0;
}
Links
- Series of very well-written articles by Martin Sebor for Red Hat Developer. The second article also deals with the alias attribute from a slightly different perspective.
- GCC's documentation on common variable attributes
- GCC's documentation on common function attributes
[1] With some non-negligible probability, that rogue developer is you!
[2] Actually, having an alias on a static symbol leads to that symbol being exposed in the corresponding object file’s symbol table, see ARM’s documentation for instance. However, I did not find a way to modify the value through that entry in the symbol table as it is a non-global object.