The Solaris 11 link-editor (ld) contains support for a new type
of object that we call a stub object.
A stub object is a shared object, built entirely from
mapfiles, that supplies the same linking interface as the real
object, while containing no code or data.
Stub objects cannot be executed: the runtime linker will
kill any process that attempts to load one. However, you can link to a
stub object as a dependency, allowing the stub to act as a proxy for
the real version of the object.
You may well wonder if there is a point to producing an object that
contains nothing but a linking interface. As it turns out, stub objects are very
useful for building large bodies of code such as Solaris. In the last year,
we've had considerable success in applying them to one of our oldest
and thorniest build problems. In this discussion, I will describe
how we came to invent
these objects, and how we apply them to building Solaris.
This posting explains where the idea for stub objects came
from, and details our long and twisty journey from hallway idea to
standard link-editor feature. I expect that these details are
mainly of interest to those who work on Solaris and its makefiles,
those who have done so in the past, and those who work with other similar
bodies of code.
A subsequent posting will omit the history
and background details, and instead discuss how to build and use
stub objects. If you are mainly interested in what stub objects are,
and don't care about the underlying software war stories,
I encourage you to skip ahead.
The Long Road To Stubs
This all started for me with an email discussion in May of 2008, regarding
a change request that was filed in 2002, entitled:
4631488 lib/Makefile is too patient: .WAITs should be reduced
This CR encapsulates a number of chronic issues with Solaris builds:
We build Solaris with a parallel make (dmake) that tries to
build as much of the code base in parallel as possible. There
is a lot of code to build, and we've long made use of parallelized
builds to get the job done quicker. This is even more important
in today's world of massively multicore hardware.
Solaris contains a large number of executables and
shared objects. Executables depend on shared objects,
and shared objects can depend on each other. Before you
can build an object, you need to ensure that the objects it
needs have been built. This implies a need for serialization,
which is in direct opposition to the desire to build everything
in parallel.
To accurately build objects in the right order requires
an accurate set of make rules defining the things that depend
on each other. This sounds simple, but the reality is quite
complex. In practice, having programmers explicitly specify
these dependencies is a losing strategy:
It's really hard to get right.
It's really easy to get it wrong and never know it
because things build anyway.
Even if you get it right, it won't stay that way,
because dependencies between objects can change over
time, and make cannot help you detect such drift.
You won't know that you got it wrong until the builds
break. That can be a long time after the change that
triggered the breakage happened, making it hard to connect
the cause and the effect. Usually this happens just before
a release, when the pressure is on, it's hard to think
calmly, and there is no time for deep fixes.
As a poor compromise, the libraries in core Solaris were
built using a set of grossly incomplete
hand-written rules, supplemented with a number of
dmake .WAIT directives
used to divide the libraries into non-interacting groups that
can be built in parallel because we think they don't depend on
each other.
From time to time, someone will suggest that we could
analyze the built objects themselves to determine their
dependencies and then generate make rules based on those
relationships. This is possible, but there
are complications that limit the usefulness of that approach:
To analyze an object, you have to build it first.
This is a classic chicken and egg scenario.
You could analyze the results of a previous build, but
then you're not necessarily going to get accurate rules
for the current code.
It should be possible to build the code without having
a built workspace available.
The analysis will take time, and remember that we're
constantly trying to make builds faster, not slower.
By definition, such an approach will always be approximate,
and therefore only incrementally more accurate than the hand
written rules described above. The hand written rules are fast
and cheap, while this idea is slow and complex, so we stayed with
the hand written approach.
Solaris was built that way, essentially forever, because these are genuinely
difficult problems that had no easy answer. The makefiles were full of build
races in which the right outcomes happened reliably for years until a
new machine or a change in build server workload upset the
accidental balance of things. After figuring out what had happened, you'd
mutter "How did that ever work?", add another incomplete and soon to be
inaccurate make dependency rule to the system, and move on. This was not
a satisfying solution, as we tend to be perfectionists in the Solaris group,
but we didn't have a better answer. It worked well enough,
approximately.
And so it went for years. We needed a different approach,
a new idea to cut the
Gordian Knot.
In that discussion from May 2008, my fellow linker-alien
Rod Evans had the initial
spark that led us to a game-changing series of realizations:
The link-editor is used to link objects together, but
it only uses the ELF metadata in the object, consisting
of symbol tables, ELF versioning sections, and similar data.
Notably, it does not look at, or understand, the machine code
that makes an object useful at runtime.
If you had an object that only contained the ELF metadata
for a dependency, but not the code or data, the link-editor would
find it equally useful for linking, and would never know the difference.
Call it a stub object.
In the core Solaris OS, we require all objects to be built with
a link-editor mapfile that describes all of its publicly available
functions and data. Could we build a stub object using the mapfile
for the real object?
It ought to be very fast to build stub objects, as there are
no input objects to process.
Unlike the real object, stub objects would not actually require
any dependencies, and so, all of the stubs for the entire system
could be built in parallel.
When building the real objects, one could link against the stub
objects instead of the real dependencies. This means that all the
real objects can be built in parallel too, without any
serialization.
We could replace a system that requires perfect makefile rules with
a system that requires no ordering rules whatsoever.
The results would be considerably more robust.
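The effect on build ordering can be illustrated with a small sketch. The library names and dependencies below are hypothetical, and Python's graphlib stands in for the scheduling that dmake would do; it is only meant to show how serialization collapses once real objects link against stubs.

```python
from graphlib import TopologicalSorter


def build_waves(deps):
    """Group targets into waves; everything in one wave can build in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical dependencies: each library links against the ones listed.
real_deps = {
    "libc": [],
    "libnsl": ["libc"],
    "libsocket": ["libnsl", "libc"],
    "libresolv": ["libsocket", "libnsl", "libc"],
}

# Without stubs, a real object must wait for the real objects it links against.
without_stubs = build_waves(real_deps)

# With stubs, the stubs themselves have no dependencies, and each real
# object depends only on stubs, never on another real object.
stub_deps = {f"stub:{lib}": [] for lib in real_deps}
stub_deps.update(
    {lib: [f"stub:{dep}" for dep in needed] for lib, needed in real_deps.items()}
)
with_stubs = build_waves(stub_deps)

print(len(without_stubs), "waves without stubs:", without_stubs)
print(len(with_stubs), "waves with stubs:", with_stubs)
```

In this toy graph the chain of four serialized steps becomes two waves: the stubs, then every real object that needs one.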
We immediately realized that this idea had
potential, but also that there were many details to sort out, lots of
work to do, and that perhaps it wouldn't really pan out. As is often the
case, it would be necessary to do the work and see how it turned out.
Following that conversation, I set about trying to build a stub object.
We determined that a faithful stub has to do the following:
Present the same set of global symbols, with the same ELF
versioning, as the real object.
Functions are simple: it suffices to
have a symbol of the right type, possibly, but not necessarily,
referencing a null function in its text segment.
Copy relocations make data more complicated to stub.
The possibility of a copy relocation means that when you create
a stub, the data symbols must have the actual size of the real data.
Any error in this will go uncaught at link time, and will cause
tragic failures at runtime that are very hard to diagnose.
For reasons too obscure to go into here, involving
tentative symbols, it is also important
that the data reside in bss, or not, matching its placement in
the real object.
If the real object has more than one symbol pointing at
the same data item, we call these aliased symbols.
All data symbols in the stub object must exhibit the same aliasing
as the real object.
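One way to picture these requirements is as a per-symbol record that must agree between the stub and the real object. The following is a minimal sketch under that assumption; the record layout, the field names, and the version name VERSION_1 are invented for illustration and are not how ld represents symbols internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceSymbol:
    name: str
    version: str          # ELF version the symbol is assigned to
    kind: str             # "FUNC" or "DATA"
    size: int = 0         # only compared for data symbols
    in_bss: bool = False  # whether the data lives in bss
    alias_of: str = ""    # primary symbol for aliased data, else ""


def interface_mismatches(real, stub):
    """Return readable differences between real and stub symbol lists."""
    real_by_name = {s.name: s for s in real}
    stub_by_name = {s.name: s for s in stub}
    problems = []
    for name in sorted(real_by_name.keys() - stub_by_name.keys()):
        problems.append(f"{name}: missing from stub")
    for name in sorted(stub_by_name.keys() - real_by_name.keys()):
        problems.append(f"{name}: not in real object")
    for name in sorted(real_by_name.keys() & stub_by_name.keys()):
        r, s = real_by_name[name], stub_by_name[name]
        if (r.version, r.kind) != (s.version, s.kind):
            problems.append(f"{name}: version or type differs")
        elif r.kind == "DATA" and (r.size, r.in_bss, r.alias_of) != (
            s.size, s.in_bss, s.alias_of
        ):
            problems.append(f"{name}: data size, bss placement, or aliasing differs")
    return problems


# Hypothetical example: the stub records the wrong size for a data symbol.
real = [
    InterfaceSymbol("printf", "VERSION_1", "FUNC"),
    InterfaceSymbol("__iob", "VERSION_1", "DATA", size=0xA00),
]
stub = [
    InterfaceSymbol("printf", "VERSION_1", "FUNC"),
    InterfaceSymbol("__iob", "VERSION_1", "DATA", size=0x3C0),
]
print(interface_mismatches(real, stub))
```

Function sizes are deliberately ignored here, since a stub function need not contain any code, while every attribute of a data symbol is compared.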
We imagined the stub library feature working as follows:
A command line option to ld tells it to produce a stub rather
than a real object. In this mode, only mapfiles are examined,
and any objects or shared libraries on the command line
are ignored.
The extra information needed
(function or data, size, and bss details) would be added to the
mapfile.
When building the real object instead of the stub, the extra
information for building stubs would be validated against the
resulting object to ensure that they match.
In exploring these ideas, I immediately ran headfirst into the reality of
the original mapfile syntax, a subject that I would later write about
as
The Problem(s) With Solaris SVR4 Link-Editor Mapfiles. The idea of
extending that poor language was a non-starter. Until a better mapfile
syntax became available, which seemed unlikely in 2008, the solution could
not involve extensions to the mapfile syntax.
Instead, we cooked up the idea (hack) of augmenting mapfiles with stylized
comments that would carry the necessary information. A typical definition
might look like:
# DATA(i386) __iob 0x3c0
# DATA(amd64,sparcv9) __iob 0xa00
# DATA(sparc) __iob 0x140
__iob;
A further problem then became clear: If we can't extend the mapfile syntax,
then there's no good way to extend ld with an option to produce stub
objects, and to validate them against the real objects. The idea of
having ld read comments in a mapfile and parse them for content is
an unacceptable hack. The entire point of comments is that they are
strictly for the human reader, and explicitly ignored by the tool.
Taking all of these speed bumps into account, I made a new plan:
A perl script reads the mapfiles, generates some small C
glue code to produce empty functions and data definitions, compiles
and links the stub object from the generated glue code, and then
deletes the generated glue code.
Another perl script, run after both objects have been built,
compares the real and stub objects, using
data from elfdump, and validates that they present the same linking
interface.
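As a rough illustration of the first step, here is a sketch that reads the stylized DATA comments shown earlier and emits C definitions of the recorded size. It is written in Python rather than perl, it handles only the one comment form that appears in this posting, and the shape of the generated glue is an assumption. It does not attempt to model bss placement or aliasing.

```python
import re

# Matches the comment form shown earlier: "# DATA(arch[,arch...]) name size"
DATA_COMMENT = re.compile(
    r"^#\s*DATA\(([^)]*)\)\s+(\S+)\s+(0[xX][0-9a-fA-F]+|\d+)\s*$"
)


def data_sizes(mapfile_text, arch):
    """Return {symbol: size} for the DATA comments that apply to arch."""
    sizes = {}
    for line in mapfile_text.splitlines():
        match = DATA_COMMENT.match(line.strip())
        if not match:
            continue
        arches = [a.strip() for a in match.group(1).split(",")]
        if arch in arches:
            text = match.group(3)
            is_hex = text.lower().startswith("0x")
            sizes[match.group(2)] = int(text, 16) if is_hex else int(text)
    return sizes


def glue_source(sizes):
    """Emit illustrative C data definitions (assumed format, not the prototype's)."""
    return "".join(
        f"char {name}[{size:#x}];\n" for name, size in sorted(sizes.items())
    )


mapfile_text = """\
# DATA(i386) __iob 0x3c0
# DATA(amd64,sparcv9) __iob 0xa00
# DATA(sparc) __iob 0x140
__iob;
"""

print(glue_source(data_sizes(mapfile_text, "amd64")), end="")
```

The architecture list is matched exactly, so sparc and sparcv9 select different sizes, as in the example comments.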
By June 2008, I had written the above, and generated a stub object
for libc. It was a useful prototype process to go through, and it
allowed me to explore the ideas at a deep level. Ultimately though,
the result was unsatisfactory as a basis for a real product. There were
so many issues:
The use of stylized comments was fine for a prototype,
but not close to professional enough for a shipping product.
The idea of having to document and support it was a large concern.
The ideal solution for stub objects really does involve
having the link-editor accept the same arguments used to
build the real object, augmented with a single extra command
line option. Any other solution, such as our prototype script,
will require makefiles to be
modified in deeper ways to support building stubs, and so, will
raise barriers to converting existing code.
A validation script that rederives what the linker knew
when it built an object will always be at a disadvantage
relative to the actual linker that did the work.
A stub object should be identifiable as such. In the prototype,
there was no tag or other metadata that would let you know
that they weren't real objects. Being able to identify a stub
object in this way means that the file command can tell you what
it is, and that the runtime linker can refuse to try and run
a program that loads one.
At that point, we needed to apply this prototype to building Solaris.
As you might imagine, modifying all the makefiles in the core
Solaris code base in order to do this is a massive task, and not something
you'd enter into lightly. The quality of the prototype just wasn't good
enough to justify that sort of time commitment, so I tabled the project,
putting it on my list of long term things to think about, and moved on to
other work. It would sit there for a couple of years.
Semi-coincidentally,
one of the projects I tackled after that was to create a new mapfile
syntax for the Solaris link-editor. We had wanted to do something about
the old mapfile syntax for many years. Others before me had done
some paper designs,
and a great deal of thought had already gone into the features it should,
and should not, have, but for various reasons things had never moved
beyond the idea stage. When I joined Sun in late 2005, I got involved
in reviewing those things and thinking about the problem.
Now in 2008, fresh from relearning for the Nth time why the old mapfile
syntax was a huge impediment to linker progress,
it seemed like the right time to tackle the mapfile issue. Paving the way for
proper stub object support was not the driving force behind that effort,
but I certainly had stub objects in mind as I moved forward.
The new mapfile syntax, which we call version 2, integrated into
Nevada build snv_135 in February 2010:
6916788 ld version 2 mapfile syntax
PSARC/2009/688 Human readable and extensible ld mapfile syntax
In order to prove that the new mapfile syntax was adequate for
general purpose use, I had also done an overhaul of the ON consolidation
to convert all mapfiles to use the new syntax, and put checks in place
that would ensure that no use of the old syntax would creep back in.
That work went back into snv_144 in June 2010:
6916796 OSnet mapfiles should use version 2 link-editor syntax
That was a big putback, modifying 517 files, adding 18 new files,
and removing 110 old ones.
I would have done this putback anyway, as the work was already done,
and the benefits of human readable syntax are obvious. However,
among the justifications listed in CR 6916796 was this:
We anticipate adding additional features to the new mapfile
language that will be applicable to ON, and which will require
all sharable object mapfiles to use the new syntax.
I never explained what those additional features were, and no one
asked. It was premature to say so, but
this was a reference to stub objects. By that point, I had already put
together a working prototype
link-editor with the necessary support for stub objects. I was pleased to
find that building stubs was indeed very fast. On my desktop system
(Ultra 24), an amd64 stub for libc can be built in a fraction of a second:
% ptime ld -64 -z stub -o stubs/libc.so.1 -G -hlibc.so.1 \
-ztext -zdefs -Bdirect ...
real 0.019708910
user 0.010101680
sys 0.008528431
In order to go from prototype to integrated link-editor feature, I knew
that I would need to prove that stub objects were valuable. And to do that,
I knew that I'd have to switch the Solaris ON consolidation
to use stub objects and evaluate the outcome. And in order to do that
experiment, ON would first need to be converted to version 2 mapfiles.
Sub-mission accomplished.
Normally when you design a new feature, you can devise reasonably small
tests to show it works, and
then deploy it incrementally, letting it prove its value as it goes.
The entire point of stub objects however was
to demonstrate that they could be successfully applied to an extremely large
and complex code base, and specifically to solve the Solaris build issues
detailed above. There was no way to finesse the matter:
in order to move ahead, I would have to
successfully use stub objects to build the entire ON consolidation
and demonstrate their value.
In software, the need to boil the ocean can often be a warning sign that things
are trending in the wrong direction. Conversely, sometimes progress demands
that you build something large and new all at once. A big win, or a big loss;
sometimes all you can do is try it and see what happens.
And so, I spent some time staring at ON makefiles trying to
get a handle on how
things work, and how they'd have to change. It's a big and messy world,
full of complex interactions, unspecified dependencies, special cases,
and knowledge of arcane makefile features...
...and so, I backed away, put it down for a few months and did other work...
...until the fall, when I felt like it was time to stop thinking and pondering
(some would say stalling) and get
on with it. Without stubs, the following gives
a simplified high level view of how Solaris is built:
An initially empty directory, known as the proto and referenced
via the ROOT makefile macro, is established to receive the files that
make up the Solaris distribution.
A top level setup rule creates the proto area, and performs
operations needed to initialize the workspace so that the main
build operations can be launched, such as copying needed header files
into the proto area.
Parallel builds are launched to build the kernel (usr/src/uts),
libraries (usr/src/lib), and commands. The install makefile
target builds each item and delivers a copy to the proto area.
All libraries and executables link against the objects previously
installed in the proto, implying the need to synchronize the order
in which things are built.
Subsequent passes run lint, and do packaging.
Given this structure, the additions to use stub objects are:
A new second proto area is established, known as the stub proto
and referenced via the STUBROOT makefile macro. The stub proto has the
same structure as the real proto, but is used to hold stub objects.
All files in the real proto are delivered as part of the Solaris
product. In contrast, the stub proto is used to build the product,
and then thrown away.
A new target is added to library Makefiles called
stub. This
rule builds the stub objects. The ld command is designed so that you
can build a stub object using the same ld command line you'd use to
build the real object, with the addition of a single -z stub
option. This means that the makefile rules for building the
stub objects
are very similar to those used to build the real objects, and many
existing makefile definitions can be shared between them.
A new target is added to the Makefiles called stubinstall
which delivers the stub objects built by the stub rule into the
stub proto. These rules reuse much of the plumbing used by
the existing install rule.
The setup rule runs stubinstall over the entire lib subtree
as part of its initialization.
All libraries and executables link against the objects in
the stub proto rather than the main proto, and can therefore
be built in parallel without any synchronization.
There was no small way to try this that would yield meaningful
results. I would have to take a leap of faith and edit approximately 1850
makefiles and 300 mapfiles first, trusting that it would all work out. Once
the editing was done, I'd type make and see what happened.
This took about 6 weeks to do, and there were many dark days when I'd
question the entire project, or struggle to understand some of the many
twisted and complex situations I'd uncover in the makefiles. I even found
a couple of new issues that required changes to the new stub object related
code I'd added to ld. With a substantial
amount of encouragement and help from some key people in the Solaris group,
I eventually got the editing done and stub objects for the entire workspace
built. I found that my desktop system could build all the stub objects
in the workspace in roughly a minute. This was great news, as it meant
that use of the feature is effectively free: no one was
likely to notice or care about the cost of building them.
After another week of typing
make, fixing whatever failed, and doing it again, I succeeded in
getting a complete build!
The next step was to remove all of the make rules and .WAIT statements
dedicated to controlling the order in which libraries under
usr/src/lib are built. This came together pretty quickly,
and after a few more speed bumps, I had a workspace that built cleanly
and looked like something you might actually be able to integrate someday.
This was a significant milestone, but there
was still much left to do.
I turned to doing full nightly builds. Every
type of build (open, closed, OpenSolaris, export, domestic) had to be tried.
Each type failed in a new and unique way, requiring some thinking and
rework. As things came together, I became aware of things that could have
been done better, simpler, or cleaner, and those things also required
some rethinking, the seeking of wisdom from others, and some rework.
After another couple of weeks, it was in close to final form. My focus
turned towards the end game and integration. This was a huge workspace,
and needed to go back soon, before changes in the gate made merging
increasingly difficult.
At this point, I knew that the stub objects had greatly simplified
the makefile logic and uncovered a number of race conditions, some of
which had been there for years. I assumed that the builds were faster
too, so I did some builds intended to quantify the speedup in build time that
resulted from this approach. It had never occurred to me that there might
not be one. And so,
I was very surprised to find that the wall clock
build times for a stock ON workspace were essentially identical
to the times for my stub library enabled version! This is why it is
important to always measure, and not just to assume.
One can tell from first principles, based on all those removed dependency
rules in the library makefile, that the stub object version of
ON gives dmake considerably more opportunities to overlap library construction.
Several hypotheses were proposed, and shot down:
Could we have disabled dmake's parallel feature? No,
a quick check showed things being built in parallel.
It was suggested that we might be I/O bound, and so, the threads
would be mostly idle. That's a plausible explanation, but system stats
didn't really support it. Plus, the timings for the stub and non-stub
cases were just too suspiciously identical.
Are our machines already handling as much parallelism
as they are capable of, and unable to exploit these
additional opportunities? Once again, we didn't see the evidence
to back this up.
Eventually, a more plausible and obvious reason
emerged: We build the libraries and commands
(usr/src/lib, usr/src/cmd) in parallel with the kernel (usr/src/uts).
The kernel is the long leg in that race, and so, wall clock measurements
of build time
are essentially showing how long it takes to build uts. Although it would
have been nice to post a huge speedup immediately, we can take solace in
knowing that stub objects simplify the makefiles and reduce the possibility
of race conditions. The next step in reducing build time should be to find
ways to reduce or overlap the uts part of the builds. When that leg of the
build becomes shorter, then the increased parallelism in the libs and
commands will pay additional dividends. Until then, we'll just have to
settle for simpler and more robust.
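The arithmetic behind this is simple enough to sketch. The durations below are invented for illustration and are not measurements from the ON build, and the model assumes the legs run fully in parallel without competing for the same CPUs.

```python
def wall_clock(legs):
    """Wall-clock time of legs that run in parallel: the slowest leg wins."""
    return max(legs.values())


# Hypothetical durations in minutes.
stock = {"uts": 60, "lib+cmd": 45}
with_stubs = {"uts": 60, "lib+cmd": 30}   # libraries overlap better

# Both builds are limited by the kernel leg, so the totals are identical.
print(wall_clock(stock), wall_clock(with_stubs))

# If the kernel leg were shortened, the library improvement would show up.
faster_uts_stock = {"uts": 25, "lib+cmd": 45}
faster_uts_stubs = {"uts": 25, "lib+cmd": 30}
print(wall_clock(faster_uts_stock), wall_clock(faster_uts_stubs))
```

With these made-up numbers the first line prints 60 for both builds, while the second shows 45 against 30 once uts is no longer the long leg.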
And so, I integrated the link-editor support for creating stub objects
into snv_153 (November 2010) with
6993877 ld should produce stub objects
PSARC/2010/397 ELF Stub Objects
followed by the work to convert the ON consolidation in snv_161 (February 2011)
with
7009826 OSnet should use stub objects
4631488 lib/Makefile is too patient: .WAITs should be reduced
This was a huge putback, with 2108 modified files, 8 new files, and 2
removed files. Due to the size, I was allowed a window after snv_160 closed
in which to do the putback. It went pretty smoothly for something this
big. A few more preexisting race conditions were discovered and addressed
over the next few weeks, and things have been quiet since then.
Conclusions and Looking Forward
Solaris has been built with stub objects since
February. The fact that developers no longer specify the order in which
libraries are built has been a big success, and we've eliminated an
entire class of build error. That's not to say that there are no build
races left in the ON makefiles, but we've taken a substantial bite out
of the problem while generally simplifying and improving things.
The introduction of a stub proto area has also opened some interesting
new possibilities for other build improvements. As this article has
become quite long, and as those uses do not involve stub objects,
I will defer that discussion to a future article.