- Position-independent code
In computing, position-independent code (PIC) or position-independent executable (PIE) is machine instruction code that executes properly regardless of where in memory it resides. PIC is commonly used for shared libraries, so that the same library code can be loaded at a location in each program's address space where it will not overlap any other uses of memory (for example, other shared libraries). PIC was also used on older computer systems lacking an MMU, so that the operating system could keep applications away from each other even within the single address space of an MMU-less system.
Position-independent code can be copied to any memory location without modification and executed, unlike relocatable code, which requires special processing by a link editor or program loader to make it suitable for execution at a given location.
Code must generally be written or compiled in a special fashion in order to be position independent. Instructions that refer to specific memory addresses, such as absolute branches, must be replaced with equivalent program-counter-relative instructions. The extra indirection may cause position-independent code to be less efficient, although modern processors are designed to make this more tolerable.
History
In early computers, code was position-dependent: each program was built to be loaded into, and run from, a particular address. In order to run multiple jobs using separate programs at the same time, an operator had to carefully schedule the jobs so that no two simultaneous jobs would run programs that required the same load addresses. For example, if both the payroll program and the accounts receivable program were built to run at address 32K, the operator could not run both at the same time. Sometimes, an operator would keep multiple versions of a program around, each built for a different load address, to expand his options.
To make things more flexible, position-independent code was invented. Position-independent code could run from any address at which the operator chose to load it.
The invention of dynamic address translation (the function provided by an MMU) obsoleted position-independent code, because every job could have its own separate address 32K, so the programmer could build all programs to run at address 32K and they could still run all at the same time (each in its own address space). Because position-independent code is less efficient than position-dependent code, this was a better solution to the problem.
The next problem to be attacked was the memory waste that happens when the same code is loaded multiple times to be used by multiple simultaneous jobs. If two jobs run entirely identical programs, dynamic address translation provides a solution by allowing the system simply to map two different jobs' address 32K to the same bytes of real memory, containing the single copy of the program.
But more often, the programs are different and merely share a lot of common code. For example, the payroll program and the accounts receivable program probably both contain an identical sort subroutine. So designers invented shared modules (a shared library is a form of shared module). While the main payroll and accounts receivable programs get loaded into separate memory, the shared module gets loaded once and simply mapped into the two address spaces.
But this introduces a memory allocation problem similar to the one that position-independent code solved above: If a program can have one shared module, it can have lots of them. What if a single program, in a single address space, wants to use two shared modules, both built to run at the same address? The system cannot load both at the same time, so there is no way to load the program. To work around this, programmers made sure they never built two shared modules to run at the same address if they might both have to be used by the same program. Sometimes, they made multiple versions of a shared module, each built to run at a different address.
This is obviously not a desirable situation. It's a lot of manual work and wastes address space. Position-independent code comes to the rescue again, because if a shared module can run from any address, then the program loader can simply load it into any free address. The sort subroutine might run at address 32K in a payroll job, but 48K in a simultaneous accounts receivable job. Both addresses refer to the same real memory; there is only one copy of the sort subroutine in real memory.
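The idea that two different addresses can refer to the same real memory can be tried out with a small C sketch. It assumes a POSIX system, and it simplifies the scenario above by creating both mappings inside one process rather than in two separate jobs. The names `two_views` and `map_twice` are hypothetical and exist only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Two virtual addresses that are backed by the same page of a file. */
struct two_views {
    char *first;
    char *second;
    size_t length;
};

/* Map one page of a temporary file twice. Returns 0 on success, -1 on error. */
int map_twice(struct two_views *out) {
    char path[] = "/tmp/pic_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* the file lives on while it is mapped */

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || ftruncate(fd, (off_t)page) != 0) {
        close(fd);
        return -1;
    }
    out->length = (size_t)page;

    /* MAP_SHARED makes both mappings refer to the same underlying page. */
    out->first = mmap(NULL, out->length, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    out->second = mmap(NULL, out->length, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
    close(fd);

    if (out->first == MAP_FAILED || out->second == MAP_FAILED)
        return -1;
    return 0;
}
```

After a successful call, `first` and `second` hold different addresses, yet a byte written through one pointer can be read back through the other, because only one copy of the page exists.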
Position-independent code has been used not only to coordinate the work of user-level applications, but within operating systems as well. The earliest paging systems did not use virtual memory address spaces; instead, the operating system would explicitly load individual modules of itself as needed, overwriting less needed ones (the memory available for the operating system was much smaller than the operating system). A module had to be capable of running in whatever memory was free at the time it was needed, so individual operating system modules were made of position-independent code.
The invention of virtual memory obsoleted that method, because the operating system could have a virtual address space so big that every module of the operating system could have its own permanent virtual address.
Technical details
Procedure calls inside a shared library are typically made through small procedure linkage table stubs, which then call the definitive function. This notably allows a shared library to inherit certain function calls from previously loaded libraries rather than using its own versions.
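The stub mechanism can be imitated in plain C as a rough sketch. A function pointer stands in for the slot a stub jumps through, and a toy resolver picks the first definition of a symbol among the loaded modules. All names here (`combine`, `resolve`, `plt_combine` and so on) are hypothetical, and resolving on the first call rather than at load time is an assumption made for the illustration; real stubs are generated by the linker and are architecture specific.

```c
#include <stddef.h>
#include <string.h>

typedef int (*binary_fn)(int, int);

/* Two definitions of the same symbol: one from a module loaded earlier,
   one bundled with the library itself. Both are hypothetical. */
static int earlier_combine(int a, int b) { return a * b; }
static int library_combine(int a, int b) { return a + b; }

struct symbol { const char *name; binary_fn fn; };

/* Search order of the toy resolver: earlier-loaded modules come first. */
static const struct symbol loaded_symbols[] = {
    { "combine", earlier_combine },
    { "combine", library_combine },
};

/* Return the first definition of `name`, or NULL if none is loaded. */
static binary_fn resolve(const char *name) {
    size_t count = sizeof loaded_symbols / sizeof loaded_symbols[0];
    for (size_t i = 0; i < count; i++)
        if (strcmp(loaded_symbols[i].name, name) == 0)
            return loaded_symbols[i].fn;
    return NULL;
}

int resolve_count = 0;                 /* how many times the resolver ran */

static int combine_first_call(int a, int b);

/* The slot the stub jumps through; it starts out pointing at the resolver path. */
static binary_fn combine_slot = combine_first_call;

/* First call only: look the symbol up, patch the slot, then forward the call. */
static int combine_first_call(int a, int b) {
    binary_fn target = resolve("combine");
    resolve_count++;
    if (target == NULL)
        return 0;                      /* toy fallback; a real loader would abort */
    combine_slot = target;
    return target(a, b);
}

/* Stand-in for the stub: callers never name the final function directly. */
int plt_combine(int a, int b) {
    return combine_slot(a, b);
}
```

Because the earlier-loaded definition comes first in the search order, `plt_combine` ends up calling it instead of the library's own version, and the resolver runs only once no matter how many calls follow.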
Data references from position-independent code are usually made indirectly, through global offset tables (GOTs), which store the addresses of all accessed global variables. There is one GOT per compilation unit or object module, and it is located at a fixed offset from the code (although this offset is not known until the library is linked). When a linker links modules to create a shared library, it merges the GOTs and sets the final offsets in code. It is not necessary to adjust the offsets when loading the shared library later.
Position-independent functions accessing global data start by determining the absolute address of the GOT given their own current program counter value. This often takes the form of a fake function call in order to obtain the return address on the stack (x86) or in a special register (PowerPC, SPARC, probably at least some other RISC processors, ESA/390), which can then be stored in a predefined standard register. Some processor architectures, such as the Motorola 68000, Motorola 6809, ARM and the newer AMD64, allow referencing data by offset from the program counter. This is specifically targeted at making position-independent code smaller, less register demanding and hence more efficient.
Windows DLLs
Microsoft Windows DLLs are not shared libraries in the Unix sense and do not use position-independent code. This means they cannot have their routines overridden by previously loaded DLLs and require small tricks for sharing selected global data. Code has to be relocated after it has been loaded from disk, making it potentially non-shareable between processes; sharing mostly occurs on disk.
To alleviate this limitation, almost all Windows system DLLs are pre-mapped at different fixed addresses in such a way that there is no conflict. It is not necessary to relocate the libraries before using them and memory can be shared. Even pre-mapped DLLs still contain information which allows them to be loaded at arbitrary addresses if necessary.
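Load-time relocation of this kind can be sketched in C under heavy simplification. The `image` structure below is hypothetical and is not the real DLL file format: it only records which words of a loaded module hold absolute addresses, so that a loader can adjust them when the module lands somewhere other than the address it was built for.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified module image. */
struct image {
    uintptr_t base;          /* address the words are currently fixed up for */
    uintptr_t *words;        /* stand-in for the module's code and data */
    size_t word_count;
    const size_t *fixups;    /* indices of words that hold absolute addresses */
    size_t fixup_count;
};

/* Adjust every recorded absolute address for a new load address.
   Returns the number of words patched; 0 means nothing had to change,
   so the loaded pages stay identical to the copy on disk. */
size_t rebase(struct image *img, uintptr_t actual_base) {
    uintptr_t delta = actual_base - img->base;
    size_t patched = 0;

    if (delta == 0)
        return 0;

    for (size_t i = 0; i < img->fixup_count; i++) {
        size_t index = img->fixups[i];
        if (index >= img->word_count)
            continue;                  /* ignore malformed entries */
        img->words[index] += delta;
        patched++;
    }
    img->base = actual_base;
    return patched;
}
```

The sketch suggests why pre-mapping helps: a module loaded at the address it was built for needs no patching, whereas a relocated module has modified pages that differ from the copy on disk and from copies relocated to other addresses.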
A sharing technique Windows calls "memory mapping" (not to be confused with memory-mapped I/O) is sometimes able to allow multiple processes to share an instance of a DLL loaded into memory. However, Windows is not always able to share one instance of a DLL loaded by multiple processes. [Rick Anderson, "The End of DLL Hell", Microsoft Developer Network, January 2000, http://msdn2.microsoft.com/en-us/library/ms811694.aspx#dlldanger1_topic2, accessed 2007-04-26. Quote: "Sharing common DLLs does provide a significant savings on memory load. But Windows is not always able to share one instance of a DLL that is loaded by multiple processes."] Windows requires each compiled program to know where in its own address space each DLL will be accessed; there is no support for position independence.
A DLL specifies its "desired" base address when it is created (Visual C++ defaults to an offset of 0x10000000), but if multiple DLLs have the same desired base address, a program cannot relocate them all to that base offset and must specify new offsets when linking. When the Windows loader loads an executable into memory for execution, it checks to see if each DLL has already been loaded with the offset used when the executable was created (not the DLL). If the DLL is not already loaded with that offset, it is relocated to the base requested by the executable. Note that this will provide sharing across multiple processes of the same executable (e.g. if started in different accounts via Fast User Switching), but not necessarily across different programs that link to the same DLL.
Other platforms such as Mac OS X and Linux now support forms of prebinding as well. For Mac OS X the system is called prebinding. Under Linux, the system used is implemented via a program called prelink. This is vastly different from memory mapping.
Position-independent executables
Position-independent executables (PIE) are executable binaries made entirely from position-independent code. While some systems only run PIC executables, there are other reasons they are used. PIE binaries are used in some security-focused Linux distributions to allow PaX or Exec Shield to use address space layout randomization. Randomizing the load address prevents attackers from knowing where existing executable code is, which hinders exploits that rely on knowing the offset of the executable code in the binary, such as return-to-libc attacks.
See also
* Dynamic linker
References
Further reading
* John R. Levine (October 1999). "Chapter 8: Loading and overlays". Linkers and Loaders. Morgan Kaufmann. ISBN 1-55860-496-0. http://www.iecc.com/linker/linker08.html
External links
* [http://www.gentoo.org/proj/en/hardened/pic-guide.xml Introduction to Position Independent Code]
* [http://www.gentoo.org/proj/en/hardened/pic-internals.xml Position Independent Code internals]
* [http://linux4u.jinr.ru/usoft/WWW/www_debian.org/Documentation/elf/node21.html Programming in Assembly Language with PIC]
Wikimedia Foundation. 2010.