|
This note discusses issues that arise in porting unix-based
applications and libraries to unix-on-windows platforms like
uwin
and
cygwin.
The presentation is based on experience coding and porting the
ast-open
open source packages over the past 20 years.
Our goal in porting to any architecture is to avoid makefile, api,
or application changes and to push workarounds to single points
of change in low level libraries.
It may seem strange to isolate makefiles from change.
After all, most software packages generate makefiles as a side effect of
the configuration process.
The big difference for
ast-open
is that
nmake(1)
(the AT&T
nmake,
not the Microsoft
NMAKE.EXE)
is used to control all building and packaging.
Think of an
nmake
makefile as a source and binary manifest
(target and prerequisite assertions and variable definitions but rarely
any action or recipe specifics).
Based on this manifest and persistent state maintained by
nmake,
the default rules provide actions that:
-
build binaries from source
-
build locale message catalogs
-
install binaries and generated files
-
construct source and binary tarballs
-
clean,
clobber,
and
test
To be fair,
nmake
also has a configuration step
(generated by the
probe(1)
command),
but it is automatic and shared between all
nmake
users.
probe
results are used to parameterize
the default rules, including
the option flags to build dlls,
the option flags to enable verbose warnings,
and the various file prefixes and suffixes
(.o,
.a,
.so,
.dll,
etc.).
- path separator
-
We assume that the path separator is
/
and that all files are accessible through the root directory
/.
FAT
and
NTFS
already support this.
All that is needed is some
mount(1)
magic to unixize the drive letters
A:,
etc.
- case-ignorant filesystem
-
Mixed case files can be created but file lookup is case-insensitive.
The upshot is that any application that reads directory contents
must be aware of the ignorant filesystem below it.
This forced us to rename some files in the packages to avoid case ambiguities.
- invalid file names
-
Files with base names
aux,
nul,
etc. can be captured and massaged by the pathname system call intercepts.
- invalid file name characters
-
FAT
and
NTFS
do not allow these characters in pathnames:
the control characters hex
01-1f
and the punctuation characters
" * : < > ? \ |
This is a tough one to handle, so we give up and simply do not generate
paths containing these characters.
- optional .exe
-
Executable
a.out
files must have a
.exe
suffix, but the suffix may be omitted when executing the command.
(The DOS shell requires the
.exe
suffix on all windows variants; otherwise it's optional on NT and 2K.)
To avoid breaking every makefile and script,
cc -o foo
by default generates
foo.exe.
But to make the illusion complete
every
non-creat pathname system call must check for
foo.exe
when
foo
is not found.
Otherwise almost
every
application will require patches to add/drop
.exe
at the appropriate places, and it gets weird when applications aren't patched.
For example, run these on cygwin:
(cygwin) /usr/bin/file /usr/bin/file /usr/bin/file.exe
/usr/bin/file: executable,
/usr/bin/file.exe: MS Windows PE 32-bit Intel 80386 console executable not relocatable
(cygwin.i386) /usr/bin/ls -li /usr/bin/file /usr/bin/file.exe | cut -f 1 -d ' '
562949953442507
562949953442507
This shows that
stat(2)
works because it tacks on a
.exe
suffix,
but
open(2)
fails because it doesn't.
Try to rationalize
that.
- environment variables with path values
-
Some environment variables contain path names or lists of path names.
These may be used by unix, win32, or both.
PATH,
used by both unix and win32, is automatically handled by the unix-on-windows
platforms.
In
UWIN
the environment variable
DOSPATHVARS
can be set to the names of additional variables that get converted to and
from native path formats.
DOSPATHVARS
is also supported by
the
ast
library on
CYGWIN.
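The invalid file name and invalid character rules listed above can be checked mechanically before a path is ever generated.
The following is a minimal sketch of such a check, not the
ast
implementation.
The function names
winix_has_bad_char
and
winix_is_reserved
are hypothetical, the reserved list adds
con
and
prn
to the two names the note mentions, and treating a reserved name followed by a suffix as reserved is an assumption about the native filesystem behavior.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Punctuation the note lists as invalid in FAT and NTFS pathnames. */
static const char winix_bad_chars[] = "\"*:<>?\\|";

/* Reserved device base names: aux and nul come from the note,
 * con and prn are added here as further well-known examples. */
static const char *const winix_reserved[] = { "aux", "nul", "con", "prn" };

/* Case-insensitive match of the first n bytes of name against a
 * lower-case word that must be exactly n bytes long. */
static int same_word(const char *name, size_t n, const char *word)
{
    size_t i;

    if (strlen(word) != n)
        return 0;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char)name[i]) != word[i])
            return 0;
    return 1;
}

/* Return 1 if base contains hex 01-1f or one of the listed characters. */
int winix_has_bad_char(const char *base)
{
    const unsigned char *s;

    assert(base != NULL);
    for (s = (const unsigned char *)base; *s; s++)
        if (*s <= 0x1f || strchr(winix_bad_chars, *s))
            return 1;
    return 0;
}

/* Return 1 if base, up to its first '.', is a reserved device name. */
int winix_is_reserved(const char *base)
{
    const char *dot;
    size_t n, i;

    assert(base != NULL);
    dot = strchr(base, '.');
    n = dot ? (size_t)(dot - base) : strlen(base);
    for (i = 0; i < sizeof(winix_reserved) / sizeof(winix_reserved[0]); i++)
        if (same_word(base, n, winix_reserved[i]))
            return 1;
    return 0;
}
```

Both functions take a base name, not a full path, so the
/
separator never reaches the character check.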
- #if _WINIX
-
For systems built on the
win32
or
dos
apis.
Mainly for dealing with filesystem case-ignorance and optional
.exe
suffix hacks.
- #if _UWIN
-
For uwin specific workarounds.
- #if __CYGWIN__
-
For cygwin specific workarounds.
Most of our source is built on top of
libast
and includes
<ast.h>.
This header contains:
#if !defined(_WINIX) && (_UWIN || __CYGWIN__ || __EMX__)
#define _WINIX 1
#endif
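As an illustration of how
_WINIX
is meant to be used, here is a small sketch of a library-level name comparison that hides filesystem case-ignorance from callers.
The function
name_same
is hypothetical and is not part of
libast;
the
_WINIX
definition is repeated only so that the sketch compiles without
<ast.h>.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>    /* strcasecmp() */

/* Stand-in for the <ast.h> logic quoted above. */
#if !defined(_WINIX) && (_UWIN || __CYGWIN__ || __EMX__)
#define _WINIX 1
#endif

/* Return 1 if two file base names denote the same directory entry. */
int name_same(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
#if _WINIX
    /* case-ignorant filesystem: foo and FOO are the same file */
    return strcasecmp(a, b) == 0;
#else
    return strcmp(a, b) == 0;
#endif
}
```

Keeping the
#if
inside one low level routine is the point: callers use
name_same
everywhere and need no platform tests of their own.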
All of our libraries can be compiled as static archives or dlls.
Each library
foo
has an interface header
<foo.h>.
The macro
_BLD_foo
is defined when library
foo
is being compiled.
In addition, the macros
__EXPORT__
and
__IMPORT__
are defined as
extern
keyword replacements for all non-static compiles.
__EXPORT__
is required for all public functions and
__IMPORT__
is required for public shared data.
Public shared data is problematic for many shared library and dll
implementations; most of our public shared data is from old code before we
knew better.
Interface headers follow this template:
#if _BLD_foo
# if defined(__EXPORT__)
# define extern extern __EXPORT__
# endif
#else
# if defined(__IMPORT__)
# define extern extern __IMPORT__
# endif
#endif
extern Foo_data_t foo_info;
#undef extern
#if _BLD_foo && defined(__EXPORT__)
# define extern extern __EXPORT__
#endif
extern Foo_t* fooopen(const char*);
extern int fooclose(Foo_t*);
#undef extern
nmake(1)
now supports the automatic generation of this goop when the
headers are installed.
An alternative and less intrusive method is to specify a
.def
(list of functions to export)
or
.ign
(list of global functions to
not
export)
prerequisite file on the
:LIBRARY:
makefile assertion.
We use this technique when converting 3rd party makefiles to
nmake,
and will most likely use it for new libraries.
Here are the specific
ast
utilities and library routines that
required win32 workarounds:
- ksh(1)
-
Most changes were for filesystem case-ignorance and optional
.exe
workarounds.
ksh
can also be built to use
spawnveg(2)
(no
fork(2)
or
exec(2)).
spawnveg
is a combined
fork
and
exec
that provides process group and session control.
- nmake(1)
-
Filesystem case-ignorance and optional
.exe
workarounds.
That is, tell
nmake
to make
foo
and any of
{ foo FOO foo.exe FOO.EXE }
will be accepted.
The directory cache and pattern metarules were affected.
The
nmake
Makerules.mk
and
probe(1)
script were also changed to handle library and dll prefixes and suffixes.
The
probe
script is run once per compiler instantiation and the results are
shared among all users.
The prefix/suffix probe variables are:
- CC.PREFIX.ARCHIVE
-
The prefix for static archive libraries.
lib
for all systems.
- CC.PREFIX.DYNAMIC
-
The prefix for dynamic link time archive libraries.
These libraries contain thunks and a file name reference to a runtime dll.
lib
for
cygwin
and null otherwise.
- CC.PREFIX.SHARED
-
The prefix for runtime shared libraries or dlls.
cyg
for
cygwin,
null for
uwin,
and
lib
otherwise.
- CC.SUFFIX.ARCHIVE
-
The suffix for static archive libraries.
.a
for all systems.
- CC.SUFFIX.DYNAMIC
-
The suffix for dynamic link time archive libraries.
These libraries contain thunks and a file name reference to a runtime dll.
.dll.a
for
cygwin
(note that the double suffix can break some code that attempts to extract
file base names),
.lib
for dll systems, and otherwise not used.
- CC.SUFFIX.SHARED
-
The suffix for runtime shared libraries or dlls.
.dll
for
win32,
otherwise
.so
or
.sl
or other inventive permutations, catenations, truncations and abbreviations of
{ dynamic library link load object shared }.
With these
probe
additions no
nmake
makefile changes are required for
win32.
- ps(1)
-
The
npid
(native process pid)
and
refcount
fields were added for win32 systems.
- libast/astconf(3)
-
astconf(3)
is a string-oriented interface for the POSIX
sysconf(2),
pathconf(2)
and
confstr(2)
calls.
PATH_ATTRIBUTES
was added to handle global filesystem attributes.
Each attribute is represented by a single character.
Specifically for win32,
c
denotes a case-ignorant filesystem.
- libast/lc*()
-
Calls the win32 GetLocaleInfo().
- libast/mnt(3)
-
Filesystem mount interface that handles unix and ms variants.
- libast/pathnative(3)
-
Calls uwin_path() for _UWIN,
cygwin_conv_to_win32_path() for __CYGWIN__,
otherwise local and native paths are the same
(used by
ksh
typeset -H
and
nmake
:P=N:).
- libast/tmlocale
-
Calls the win32 GetLocaleInfo().
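The
CC.PREFIX.*
and
CC.SUFFIX.*
probe variables listed under
nmake(1)
above amount to a small table from which every library file name can be derived.
The sketch below shows that derivation under stated assumptions: the struct, the table and the function
cc_libname
are hypothetical, the
unix
row picks
.so
as one of the several shared suffixes the note mentions, and real values come from
probe,
not from a compiled-in table.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Probe values for one platform, as listed in the note.
 * A NULL suffix means that kind of library is not used. */
struct cc_affix {
    const char *name;
    const char *prefix_archive, *suffix_archive;
    const char *prefix_dynamic, *suffix_dynamic;
    const char *prefix_shared,  *suffix_shared;
};

static const struct cc_affix cc_affix_table[] = {
    { "cygwin", "lib", ".a", "lib", ".dll.a", "cyg", ".dll" },
    { "uwin",   "lib", ".a", "",    ".lib",   "",    ".dll" },
    { "unix",   "lib", ".a", "",    NULL,     "lib", ".so"  },
};

/* kind: 'a' static archive, 'd' dynamic link time archive, 's' shared.
 * Writes the file name for library lib into buf.  Returns 0 on success,
 * -1 if the platform or kind is unknown, unused, or buf is too small. */
int cc_libname(const char *platform, int kind, const char *lib,
               char *buf, size_t size)
{
    size_t i;
    const char *pre, *suf;
    int n;

    assert(platform != NULL && lib != NULL && buf != NULL);
    for (i = 0; i < sizeof(cc_affix_table) / sizeof(cc_affix_table[0]); i++) {
        const struct cc_affix *p = &cc_affix_table[i];

        if (strcmp(p->name, platform) != 0)
            continue;
        switch (kind) {
        case 'a': pre = p->prefix_archive; suf = p->suffix_archive; break;
        case 'd': pre = p->prefix_dynamic; suf = p->suffix_dynamic; break;
        case 's': pre = p->prefix_shared;  suf = p->suffix_shared;  break;
        default:  return -1;
        }
        if (suf == NULL)
            return -1;
        n = snprintf(buf, size, "%s%s%s", pre, lib, suf);
        return (n < 0 || (size_t)n >= size) ? -1 : 0;
    }
    return -1;
}
```

For library
ast
this yields
cygast.dll
and
libast.dll.a
on cygwin, and
ast.dll
and
ast.lib
on uwin.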
These are changes made for __CYGWIN__ only.
The changes were limited to our library implementation --
no application #ifdefs were required.
The system call intercepts are implemented in the file
src/lib/libast/comp/omitted.c
distributed with any of the
ast-*
packages at
http://www.research.att.com/sw/download/
Note that cygwin system call intercepts are easy because a
_foo
version is provided for each
foo
system call.
- .exe check
-
If a readonly operation on
path
fails with ENOENT and
path
has no suffix then the operation is attempted again on path.exe.
The affected calls were:
chmod(2),
link(2),
open(2),
pathconf(2),
rename(2),
stat(2),
truncate(2),
unlink(2),
utime(2).
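The retry rule above is the same for every affected call, so one example covers them all.
This is a minimal sketch for
stat(2)
only, not the code in
omitted.c:
the names
exe_stat,
has_suffix
and
EXE_PATH_MAX
are hypothetical, and treating any
.
in the base name as a suffix is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define EXE_PATH_MAX 4096   /* buffer size chosen for this sketch */

/* Return 1 if the base name of path contains a '.'. */
static int has_suffix(const char *path)
{
    const char *base = strrchr(path, '/');

    base = base ? base + 1 : path;
    return strchr(base, '.') != NULL;
}

/* stat() that retries on path.exe when path is missing and has no suffix. */
int exe_stat(const char *path, struct stat *st)
{
    char buf[EXE_PATH_MAX];
    int r;

    assert(path != NULL && st != NULL);
    r = stat(path, st);
    if (r == 0 || errno != ENOENT || has_suffix(path))
        return r;                       /* success, or not a retry case */
    if (snprintf(buf, sizeof buf, "%s.exe", path) >= (int)sizeof buf) {
        errno = ENAMETOOLONG;
        return -1;
    }
    return stat(buf, st);               /* second and last attempt */
}
```

The first attempt always uses the path as given, so a file that really is named
foo
still wins over
foo.exe.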
- chmod(2) .exe rename
-
If
chmod(2)
succeeds on
path
and it has execute permission and no suffix and the first 2 bytes are
a.out
magic and path.exe does not exist then the path is renamed to
path.exe.
- creat(2)/close(2) .exe rename
-
If
creat(2)
succeeds on
path
and the mode enables execute permissions and
path
has no suffix and the file descriptor is small (<= 16) then
path
is placed in a small table indexed by the file descriptor.
The first
write(2)
on the file descriptor is intercepted and checked for the
.exe
magic header.
close(2)
is then intercepted, and if the table entry for the file descriptor
still holds the intercepted path and the intercepted
write(2)
call detected a magic
.exe
header then
path
is renamed to path.exe.
Admittedly this is prone to all sorts of subversion (duped file descriptors,
file descriptors passed to other processes),
but
it catches just enough to keep cygwin changes from creeping into other
applications like
cp(1),
pax(1),
shar(1),
and
tar(1).
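The bookkeeping behind this item is small enough to sketch.
The code below models only the table, not the real intercepts: the names
exe_note_creat,
exe_note_write
and
exe_note_close
are hypothetical, the caller is assumed to have already checked the execute bits and the missing suffix, and the two-byte magic
MZ
is the header that starts DOS and PE executables.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EXE_FD_MAX 16               /* only small descriptors are tracked */

struct exe_track {
    char path[256];                 /* path given to creat(), "" if unused */
    int  seen_write;                /* first write() already inspected */
    int  is_exe;                    /* first write() began with the magic */
};

static struct exe_track exe_table[EXE_FD_MAX + 1];

/* Record a successful creat() of a suffix-less, executable-mode path. */
void exe_note_creat(int fd, const char *path)
{
    struct exe_track *t;

    assert(path != NULL);
    if (fd < 0 || fd > EXE_FD_MAX || strlen(path) >= sizeof(exe_table[0].path))
        return;
    t = &exe_table[fd];
    strcpy(t->path, path);
    t->seen_write = 0;
    t->is_exe = 0;
}

/* Inspect only the first write() on a tracked descriptor. */
void exe_note_write(int fd, const void *buf, size_t n)
{
    struct exe_track *t;

    if (fd < 0 || fd > EXE_FD_MAX)
        return;
    t = &exe_table[fd];
    if (!t->path[0] || t->seen_write)
        return;
    t->seen_write = 1;
    t->is_exe = n >= 2 && memcmp(buf, "MZ", 2) == 0;
}

/* On close(): return 1 and copy the path into out if it should be
 * renamed to path.exe, else return 0.  The table entry is cleared. */
int exe_note_close(int fd, char *out, size_t size)
{
    struct exe_track *t;
    int rename_it;

    if (fd < 0 || fd > EXE_FD_MAX)
        return 0;
    t = &exe_table[fd];
    rename_it = t->path[0] && t->is_exe && strlen(t->path) < size;
    if (rename_it)
        strcpy(out, t->path);
    t->path[0] = 0;
    return rename_it;
}
```

Because the table is indexed by descriptor number alone, a
dup(2)
of the descriptor is invisible to it, which is exactly the weakness admitted above.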
- utime(2) and st_ctime
-
A successful
utime(2)
does not update the
st_ctime
file time.
It's not clear if
any
cygwin calls properly handle
st_ctime.
UNIX semantics for
utime(2)
are required for proper synchronization of
ast
probe(1)
files; without this workaround
probe
information is never deemed up-to-date.
- CC.SHELLMAGIC
-
This
probe(1)
variable contains a magic string that must appear at the top of every
shell script.
So far
cygwin
and
os/2 (emx)
are the only systems that require this.
The
cygwin
workaround is to add
ntsec
to the
CYGWIN
environment variable, and
ast
enforces this.
- pathconf(2)
-
Judging by behavior alone (we have not read the cygwin
pathconf(2)
implementation source), the call acts as if the path argument is ignored.
Existence and access checks must still be done even if the path is
otherwise ignored.
- ENOEXEC
-
Instead of failing with errno
ENOEXEC,
execve(2)
summarily passes non-magic executables to
/bin/sh.exe.
This interferes with typical shell implementations that use
exec
error returns to separate native system executables (and
#!
magic executables) from shell scripts:
when
exec
fails with
ENOEXEC
the file is assumed to be a script and the shell simply executes it in its
own process address space.
To circumvent
/bin/sh
commandeering, the user is forced to add
#!
magic to
all
scripts; and this is a maintenance nightmare, especially for users that
may not have the privileges to add/change installed executables.
Adding an
ENOEXEC
intercept crashed and burned -- the guess being that
fork(2)
missed a resource or two, so the intercept compromises and passes script
candidates to the shell in the
SHELL
environment variable,
/bin/sh
by default.
These examples expose the twisted
cygwin
script logic:
# lines preceded by $input are command input
input=
# make sure . is on PATH
$input PATH=$PATH:
$input export PATH
# make sure ntsec is in CYGWIN
$input CYGWIN="$CYGWIN ntsec"
$input export CYGWIN
# set up a test script that generates a shell specific error message
$input echo 'getopts' > shtest
$input chmod -x shtest
# first see which shells support getopts
$input sh -c getopts
getopts: Usage: getopts optstring var [arg]
$input bash -c getopts
getopts: usage: getopts optstring name [arg]
$input ksh -c getopts
Usage: getopts [-a name] opstring name [args...]
$input zsh -c getopts
getopts: not enough arguments
# { sh bash ksh zsh } support getopts
# bonus: the getopts usage message identifies the shell
# why do { bash sh } think a non-x script is executable?
$input sh -c 'type shtest'
shtest is shtest
$input bash -c 'type shtest'
shtest is ./shtest
$input ksh -c 'type shtest'
shtest: not found
$input zsh -c 'type shtest'
shtest not found
# now all shells should find the script
$input chmod +x shtest
$input sh -c 'type shtest'
shtest is shtest
$input bash -c 'type shtest'
shtest is ./shtest
$input ksh -c 'type shtest'
shtest is a tracked alias for /home/gsf/tst/cygwin/shtest
$input zsh -c 'type shtest'
shtest is shtest
# now run the script in the different shells
$input SH=sh; SHELL=/bin/$SH $SH -c shtest
getopts: Usage: getopts optstring var [arg]
$input SH=bash; SHELL=/bin/$SH $SH -c shtest
getopts: Usage: getopts optstring var [arg]
$input SH=zsh; SHELL=/bin/$SH $SH -c shtest
getopts: Usage: getopts optstring var [arg]
$input SH=ksh; SHELL=/bin/$SH $SH -c shtest
shtest[1]: getopts: missing options argument
# why do { bash zsh } execute scripts via sh?
# just for fun remove execute permission
$input chmod -x shtest
$input SH=sh; SHELL=/bin/$SH $SH -c shtest
getopts: Usage: getopts optstring var [arg]
$input SH=bash; SHELL=/bin/$SH $SH -c shtest
getopts: Usage: getopts optstring var [arg]
$input SH=zsh; SHELL=/bin/$SH $SH -c shtest
getopts: Usage: getopts optstring var [arg]
$input SH=ksh; SHELL=/bin/$SH $SH -c shtest
ksh: line 1: shtest: cannot execute [Permission denied]
# why do { bash sh zsh } execute non-executable scripts?
- /bin not in $PATH
-
The
win32
dll
search order is:
1.
the directory of the executable
2.
the current directory
3.
/c/WINNT/system32,
/c/WINDOWS/system32,
/c/WINNT,
/c/WINDOWS
4.
the directories on
$PATH
There are no
cygwin
dlls
in (3), so if (1) and (2) fail to produce the required
dlls
then it's up to (4).
The standard allows
PATH
to be anything once the path to an executable is determined.
The
ast
execve
intercept ensures that
PATH
contains
/bin
so that at least the
cygwin
dll,
required by
all
cygwin
executables, will be found.
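The
PATH
adjustment can be sketched as a pure string operation.
This is an illustration only: the names
path_has_dir
and
path_ensure_dir
are hypothetical, and appending the directory at the end (so that the existing search order is preserved) is an assumption, since the note does not say where the intercept places
/bin.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if dir is one of the colon-separated components of path. */
int path_has_dir(const char *path, const char *dir)
{
    size_t n;
    const char *s;

    assert(path != NULL && dir != NULL);
    n = strlen(dir);
    for (s = path; *s; ) {
        const char *e = strchr(s, ':');
        size_t len = e ? (size_t)(e - s) : strlen(s);

        if (len == n && memcmp(s, dir, n) == 0)
            return 1;
        if (e == NULL)
            break;
        s = e + 1;
    }
    return 0;
}

/* Write into buf a PATH value that contains dir, appending dir when it
 * is missing.  Returns 0 on success, -1 if buf is too small. */
int path_ensure_dir(const char *path, const char *dir, char *buf, size_t size)
{
    int n;

    if (path_has_dir(path, dir))
        n = snprintf(buf, size, "%s", path);
    else if (*path)
        n = snprintf(buf, size, "%s:%s", path, dir);
    else
        n = snprintf(buf, size, "%s", dir);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Components are compared whole, so
/binx
does not count as
/bin.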
- getpagesize()
-
In X/Open
getpagesize(2)
is intimately tied to
mmap(2).
In this context it probably should have been called
getmmapalignmentsize(),
because that is its sole purpose.
It is not the filesystem block size or the memory system page size or
the malloc chunk size -- it is the alignment size for fixed addresses
passed to
mmap(2),
and for win32 implementations the alignment is
64*1024.
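In practice this means that any fixed address handed to
mmap(2)
has to be rounded to a multiple of the value returned by
getpagesize(2).
The two helpers below sketch that rounding; their names are hypothetical, and the alignment is passed in as a parameter so the arithmetic can be checked with the win32 value of 64*1024.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of align, e.g. align = getpagesize(). */
uintptr_t mmap_align_down(uintptr_t addr, uintptr_t align)
{
    assert(align > 0);
    return addr - addr % align;
}

/* Round addr up to a multiple of align; aligned values are unchanged. */
uintptr_t mmap_align_up(uintptr_t addr, uintptr_t align)
{
    uintptr_t rem;

    assert(align > 0);
    rem = addr % align;
    return rem ? addr + (align - rem) : addr;
}
```

With an alignment of 64*1024 the address 0x12345 rounds down to 0x10000 and up to 0x20000.
Code that hard-wires 4096 or 8192 here works on most unix systems and fails on win32.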
- unix spawn with microsoft mode arg
-
From a unix perspective the
spawn
family of calls should have the same argument prototypes as the
exec
family.
cygwin, however, chose to model the windows interface by inserting
int mode
as the first argument.
The
ast
library was modified to intercept all
spawn
functions and pass them to
spawnveg(2),
which is roughly equivalent to the windows
spawnve(P_DETACHED,...)
(at least for the job control requirements of
ast
applications).
- mnt(3)
-
mnt(3)
is the
ast
portable mount table interface.
For cygwin the
system
and
local
getmntent(3)
types were demoted to the
mnt.options
list; now the
real
filesystem types (NTFS, FAT, FAT32) show up in
df(1)
and
mount(1).
- pss(3)
-
pss(3)
is the
process status stream
library used by
ps(1)
to isolate implementation differences from the generic
ps
code; and differences abound.
cygwin_internal(CW_GETPINFO,pid)
logic was added,
but it would be nice if the interface also provided complete information
for native processes
(beyond
just the pid).
- execrate(1)
-
execrate(1)
is a command prefix that works around
.exe
inconsistencies in
/bin/chmod,
/bin/cmp,
/bin/cp,
/bin/ln,
/bin/mv
and
/bin/rm.
It is used during the bootstrap build when only native system commands
are available.
execrate
allows us to work around
.exe
challenged systems by simply placing execrated command wrappers of the form
execrate /bin/foo "$@"
in a separate bin directory that appears early in
PATH.
The alternative of examining every blasted line of the bootstrap scripts for
.exe
interactions was never seriously considered.
|