coshell - network shell coprocess server
coshell + [ info ... ]
coshell -
coshell -op [ arg ... ]
coshell
is a local network shell coprocess server for programs using
coshell(3).
There is one
coshell
server per user.
This server runs as a daemon on the user's home host,
and only processes running on the home host have access to the server.
The server controls a background
ksh(1)
shell process, initiated by
rsh(1),
on each of the connected hosts.
The environment of the local host shell is inherited from the server
whereas the environment of remote shells is initialized by
.profile
and
$ENV.
The shells run with the
ksh
bgnice
and
monitor
options on.
Job requests are accepted from user processes on the local host
and are executed on the connected hosts.
stdout,
stderr,
FPATH,
NPROC
(see ENVIRONMENT),
PWD,
PATH,
VPATH,
vpath,
umask
and the environment variables listed in
COEXPORT
(see ENVIRONMENT)
are set to match requesting user values.
stdin
is set to
/dev/null;
coshell
does not directly support interactive jobs.
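Because stdin is connected to /dev/null, a job that reads its standard input sees end-of-file immediately. The following minimal sketch imitates that behavior locally with plain sh; run_detached is a hypothetical helper used only for illustration and is not part of coshell.

```shell
# Hypothetical helper: run a command line the way a coshell job sees it,
# with stdin redirected from /dev/null (illustration only).
run_detached() {
    sh -c "$1" </dev/null
}

# cat reads nothing and returns at once, so only "done" is printed.
run_detached 'cat; echo done'
```

A command that prompts for input would therefore fail or fall through to its default rather than wait for the user.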
Job scheduling is based on load and idle time information generated by the
ss(1)
system status daemon.
This information is updated every 60 seconds on average.
The server is started by running
coshell +.
The command exits after a child server process is forked in the background.
The optional
info
arguments name files containing local network host information
which may be generated by the two shell scripts
genlocal
and
genshare
in the bin subdirectory of the installation root directory.
If no files are specified then the default
local
is used.
The local network consists of the hosts that share the same file name space.
Attributes used by
coshell
can be categorized
into two types, global and host-specific.
The
global
attributes control
coshell
and are not associated with any particular host.
Attribute value pairs, not including readonly ones, may be specified in the
local network host information files,
in
COATTRIBUTES
(see ENVIRONMENT)
or may be set/added using
coshell -a,
and may be referenced in an expression in
COATTRIBUTES.
Attribute names must match [a-zA-Z_][a-zA-Z_0-9]*.
In the following description of these attributes,
host
may be an actual host name or a comma separated list of attribute value pairs
specified in
COATTRIBUTES.
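The attribute name rule can be checked before a name is passed to the server. This is a minimal sketch; valid_attr_name is a hypothetical helper, and it only tests the pattern [a-zA-Z_][a-zA-Z_0-9]* stated above.

```shell
# Hypothetical helper: succeed if $1 is a legal attribute name,
# i.e. the whole string matches [a-zA-Z_][a-zA-Z_0-9]*.
valid_attr_name() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-zA-Z_][a-zA-Z_0-9]*$'
}

for name in rating mem_size _x 9lives bad-name; do
    if valid_attr_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: invalid"
    fi
done
```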
The attributes used by
coshell
are:
- auto=n
auto=1
adds the host to the automatic selection pool.
Hosts named in the
file
global attribute have
auto=1
by default.
- bias=n.m
The host scheduling bias.
Hosts with a higher bias are (linearly) less likely to be scheduled for job
execution.
The default bias is
1.00.
- busy=time
(global attribute)
The grace period for jobs running on busy hosts.
A host is busy when its
idle
attribute is non-zero and its minimum logged in
user idle time is less than the value of
busy.
For a job running on an idle host,
busy
is the maximum amount of time the job may run after the host becomes
busy. If the job does not finish within
busy,
the SIGSTOP signal is sent to the job and the job stops. When
the host idle time exceeds
busy,
the SIGCONT signal is sent to the job and the job resumes.
The meaningful units of
time
are m(inute) and h(our).
The default is
busy=2m.
- cpu=n
The number of cpus on the host.
The default is
1.
- file=path
(global attribute)
Names a file containing default attributes for machines on the local network.
If no directory components are specified then the subdirectory
share/lib/cs
of the installation root directory is searched.
The default attribute file for the local network is
share/lib/cs/local.
- idle=time
The minimum logged in user idle time before jobs will be scheduled on the host.
The meaningful units of
time
are m(inute) and h(our).
The default is
0,
meaning no idle time restrictions.
idle
is usually
15m
for workstations and is not specified (i.e., always
available) for compute servers.
- label=string
string
labels either the current
coshell
connection (via
coopen
in
coshell(3))
or the current job (via
coexec
in
coshell(3)).
Labels are displayed but are otherwise ignored.
- load=n.m
A readonly attribute that evaluates to the host load average.
- maxload=n.m
(global attribute)
The maximum host load average. No job will be
scheduled on a host with load average >=
maxload.
The default
maxload=0
means no load average limit.
- name=host
The host name in the local domain (i.e., no .'s in the name).
The
name=
may be omitted.
In a host selection context:
host
may be a
sh(1)
file match pattern;
-
matches any host;
local
matches the local host.
- open=fd
A readonly attribute that evaluates to
0
if the host shell is closed,
<0
if the host shell is being opened, and
>0
if the host shell is open.
- percpu=n
(global attribute)
The maximum number of concurrent jobs on each cpu. The default is
3.
- perserver=n
(global attribute)
The maximum number of concurrent jobs run by
coshell.
perserver
has an upper limit that is silently enforced.
The limit is half the number of file descriptors allowed.
perserver=0
queues jobs until
perserver>0.
- peruser=n
(global attribute)
The maximum number of concurrent jobs per user connection to
coshell. The default is
12.
- pool=n
(global attribute)
The number of cpus in the processor pool.
- rating=n.m
The host cpu rating, usually in mips relative to the other hosts
on the local network. This is usually the observed rating rather than
the one in the vendor's advertisements.
- type=string
The host type, which distinguishes processor types and is usually
related to the object and executable attributes.
The default type is
*.
- up=n
A readonly attribute that evaluates to the number of seconds the host has
been up.
If
n
is less than 0 then its absolute value is the number of seconds the host has been down.
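The interaction between the idle and busy attributes described above can be sketched as follows. The helpers to_seconds and host_is_busy are hypothetical, and the sketch assumes that a time value without an m or h suffix is in seconds, which the text does not state.

```shell
# Convert an attribute time value such as 15m or 2h to seconds.
# Assumption: a value without a suffix is already in seconds.
to_seconds() {
    case $1 in
    *h) echo $(( ${1%h} * 3600 )) ;;
    *m) echo $(( ${1%m} * 60 )) ;;
    *)  echo $(( $1 )) ;;
    esac
}

# host_is_busy IDLE BUSY USER_IDLE
# Succeeds when the idle attribute is non-zero and the minimum logged in
# user idle time (third argument) is below the busy grace period.
host_is_busy() {
    idle=$(to_seconds "$1")
    busy=$(to_seconds "$2")
    user_idle=$(to_seconds "$3")
    [ "$idle" -ne 0 ] && [ "$user_idle" -lt "$busy" ]
}

if host_is_busy 15m 2m 30; then echo "idle=15m host: busy"; fi
if ! host_is_busy 0 2m 30; then echo "idle=0 host: never busy"; fi
```

With idle=0 the first condition is false, which matches the description of compute servers as always available.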
Other user-defined attributes may be specified.
They may be referenced in
COATTRIBUTES
expressions, but are otherwise ignored by
coshell.
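The host selection rules given for the name attribute (a sh file match pattern, - for any host, local for the local host) can be sketched with a case statement. host_matches is a hypothetical helper, and the local host name is passed in explicitly because this sketch does not query the server.

```shell
# host_matches PATTERN HOST LOCALHOST
# PATTERN follows the rules described for the name attribute:
#   -      matches any host
#   local  matches the local host
#   other  values are treated as sh file match patterns
host_matches() {
    case $1 in
    -)     return 0 ;;
    local) [ "$2" = "$3" ] ;;
    *)     case $2 in
           $1) return 0 ;;
           *)  return 1 ;;
           esac ;;
    esac
}

# dodo is assumed to be the local host in this example.
for h in coot dodo loon; do
    if host_matches 'lo*' "$h" dodo; then echo "lo* matches $h"; fi
    if host_matches local "$h" dodo; then echo "local matches $h"; fi
done
```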
coshell -
opens an interactive connection to the running server.
The commands are:
- a host[,attributes ...]
Set or add attributes for the named hosts.
- c host ...
Close the shell connections to the named hosts.
The hosts are also removed from the automatic selection pool.
- d [level]
Set the server
stderr
to the
stderr
of the calling process.
If
level
is specified then the server debug trace level is set to
-level.
Higher debug levels produce more output on
stderr.
- f [fd]
This is a debugging option and may not be present in all
implementations.
If
fd
is specified then close the internal server file descriptor
fd,
otherwise list the status of all open file descriptors in the server.
- g
List global state.
- h
List command help.
- j
List the status of all jobs.
The status fields are:
- JOB
The id assigned to the job by the server.
This number may be used as an argument to the
k
command.
- USR
The id assigned to the requesting user by the server.
- RID
The id assigned to the job by the requesting user.
- PID
The job process id,
QUEUE
if the shell is in the process of opening,
START
if the PID has not been determined yet, and
WARPED
if the job completed before its PID was determined.
- TIME
The elapsed time since the job started.
*
follows the time if the job is about to terminate.
- HOST
The host where the job is running.
The most recent signal sent to the job follows the host name.
- LABEL
The label assigned to the job by the requesting user.
- k [ c | k | s | t ] job
Kill the job with the server JOB id
job.
If no signal argument is specified then the
SIGTERM
signal is sent to the job.
c
sends
SIGCONT,
k
sends
SIGKILL,
s
sends
SIGSTOP,
and
t
sends
SIGTERM.
- l expr
List all host names matching the attribute expression
expr.
The names are sorted in scheduling rank order from best to worst.
If
pool=n
is specified in
expr
then only the first
n
names (after sorting) are listed.
- o host ...
Open a shell connection to the named hosts.
- q
Quit the interactive connection.
- Q
Kill the server and quit the interactive connection.
- r host [ command ]
Run
command
on
host.
host
may be an attribute expression.
If
command
is omitted then
hostname(1)
is used.
- s [ a | e | l | o | p | s | t ]
List the shell connection status.
There is at most one shell connection per host.
If no argument is specified then only open connections are listed.
a
lists the attributes for all shells,
e
lists all shells,
l
lists all shells in the processor pool,
o
lists all open shells,
p
lists the process id of all open shells,
s
lists the shell scheduling status (primarily for debugging),
and
t
lists all open shells sorted by the recent job activities running on each
host.
The status fields for se and sl are:
- CON
The id assigned to the open shell by the server,
@
if the shell is not open and is not in the processor pool,
-
if the shell is not open, and
+
if an open is in progress.
- JOBS
The number of jobs currently running on the host.
*
follows the number if any of the jobs are queued pending the completion of an
open in progress.
- TOTAL
The total number of jobs run on the host.
- USER
The accumulated user time
(times(2))
of all jobs on the host.
- SYS
The accumulated sys time
(times(2))
of all jobs on the host.
- IDLE
The elapsed time since the most recent logged in user activity.
*
follows the time if the host does not meet the processor pool
idle time requirements.
- CPU
The number of cpus on the host.
- LOAD
The host load average.
- RATING
The host rating, usually in network relative mips.
- BIAS
The scheduling bias.
Hosts with lower bias are more likely to be scheduled.
- TYPE
The host type, usually related to object and executable attributes.
- HOST
The host name.
The status fields for so, ss, and st are:
- CON
The id assigned to the open shell by the server,
@
if the shell is not open and is not in the processor pool,
-
if the shell is not open, and
+
if an open is in progress.
- OPEN
The accumulated number of times the server has connected to the host.
- USERS
The current number of active users.
- UP
The amount of time the host has been up.
- CONNECT
The amount of time the server has been connected to the host.
- UPDATE
The amount of time before the host status information is out-of-date.
- OVERRIDE
The amount of time the host connection is kept, followed
by the host identification code, 1 for the local host, 0 for other
hosts in the network.
- IDLE
The specified idle time.
- TEMP
A measure of the recent job activities running on the host.
- RANK
A measure of the desirability of the host. This takes the idle time
restriction, the load average, and the number of cpus into account.
The two digits after the decimal point are random and are
used to break ties between different
coshell
servers. Hosts with lower
RANK
are more likely to be scheduled.
- HOST
The host name.
- t
List the accumulated totals.
The fields are:
- SHELLS
The number of active shell connections followed by the total number
of successful shell connections.
- USERS
The number of active user connections followed by the total number
of successful user connections.
- JOBS
The number of active jobs followed by the total number
of jobs run.
- CMDS
The number of server-user transactions.
- UP
The elapsed time since the server started.
- REAL
The elapsed time during which the USER and SYS times were accumulated.
- USER
The accumulated user time for all jobs on all hosts.
- SYS
The accumulated sys time for all jobs on all hosts.
- CPU
The number of cpus available on all connected hosts followed by the
processor pool cpu limit plus the explicit host override count.
An
override
host is a connected host that does not meet the processor pool
idle time requirements.
- LOAD
The load average, averaged over all connected hosts.
- RATING
The host rating, averaged over all connected hosts.
- u
List connected user status.
The status fields are:
- CON
The id assigned to the user connection by the server.
- PID
The user process id.
- JOBS
The number of jobs currently running on behalf of the user.
- TOTAL
The total number of jobs requested by the user.
- TTY
The user process
stderr
file name.
- LABEL
The label assigned to the connection by the requesting user.
- v
List the server version stamp.
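The signal arguments accepted by the k command above can be summarized as a lookup. This is only a sketch of the mapping as documented; signal_for_k is a hypothetical helper and does not talk to the server.

```shell
# signal_for_k [c|k|s|t]
# Print the signal the k command sends for the given argument;
# with no argument the default is SIGTERM.
signal_for_k() {
    case ${1-} in
    '')  echo SIGTERM ;;
    c)   echo SIGCONT ;;
    k)   echo SIGKILL ;;
    s)   echo SIGSTOP ;;
    t)   echo SIGTERM ;;
    *)   echo "unknown signal argument: $1" >&2; return 1 ;;
    esac
}

signal_for_k s    # signal used to stop a job
signal_for_k c    # signal used to resume it
```

The s and c pair corresponds to the SIGSTOP and SIGCONT signals that the server itself sends when a host becomes busy and later idle again.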
The interactive commands are useful for tuning global attribute
values. For example, one could set
NPROC
to 100, export it,
and then control the number of jobs actually executed using the
coshell
interactive command:
coshell> a local,peruser=10,perserver=40
The interactive commands may be used as options for non-interactive
coshell
queries.
For example,
coshell -sl
produces a long shell status listing and
coshell -c dodo
closes the shell connection to the host
dodo.
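Scripts that issue several such queries may find a small wrapper convenient. The following is a sketch only: coshell_query and the DRY_RUN variable are hypothetical names introduced here, and the wrapper simply builds the coshell -op [ arg ... ] form shown in the synopsis.

```shell
# coshell_query OP [ARG ...]
# Run the interactive command OP as a one-shot query: coshell -OP ARG ...
# With DRY_RUN=1 the command line is printed instead of executed, so the
# sketch can be tried on a machine without coshell installed.
coshell_query() {
    op=$1
    shift
    if [ "${DRY_RUN-0}" = 1 ]; then
        echo coshell "-$op" "$@"
    else
        coshell "-$op" "$@"
    fi
}

DRY_RUN=1
coshell_query sl        # long shell status listing
coshell_query c dodo    # close the shell connection to dodo
```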
The following environment variables must be set if
coshell
is installed in a non-standard directory (not
/bin,
/usr/bin,
or
/usr/local/bin):
root=<coshell-installation-root-directory>
export PATH=$root/bin:$PATH
If
coshell
is dynamically linked, the
LD_LIBRARY_PATH
environment variable needs to
be set.
export LD_LIBRARY_PATH=$root/lib:$LD_LIBRARY_PATH
The following two commands generate the local
network host information, which is shared among all
coshell
users and needs to be regenerated only when it changes.
If you run into permission problems, contact your system administrator.
genshare > $root/lib/cs/share
genlocal > $root/share/lib/cs/local
The
genshare
command is run first to generate information on servers for the network.
By default, this information is stored in $root/lib/cs/share.
Based on this information,
genlocal
is run to generate the local host attribute file. By default,
this information is stored in $root/share/lib/cs/local. If the
share
file generated by the
genshare
command is not stored in the default path, you need to pass its path
to
the
genlocal
command using the -f option.
You may also modify the generated files to meet your needs.
A sample local host attribute file follows:
#
# local host attributes
#
local pool=8 bias=4 busy=1m
server type=sun4 rating=20
cruncher type=mips rating=30 cpu=20
station type=sun3 rating=6 idle=15m
token type=3b rating=0.1 idle=15m
The
local
entry sets the processor pool size to 8, the local host bias
to 4, and the busy host grace period to 1 minute.
Compute servers that are available to all users usually have no
idle
attribute whereas personal workstations are given at least
idle=15m
out of courtesy to the workstation owner.
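A file in this layout is easy to inspect from a script. The sketch below assumes what the sample shows: a host name followed by whitespace-separated name=value pairs, with # starting a comment line. host_attr is a hypothetical helper, not a coshell command.

```shell
# host_attr FILE HOST ATTR
# Print the value of ATTR on the HOST line of FILE; fail if it is absent.
host_attr() {
    awk -v host="$2" -v attr="$3" '
        /^#/ { next }
        $1 == host {
            for (i = 2; i <= NF; i++) {
                n = index($i, "=")
                if (n > 0 && substr($i, 1, n - 1) == attr) {
                    print substr($i, n + 1)
                    found = 1
                }
            }
        }
        END { exit !found }
    ' "$1"
}

# Write the sample attribute file to a scratch location and query it.
tmp=${TMPDIR:-/tmp}/local.$$
cat > "$tmp" <<'EOF'
#
# local host attributes
#
local pool=8 bias=4 busy=1m
server type=sun4 rating=20
cruncher type=mips rating=30 cpu=20
station type=sun3 rating=6 idle=15m
token type=3b rating=0.1 idle=15m
EOF

host_attr "$tmp" local pool      # processor pool size
host_attr "$tmp" station idle    # idle requirement of a workstation
rm -f "$tmp"
```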
The following starts the
coshell
server.
The processor pool size is taken from the local host attribute file.
coshell +
The following instructs programs using
coshell(3)
to use
coshell
rather than
ksh
or
sh
for command execution and sets the command execution concurrency level to 8.
export COSHELL=coshell
export NPROC=8
The shell function
cosh
provides a convenient interface for common coshell actions:
export FPATH=$root/fun:$FPATH
# start coshell, export COSHELL,NPROC, and set window title
cosh
coshell (AT&T Bell Laboratories) 10/11/93 [first time only]
# run hostname on best host
on - hostname
dodo
# interact with server ...
cosh -
coshell>
A
coshell
connect stream file is created in the
/tmp/cs
directory.
Some systems:
- (1)
do not update the times on the connect stream file when it is accessed
- (2)
automatically remove stale files from
/tmp
- (3)
fail to generate a
poll(2)
or
select(2)
event when the connect stream file is removed
- (4)
do not handle mounted streams or sockets.
In any of these cases, the environment variable
CS_MOUNT_LOCAL
needs to be set to a directory on another file system where
all the users have read and write permissions. For example:
export CS_MOUNT_LOCAL=<coshell-installation-root-directory>/tmp
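Before starting the server it may be worth confirming that the chosen directory is usable. This is a minimal sketch: usable_mount_dir is a hypothetical helper, it checks access for the calling user only, and the fallback to /tmp/cs when CS_MOUNT_LOCAL is unset is an assumption based on the default location named above.

```shell
# usable_mount_dir DIR
# Succeed if DIR exists and the current user can read, write and search it.
# Only the caller's own access is checked; other users are not covered.
usable_mount_dir() {
    [ -d "$1" ] && [ -r "$1" ] && [ -w "$1" ] && [ -x "$1" ]
}

dir=${CS_MOUNT_LOCAL:-/tmp/cs}
if usable_mount_dir "$dir"; then
    echo "$dir: usable"
else
    echo "$dir: missing or not accessible" >&2
fi
```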
On some systems the server may not detect that its connect stream
file has been unlinked,
resulting in erroneous `server not running' errors.
To handle this situation the server checks and recreates the connect
stream file on receipt of a
SIGINT
signal.
NFS cache inconsistencies may arise for files generated via NFS on remote hosts
but serviced via the native file system on the local host.
Running
coshell
from a diskless host avoids the problem.
Host load average and logged in user idle times are used
to schedule hosts and jobs.
Some terminal lock programs, e.g.,
xlock(1),
inflate the load average, usually by doing complex graphics operations
on displays that have long since been blanked out by an independent
screen saver.
A simple lock program that blocks on a read request may open up idle cycles
for better use.
- COATTRIBUTES
Host attribute expression,
(type@local)
by default.
Non-numeric valued attributes may appear as the first operand of
the comparison operators
<,
<=,
==,
!=,
>=
and
>,
where the second operand must be a "..." or '...' string
that is compared with the attribute value.
For the
==
and
!=
operators the second operand is taken to be a
ksh(1)
file match pattern.
For example, given the host definitions:
coot type=sun4 mem=8m rating=11.0 cad
dodo type=sun3 mem=4m rating=2.0
loon type=mips mem=16m rating=20.0
(type=='sun*'&&mem>6m)
selects
coot,
(rating>=11.0)
selects
coot
and
loon,
and
(cad)
selects
coot.
attribute@host
represents the
attribute
value for
host.
For example,
type@local
matches the type of the host running the
coshell
server.
- COEXPORT
A colon separated list of environment variables to export to each job.
This is to support the rare cases where some environment variables
change after the
coshell
server has been started.
For example, some commands use environment variables rather
than arguments or options to pass input data.
- COSHELL
Set to
coshell
for the network shell service.
- COTEMP
Set to a different value for each shell command. It is used for
temporary file names. (see Engine Variables in
nmake(1))
This variable may be referenced in
.profile.
- HOMEHOST
Set within each action to the name of the host executing
coshell.
- HOSTNAME
Set within each action to the name of the host executing the action.
This variable may be referenced in
.profile.
- HOSTTYPE
Set within each action to the type
(from the local coshell host attribute file)
of the host executing the action.
This variable may be referenced in
.profile.
- NPROC
Default command concurrency level
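The three selections in the COATTRIBUTES example above can be reproduced with ordinary shell tools. This is not a general expression evaluator: hosts, attr_of and the select_* functions are hypothetical names, each function hard-codes one of the example expressions, and the mem comparison assumes both values carry the same m suffix.

```shell
# Host definitions from the COATTRIBUTES example.
hosts='coot type=sun4 mem=8m rating=11.0 cad
dodo type=sun3 mem=4m rating=2.0
loon type=mips mem=16m rating=20.0'

# attr_of HOST ATTR: print the attribute value; a bare attribute such as
# cad prints 1. Fails if the host does not have the attribute.
attr_of() {
    printf '%s\n' "$hosts" | awk -v host="$1" -v attr="$2" '
        $1 == host {
            for (i = 2; i <= NF; i++) {
                n = index($i, "=")
                if (n == 0 && $i == attr) { print 1; found = 1 }
                if (n > 0 && substr($i, 1, n - 1) == attr) {
                    print substr($i, n + 1); found = 1
                }
            }
        }
        END { exit !found }'
}

# Expression: type matches the pattern sun* and mem is greater than 6m.
select_sun_mem() {
    for h in coot dodo loon; do
        htype=$(attr_of "$h" type) || continue
        mem=$(attr_of "$h" mem) || continue
        case $htype in
        sun*) if [ "${mem%m}" -gt 6 ]; then echo "$h"; fi ;;
        esac
    done
}

# Expression: rating is at least 11.0 (awk handles the decimal compare).
select_rating() {
    for h in coot dodo loon; do
        rating=$(attr_of "$h" rating) || continue
        if awk -v r="$rating" 'BEGIN { exit !(r >= 11.0) }'; then
            echo "$h"
        fi
    done
}

# Expression: the bare attribute cad is present.
select_cad() {
    for h in coot dodo loon; do
        if attr_of "$h" cad >/dev/null; then echo "$h"; fi
    done
}

select_sun_mem    # coot
select_rating     # coot and loon
select_cad        # coot
```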
- share/lib/cs/local
local network host attributes
Glenn Fowler
gsf@research.att.com
AT&T Bell Laboratories
3d(1), ksh(1), nmake(1), rsh(1), ss(1), coshell(3), cs(3)