the openMosix-API
by Matt Rechenburg
Overview

  General description
  Detailed description
  The /proc/hpc-interface: getting information / setting values
  Local information from /proc/hpc/admin/ (with examples)
  Information about the (remote) nodes
  Additional information about local and remote processes
  The openMosix Filesystem MFS
  The functions from libmos
  Examples using /proc/hpc
    cpucount       -> function returning the number of nodes
    foreach node   -> do "something" for each node
    unlock self    -> ensure that the created process is unlocked
    get ip-address -> which ip-address has node N
    get node IDs   -> converts ip-addresses into node IDs
    migrate all    -> migrate all possible processes
  Examples using /mfs
    distributing a file -> copy a file to each node
  Examples using libmos
    get load of a node  -> function which returns the openMosix-load
    get speed of a node -> function which returns the openMosix-speed
  Using in applications
  Summary
  Disclaimer
  Additional sources
General description
This documentation about the openMosix API (application programming interface) explains in detail the functionality of the openMosix-structure in the kernel and how to access it during runtime. It will help (and encourage) you to use this API with your applications and in programs you develop yourself.
The example section of this documentation provides several functions for some common languages (shell, perl, php, c/c++) which you can use directly in your code. It will also enhance your knowledge about your cluster and ease its administration.
Please contribute your own ideas to the openMosix-community, either by posting them to the openMosix-mailing list or to the "Wiki-area" on the openMosix-website http://www.openmosix.org/
Detailed description
openMosix is a Linux-kernel enhancement and most of its functionality is located in the kernel. To configure and administer it, and to access statistical information from the kernel, there needs to be a well-defined interface.
The most common way to provide this is to create a proc-interface during kernel-bootup or module-load. The interface has to provide a way to "get" and "set" values in kernel-space from user-level, and this is exactly what openMosix does. The directory /proc/hpc provides access to the openMosix-structure in the kernel. It contains files which are used either for "local" configuration or for information about all "remote" nodes ("local" means the node you are currently logged on to, "remote" means every other node in your cluster).
openMosix holds information on all nodes in the /proc/hpc directory, and these values are updated within the "decay-interval" (which is also configurable during runtime). So each node in your cluster knows the exact values of all other "remote" nodes. This is required for calculating how to balance the load and the processes across the cluster. This calculation is called "the openMosix-load-balancing algorithm" and was invented by Moshe Bar. Because this mechanism is organized in a decentralized way, each node decides itself whether a process should be migrated to another node (this minimizes the overhead and provides linear scalability up to 1000 or more nodes).
Administrators and applications can directly interact with an openMosix-cluster through the /proc/hpc interface and change the whole configuration of the super-computer easily from any system.
The /proc/hpc-interface
getting information
Most of the files in /proc/hpc are pure text files. You can use standard commandline-utilities like "cat", or open them with your favorite editor, to read the current values from the openMosix /proc-interface.
In your application you can use the regular file-related functions to read values from /proc/hpc (internally these come down to the "open", "read" and "close" syscalls).
There are only a few "binary" files (marked in the list below) which cannot be read without parsing them.
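For scripts, it helps to wrap such reads in a small helper that tolerates missing files (a node may be down, or openMosix may not be loaded). This is only a sketch; the helper name read_hpc_value is our own and not part of openMosix:

```shell
#!/bin/bash
# read_hpc_value: print the contents of a /proc/hpc text file,
# or -1 if the file does not exist or cannot be read.
read_hpc_value() {
    local file="$1"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "-1"
    fi
}

# on an openMosix node you could call e.g.:
#   read_hpc_value /proc/hpc/admin/mospe    # local node-ID
```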
set values
Just as you can "read" from the files in /proc/hpc, there are also several files you can write to, directly "communicating" with the kernel and your openMosix-cluster. The values you "write" into those files influence the behavior of your cluster.
e.g.
echo 1 > /proc/hpc/admin/block
  -blocks the arrival of remote processes on this node
echo 1 > /proc/hpc/admin/bring
  -brings all migrated processes home to this node
Local information from /proc/hpc/admin/
(flat files)
The files in /proc/hpc/admin/ contain information about the "local" configuration of the openMosix node you are logged on to.

block          allow/forbid arrival of remote processes
bring          bring home all migrated processes
dfsalinks      list of current symbolic dfsa-links
expel          send guest processes home
gateways       maximum number of gateways
lstay          local processes should stay
mospe          contains the openMosix node-ID
nomfs          disables/enables MFS
overheads      for tuning
quiet          stop collecting load-balancing information
decayinterval  interval for collecting information about load-balancing
slowdecay      default 975
fastdecay      default 926
speed          speed relative to a PIII/1GHz
stay           enables/disables automatic process migration

(binary files)

config         the main configuration file (written by the setpe util)
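To review the local configuration at a glance, you can dump all flat files in one loop. A sketch (the function name show_admin is ours; it takes the admin directory as a parameter and skips the binary config file, which cannot be printed as text):

```shell
#!/bin/bash
# show_admin: print "name: value" for every flat text file in the
# given admin directory (default /proc/hpc/admin), skipping the
# binary "config" file.
show_admin() {
    local admindir="${1:-/proc/hpc/admin}"
    local f name
    for f in "$admindir"/*; do
        name=$(basename "$f")
        [ "$name" = "config" ] && continue
        [ -f "$f" ] && echo "$name: $(cat "$f")"
    done
}
```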
examples
Writing a 1 to the following files in /proc/hpc/decay/:

clear   clears the decay statistics
        echo '1' > /proc/hpc/decay/clear
cpujob  tells openMosix that the process is cpu-bound
        echo '1' > /proc/hpc/decay/cpujob
iojob   tells openMosix that the process is io-bound
        echo '1' > /proc/hpc/decay/iojob
slow    tells openMosix to decay its statistics slowly
        echo '1' > /proc/hpc/decay/slow
fast    tells openMosix to decay its statistics fast
        echo '1' > /proc/hpc/decay/fast
Information about the (remote) nodes
The directories in /proc/hpc/nodes/[openMosix_ID] contain information (text-files) about all other nodes in your openMosix-cluster. Each directory belongs to the openMosix node with the same node-ID as the directory-name, e.g. the file /proc/hpc/nodes/4/load contains the load-value (openMosix-load value) of node 4. These directories are identical (updated within the decay-interval) on all nodes, so you can get information about every node in your cluster from any node.

/proc/hpc/nodes/[openMosix_ID]/cpus    how many cpus the node has
/proc/hpc/nodes/[openMosix_ID]/load    the openMosix load of this node
/proc/hpc/nodes/[openMosix_ID]/mem     available memory as openMosix believes
/proc/hpc/nodes/[openMosix_ID]/rmem    available memory as Linux believes
/proc/hpc/nodes/[openMosix_ID]/speed   speed of the node relative to a PIII/1GHz
/proc/hpc/nodes/[openMosix_ID]/status  status of the node
/proc/hpc/nodes/[openMosix_ID]/tmem    available memory
/proc/hpc/nodes/[openMosix_ID]/util    utilization of the node

These values are extremely useful for monitoring applications. They are very easy to access because they are "cluster-wide" (you can access all information from each node in your cluster, as explained before).
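As a starting point for such a monitoring application, the sketch below prints the load of every node. The function name print_loads is ours, and the nodes directory is passed as a parameter so the function can be tried against any directory tree; on a live cluster you would pass /proc/hpc/nodes:

```shell
#!/bin/bash
# print_loads: print "<node-id> <load>" for every node directory
# below the given path (e.g. /proc/hpc/nodes on a live cluster)
print_loads() {
    local nodesdir="$1"
    local n
    for n in "$nodesdir"/*; do
        [ -f "$n/load" ] && echo "$(basename "$n") $(cat "$n/load")"
    done
}
```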
Additional Information about local and remote processes
The following files in the /proc-directories contain additional process-information provided by openMosix. Applications can read them and make them available e.g. for process-statistics.

local process-information:

(values to read/get)
/proc/[PID]/cantmove  reason why a process cannot be migrated
/proc/[PID]/lock      whether a process is locked to its home node
/proc/[PID]/nmigs     how many times the process migrated
/proc/[PID]/where     where the process is currently being computed

(values to write/set)
/proc/[PID]/lock      if a process is locked to its home node you can
                      write a 0 into it; if the process can migrate, it
                      will then be unlocked
/proc/[PID]/goto      write a node-ID into this file to tell the process
                      to migrate to the requested node
/proc/[PID]/migrate   same as goto

remote process-information:

(values to read/get)
/proc/hpc/remote/from      the home node of the process
/proc/hpc/remote/identity  additional information about the process
/proc/hpc/remote/statm     memory statistics of the process
/proc/hpc/remote/stats     cpu statistics of the process
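Writing to these per-process files is just an ordinary file write. The sketch below (migrate_to is our own helper name; the proc root is a parameter so the function can be exercised without openMosix) sends a process to a given node via its goto file:

```shell
#!/bin/bash
# migrate_to: ask openMosix to migrate process <pid> to node <node-id>
# by writing the node-ID into <procroot>/<pid>/goto
# (procroot defaults to /proc)
migrate_to() {
    local pid="$1" node="$2" procroot="${3:-/proc}"
    echo "$node" > "$procroot/$pid/goto"
}

# e.g. migrate process 1234 to node 3:
#   migrate_to 1234 3
```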
The openMosix Filesystem MFS
Besides the /proc/hpc-interface, every node, administrator or application can access the filesystem of every node in an openMosix-cluster if the MFS-filesystem is mounted. (Read more about the openMosix-filesystem internals and configuration in the openMosix-HOWTO.) This documentation describes the fundamental function and use of MFS in your cluster and applications. Below the directory /mfs you will find several directories, which are now discussed in detail.

/mfs/here      -> / filesystem of the current node where your process runs
/mfs/home      -> / filesystem of the home node
/mfs/magic     -> / filesystem of the current node when used by the "creat"
                  system call (or an "open" with the "O_CREAT" option) -
                  otherwise, the last node on which an MFS magical file was
                  successfully created
/mfs/lastexec  -> / filesystem of the node on which the process last issued
                  a successful "execve" system-call
/mfs/selected  -> / filesystem of the node you selected by either your
                  process itself or one of its ancestors (before forking
                  this process) writing a number into "/proc/self/selected"

In addition to these MFS-directories there is one directory for each node in your cluster. It is named after the openMosix node-ID it belongs to and contains the complete / filesystem of that remote node (without /proc, to avert endless loops).
These directories in /mfs are very useful for distributing files to all nodes or creating a single-system-image filesystem. In the example section you will find out how to use and take advantage of MFS.
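Accessing a remote node's file through MFS is therefore nothing more than building the path /mfs/<node-ID>/<path>. A sketch (remote_cat is our own helper name; the MFS mount point is a parameter so the function can be tested against any directory tree):

```shell
#!/bin/bash
# remote_cat: read <path> from the / filesystem of node <node-id>
# through MFS (mfsroot defaults to /mfs)
remote_cat() {
    local node="$1" path="$2" mfsroot="${3:-/mfs}"
    cat "$mfsroot/$node$path"
}

# e.g. read /etc/hostname of node 4:
#   remote_cat 4 /etc/hostname
```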
The functions from libmos
openMosix also provides a programming-library which is useful to include because it gives very easy access to the openMosix-information. This library, called 'libmos', is included in the openMosix user-tools and is normally installed automatically.
The following list contains the functions from libmosix.h and explains them in detail ('path' is always the path to a file to read/write in the /proc/hpc-interface):

int msx_readval(char *path, int *val);
  -reads 'val' from 'path'
int msx_readval2(char *path, int *val1, int *val2);
  -reads 'val1' and 'val2' from 'path'
int msx_write(char *path, int val);
  -writes 'val' to 'path'
int msx_write2(char *path, int val1, int val2);
  -writes 'val1' and 'val2' to 'path'
int msx_readnode(int node, char *item);
  -reads 'item' from 'node'
   ('item' can be e.g. load, speed, cpus, util, status, mem, rmem, tmem)
int msx_readproc(int pid, char *item);
  -reads 'item' for a specific 'pid'
   (this function reads information about a 'pid' from /proc/[pid]/[item];
   'item' can be e.g. block, bring, stay ...)
int msx_read(char *path);
  -reads a value from 'path'
int msx_writeproc(int pid, char *item, int val);
  -writes 'val' to 'item' for process 'pid'
int msx_readdata(char *fn, void *into, int max, int size);
  (no information yet)
int msx_writedata(char *fn, char *from, int size);
  (no information yet)
int msx_replace(char *fn, int val);
  (no information yet)
int msx_count_ints(char *fn);
  (no information yet)
int msx_fill_ints(char *fn, int *, int);
  (no information yet)
The libmosix-API also contains functions to control or change the openMosix-behavior:

int msxctl(msx_cmd_t cmd, int arg, void *resp, int len);

#define msxctl1(x)     (msxctl((x), 0, NULL, 0))
#define msxctl2(x,y)   (msxctl((x), (y), NULL, 0))
#define msxctl3(x,y,z) (msxctl((x), (y), (z), sizeof(*(z))))

e.g. 'mosctl' uses this function to let you administer your cluster.
The possible commands (msx_cmd_t cmd) are listed and explained below:
(explanation from libmosix.h)
D_STAY,        /* Disable automatic migration from here */
D_NOSTAY,      /* Allow automatic migrations from here */
D_LSTAY,       /* Disable automatic mig. of local processes */
D_NOLSTAY,     /* Allow automatic mig. of local processes */
D_BLOCK,       /* Block automatic migration to here */
D_NOBLOCK,     /* Enable automatic migration to here */
D_EXPEL,       /* Expel all processes to remote processors */
D_BRING,       /* Bring back all processes */
D_GETLOAD,     /* Get current load */
D_QUIET,       /* Stop internal load-balancing activity */
D_NOQUIET,     /* Resume internal load-balancing activity */
D_TUNE,        /* Enter tuning mode */
D_NOTUNE,      /* Exit tuning mode */
D_NOMFS,       /* Disallow MFS access to this node */
D_MFS,         /* Reallow MFS access to this node */
D_SETSSPEED,   /* Set the standard speed, affecting D_GETLOAD */
D_GETSSPEED,   /* Get the standard speed (default=1000) */
D_GETSPEED,    /* Get machine's speed */
D_SETSPEED,    /* Set machine's speed */
D_MOSIX_TO_IP, /* Convert openMosix to IP address */
D_IP_TO_MOSIX, /* Convert IP to openMosix address */
D_GETNTUNE,    /* Get number of kernel tuning parameters */
D_GETTUNE,     /* Get kernel tuning parameters */
D_GETSTAT,     /* Get openMosix status */
D_GETMEM,      /* Get current memory (free and total) */
D_GETDECAY,    /* Get decay parameters */
D_SETDECAY,    /* Set decay parameters */
D_GETRMEM,     /* Get OS' idea of memory (free and total) */
D_GETUTIL,     /* Get CPU utilizability % */
D_SETWHERETO,  /* Send a process somewhere */
D_GETPE,       /* Get node number */
D_GETCPUS,     /* Get number of CPUs */
To use the libmosix-API in your application you need to include 'libmosix.h' in your header files and compile with the -lmos option.
e.g. add the following line at the top of your .h file(s):

#include <libmosix.h>

and compile with the command:

gcc -o your_program -lmos your_program.c

Then you can use the functions explained above in your source-code, e.g. like the following small example:

int nodeid=1;
int omstatus=0;
struct mosix_info info;

omstatus=msx_readnode(nodeid, "status");
info.status = omstatus;
printf("status of node %d is %d\n", nodeid, omstatus);

You will find more code-snippets showing how to use the libmos-functions in the example-section.
Examples using /proc/hpc
1) cpucount        -> function returning the number of nodes
2) foreach node    -> do "something" for each node
3) unlock self     -> ensure that the created process is unlocked
4) get ip-address  -> which ip-address has node N
5) get node IDs    -> converts ip-addresses into node IDs
6) migrate all     -> migrate all possible processes

Examples using /mfs
7) distributing a file -> copy a file to each node

Examples using libmos
8) get load of a node  -> function which returns the openMosix-load
9) get speed of a node -> function which returns the openMosix-speed
1) functions to get the number of nodes in an
openMosix-cluster
shell
######################### cpucount.sh ####################################
#!/bin/bash
function cpucount()
{
    HOWMANY=0
    for n in `ls /proc/hpc/nodes`
    do
        # skip entries whose cpus value is -101
        let "TMP=`cat /proc/hpc/nodes/$n/cpus`"
        if (( TMP != -101 ));
        then
            let "HOWMANY=HOWMANY+TMP"
        fi
    done;
    echo $HOWMANY
}
cpucount
#######################################################################
perl
########################## cpucount.pl ###################################
#!/usr/bin/perl
sub cpucount {
    $CLUSTERDIR="/proc/hpc/nodes/";
    $howmany=0;
    opendir($nodes, $CLUSTERDIR);
    while(readdir($nodes)) {
        $howmany++;
    }
    # do not count the "." and ".." entries
    $howmany--;
    $howmany--;
    closedir($nodes);
    print "$howmany\n";
}
cpucount;
#######################################################################
php
########################## cpucount.php #################################
<?php
function cpucount() {
    $CLUSTERDIR="/proc/hpc/nodes/";
    $howmany=0;
    exec("ls ".$CLUSTERDIR, $ls);
    for ($p=0; $p<count($ls); $p++) {
        $cpus = file($CLUSTERDIR.$ls[$p]."/cpus");
        // skip entries whose cpus value is -101
        if ($cpus[0] != -101) {
            $howmany++;
        }
    }
    echo $howmany;
}
cpucount();
?>
#######################################################################
c/c++
########################## cpucount.c ####################################
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
#include <dirent.h>

#define clusterdir "/proc/hpc/nodes"

int cpucount() {
    DIR *dirhpc;
    struct dirent *dir_info;
    int howmany=0;
    int cpus=0;
    FILE *fp;
    char tmpdirname[200];
    char tmpfilename[200];

    if ((dirhpc=opendir(clusterdir))!=NULL) {
        while ((dir_info=readdir(dirhpc))!=NULL) {
            strcpy(tmpdirname, dir_info->d_name);
            strcpy(tmpfilename, "/proc/hpc/nodes/");
            strcat(tmpfilename, tmpdirname);
            strcat(tmpfilename, "/cpus");
            /* skip the "." and ".." entries */
            if (!strchr(tmpdirname, '.')) {
                fp=fopen(tmpfilename, "r");
                if (fp) {
                    fscanf(fp, "%d", &cpus);
                    if (cpus>0)
                        howmany=howmany+cpus;
                    fclose(fp);
                }
            }
        }
        closedir(dirhpc);
        printf("%d\n", howmany);
    }
    return howmany;
}

int main() {
    int processors=0;
    processors=cpucount();
    /* fork as many computing-processes as "processors" */
    return 0;
}
#######################################################################
2) foreach node -> do "something" for each node
shell
######################### foreach.sh ####################################
#!/bin/bash
function foreach() {
    for n in `ls /proc/hpc/nodes`
    do
        # skip entries whose cpus value is -101
        let "TMP=`cat /proc/hpc/nodes/$n/cpus`"
        if (( TMP != -101 ));
        then
            # execute something for node $n
            echo "execute something for node $n"
        fi
    done;
}
foreach
#######################################################################
perl
########################## for_each.pl ###################################
#!/usr/bin/perl
sub for_each {
    $CLUSTERDIR="/proc/hpc/nodes/";
    opendir($nodes, $CLUSTERDIR);
    while($nodeid=readdir($nodes)) {
        if (($nodeid ne '.') and ($nodeid ne '..')) {
            open(INFILE, "$CLUSTERDIR$nodeid/cpus");
            $cpus = <INFILE>;
            # skip entries whose cpus value is -101
            if ($cpus != -101) {
                print "do something for node $nodeid\n";
            }
            close(INFILE);
        }
    }
    closedir($nodes);
}
for_each;
#######################################################################
php
########################## foreach.php #################################
<?php
function for_each() {
    $CLUSTERDIR="/proc/hpc/nodes/";
    exec("ls ".$CLUSTERDIR, $ls);
    for ($p=0; $p<count($ls); $p++) {
        $cpus = file($CLUSTERDIR.$ls[$p]."/cpus");
        // skip entries whose cpus value is -101
        if ($cpus[0] != -101) {
            echo "do something with node $ls[$p]\n";
        }
    }
}
for_each();
?>
#######################################################################
3) unlock self -> ensure that the created process is unlocked
shell
######################### unlock.sh ####################################
#!/bin/bash
function unlock() {
    echo "0" > /proc/self/lock
}
unlock
#######################################################################
perl
########################## unlock.pl ###################################
#!/usr/bin/perl
sub unlock {
    open(OUTFILE, ">/proc/self/lock") ||
        die "Could not unlock myself!\n";
    print OUTFILE "0";
    close(OUTFILE);
}
unlock;
#######################################################################
php
########################## unlock.php #################################
<?php
function unlock() {
    $fd = fopen("/proc/self/lock", "w");
    fputs($fd, "0", 1);
    fclose($fd);
}
unlock();
?>
(!! this requires that your webserver runs as user "root", which is a
security risk)
#######################################################################
c/c++
########################## unlock.c ####################################
#include <stdio.h>
#include <stdlib.h>

int unlock() {
    FILE *fp;
    /* the file must be opened for writing, not reading */
    fp=fopen("/proc/self/lock", "w");
    if (fp) {
        fputc('0', fp);
        fclose(fp);
        return 1;
    } else {
        printf("could not unlock myself\n");
        return 0;
    }
}

// usage in application source-code
int main() {
    if (unlock()) {
        // do "something" here
        printf("hallo\n");
    }
}
#######################################################################
4) get ip-address -> which ip-address has node N
shell
######################### whichip.sh ####################################
#!/bin/bash
NODE_ID="1"
IPADDRESS=`mosctl whois $NODE_ID`
echo "node $NODE_ID has the ip-address $IPADDRESS"
#######################################################################
perl
########################## whichip.pl ###################################
#!/usr/bin/perl
my $id=1;
sub whichip {
    # backticks capture the command output
    # (system() would only return the exit status)
    $ipaddress=`mosctl whois $id`;
    return $ipaddress;
}
whichip;
#######################################################################
php
########################## whichip.php #################################
<?php
function whichip($id) {
    $ipaddress=exec("mosctl whois $id");
    return $ipaddress;
}
$nodeip=whichip(1);
echo "$nodeip";
?>
#######################################################################
5) get node Ids -> converts ip-addresses into node IDs
c/c++
########################## ip2id.c #######################################
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
#include <time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(int argc, char *argv[]) {
    unsigned long ip;
    long nodeid;

    if (argc != 2) {
        printf("Usage: ip2id [ip_address]\n");
        exit(1);
    }
    ip = inet_addr(argv[1]);
    /* the node-ID is derived from the lower 16 bits of the ip-address */
    nodeid = ntohl(ip) & 0xffff;
    printf("node id = %ld\n", nodeid);
    return 0;
}
#######################################################################
6) migrate all -> migrate all possible processes
shell
######################### migrateall.sh ###################################
#!/bin/bash
function migrateall() {
    for n in `ps -eo pid | sed -e "s/PID//"`;
    do
        migrate $n balance;
    done;
}
migrateall
#######################################################################
7) distributing a file -> copy a file to each node
shell
######################### distributefile.sh #################################
#!/bin/bash
function distributefile() {
    for n in `ls /proc/hpc/nodes`
    do
        # skip entries whose cpus value is -101
        let "TMP=`cat /proc/hpc/nodes/$n/cpus`"
        if (( TMP != -101 ));
        then
            /bin/cp /tmp/test.dat /mfs/$n/tmp/test.dat
        fi
    done;
}
distributefile
#######################################################################
8) get load of a node -> function which returns the openMosix-load
c/c++
########################## getload.c #####################################
#include <stdio.h>
#include <stdlib.h>
#include <libmosix.h>

int main() {
    int nodeid=1;
    int load=0;
    struct mosix_info info;

    load=msx_readnode(nodeid, "load");
    if (load >= 0)
        info.load = load;
    printf("load of node %d is %d\n", nodeid, load);
}
#######################################################################
9) get speed of a node -> function which returns the openMosix-speed
c/c++
########################## getspeed.c ####################################
#include <stdio.h>
#include <stdlib.h>
#include <libmosix.h>

int main() {
    int nodeid=1;
    int speed=0;
    struct mosix_info info;

    /* D_GETSPEED gets the machine's speed (see the command list above) */
    speed = msxctl(D_GETSPEED, 0, NULL, 0);
    if (speed >= 0)
        info.speed = speed;
    printf("speed of node %d is %d\n", nodeid, speed);
}
#######################################################################
Using in applications
e.g. povray (graphic rendering)
With the pvm-patched povray you can set the number of processes at the commandline with the -NT option. Using cpucount.sh from example 1 above you can automatically calculate how many processes should start (one process for each processor in your cluster).
Many applications use commandline-parameters to set the number of processes to fork. For all those programs you can easily use `cpucount.sh` instead of a static number (or edit the program-sources and include the cpucount-function from cpucount.c (example 1) at the point where the application forks its computing-processes).
############################## pov.sh ###################################
#!/bin/bash
# start pvm in the background
pvm &
# unlock myself
echo "0" > /proc/self/lock
# start rendering
/usr/bin/x-pvmpov -NT`cpucount.sh` \
    +I/root/mypovpic.pov +O/root/mypovpic.tga \
    +L/usr/local/povray31/include +W1024 +H768 \
    antialias=on +P +V +D
#######################################################################
e.g. bladeenc (converting audio into mp3)
This short example converts an audio-CD into mp3-files. All encoding processes start in parallel on the system with the CD inserted, on which the script is executed. openMosix takes care of balancing the load across your cluster-nodes automatically.
############################## mp3rip.sh ################################
#!/bin/bash
cdparanoia -B
for n in `ls *.wav`;
do
    bladeenc -quit -quiet $n -256 -copy -crc &
done;
#######################################################################
Other examples of applications which are using the openMosix-API are:
- mon (openMosix-utility)     -> uses libmosix
- mosctl (openMosix-utility)  -> uses libmosix
- openMosixview/Mosixview     -> uses the /proc/hpc-interface
Summary
'You do not have to change your application' is one of the slogans of openMosix. That is, and will stay, true. This document describes the well-defined openMosix-API which you can use in or with your applications. The explained functions are useful to automate monitoring and administration of your cluster.
Of course you can also use these functions to create "pure" openMosix-applications (e.g. you can "hardcode" your application to run on specific nodes, or "do something" if the load increases/decreases). This is an additional option and not a must, because openMosix provides this functionality transparently in the kernel.
Disclaimer
All code-examples are provided without guarantee. Use the information in this document at your own risk, and feel free to contribute your own ideas.
Matt Rechenburg (mosixview@t-online.de)
Additional sources
Many thanks to Kris Buytaert for:
The openMosix HOWTO by Kris Buytaert (buytaert@be.stone-it.com)
http://howto.ipng.be/openMosix-HOWTO/
(some of the explanations of the /proc-interface are from this great howto)
Also many thanks to Bruce Knox for updating this page to a much nicer layout.
This page is: http://www.openmosixview.com/docs/openMosixAPI.html