Notes on Donald's Talk.

Bug in display manager.

Scyld Computing Corporation.

7 years on Linux networking.
Device drivers.

Beowulf project.

Reliability. Open source is important to reliability.

Software distribution system for building large systems.

A standard for files. Many bits. Project names.

Dedicated clusters. All machines work together. All
are on internal networks. Off-the-shelf PCs.

Started with Linux in 1992.

Prototype built on 100 MHz DX4?
1996: 2.2 GF/sec
1997: 10 GF/sec
1998: Avalon, 40+ GF/sec

PCs are gaining in price/performance.

Crossed over in 96/97.

Real applications. Gordon Bell Prize for price/performance.

T3D in the basement: 10 GF/sec.

The Hive: image processing, refocusing Hubble images.

More science for the money.

Application portability question: yes.

Channel bonding.

Global process ID: kill a process on a remote machine.
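A global process ID has to name both the node and the process on that node. A minimal sketch, assuming a hypothetical encoding that packs the node number into the high bits and the node-local PID into the low 16 bits (the split and the names `make_gpid`/`split_gpid` are illustrative, not the scheme any particular system uses):

```python
# Hypothetical global PID layout: upper bits = node number, low 16 bits = local PID.
# The 16-bit split is an assumption for illustration only.
LOCAL_PID_BITS = 16
LOCAL_PID_MASK = (1 << LOCAL_PID_BITS) - 1


def make_gpid(node: int, local_pid: int) -> int:
    """Combine a node number and a node-local PID into one cluster-wide ID."""
    if not 0 <= local_pid <= LOCAL_PID_MASK:
        raise ValueError("local PID does not fit in the low bits")
    if node < 0:
        raise ValueError("node number must be non-negative")
    return (node << LOCAL_PID_BITS) | local_pid


def split_gpid(gpid: int) -> tuple[int, int]:
    """Recover (node, local_pid) so a kill request can be routed to the right machine."""
    return gpid >> LOCAL_PID_BITS, gpid & LOCAL_PID_MASK


if __name__ == "__main__":
    gpid = make_gpid(node=12, local_pid=4321)
    node, pid = split_gpid(gpid)
    print(f"global {gpid} -> node {node}, pid {pid}")
```

To kill a remote process under this scheme, the front end would split the ID, forward the request to `node`, and let that node signal `pid` locally.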

Automatic parallelization.

Application-specific: ray tracing.
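Ray tracing parallelizes at the application level because every pixel is independent: rows can be handed out to nodes with no communication until the final image is gathered. A sketch of a static, interleaved row assignment (the function name and the round-robin policy are illustrative choices, not from the talk):

```python
def assign_rows(height: int, nodes: int) -> dict[int, list[int]]:
    """Round-robin scanline assignment: node k gets rows k, k+nodes, k+2*nodes, ...

    Interleaving tends to spread expensive regions of the image (e.g. many
    reflective objects in one area) across nodes better than contiguous blocks.
    """
    if nodes <= 0:
        raise ValueError("need at least one node")
    return {node: list(range(node, height, nodes)) for node in range(nodes)}


if __name__ == "__main__":
    plan = assign_rows(height=10, nodes=3)
    for node, rows in plan.items():
        print(node, rows)
```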

Administration of many machines.

Programming work to parallelize. Weather forecasting (trivially parallelized...).

Climate models. Ocean surface temperatures. Long-term trends.

Eric Hendrix. Lobos. Foundry network switch. Gigabit, 100BaseT.

Flat-cluster design. Move more easily into the next generation.

Network booting technique.

Improved network support.

Extreme Linux...
Beowulf clusters...

64 mounted files.

Structures in the kernel were modified.

200-2000 nodes.

Working outside of the cluster.

Kernel scales to extreme applications.

The Extreme Linux Red Hat CD is Beowulf software, not Extreme Linux software.

No security protocols.

One node controls the jobs.

Masquerade on the front end.

Node fails? Detect it by pinging nodes or with multicast packets.

Active node list. Job submission.
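One way to keep the active node list is a heartbeat table: each node is pinged or sends periodic multicast packets, and the front end drops any node it has not heard from within a timeout. A minimal sketch with hypothetical names (`NodeMonitor`, `heartbeat`, `active_nodes`) and a caller-supplied clock so it can be exercised without a network:

```python
import time
from typing import Callable


class NodeMonitor:
    """Tracks the last heartbeat (ping reply or multicast packet) seen from each node."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str) -> None:
        """Record that a packet arrived from `node` just now."""
        self.last_seen[node] = self.clock()

    def active_nodes(self) -> list[str]:
        """Nodes heard from within the timeout; only these should receive new jobs."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.timeout)


if __name__ == "__main__":
    fake_now = [0.0]
    mon = NodeMonitor(timeout=5.0, clock=lambda: fake_now[0])
    mon.heartbeat("node01")
    mon.heartbeat("node02")
    fake_now[0] = 3.0
    mon.heartbeat("node02")
    fake_now[0] = 7.0
    print(mon.active_nodes())  # node01 was last seen at 0.0, so it has timed out
```

Job submission would then pick targets only from `active_nodes()`; the timeout trades detection speed against false alarms on a busy network.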

Feedback from network design engineers: load balancing, switching...

Telecom doesn't have high-performance switches; less concern about
performance and latency...

ATM networking on a cluster: too expensive. Protocol overhead way too...

VIA, Virtual Interface Architecture.

Changing device drivers and kernel mods.

Limited to a few hundred to 1000 machines. Cluster-specific
protocol, non-standard. Gain 10 to 20%.

Bottleneck: rarely the network.

Low latency: Myrinet.

$300 per adapter. $500-700 per port for the switch.
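The adapter and switch-port prices quoted above (taken to be US dollars) make the per-node network cost easy to bound. A small calculation, assuming each adapter needs its own switch port and ignoring cables and switch uplinks:

```python
ADAPTER_COST = 300            # per network adapter, from the talk (assumed dollars)
PORT_COST_RANGE = (500, 700)  # per switch port, from the talk (assumed dollars)


def network_cost(nodes: int, adapters_per_node: int = 1) -> tuple[int, int]:
    """Low/high network cost estimate: every adapter also consumes one switch port."""
    ports = nodes * adapters_per_node
    low = ports * (ADAPTER_COST + PORT_COST_RANGE[0])
    high = ports * (ADAPTER_COST + PORT_COST_RANGE[1])
    return low, high


if __name__ == "__main__":
    print(network_cost(1))   # (800, 1000) for a single node
    print(network_cost(64))  # (51200, 64000) for a 64-node cluster
```

The `adapters_per_node` parameter covers channel-bonded nodes, where each extra link doubles up on both adapters and ports.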

Channel bonding technology: same latency, higher bandwidth.
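Channel bonding stripes traffic over several network links, so each packet still sees the latency of one link while the aggregate bandwidth grows with the number of links. A toy sketch of round-robin striping plus an idealized transfer-time model; the link names are hypothetical, receiver-side reordering is ignored, and this is not the kernel's implementation:

```python
from itertools import cycle


def stripe(packets: list[bytes], links: list[str]) -> dict[str, list[bytes]]:
    """Round-robin each outgoing packet onto the next link in turn."""
    if not links:
        raise ValueError("need at least one link")
    queues: dict[str, list[bytes]] = {link: [] for link in links}
    for packet, link in zip(packets, cycle(links)):
        queues[link].append(packet)
    return queues


def bonded_transfer_time(total_bytes: int, link_bytes_per_s: float,
                         n_links: int, latency_s: float) -> float:
    """Idealized model: latency is unchanged, only the bandwidth term shrinks."""
    return latency_s + total_bytes / (link_bytes_per_s * n_links)


if __name__ == "__main__":
    print(stripe([b"p0", b"p1", b"p2", b"p3", b"p4"], ["eth0", "eth1"]))
    # 100BaseT carries about 12.5 MB/s; bonding two links halves only the bandwidth term.
    print(bonded_transfer_time(10_000_000, 12.5e6, 1, 1e-4))
    print(bonded_transfer_time(10_000_000, 12.5e6, 2, 1e-4))
```

For small messages the latency term dominates, which is why bonding helps bulk transfers far more than fine-grained communication.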

Fine-grained access to a global memory space.

Simulating the evolution of galaxies.

Each particle interacts with every other particle.

Tree-structure technique. The all-to-all interaction maps
well onto networked cluster nodes.
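The all-to-all interaction is what makes galaxy simulation expensive: a direct sum touches every pair, N(N-1)/2 interactions per step, which tree codes reduce to roughly N log N by approximating distant groups of particles. A sketch of the direct O(N^2) baseline that the tree technique replaces; the 2-D setting, unit gravitational constant, and softening length `eps` are illustrative choices:

```python
import math


def direct_forces(pos: list[tuple[float, float]], mass: list[float],
                  eps: float = 1e-3) -> list[tuple[float, float]]:
    """O(N^2) pairwise gravity in 2-D with G = 1 and Plummer-style softening."""
    n = len(pos)
    fx = [0.0] * n
    fy = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):  # each pair visited once: N(N-1)/2 pairs
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r2 = dx * dx + dy * dy + eps * eps
            f = mass[i] * mass[j] / (r2 * math.sqrt(r2))  # |F| / r, softened
            fx[i] += f * dx
            fy[i] += f * dy
            fx[j] -= f * dx  # Newton's third law: equal and opposite
            fy[j] -= f * dy
    return list(zip(fx, fy))


if __name__ == "__main__":
    forces = direct_forces([(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)], [1.0, 2.0, 3.0])
    for f in forces:
        print(f)
```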

Use a simple rsh.

vproc is more efficient than rsh.

Ship libraries as well. Use the cluster more transparently.

MOSIX: process migration.

Implementation has high overhead.

Data warehousing.

Data mining: searches, web search engines.

Read-mostly database or warehouse. Search engines.


400 GB is not so big.

10 TB now; later, not so big.

10 TB on tape is nothing.

2.2 upgrade.

The power of OS: modify a few lines of code.

DLL upgrades for small bug fixes: huge upgrade cascade.

Requalify a system.

Scyld Computing Corporation.

Version control: RPM.

"great things for clusters"

rpm --verify --all: verify across the entire system.

Helps install packages throughout the cluster.

Why Linux?

The BSDs are great OSes, but were not there when the project started.

Device support bad.

More stability. Hide internal development. Could not work
with the BSD teams: large egos, not pleasant. Dealing with
Linus is good: laid-back guy.

49 days: timer bug.
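The notes do not say which timer was involved, but 49 days is the wrap period of an unsigned 32-bit counter of milliseconds: 2^32 ms is about 49.7 days. Assuming that is the mechanism, the usual defence is to compare timestamps by their wrapped difference, the same idea as the Linux kernel's `time_after()` macro, rather than with a plain `<`. A Python re-creation of that idea:

```python
MASK32 = 0xFFFFFFFF


def wrap_days(bits: int = 32, tick_seconds: float = 0.001) -> float:
    """Days until an unsigned counter of `bits` bits, ticking every `tick_seconds`, wraps."""
    return (2 ** bits) * tick_seconds / 86_400


def time_after(a: int, b: int) -> bool:
    """True if 32-bit tick value `a` is later than `b`, even across a wrap.

    Interprets the 32-bit difference b - a as signed, so it is only valid
    while the two timestamps are less than half the wrap period apart.
    """
    diff = (b - a) & MASK32
    return diff >= 0x80000000  # signed (b - a) < 0  <=>  a is after b


if __name__ == "__main__":
    print(f"{wrap_days():.1f} days")             # 32-bit millisecond counter
    before_wrap = 0xFFFFFF00
    after_wrap = 0x00000100                      # counter has wrapped past zero
    print(after_wrap > before_wrap)              # naive comparison gets this wrong
    print(time_after(after_wrap, before_wrap))   # wrap-safe comparison gets it right
```

With a 10 ms tick (`tick_seconds=0.01`) the same 32-bit counter lasts about 497 days instead.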

RCS for bw.

BitKeeper, BitMover: will not endorse a version control system.

RPM is the important technology.

Debian package management?

Community development project.

Hardware parameter monitoring.

Look for potential failures. Monitoring fan speed.

Fans fail. Big deal.

LM78 chips monitor fans.

Case temperature.
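LM78-class chips report fan tachometer speeds and temperatures; the useful step is turning raw readings into early warnings before a fan failure cooks a node. A sketch of that threshold check, with hypothetical sensor names and limits (actually reading the chip through a sensors driver is not shown):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    min_fan_rpm: int = 2000        # hypothetical: below this, treat the fan as failing
    max_case_temp_c: float = 45.0  # hypothetical case-temperature ceiling


def check_node(readings: dict[str, float], limits: Limits = Limits()) -> list[str]:
    """Return warnings for fans running slow and case temperatures running hot."""
    warnings = []
    for name, value in sorted(readings.items()):
        if name.startswith("fan") and value < limits.min_fan_rpm:
            warnings.append(f"{name}: {value:.0f} RPM below {limits.min_fan_rpm}")
        elif name.startswith("temp") and value > limits.max_case_temp_c:
            warnings.append(f"{name}: {value:.1f} C above {limits.max_case_temp_c}")
    return warnings


if __name__ == "__main__":
    sample = {"fan1": 4500, "fan2": 900, "temp1": 38.5, "temp2": 51.0}
    for w in check_node(sample):
        print(w)
```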

/proc/net/dev developed by Donald.
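/proc/net/dev is plain text: two header lines, then one line per interface with the receive counters followed by the transmit counters. A parsing sketch, assuming the 16-column layout of later Linux kernels (older kernels printed fewer columns, so the column count is checked); the sample text is made up:

```python
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]
NAMES = [f"rx_{f}" for f in RX_FIELDS] + [f"tx_{f}" for f in TX_FIELDS]

# Made-up sample in the 16-column format; on a live system, read /proc/net/dev instead.
SAMPLE = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:12345678    9000    3    0    0     0          0        12  2345678    8000    0    0    0     7       0          0
"""


def parse_net_dev(text: str) -> dict[str, dict[str, int]]:
    """Parse /proc/net/dev contents into {iface: {"rx_bytes": ..., "tx_colls": ...}}."""
    stats: dict[str, dict[str, int]] = {}
    for line in text.splitlines()[2:]:             # skip the two header lines
        if ":" not in line:
            continue
        iface, _, counters = line.partition(":")   # large counters may abut the colon
        values = [int(v) for v in counters.split()]
        if len(values) != len(NAMES):
            raise ValueError(f"unexpected column count for {iface.strip()}")
        stats[iface.strip()] = dict(zip(NAMES, values))
    return stats


if __name__ == "__main__":
    for iface, s in parse_net_dev(SAMPLE).items():
        print(iface, "rx_errs", s["rx_errs"], "tx_colls", s["tx_colls"])
```

Splitting on the first colon rather than on whitespace matters because a large receive byte count can be printed directly after the interface name.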

MII reporting: pair skew errors, etc.

Working on disk errors.


Compile kernels for 2 days straight to burn in a node.


The book: a collection of tutorial notes.

Commercial applications? Databases are an active topic.

1 year away. (A scientific oddity.)

Run the stock market.

MPI/PVM (MPI-2).

BProc sits under MPI, VIA too.

An abstraction layer.

Run a cluster for 1 or 2 years, then give the PCs
away as workstations.

"8-16 node cluster. Interesting space to be in."

SGI Origin: an expensive way to do the same.

Original author of the NFS server (10 years ago).
All user level. The kernel server is much better.

Sun provided a free RPC library.

Unsigned field, yet -1 and -2 are put in it for errors.
No active contribution from Sun.
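Putting -1 and -2 into an unsigned field means the error codes become the two largest legal values, so a reader cannot tell an error from a genuinely large value. A sketch using Python's `struct` module to show how the bit patterns alias, assuming a 32-bit field:

```python
import struct


def as_uint32(value: int) -> int:
    """Store a signed 32-bit int, then read the same 4 bytes back as unsigned."""
    return struct.unpack("<I", struct.pack("<i", value))[0]


if __name__ == "__main__":
    for code in (-1, -2):
        print(code, "->", as_uint32(code))  # the two largest 32-bit unsigned values
    # A legitimate unsigned value with the same bits is indistinguishable:
    print(as_uint32(-1) == 0xFFFFFFFF)
```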

NIS rewritten from scratch. Unreliable.

A lot of NIS traffic when processes start up.

Uni- vs. multi-processor boxes: a toss-up.

STREAMS is a performance limitation;
BSD sockets are better.

rfork and rexec.

Checkpoint and restart.
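Checkpoint/restart lets a long job survive a node failure: periodically save enough state to resume, and on startup load the latest checkpoint if one exists. A minimal application-level sketch using `pickle`; the toy job and the checkpoint file name are hypothetical. Writing to a temporary file and renaming it with `os.replace` keeps a crash mid-write from destroying the previous checkpoint:

```python
import os
import pickle
import tempfile


def save_checkpoint(state: dict, path: str) -> None:
    """Atomically write `state`: dump to a temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return {"step": 0, "total": 0}


def run(path: str, steps: int, every: int = 10) -> dict:
    """Toy job: sum the step numbers, checkpointing every `every` steps."""
    state = load_checkpoint(path)
    while state["step"] < steps:
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(state, path)
    return state


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.gettempdir(), "job.ckpt")  # hypothetical file name
    print(run(ckpt, steps=100))
```

If the job is killed and rerun, it resumes from the last multiple of `every` instead of from step 0.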

"turn key applications"

"web server back ends"

Scale limitation: application-limited.