tue: traffic management timescales, users, large and small, demands, short and long...
signaling and soft state
thu: systems design patterns
This week we'll cover
Optimisation is a very large topic in itself, and underpins many of the ideas in machine learning when it comes to training - ideas like stochastic gradient descent (SGD) are seen here in how we assign traffic flows to routes. In contrast, the decentralised, implicit optimisation that a collection of TCP or "TCP friendly" flows uses is more akin to federated learning, which is another whole topic in itself.
Why a log function? maybe see Bernoulli on risk
Why proportional fairness? from social choice theory!
Are people prepared to pay more for more bandwidth? One famous Index Experiment says yes.
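To make the log function and proportional fairness a bit more concrete, here is a minimal sketch (not taken from the lecture slides) that maximises the sum of log(rate) over flows subject to link capacities, by letting each link adjust a price. The two-link topology, the step size and the function name are all made up for illustration.

```python
# Hypothetical toy network: two links of capacity 1.0.
# Flow 0 crosses both links; flows 1 and 2 each cross a single link.
CAPACITY = [1.0, 1.0]
ROUTES = [[0, 1], [0], [1]]  # links used by each flow


def proportionally_fair_rates(capacity, routes, step=0.05, iterations=20000):
    """Maximise sum(log x_r) subject to link capacities via per-link prices."""
    prices = [1.0] * len(capacity)
    rates = [0.0] * len(routes)
    for _ in range(iterations):
        # A flow with utility log(x) facing path price p picks x = 1 / p.
        rates = [1.0 / sum(prices[j] for j in route) for route in routes]
        for j, cap in enumerate(capacity):
            load = sum(x for x, route in zip(rates, routes) if j in route)
            # Raise the price of an overloaded link, lower it when underused.
            prices[j] = max(1e-6, prices[j] + step * (load - cap))
    return rates


rates = proportionally_fair_rates(CAPACITY, ROUTES)
print([round(x, 3) for x in rates])
```

In this toy case the long flow should settle near 1/3 and the two short flows near 2/3 each, whereas max-min fairness would give every flow 1/2: the log utility charges the long flow for using two congested links.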
0. See Computer Systems Modeling for why the delay grows quickly as load approaches capacity.
1. see IB Distributed Systems for clock synch
2. see prev year's Cloud Computing (II) module for a bit more about data centers&platforms.
We've revisited flow and congestion control - one way of visualising the progress of an adaptive flow&congestion control protocol is the time sequence diagram:-
but note this is a massive over-simplification, as really what you see here is an ideal with only one source and (apparently) only one bottleneck queue. In reality, in FIFO queues, traffic from multiple sources mixes and interferes (causing high variance in delay, hence round trip time, and very unpredictable loss). If we had Round Robin (by flow) queues, things might be a bit better, but how much? That is what we look at under Scheduling, and with the Generalised Processor Sharing model we can see how close we can get to some ideal of "isolation" between flows.
With FIFO queues, RTTs and rates (as estimated from data or ack packet inter-arrival times) are going to vary fairly chaotically, as the ensemble of flows at any bottleneck will not be coordinated in any special way: they all have different RTTs and perhaps senders of different performance, maybe different packet sizes, possibly different inter-packet timing at transmit time, etc.
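As a rough illustration of the "isolation" ideal mentioned above, here is a small sketch of the fluid Generalised Processor Sharing allocation on one link: each backlogged flow gets capacity in proportion to its weight, and anything a flow does not need is shared among the rest. The flow names, weights and demands are invented for the example, and real packet schedulers only approximate this fluid model.

```python
def gps_rates(capacity, weights, demands):
    """Fluid GPS on one link: weighted sharing, capped at each flow's demand."""
    rates = {}
    active = set(weights)
    remaining = capacity
    while active:
        total_weight = sum(weights[f] for f in active)
        # Weighted share of what is left for every flow still competing.
        share = {f: remaining * weights[f] / total_weight for f in active}
        satisfied = {f for f in active if demands[f] <= share[f]}
        if not satisfied:
            # Everyone left wants more than their share: split by weight.
            rates.update(share)
            break
        for f in satisfied:
            rates[f] = demands[f]
            remaining -= demands[f]
        active -= satisfied
    return rates


# Hypothetical flows: "c" is greedy, "a" and "b" are modest.
example = gps_rates(10.0, {"a": 1, "b": 1, "c": 1}, {"a": 2.0, "b": 4.0, "c": 20.0})
print(example)
```

Here the greedy flow "c" cannot take anything away from "a" or "b"; it only picks up the capacity they leave unused. In a FIFO queue there is no such protection.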
This week, we've got two random[ref] examples
1. random telephone routing
c.f. triangles
can we greedily find the tandem? (only) if you want to check the reasoning behind that part of DAR!
2. random drop congestion - seriously, today's lecture is mainly revision of IB congestion/flow control...
c.f.tcp arena
how many TCPs are there, really? again, only here for the keen background reader!
ref: there's a very nice study of general applicability of random choices in this harvard paper if you want some more background reading...
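For the random telephone routing example, a minimal sketch of the "sticky random" idea behind DAR might look like the following: try the direct link first, otherwise try the currently remembered tandem, and only if that overflow attempt fails lose the call and pick a new tandem at random. The class name, node names and parameters are hypothetical, the network is assumed fully connected, and call departures (releasing circuits) are left out to keep it short.

```python
import random


class DarNetwork:
    """Toy fully connected network with sticky random tandem selection."""

    def __init__(self, nodes, circuits, reservation, seed=0):
        self.nodes = list(nodes)
        self.reservation = reservation  # trunk reservation for overflow calls
        self.rng = random.Random(seed)
        # Free circuits on every (undirected) link.
        self.free = {frozenset((a, b)): circuits
                     for i, a in enumerate(self.nodes)
                     for b in self.nodes[i + 1:]}
        self.tandem = {}  # remembered tandem per (source, destination)

    def _pick_tandem(self, src, dst):
        return self.rng.choice([n for n in self.nodes if n not in (src, dst)])

    def route_call(self, src, dst):
        """Return the path used by the call, or None if the call is lost."""
        direct = frozenset((src, dst))
        if self.free[direct] > 0:
            self.free[direct] -= 1
            return [src, dst]
        if (src, dst) not in self.tandem:
            self.tandem[(src, dst)] = self._pick_tandem(src, dst)
        via = self.tandem[(src, dst)]
        legs = [frozenset((src, via)), frozenset((via, dst))]
        # Overflow calls are only accepted above the reservation level.
        if all(self.free[leg] > self.reservation for leg in legs):
            for leg in legs:
                self.free[leg] -= 1
            return [src, via, dst]
        # Call lost: forget this tandem and choose a fresh one at random.
        self.tandem[(src, dst)] = self._pick_tandem(src, dst)
        return None


net = DarNetwork(["A", "B", "C", "D"], circuits=2, reservation=0)
paths = [net.route_call("A", "B") for _ in range(5)]
print(paths)
```

The point of the stickiness is that a tandem that works keeps being used, so no global search or state exchange is needed; a bad choice is only discovered, and replaced, when a call is actually lost.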
separately, in a discussion with a supervisee/supervisor, i realised some people may be interested in looking at real code to reinforce their understanding - for IB material, I strongly recommend Rich Stevens' TCP/IP Illustrated Volumes 1 and 2, but for this course, there's not really any such nice single text - an alternative might be this:-
This week we wrap up BGP - looking at abstractions of the algorithm (The Stable Paths Problem in Interdomain Routing) and concrete realisations of problems in implementations.
While you may find the stable paths model helpful in removing noise from BGP complexity, I am not so sure it's a great abstraction for thinking about how actually to resolve the problem(s) (non-convergence etc). A nicer approach (by the same lead person, Tim Griffin) is meta routing, which is very powerful and general, but would need an entire other course to discuss, and I just put it here for background in case anyone is interested!
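If you want to play with the stable paths model, here is a brute-force sketch that enumerates every path assignment for a tiny instance and keeps those where each node is using its best available permitted path. The two instances are small three-node examples in the style of the "bad gadget" from the stable paths literature; the names and the encoding (paths as tuples, destination 0, every node allowed a direct path) are assumptions for illustration only.

```python
from itertools import product


def stable_solutions(ranked_paths):
    """ranked_paths[u]: u's permitted paths to destination 0, best first."""
    nodes = sorted(ranked_paths)
    solutions = []
    for choice in product(*(ranked_paths[u] for u in nodes)):
        assignment = dict(zip(nodes, choice))
        assignment[0] = (0,)  # the destination trivially reaches itself

        def best(u):
            # A path is available only if its next hop currently uses the rest of it.
            for path in ranked_paths[u]:
                if assignment[path[1]] == path[1:]:
                    return path
            return None

        if all(best(u) == assignment[u] for u in nodes):
            solutions.append({u: assignment[u] for u in nodes})
    return solutions


# Each node prefers to go via its clockwise neighbour rather than directly.
BAD = {1: [(1, 2, 0), (1, 0)], 2: [(2, 3, 0), (2, 0)], 3: [(3, 1, 0), (3, 0)]}
# Same topology, but node 3 prefers its direct path.
GOOD = {1: [(1, 2, 0), (1, 0)], 2: [(2, 3, 0), (2, 0)], 3: [(3, 0), (3, 1, 0)]}

print(len(stable_solutions(BAD)), stable_solutions(GOOD))
```

The first instance has no stable assignment at all (so a protocol chasing one can oscillate forever), while changing a single local preference gives exactly one. Brute force is fine at this size but obviously does not scale.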
Then we'll next make a start on Multicast Routing. Two compelling applications were tv/radio broadcast and software distribution. However, application layer overlays (Content Distribution Networks) have subsumed those needs - a great example is how Zoom coordinates multiparty sessions - this talk by their CEO is instructive. Also interesting is this paper on netflix content distribution approach.
Interdomain routing - BGP - key 4 slides - 124 126 131 132
Moving from intra-domain (within one autonomous system/routing domain/internet service provider) to inter-domain, the key change is from policy within a domain (as used for steering traffic in centralised routing or in MPLS or segment routing) to policy between multiple autonomous (i.e. independent) domains, which may have conflicts and often require some level of information hiding (protecting knowledge of their customers' needs from competitors). So while connectivity is the minimum requirement, there's often no shared goal in terms of what is "optimal" (i.e. what routing metric to use) - we'll see that in default cases, for traffic engineering, and for various tie-breaking reasons, metrics creep in as an implicit part of BGP routing, but not in any consistent way.
1 Centralised (!) Routing - Fibbing (see fibbing paper for more details - esp. figure 9/10 on failure/recovery modes) - in particular, fail-open and fail-close is used in the paper to refer to the persistence of a path made up by fibbing in the event of a controller failure where in some cases this is needed, and in others it needs to be removed! [1,2]
2 Stateful Routing - MPLS. Multi-protocol label switching has a long history (probably started in Cambridge!). It simplifies switch/router design in terms of forwarding, at the cost of increasing complexity in the control plane -- possibly in routing, but more crucially, signaling. Signaling protocols have a very long history (from railways over 200 years ago). One interesting computer science dimension that arose from signaling is the concept of mutual exclusion. The first algorithms for avoiding contention for a limited resource are direct descendants of the P and V flags used to prevent two trains entering the same section of track....
In a sense, these two ideas (central and stateful) can be reconciled via "soft state" protocols (see last lecture).
Note also: MPLS involves a "shim" layer between IP and lower levels. Segment routing may use that, or may just use IPv6 routing options directly. Recall layering from the IB networking course. It is often not a pure picture - IP tunnels are another example of extra layers between this and that. MPLS can also simplify switch & router port processing (and possibly, if the switch is "cell switched", the scheduling of packets across the switch fabric - again, recall router architecture from the IB networking course). Segment Routing is a re-think of MPLS, which can use IPv6 routing options as labels, and then use IP routing updates to distribute the label information to (amongst others, upstream) neighbours. There's a nice slidepack SR explainer from CERN which shows the interaction with routing... Note segment routing with IPv6 dispenses with the potential hardware speedup of having 20 bit MPLS labels for forwarding, so one assumes router NICs and processors may have ASIC support for v6 header processing!
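To see why a 20 bit label makes forwarding simple, here is a small sketch of packing and unpacking the 32 bit MPLS shim word (20 bit label, 3 bit traffic class, 1 bit bottom-of-stack flag, 8 bit TTL) and doing a label swap with an exact-match table lookup. The function names, the label values and the table contents are hypothetical; a real router does this in hardware, and the table would be filled in by signaling or routing.

```python
import struct


def pack_shim(label, tc=0, bottom=True, ttl=64):
    """Build one 32 bit shim entry: label(20) | tc(3) | bottom(1) | ttl(8)."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("label must fit in 20 bits")
    word = (label << 12) | ((tc & 0x7) << 9) | (int(bottom) << 8) | (ttl & 0xFF)
    return struct.pack("!I", word)


def unpack_shim(data):
    """Decode the first shim entry of a labelled packet."""
    (word,) = struct.unpack("!I", data[:4])
    return {"label": word >> 12, "tc": (word >> 9) & 0x7,
            "bottom": bool((word >> 8) & 1), "ttl": word & 0xFF}


def swap(packet, table):
    """Label swap: exact-match lookup on the incoming label, no prefix matching."""
    shim = unpack_shim(packet)
    if shim["ttl"] <= 1:
        return None  # drop rather than forward with an expired TTL
    out_label, out_port = table[shim["label"]]
    new_shim = pack_shim(out_label, shim["tc"], shim["bottom"], shim["ttl"] - 1)
    return out_port, new_shim + packet[4:]


# Hypothetical table: incoming label -> (outgoing label, outgoing port).
TABLE = {17: (42, "port3")}
port, out = swap(pack_shim(17) + b"payload", TABLE)
print(port, unpack_shim(out), out[4:])
```

The forwarding step is a single table index on a fixed-position field, which is the contrast with longest-prefix matching on IP destination addresses.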
Optional background reading...
2. Another dimension of signaling is that it requires a level of access control, authentication and authorisation not typically present in pure datagram networks like traditional IP. For a measure of how bad it can get, look no further than the old digital telephone network signaling system number 7 (SS7), which is more complex than the whole regular TCP/IP data stack (see report on vulnerabilities in SS7). RSVP (which serves a similar signaling function for MPLS if you don't just rely on routing!) is about as bad.
1. Further work was done based on the fibbing idea:
I'll be noting progress and also adding occasional related reading/ and corrections on this blog.
If you want to revise anything to warm up for the course, I suggest last year's Computer Networks course should be a quick re-read (e.g. on routing)! A fun review of 40+ years of the Internet
This week we'll just make a start on routing.
For fun, you might find this discussion on why LLMs aren't much use for networking, mostly interesting - video recording of panel session
Reference requested for Glossary of Terms: from ISOC
Some acronyms come from the 7 layer model of the communications stack including terms like PHY (short for physical, so not really an acronym).
Being digitally colonized has been a serious threat to national sovereignty, but also to individual freedoms from surveillance and censorship, for decades. This has applied to the Internet, the WWW, the Cloud and now AI.
Whether the digital Emperors are based in the USA or China, they are there.
To avoid these sorts of risk for content, Ross Anderson proposed the eternity service, which finesses the problems in an ingenious way (typical for Ross) by structuring the infrastructure as a mix of sharing, striping, and sharding, and builds in the threat of mutually assured destruction - if you are a low level engineer/computer scientist, the idea is like CDMA or Network Coding or what some colleagues re-purposed to be spread spectrum computing.
A simpler idea is more coarse grained - organisations that provide critical infrastructure (railways, power grids, water&sewerage, Internet etc) can source technology from (say) three different providers. The London Internet Neutral Exchange (LINX) extends this to cooperative ownership as well. So undermining one supplier's gear can only do limited damage to the service - indeed, many services operate with headroom for coping with simple natural disasters in any case (internet and power grids also to allow for wide variance in demand/supply), so this is a natural way to do things.
Another digital version is Certificate Transparency, which also creates a Merkle tree space for (horrible word) coopetition (cooperation amongst competitors), enforced by the tamper evident (or to some extent, socio-economically tamper proof) service space - in a way, a single application version of eternity.
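For anyone who hasn't met Merkle trees, the "tamper evident" property can be sketched in a few lines: the log publishes a single root hash, and changing any entry changes that root. The sketch below follows, as far as I recall it, the tree-hash layout used by Certificate Transparency (a 0x00 prefix for leaves, 0x01 for interior nodes, split at the largest power of two below the leaf count); the log entries are made-up byte strings, and real logs add signed tree heads, inclusion proofs and consistency proofs on top.

```python
import hashlib


def merkle_root(leaves):
    """Root hash over a list of byte strings, with leaf/node domain separation."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    if len(leaves) == 1:
        return hashlib.sha256(b"\x00" + leaves[0]).digest()
    # Split at the largest power of two strictly less than the number of leaves.
    k = 1
    while k * 2 < len(leaves):
        k *= 2
    left = merkle_root(leaves[:k])
    right = merkle_root(leaves[k:])
    return hashlib.sha256(b"\x01" + left + right).digest()


# Hypothetical log entries.
log = [b"cert-for-example.org", b"cert-for-example.com", b"cert-for-example.net"]
honest = merkle_root(log)
tampered = merkle_root([log[0], b"cert-for-evil.example", log[2]])
print(honest.hex())
print(honest != tampered)
```

Because anyone can recompute the root from the entries, competitors can audit one another's logs without having to trust each other.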
This would apply to sourcing data, training, and models themselves + inferencing after the training.
So how about, using the state of the art multi-AI protocols to connect agents, we construct a multi-national AI substrate that serves no-one in particular, but everyone in general. Any attack (removal of an agency, pollution of data) would damage the attacker as much as everyone else. It is in everyone's interest to keep the system running and running honestly.
So how to combine neural networks? (something that would also be useful during training or inference so as to share GPU or other accelerator h/w)? You'd need some sort of way to interpret multiple interleaved graphs with multiple (XORd or turbocoded) weights. This is research. Margin's too small to put it here =>
Towards an RSPI for AI (UK) - alpha name RSPAI (short name "pai")
The rspi was at one point going to be the BBC Nano (or model n) but ended up as rspi because ... my reaction to the 100-200$ cost of the one laptop per child project, and to the limitations thereof...was to "blow a raspberry"...
An AI, like a computer, is a general purpose machine-a purely physical tool analogy is a swiss army knife, but software on computers is tools and die - tool and die makers design, and build or repair tools. they have a bench with lathes and saws and drills and hammers and screwdrivers and boxes full of bits...
A s/w toolbench is something like unix, with subsystems (network stack, file systems, i/o in general (serial, display etc)) plus SDK for development (vi, cc, add etc) and then some handy pre-made tools (regex/grep, sort, awk, sed, etc) with source and documents available so people could use them as design patterns (templates etc)
the RSPI prospered as it had a 30 year pre-history (BBC Micro, Acorn, ARM, Broadcom system on chip) and ran Linux (descendant of bell labs et al) and had, on the system on chip, a GPU (so it could use openGL rendering / gaming s/w) and wifi, and gpio (so it could connect to sensors and actuators and do robotics or similar)
An RSPAI would also need to scale up - by being modular, and networked (c.f. wifi and gpio above - nowadays MCP or AI2AI or similar) and with federated learning tools/platforms - the equivalent of a s/w development environment but with examples (e.g. start with some huggingface pieces) and flower.ai
A network, small and large...with a way to train (NN to FL)- in fact networking at on chip, in software, and between chips/systems...
It also needs some data (kind of AI equivalent to sensor input) - this doesn't have to be _on_ the RSPAI - it needs to be where you can get it (equivalent was that
the Pi part of the Raspberry Pi name came from the original idea that, like the BBC micro that booted into basic, the RSPI would boot into python. we discarded that early on and said kids must learn the Command Line (shell) but one simple example lesson (how to write Snakes in Python) would start with how to download and install python, and then... ... ...
It could run on an RSPI easily (esp. given cheapo GPU) but what is the core tool bench for the AI part of this RSPAI? is it just huggingface with pytorch tensorflow and all that gubbins, or is there a core that could be still general, but simpler to start off with and afford people easy lessons - and here is a candidate lean language model that runs on a laptop, as a starter, fresh out of the Turing Institute!
(one class we ran with the RSPI had 11yr old schoolgirls without any teacher go from nothing to writing snakes in 1 hour from scratch) - what is the equiv to that, that lets you still also go all the way to writing a version of asteroids and control a bipedal lego robot? or genai including stable diffusion video like this:- https://youtu.be/kG8fmSW_5wM?si=hsbBdeCUtdshNyBM
A small neural net, a regression/stats library, some causal inference graph stuff? what what what?
And who is it for? policy makers, wannabe AGI gods, defense contractors, health and environment researchers, Jane Public who who who?
Answers on a postcard please...