clog: 2025

Tuesday, December 02, 2025

Principles of Communications Last Week 8 2/12/2025

Today we wrap up this course with the network systems design toolbag of tricks.

It could be instructive to take each trick and see where it showed up earlier in the course.

Statistical multiplexing - in conservation law and scheduling.

Pipelining - the entire idea both of a stack (vertical) and the network path (sequence of links and switches and routers) is illustrative of pipelining. Both closed loop and open loop flow control are trying to balance the pipeline, either through congestion control to share the bottleneck, or through admission control.

Caching/Locality - mentioned in passing memcached as a data center service, for example.

Batching also shows up in some data center distributed processing platforms (not really covered here)

Optimising the common case is probably something that BGP demonstrates how not to do!

Binding and Indirection - e.g. multicast addresses aren't real locations, but logical group identifiers, and the actual locations are distributed by the group membership protocol...

Randomness - as discussed in telephone routing! Also used in some load balancers when talking to replicated services (and flow or packet level balancing over multipath enabled routes).

There are many many other examples...

On the other hand, space could be cool but cooling in space could be very hard... this is maybe a nice example of how systems thinking can be applied using the simple laws of physics (gravity, EM radiation, thermodynamics) before you get as far even as thinking about resource management of processing, memory, comms...
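As a back-of-envelope illustration of the thermodynamics point: in vacuum, heat can only leave by radiation, so the Stefan-Boltzmann law sets the radiator area. The sketch below assumes an ideal one-sided grey-body radiator that absorbs no sunlight or Earthshine; the 1 MW load and 0.9 emissivity are made-up illustrative numbers, not taken from any real design.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def radiator_area(power_w, temp_k, emissivity=0.9):
    """Area (m^2) needed to radiate power_w watts at surface temperature temp_k.

    Assumption: one-sided radiator, no absorbed sunlight or Earthshine.
    """
    return power_w / (emissivity * SIGMA * temp_k ** 4)


if __name__ == "__main__":
    # Hypothetical 1 MW heat load at a few radiator temperatures.
    for temp in (300, 350, 400):
        print(temp, "K ->", round(radiator_area(1e6, temp)), "m^2")
```

Because the radiated power goes as the fourth power of temperature, running the radiator hotter shrinks it quickly, but the electronics then have to tolerate (or be heat-pumped up to) that temperature.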

Things you do not need to remember:- Padhye's "simple" TCP throughput equation, and Erlang's call blocking probability ...
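You don't need to memorise it, but for the curious, Erlang's blocking probability (the Erlang B formula) is easy to compute with the standard recursion rather than with factorials. This is a minimal sketch; the function and parameter names are my own.

```python
def erlang_b(offered_load, circuits):
    """Blocking probability for `offered_load` erlangs offered to `circuits` trunks.

    Uses the recursion B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)),
    which avoids computing large factorials.
    """
    blocking = 1.0
    for n in range(1, circuits + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


if __name__ == "__main__":
    # Illustrative numbers only: 10 erlangs offered to 10, 15 and 20 circuits.
    for c in (10, 15, 20):
        print(c, erlang_b(10.0, c))
```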

Tuesday, November 25, 2025

Principles of Communications Week 8 25-27/11/2025

tue: traffic management timescales, users, large and small, demands, short and long...

signaling and soft state

thu: systems design patterns

Monday, November 17, 2025

Principles of Communications Week 7 18-20/11/2025

This week we'll cover

qjump[qj] - data center networking - a big special corner case of traffic scheduling.

Note that qj puts the token bucket pacer in a hypervisor, so it is a regulator and a policer in one. It could also be offloaded to a smart NIC if you had one, or just put in the guest OS if you trust it.
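To make the regulator/policer distinction concrete, here is a minimal token bucket sketch. It is not the qjump code; the class and method names are hypothetical, and time is passed in explicitly so the logic is easy to follow. The same bucket state can either drop non-conforming packets (police) or delay them until enough tokens have accumulated (pace).

```python
class TokenBucket:
    """Token bucket with rate in bits/s and depth in bytes (illustrative only)."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0   # refill rate in bytes per second
        self.burst = burst_bytes     # maximum bucket depth in bytes
        self.tokens = burst_bytes    # start with a full bucket
        self.last = 0.0              # time at which self.tokens is valid

    def _refill(self, now):
        if now > self.last:
            self.tokens = min(self.burst,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now

    def police(self, now, size):
        """Policer: True if the packet conforms, False if it should be dropped."""
        self._refill(now)
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False

    def pace(self, now, size):
        """Regulator: return the earliest time the packet may be sent."""
        self._refill(now)
        ready = self.last            # may be in the future if packets are queued
        if size <= self.tokens:
            self.tokens -= size
            return ready
        wait = (size - self.tokens) / self.rate
        self.tokens = 0.0
        self.last = ready + wait
        return self.last
```

Where this runs (hypervisor, smart NIC, or guest OS) changes who you have to trust, not the arithmetic.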

optimisation - routes for traffic, and traffic for routes

The start of traffic engineering by steering traffic by weights - note the hill descent is a classic old school AI technique. And the origin of why tcp does AIMD is also a type of optimisation, of joint source and network utility!
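A tiny sketch of why AIMD drifts towards a fair share: additive increase preserves the gap between two flows, while each multiplicative decrease halves it. This assumes idealised synchronised feedback (every flow sees the same congestion signal each round), which real networks do not give you; the function name and parameters are illustrative.

```python
def aimd(rates, capacity, rounds, alpha=1.0, beta=0.5):
    """Run synchronised AIMD: add alpha if under capacity, else multiply by beta."""
    rates = list(rates)
    for _ in range(rounds):
        if sum(rates) > capacity:
            rates = [r * beta for r in rates]    # multiplicative decrease
        else:
            rates = [r + alpha for r in rates]   # additive increase
    return rates


if __name__ == "__main__":
    # Two flows starting far apart, sharing a bottleneck of capacity 100.
    print(aimd([1.0, 80.0], capacity=100.0, rounds=500))
```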

Optimisation is a very large topic in itself, and underpins many of the ideas in machine learning when it comes to training - ideas like stochastic gradient descent (SGD) are seen in how we assign traffic flows to routes, here. In contrast, the decentralised, implicit optimisation that a collection of TCP or "TCP friendly" flows use is more akin to federated learning, which is another whole topic in itself.

Why a log function? Maybe see Bernoulli on risk.

Why proportional fairness? from social choice theory!
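For the simplest possible case (one link of capacity $C$ shared by $n$ flows with weights $w_i$), the log utility gives a weighted proportional share. This is the standard textbook formulation, sketched here under those assumptions; the symbols are mine and may not match the lecture slides.

```latex
\begin{align*}
  &\max_{x_1,\dots,x_n \ge 0} \; \sum_{i=1}^{n} w_i \log x_i
    \quad \text{subject to} \quad \sum_{i=1}^{n} x_i \le C \\
  &\mathcal{L}(x,\lambda) = \sum_{i=1}^{n} w_i \log x_i
    - \lambda \Bigl( \sum_{i=1}^{n} x_i - C \Bigr) \\
  &\frac{\partial \mathcal{L}}{\partial x_i} = \frac{w_i}{x_i} - \lambda = 0
    \;\Longrightarrow\; x_i^{*} = \frac{w_i}{\lambda},
    \qquad \lambda = \frac{\sum_{j} w_j}{C} \\
  &x_i^{*} = \frac{w_i}{\sum_{j} w_j}\, C ,
    \qquad
    \sum_{i=1}^{n} w_i \, \frac{x_i - x_i^{*}}{x_i^{*}} \le 0
    \;\; \text{for every feasible } x .
\end{align*}
```

The last inequality is the "proportional" part: moving away from $x^{*}$ can only produce a non-positive sum of weighted proportional changes.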

Are people prepared to pay more for more bandwidth? One famous Index Experiment says yes.

0. See Computer Systems Modeling for why the delay grows quickly as load approaches capacity.

1. see IB Distributed Systems for clock synch

2. see prev year's Cloud Computing (II) module for a bit more about data centers&platforms.

[qj] The qj paper gives lots more details of setup, for anyone interested (the figures in the paper are clickable and take you to software and switch/network configurations and data).
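On footnote 0: a quick way to see how fast delay blows up is the simplest queueing model. The sketch below assumes an M/M/1 queue (Poisson arrivals, exponential service, one server), which may or may not be the model used in the course referred to; the function name is my own.

```python
def mm1_delay(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue: T = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable: load must be below capacity")
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    mu = 1.0  # service rate, packets per unit time
    for load in (0.5, 0.8, 0.9, 0.95, 0.99):
        print(f"load {load:.2f}: mean delay {mm1_delay(load * mu, mu):.1f}")
```

Going from 50% to 99% load multiplies the mean delay by fifty in this model, which is why latency-sensitive designs keep well away from full utilisation.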

Tuesday, November 11, 2025

Principles of Communications Week 6 11-13/11/2025

We've revisited flow and congestion control - one way of visualising the progress of an adaptive flow&congestion control protocol is the time sequence diagram:-

but note this is a massive over-simplification as really what you see here is an ideal with only one source and (apparently) only one bottleneck queue. In reality, in FIFO queues, traffic from multiple sources mixes and interferes (causing high variance in delay, hence round trip time, and very unpredictable loss). If we had Round Robin (by flow) queues, things might be a bit better, but how much? That is what we look at under Scheduling, and with the Generalised Processor Sharing model, we can see how close we can get to some ideal of "isolation" between flows.
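One way to picture the "ideal of isolation" is the max-min fair share that an idealised fluid, equal-weight processor-sharing server would give each flow. The sketch below is a water-filling calculation, not a packet scheduler; the function name and the example demands are illustrative assumptions.

```python
def max_min_share(demands, capacity):
    """Max-min fair allocation of `capacity` among flows with given demands."""
    alloc = [0.0] * len(demands)
    remaining = capacity
    # Visit flows from smallest demand to largest.
    unsatisfied = sorted(range(len(demands)), key=lambda i: demands[i])
    while unsatisfied:
        share = remaining / len(unsatisfied)
        i = unsatisfied[0]
        if demands[i] <= share:
            # Small flow gets all it asks for; the rest is shared by the others.
            alloc[i] = demands[i]
            remaining -= demands[i]
            unsatisfied.pop(0)
        else:
            # Everyone left wants more than an equal share: split equally.
            for j in unsatisfied:
                alloc[j] = share
            break
    return alloc


if __name__ == "__main__":
    print(max_min_share([2, 4, 10], capacity=12))
```

A greedy flow cannot push a small flow below its demand here, which is exactly what FIFO fails to guarantee.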

With FIFO queues, RTTs and rates (as estimated from data or ack packet inter-arrival times) are going to be varying fairly chaotically, as the ensemble of flows at any bottleneck will not be coordinated in any special way: they all have different RTTs and perhaps just different performance senders, maybe different packet sizes, possibly different inter-packet timing at transmit time, etc.

A combination of open loop (admission control) and closed loop (feedback) would allow sources to simply operate at a flat rate (the requested rate) or adapt above it as capacity is made free...

The trade off in flow control and scheduling is all about overall system complexity.

Next, we'll wrap up on queue management, then move on to the special (but important) case of data center networks and latency control!

Tuesday, November 04, 2025

Principles of Communications Week 5 4-6/11/2025

This week, we've got two random[ref] examples

1. random telephone routing

c.f. triangles

can we greedily find the tandem? (only) if you want to check the reasoning behind that part of DAR!

2. random drop congestion - seriously, today's lecture is mainly revision of IB congestion/flow control...

c.f.tcp arena

how many TCPs are there, really? again, only here for the keen background reader!
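Going back to example 1, the "sticky random" idea in DAR can be sketched in a few lines: try the direct link, then the currently remembered tandem, and only re-pick a tandem at random when a call is lost. This is a simplified illustration on a fully connected network; the data structures, function names and the optional trunk reservation parameter are my own assumptions.

```python
import random


def link(a, b):
    """Undirected link identifier."""
    return frozenset((a, b))


def route_call(src, dst, free, tandem, nodes, rng, reserve=0):
    """Route one call; return the path used, or None if the call is lost.

    free:   dict mapping link -> number of free circuits (mutated in place)
    tandem: dict mapping (src, dst) -> currently remembered tandem node
    """
    direct = link(src, dst)
    if free[direct] > 0:
        free[direct] -= 1
        return [src, dst]
    via = tandem[(src, dst)]
    legs = (link(src, via), link(via, dst))
    # Two-hop calls are only admitted if both legs have spare circuits
    # beyond the trunk reservation level.
    if all(free[leg] > reserve for leg in legs):
        for leg in legs:
            free[leg] -= 1
        return [src, via, dst]
    # Call lost: pick a new tandem at random for next time.
    others = [n for n in nodes if n not in (src, dst)]
    tandem[(src, dst)] = rng.choice(others)
    return None
```

A tandem that works is kept, so the scheme learns good alternatives without ever exchanging state between switches.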

ref: there's a very nice study of general applicability of random choices in this Harvard paper if you want some more background reading...
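One well-known result in this area is the "power of two choices": sampling two servers at random and sending the job to the less loaded one gives a much lower maximum load than a single random pick. A small balls-into-bins experiment, with illustrative names and sizes, shows the effect.

```python
import random


def max_load(n_bins, n_balls, choices, rng):
    """Throw n_balls into n_bins; each ball samples `choices` bins at random
    and goes to the least loaded of them. Return the fullest bin's load."""
    bins = [0] * n_bins
    for _ in range(n_balls):
        picks = [rng.randrange(n_bins) for _ in range(choices)]
        best = min(picks, key=lambda i: bins[i])
        bins[best] += 1
    return max(bins)


if __name__ == "__main__":
    n = 10_000
    print("one choice :", max_load(n, n, 1, random.Random(1)))
    print("two choices:", max_load(n, n, 2, random.Random(1)))
```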

Separately, in a discussion with a supervisee/supervisor, I realised some people may be interested in looking at real code to reinforce their understanding - for IB material, I strongly recommend Rich Stevens' TCP/IP Illustrated Volumes 1 and 2, but for this course, there's not really any such nice single text - an alternative might be this:-

There's a neat network simulator called NS3 which has real code in it - it is used both for routing experiments and for transport - the package also has a nice visualisation system, so when you configure a simulation with a topology of routers and hosts and links, you can record the traces of events (packets) and later 'play it back' with a nice layout - see

https://www.nsnam.org/

for where to get it & documentation including examples...

Simulations are interesting as they often (in this case) include fragments of real code from real systems, so they are also test harnesses. However, more fundamentally, packet-based event driven simulators are implemented with much the same design pattern as packet switching systems (i.e. routers) - effectively the simulator is an aggregation of all the event schedules for all the entities in the system configuration being simulated - there are basically two types of events:

- timers - both for when to send periodic route updates/keepalives, and for tcp/quic retransmit type actions

- packets arriving to go to the next stage (whether next layer or next hop...)
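The two event types above fit in a very small event loop. This is a generic discrete-event sketch, not NS3's API; the class, handler names and timings are all made up for illustration.

```python
import heapq
import itertools


class Simulator:
    """Minimal discrete-event scheduler: a time-ordered heap of callbacks."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._seq = itertools.count()  # tie-breaker for events at the same time

    def schedule(self, delay, handler, *args):
        heapq.heappush(self._queue,
                       (self.now + delay, next(self._seq), handler, args))

    def run(self, until):
        while self._queue and self._queue[0][0] <= until:
            self.now, _, handler, args = heapq.heappop(self._queue)
            handler(*args)


def demo():
    sim = Simulator()
    log = []

    def keepalive(period):
        # Timer event: do the periodic work, then re-arm the timer.
        log.append((sim.now, "keepalive"))
        sim.schedule(period, keepalive, period)

    def packet_arrives(hop, last_hop, link_delay):
        # Packet event: "forward" by scheduling arrival at the next hop.
        log.append((sim.now, f"packet at hop {hop}"))
        if hop < last_hop:
            sim.schedule(link_delay, packet_arrives,
                         hop + 1, last_hop, link_delay)

    sim.schedule(0.0, keepalive, 1.0)
    sim.schedule(0.25, packet_arrives, 0, 2, 0.5)
    sim.run(until=2.0)
    return log


if __name__ == "__main__":
    for entry in demo():
        print(entry)
```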

Associated with events is some state (e.g. forwarding information base searched by destination address in IP header, or label switched path indexed by packet label in MPLS shim header) - or in transport protocols, the whole bunch of information about sequence numbers, windows and congestion, etc.
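As an example of that per-event state, here is a toy forwarding information base with longest prefix match on the destination address. It is a linear scan for clarity (real routers use tries or TCAMs); the class name, prefixes and next-hop labels are hypothetical.

```python
import ipaddress


class Fib:
    """Toy forwarding information base with longest-prefix-match lookup."""

    def __init__(self):
        self._routes = []

    def add(self, prefix, next_hop):
        self._routes.append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        best = None
        for net, next_hop in self._routes:
            # Keep the matching route with the longest prefix.
            if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, next_hop)
        return best[1] if best else None


if __name__ == "__main__":
    fib = Fib()
    fib.add("0.0.0.0/0", "default")
    fib.add("10.0.0.0/8", "port-a")
    fib.add("10.1.0.0/16", "port-b")
    print(fib.lookup("10.1.2.3"))
```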

Event driven code tends to look a bit different from classic application layer code (that said, proxies/caches/servers can look similar).

From the router perspective, there's the event/timer driven distributed routing control protocol, and separately there's the actual packet forwarding processes/threads... the former (as I mentioned before) is often in the pattern of a unix daemon/service, whereas the latter is much lower level (and often has to interface to custom hardware for supporting fast lookup and even switch fabric interfaces) - there are open source routers other than just linux based (home broadband) ones (e.g. www.nongnu.org/quagga) but I think this might be a lot to read with few extra lessons.

Tuesday, October 28, 2025

Principles of Communications Week 4 28-30/10/2025

This week we wrap up BGP - looking at abstractions of the algorithm (The Stable Paths Problem in Interdomain Routing) and concrete realisations of problems in implementations.

While you may find the stable paths model helpful in removing noise from BGP complexity, I am not so sure it's a great abstraction for thinking about how actually to resolve the problem(s) (non convergence etc). A nicer approach (by the same lead person, Tim Griffin) is meta routing, which is very powerful and general, but would need an entire other course to discuss, and I just put it here for background in case anyone is interested!
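The stable paths model can be made concrete with the small two-node instance usually called DISAGREE: each node prefers the path via the other over its direct path to the origin (node 0). The sketch below is a toy simulation with my own function names; it shows that synchronous updates oscillate while one-at-a-time updates settle on a stable assignment.

```python
def best_path(node, ranked, current):
    """Highest-ranked permitted path of `node` that is consistent with
    what its next hop is currently using (node 0 is the origin)."""
    for path in ranked[node]:
        next_hop = path[1]
        if next_hop == 0 or current.get(next_hop) == path[1:]:
            return path
    return None


def step_sync(ranked, current):
    """All nodes update at once, based on the previous round's choices."""
    return {n: best_path(n, ranked, current) for n in ranked}


def step_async(ranked, current, order):
    """Nodes update one at a time, each seeing earlier updates."""
    current = dict(current)
    for n in order:
        current[n] = best_path(n, ranked, current)
    return current


# Ranked permitted paths, most preferred first.
DISAGREE = {
    1: [(1, 2, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
}

if __name__ == "__main__":
    state = {1: (1, 0), 2: (2, 0)}
    for _ in range(4):
        state = step_sync(DISAGREE, state)
        print("sync :", state)
    print("async:", step_async(DISAGREE, {1: (1, 0), 2: (2, 0)}, [1, 2]))
```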

Then we'll next make a start on Multicast Routing. Two compelling applications were tv/radio broadcast and software distribution. However, application layer overlays (Content Distribution Networks) have subsumed those needs - a great example is how Zoom coordinates multiparty sessions - this talk by their CEO is instructive. Also interesting is this paper on netflix content distribution approach.

Sunday, October 19, 2025

Principles of Communications Week 3 21-23/10/2025

Interdomain routing - BGP - key 4 slides - 124 126 131 132

Moving from intra-domain (within one autonomous system/routing domain/internet service provider) to inter-domain, the key change is from policy within a domain (as used for steering traffic in centralised routing or in MPLS or segment routing) to policy between multiple autonomous (i.e. independent) domains who may have conflicts and often require some level of information hiding (protecting knowledge of their customers' needs from competitors). So while connectivity is the minimum requirement, there's often no shared goal in terms of what is "optimal" (i.e. what routing metric to use) - we'll see that in default cases, for traffic engineering, and for various tie breaking reasons, metrics creep in as an implicit part of BGP routing, but not in any way consistently.
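A rough sketch of how metrics creep in through tie-breaking: policy (local preference) comes first, and only then do path length and other values decide. The ordering below is a simplified subset of the decision process real implementations follow (it leaves out origin, eBGP vs iBGP and IGP cost, and compares MED across all neighbours); the Route class and its field names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple          # sequence of AS numbers, nearest first
    local_pref: int = 100   # set by local policy; higher wins
    med: int = 0            # hint from the neighbour; lower wins
    router_id: str = "0.0.0.0"


def _router_id_key(router_id):
    return tuple(int(part) for part in router_id.split("."))


def best_route(candidates):
    """Pick a route: highest local_pref, then shortest AS path,
    then lowest MED, then lowest router ID as the final tie-break."""
    return min(
        candidates,
        key=lambda r: (-r.local_pref, len(r.as_path), r.med,
                       _router_id_key(r.router_id)),
    )


if __name__ == "__main__":
    via_customer = Route("192.0.2.0/24", (64501, 64502, 64503), local_pref=200)
    via_peer = Route("192.0.2.0/24", (64510,), local_pref=100)
    print(best_route([via_customer, via_peer]))
```

Note that the customer route wins despite its longer AS path: business policy overrides anything that looks like a distance metric.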

The last thing on BGP is going to follow from Traffic Engineering and asks two questions: what really is the model BGP implements, to better understand how it works (and goes wrong)? And what engineering tweaks can we do to make it actually operate better in practice? We'll finish up on this on Oct 28th.

Tuesday, October 14, 2025

Principles of Communications Week 2 14-16/10/2025

1 Centralised (!) Routing - Fibbing (see fibbing paper for more details - esp. figure 9/10 on failure/recovery modes) - in particular, fail-open and fail-close is used in the paper to refer to the persistence of a path made up by fibbing in the event of a controller failure where in some cases this is needed, and in others it needs to be removed! [1,2]

2 Stateful Routing - MPLS. Multi-protocol label switching has a long history (probably started in Cambridge!). It simplifies switch/router design in terms of forwarding, at the cost of increasing complexity in the control plane -- possibly in routing, but more crucially, signaling. Signaling protocols have a very long history (from railways over 200 years ago). One interesting computer science dimension that arose from signaling is the concept of mutual exclusion. The first algorithms for avoiding contention for a limited resource are direct descendants of the P and V flags used to prevent two trains entering the same section of track....

In a sense, these two ideas (central and stateful) can be reconciled via "soft state" protocols (see last lecture).
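The essence of soft state is that installed state expires unless it is refreshed, so a failed controller or signaling peer cleans up after itself simply by going quiet. A minimal sketch, with hypothetical names and explicit timestamps:

```python
class SoftStateTable:
    """State entries that vanish unless refreshed within `lifetime` seconds."""

    def __init__(self, lifetime):
        self.lifetime = lifetime
        self._expiry = {}

    def refresh(self, key, now):
        """Install or refresh an entry (e.g. on receipt of a periodic message)."""
        self._expiry[key] = now + self.lifetime

    def expire(self, now):
        """Remove and return entries whose refreshes have stopped arriving."""
        dead = [k for k, t in self._expiry.items() if t <= now]
        for k in dead:
            del self._expiry[k]
        return dead

    def active(self, now):
        self.expire(now)
        return set(self._expiry)
```

The trade-off is the refresh traffic and the window during which stale state lingers, versus the explicit teardown messages (and failure handling) that hard state needs.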

Note also: MPLS involves a "shim" layer between IP and lower levels. Segment routing may use that, or may just use IPv6 routing options directly. Recall layering from the IB networking course. It is often not a pure picture - IP tunnels are another example of extra layers between this and that. MPLS can also simplify switch & router port processing (and possibly, if the switch is "cell switched", the scheduling of forwarded packets across the switch fabric - again, recall router architecture from the IB networking course). Segment Routing is a re-think of MPLS, which can use IPv6 routing options as labels, and then use IP routing updates to distribute the label information to (amongst others, upstream) neighbours. There's a nice slidepack SR explainer from CERN which shows the interaction with routing... Note segment routing with IPv6 dispenses with the potential hardware speedup of having 20 bit MPLS labels for forwarding, so one assumes router NICs and processors may have ASIC support for v6 header processing!
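To see why a 20 bit label is hardware friendly, here is a sketch of the 32 bit MPLS shim entry (20 bit label, 3 bit traffic class, 1 bit bottom-of-stack flag, 8 bit TTL) and a label swap as an exact-match table lookup. The function names and the dictionary used as a label forwarding table are my own illustrative choices.

```python
import struct


def pack_shim(label, tc=0, bottom=True, ttl=64):
    """Encode one 32-bit MPLS label stack entry."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("label must fit in 20 bits")
    word = (label << 12) | ((tc & 0x7) << 9) | (int(bottom) << 8) | (ttl & 0xFF)
    return struct.pack("!I", word)


def unpack_shim(data):
    """Decode the first label stack entry of a packet."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "tc": (word >> 9) & 0x7,
        "bottom": bool((word >> 8) & 1),
        "ttl": word & 0xFF,
    }


def swap(packet, lfib):
    """Label swap: exact-match lookup, rewrite label, decrement TTL.

    lfib maps in_label -> (out_label, out_port). Returns (out_port, packet),
    or None if the TTL has run out.
    """
    hdr = unpack_shim(packet)
    if hdr["ttl"] <= 1:
        return None
    out_label, out_port = lfib[hdr["label"]]
    new_shim = pack_shim(out_label, hdr["tc"], hdr["bottom"], hdr["ttl"] - 1)
    return out_port, new_shim + packet[4:]
```

Unlike longest prefix match on an IP destination, the lookup here is a single indexed read, which is the forwarding simplification paid for by the extra signaling.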

Optional background reading...

2. Another dimension of signaling is that it requires a level of access control, authentication and authorisation not typically present in pure datagram networks like traditional IP. For a measure of how bad it can get, look no further than the old digital telephone network signaling system number 7 (SS7), which is more complex than the whole TCP/IP regular data stack (see report on vulnerabilities in SS7). RSVP (which serves a similar function for signaling for MPLS if you don't just rely on routing!) is about as bad.

1. Further work was done based on the fibbing idea:

https://www.michaelschapira.com/_files/ugd/3b1e1e_dc51c8d43c7c41cc9ffbfa84923cd864.pdf

Basically, it leverages FIBbing-like mechanisms to optimize (oblivious) TE in IP networks.

https://www.michaelschapira.com/_files/ugd/3b1e1e_b928d21ed137475bb89f51c9073ed8de.pdf

This paper is about the hardness of configuring ECMP for TE. This is actually based on a very cute hardness amplification proof technique.

Wednesday, October 08, 2025

Principles of Communications Week 1 9/10/2025

I'll be noting progress and also adding occasional related reading/ and corrections on this blog.

If you want to revise anything to warm up for the course, I suggest last year's Computer Networks course should be a quick re-read (e.g. on routing)! A fun review of 40+ years of the Internet.
This week we'll just make a start on routing.
For fun, you might find this discussion on why LLMs aren't much use for networking, mostly interesting - video recording of panel session
Reference requested for Glossary of Terms: from ISOC
Some acronyms come from the 7 layer model of the communications stack including terms like PHY (short for physical, so not really an acronym).

Wednesday, July 30, 2025

An AI eternity service (instead of "sovereign" AI)

Being digitally colonized has been a serious threat to national sovereignty, but also to individual freedoms from surveillance and censorship, for decades. This applied to the Internet, the WWW, the Cloud and now AI.

Whether the digital Emperors are based in the USA or China, they are there.

To avoid these sorts of risk for content, Ross Anderson proposed the eternity service, which finesses the problems in (typical for Ross) an ingenious way by structuring the infrastructure as a mix of sharing, striping, and sharding, and builds in the threat of mutually assured destruction - if you are a low level engineer/computer scientist, the idea is like CDMA or Network Coding or what some colleagues re-purposed to be spread spectrum computing.

A simpler idea is more coarse grained - organisations that provide critical infrastructure (railways, power grids, water & sewerage, Internet etc) can source technology from (say) three different providers. The London Internet Neutral Exchange (LINX) extends this to cooperative ownership as well. So the undermining of one supplier's gear has a limited effect on the service - indeed, many services operate with headroom for coping with simple natural disasters in any case (internet and power grids also to allow for wide variance in demand/supply), so this is a natural way to do things.

Another digital version is Certificate Transparency, which also creates a Merkle tree space for (horrible word) coopetition (cooperation amongst competitors), enforced by the tamper-evident (or to some extent, socio-economically tamper-proof) service space; in a way, a single-application version of eternity.
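The tamper-evident part comes from the Merkle tree: everyone who holds the root hash can detect any change to any logged entry. Below is a minimal sketch of the tree hash in the style used by Certificate Transparency as I understand it (SHA-256, a 0x00 prefix for leaves, 0x01 for interior nodes, split at the largest power of two below the leaf count). It omits inclusion and consistency proofs, and the example leaf contents are made up.

```python
import hashlib


def _sha256(data):
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Merkle tree hash over a list of byte strings."""
    n = len(leaves)
    if n == 0:
        return _sha256(b"")
    if n == 1:
        return _sha256(b"\x00" + leaves[0])        # leaf hash
    k = 1
    while k * 2 < n:                               # largest power of two < n
        k *= 2
    return _sha256(b"\x01" + merkle_root(leaves[:k]) + merkle_root(leaves[k:]))


if __name__ == "__main__":
    log = [b"cert-a", b"cert-b", b"cert-c"]
    print(merkle_root(log).hex())
    # Changing any entry changes the root that all parties have seen.
    print(merkle_root([b"cert-a", b"cert-X", b"cert-c"]).hex())
```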

This would apply to sourcing data, training, and models themselves + inferencing after the training.

So how about, using the state of the art multi-AI protocols to connect agents, we construct a multi-national AI substrate that serves no-one in particular, but everyone in general. Any attack (removal of an agency, pollution of data) would damage the attacker as much as everyone else. It is in everyone's interest to keep the system running and running honestly.

So how to combine neural networks? (something that would also be useful during training or inference so as to share GPU or other accelerator h/w)? You'd need some sort of way to interpret multiple interleaved graphs with multiple (XORd or turbocoded) weights. This is research. Margin's too small to put it here =>

Friday, July 25, 2025

RSPAI?

Towards an RSPI for AI (UK) - alpha name RSPAI (short name "pai")

The rspi was at one point going to be the BBC Nano (or model n) but ended up as rspi because ... my reaction to the 100-200$ cost of the one laptop per child project, and to the limitations thereof...was to "blow a raspberry"...

An AI, like a computer, is a general purpose machine-a purely physical tool analogy is a swiss army knife, but software on computers is tools and die - tool and die makers design, and build or repair tools. they have a bench with lathes and saws and drills and hammers and screwdrivers and boxes full of bits...

A s/w toolbench is something like unix, with subsystems (network stack, file systems, i/o in general (serial, display etc)) plus SDK for development (vi, cc, add etc) and then some handy pre-made tools (regex/grep, sort, awk, sed, etc) with source and documents available so people could use them as design patterns (templates etc).

The RSPI prospered as it had a 30 year pre-history (BBC Micro, Acorn, ARM, Broadcom system on chip), ran Linux (descendant of Bell Labs et al) and had, on the system on chip, a GPU (so it could use OpenGL rendering / gaming s/w), wifi, and gpio (so it could connect to sensors and actuators and do robotics or similar).

An RSPAI would also need to scale up - by being modular, and networked (c.f. wifi and gpio above - nowadays MCP or AI2AI or similar) and with federated learning tools/platforms - the equivalent of a s/w development environment but with examples (e.g. start with some huggingface pieces) and flower.ai

A network, small and large... with a way to train (NN to FL) - in fact networking on chip, in software, and between chips/systems...

It also needs some data (kind of AI equivalent to sensor input) - this doesn't have to be _on_ the RSPAI - it needs to be where you can get it (equivalent was that

the Pi part of the Raspberry Pi name came from the original idea that, like the BBC Micro that booted into BASIC, the RSPI would boot into Python. We discarded that early on and said kids must learn the Command Line (shell), but one simple example lesson (how to write Snakes in Python) would start with how to download and install Python, and then... ... ...

It could run on an RSPI easily (esp. given cheapo GPU) but what is the core tool bench for the AI part of this RSPAI? Is it just huggingface with pytorch, tensorflow and all that gubbins, or is there a core that could be still general, but simpler to start off with and afford people easy lessons - and here is a candidate lean language model that runs on a laptop, as a starter, fresh out of the Turing Institute!

(one class we ran with the RSPI had 11yr old schoolgirls without any teacher go from nothing to writing snakes in 1 hour from scratch) - what is the equiv to that, that lets you still also go all the way to writing a version of asteroids and control a bipedal lego robot? or genai including stable diffusion video like this:- https://youtu.be/kG8fmSW_5wM?si=hsbBdeCUtdshNyBM

A small neural net, a regression/stats library, some causal inference graph stuff? what what what?

And who is it for? policy makers, wannabe AGI gods, defense contractors, health and environment researchers, Jane Public who who who?

Answers on a postcard please...