clog: 2023

Tuesday, November 28, 2023

Principles of Communications Final Week to Nov 28th/L16

Wrap up - you should now be able to see the supervisor pages, which summarise the course and have links to questions and relevant/in-scope past exam questions.

Tuesday, November 21, 2023

Principles of Communications Week 8 to Nov 23rd/L15

This week is two Big Pictures

On Nov 21, we do:

1/ Traffic Management over multiple time scales - linking together several ideas between routing/congestion control, open and closed loop flow control, admission, pricing, and, finally, a bit about signaling.

feedback/admission work at any time scale

short> reduce rate/ lose packets

medium> try again later

long>change to a better provider

So note peak rate pricing is just a slow version of congestion control + shadow prices.

Note using a single network may amortize costs well, but a single technology has a single failure mode - the model of copper/fiber/radio has a lot of fault tolerance, which is high value for critical infrastructure.

Note that the problem with capacity planning and traffic matrix/source behaviour in the general internet is that a) it isn't a regular topology, b) each ISP is doing it in competition but also in cooperation with other ISPs! However, this paper showed how well the ISPs coped when everyone switched to working/learning from home: the internet in the age of covid

On Nov 23rd we'll cover:

2/ System Design Rules - these are rules of thumb that apply in many systems: processor design, operating systems, networks, etc.

More can and will be said...

Tuesday, November 14, 2023

Principles of Communications Week 7 to Nov 17th/L13

This week we're covering firstly data centers (through the lens of q-jump and its clever "fan-in" trick), and secondly, optimisation of routes and rates.

The mix of application platforms we see in the data center lecture are classic mapreduce (as represented by Apache Hadoop) and stream processing (as represented by Microsoft's Naiad)[2], as well as memcached (an in-memory key-value cache, widely used to reduce storage access latency and increase throughput - e.g. for the aforesaid platforms), and PTP (a more precise clock synch system than the widely used Network Time Protocol[1]). For an example of the use of map reduce, consider Google's original pagerank in hadoop. For the interested reader, this paper on timely data flow is perhaps useful background.

Optimisation is a very large topic in itself, and underpins many of the ideas in machine learning when it comes to training - ideas like stochastic gradient descent (SGD) are seen here in how we assign traffic flows to routes. In contrast, the decentralised, implicit optimisation that a collection of TCP or "TCP friendly" flows use is more akin to federated learning, which is another whole topic in itself.

Why a log function? Maybe see Bernoulli on risk

Why proportional fairness? from social choice theory!
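
As a small worked example of what proportional fairness gives you (my own toy illustration, not taken from the lecture material): take two links in series, each of capacity 1, with one long flow crossing both links and one short flow on each link. Proportional fairness picks the rates that maximise the sum of the logs of the rates. A minimal Python sketch, with hypothetical names and a brute-force search standing in for a real optimiser:

```python
import math

def proportional_fair_long_rate(steps=100_000):
    """Grid search for the long flow's rate x on a two-link line network.

    Assumed toy setup: both links have capacity 1, the long flow uses both
    links, and each link also carries one short flow that takes whatever
    is left over, i.e. 1 - x.
    """
    best_x, best_utility = None, -math.inf
    for i in range(1, steps):
        x = i / steps
        # Sum of log utilities: one long flow plus two short flows.
        utility = math.log(x) + 2 * math.log(1 - x)
        if utility > best_utility:
            best_x, best_utility = x, utility
    return best_x

long_rate = proportional_fair_long_rate()
short_rate = 1 - long_rate
print(round(long_rate, 3), round(short_rate, 3))
```

The long flow ends up with about 1/3 and each short flow with about 2/3, whereas max-min fairness would give every flow 1/2: the log utility charges the long flow for using capacity on two links.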

Are people prepared to pay more for more bandwidth? One famous Index Experiment says yes.

1. see IB Distributed Systems for clock synch

2. see last year's Cloud Computing (II) module for a bit more about data centers&platforms.

Monday, November 06, 2023

Principles of Communications Week 6 to Nov 10th/L11

This week, we cover two closely related topics:-

1. Flow&Congestion Control

There are three regimes for where there may need to be a matching process between sender and recipient rates.

a) sender - the sender may not always have data to send - think about sourcing data from a camera or microphone with a given coding: you get some peak rate anyhow. Or you just might have a short data transfer that never "picks up speed"

b) network - here some type of feedback is needed to tell senders about network conditions - this could be a call setup (open loop control), not something we do in today's internet, or it could be closed loop feedback via dupacks, packet loss or ECN, and a rate adjustment in the sender (congestion window or explicit rate control).

c) receiver - the receiver can slow down the sender by advertising a small receive window or just delaying acks or both.
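
One way to picture the three regimes is that the achieved rate is the minimum of three limits, and whichever limit is smallest tells you which regime the flow is in. A minimal Python sketch under that simplifying assumption (the function and parameter names are hypothetical, and a window-limited rate is approximated as window/RTT):

```python
def sending_rate(app_rate, cwnd, rwnd, rtt):
    """Return (rate, limiting regime) for one flow, in bytes per second.

    app_rate: rate at which the application produces data (bytes/s)
    cwnd:     congestion window (bytes), driven by network feedback
    rwnd:     advertised receive window (bytes), the receiver's limit
    rtt:      round-trip time (seconds)
    """
    limits = {
        "sender": app_rate,
        "network": cwnd / rtt,
        "receiver": rwnd / rtt,
    }
    regime = min(limits, key=limits.get)
    return limits[regime], regime

# An audio-like source on a path that could carry far more: case a).
print(sending_rate(app_rate=64_000, cwnd=100_000, rwnd=65_535, rtt=0.05))
# A bulk transfer held back by a small receive window: case c).
print(sending_rate(app_rate=float("inf"), cwnd=100_000, rwnd=16_384, rtt=0.05))
```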

Much of today's network uses these three approaches via TCP or QUIC. There's a nice study based on wide area measurements that shows how congestion control (case b above) is quite rare and more often we have a or c...see this paper:-

TCP equation&reality
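
For reference, a commonly quoted simplified form of the "TCP equation" is the square-root model usually attributed to Mathis et al.: rate is roughly (MSS/RTT) * sqrt(3/2) / sqrt(p), where p is the packet loss probability. Whether the linked paper uses exactly this form is an assumption on my part; the sketch below only illustrates the shape of the relationship, and it ignores timeouts and receive window limits.

```python
import math

def tcp_rate_estimate(mss_bytes, rtt_s, loss_prob):
    """Simplified square-root throughput model, in bytes per second."""
    return (mss_bytes / rtt_s) * math.sqrt(1.5 / loss_prob)

rate = tcp_rate_estimate(mss_bytes=1460, rtt_s=0.1, loss_prob=0.01)
print(f"{rate / 1e3:.0f} kB/s")
```

The thing to take away is the scaling: halving the RTT doubles the estimate, while the loss rate has to fall by a factor of four to do the same.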

2. Scheduling&Queue Management

Schedulers in switches and routers try to minimise the "damage" caused by one flow to other packet flows through a network device that has more packets arriving than it can send just now. Buffering packets turns into jitter, delay, which for audio or video turns into playout buffer delay, or packet loss, which for TCP data traffic is a key congestion signal, and for audio/video is quality degradation.

Schedulers try to allocate bottleneck capacity fairly, e.g. via round robin, but how long do queues really need to be? As the network scales up, perhaps the impact of queueing decreases and fairly simple schedulers may suffice - see this paper/talk for more ideas in this space. Of course, actual overload leading to packet loss is still bad, but the actual delay and jitter seen by a large aggregation of flows through a switch/router may not be the significant worry...
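
The round robin idea mentioned above can be sketched in a few lines of Python. This is a toy packet-by-packet version with hypothetical names, not any particular router's scheduler:

```python
from collections import deque

def round_robin(queues):
    """Serve one packet per non-empty flow queue per round.

    queues: dict mapping flow id -> list of packets waiting at the output port.
    Returns the order in which packets leave.
    """
    pending = {flow: deque(pkts) for flow, pkts in queues.items()}
    order = []
    while any(pending.values()):
        for flow, q in pending.items():
            if q:
                order.append(q.popleft())
    return order

# Flow A is greedy; B and C send little.
sent = round_robin({"A": ["a1", "a2", "a3", "a4"], "B": ["b1"], "C": ["c1", "c2"]})
print(sent)
```

The greedy flow cannot push the others back by more than one packet per round. Note that counting packets is only fair if packets are the same size; variants such as deficit round robin account for bytes instead.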

law of large numbers and what happens in queues then really
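
To get a feel for the law of large numbers effect, here is a minimal simulation sketch in Python. It assumes independent on/off sources, each sending one packet per time slot with probability 0.5, which is a deliberately crude traffic model (real TCP flows sharing a bottleneck are not independent); all names and parameters are made up for illustration.

```python
import random
import statistics

def aggregate_cov(n_flows, slots=2000, p_on=0.5, seed=1):
    """Coefficient of variation of the total arrivals per time slot."""
    rng = random.Random(seed)
    totals = [
        sum(1 for _ in range(n_flows) if rng.random() < p_on)
        for _ in range(slots)
    ]
    return statistics.pstdev(totals) / statistics.mean(totals)

for n in (1, 10, 100):
    print(n, round(aggregate_cov(n), 3))
```

Under these assumptions the relative variability of the aggregate shrinks roughly as 1/sqrt(N), so the burstiness a queue has to absorb becomes a smaller fraction of the link rate as more flows are multiplexed.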

Tuesday, October 31, 2023

Principles of Communications Week 5 to Nov 3rd/L9

This week we're covering Mobile Networking(*rant*) (briefly) and then telephony networks(^motives for innovation!) - architecture, but in particular, random routing, as it is a neat illustration of the use of non-deterministic approaches to distributed systems which yield efficient results. See also Valiant Load Balancing (for example this work at Microsoft), also used in Operating Systems (NGINX, for example).

For a general discussion of random choices and their application, see this study (very much deep background reading only!)
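
One well-known illustration of how far a little randomness goes is the "two random choices" balls-into-bins experiment. This is a generic example of randomised load balancing, not the telephone network's routing scheme itself, and the names below are hypothetical:

```python
import random

def max_load(n, choices, seed=0):
    """Throw n balls into n bins; each ball samples `choices` bins uniformly
    at random and goes to the least loaded of them. Returns the fullest bin."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(choices)]
        target = min(candidates, key=lambda b: bins[b])
        bins[target] += 1
    return max(bins)

one = max_load(10_000, choices=1)
two = max_load(10_000, choices=2)
print(one, two)
```

Sampling two bins and picking the emptier one typically gives a noticeably lower maximum load than a single random choice, with no global state and no coordination between the balls.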

*rant* The TCP checksum is designed to protect end-to-end packets against corruption - it is fairly low cost to compute (ones' complement sum of all the 16-bit words in the TCP header and application data) compared to more complex integrity checks used in lower level protocols (link level checks are usually designed to be implemented in hardware).

However, it features one incredibly poor additional design choice, which is that the check also includes adding in fields from the "pseudo header", which includes parts of the IP header (source & destination addresses, protocol field) and the TCP segment length. (see for details) This has two problems: firstly, it messes with the fate sharing design of the Internet so that end system comms is now entangled with the network layer. Secondly, it stops a TCP connection surviving IP address re-assignment, which is essentially what happens if you move network (e.g. switch from using your wifi to using your cellular data link).

The argument made by the TCP designers was that this prevents some security threat from someone messing with your IP address. It is not clear that this threat is a real problem, since nowadays TCP implements multipath with multiple IP addresses and switches or combines them safely. Also, other protocols (e.g. TLS, above TCP, or else QUIC, as a replacement for TCP+TLS) make use of a secure association between end systems, which is far more secure and resilient to mobility anyhow. So another mistake in the Internet design.
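
To make the complaint concrete, here is a minimal Python sketch of the checksum calculation over an IPv4 pseudo header plus TCP segment. The addresses are documentation prefixes, the all-zero "header" is made up, and the helper names are hypothetical; it is an illustration of the arithmetic, not a full TCP implementation.

```python
import ipaddress
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum of data (padded to an even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo header plus the TCP segment.

    `segment` is the TCP header (checksum field zeroed) plus payload.
    """
    pseudo = (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!BBH", 0, 6, len(segment))  # zero, protocol=TCP, TCP length
    )
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF

segment = bytes(20) + b"hello"  # made-up all-zero header plus payload
on_wifi = tcp_checksum("192.0.2.10", "198.51.100.7", segment)
on_cellular = tcp_checksum("203.0.113.99", "198.51.100.7", segment)
print(hex(on_wifi), hex(on_cellular))
```

The same bytes sent from a different source address produce a different checksum, which is why a connection cannot simply carry on after the host's IP address changes.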

^Motivation for Innovation! The first automatic exchange was designed by an undertaker, Almon Strowger, who believed human operators were diverting business calls to his competitors - the real story is a bit less interesting, but still important - precursor to the cross-bar switch!

Started on Flow Control - for closed loop/feedback control systems, this paper by James Clerk Maxwell, back in 1868, covered some nice maths behind how to design controllers!

Tuesday, October 24, 2023

Principles of Communications Week 4 to Oct 26th/L7

This week, we'll finish up BGP, and cover multicast routing.

General lessons from BGP - information hiding can be harmful to decentralised algorithms, but information hiding may be a necessary dimension to some distributed systems due to business cases (commercial in confidence/competitive data).

Are there other ways to retain decentralised or federated operations but to retain also confidentiality? Perhaps using secure multiparty computations (not covered in this course!). There are a number of other federated services emerging in the world (applications like Mastodon and Matrix, and also, federated Machine Learning systems like flower) so this probably needs a good solution.

Multicast has a great future behind it - some neat thinking, but largely replaced by application layer content distribution networks (e.g. netflix, youtube, apple/microsoft software update distribution), and the move away from simultaneous mass consumption of video/audio. For more info on limitations of multicast, this paper is a good (quick) read.

Are application layer overlays a solution to other "in network" alleged enhancements? Possibly - for example, Cloudflare obviate the need for always-on IPv4 or IPv6 addresses - this also wasn't covered in this course, but might be of interest. Their work is described here and other papers of theirs may be of current interest.

Couple of questions on Graphs in last section on BGP

1. A Few Bad Apples - the graph doesn't asymptote, despite being cumulative, as prefixes can be announced more than once (i.e. > 100%!)

2. How long does BGP take to adapt to changes - the legend refers to:

Tup A previously unavailable route is announced as available. This represents a route repair.

Tdown A previously available route is withdrawn. This represents a route failure.

Tshort An active route with a long ASPath is implicitly replaced with a new route possessing a shorter ASPath. This represents both a route repair and failover.

Tlong An active route with a short ASPath is implicitly replaced with a new route possessing a longer ASPath. This represents both a route failure and failover.
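
If it helps to read the legend as a rule, the four event types above amount to a comparison of the route for a prefix before and after a change. A minimal Python sketch (the function name and the list-of-AS-numbers representation are my own assumptions; the AS numbers are from the documentation range):

```python
def classify_event(old_path, new_path):
    """Label a routing change for one prefix.

    old_path / new_path: AS path as a list of AS numbers, or None if the
    prefix has no route before / after the change.
    """
    if old_path is None and new_path is None:
        return None
    if old_path is None:
        return "Tup"      # previously unavailable route announced
    if new_path is None:
        return "Tdown"    # previously available route withdrawn
    if len(new_path) < len(old_path):
        return "Tshort"   # implicitly replaced by a shorter AS path
    if len(new_path) > len(old_path):
        return "Tlong"    # implicitly replaced by a longer AS path
    return None           # same length: not covered by the four labels

print(classify_event(None, [64500, 64501]))
print(classify_event([64500, 64501, 64502], [64500, 64502]))
```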

Monday, October 16, 2023

Principles of Communications Week 3 to Oct 19th/L5

Wrapping up with centralised routing, MPLS and Segment Routing - if you are interested in the latter topic, see this survey of SR

This week, we'll start on BGP/Policy Routing, getting up to traffic engineering, including the amusingly named "hot and cold potato". Next week we will wrap up BGP covering semantics & performance.

A great overview of BGP if you want an alternate source.

Some General High level questions about material together with rough dates when each topic will appear in lectures.

Thursday, October 12, 2023

Principles of Communications Week 2 13/10/2023

This week we covered "centralised" routing - two aspects:

Fibbing - hybrid of SDN/central controller and distributed computation using link-state

(note on terminology.)

If you are interested in how to replicate state machines, have a look at this work on distributed consensus. Note that these replication schemes themselves are very subtle and difficult to get right.

MPLS - and segment routing - which is somewhat stateful forwarding, which therefore requires some input from some controller or management plane to do anything other than default paths (if MPLS or SR just use the IGP to set up labels/segments, you just get whatever the IGP does, which makes the use of MPLS or SR rather pointless).

Some useful backup material on segment routing, operationally, from Juniper Networks

Principles of Communications Week 1 5/10/2023

Today sees the start of the 2023/2024 year, and for Part II students taking Principles of Communications I'll be noting progress and also adding occasional related reading and corrections on this blog.
If you want to revise anything to warm up for the course, I suggest last year's Computer Networks course should be a quick re-read! A fun review of 40+ years of the Internet
This week we'll just make a start on routing.
For fun, you might find this discussion on why LLMs aren't much use for networking, mostly interesting - video recording of panel session

Reference requested for Glossary of Terms: from ISOC
Some acronyms come from the 7 layer model of the communications stack including terms like PHY (short for physical, so not really an acronym).