Monday, November 28, 2022

Principles of Communications Week 9 L16 29th Nov, 2022 - Systems Design pt II and Wrap.

Solving for Systems:-

  • multiplexing
  • pipelining
  • batching 
  • exploiting locality (spatial and temporal)
  • exploiting commonality 
  • hierarchy 
  • using binding with indirection 
  • virtualization (multiplexing, indirection, binding) 
  • exploiting randomization 
  • using soft state versus explicit state exchange
  • employing hysteresis
  • separating data and control 
  • extensibility

and measure, measure, measure and characterise.

Thursday, November 24, 2022

lovely explainer of the internet, at 5 levels of complexity

Jim Kurose explains the Internet to five different levels of complexity.

Tuesday, November 22, 2022

Principles of Communications Week 8 L14/L15 22nd&24th Nov, 2022 - Traffic Management and Systems Design

Traffic Management

This is mainly about the timescale decomposition of traffic management components: packet transit/RTT times, flow setup times, traffic matrix time variations, and long term demand variation (usually an increase).

Also, novel protocol deployments create new demand patterns.

This all creates requirements for empirical input, e.g. of user utilities for elastic and inelastic traffic demands, and of behaviour in response to off-peak (or congestion) time-varying charging for resource use. A great example of recent work on how things sometimes change quite quickly is this paper about the change in application demands during the pandemic.

Signaling protocol complexity - signaling couples multiple components (end systems and users, routers/switches, schedulers and admission control, and even routing), so signaling is a mess and, to date, very little of it is deployed in the Internet. Most traffic management decisions are made using bespoke (ad hoc) measurement tools and techniques.

One of the big arguments recently is how to divide up a budget (whether congestion or delay) amongst hops (e.g. AS hops) in a path. Given some overall resource constraint (e.g. e2e delay must be less than 300ms), each hop will want to maximise the delay it can impose for bursty traffic (the same goes for RED-based ECN triggers) so that it can maximise the user traffic it carries - so we need incentive matching!

Systems Design

Some things change demand surprisingly quickly. We already mentioned the impact of the pandemic in increasing demand for interactive video/audio, but sources also shifted from workplaces to working from home in the daytime. A newer change is decentralised social media like Mastodon and Matrix, which have many p2p servers (Mastodon today has around 4000, serving 8M people - the user base is growing at around 1M a week!)

Interesting social/legal/regulatory constraints range from the simple (the EU mandates free data roaming) or even trivial (like USB-C phone charging sockets!) to the subtle - the Digital Markets Act requires any large service provider to open up APIs for interoperation. New work in internet standards means there will be open protocols that allow all systems (not just e-mail and web servers, but messaging, social media etc.) to interwork - presumably video conferencing too.

Technically, this is actually trivial (e.g. most video and audio use the same coding - indeed, in WebRTC they have to use the same protocols too). The trickier pieces are interworking key management for security, especially for group communications.

The pipelining example led to the replacement of HTTP 1.0/TCP with HTTP 3/QUIC, which allows arbitrary ordering of packets delivered over UDP (but still with reliability, flow and congestion control, and e2e privacy) - this allows browsers far more freedom to render material from multiple sources/media.

And this will keep changing and changing as people create new applications and services, and new communications technology - all that requires measurement, measurement and also measurement.

Tuesday, November 15, 2022

Principles of Communications Week 7 L12/L13 15th&17th Nov, 2022 - Data Centers & Optimisation

 Data Centers - example of Queue Jump

1. Data centers have regular topology and software can be centrally managed

2. the fan-in factor of traffic, sometimes known as TCP incast, can and does cause spiky delays, which very badly reduce the performance of distributed computations, clock synchronisation and in-memory disk caches, all of which degrades the throughput of data centers

3. if we can differentiate traffic, we can use different schedules to separate flows with low latency requirements from those that need high throughput but have no particular latency bound.

4. in a 3 hop data center, a small number of priority queues will work - as long as the sum of traffic from sources of a given priority is controlled based on computing the occupancy of that priority class, and its delay impact - then there is capacity left for lower priority, and still very short queues too... Sources can be policed to ensure the classes don't starve out lower priorities, and this can be done below the app in the OS, or below the OS in a hypervisor
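To make point 4 a bit more concrete, here is a minimal sketch of a two-level strict priority schedule where the high priority class is policed by a token bucket, so it cannot starve the low priority class. This is an illustration only, not the Queue Jump implementation: the names (`TokenBucket`, `schedule`), the one-packet-per-tick link model and the rate/burst values are all assumptions made up for the example, and the policer sits next to the scheduler here rather than in the OS or hypervisor at each source.

```python
from collections import deque


class TokenBucket:
    """Hypothetical policer: at most `rate` packets per tick on average."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens (packets) added per tick
        self.burst = burst    # maximum number of tokens that can be stored
        self.tokens = burst   # start with a full bucket

    def tick(self):
        self.tokens = min(self.burst, self.tokens + self.rate)

    def admit(self):
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def schedule(high, low, policer, ticks):
    """One packet leaves per tick: policed high priority first, else low."""
    high, low = deque(high), deque(low)
    sent = []
    for _ in range(ticks):
        policer.tick()
        if high and policer.admit():
            sent.append(high.popleft())
        elif low:
            sent.append(low.popleft())
    return sent


# High priority is limited to half the link, so low priority still gets through.
sent = schedule(["H"] * 10, ["L"] * 10, TokenBucket(rate=0.5, burst=1), ticks=8)
print(sent)  # ['H', 'L', 'H', 'L', 'H', 'L', 'H', 'L']
```

Without the policer (or with a rate of 1 packet per tick), the same schedule would send only high priority packets until that queue drained, which is exactly the starvation the policing is there to prevent.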

Optimisation:- multipath routes, decentralised rates.

from an entirely different perspective, we can treat the routing and rate control problems as optimisation challenges - in this approach, we can use gradient descent methods for assigning flows to routes, and distributed optimisation (via feedback control and increase/decrease searching for optimal utility) to compute rates.

Note well - the optimisation of routes can work at any level of aggregation, and hence is suitable for traffic engineering, and is largely a centralised technique.  The formulation is also agnostic about multipath routing, so is suitable in the presence of live use of redundant paths and load balancers, and is consistent with end-to-end multipath protocols (e.g. multipath TCP or QUIC). One tends to think of the route optimisation approaches as being for longer term matching of traffic to paths (but they could also suit OpenFlow-controlled traffic at the individual flow level, obviously, though this is not used in the Internet today). Of course, gradient descent methods are also widely used in training in machine learning and AI.

n.b. finding minimum of a function reminder
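As a reminder of what finding the minimum by gradient descent looks like, here is a minimal one-dimensional sketch. The cost function, step size and iteration count are assumptions picked for illustration; the route optimisation in the lectures works over many flow/path variables with constraints, which this toy does not attempt to model.

```python
def gradient_descent(grad, x0, step=0.1, iters=200):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(iters):
        x = x - step * grad(x)
    return x


# Hypothetical cost f(x) = (x - 3)^2, so f'(x) = 2(x - 3) and the minimum is at x = 3.
def grad_f(x):
    return 2.0 * (x - 3.0)


x_min = gradient_descent(grad_f, x0=0.0)
print(round(x_min, 6))  # 3.0
```

Each step shrinks the distance to the minimum by a constant factor (0.8 with these numbers), so the step size trades off speed of convergence against the risk of overshooting.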

The distributed, asynchronous, non-coordinated optimisation of rates (i.e. TCP congestion control or equivalents) is also applicable to other distributed machine learning. The rate adaptation is also suitable for multipath end-to-end protocols (i.e. MPTCP). So the rate optimisation techniques operate on the round-trip time timescales.
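A toy version of the increase/decrease search can be sketched as follows: two flows share one link, each adds a constant to its rate while the link is not overloaded and halves its rate when it is (AIMD). The synchronous rounds, the single shared congestion signal and all the numbers are simplifying assumptions for illustration; real TCP flows react asynchronously, each on its own round-trip time.

```python
def aimd(rates, capacity, rounds=200, increase=1.0, decrease=0.5):
    """Additive increase, multiplicative decrease on one shared link."""
    rates = list(rates)
    for _ in range(rounds):
        congested = sum(rates) > capacity  # the only feedback the flows get
        rates = [r * decrease if congested else r + increase for r in rates]
    return rates


# Start very unfairly: one flow at 1, the other at 80, on a link of capacity 100.
final = aimd([1.0, 80.0], capacity=100.0)
print(final)
```

The additive steps leave the gap between the two rates unchanged, while every multiplicative decrease halves it, so the rates drift towards an equal share without the flows ever coordinating with each other.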

Tuesday, November 08, 2022

Principles of Communications Week 6 L10/L11 8&10 Nov, 2022 - Scheduling and Queue Management

I am going to loop back around to flow and congestion control, because these things go together like strawberries and cream, or meta and verse:-)

Flow control can be open loop (call setup with a traffic descriptor and an associated admission control algorithm), or closed loop, based on feedback (dupack, packet loss/timeout, or explicit congestion notification).

Packet forwarding can be FIFO, or Round Robin, or weighted according to some request (management, setup, payment, etc - out-of-band). If it is round robin, we get fairness quite naturally, and some degree of protection against misbehaviour of other flows. If the flows are policed (because they gave a descriptor in open loop, or because they implement congestion avoidance and control), then we get more protection against misbehaviour of other flows of packets. If the router implements an Active Queue Management scheme (like RED), as well as a schedule (like round robin), then we get more protection (even against our own misbehaviour).
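The fairness and protection that round robin gives can be seen in a small sketch: one packet is served from each backlogged flow in turn, so a flow with a huge backlog cannot crowd out a small one. The flow names, the per-flow queues and the fixed number of service slots are assumptions made up for this example.

```python
from collections import deque


def round_robin(queues, slots):
    """Serve up to `slots` packets, one per backlogged flow in turn."""
    queues = {flow: deque(pkts) for flow, pkts in queues.items()}
    served = []
    while len(served) < slots and any(queues.values()):
        for flow, q in queues.items():
            if q and len(served) < slots:
                served.append((flow, q.popleft()))
    return served


# A greedy flow with 100 packets queued and a polite flow with only 3.
backlog = {"greedy": range(100), "polite": range(3)}
order = round_robin(backlog, slots=6)
print(order)
# [('greedy', 0), ('polite', 0), ('greedy', 1), ('polite', 1), ('greedy', 2), ('polite', 2)]
```

With a single FIFO queue and the greedy flow's packets arriving first, the polite flow would wait behind all 100 of them; with round robin it is fully served within the first six slots.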

A neat example of use of scheduling in a different layer is in the new web protocol, QUIC, in browsers - this paper illustrates nicely how one can improve rendering of pages by changing the order that components flow through the layers of HTTP, QUIC, to/from UDP/IP...

Tuesday, November 01, 2022

Principles of Communications Week 5 L8/L9 1&3 Nov, 2022 - Mobile&Random Routing + Open&Closed Loop Flow Control

 We've looked at unicast (one direction), multicast (some directions), broadcast (all directions), mentioned anycast (any direction), and mobile (one level of redirection). We've also looked at metric v. random based route choice, and some policy interactions with traffic engineering (preferences that differ from policy or metric).

Note - slide 228 lists R (trunk reservation) with opposite sense to r on y axes on graphs showing diminishing return of impact of trunk reservation

Next, we look at open-loop and closed-loop flow control.  History of feedback controllers is ancient!

A fun thing to look at is the BSD TCP kernel code - this book by Rich Stevens is a walk through of that!

Wednesday, October 26, 2022

Principles of Communications Week 4 L6/7 25&27 Oct, 2022 - BGP Abstraction + Multicast

 This week, we finished up looking at BGP by covering:

  1. the stable paths abstraction - what does path vector+policy do, algorithmically?
  2. real world dynamics and engineering for stability - the latter performance challenges show that the information hiding goals of inter-domain routing have not really been achieved very thoroughly! So intra-domain dynamics get exported to the world, because of hot potato and MEDs.
As well as this, BGP requires a lot of external machinery for specifying policy, and for guarding against misinformation (by definition, harder to do with a system which has a goal of information hiding).

Then we looked at Internet Multicast - this also has some interesting challenges that have prevented widespread adoption:

  1. potential use for leveraging DDoS attacks
  2. lack of a policy/interdomain/business model (who causes down stream traffic?)
In addition, multicast created headaches for high speed switch/router hardware designers.

The general lesson here is that global scale systems entail complex, multi-factor considerations, and that not everything out there running the world is actually based on completely sound design. Nevertheless, multicast has application in special cases/limited domains, such as backbones for TV distribution, and more especially, data center networks.


Aside: just to support this, the university information service video recording system failed to capture audio&slides for the 20.10.22 lecture but luckily, last year's recordings are still available, and roughly from same dates (2021 recordings) e.g. 2021-10-19 2021-10-21 2021-10-26 

Apologies for that!