Thursday, November 21, 2024

Principles of Communications Week 7 to Nov 21st/L13

This week we're covering firstly data centers (through the lens of q-jump and its clever "fan-in" trick), and secondly, optimisation of routes and rates. Both of these topics relate to practice, and theory of quite a few machine learning systems/platforms.

The mix of application platforms we see in the data center lecture are classic mapreduce (as represented by Apache Hadoop) and stream processing (as represented by Microsoft's Naiad)[2], as well as memcached (an in memory file store cache, widely used to reduce storage access latency and increase file throughput - e.g. for aforesaid platforms), as well as PTP (a more precise clock sysnch system than the widely used Network Time Protocol[1]). For an example of use of map reduce, consider Google's original pagerank in hadoop. For  the interested reader, this paper on timely data flow is perhaps useful background.

Optimisation is a very large topic in itself,  and underpins many of the ideas in machine learning when it comes to training - ideas like stochastic gradient descent (SGD) are seen in how assign traffic flows to routes, here. In contrast, the decentralised, implicit optimisation that a collection of TCP or "TCP friendly" flows use is more akin to federated learning, which is another whole topic in itself.

Why a log function? maybe see bernouilli on risk

Why proportional fairness? from social choice theory!

Are people prepared to pay more for more bandwidth? One famous Index Experiment says yes.

0. See Computer Systems Modeling for why the delay grows quickly as load approaches capacity.

1. see IB Distributed Systems for clock synch

2. see last year's Cloud Computing (II) module for a bit more about data centers&platforms.

No comments: