Tuesday, November 15, 2022

Principles of Communications Week 7 L12/L13 15th & 17th Nov, 2022 - Data Centers & Optimisation

 Data Centers - example of Queue Jump


1. Data centers have a regular topology, and their software can be centrally managed.

2. The fan-in of traffic, sometimes known as TCP incast, can and does cause spiky delays. These badly reduce the performance of distributed computations, clock synchronisation and in-memory disk caches, all of which degrades the throughput of the data center.

3. If we can differentiate traffic, we can use different schedules for flows with low latency requirements and for flows that need high throughput but have no particular latency bound.

4. In a 3-hop data center, a small number of priority queues will work, as long as the sum of traffic from sources of a given priority is controlled, based on computing the occupancy of that priority class and its delay impact. There is then capacity left for lower priorities, and queues stay very short too. Sources can be policed to ensure that higher-priority classes don't starve out lower priorities, and this can be done below the app in the OS, or below the OS in a hypervisor.
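As a rough illustration of points 3 and 4, here is a minimal sketch of per-source policing feeding a strict-priority output port. The class names, the token-bucket policer and the byte/second units are all assumptions made for this example; they are not taken from any particular system's code.

```python
from collections import deque


class TokenBucket:
    """Hypothetical policer: at most `burst` bytes at once, `rate` bytes/s on average."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # time of the previous check, in seconds

    def allow(self, size, now):
        # Refill for the elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False          # over the allowance: drop or delay the packet


class StrictPriorityPort:
    """Output port with a few queues; always serves the highest non-empty one."""

    def __init__(self, levels):
        self.queues = [deque() for _ in range(levels)]  # index 0 = highest priority

    def enqueue(self, packet, priority):
        self.queues[priority].append(packet)

    def dequeue(self):
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None           # nothing waiting


# Usage: a policed low-latency source and an unpoliced bulk source share a port.
policer = TokenBucket(rate=1000, burst=1500)
port = StrictPriorityPort(levels=2)
port.enqueue("bulk-1", priority=1)
if policer.allow(size=1500, now=0.0):
    port.enqueue("sync-1", priority=0)
first_out = port.dequeue()    # the high-priority packet jumps the queue
```

Because the high-priority source is rate limited before it reaches the port, the bulk queue still gets served even though the port itself always prefers priority 0.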

Optimisation - multipath routes, decentralised rates

From an entirely different perspective, we can treat the routing and rate control problems as optimisation challenges. In this approach, we can use gradient descent methods to assign flows to routes, and distributed optimisation (via feedback control, with increase/decrease steps searching for optimal utility) to compute rates.
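To make the route-assignment half concrete, here is a small sketch that splits one traffic demand across two parallel paths by gradient descent. The cost function x/(c - x) per path (an M/M/1-style queueing cost), the function name `split_demand`, and the capacities and demand are assumptions chosen for illustration only. The sketch also assumes the even starting split is feasible (each path's load is below its capacity) and that the step size is small enough not to overshoot.

```python
def split_demand(demand, c1, c2, step=0.01, iters=2000):
    """Return (x1, x2), the load placed on each path, with x1 + x2 == demand."""
    f = 0.5  # fraction of the demand sent on path 1; start with an even split
    for _ in range(iters):
        x1, x2 = f * demand, (1.0 - f) * demand
        # Marginal cost of a path: d/dx [x / (c - x)] = c / (c - x)^2.
        # Moving traffic onto path 1 raises its cost and lowers path 2's.
        grad = demand * (c1 / (c1 - x1) ** 2 - c2 / (c2 - x2) ** 2)
        # Gradient step, then project back onto the valid range [0, 1].
        f = min(1.0, max(0.0, f - step * grad))
    return f * demand, (1.0 - f) * demand


x1, x2 = split_demand(demand=6.0, c1=10.0, c2=5.0)
print(f"path 1 carries {x1:.3f}, path 2 carries {x2:.3f}")
```

The descent stops moving traffic when the marginal costs of the two paths are equal, which is the usual optimality condition for this kind of split; the larger path ends up carrying more than its proportional share.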

Note well - the optimisation of routes can work at any level of aggregation, and hence is suitable for traffic engineering, and is largely a centralised technique. The formulation is also agnostic about multipath routing, so it is suitable where redundant paths and load balancers are in live use, and is consistent with end-to-end multipath protocols (e.g. multipath TCP or QUIC). One tends to think of the route optimisation approaches as being for longer-term matching of traffic to paths (they could also suit OpenFlow-controlled traffic at the individual flow level, though this is not used in the Internet today). Of course, gradient descent methods are also widely used for training in machine learning and AI.

N.B. a reminder on finding the minimum of a function.
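For reference, the standard conditions and the gradient descent iteration can be written as follows. The symbols (f, x, the step size \eta) and the worked example are illustrative choices, not notation from the lecture.

```latex
% A differentiable f has a local minimum at x^* when
f'(x^\ast) = 0 \quad \text{and} \quad f''(x^\ast) > 0 .

% Gradient descent searches for such a point iteratively, with step size \eta > 0:
x_{k+1} = x_k - \eta \, \nabla f(x_k) .

% Worked example: f(x) = (x - 3)^2
f'(x) = 2(x - 3) = 0 \;\Rightarrow\; x^\ast = 3, \qquad f''(x) = 2 > 0 .
```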

The distributed, asynchronous, non-coordinated optimisation of rates (i.e. TCP congestion control or equivalents) is also applicable elsewhere, for example in distributed machine learning. The rate adaptation is also suitable for multipath end-to-end protocols (e.g. MPTCP). Because each adjustment is driven by end-to-end feedback, the rate optimisation techniques operate on round-trip-time timescales.
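The increase/decrease search can be sketched with additive-increase, multiplicative-decrease (AIMD), the rule behind classic TCP congestion control. This is a deliberately simplified model: the function name `aimd`, the shared one-bit congestion signal and the synchronous rounds (one per round-trip time) are assumptions for illustration, whereas real flows act asynchronously.

```python
def aimd(rates, capacity, rounds, alpha=1.0, beta=0.5):
    """Simulate flows sharing one link; each flow only sees 'congested or not'."""
    rates = list(rates)
    for _ in range(rounds):
        # One bit of feedback per round: is the link overloaded?
        congested = sum(rates) > capacity
        # Each flow reacts on its own: back off multiplicatively, or probe additively.
        rates = [r * beta if congested else r + alpha for r in rates]
    return rates


final = aimd(rates=[1.0, 40.0], capacity=50.0, rounds=500)
print([round(r, 3) for r in final])
```

No flow knows the other's rate, yet every multiplicative decrease halves the gap between them while additive increase leaves it unchanged, so the rates drift towards an equal share of the link.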
