clog: October 2008

Friday, October 31, 2008

DCII 1-3.11.08

Multicast IP has been around for 20 years (!) - but only recently seen large scale use, e.g. for IPTV - there are now some examples of compelling large scale deployments which show why it is really useful. This relies on the relaxation of the old so-called "any-source multicast" to the more restricted "single-source" multicast model (or point-multipoint).

The potential for google click-thru type market research and targetted advertising should be obvious:)

If you are getting tired of reading this, and its after the cocktail hour, then
here is some advice on what to read to go with your drink.

DCII 29/31.10.08

Picking hello timers for routing can be quite hard - too fast, and you react to short term conditions and flap, too long, and routing is not responsive to improved conditions - a very nice discussion of this is in
talk by folks from the beautifully named startup, "packetdesign".

Floyd and Jacobson discuss the problem of naive periodic timers causing
synchronisation of work in this
paper from a while back

There's a great book about road traffic - see
"Traffic: Why We Drive the Way We Do (and What it Says About Us)" by Tom Vanderbilt;

Monday, October 27, 2008

DCII 27.10.08

Today, I'm finishing off the protocols soup - I should have said
If a million monkeys type for a million years, you might get a shakespeare play
amongst what they type, but if you subtract the shakespeare play,
then what is left is CS acronyms:)

meanwhile, on protocol stacks, this
talk on the waist of the hour glass by my old mate Steve Deering is great!

Wednesday, October 22, 2008

DCII 24.10.08

Protocol Implementation - see W Rich Steven's very excellent series of books, TCP/IP Illustrated (especially volume 2!). As well as a structured walk through of Berkeley Unix kernel source code, there's lots of useful lessons in there - the Linux code is "interesting", but less transparent as to purpose in my opinion (e.g. see my book on the Linux code from a while back).

Other useful protocol implementation ideas (in terms of managing concurrency, buffering, and other patterns) are distributed throughout lots of papers. Some early RFCs cover tricks used to do small fast data structures for various things. Model checking protocol implementations was an important contribution from groups here, notably the Network Semantics project, which did a full scale model of TCP and the socket layer!

In terms of the huge protocol TLA (Three Letter Acronym) soup out there, wikipedia is probably your best friend! basically, if a million monkeys typed for a million year,s they'd still not have all the ISO, ITU, IETF acronyms, even if they had shakespeare down pat.

Monday, October 20, 2008

DCII 22.10.08

The Internet's Layering model derives from the ISO's OSI (wow, there's a great mix of acronyms - actually, technically, ISO is not an acronym, since it is the International Organisation for Standardisation (with an 's') - OSI is the Open Systems Interconnection architecture and is an acronym:)

Meanwhile, alternatives abound - see for example, Arizona CS's ideas in the The X Kernel on protocol graphs. More recently, folks at USC and UCL came up with the more random idea of ole based protocols and Heap based composition. There are a lot of ways to skin a cat.

At a more basic level, the idea of interconnection in the OSI model tends to appear as if it is only a "network layer" function - the reality is that engineers and hackers build interconnects at every layer you can think of - for example:

physical layer repeaters and relays (especially in wireless and optical)

link layer switches and bridges (especially in LANs)

network layer (routers)

transport layer relays (usually to deal with transport layers that don't have enough network layer information to react end to end or hop by hop to loss because the network layer doesn't disambiguate congestion from interference or from noise).

Session, presentation and application layer proxies - e.g. web caching proxies, and in general, just about any peer-to-peer (P2P) system.

Data link and physical layer (possibly including network layer too) are starting to fall apart as "isolated" abstractions recently in multihop radio systems where
cooperative antennaes, cooperative coding, and cooperative multi-path routing mean that the three lowest layers need to be treated together at each device. This is a hot research topic area

End-to-end arguments in system design have never been more argumentative, seeing re-factorizations into different network architectures about once a year for the last decade.

Friday, October 17, 2008

DCII 20.10.08

Systems Thinking about Communications - some related material:

Systems design for protocols - see Trading packet headers for packet processing, by
Chandranmenon, G.P.; Varghese, G., and for improving state set up costs, see A model, analysis, and protocol framework for soft state-based communication by Raman and McCanne.

Protocols can be given the performance improvements by batching (amortizing cost of processing multiple times in one go - e.g. aggregating packet processing in one interrupt service routine), pipelining (e.g. Integrated Layer Processing), but beware, this can be bad if the loop to do the processing exceeds the instruction cache size (or ditto for packet header data). This is not just diminishing returns, but can be counter-intuitive. We can amalgamate different processing and improve not just implementation, but also design - two elegant examples are TCP/IP header compression and header prediction - see RFC1144 for a beautiful piece of design and implementation in this regard.

Statistical multiplexing (see lectures on telephone, internet and atm for discussion) can be great, if you know the statistics of the traffic at various levels (of scale in time and space and aggregation)- as we pointed out, this is very true for voice (whether telephony voice or skype/voip) but less so for data (we know how a TCP flow works, but not how a lot go togeter, but worse, with a mix of client/server and client/proxy traffic, and P2P traffic, we really don't haev a good handle on the "traffic matrix" any more, if we ever did!)

Virtualization (of h/w/OS - see Xen, or network, see VPNs) is a way computer science hides multiplexing (sharing) - the control plane allocates a share and schedules it, then each share (slice/multiplex) sees a dedicated resource "apparently all its own" - reality (accuracy of dedicated share) depends on accuracy of the scheduler - as in all things CS, "your mileage may vary":-). To work well, the scheduler has to model the resource being virtualized, and the requirements of the accuracy of the model trade-off against its efficiency dependent on the statisitcs of the workload - see, easy!

Thursday, October 16, 2008

DCII 17.10.08

Asynchronous Transfer Mode (not Automatic Teller Machine, or even, Another Terrible Mistake)

What I said in 1994 is still not wrong.

The Fairisle Cambridge ATM Switch

Fore and Marconi and links to Cambridge are historically very interesting.

much longer ago, a precursor to ATM was the Cambridge Ring-based local area network technology

ATM lives on in BT's Colussus network, which is used to provide dynamic control of unbundling DSL lines. The lines themselves often use ATM to multiplex a single voice channel with a higher speed data channel. At least one of tee chips used to do this was designed by a member of the Computer Lab.

Wednesday, October 15, 2008

DCII 15.10.08

IP Addresses (now, lamentably, referred to as IPs) have to be matched to the longest matching prefix (at least in IPv4) - there are a bunch of cool algorithms for this based in fast multi-level hashes (linux), Tries, and in specialized hardware (ternary Content Addressable Memory- CAMs) - so "most specifics"are found (last resort is all match- default route). For the first elegant paper on a data structure for doing this see Degermark's small fast route lookuup work. BSD and Linux kernel code are also informative.

IPv6 has enough bits to let you do proper separation of identity and location, so that one can at least consider a sensible way to support 3 different address allocation policies (topological, provider and geographic), as well as support both multi-homing and mobility. However, none of this is seeing much deployment yet (except in China).

Finally, the Host Identity Protocol (HIP) uses crypto-assigned addresses as a way to generate unique identities and then uses other services to map from ID to location - this, together with appropriate shim software (similar to the shims to let it work on v6) lets TCP work across host movements (or multi-homed - similar problem really) - the HIP RFC also cites other work on IPv6 and the basic problem of IPv4:)

An amusing (and not incorrect) map of the IPv4 Address Allocation today was done in the ever marvellous xkcd.

Monday, October 13, 2008

DCII 13.10.08

Erlang was a Danish mathematician who first solved the problem of provisioning for telephone networks by computing the relationship between the call arrival statistics, the capacity, and the "call blocking probability" (i.e. chance that there isn't a single call's worth of capacity on any circuit to the destination for your call). N.B. as well as being independent random arrivals, calls are mostly short (3 mins) and local (one hop), so hierarchical capacity assignment is straightforward. Of course, there are "flash crowds" (a.k.a. "Mother's Day" events), which do not follow these statistics, and generally lead to higher than usual call blocking.

Leland, Willinger and others were the first to report the (then surprising) non-poisson nature of Internet traffic in their rightly celebrated sigcomm 93 paper - this somewhat went against received telephone network wisdom that traffic was I.I.D and therefore characterized by the Poisson distribution. This, of course, describes the aggregate traffic characteristics - most traffic is TCP-based, and a single source's behaviour is described by Padhye's equation

THe Telephone networks were designed, top down, by large national monopolies and their topology was created by network design equations. The Internet has grown in a decentralised way, and so its topology is a result of a number of natural (demographic) forces and is described better as a (nearly) scale free graph, whose node degree distribution follows a power law - many other systems are like this (e.g. authorship graphs, the web page link topology, the appearance of actors in films (kevin bacon etc). This also has interesting properties in terms of resilience to failure and attack, and reflects an ancient truth about society which network scientists call "preferential attachment" and social scientists call "rich get richer".

Errata: I am corrected about use of Mobile Phones on planes in landing in HK - apparently, it isn't allowed there yet - this was something I was told a few years back, and was by someone from the far east but perhaps it wasn't HK _ perhaps Beijing?
Anyway, key point is that the main objection is from cell phone providers worried about network overload from roaming, handover and beaconing traffic rather than the specifics of where - two CS solutions are 1). predictive handover and 2). aggregate signaling (e.g. via a small "microcell" on the plane).

Friday, October 10, 2008

DCII 10.10.08

Digital Communications II starts today. I'm blogging links to sites about things I mention in passing.

Talking Telephone Numbers

Party Lines

Strowger

voltage of ringing - as asked: 90v AC, frequency 20-25Hz.
(28 volt I mentioned is the pulse in the exchange to drive the rotary electro-mechanical relay, not the ringing - and was a UK analogue post office standard.

Wednesday, October 08, 2008

what does "back of an envelope" mean....

...in an e-mail world? virtual envelopes are arbitrarily large (or small)...

indeed, what is a margin (where fermat might lack the space?) ?

Tuesday, October 07, 2008

very fine tv math!

the story of mathematics...

on the beeb, courtesy of du Sautoy of Oxford and the OU:

...is jolly good fun, but I can't help thinkin he keeps overclaimin
what is "fundamental" and "mathematics" rather than, say, natural sciences
but then I have to remind myself what he is tryin to achieve which wont do us
any harm....and take a look at this for good measure:

http://www.xkcd.com/435/

what I like is that du Sautoy goes just too fast (e.g. I watched the first episode with a smart 10 year old and he had to re-work through the Pythagoras proof by re-arrangement again afterwards to convince himself - this is good as it had the desired result (of making someone work through something again afterwards:)

I also like his unpretentious, but calmly enthusiastic presentation and lack of overly posh vowels:)

Thursday, October 02, 2008

A refined theory of wind in cambridge

in this continuing series on the problem of wind in Cambridge, and the fact that whatever direction you cycle in, it is against you, we apply the theory of complex dynamical systems, and topological manifolds to prove that, unless Cambridge is converted into a toroidal vertex, the problem is insurmountable, just like some of the bikes.

Essentially there is a set of subjects with a power law distrbution of popularity, and hence there is a distribution of people and rooms that have to be fit together across a set of spaces over time - aside from the known computationally harsh (technical term) problem of scheduling the rooms at all, there is a packing problem - now this means that there are always different numbers of people going in each direction from A->B, and then after a lecture from B->A. Consider then the time of events. One arranges for lectures to be in fixed length slots (for no other reason than otherwise the schedule would be temporally as well as spatially intractable) and yet this creates a set of waves of air across cambridge with fractal vorticies. Now consider the preferred size slot. Clearly there is a tendancy for people to arrive just in time (or just too late) for a lecture. This has a knock on (off) effect which re-enforces the pessimal slot length choice.

If it was an ecosystem (and we used some natural selection - e.g. based on energy left mapping to chances of passing) then we could fail more people, and find a more linear relationship between class size, room size and slot size and then easily solve the packing problem, but the system is essentially sufficiently large that the chances of this are zero.

In the next piece, I will look at whether combined oxygen and cash would be a solution to the market turmoils created by subprime loans.

Wednesday, October 01, 2008

WHat is the CS Programme?

it is the programme to determine if CS==Programming ?