clog: 2016

Wednesday, November 30, 2016

Principles of Communications -- Michaelmas Term 2016....weak ate (to Nov 30)

This week, wrap up with Ad hoc Net capacity & coding tricks
+ systems structures
+ Course Overview - including 2 missing pieces
1/ didn't cover shared media (as was done in 1b)
2/ didn't cover traffic engineering & signaling (rsvp) as most of the principles were already covered in other lectures earlier in term (open & closed loop control, optimisation and fibbing).

Wednesday, November 23, 2016

Principles of Communications -- Michaelmas Term 2016....week 7 (to Nov 25)

scheduling & switching last week and this week - some associated info:

sorting queues

huis clos shows up in multicores, routers, and data centers:-)

will not cover shared media, Friday, as was done in 1b physical & data link layer really nicely already, so revise that -

instead, will move on to ad hoc/mobile networking capacity *might talk about opportunistic networks and firechat too:-)

Friday, November 11, 2016

Principles of Communications -- Michaelmas Term 2016....week 5 (to Nov 11)

This week was feedback control, theory &
optimization (routing and congestion pricing)...a bit math/algebra/calculus heavy methinks (supervisions should work through one or two examples of a PID controller for different systems
and how you show stability&long term operating point) - next week, real world TCP, then scheduling.
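
As a starting point for that kind of supervision exercise, here is a minimal sketch of a discrete PID loop. The plant (dx/dt = -x + u), the gains, the step size and the names `PID` and `simulate` are all assumptions chosen for illustration, not taken from the lecture material. With integral action the long term operating point is the setpoint; with proportional control alone it settles at kp/(1+kp) of the setpoint.

```python
class PID:
    """Textbook discrete PID controller (illustrative sketch only)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the very first sample

    def update(self, setpoint, measured):
        error = setpoint - measured
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(kp=2.0, ki=1.0, kd=0.1, dt=0.01, steps=3000, setpoint=1.0):
    """Drive a hypothetical first-order plant dx/dt = -x + u using forward Euler."""
    pid = PID(kp, ki, kd, dt)
    x = 0.0
    for _ in range(steps):
        u = pid.update(setpoint, x)
        x += dt * (-x + u)
    return x


if __name__ == "__main__":
    print("PID final state:", simulate())
    print("P-only final state:", simulate(ki=0.0, kd=0.0))
```

Comparing the two runs is a quick way to see why the integral term matters for the operating point; showing stability properly still needs the algebra (e.g. the roots of the closed loop characteristic equation).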

Friday, November 04, 2016

Principles of Communications -- Michaelmas Term 2016....week 4 (to Nov 4)

Have covered Sticky Random Routing (DAR material here) + Network Coding for TCP

see wikipedia for Gaussian Elimination, + Linear Network Coding articles - best source/explanation I can find

And Open Loop Flow Control (including leaky bucket regulators/policers)
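
For revision, a minimal sketch of a leaky bucket used as a policer: the bucket drains at a fixed rate, and a packet conforms only if adding its size would not overflow the bucket. The class name, the units and the use of explicit timestamps are assumptions for illustration, not the form used in the lecture slides.

```python
class LeakyBucketPolicer:
    """Leaky bucket as a meter: conforming packets fill the bucket, time drains it."""

    def __init__(self, rate, capacity):
        self.rate = rate          # drain rate, e.g. bytes per second
        self.capacity = capacity  # bucket depth, e.g. bytes (the burst tolerance)
        self.level = 0.0
        self.last = 0.0           # time of the previous arrival

    def conforms(self, now, size):
        # Drain the bucket for the time elapsed since the last arrival.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + size <= self.capacity:
            self.level += size
            return True
        return False  # non-conforming: a policer would drop or mark this packet


if __name__ == "__main__":
    policer = LeakyBucketPolicer(rate=1.0, capacity=2.0)
    for t, size in [(0.0, 1), (0.0, 1), (0.0, 1), (1.0, 1)]:
        print(t, size, policer.conforms(t, size))
```

A regulator (shaper) would queue the non-conforming packet until the bucket has drained enough, rather than dropping it.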

Next week, closed loop/feedback control, and underlying theory for controller design and stability/efficiency analysis.

Friday, October 21, 2016

Principles of Communications -- Michaelmas Term 2016....week 3 (to Oct 21)

Done centralised/hybrid routing/fibbing -- as someone pointed out, forwarding continues if a central controller crashes - depends on timeout in openflow added state/fib entries - with fibbing, the timeout will be whatever OSPF or equiv does - so a comparison is potentially a bit more subtle than as presented.
Also, SDN/Openflow lets you add entries based on 5-tuple, whereas fibbing lets you add by destination only.....so potentially more fine grain choices in SDN, even if at the cost of more state - so the "prefix hijack" in fibbing is neat, but not the last word in adding custom routes...(e.g. if you wanted by source, needs more state than can be added by fake Link State Advert - as far as I can see)

rest of week was on BGP - why, what, how, where, when, and why not!

Friday, October 14, 2016

Principles of Communications -- Michaelmas Term 2016....

Finishing up end of week 2 (lecture 4) with Compact routing
having done background graphs & reminder of routing basics.

See the slides page for lecture material + links to papers with more detail if you want (where not covered in books).

I've also added some more pointers to background book-like reading on the course materials page for people that want to read more around the area, as there's no specific single book that covers all the course materials.

ttfn

Friday, September 09, 2016

fairness, machine learning, versus optimal stopping and cognitive bias

There's a bunch of work in making sure that machine learning systems are, in some carefully defined sense, fair - see for example the MPI work by Krishna Gummadi, in removing biases in various ML use cases (e.g. gender as an explicit or implicit discriminator).

For me there's a really subtle problem here which links between this work and other problems of Optimal Stopping and Cognitive Biases, and how one chooses to define fairness in ML and the feedback loop between this and human society and the views we take on each other.

So let's take two simple use cases:

1. Admissions to University and Gender

Imagine a Computer Science department has 100 applicants a month, over 3 months for 50 places and wants to pick the best 50 people. Naive use of Optimal stopping would say wait til you have seen 37% of the applicants (111 people), then pick. What if the population is drawn differently by gender - e.g. out of every 100 applicants, only 1 is female. Let's say this is because applicants are self selecting based on their position in the ability distribution of their own sub-population. You have about a 1/2 chance of having 0 women in the admissions. The feedback to the population in society is you have to be in the top 1% of female applicants, but in the top 18% of men. Assuming there isn't actually a gender basis for ability distribution, you've just built a system that reinforces the perception of one. To get out of this, you have to run a two-factor optimal stopping scheme. If you want to do this for other groups in society, it will get more complex too...
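
A quick back-of-envelope check of those numbers, as a sketch. It assumes admission is effectively a random draw with respect to gender (ability is distributed identically in both groups), and the function name and dictionary keys are made up for illustration. Under that assumption the chance of admitting no women comes out at roughly 0.6, and the admitted share of male applicants at about 17%, in the same ballpark as the figures quoted above.

```python
from math import comb


def admission_numbers(per_month=100, months=3, places=50, female_fraction=0.01):
    total = per_month * months              # 300 applicants in all
    observe = round(0.37 * total)           # the naive "look at 37% first" rule
    women = round(total * female_fraction)  # 3 women in the pool
    men = total - women
    return {
        "observe": observe,
        # each place filled independently, 1% chance it goes to a woman
        "p_none_binomial": (1 - female_fraction) ** places,
        # drawing 50 of the 300 without replacement, none of the 3 women picked
        "p_none_hypergeom": comb(men, places) / comb(total, places),
        # fraction of male applicants admitted if all places go to men
        "share_of_men": places / men,
    }


if __name__ == "__main__":
    for name, value in admission_numbers().items():
        print(name, value)
```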

2. Stop&Search and Race

It may be the case that you stop and search people in safeguarding society by profiling individuals based on past cases of stopping and successfully apprehending miscreants. Let's say this leads to a higher probability of stopping people who "look middle eastern". Again there's a feedback loop between your "correct" but naive selection scheme, and how people behave - in this case, various cognitive biases in how society will regard the group you target, may lead to the group being marginalised, out of proportion to even your allegedly accurate statistical model. e.g. anchoring....or many others, will lead to over-weighting by society, especially since humans are risk averse.

Friday, August 26, 2016

sigcomm 2016 #2 - what worked well

a number of things about Sigcomm 2016 were really smooth, and I'd like to say what those were and why, for future reference

1/ a large number of volunteers did a meet/greet/arrange transport from the airport to the conference venue & hotels site - this was great - even delayed planes had a person with local knowledge (of language/culture/taxi/etc) and so many panic moments were averted - for example there were two flights from north america which were disrupted but people still got met - also several people were severely mis-advised by airlines that their checked bags would go through from the international to the regional flight (this isn't the case in any country in the world that I know of, but united and tam managed to tell people this, despite that security demands passengers and bags are reconciled per flight) - nevertheless, with some local help, bags were retrieved within a day....

2/ the conference venue has a LOT of rooms and is very conveniently laid out so that almost instant access to coffee/lunch areas, and between rooms, was very very easy - the space is pleasant, acoustics are good, audio/visual (mikes/PA/speakers) worked well (including for remote speakers and for Q&A) - there are 6 main rooms next to the catering area, plus the large auditorium area up one floor, with large bathroom area next to both, too - the groundfloor rooms can be reconfigured to 3 larger rooms - this is necessary as the 5 days of Sigcomm these days include multiple tutorials and workshops on the monday & friday, plus multiple other events like the student research, the topic preview sessions, mentoring meetings, and various committee meetings (sigcomm exec, next year handover)....all rooms were used most of the time...

3/ many sponsors attended and several had desks for info for possible employment etc, out in the large catering area...

4/ the conference banquet (tue) and student dinners (wed) were both fantastic events - the former was 8 minutes walk from the conference venue, so people could get back to hotels at their own leisure - the latter was a bus ride away - coaches whisked us there and back in about a half hour, and that was probably the best meal I have ever had at a sigcomm conference. The reception on monday (in the main conference auditorium) was good.

5/ there were a couple of pretty good local restaurants at the venue for people looking for socialising on the other days (including award winners dinner, N2women dinner + for people arriving early, plus on the thursday nite)

6/ there was fairly seamless interaction between webmasters, and a/v team, so slides, papers, other access (e.g. to printers for boarding passes, for travel info/asking for taxis back to restaurants/airport etc) was all pretty painless

we had a day of wireless outage, which appeared to be on part of the internet not inside the conference venue site, but an engineer did come and fix it that day - this disrupted the live streaming (although we hope the recordings will still have worked and will soon be available via the ACM digital library links)

7/ the remote presentations were remarkably successful - this was because
a) the actual talk was a pre-recorded video, pre-shipped to us, so didn't depend on the net working well in realtime
b) most presenters had prepared lively talks with a super-imposed video of the full standing figure of the speaker, alongside the slide show
c) the talks all had a live Q&A (relying on skype/telephone call out as a backup if the internet was down) so there was little difference in terms of human experience between the remote presentations and local presentations - indeed, several people commented that almost all the remote presentations were technically higher quality than most of the local ones....

8/ the size of the conference (approx 400 attendees) combined with the local relaxed culture was perhaps responsible for a very friendly atmosphere- I think this meant that it was a really fantastic experience for the (large number of) student attendees, giving many opportunities for mentoring moments and general exchange of ideas....a larger event might need slightly more formally organised mechanisms (certainly, last year's sigcomm in London with 700 attendees was a bit more of a "zoo").

9/ there was a lot of behind the scenes tech used to track all the organisation of things....this is available from the general chairs and other members of the organising committee (OC) on request

10/ the OC all carried out their tasks with incredible efficiency and timeliness. This matters as many of those tasks have dependencies (e.g. travel grant, visa letters, or tutorial/registration/registration, or PC paper shepherding/web site program) - there are a couple of race conditions, but we had fixes....which requires everyone to be responsive (i.e. every day) and responsible....

that's all for now, folks...

Thursday, August 25, 2016

sigcomm 2016 - so long & thanks for all the fishy behaviour

Sigcomm 2016 for me

As general chair, I felt I'd have to attend Sigcomm in Brazil, even though I
had a co-chair who is local, in fact not least to give moral support for him.

However, for me, August involves my family holiday typically, and this year was no different.
So we'd booked a large villa in southwest france for 2 weeks for up to 20 people, so that the extended family (from
UK, Ireland, North America & Kenya and anyone else who wanted to drop in on their travels) could all be there.

So one week in, I had to head for the conference, taking a couple of the family with me to get 1 to Berlin, 1 back to
London (another couple were heading the other way from London to France at the same time)...

Montpellier->Florianópolis took about 30 hours (with a very good flight from LATAM for 830$ connecting in São Paulo
with only a 3 hour connection). I met a couple of people on the last 1 hour flight who had come from Beijing (one
poster & one paper author), whose journey was also about 30 hours, plus a couple of Europeans who had a journey of
about 15 hours - like mine would have been had I not been on annual vacation.

So while the conference, as a social event, and a technical piece of my day job, is really excellent (much
gratitude to the superb Brazilian hosts!), I missed a whole week of seeing my extended family, including some of them who
I only see then, but won't til next year now.

So it is disappointing that a number of people who had papers to present (note, I didn't), did not attend. On papers
with an average of 5+ authors, they couldn't find one person to travel and present. The excuse was the Zika
outbreak. There is more Zika in some US locations than in the conference location, but hey, who expects everyone to
be rational. It seems also that many of these authors were connected with papers with Microsoft authors. Microsoft
were not a sponsor of the conference either (they have been in the past), despite having an author on 20% of papers
in the conference, and making a thing of this on social media. It seems that the conference has value as a place to
get visibility for work of the company or student interns at the company, but not enough to have a presence (not
even a recruiting desk at an event where there are around 200 junior researchers in networking, one assumes many
of whom are looking for interesting places of employment next).

So for the first time we've allowed some remote presentations
at SIGCOMM - we had one live one in at the NetPL on day 1 because a
speaker was held up by a plane failure and so managed to Skype in,
but we had two on day 2 in the main conference paper sessions, which were
planned. Authors unable to attend ahead of time, sent in a canned video of
their talk, then we skyped them in after for Q&A

The biggest problem with this is non-technical - it's to do with the loss of community
building opportunities based in hallway conversations triggered by the talk or other
things the speaker/authors may have done that are of interest to attendees - this loss
is small for a small number of remote presentations, and in the case the remote
presenter is a student, probably worse for them than for conference physical attendees.
The loss is larger for the conference if the remote speaker is an experienced person
who might act as a mentor or offer useful feedback on other presentations,
live, in the other Q&A, or in hallways etc, if only they could have attended.
One simple example - the authors of the 2nd paper on the 1st main paper session day
(differential provenance) could interwork with authors of a paper on "light in the middle of the
tunnel" in HotMiddlebox on Friday.

There's no advantage to the primary author giving the remote presentation (rather than either anyone
else, or anyone else coming to present it in person) because no-one at the conference gets to
meet them anyhow, so they don't enhance their career any more than writing a Tech Report or putting a paper on
ArXiv with a video.

There was some care taken (at extra cost to the conference organising committees) to get
decent videos and have them present (and esp. attention to audio both for speaker and for
Q&A with the remote virtual attendee).

However, 3 semi-technical problems became obvious in the first 2 talks
1/ the speaker is canned - they can't adapt to audience attention, they can't re-pace
based on level of engagement, they can't change their presentation to account for other people's talks
or reference another talk where there's common ideas or differences - the speaker can't interact with the slides,
even pointing at axes to explain scales, or interesting features of a curve/anomalies, outliers etc....
2/ there's no obvious way in this model, to ask a speaker to "go back to slide 5" in the Q&A
3/ having a human figure in the projection who is larger than life (as would appear on stage) is an elementary
HCI fail.

On the 2nd&3rd day of the main conference we had quite a few more remote presentations. While they continued to be
well prepared, and the illusion of having the speaker in the room continued to be maintained by having a Skype
capability for Q&A at the end of each canned talk, the number was really stretching the credulity and patience of
many of us that out of 30+ distinct authors across that set of papers, 0 could get here. This continued on to
trying to have a handover meeting where the only people from 2017 able to be physically here on the lunchtime of
the last day of the main conference were people who were involved in the 2016 conference anyhow.

The event was no more difficult to get to than many past conferences, nor are there more real (rather than
perceived) risks about the location than many past locations.
[In fact, I attended last year in london to go to the handover meeting to learn the tasks required of us, so I
missed vacation then too]

A lot of people went out of their way to make this a successful event, despite the lack of full engagement by many
people who obviously assume Sigcomm is worth submitting papers to for their career or their employer's visibility,
but don't buy into the community idea. That's sadly shortsighted of them. They will be perceived as places less
interesting to go work for compared to those places that had a presence. Sad, because it has been incredible fun here and the local community showed up massively supportive, in huge numbers, and got a huge amount out of things. More loss for those who didn't make it here from north of the equator.

For those of us who took time out from valuable family life, took a lot of care about re-locating the conference to
deal with the public health issues with the original venue, it is doubly disappointing that there are people in our
profession who don't share our view of what the nature of the event should be. It is very unlikely that I shall
bother attending again, or consider being on the PC if asked. I don't care to work for people who don't care.

Saturday, January 30, 2016

unikernels & production

A recent blog called into question the fitness of unikernels for production. The title was a bit misleading as there are several unikernel systems out there, some of which are actually in production - one of our faves is the NEC/Bucharest Uni work on ClickOS, for example, which is used for NFV on switches and is clearly a class act.

However, I think the article is also missing some of the main motives behind MirageOS (see e.g. Jitsu or the asplos paper) which was based in experiences with managing a lot of Xen based cloud systems - sure, Unikernels are specialised, and don't possess a lot of the micro-management/debugging tools (yet, although a lot are on the way) that you have for kernel debugging or system tracing of linux etc etc. But that's because OCaml real world experience in production was that you have faster system creation, and way faster debugging times. However, that's still not the whole story- the story is that the whole toolchain for managing source, building a unikernel, deploying it and tracing it is much more homogeneous - so a whole system of unikernels is easier to manage (as per previous experience).

Crucially, we are also able to verify some components of the MirageOS (e.g. Peter Sewell's group in Cambridge did this (for some definition of "this") a while back for the TCP/IP stack), plus confidence about David and Hannes' TLS implementation can be quite a bit higher than the "industry standard" that had 65 vulnerabilities in one year alone.

But all this is missing yet another key factor - unikernels don't replace xen/linux or containers - they play side-by-side with them, so you can have flexibility and familiarity, while affording better protection - that's in the Jitsu paper btw, and I thought was fairly clear.

Sure there's some way to go - there always is -there was when Xen first shipped too. But the computer science behind this is not that bleeding edge (nor were VMs back in Xensource's day either:-), but the science is 15 years further on, and we should all benefit from that, in my opinion. Indeed, it took a day to add profiling

xkcd has us in there thrice

Wednesday, January 27, 2016

readings in computer science found in a time capsule

Recently, I was re-reading the classic old article on Smashing the stack for fun and profit in phrack, and wondering where the positive alternative lesson might be. Quite a while ago, I did a port of the SR (Synchronizing Resources) programming language out of Arizona, to the newly minted RS6000 system out of IBM. The language is an elegant system for teaching principles of concurrency in lots of nice ways - at the time we didn't have a single agreed way to do it (e.g. wrong like in C11, or possibly ok in Java), so there was a need for a pedagogic approach and SR together with a nice book was very cool. However, we'd just bought a bunch of new IBM AIX systems with the new POWER PC RISC processor, which had a whole new instruction set - let's leave aside the amusing idea of a "reduced" or "regular" instruction set that includes "floating point multiply and add" in its portfolio. However, in terms of registers, it's a pretty nice system.

So why is porting SR to the Power CPU reminiscent of stack smashing, I hear you cry?
Because you need to implement the threads system it uses to emulate real multi-core, I hear myself answer. And what does it take to do that, you continue? Well, you need to save the current context (all the registers and thread state, i.e. PC, stack) and move to a different thread/stack by calling the scheduler with a pointer to that context. To do this involves becoming familiar with the stack format used for most programming languages, as well as all the important (i.e. all) registers etc. So basically, slightly more than the Phrack folks....basically, writing stuff that moves to another execution context, but can get back again correctly, is harder:-)
You can see some nice examples of the context switch stuff for a variety of processors in the (not supported) SR archive

Meanwhile, I was also reading about the integrity check used for DNSSEC caches, so that will be reported elsewhere, but the interesting thing is that it's a weak version of the IP header (and TCP) checksum algorithm. Again this is something that exercises computer science #101 - you need to add up all the 16 bit fields in the buffer (imagine you have an n byte buffer, then if n is odd, add a zero value byte) and sum it as a vector of 16 bit values using "end around carry" (or ones-complement) arithmetic - basically, you have a 32 bit accumulator, and loop over the buffer. When you are done, you do one more thing - in the checksum case: check whether any bits above the 16 least sig are set, and fold them in (add again) and then do one more add in case that overflows too. In the integrity check case, just 0 any bits in the more significant 16 bits (i.e. & accumulator with 0x0ffff). For the ARM IP cksum case, see this
code with inline asm - the crucial bit's lines 78&79 after the bne loop.
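
The two variants described above can be sketched in a few lines of Python. This is an illustration of the description only, not the ARM code referred to: the function names are made up, the words are taken in network (big-endian) order, and because Python integers are unbounded the fold is written as a loop rather than exactly two adds on a 32 bit accumulator. The final complement in `internet_checksum` is the standard last step of the IP checksum.

```python
def sum16(data: bytes) -> int:
    """Add the buffer up as big-endian 16 bit words, without any folding."""
    if len(data) % 2:
        data += b"\x00"  # odd length: pad with a zero value byte
    acc = 0
    for i in range(0, len(data), 2):
        acc += (data[i] << 8) | data[i + 1]
    return acc


def folded_sum16(data: bytes) -> int:
    """Checksum case: fold any carries above the low 16 bits back in."""
    acc = sum16(data)
    while acc >> 16:
        acc = (acc & 0xFFFF) + (acc >> 16)
    return acc


def internet_checksum(data: bytes) -> int:
    """Ones-complement of the folded sum, as carried in the IP header."""
    return ~folded_sum16(data) & 0xFFFF


def truncated_sum16(data: bytes) -> int:
    """Integrity check case: just zero the more significant bits."""
    return sum16(data) & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")
    print(hex(folded_sum16(sample)), hex(truncated_sum16(sample)))
```

On this sample the folded and truncated sums already differ (the carries are thrown away in the second case), which is a useful place to start on the exercise.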

amusingly, back in the early days, I remember someone loop unrolling asm code for the M68000 (that's not a 68020 or 68010, but 68000 - that was sold as a "Codata" computer in the uk, but was a Sun-1, i think, never sold by Sun) - the code went slower as the loop was now too big to fit in the minuscule instruction cache of said CPU...

hum...........what could possibly go oddly wrong with the allegedly simpler (by 1 instruction) integrity check algorithm? I leave that as an exercise for the coder

Thursday, January 07, 2016

counciling the UK research councils

The UK research councils are unique in the way proposals are reviewed in several ways (in my experience)

firstly, unlike almost any other research funding agency (unlike DARPA, NSF in USA, CNRS in France, DFG in Germany, the Japanese, Scandinavian, and just about any other national funding agency I have done reviewing for, which is a lot), the officers are not seconded experts so the assignment of reviewers depends on the reviewers' self-description - notoriously inaccurate. Unlike a journal (where the editor is an expert) there's not really a _peer_ review assignment process

secondly, the reviews can be rebutted by the proposer, but since the reviewers don't see each other's reviews, they can't be calibrated against each other (unlike a conference or journal)

thirdly, the panel are not the reviewers and are not allowed to re-review the proposal, even if they are experts and only in exceptional circumstances will they discount an obviously incompetent or inappropriate review.

As a recipient of significant funding from the research councils, I am not expressing this through some sour grapes emotion, but more on behalf of my bewildered junior colleagues, who frequently receive inexplicably odd reviews and panel decisions. This is not good for community trust in the system - it may be ok at the obvious top 5-10% of proposals, but it results effectively in random decisions quite shortly after the very top (all 6s) ranked research. This is not good for confidence.

I can't give examples as that would be a breach of confidentiality, but everyone I know can tell a tale.

Maybe the new system after the recent review will involve people who have an answer to why the EPSRC and other councils should have a unique, and uniquely odd system. I have never heard an evidence based response to the comments above, which I have made several times to officials from the research councils. Of course, since they themselves are not seconded from the community, how would they know, in any case? However, they could try talking to colleagues in other countries a bit more and see what works (or not) to persuade us that this is not just some random "we do it this way because we always have done"...