recently, I was re-reading the classic old article on Smashing the stack for fun and profit in phrack, and wondering where the positive alternative lesson might be. Quite a while ago, I did a port of the SR(Synchronized Resources) programming language out of Arizona, to the newly minted RS6000 system out of IBM. The language is an elegant system for teaching principles of concurrency in lots of nice ways - at the time we didnt have a single agreed way to do it (e.g. wrong like in C11, or possibly ok in Java), so there was a need for a pedagogic approach and SR together with a nice book was very cool. However, we'd just bought a bunch of new IBM AIX systems with the new POWER PC RISC processor, which had a whole new instruction set - lets leave aside the amising idea of a "reduced" or "regular" instruction set that includes "floating point multiple and add" in its portfolio. However, in terms of registers, its a pretty nice system.
So why is porting SR to the Power CPU reminiscent of stack smashing, I hear you cry?
Because, you need to implement the threads system it uses to emulate real multi-core, I hear myself answer. And what does it take to do that, you continue? well you need to save the current context (like all the registers and thread state (i.e. PC, stack) move to a different thread/stack by calling the scheduler with a pointer to that context. To do this involves becoming familiar with the stack format used for most programming languages.as well as all the important (i.e. all) registers etc. So basically, slightly more than the Phrack folks....basically, writing stuff that moves to another exection context, but can get back again correctly, is harder:-)
You can see some nice examples of the context switch stuff for a variety of processors in the (not supported) SR archive
Meanwhile, I was also reading about the integrity check used for DNSSEC caches. so that will be reported elsewhere, but the interesting thing is that its a weak version of the IP header (and TCP) checksum algorithm. Again this is something that excercises computer science #101 - you need to add up all the 16 bit fields in the buffer (imagine you have a n byte buffer, then if n is odd, add a zero value byte and sum it as a vector of 16 bit values using "end round carry" (or ones-complement) artithmentic - basically, you have a 32 bit accumulator, and loop over the buffer. when you are done, you do one more thing - in the checksum case: check any bits above the 16 least sig are set, and fold them in (add again) and then do one more add in case that overflows too. In the integrity check case, just 0 any bits in the more significant 16 bits (i.e. && accumulator with 0x0ffff). For the ARM IP cksum case, see this
code with inline asm - the crucial bit's lines 78&79 after the bne loop.
amusing, back in the early days, i remember someone loop unrolling asm code for hte M68000 (that's not a 68020 or 68010, but 68000 - that was sold as a "Codata" computer in the uk, but was a Sun-1, i think, never sold by Sun) - the code went slower as the loop was now to big to fit in the miniscule instruction cache of said CPU...
hum...........what could possibly go oddly wrong with the allegedly simpler (by 1 instruction) integrity check algorithm? I leave that as an exercise for the coder
So why is porting SR to the Power CPU reminiscent of stack smashing, I hear you cry?
Because, you need to implement the threads system it uses to emulate real multi-core, I hear myself answer. And what does it take to do that, you continue? well you need to save the current context (like all the registers and thread state (i.e. PC, stack) move to a different thread/stack by calling the scheduler with a pointer to that context. To do this involves becoming familiar with the stack format used for most programming languages.as well as all the important (i.e. all) registers etc. So basically, slightly more than the Phrack folks....basically, writing stuff that moves to another exection context, but can get back again correctly, is harder:-)
You can see some nice examples of the context switch stuff for a variety of processors in the (not supported) SR archive
Meanwhile, I was also reading about the integrity check used for DNSSEC caches. so that will be reported elsewhere, but the interesting thing is that its a weak version of the IP header (and TCP) checksum algorithm. Again this is something that excercises computer science #101 - you need to add up all the 16 bit fields in the buffer (imagine you have a n byte buffer, then if n is odd, add a zero value byte and sum it as a vector of 16 bit values using "end round carry" (or ones-complement) artithmentic - basically, you have a 32 bit accumulator, and loop over the buffer. when you are done, you do one more thing - in the checksum case: check any bits above the 16 least sig are set, and fold them in (add again) and then do one more add in case that overflows too. In the integrity check case, just 0 any bits in the more significant 16 bits (i.e. && accumulator with 0x0ffff). For the ARM IP cksum case, see this
code with inline asm - the crucial bit's lines 78&79 after the bne loop.
amusing, back in the early days, i remember someone loop unrolling asm code for hte M68000 (that's not a 68020 or 68010, but 68000 - that was sold as a "Codata" computer in the uk, but was a Sun-1, i think, never sold by Sun) - the code went slower as the loop was now to big to fit in the miniscule instruction cache of said CPU...
hum...........what could possibly go oddly wrong with the allegedly simpler (by 1 instruction) integrity check algorithm? I leave that as an exercise for the coder
No comments:
Post a Comment