CodeWarrior Optimization

I guess I’m spoiled by my experiences with Borland compilers, but I learned today that the Metrowerks optimizer doesn’t optimize as thoroughly as one would expect.

MWinDrawAlignChars is a good example of this. It's a simple function in my MorePalmOS library that draws text with a specific alignment. (In case you're curious, _require is a simple macro that tests its first argument. If that argument is zero, it jumps to the label given as the second argument. In debug builds, it also shows a warning that an unexpected error has occurred. Simple enough. I also have a _reject, which takes the jump if its argument is non-zero.)
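The macro definitions aren't shown here, so the following is a hypothetical reconstruction based only on the description above; the real MorePalmOS versions may well differ. As an assumption, the debug warning is approximated with fprintf to stderr (enabled when NDEBUG is not defined) instead of whatever the device build displays, and first_char and checked_div are made-up helpers that show the macros in use.

```c
#include <stdio.h>

/* Hypothetical debug warning: prints the failed expression and location
   to stderr in debug builds, compiles to nothing when NDEBUG is defined. */
#ifndef NDEBUG
#define MP_WARN(expr) \
    fprintf(stderr, "unexpected error: %s (%s:%d)\n", #expr, __FILE__, __LINE__)
#else
#define MP_WARN(expr) ((void)0)
#endif

/* Jump to `label` if `expr` is zero (hypothetical reconstruction). */
#define _require(expr, label) \
    do { if (!(expr)) { MP_WARN(expr); goto label; } } while (0)

/* Jump to `label` if `expr` is non-zero (hypothetical reconstruction). */
#define _reject(expr, label) \
    do { if (expr) { MP_WARN(expr); goto label; } } while (0)

/* Hypothetical helper: returns the first character, or -1 for NULL. */
int first_char(const char *s)
{
    _require(s, fail);          /* NULL pointer: jump to fail */
    return s[0];
fail:
    return -1;
}

/* Hypothetical helper: returns a / b, or 0 when b is zero. */
int checked_div(int a, int b)
{
    _reject(b == 0, fail);      /* zero divisor: jump to fail */
    return a / b;
fail:
    return 0;
}
```

The do { ... } while (0) wrapper lets each macro behave like a single statement, so it is safe after an unbraced if.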

Here’s the source:

Coord MWinDrawAlignChars( const char *str, UInt32 len, Coord x, Coord y,
       Alignment align )
{
   Coord strWidth;
   _require( str, noString );
   _require( len, noString );
   strWidth = FntCharsWidth( str, len );
   if ( align != Left )
   {
       x -= ( align == Center ) ? ( strWidth >> 1 ) : strWidth;
   }
   WinPaintChars( str, len, x, y );
   return x + strWidth;
noString:
   return x;
}

I’m going to focus on the last three lines. Here’s the code it generates:

    return x + strWidth;
noString:
0000004E: 3004               move.w    d4,d0
00000050: D043               add.w     d3,d0
00000052: 4FEF 000A          lea       10(a7),a7
00000056: 6002               bra.s     *+4            ; 0x0000005a
   return x;
00000058: 3004               move.w    d4,d0

Now let's look at an alternate ending. It accomplishes exactly the same thing.

   WinPaintChars( str, len, x, y );
   x += strWidth;
noString:
   return x;
}
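To check the alignment arithmetic off-device, here is a sketch of the whole function with the alternate ending. It rests on several assumptions: Coord and UInt32 are mapped to <stdint.h> types, the Alignment enum (including Right, which isn't shown above) is guessed, _require is simplified to drop the debug warning, and FntCharsWidth and WinPaintChars are replaced by hypothetical stubs that treat every character as 6 pixels wide and record where painting would start.

```c
#include <stdint.h>
#include <stddef.h>

/* Desktop stand-ins for Palm OS types (assumption). */
typedef int16_t  Coord;
typedef uint32_t UInt32;

/* Hypothetical enum; only Left and Center appear in the original code. */
typedef enum { Left, Center, Right } Alignment;

/* Simplified _require: jump to `label` if `expr` is zero, no warning. */
#define _require(expr, label) do { if (!(expr)) goto label; } while (0)

/* Hypothetical stubs replacing the Palm OS calls. */
static Coord lastPaintX;

static Coord FntCharsWidth(const char *str, UInt32 len)
{
    (void)str;
    return (Coord)(len * 6);    /* every character is 6 pixels wide */
}

static void WinPaintChars(const char *str, UInt32 len, Coord x, Coord y)
{
    (void)str; (void)len; (void)y;
    lastPaintX = x;             /* remember where the text would start */
}

Coord MWinDrawAlignChars(const char *str, UInt32 len, Coord x, Coord y,
                         Alignment align)
{
    Coord strWidth;
    _require(str, noString);
    _require(len, noString);
    strWidth = FntCharsWidth(str, len);
    if (align != Left)
    {
        /* Center shifts left by half the width, anything else by all of it. */
        x -= (align == Center) ? (strWidth >> 1) : strWidth;
    }
    WinPaintChars(str, len, x, y);
    x += strWidth;              /* fall through to the single return */
noString:
    return x;
}
```

With a 4-character string at x = 100 (width 24 under the stub), Left paints at 100 and returns 124, Center paints at 88 and returns 112, and Right paints at 76 and returns 100.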

Again, let’s focus on the last three lines:

    x += strWidth;
noString:
0000004E: D843               add.w     d3,d4
00000050: 4FEF 000A          lea       10(a7),a7
   return x;
00000054: 3004               move.w    d4,d0

That's a savings of four bytes (the tail shrinks from 12 bytes to 8)! Why?

  • Because execution no longer has to jump over the noString code, we save the two-byte short branch. (The branch skips ahead to some stack cleanup code.)
  • Because we're incrementing x in place, the compiler doesn't need a temporary register to hold the sum. That saves the two-byte move.w that copied x into d0 before the add.

CodeWarrior didn't do anything terribly flawed the first time; it gave us exactly the code we asked for. Borland's compilers would have seen what we were trying to accomplish and done it in a better way. Neither approach is more right than the other, but it shows that you need to stay aware of code generation.

I suppose this also explains why PalmSource's event handling code usually sets a local handled variable and returns it at the end, rather than returning true or false immediately, which would seem to be more efficient.
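A minimal sketch of that single-exit pattern follows. It uses hypothetical stand-ins rather than the real Palm OS headers (Event, EventKind, the control ID 1000, and selectCount are all made up for illustration), and whether it actually comes out smaller than early returns depends on the compiler, so it's worth checking the disassembly as above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for Palm OS event types. */
typedef enum { frmOpenEvent, ctlSelectEvent, penDownEvent } EventKind;
typedef struct { EventKind eType; int controlID; } Event;

static int selectCount;         /* hypothetical side effect for the example */

/* Single-exit style: every path sets `handled` and falls through
   to one shared return at the end. */
bool MainFormHandleEvent(const Event *eventP)
{
    bool handled = false;

    switch (eventP->eType)
    {
    case frmOpenEvent:
        handled = true;
        break;
    case ctlSelectEvent:
        if (eventP->controlID == 1000)   /* hypothetical button ID */
        {
            selectCount++;
            handled = true;
        }
        break;
    default:
        break;
    }

    return handled;
}
```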
