Archive for December, 2004

The bite of the rabid Open Source advocate

Believing in Open Source is one thing: I’ve contributed code to LGPL and GPL (PILRC) projects, and I’ve even started a BOOST project (MorePalmOS; the BOOST license is similar to BSD, but without the need for credit). But somehow, believing that the license that code is distributed under should be respected makes me an enemy of rabid Open Source advocates.

I’ve spent over 1,000 hours on my current product. If you value my time at BC’s minimum wage, that’s $8,000 invested so far, counting only raw labour. (A fairer price would include at least part of my PC’s cost, the development software I had to buy specifically for this project since the Open Source tools were not adequate, and the hardware I had to purchase specifically for this project. Plus I’d be earning more than minimum wage!) By the end of this project, I expect to have put over 10,000 hours of my own effort into it. That does not include testing, graphics work or marketing.

These people would have you believe that they should be legally free to not only use my product without paying for it, but also to offer it to other users for free or even sell it to other users.

Today, I logged in to find that three people on Slashdot had declared me “foe” for daring to suggest that stealing software was still stealing, and that I had the right to decide who can and cannot use my proprietary work. Also, checking my web site’s access logs revealed someone had tried to break in, something that hasn’t happened before. Coincidence? I don’t really think so.

So today’s lesson: Don’t dare suggest on Slashdot that anyone’s work product should be property of the creator.

(I should note that I don’t believe this of the majority of Open Source advocates, or even the majority of Slashdot users. I am one myself! But there seems to be a major problem in this world with poorly-raised children growing into stupid adults and thinking they own everything.)

Algorithms vs. ARMlets

The last few days I’ve been working on a set of polygon routines. (Don’t bother asking for them — if I get clearance to put them up, I will.)

The initial algorithm used floating point math. I timed it, but did not write down the time. I converted it to pure integer and timed it — it was about 5 seconds. I wasn’t happy with the results, though… the rounding errors looked bad.

“Why not make it an ARMlet?” you are probably asking. Well, I realized the algorithm needed some tuning first.

So I changed it back to floating point and started optimizing. First, I preflighted the calculations that needed to be done only once per vertex, instead of once per vertex per screen row or once per vertex per pixel. Next, I moved the calculations that only had to be done once per screen row. I discovered to my surprise that I had no floating point calculations per pixel anymore, despite still having full floating-point accuracy. I fixed up what I had and started counting instructions, then started counting division or multiplication instructions. Finally, I had the code about as good as I could get it without dipping into assembler.

Then I converted it to an ARMlet. (As an aside: the Metrowerks ARM tools are painfully buggy. Just selecting them in the Linker panel starts Codewarrior crashing constantly. Because of this, it took much longer than it should have to finish the conversion.) When I was finished, the resulting code took 0.04 seconds to run, and fully tuned 680x0 code took 0.29 seconds to run: the 680x0 code took 7.25 times longer.

On a whim, I decided to paste in my integer-only code. (I couldn’t find the original floating point code I had devised anymore. I didn’t bother committing it to source control once I got it working, since it was too slow to keep.) I kept the more efficient loops, only replacing the core calculation. The result took 2.1 seconds in 68k, and 0.83 seconds for the ARMlet. That means that if I had tuned only the loops and converted to an ARMlet, the code would be roughly 20 times slower than it is now. The untuned ARM code takes about 3 times as long as the tuned 68k code.

Granted, the tuning took more effort as I had to think about it more. But if I had to do just one, I was much better off with the tuning.
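A minimal sketch of that kind of hoisting, assuming a convex polygon and a tiny one-byte-per-pixel buffer. It is not the routines discussed above; every name in it (Vertex, Edge, fillConvex, the 16x16 screen) is made up for illustration. The division happens once per edge, one multiply happens per edge per row, and the per-pixel loop is integer only.

```c
#define W 16
#define H 16
#define MAX_EDGES 8

typedef struct { float x, y; } Vertex;

/* Per-edge values computed once, before any row is visited. */
typedef struct {
    float yTop, yBottom;   /* vertical extent of the edge */
    float xAtTop;          /* x where the edge meets yTop */
    float dxPerRow;        /* inverse slope: the only division, done once */
} Edge;

static unsigned char screen[H][W];   /* hypothetical frame buffer */

static int buildEdges(const Vertex *v, int n, Edge *edges)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        Vertex a = v[i], b = v[(i + 1) % n];
        if (a.y == b.y) continue;    /* horizontal edges contribute no crossing */
        if (a.y > b.y) { Vertex t = a; a = b; b = t; }
        edges[count].yTop = a.y;
        edges[count].yBottom = b.y;
        edges[count].xAtTop = a.x;
        edges[count].dxPerRow = (b.x - a.x) / (b.y - a.y);
        count++;
    }
    return count;
}

static void fillConvex(const Vertex *v, int n)
{
    Edge edges[MAX_EDGES];
    if (n < 3 || n > MAX_EDGES) return;
    int edgeCount = buildEdges(v, n, edges);

    for (int row = 0; row < H; row++) {
        /* Once per row: find where the edges cross this row's centre. */
        float yc = row + 0.5f;
        float left = 0, right = 0;
        int hits = 0;
        for (int e = 0; e < edgeCount; e++) {
            if (yc < edges[e].yTop || yc >= edges[e].yBottom) continue;
            float x = edges[e].xAtTop + (yc - edges[e].yTop) * edges[e].dxPerRow;
            if (hits == 0 || x < left) left = x;
            if (hits == 0 || x > right) right = x;
            hits++;
        }
        if (hits < 2) continue;

        /* Round once per row, clamp to the buffer. */
        int x0 = (int)(left + 0.5f);
        int x1 = (int)(right + 0.5f);
        if (x0 < 0) x0 = 0;
        if (x1 > W) x1 = W;

        /* Per pixel: integer work only. */
        for (int x = x0; x < x1; x++)
            screen[row][x] = 1;
    }
}

/* Helper for checking results: counts the pixels that were set. */
static int filledPixels(void)
{
    int total = 0;
    for (int row = 0; row < H; row++)
        for (int x = 0; x < W; x++)
            total += screen[row][x];
    return total;
}
```

Calling fillConvex with the four corners (2,2), (10,2), (10,6), (2,6) fills rows 2 through 5 and columns 2 through 9.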

Rendering Performance
code                               time (s)
original integer in 68k            2.10
original integer in ARM            0.83
optimized floating-point in 68k    0.29
optimized floating-point in ARM    0.04

(All times are on a Tungsten T3.)

Codewarrior Command Line Compiler

Try compiling this code with the Codewarrior Palm 9 command-line tools (which were a big selling point of the upgrade):

class CTest
{
public:
    CTest( CNonExistentClass* obj )
    {
    };
    virtual ~CTest ( )
    {
    };
};

This will cause mwc68k to crash with an access fault.

Update: On codewarrior.palm, Ben Combee (former Palm development lead at Metrowerks) confirmed the bug.

It seems the 9.3 command line compilers are totally borked. Metrowerks has known about it since at least February 2004. The 9.2 ones work. Except, of course, that the code generated by 9.0, 9.1 and 9.2 is completely unusable. In other words, Metrowerks never delivered one of their key features of Codewarrior Palm v9.

Block Overruns

The Palm heap is stored as a linked list of headers. Between those headers are chunks, either allocated or free. When you allocate a chunk, the memory manager finds a free chunk. If it is exactly the right size, it’s returned. If not, the memory manager breaks it into two pieces: your chunk, and the remaining free chunk. Each of these chunks has a header. When you release a chunk, the memory manager checks the chunks before and after. If they’re free, the memory manager merges them together and one of the headers is abandoned.

Thus, there will always be a header before and after your block. If you write past either end of your block, you will stomp on one of the control headers and have a corrupt heap. A corrupt heap causes the memory manager to do unexpected things, and you’re pretty much doomed at that point.

The Palm debug ROM is supposed to guard against this and show an error when it occurs, but in some cases this seems to fail. Learn to recognize the symptoms (mostly deep recursion and other weird behaviour deep in the memory manager) and you’ll be better off. And consider checking the heap in the debug target of your application before and after you use a complicated function to write to memory. I plan on adding some of these checks to my application tomorrow morning.
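To make the failure mode concrete, here is a toy model in portable C. It is only a sketch under loud assumptions: the header layout (a 16-bit size and a 16-bit used flag), the 256-byte arena, and the names heapInit, heapAlloc and heapCheck are all invented for illustration and are not the Palm OS memory manager or its API. Freeing and merging are left out. The point is how a write past the end of one chunk lands on the next chunk’s header, and how a simple walk of the headers before and after a risky call can catch it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARENA_SIZE 256

/* Toy chunk header; the real Palm OS layout is different. */
typedef struct {
    uint16_t size;   /* payload bytes that follow this header */
    uint16_t used;   /* 1 = allocated, 0 = free */
} ChunkHeader;

static unsigned char arena[ARENA_SIZE];

/* Headers are copied in and out with memcpy to avoid alignment problems. */
static ChunkHeader readHeader(size_t off)
{
    ChunkHeader h;
    memcpy(&h, arena + off, sizeof h);
    return h;
}

static void writeHeader(size_t off, ChunkHeader h)
{
    memcpy(arena + off, &h, sizeof h);
}

/* Start with one free chunk covering the whole arena. */
static void heapInit(void)
{
    ChunkHeader h;
    h.size = (uint16_t)(ARENA_SIZE - sizeof(ChunkHeader));
    h.used = 0;
    writeHeader(0, h);
}

/* First fit: use a free chunk as-is, or split it in two. */
static void *heapAlloc(size_t want)
{
    size_t need = (want + 3u) & ~(size_t)3u;   /* round up to 4 bytes */
    size_t off = 0;
    if (need == 0) return NULL;
    while (off + sizeof(ChunkHeader) <= ARENA_SIZE) {
        ChunkHeader h = readHeader(off);
        if (!h.used && h.size >= need) {
            size_t spare = h.size - need;
            if (spare > sizeof(ChunkHeader)) {
                /* Split: the remainder gets its own header. */
                ChunkHeader rest;
                rest.size = (uint16_t)(spare - sizeof(ChunkHeader));
                rest.used = 0;
                writeHeader(off + sizeof(ChunkHeader) + need, rest);
                h.size = (uint16_t)need;
            }
            h.used = 1;
            writeHeader(off, h);
            return arena + off + sizeof(ChunkHeader);
        }
        off += sizeof(ChunkHeader) + h.size;
    }
    return NULL;
}

/* Walk every header; returns 1 if the chain is sane, 0 if corrupt. */
static int heapCheck(void)
{
    size_t off = 0;
    while (off < ARENA_SIZE) {
        ChunkHeader h;
        if (off + sizeof(ChunkHeader) > ARENA_SIZE) return 0;
        h = readHeader(off);
        if (h.used > 1) return 0;
        if (h.size == 0 || (h.size & 3u) != 0) return 0;
        if (h.size > ARENA_SIZE - off - sizeof(ChunkHeader)) return 0;
        off += sizeof(ChunkHeader) + h.size;
    }
    return off == ARENA_SIZE;
}
```

In this model, allocating two 8-byte chunks and then writing 12 bytes into the first one overwrites the second chunk’s header, and heapCheck reports the damage. Wrapping a suspect call between two such checks in a debug build narrows down where the stomp happened.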

Palm on Linux

So, today PalmSource announced a goal of running Palm OS applications on Linux devices.

I think people expecting a Palm OS module they can install on a Linux handheld are going to be disappointed. And I think people expecting PalmSource to open up a lot more of their tools are going to be disappointed too.

This doesn’t seem to be as much about making Palm OS a Linux application layer as it is about making Palm OS more portable across platforms. Hardware vendors will undoubtedly have to do customization to make Palm OS run on their hardware; it’s just that with this new plan it will be possible to get Palm OS working on more hardware.

I’ve been trying to think of an analogy for this, but have been unable to. Mac OS X threw away their old kernel, and PalmSource isn’t doing that and shouldn’t. The best analogy might be the cross-platform NextStep APIs, but it seems PalmSource plans to port their visual look as well. (Good idea!)

On the plus side, this could be big news for developers willing and able to write vertical market applications. We’re going to see a huge jump in the variety of handhelds running Palm OS over the next few years, including some mobile devices we don’t even consider handhelds.

Hopefully, compile farms will be made available, or PalmSource will update the tools a lot… I know I don’t want to be maintaining a half dozen or more targets with current tools.

I’d like to see a return to some of PalmSource’s previous platform-agnosticism, and I’d like to be able to recompile PalmSource’s tools for hosts other than Windows. But I don’t expect to see either of these.

Still, just the increase in variety will be nice.

Codewarrior Optimization

I guess I’m spoiled by my experiences with Borland compilers, but I learned today that the Metrowerks optimizer doesn’t optimize as thoroughly as one would expect.

MWinDrawAlignChars is a good example of this. This is a simple function in my MorePalmOS library that outputs text with a specific alignment. (_require, in case you’re curious, is a simple macro that tests the first argument. If it is zero, execution jumps to the label named by the second argument. In debug builds, it also shows a warning that an unexpected error has occurred. Simple enough. I also have a _reject, which performs the same jump if the argument is non-zero.)
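For readers who have not seen this idiom, macros along these lines would behave as described. This is a guess at their shape in portable C, not the actual MorePalmOS definitions: DEBUG_BUILD, REPORT_UNEXPECTED and the checkedLength example are hypothetical names, and the debug report here simply goes to stderr.

```c
#include <stdio.h>

/* Hypothetical debug hook: only reports when DEBUG_BUILD is defined. */
#ifdef DEBUG_BUILD
#define REPORT_UNEXPECTED(text) \
    fprintf(stderr, "unexpected: %s (%s:%d)\n", text, __FILE__, __LINE__)
#else
#define REPORT_UNEXPECTED(text) ((void)0)
#endif

/* Jump to label when cond is zero. */
#define _require(cond, label) \
    do { if (!(cond)) { REPORT_UNEXPECTED(#cond); goto label; } } while (0)

/* Jump to label when cond is non-zero. */
#define _reject(cond, label) \
    do { if (cond) { REPORT_UNEXPECTED(#cond); goto label; } } while (0)

/* Example use: length of str, or -1 if str is NULL or longer than max. */
static int checkedLength(const char *str, int max)
{
    int len = 0;
    _require(str, bad);
    while (str[len] != '\0')
        len++;
    _reject(len > max, bad);
    return len;
bad:
    return -1;
}
```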

Here’s the source:

Coord MWinDrawAlignChars( const char *str, UInt32 len, Coord x, Coord y,
       Alignment align )
{
   Coord strWidth;
   _require( str, noString );
   _require( len, noString );
   strWidth = FntCharsWidth( str, len );
   if ( align != Left )
   {
       x -= ( align == Center ) ? ( strWidth >> 1 ) : strWidth;
   }
   WinPaintChars( str, len, x, y );
   return x + strWidth;
noString:
   return x;
}

I’m going to focus on the last three lines. Here’s the code it generates:

    return x + strWidth;
noString:
0000004E: 3004               move.w    d4,d0
00000050: D043               add.w     d3,d0
00000052: 4FEF 000A          lea       10(a7),a7
00000056: 6002               bra.s     *+4            ; 0x0000005a
   return x;
00000058: 3004               move.w    d4,d0

Now let’s try looking at an alternate ending. Exactly the same thing is accomplished.

   WinPaintChars( str, len, x, y );
   x += strWidth;
noString:
   return x;
}

Again, let’s focus on the last three lines:

    x += strWidth;
noString:
0000004E: D843               add.w     d3,d4
00000050: 4FEF 000A          lea       10(a7),a7
   return x;
00000054: 3004               move.w    d4,d0

That’s a savings of four bytes! Why?

  • Because we haven’t interrupted program flow, we save two bytes on the short branch. (The branch is to some stack cleanup code.)
  • Because we’re incrementing x directly, we don’t need a temporary register to store the value. This saves two bytes on not moving the original value of x to a temporary register.

Codewarrior didn’t do anything terribly flawed the first time. It gave us exactly the code we asked for. Borland’s compilers would have seen what we were trying to accomplish and done it in a better way. Neither is more right than the other, but it shows you need to keep aware of code generation.

I suppose this also explains why PalmSource’s event handling code usually sets a handled local variable and returns it at the end, rather than returning true and false immediately as would seem to be more efficient.
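The two styles side by side, as a sketch in portable C. The event types and counters are stand-ins invented for illustration, not the Palm OS event structures. Both functions behave identically; whether the single-exit form actually comes out smaller depends on the compiler and would need to be checked in the generated code, as above.

```c
/* Hypothetical stand-ins for real event types. */
typedef enum { MockOpen, MockTap, MockOther } MockEventType;
typedef struct { MockEventType type; } MockEvent;

static int opens, taps;

/* Early-return style: every case leaves the function on its own. */
static int handleEarlyReturn(const MockEvent *e)
{
    switch (e->type) {
    case MockOpen:
        opens++;
        return 1;
    case MockTap:
        taps++;
        return 1;
    default:
        return 0;
    }
}

/* Single-exit style: set a local flag and fall through to one return. */
static int handleSingleExit(const MockEvent *e)
{
    int handled = 0;
    switch (e->type) {
    case MockOpen:
        opens++;
        handled = 1;
        break;
    case MockTap:
        taps++;
        handled = 1;
        break;
    default:
        break;
    }
    return handled;
}
```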

MorePalmOS gets a new license

I’ve switched MorePalmOS to the BOOST license. This license allows object code to be distributed without a copyright notice in the docs/about box.

It amazes me how many things there are that could be included in MorePalmOS. Sooner or later I’m going to have to announce this to the world…

The latest addition is a technique for getting the current application’s signature. Unfortunately, there’s a small hole in the documentation and I’m not entirely sure it works from a notification on all devices. It would be nice to know before I announce this library.

Codewarrior Pro 9

I received my copy of Metrowerks CodeWarrior for Palm OS® Platform Development Studio, Professional Edition, version 9, yesterday. (Yeesh! What a name!) The technology behind it looks good. So far, my only major complaints are Windows-isms in it, such as not being able to rebind the key used to cycle between open windows in the same application. It has the feel of a last kick at the can to it, somehow.

But thanks to Ben Combee’s BCTextUtils, I have a yank line key at least!