I just stumbled across a pretty cool project at the intersection of linguistics and IT. Of course, once again, it concerns Chinese language processing …
The aim of the Character Description Language is to provide a description language for Han ideographs. The project seems to be well organized, and they have captured 56k CJK characters, including all of those in the BMP (Unicode's Basic Multilingual Plane).
This data would probably be very useful for developing an Input Method Engine that takes handwriting from a graphics tablet, or for showing the decomposition of characters into their constituent parts. Alas, I have not yet been able to find the database. Is it commercial stuff (namely, Wenlin), safely locked away from the interested public? That would really be a pity …
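
To make the decomposition idea a bit more concrete, here is a minimal Python sketch of what "showing the constituent parts" could look like. Everything in it is an assumption for illustration: the `COMPONENTS` table is a tiny hand-written toy, and `decompose` is a hypothetical helper. Neither comes from the Character Description Language data, which I could not get hold of.

```python
# Toy component table, written by hand for illustration only.
# It is NOT taken from the Character Description Language database.
COMPONENTS = {
    "好": ["女", "子"],
    "明": ["日", "月"],
    "萌": ["艹", "明"],
}


def decompose(char):
    """Recursively expand a character into its leaf components.

    Characters without an entry in COMPONENTS are treated as atomic.
    """
    parts = COMPONENTS.get(char)
    if parts is None:
        return [char]
    leaves = []
    for part in parts:
        leaves.extend(decompose(part))
    return leaves


print(decompose("萌"))  # ['艹', '日', '月']
```

A real description language would presumably also record how the parts are arranged (side by side, stacked, enclosed), which this flat list ignores.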