On HTML Butchering

by Annika Backstrom in misc, on 11 July 2004. It is tagged #Web.

Jen has a resume. She writes this resume in Microsoft Word, and Microsoft Word exports a butchered mockery of HTML. (Note: this HTML has been modified to protect non-Internet Explorer browsers.) Word does not encourage semantic markup. It encourages table-based layouts and makes it easy to put arbitrary styles on elements. Consider this example:

<tr style='height:5.55pt'>
    <td width="2%" valign=top style='width:2.66%;padding:0in 5.4pt 0in 5.4pt;
        height:5.55pt'><a name="_Hlk67893959"></a>
        <p class=Institution> </p>
    </td>
    <td style='border:none;padding:0in 0in 0in 0in' width="97%" colspan=2><p class='MsoNormal'> </td>
 </tr>

For those of you unfamiliar with HTML, that code produces something similar to this:


No, that's not a typo; it really creates a blank line on the page.

It took me a couple hours over the course of a few days to hand-create Jen's resume. The end result is 10,000 characters and 370 lines shorter, much more semantic, and does not use the character "n" in a dingbats font to simulate list bullets. Thanks, Microsoft.
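
For illustration only, here is a minimal sketch of the kind of markup a hand-written version might use. It is not taken from Jen's actual resume: the class name, heading level, and text are placeholders I made up. The idea is that a real list element supplies the bullets, and a stylesheet margin supplies the spacing that Word built from an empty table row.

```html
<!-- Hypothetical sketch; the class name and all text are placeholders -->
<style>
  /* Vertical spacing comes from a margin, not an empty table row */
  .institution { margin-top: 1em; }
</style>

<h3 class="institution">Institution name</h3>
<ul>
  <!-- The browser draws the bullets; no dingbat characters needed -->
  <li>First item</li>
  <li>Second item</li>
</ul>
```

Markup along these lines also keeps presentation in one place, so changing the spacing or bullet style means editing a single rule rather than every row.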