For the last couple o’ days (Firefox users might have noticed) one of the scripts of my WordPress installation was saved as UTF-8 with a BOM. BOM is short for byte order mark: a sequence of bytes that may precede the actual content of a file to indicate which Unicode encoding the file is stored in and, where that matters, in which order the bytes of each code unit are stored. The BOM for UTF-8 is EF BB BF. UTF-8 has no byte-order variants, so there the BOM only acts as a signature. UTF-16 and UTF-32 differentiate between little and big endian, and their BOMs differ accordingly.
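To make the byte sequences concrete, here is a minimal sketch in Python (not what WordPress or any browser actually runs) that checks the first bytes of some data against the BOM constants from the standard `codecs` module. The helper name `detect_bom` and the encoding labels it returns are made up for this illustration.

```python
import codecs

# Longer marks come first: the UTF-32 LE mark (FF FE 00 00) starts with
# the UTF-16 LE mark (FF FE), so the order of the checks matters.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none.

    Hypothetical helper for illustration only.
    """
    for mark, name in BOMS:
        if data.startswith(mark):
            return name
    return None


print(codecs.BOM_UTF8.hex(" "))  # the three UTF-8 BOM bytes in hex
print(detect_bom(codecs.BOM_UTF8 + b"<?php"))
```

A file without a BOM simply yields `None` here, which says nothing about its real encoding. The BOM is an optional hint, not a requirement.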
By accident I stored one of the scripts that gets included by index.php as UTF-8 with the BOM enabled, so the three BOM bytes ended up in the page output. The data is sent as us-ascii, though. So the BOM is not recognized as such and is displayed as characters. Well, actually only Firefox does this. Opera does not display the characters, which might be considered a bug in Opera. After re-saving the affected script as ANSI, it now works again. By ‘work’ I also mean that it validates again, because the BOM had invalidated the page. That’s the power of encodings.
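What this looks like can be sketched in a few lines of Python. This is an illustration under assumptions, not a reproduction of what Firefox does internally: I assume the browser ends up treating the bytes as a single-byte Latin-1 style encoding, and both the sample script content and the helper `strip_utf8_bom` are invented for the example.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Invented stand-in for the affected script: BOM first, then the PHP code.
script = codecs.BOM_UTF8 + b"<?php echo 'hi'; ?>"

# Assumption: the receiving side decodes byte by byte (Latin-1 here),
# so each BOM byte becomes one visible character.
shown = script.decode("latin-1")
print(repr(shown[:3]))

# Removing the three bytes leaves plain ASCII that is safe to serve.
clean = strip_utf8_bom(script)
print(clean.decode("ascii"))
```

When reading such a file in Python, decoding with the `utf-8-sig` codec drops a leading BOM if there is one, which avoids the manual check.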
More on encoding, utf and unicode can be found at The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
Appendix
I’ve collected some more links in the DenkzeitWiki on page UniCode.




