Display Japanese Characters with UTF-8

Japanese text is written with Hiragana, Katakana and Kanji characters. The easiest way to display them on a web page is to use the UTF-8 character encoding.

UTF-8 allows ASCII and non-ASCII characters to be mixed on the same page. ASCII character codes all lie in the range 0-127, which needs only 7 bits, so the 8th (top) bit of an ASCII byte is always 0. UTF-8 sets the 8th bit in every byte of its multi-byte codes, so those bytes can never be mistaken for ASCII.
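As a small illustration of the point above, the following Python sketch encodes a string that mixes one ASCII letter with one Hiragana character and splits the resulting bytes by their top bit. The sample string and variable names are chosen only for this example.

```python
# Encode a string containing one ASCII letter and one Hiragana character.
mixed = "Aあ"
encoded = mixed.encode("utf-8")

# ASCII bytes have the top (8th) bit clear; every byte of a
# multi-byte UTF-8 code has it set.
ascii_bytes = [b for b in encoded if b < 0x80]
non_ascii_bytes = [b for b in encoded if b >= 0x80]

for b in encoded:
    kind = "ASCII" if b < 0x80 else "multi-byte"
    print(f"{b:02X}h  {b:08b}  {kind}")
```

Because the two groups of byte values never overlap, a program can scan UTF-8 text for ASCII characters such as "<" or ">" without decoding the Japanese characters first.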

The Japanese character sets are mapped into number ranges as follows:

Hiragana: 3040h - 309Fh
Katakana: 30A0h - 30FFh
Kanji (CJK Unified Ideographs): 4E00h - 9FFFh
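A character's script can be checked by testing its character number against the standard Unicode block boundaries. The sketch below is illustrative only: the function name `japanese_script` is made up for this example, and the Kanji test covers only the main CJK Unified Ideographs block, not the rarer extension blocks.

```python
def japanese_script(ch):
    """Classify a single character by its Unicode character number."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:   # Hiragana block
        return "Hiragana"
    if 0x30A0 <= cp <= 0x30FF:   # Katakana block
        return "Katakana"
    if 0x4E00 <= cp <= 0x9FFF:   # main CJK Unified Ideographs block
        return "Kanji"
    return "other"

for ch in "あア日A":
    print(ch, f"{ord(ch):04X}h", japanese_script(ch))
```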

The UTF-8 codes are generated from the character number by mapping bits to a template as follows:

For 16-bit numbers the UTF-8 code is: 1110xxxx 10xxxxxx 10xxxxxx (the 16 x's represent the 16 bits of the character number).
There are other templates for other bit lengths, but most Japanese characters need 16 bits. Japanese UTF-8 codes are therefore usually 3 bytes long, with a first byte of E3h for Hiragana and Katakana and E4h to E9h for Kanji.
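The 3-byte template can be worked through by hand with a few shifts and masks. The following Python sketch does that and compares the result with Python's built-in encoder. The function name `encode_3byte` is hypothetical, and it handles only the 16-bit case described above.

```python
def encode_3byte(code_point):
    """Fill the template 1110xxxx 10xxxxxx 10xxxxxx with a 16-bit number."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("the 3-byte template only covers 0800h-FFFFh")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not valid characters")
    b1 = 0xE0 | (code_point >> 12)          # 1110 + top 4 bits
    b2 = 0x80 | ((code_point >> 6) & 0x3F)  # 10 + middle 6 bits
    b3 = 0x80 | (code_point & 0x3F)         # 10 + low 6 bits
    return bytes([b1, b2, b3])

for ch in "あア日":
    manual = encode_3byte(ord(ch))
    builtin = ch.encode("utf-8")
    print(ch, manual.hex(" ").upper(), manual == builtin)
```

The first byte depends only on the top 4 bits of the character number, which is why each range of Japanese characters shares a small set of leading bytes.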

To get the characters to display on the web page, save the file in UTF-8 and add the line:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
immediately after <head> in your HTML.
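The declaration only helps if the bytes in the file really are UTF-8. One way to be sure is to generate the page from a script that states the encoding explicitly when writing. This is a minimal sketch: the function names, the page title and the file name in the usage note are all placeholders.

```python
def build_page(body_text):
    """Return a minimal HTML page that declares UTF-8 right after <head>."""
    return (
        "<html>\n"
        "<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
        "<title>Japanese test</title>\n"
        "</head>\n"
        f"<body>{body_text}</body>\n"
        "</html>\n"
    )

def write_page(path, body_text):
    """Write the page so the bytes on disk match the declared charset."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_page(body_text))
```

Calling write_page("japanese.html", "こんにちは") would produce a file that a browser should display as Japanese text rather than as garbled bytes.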
