A UTF-16 encoding conversion function (Python implementation)

Example UTF-16 encoding procedure

The character at code point U+64321 (hexadecimal) is to be encoded in UTF-16. Since it is above U+FFFF, it must be encoded with a surrogate pair, as follows:

v  = 0x64321
v′ = v - 0x10000
   = 0x54321
   = 0101 0100 0011 0010 0001

vh = 0101010000 // higher 10 bits of v′
vl = 1100100001 // lower  10 bits of v′
w1 = 0xD800 // the resulting 1st word is initialized with the high bits
w2 = 0xDC00 // the resulting 2nd word is initialized with the low bits

w1 = w1 | vh
   = 1101 1000 0000 0000 |
            01 0101 0000
   = 1101 1001 0101 0000
   = 0xD950

w2 = w2 | vl
   = 1101 1100 0000 0000 |
            11 0010 0001
   = 1101 1111 0010 0001
   = 0xDF21

The correct UTF-16 encoding for this character is thus the following word sequence:

0xD950 0xDF21
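The arithmetic above can be replayed directly in Python. This is only a minimal sketch of the same steps; the variable names (`v_prime`, `vh`, `vl`, `w1`, `w2`) are chosen to mirror the worked example and are not part of any library.

```python
# Replay the worked example for U+64321 step by step.
v = 0x64321
v_prime = v - 0x10000      # 0x54321, a 20-bit value

vh = v_prime >> 10         # higher 10 bits of v'
vl = v_prime & 0x3FF       # lower 10 bits of v'

w1 = 0xD800 | vh           # first word (high surrogate)
w2 = 0xDC00 | vl           # second word (low surrogate)

print(f"v' = {v_prime:020b}")          # 01010100001100100001
print(f"vh = {vh:010b}")               # 0101010000
print(f"vl = {vl:010b}")               # 1100100001
print(f"w1 = 0x{w1:04X}, w2 = 0x{w2:04X}")  # w1 = 0xD950, w2 = 0xDF21
```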

A Python implementation follows:

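def EncodeUTF16(u):
    """Encode a code point in U+10000..U+10FFFF as a UTF-16 surrogate pair."""
    # Code points at or below U+FFFF need no surrogate pair; reject them here.
    if not 0x10000 <= u <= 0x10FFFF:
        raise ValueError("code point outside U+10000..U+10FFFF")
    vc = u - 0x10000             # 20-bit value
    vh = (vc & 0xFFC00) >> 10    # higher 10 bits
    vl = vc & 0x3FF              # lower 10 bits
    w1 = 0xD800 | vh             # first word: high bits
    w2 = 0xDC00 | vl             # second word: low bits
    return w1, w2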

Results for three sample code points:

0x1D300 -> 0xd834 0xdf00
0x1D301 -> 0xd834 0xdf01
0x1D302 -> 0xd834 0xdf02
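One way to double-check these values is to compare them with Python's built-in UTF-16 codec. The sketch below is self-contained and assumes Python 3; `encode_surrogate_pair` and `codec_words` are hypothetical helper names, and the first one simply repeats the arithmetic of the function above.

```python
import struct

def encode_surrogate_pair(cp):
    """Hypothetical helper: same arithmetic as the function above."""
    v = cp - 0x10000
    return 0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)

def codec_words(cp):
    """Encode with the built-in big-endian codec and unpack two 16-bit words."""
    return struct.unpack(">2H", chr(cp).encode("utf-16-be"))

for cp in (0x1D300, 0x1D301, 0x1D302):
    w1, w2 = encode_surrogate_pair(cp)
    # The hand-written arithmetic should agree with the built-in codec.
    assert (w1, w2) == codec_words(cp)
    print(f"{cp:#x} -> {w1:#06x} {w2:#06x}")
```

The big-endian codec is used here only so that each unpacked word reads in the same order as the hexadecimal values listed above.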

00000002h: 34 D8 00 DF 34 D8 01 DF 34 D8 02 DF

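After saving the file in UTF-16 mode, a BOM is prepended: the bytes FF FE mark little-endian byte order. That is why the dump above starts at offset 2 and why each word appears with its low byte first (34 D8 for 0xd834). Switch back to text mode and you can see the result.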
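The byte layout of such a file can be reproduced in Python. This is a sketch under the assumption that the editor wrote a little-endian BOM followed by little-endian words; it uses the explicit `utf-16-le` codec and prepends `codecs.BOM_UTF16_LE` by hand so the result does not depend on the byte order of the machine running it.

```python
import codecs

# The three code points from the sample results.
text = "\U0001D300\U0001D301\U0001D302"

# BOM (FF FE) followed by each 16-bit word stored low byte first.
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(data.hex(" ").upper())
# FF FE 34 D8 00 DF 34 D8 01 DF 34 D8 02 DF

# The generic "utf-16" codec reads the BOM and picks the byte order from it.
assert data.decode("utf-16") == text
```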

Posted by 九天雁翎 at 九天雁翎的博客 on March 1, 2009