Unusual Icons Cause Transcoding Failure

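Someone in our project code used the iconv function to convert UTF-8 to UCS-2 but did not handle the case where the conversion fails, which led to a bug in production.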
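For reference, here is a minimal sketch of what handling the failure path might look like. The function name utf8_to_ucs2 and the choice of "UCS-2LE" as the target are my own assumptions for illustration, not the original project code.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert UTF-8 to UCS-2LE.
 * Returns the number of bytes written to dst, or -1 on failure
 * with errno left as set by iconv_open() or iconv(). */
long utf8_to_ucs2(const char *src, size_t src_len, char *dst, size_t dst_cap)
{
    assert(src != NULL && dst != NULL);

    iconv_t cd = iconv_open("UCS-2LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;              /* this conversion is not available */

    char *in = (char *)src;     /* iconv() takes char **, hence the cast */
    char *out = dst;
    size_t in_left = src_len;
    size_t out_left = dst_cap;

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    int saved_errno = errno;    /* iconv_close() may change errno */
    iconv_close(cd);

    if (rc == (size_t)-1) {
        /* EILSEQ: invalid input sequence (glibc also reports this for
         *         characters the target encoding cannot represent)
         * EINVAL: incomplete multibyte sequence at the end of the input
         * E2BIG:  output buffer too small */
        errno = saved_errno;
        return -1;
    }
    return (long)(dst_cap - out_left);
}
```

The caller checks for -1 and then decides, based on errno, whether to reject the input, clean it up, or retry with a larger buffer, instead of using whatever happens to be in the output buffer.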

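After looking into it, I found that iconv_open has a built-in feature that might solve this: appending //IGNORE to the target encoding makes it skip the parts that fail to convert. The man page explains it as follows: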

iconv_t iconv_open(const char *tocode, const char *fromcode);
DESCRIPTION
       The iconv_open() function allocates a conversion descriptor suitable for converting byte sequences from character encoding fromcode to character
       encoding tocode.

       The values permitted for fromcode and tocode and the supported combinations are system-dependent. For the GNU C library, the permitted values
       are listed by the iconv --list command, and all combinations of the listed values are supported. Furthermore the GNU C library and the GNU
       libiconv library support the following two suffixes:

       //TRANSLIT
              When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented
              in the target character set, it can be approximated through one or several similarly looking characters.

       //IGNORE
              When the string "//IGNORE" is appended to tocode, characters that cannot be represented in the target character set will be silently
              discarded.

       The resulting conversion descriptor can be used with iconv(3) any number of times. It remains valid until deallocated using iconv_close(3).

       A conversion descriptor contains a conversion state. After creation using iconv_open(), the state is in the initial state. Using iconv(3)
       modifies the descriptor’s conversion state. (This implies that a conversion descriptor can not be used in multiple threads simultaneously.) To
       bring the state back to the initial state, use iconv(3) with NULL as inbuf argument.

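The result was frustrating: the unusual icons still could not be filtered out, for example a flame-shaped icon. It turns out this site does not even support that icon. Unbelievable!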

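When such an icon is encoded as UTF-8 it takes 4 bytes, and every one of those bytes falls inside the byte range accepted for Chinese characters, so the regex lets it through.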

In the end I solved it by relying on how UTF-8 encodes Chinese characters: a Chinese character takes 3 bytes, of the form 1110xxxx, 10xxxxxx, 10xxxxxx.
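The idea can be sketched roughly as follows. This is an illustration under my own assumptions, not the exact production fix: the function name strip_non_bmp_utf8 is hypothetical, and I assume that 1-byte and 2-byte sequences should be kept alongside the 3-byte ones, while 4-byte sequences (lead byte 11110xxx, code points above U+FFFF that UCS-2 cannot represent) and malformed bytes are dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the UTF-8 sequence announced by lead byte b,
 * or 0 if b cannot start a sequence. */
static size_t utf8_seq_len(unsigned char b)
{
    if ((b & 0x80) == 0x00) return 1;   /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx: e.g. Chinese characters */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx: above U+FFFF */
    return 0;                           /* continuation byte or invalid lead */
}

/* Hypothetical helper: copy src to dst, dropping 4-byte sequences and
 * malformed bytes. dst must have room for at least len bytes.
 * Returns the number of bytes written. */
size_t strip_non_bmp_utf8(const char *src, size_t len, char *dst)
{
    assert(src != NULL && dst != NULL);

    size_t in = 0, out = 0;
    while (in < len) {
        size_t n = utf8_seq_len((unsigned char)src[in]);
        int ok = (n != 0 && in + n <= len);

        /* every byte after the lead byte must look like 10xxxxxx */
        for (size_t k = 1; ok && k < n; k++)
            if (((unsigned char)src[in + k] & 0xC0) != 0x80)
                ok = 0;

        if (!ok) {              /* stray or truncated byte: skip it */
            in++;
            continue;
        }
        if (n < 4) {            /* keep 1- to 3-byte sequences only */
            memcpy(dst + out, src + in, n);
            out += n;
        }
        in += n;
    }
    return out;
}
```

Running the input through a filter like this before calling iconv means the converter only ever sees sequences of at most 3 bytes, which is what the 1110xxxx, 10xxxxxx, 10xxxxxx pattern check is meant to guarantee.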
