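Scenario:
Python fetches a web page, gzip-compresses it, and stores it as Base64 text; Java then Base64-decodes the text and decompresses the binary back into HTML.
Problems encountered:
1. Python's requests library decompresses gzip responses by default, so one step is wasted: the decompressed HTML has to be compressed again. For lack of time, this was not solved.
2. The Python side misunderstood binary-to-Base64 conversion: it turned the binary data into a string before Base64-encoding it, which defeated the purpose of compressing and made the payload even larger.
3. Some pages are encoded as GB2312 or GBK. To keep backend processing simple, convert the content to UTF-8 at fetch time so that the stored encoding is always the same.
4. The Python side never ran the full cycle of compress and encode, then decode and decompress, to check that the HTML can really be restored. It needs to verify this closed loop itself.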
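Point 2 can be illustrated with a small size comparison. This is a minimal sketch: the original faulty code is not shown, so it assumes the mistake took the form of `str(compressed)` before encoding, and the sample HTML and the names `wrong`, `right` and `restore` are made up for illustration.

```python
import base64
import gzip

# Hypothetical sample page; any repetitive HTML compresses well.
html = ("<html><body>" + "<p>hello gzip</p>" * 200 + "</body></html>").encode("utf-8")
compressed = gzip.compress(html)

# Assumed form of the mistake: turn the bytes into their text repr first.
# Every non-printable byte becomes a 4-character escape such as \x8b.
wrong = base64.b64encode(str(compressed).encode("utf-8"))

# Intended flow: Base64-encode the compressed bytes directly.
right = base64.b64encode(compressed)


def restore(b64_text):
    """Base64-decode, then gunzip, back to the original bytes."""
    return gzip.decompress(base64.b64decode(b64_text))


# Sizes of the raw page, the correct payload and the stringified payload.
print(len(html), len(right), len(wrong))
```

Only `right` can be passed to `restore`; the stringified variant no longer starts with the gzip magic bytes after Base64 decoding.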
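import base64
import gzip

import requests


def test():
    req = requests.get('http://www.xxx.com.cn')
    # Let requests guess the charset from the body; req.text uses it below.
    req.encoding = req.apparent_encoding
    try:
        # Normalize GBK pages to UTF-8. GBK is a superset of GB2312,
        # so this branch covers both encodings.
        www = req.content.decode("gbk").encode("utf-8")
    except UnicodeDecodeError:
        # Not GBK/GB2312: fall back to the charset detected above.
        www = req.text.encode("utf-8")

    # Compress the UTF-8 bytes, then Base64-encode the compressed bytes
    # directly (not a string form of them).
    ww = gzip.compress(www)
    base64_encoded = base64.b64encode(ww)
    result = base64_encoded.decode()
    print(result)

    # Reverse direction: Base64-decode, gunzip, decode as UTF-8.
    result2 = base64.b64decode(result)
    qq = gzip.decompress(result2)
    ee = qq.decode(encoding="utf-8")
    print(ee)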
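Points 3 and 4 can also be checked offline, without a network request. A minimal sketch, assuming the page bytes and their charset are already known; `pack`, `unpack` and the sample text are illustrative names, not part of the original code.

```python
import base64
import gzip


def pack(page_bytes, source_encoding):
    """Normalize to UTF-8, gzip, then Base64-encode to ASCII text."""
    utf8 = page_bytes.decode(source_encoding).encode("utf-8")
    return base64.b64encode(gzip.compress(utf8)).decode("ascii")


def unpack(b64_text):
    """Reverse of pack: Base64-decode, gunzip, decode as UTF-8."""
    return gzip.decompress(base64.b64decode(b64_text)).decode("utf-8")


# Hypothetical GBK page standing in for a real response body.
sample = "<html><body>中文网页测试</body></html>"
stored = pack(sample.encode("gbk"), "gbk")

# Closed loop: what comes out must equal what went in.
assert unpack(stored) == sample
```

Running an assertion like this before handing data to the backend separates encoding mistakes on the Python side from decompression problems on the Java side.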
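The backend processes the HTML in Java:
1. When decompression first failed in Java, I did not start by checking the decompression code against a simple English string.
2. Decompressing Chinese HTML in Java failed with java.io.EOFException: Unexpected end of ZLIB input stream. Many of the fixes suggested online are unreliable; the cause I finally confirmed is that the binary stream handling has to guard the last, empty read.
Reference: https://blog.csdn.net/qwfylwc/article/details/54580502
3. Java decompression code copied from the web mostly deals with file streams rather than strings, so little of it can be reused here.
In the shared code:
1. Decoding and decompression of a compressed, encoded English string is verified (str1).
2. Decoding and decompression of Chinese HTML is verified (str is truncated here because the full value takes up too much space).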
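For point 1, a short English test string can be produced on the Python side and pasted into the Java test. A minimal sketch; `make_vector` and the sample sentence are illustrative, and `mtime=0` (available in Python 3.8 and later) is used only to keep the timestamp in the gzip header fixed.

```python
import base64
import gzip


def make_vector(text):
    """Gzip UTF-8 text and return it as a Base64 string for pasting into Java."""
    # mtime=0 fixes the timestamp field in the gzip header.
    packed = gzip.compress(text.encode("utf-8"), mtime=0)
    return base64.b64encode(packed).decode("ascii")


vector = make_vector("hello gzip and base64")
print(vector)

# Check the vector locally before handing it to the Java code.
assert gzip.decompress(base64.b64decode(vector)).decode("utf-8") == "hello gzip and base64"
```

Every gzip stream starts with the bytes 1f 8b 08, so a correctly encoded string always begins with H4sI; a string that starts differently was not produced by Base64-encoding the raw gzip bytes.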
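Since str is truncated, it is worth seeing how a gzip stream that ends early behaves. The sketch below is in Python to stay in one language with the fetch side; it assumes truncation as the trigger, which the original does not state as the cause of the Java error, and `read_available` is a hypothetical helper. It mirrors the idea of the guarded read loop in the Java code: stop at the end of the available data instead of failing.

```python
import gzip
import zlib

original = ("<p>gzip test</p>" * 500).encode("utf-8")
data = gzip.compress(original)
truncated = data[:-10]  # drop the 8-byte trailer and the last 2 stream bytes

# One-shot decompression insists on a complete stream.
try:
    gzip.decompress(truncated)
    failed = False
except EOFError:
    failed = True


def read_available(gz_bytes):
    """Return whatever can be decoded, plus whether the stream was complete."""
    # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    return decoder.decompress(gz_bytes), decoder.eof


partial, complete = read_available(truncated)
print(failed, complete, len(partial))
```

The partial result is a prefix of the original text, so a truncated sample can still show that the decoding path works even though the content is incomplete.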
package test;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

import org.apache.commons.codec.binary.Base64;

public class basegziptest {

    /**
     * Base64-decodes the input, then gunzips it into a UTF-8 string.
     *
     * @param str Base64 text of gzip-compressed data
     * @return the decompressed text
     */
    public static String uncompress(String str) {
        if (str == null || str.length() == 0) {
            return str;
        }
        String uncompress = "";
        // Base64 decoding comes first; the result is the raw gzip bytes.
        byte[] compressed = Base64.decodeBase64(str);
        // try-with-resources closes all three streams, even on failure.
        try (ByteArrayInputStream in = new ByteArrayInputStream(compressed);
             GZIPInputStream gzip = new GZIPInputStream(in);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[1024];
            while (true) {
                int offset = -1;
                try {
                    offset = gzip.read(buffer);
                } catch (EOFException ex) {
                    // Guard the final read: treat an unexpected end of stream
                    // as end of data and keep what has been read so far.
                }
                if (offset != -1) {
                    out.write(buffer, 0, offset);
                } else {
                    break;
                }
            }
            uncompress = out.toString("UTF-8");
        } catch (IOException e) {
            e.printStackTrace();
        }
        return uncompress;
    }

    public static void main(String[] args) throws IOException {
        // Short English sample.
        String str1 = "H4sIACiZFV4C/yssTy0qqSzNzC/Iyc7KSE9LKUQXKE6sqkhOLkvKyx2uUggWAPPOXiQOAQAA";
        // Chinese HTML sample; truncated in this post to save space.
        String str = "H4sIALimFV4C/91a63PbVBb/DDP8D14xy8B4iGRJlm2Is+PI8it+W3Zsf+noaSnWw5bk56eWpaWldNNM2QbaAi2wdBdoSpdHk6aFf8ayk0/8C3v9SOqkKU2XpGRX47F0r+49Ouf3u+ecq3v1ysuvvDz7p2CKpEtpyhGhE3FHOj8fj5IO6E0YXsRIGA7SwfENfAZx0AajmbIl6xqjwDCVhBxzQIBkqcroLDA8ODvAMWvJliLM9e99tnVxeXDjjL35YGvtZ/v+Pwb/vNi/eL1/873Bo5VZeNxq0kUVLMahMargh8JUksoG6FQWcnC6Zgma5YcSMmfopi5ajoJsNhjFkbMavKw7ZpIU7fDMuKAD5JCpIHUqHkiG84EwNSWLfPWg1k3zFC+ITEOxSEUG7XKcIdesqW4xpslMKg/ubjFGRQD9JEFlpvpJllV7C4bNUb05o+5YMsPpKiyDRooim4JmCrAsuHdFK7JWdUiGIPqhmRkY/DjThHNWRxHMGXAJOaxODTzWEtoWPCobguK";
        System.out.println("Decompressed str1: " + basegziptest.uncompress(str1));
        System.out.println("Input length: " + str.length());
        System.out.println("Compressed content: " + str);
        System.out.println("Decompressed content: " + basegziptest.uncompress(str));
    }
}