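Java crawler: reading the contents of a zip archive without extracting it
Java decompression
A recent crawler project needed to download a zip archive and read its contents directly, without saving the file to disk.
Common Java code for reading a zip file: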
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

public static void readZipFile(String file) {
    // ZipFile is opened from a path, so the archive must already exist on disk.
    try (ZipFile zf = new ZipFile(file);
         ZipInputStream zin = new ZipInputStream(
                 new BufferedInputStream(new FileInputStream(file)))) {
        ZipEntry ze;
        // ZipInputStream is used only to walk the entries;
        // the content of each entry is read through ZipFile.
        while ((ze = zin.getNextEntry()) != null) {
            BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(zf.getInputStream(ze), "gb2312"));
            String line;
            // Read one line at a time. Testing the result of readLine() in the
            // loop condition avoids skipping the first line or printing null.
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
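This approach needs a ZipFile object, which can only be opened from a file on disk, and at the time I could not find a way to turn a ZipInputStream into a ZipFile.
I ended up using method two: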
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStreamReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public void readZipFile(byte[] bytes) {
    // try-with-resources closes the streams even if reading fails.
    try (ZipInputStream zipInputStream = new ZipInputStream(new ByteArrayInputStream(bytes))) {
        ZipEntry zipEntry;
        while ((zipEntry = zipInputStream.getNextEntry()) != null) {
            if (zipEntry.isDirectory()) {
                continue; // nothing to read for a directory entry
            }
            String name = zipEntry.getName();
            // ZipEntry.getSize() may be -1, meaning the size is unknown, and a
            // single read(buf, 0, size) call may return fewer bytes than asked for.
            // Reading until read() returns -1 handles both cases.
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = zipInputStream.read(buffer)) != -1) {
                baos.write(buffer, 0, n);
            }
            byte[] byteEntry = baos.toByteArray();
            System.out.println(String.format("Name:%s,Content:%s",
                    name, new String(byteEntry, "gb2312")));

            BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(new ByteArrayInputStream(byteEntry), "gb2312"));
            String line;
            while ((line = bufferedReader.readLine()) != null) { // read one line at a time
                // process the line here
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
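Method two takes a byte[], so the downloaded archive has to end up in memory rather than in a file. The following is a minimal sketch of that step, assuming the archive is reachable with a plain URL fetch; the class name ZipDownloader and both method names are made up for illustration, and a real crawler would probably go through its own HTTP client instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

class ZipDownloader {
    /** Drains a stream into a byte array using a fixed-size buffer. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // read() returns -1 only at end of stream, so loop until then.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Fetches the resource at the given URL into memory; nothing is written to disk. */
    static byte[] download(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return readAll(in);
        }
    }
}
```

The resulting array can be passed straight to readZipFile(byte[] bytes). Holding the whole archive in memory is only reasonable for small downloads.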
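One pitfall with method two is that a ZipEntry's size can be -1, and for a long time I could not work out why.
I eventually found a way to read such entries through https://www.jianshu.com/p/79bfe182a28f.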
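The -1 size can be reproduced without any download. As far as I understand it, one common cause is an archive written in streaming mode: the writer does not know the entry size in advance, so it records the size after the data, and ZipInputStream reports -1 until the entry has been read. The sketch below builds such an archive in memory with ZipOutputStream and reads it back; the class UnknownSizeDemo and its methods are illustrative names, and this may not be the cause in every case.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class UnknownSizeDemo {
    /** Builds a one-entry zip in memory without setting the entry size up front. */
    static byte[] buildZip(String name, String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(text.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    /** Returns the size reported for the first entry before its data is read. */
    static long sizeBeforeRead(byte[] zip) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry = zin.getNextEntry();
            if (entry == null) {
                throw new IOException("archive has no entries");
            }
            return entry.getSize();
        }
    }

    /** Reads the first entry to the end of its data, ignoring the reported size. */
    static String readFirstEntry(byte[] zip) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zip))) {
            if (zin.getNextEntry() == null) {
                throw new IOException("archive has no entries");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = zin.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Because readFirstEntry never looks at getSize(), it behaves the same whether the size is known or not, which is why reading until read() returns -1 is the safer default.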