Copyright notice: this is the author's original article; do not repost without permission. Blog: http://www.fanlegefan.com/ https://blog.csdn.net/woloqun/article/details/83411363
The code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.Footer;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;

// logger is a logging field on the enclosing class (e.g. SLF4J/Log4j)
public static void getParquetFileSizeAndRowCount() throws Exception {
    Path inputPath = new Path("/user/hive/warehouse/user_parquet");
    Configuration conf = new Configuration();
    FileStatus[] inputFileStatuses = inputPath.getFileSystem(conf).globStatus(inputPath);
    for (FileStatus fs : inputFileStatuses) {
        // readFooters collects the footer of every Parquet file under the path
        for (Footer f : ParquetFileReader.readFooters(conf, fs, false)) {
            // each row group (block) reports its uncompressed size, compressed size and row count
            for (BlockMetaData b : f.getParquetMetadata().getBlocks()) {
                logger.info("TotalByteSize:" + b.getTotalByteSize()
                        + " CompressedSize:" + b.getCompressedSize()
                        + " rowCount:" + b.getRowCount());
            }
        }
    }
}
Output:
18/10/26 10:38:20 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
18/10/26 10:38:20 INFO hadoop.ParquetFileReader: reading another 1 footers
18/10/26 10:38:20 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
18/10/26 10:38:20 INFO test.HDFSTest: TotalByteSize:106324460 CompressedSize:106324460 rowCount:53285496
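A Parquet file can contain several row groups, so per-file or per-table totals come from summing the per-block numbers logged above. Below is a minimal, self-contained sketch of that aggregation; the RowGroup class and the sample numbers are hypothetical stand-ins for the BlockMetaData values read from the footers.

```java
import java.util.Arrays;
import java.util.List;

public class RowGroupTotals {
    // Hypothetical stand-in for the three BlockMetaData fields used above
    static final class RowGroup {
        final long totalByteSize, compressedSize, rowCount;
        RowGroup(long totalByteSize, long compressedSize, long rowCount) {
            this.totalByteSize = totalByteSize;
            this.compressedSize = compressedSize;
            this.rowCount = rowCount;
        }
    }

    // Sums rowCount, totalByteSize and compressedSize across row groups;
    // returns {rows, uncompressedBytes, compressedBytes}
    static long[] totals(List<RowGroup> blocks) {
        long rows = 0, bytes = 0, compressed = 0;
        for (RowGroup b : blocks) {
            rows += b.rowCount;
            bytes += b.totalByteSize;
            compressed += b.compressedSize;
        }
        return new long[] {rows, bytes, compressed};
    }

    public static void main(String[] args) {
        // Sample values; in real code iterate f.getParquetMetadata().getBlocks()
        List<RowGroup> blocks = Arrays.asList(
                new RowGroup(106324460L, 106324460L, 53285496L),
                new RowGroup(52000000L, 26000000L, 25000000L));
        long[] t = totals(blocks);
        System.out.println("rows=" + t[0]
                + " uncompressedBytes=" + t[1] + " compressedBytes=" + t[2]);
    }
}
```

In real code the summation would live inside the footer loop shown earlier, accumulating across all files under the path.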
Relevant part of pom.xml:
<properties>
<hadoop.version>2.8.4</hadoop.version>
<parquet.version>1.10.0</parquet.version>
</properties>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-common</artifactId>
<version>${parquet.version}</version>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-encoding</artifactId>
<version>${parquet.version}</version>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-column</artifactId>
<version>${parquet.version}</version>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-hadoop</artifactId>
<version>${parquet.version}</version>
</dependency>
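As a side note (an assumption about transitive dependencies, not something from the original post): parquet-hadoop already depends on parquet-column, parquet-common and parquet-encoding, and hadoop-client pulls in hadoop-common and hadoop-hdfs, so a leaner dependency list along these lines may be enough:

```xml
<!-- Sketch: relies on Maven transitive dependencies (assumption) -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-hadoop</artifactId>
    <version>${parquet.version}</version>
  </dependency>
</dependencies>
```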