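A quick benchmark of how long NumPy takes to save arrays to files, and to load them back, with each of its built-in save methods.
Two test arrays are used: a large 10000x10000 matrix and a small 640x480 matrix, both filled with random integers from 0 to 9. This shows how saving and loading behave on large and on small data.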
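Three save methods are compared:
- np.save(): writes a single array to a binary .npy file, uncompressed, so the file is large
- np.savez(): writes one or more arrays into a single uncompressed .npz archive (a ZIP file with one .npy entry per array); np.load() returns a dictionary-like object from which each array is read by key; the file is large
- np.savez_compressed(): same as np.savez(), but the archive entries are compressed, so the file is small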
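To make the difference between the three formats concrete, the sketch below writes a tiny array with each function into an in-memory buffer (io.BytesIO is used here only to avoid temporary files) and inspects the result with the standard zipfile module. The array arr and the helper zip_entries() are illustrative; the entry name 'arr_0.npy' is what NumPy generates for a positional argument.

```python
import io
import zipfile

import numpy as np

arr = np.arange(12, dtype=np.int32).reshape(3, 4)

def zip_entries(saver):
    """Save arr with the given NumPy function into memory and list the ZIP entries."""
    buf = io.BytesIO()
    saver(buf, arr)  # positional argument, so the entry is named 'arr_0.npy'
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        # compress_type: ZIP_STORED (0) = no compression, ZIP_DEFLATED (8) = deflate
        return [(info.filename, info.compress_type) for info in zf.infolist()]

print(zip_entries(np.savez))             # [('arr_0.npy', 0)]
print(zip_entries(np.savez_compressed))  # [('arr_0.npy', 8)]

# A plain .npy file is not a ZIP archive: it starts with NumPy's own magic string
npy_buf = io.BytesIO()
np.save(npy_buf, arr)
print(npy_buf.getvalue()[:6])  # b'\x93NUMPY'
```

So np.savez() is essentially "several .npy files stored in a ZIP without compression", and np.savez_compressed() is the same container with deflate applied to each entry.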
Environment: Windows 10 64-bit, Python 3.7
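Results:
- Saving the big matrix:
  - np.save(): 550.91 ms, file size 390625.12 KB
  - np.savez(): 970.71 ms, file size 390625.24 KB
  - np.savez_compressed(): 36123.80 ms (65.6× np.save()), file size 63423.66 KB, compression ratio 6.16
- Loading the big matrix (np.load() in all three cases): 488.87 ms, 1162.00 ms and 2158.33 ms respectively. Loading the compressed file takes 4.4× as long as loading the uncompressed .npy file.
- Saving the small matrix:
  - np.save(): 1.70 ms, file size 1200.12 KB
  - np.savez(): 7.98 ms, file size 1200.24 KB
  - np.savez_compressed(): 146.19 ms (86× np.save()), file size 195.20 KB, compression ratio 6.15
- Loading the small matrix (np.load() in all three cases): 1.80 ms, 6.28 ms and 12.97 ms respectively. Loading the compressed file takes 7.2× as long as loading the uncompressed .npy file.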
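The derived figures above (65.6×, 6.16, 4.4× and so on) follow directly from the raw measurements. A small sketch that recomputes them; the dictionary layout and summarize() are just a convenient illustration, and the numbers are copied from the results above. The compression ratio is taken as .npy size divided by compressed size, which is what reproduces the author's values.

```python
# Raw measurements from the results above: (np.save, np.savez, np.savez_compressed)
big_results = {
    'save_ms': (550.91, 970.71, 36123.80),
    'load_ms': (488.87, 1162.00, 2158.33),
    'size_kb': (390625.12, 390625.24, 63423.66),
}
small_results = {
    'save_ms': (1.70, 7.98, 146.19),
    'load_ms': (1.80, 6.28, 12.97),
    'size_kb': (1200.12, 1200.24, 195.20),
}

def summarize(m):
    """Return (save slowdown, compression ratio, load slowdown) of compressed vs .npy."""
    save_slowdown = m['save_ms'][2] / m['save_ms'][0]
    ratio = m['size_kb'][0] / m['size_kb'][2]
    load_slowdown = m['load_ms'][2] / m['load_ms'][0]
    return round(save_slowdown, 1), round(ratio, 2), round(load_slowdown, 1)

print(summarize(big_results))    # (65.6, 6.16, 4.4)
print(summarize(small_results))  # (86.0, 6.15, 7.2)
```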
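Observations:
- np.savez() is slower than np.save() for both saving and loading, mainly because the data goes through a ZIP archive instead of being written as a bare .npy file
- np.savez_compressed() has to compress the data, so saving takes 60-90× as long as np.save(); on this data (random integers 0-9) the compression ratio is about 6. For a sparse matrix, both the compression time and the compression ratio would likely be quite different
- Loading compressed data takes 4-8× as long as loading the raw .npy data; for a sparse matrix, the decompression cost would likely differ as well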
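Conclusion:
- Since compression makes saving much slower and loading several times slower, it mainly pays off when disk space matters more than speed; for fairly large sparse matrices in a storage-sensitive setting, compressed storage is worth trying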
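To get a feel for how much sparsity changes the compression ratio, the sketch below writes a dense random matrix and a mostly-zero one into memory buffers and compares their .npz sizes. The 1000x1000 shape, the 1% density, the seed and the helper npz_size() are assumptions chosen for illustration; this measures size only, not time.

```python
import io

import numpy as np

def npz_size(arr, compressed):
    """Size in bytes of arr written as .npz into an in-memory buffer (hypothetical helper)."""
    buf = io.BytesIO()
    saver = np.savez_compressed if compressed else np.savez
    saver(buf, arr)
    return len(buf.getvalue())

rng = np.random.default_rng(0)
# Dense: random integers 0-9, like the benchmark data but smaller
dense = rng.integers(0, 10, size=(1000, 1000), dtype=np.int32)
# Sparse (assumed): about 1% of entries set to a value 1-9, the rest zero
sparse = np.zeros((1000, 1000), dtype=np.int32)
mask = rng.random(sparse.shape) < 0.01
sparse[mask] = rng.integers(1, 10, size=int(mask.sum()), dtype=np.int32)

dense_ratio = npz_size(dense, False) / npz_size(dense, True)
sparse_ratio = npz_size(sparse, False) / npz_size(sparse, True)
print('dense compression ratio:  %.1f' % dense_ratio)
print('sparse compression ratio: %.1f' % sparse_ratio)
```

The exact ratios depend on the data, but the mostly-zero matrix compresses far better than the dense random one, which is why compression is most attractive for sparse data.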
```python
import os
import os.path as osp
import time

import numpy as np

# - average time of func() over run_times calls, in ms
def check_time(desc, func, run_times=10):
    # perf_counter() is a high-resolution clock meant for timing short intervals
    t = time.perf_counter()
    for _ in range(run_times):
        func()
    t = (time.perf_counter() - t) * 1000 / run_times
    print('%s cost avg time = %.2f ms' % (desc, t))
    return t
```
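An average over 10 runs can hide outliers, for example a first call that pays for a cold disk cache. A possible variant, sketched here as a hypothetical check_time_repeat() that is not part of the original benchmark, times each call separately with the standard timeit module and reports the best and the median. timeit uses time.perf_counter() by default and switches off garbage collection while timing.

```python
import statistics
import timeit

def check_time_repeat(desc, func, repeat=10):
    """Hypothetical variant of check_time(): time each call on its own."""
    # timeit.repeat() returns one duration (in seconds) per repetition;
    # number=1 means every repetition is a single call of func
    times_ms = [t * 1000 for t in timeit.repeat(func, number=1, repeat=repeat)]
    best, median = min(times_ms), statistics.median(times_ms)
    print('%s best = %.2f ms, median = %.2f ms' % (desc, best, median))
    return best, median

check_time_repeat('sum of 10**5 ints', lambda: sum(range(10**5)), repeat=5)
```

It can be called exactly like check_time(), e.g. `check_time_repeat('save small npy', test_save_small_npy)`.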
```python
# - big and small ndarray (random integers in [0, 10))
big = np.random.randint(0, 10, size=(10000, 10000))
small = np.random.randint(0, 10, size=(640, 480))
print('big =', big)
print('small =', small)
```
```
big = [[0 3 9 ... 7 3 2]
 [9 5 9 ... 5 8 7]
 [2 5 6 ... 3 6 9]
 ...
 [3 6 0 ... 6 0 1]
 [8 0 6 ... 5 1 1]
 [7 0 1 ... 7 7 7]]
small = [[3 7 8 ... 1 4 1]
 [6 1 0 ... 2 1 1]
 [0 7 5 ... 4 3 9]
 ...
 [3 5 4 ... 7 2 2]
 [6 3 1 ... 4 5 9]
 [3 1 9 ... 5 2 5]]
```
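The file sizes in the results pin down the element type: 390625 KB for 10000x10000 elements is exactly 4 bytes per element, i.e. int32, which is np.random.randint's default integer type on Windows with NumPy 1.x (on Linux/macOS, and on Windows with NumPy 2.x, the default is int64, which would double every uncompressed size). A sketch that checks this arithmetic; passing dtype=np.int32 explicitly is an assumption added here to make the sizes platform-independent.

```python
import numpy as np

# dtype=np.int32 is an explicit choice here; it matches what the file sizes imply
small = np.random.randint(0, 10, size=(640, 480), dtype=np.int32)
print(small.dtype, small.nbytes)          # int32 1228800
print('%.2f KB' % (small.nbytes / 1024))  # 1200.00 KB

# Raw payload of the 10000x10000 int32 matrix, computed without allocating it
big_bytes = 10000 * 10000 * np.dtype(np.int32).itemsize
print('%.2f KB' % (big_bytes / 1024))     # 390625.00 KB
```

The extra ~0.12 KB in each measured .npy size is consistent with a 128-byte .npy header in front of the raw data.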
```python
# - npy and npz filename
big_npy_filename = 'big_npy.npy'
big_npz_filename = 'big_npz.npz'
big_compressed_npz_filename = 'big_compressed.npz'
small_npy_filename = 'small_npy.npy'
small_npz_filename = 'small_npz.npz'
small_compressed_npz_filename = 'small_compressed.npz'
```
```python
# - save functions (arrays are passed positionally, so .npz entries are named 'arr_0')
def test_save_big_npy():
    np.save(big_npy_filename, big)

def test_save_big_npz():
    np.savez(big_npz_filename, big)

def test_save_big_compressed_npz():
    np.savez_compressed(big_compressed_npz_filename, big)

def test_save_small_npy():
    np.save(small_npy_filename, small)

def test_save_small_npz():
    np.savez(small_npz_filename, small)

def test_save_small_compressed_npz():
    np.savez_compressed(small_compressed_npz_filename, small)
```
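The save functions above pass the array positionally, which is why the load functions have to use the generated key 'arr_0'. np.savez() and np.savez_compressed() also accept keyword arguments, which gives each array in the archive a meaningful name; this is the "save several objects" feature mentioned at the start. A sketch using a temporary directory and small stand-in arrays (both assumptions for illustration):

```python
import os
import tempfile

import numpy as np

# Small stand-in arrays; the names are illustrative
big = np.zeros((4, 4), dtype=np.int32)
small = np.ones((2, 3), dtype=np.int32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'both.npz')
    # Keyword arguments become entry names ('big.npy', 'small.npy') in the archive
    np.savez(path, big=big, small=small)
    with np.load(path) as data:
        names = sorted(data.files)       # ['big', 'small']
        restored_small = data['small']   # read into memory before the file closes
```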
```python
# - load functions
# np.load() on an .npz file returns an NpzFile that keeps the file open;
# the with-block closes it once the array has been read
def test_load_big_npy():
    return np.load(big_npy_filename)
def test_load_big_npz():
    with np.load(big_npz_filename) as data:
        return data['arr_0']
def test_load_big_compressed_npz():
    with np.load(big_compressed_npz_filename) as data:
        return data['arr_0']
def test_load_small_npy():
    return np.load(small_npy_filename)
def test_load_small_npz():
    with np.load(small_npz_filename) as data:
        return data['arr_0']
def test_load_small_compressed_npz():
    with np.load(small_compressed_npz_filename) as data:
        return data['arr_0']
```
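np.load() reads the whole array into memory in these tests. For a large .npy file that only needs to be read in parts, np.load(..., mmap_mode='r') instead returns a read-only np.memmap backed by the file, and data is paged in only when accessed, so timing it with check_time() would not be comparable to the load figures below. As far as I know this only applies to .npy files, not to .npz archives. A sketch with a small stand-in array in a temporary directory (both assumptions):

```python
import os
import tempfile

import numpy as np

arr = np.arange(1_000_000, dtype=np.int32).reshape(1000, 1000)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'arr.npy')
    np.save(path, arr)
    # mmap_mode='r' maps the file instead of reading it all into RAM
    view = np.load(path, mmap_mode='r')
    is_memmap = isinstance(view, np.memmap)
    first_row_ok = np.array_equal(view[0], arr[0])
    del view  # release the mapping so the directory can be removed (needed on Windows)
```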
```python
# - check save time for big
check_time('save big npy', test_save_big_npy)
check_time('save big npz', test_save_big_npz)
check_time('save big compressed npz', test_save_big_compressed_npz)
for f in [big_npy_filename, big_npz_filename, big_compressed_npz_filename]:
    print('file %s size = %.2f KB' % (f, osp.getsize(f) / 1024))
```
```
save big npy cost avg time = 550.91 ms
save big npz cost avg time = 970.71 ms
save big compressed npz cost avg time = 36123.80 ms
file big_npy.npy size = 390625.12 KB
file big_npz.npz size = 390625.24 KB
file big_compressed.npz size = 63423.66 KB
```
```python
# - check load time for big
check_time('load big npy', test_load_big_npy)
check_time('load big npz', test_load_big_npz)
check_time('load big compressed npz', test_load_big_compressed_npz)
```
```
load big npy cost avg time = 488.87 ms
load big npz cost avg time = 1162.00 ms
load big compressed npz cost avg time = 2158.33 ms
```
```python
# - check save time for small
check_time('save small npy', test_save_small_npy)
check_time('save small npz', test_save_small_npz)
check_time('save small compressed npz', test_save_small_compressed_npz)
for f in [small_npy_filename, small_npz_filename, small_compressed_npz_filename]:
    print('file %s size = %.2f KB' % (f, osp.getsize(f) / 1024))
```
```
save small npy cost avg time = 1.70 ms
save small npz cost avg time = 7.98 ms
save small compressed npz cost avg time = 146.19 ms
file small_npy.npy size = 1200.12 KB
file small_npz.npz size = 1200.24 KB
file small_compressed.npz size = 195.20 KB
```
```python
# - check load time for small
check_time('load small npy', test_load_small_npy)
check_time('load small npz', test_load_small_npz)
check_time('load small compressed npz', test_load_small_compressed_npz)
```
```
load small npy cost avg time = 1.80 ms
load small npz cost avg time = 6.28 ms
load small compressed npz cost avg time = 12.97 ms
```