Language: Python 3.7.2
System: Win10 Ver. 10.0.17763
Topic: 004.01 Searching in different Python data types
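While working recently on a data search-and-compare project, I found that matching against a large amount of data became extremely slow, unacceptably so. I wanted results "immediately", but ended up waiting several hours. Ugh! Python is certainly no match for C or assembly language in raw speed, but I still had to find a way to speed things up.
Below are several ways of looking up ten thousand items in a data set of ten thousand items, with the time each one took. The last method, index_list_sort(), comes out much faster, but I still feel it is not fast enough, and this is only an integer search. What about strings? What about substrings? If anyone has a better method, please let me know. Thanks!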
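On the string question, one possible starting point: dict and set lookups hash the key, so exact-match lookup should carry over from integers to strings unchanged. Substring search is a different problem, because hashing whole strings does not help; the simplest baseline is a linear scan with the `in` operator. A minimal sketch, where `names`, `find_exact`, and `find_substring` are hypothetical names and the data is made up:

```python
names = ["alpha", "beta", "gamma", "delta"]  # made-up sample data

# Exact match: hashing works for any hashable key, so strings behave like ints.
name_index = {name: i for i, name in enumerate(names)}


def find_exact(target):
    """Return the position of target in names, or -1 if absent."""
    return name_index.get(target, -1)


def find_substring(fragment):
    """Return positions of every name containing fragment (linear scan)."""
    return [i for i, name in enumerate(names) if fragment in name]


print(find_exact("gamma"))   # 2
print(find_exact("omega"))   # -1
print(find_substring("ta"))  # [1, 3]
```

The substring scan costs time proportional to the total length of the data for every query, so it is only a baseline to measure against, not a fast method.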
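Results:
0:00:04.734338 : index_sequence
0:00:01.139984 : index_list
0:00:00.330116 : index_np
0:00:00.233343 : index_np_sort
0:00:00.223401 : index_dict
0:00:00.213462 : index_set
0:00:00.007977 : index_list_sort
Note: read the index_list_sort figure with care. In the measured run its hit test was applied to the unsorted db, so do_something() was almost never called, while the other functions called it about 10,000 times. The roughly 0.2 s shared by index_set, index_dict and index_np_sort is most likely dominated by those do_something() calls rather than by the lookups themselves.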
Code:
from datetime import datetime
import bisect
import random

import numpy as np

size = 10000
db = random.sample(range(size), size)       # unsorted data to search in
db_sort = sorted(db)                        # sorted copy for binary search
db_set = set(db)
db_dict = {db[i]: i for i in range(size)}   # value -> position in db
db_np = np.array(db)
db_np_sort = np.array(db_sort)              # sorted array for np.searchsorted
value = list(range(size))                   # the values to look up


def call(func):
    # Call func, measure its execution time, then print duration and name
    start_time = datetime.now()
    func()
    print(datetime.now() - start_time, ':', func.__name__)


def do_something():
    # Stand-in for per-hit work; its cost is included in every timing
    for i in range(1000):
        pass


def index_sequence():
    # Unsorted list, plain nested loops, no built-in search
    for i in range(size):
        for j in range(size):
            if value[j] == db[i]:
                index = j
                do_something()
                break


def index_list():
    # Unsorted list, use list.index()
    for i in range(size):
        try:
            index = db.index(value[i])
        except ValueError:
            index = -1
        if index >= 0:
            do_something()


def index_np():
    # Unsorted numpy array, use np.where()
    for i in range(size):
        result = np.where(db_np == value[i])
        if len(result[0]) != 0:
            do_something()


def index_np_sort():
    # Sorted numpy array, use np.searchsorted()
    for i in range(size):
        pos = np.searchsorted(db_np_sort, value[i])
        # searchsorted returns an insertion point, so confirm a real match
        if pos < size and db_np_sort[pos] == value[i]:
            do_something()


def index_list_sort():
    # Sorted list, use the bisect library
    for i in range(size):
        # Binary search requires sorted input, so search db_sort, not db
        index = bisect.bisect_left(db_sort, value[i])
        if index < size and db_sort[index] == value[i]:
            do_something()


def index_set():
    # Set search
    for i in range(size):
        if value[i] in db_set:
            do_something()


def index_dict():
    # Dictionary search
    for i in range(size):
        try:
            index = db_dict[value[i]]
        except KeyError:
            index = -1
        if index >= 0:
            do_something()


# Test execution time
call(index_sequence)
call(index_list)
call(index_np)
call(index_np_sort)
call(index_dict)
call(index_set)
call(index_list_sort)
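
Since every timing above includes the cost of do_something(), it can be useful to time the lookups on their own and to count hits, so that a method that silently finds nothing does not look fast. The following is a rough sketch along those lines, not the benchmark used for the results above; `find_sorted` and `time_lookups` are hypothetical helper names, and `time.perf_counter` is swapped in for `datetime.now()` as the clock.

```python
import bisect
import random
import time


def find_sorted(sorted_items, target):
    """Return the position of target in sorted_items, or -1 if absent."""
    pos = bisect.bisect_left(sorted_items, target)
    # bisect_left only gives an insertion point, so confirm it is a real hit.
    if pos < len(sorted_items) and sorted_items[pos] == target:
        return pos
    return -1


def time_lookups(lookup, targets):
    """Time lookup(target) over all targets; return (seconds, hit_count)."""
    start = time.perf_counter()
    hits = sum(1 for t in targets if lookup(t) >= 0)
    return time.perf_counter() - start, hits


size = 10000
db = random.sample(range(size), size)        # unsorted data, as in the post
db_sort = sorted(db)                         # binary search needs sorted input
db_dict = {v: i for i, v in enumerate(db)}   # value -> position in db
targets = list(range(size))

for name, lookup in [
    ("bisect", lambda t: find_sorted(db_sort, t)),
    ("dict", lambda t: db_dict.get(t, -1)),
]:
    seconds, hits = time_lookups(lookup, targets)
    print(f"{seconds:.6f} s : {name} ({hits} hits)")
```

Both methods should report 10000 hits here, since every target is present in db. A lower hit count would point to a bug in the lookup rather than a faster search.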