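Over the past few days I have been improving my Base64 & UUE encoded-file generator, and found that it is very slow on large files. Some analysis showed that the string concatenation and splitting code was far too inefficient. Consider the following code: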
Private Sub Command1_Click()
Dim fL As Long, enfp As Integer, defp As Integer, enfn, defn
Dim B() As Byte, tmpstr As String, outStr As String
Dim timx As Single
timx = Timer
enfn = Text1.Text
defn = Text2.Text
enfp = FreeFile
Open enfn For Binary As #enfp
fL = LOF(enfp)
ReDim B(fL - 1)
Get #enfp, , B
Close #enfp
tmpstr = StrConv(B, vbUnicode)
defp = FreeFile
Open defn For Output As #defp
Do While Len(tmpstr) > 60
outStr = "M" & Mid(tmpstr, 1, 60)
tmpstr = Mid(tmpstr, 61) ' This statement is what hurts performance (2022-05-22)
Print #defp, outStr
DoEvents
Loop
Print #defp, tmpstr
Close #defp
MsgBox "Processed: " & fL & " bytes, elapsed: " & Timer - timx & " s"
End Sub
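When the encoded string is split into fixed-length lines, this statement:
tmpstr = Mid(tmpstr, 61) ' This statement is what hurts performance (2022-05-22)
is meant to keep whatever is left after the cut. With short strings it makes no noticeable difference, but every call copies the entire remainder into a new string, so the total work grows roughly with the square of the length, and the loop becomes dramatically slower as the string gets longer. So I came up with a different approach: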
Private Sub Command2_Click()
Dim fL As Long, enfp As Integer, defp As Integer, enfn, defn
Dim B() As Byte, tmpstr As String, outStr As String
Dim E
Dim timx As Single
timx = Timer
enfn = Text1.Text
defn = Text2.Text
enfp = FreeFile
Open enfn For Binary As #enfp
fL = LOF(enfp)
ReDim B(fL - 1)
Get #enfp, , B
Close #enfp
tmpstr = StrConv(B, vbUnicode)
defp = FreeFile
E = 1
Open defn For Output As #defp
Do While (fL - E + 1) > 60 ' fL - E + 1 = characters left starting at position E
outStr = "M" & Mid(tmpstr, E, 60)
Print #defp, outStr
DoEvents
E = E + 60
Loop
outStr = Mid(tmpstr, E, 60)
Print #defp, outStr
Close #defp
MsgBox "Processed: " & fL & " bytes, elapsed: " & Format(Timer - timx, "0.000000") & " s"
End Sub
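This version only reads fixed-length slices out of the original string and never modifies the string itself. Efficiency immediately improved several hundredfold (the longer the string, the bigger the gain).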
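The offset-based slicing can also be pulled out into a small reusable routine. What follows is only a minimal sketch: SplitFixed is a hypothetical helper name that does not appear in the original tool, and it leaves out the "M" prefix that the loop above adds to full lines.

```vb
' Hypothetical helper: split s into pieces of at most chunkLen characters
' by advancing an offset. s itself is never reassigned or shortened.
Public Function SplitFixed(ByRef s As String, ByVal chunkLen As Long) As String()
    Dim parts() As String
    Dim total As Long, count As Long, i As Long

    total = Len(s)
    If total = 0 Or chunkLen <= 0 Then
        ReDim parts(0 To 0)                      ' a single empty piece
        SplitFixed = parts
        Exit Function
    End If

    count = (total + chunkLen - 1) \ chunkLen    ' number of pieces, rounded up
    ReDim parts(0 To count - 1)
    For i = 0 To count - 1
        ' Mid$ stops at the end of the string, so the last piece may be shorter
        parts(i) = Mid$(s, i * chunkLen + 1, chunkLen)
    Next i
    SplitFixed = parts
End Function
```

Each Mid$ call here copies only chunkLen characters, so the cost per line stays constant regardless of how long the whole string is.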
'================================================================
In addition, to read the whole file I originally used Line Input:
Open defn For Input As #defp
Do While Not EOF(defp)
Line Input #defp, tmpstr
EnStr = EnStr & tmpstr
Loop
Close #defp
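Likewise, the concatenation statement EnStr = EnStr & tmpstr made reading extremely slow, so I tried using Adodb.Stream to read the whole file in one call. Again, the problem is not noticeable with small files, but for files over 2 MB the obj.ReadText call turned out to be surprisingly slow: an 8.27 MB file took as long as 7.32 seconds.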
Private Sub Command3_Click()
Dim str, stm, enfn, defn
Dim timx As Single, tmpstr As String
timx = Timer
enfn = Text1.Text
defn = Text2.Text
Set stm = CreateObject("Adodb.Stream")
stm.Type = 2 '1 bin,2 txt
stm.Mode = 3
stm.Open
stm.Charset = "GB2312"
stm.LoadFromFile enfn
str = stm.readtext '------ slow: 7.32 s
' str = stm.Read '-------- fast: 0.015 s
stm.Close
Set stm = Nothing
' tmpstr = StrConv(str, vbUnicode)
MsgBox "File read completed, elapsed: " & Timer - timx & " s" '& Chr(str(0))
End Sub
So I switched to Obj.Read, and the read immediately became nearly 500 times faster.
Private Sub Command3_Click()
Dim str, stm, enfn, defn
Dim timx As Single, tmpstr As String
timx = Timer
enfn = Text1.Text
defn = Text2.Text
Set stm = CreateObject("Adodb.Stream")
stm.Type = 1 '1 bin,2 txt
stm.Mode = 3
stm.Open
' stm.Charset = "GB2312"
stm.LoadFromFile enfn
' str = stm.readtext '------ slow: 7.32 s
str = stm.Read '-------- fast: 0.015 s
stm.Close
Set stm = Nothing
tmpstr = StrConv(str, vbUnicode)
MsgBox "File read completed, elapsed: " & Timer - timx & " s" '& Chr(str(0))
End Sub
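So building the string again appears to be what causes the slowdown (ReadText has to decode the GB2312 bytes into a string, a step that Read skips entirely). Even so, compared with my earlier approach of reading the whole file straight into a Byte array, Adodb.Stream Obj.Read is still slower. With the same 8.27 MB file, the code below finishes too quickly to measure; the elapsed time is practically 0. So I switched to a 75.7 MB file: Adodb.Stream Obj.Read took 0.109 s, while the code below took 0.023 s. Reading the whole file with the Open statement is therefore at least 4 times as fast as Adodb.Stream Obj.Read.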
Private Sub Command4_Click()
Dim fL As Long, enfp As Integer, defp As Integer, enfn, defn
Dim B() As Byte, tmpstr As String, outStr As String
Dim timx As Single
timx = Timer
enfn = Text1.Text
defn = Text2.Text
enfp = FreeFile
Open enfn For Binary As #enfp
fL = LOF(enfp)
ReDim B(fL - 1) '---- more efficient than Adodb.Stream
Get #enfp, , B
Close #enfp
' tmpstr = StrConv(B, vbUnicode)
MsgBox "File read completed, elapsed: " & Format((Timer - timx), "0.000000") & " s" '& Chr(B(0))
End Sub
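The same Open/Get pattern can also fill a String directly, which would replace the Line Input loop shown earlier without any concatenation. This is only a sketch under some assumptions: ReadAllText is a hypothetical helper name, and the file is assumed to contain single-byte (ASCII) text such as Base64 or UUE output, so that one byte maps to one character.

```vb
' Hypothetical helper: read a whole text file into a String with a single Get.
' Assumes single-byte (ASCII) content, e.g. Base64/UUE encoded output.
Public Function ReadAllText(ByVal fileName As String) As String
    Dim fp As Integer
    Dim buf As String

    fp = FreeFile
    Open fileName For Binary Access Read As #fp
    buf = Space$(LOF(fp))    ' size the buffer once, up front
    Get #fp, , buf           ' in Binary mode Get fills the existing length of buf
    Close #fp

    ReadAllText = buf
End Function
```

Unlike Line Input, this keeps the line breaks in the result. If they are not wanted, a single pass such as Replace$(ReadAllText(defn), vbCrLf, "") removes them afterwards.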
'============================================
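Similarly, when assembling the Base64 output I originally concatenated characters directly (see the earlier post "A Base64 + UUE encoder written in VBS, with a customizable encoding table" on jessezappy's CSDN blog): ret = ret & Chr(Base64EncMap((first \ 4) And 63)), returning the whole string once everything had been appended. This also became extremely slow as the amount of data grew. I later changed it to store the encoded output in a Byte array first,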
ReDim Preserve ret(retLength + 4)
ret(retLength + 1) = (Base64EncMap((first \ 4) And 63))
ret(retLength + 2) = (Base64EncMap(((first * 16) And 48) + ((second \ 16) And 15)))
ret(retLength + 3) = (Base64EncMap(((second * 4) And 60) + ((third \ 64) And 3)))
ret(retLength + 4) = (Base64EncMap(third And 63))
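and finally convert the Byte array to a string in one step with StrConv(ret, vbUnicode). By comparison this was nearly a thousand times faster (the exact factor depends on the length of the data being encoded).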
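Because the Base64 output length is known in advance (4 bytes for every 3 input bytes, rounded up), the array could also be sized once instead of being grown with ReDim Preserve for every group. The following is a sketch, not the tool's actual code: Base64EncodeBytes is a hypothetical name, the standard alphabet is assumed in place of the customizable Base64EncMap, and the input array is assumed to be non-empty.

```vb
' Sketch: Base64-encode src() into a Byte array that is allocated once.
' Assumptions: standard alphabet, src() holds at least one byte.
Public Function Base64EncodeBytes(src() As Byte) As String
    Const ALPHABET As String = _
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    Dim map(0 To 63) As Byte
    Dim ret() As Byte
    Dim n As Long, i As Long, p As Long
    Dim first As Long, second As Long, third As Long

    ' Build the 6-bit value -> ASCII code lookup table
    For i = 0 To 63
        map(i) = Asc(Mid$(ALPHABET, i + 1, 1))
    Next i

    n = UBound(src) - LBound(src) + 1
    ReDim ret(0 To ((n + 2) \ 3) * 4 - 1)    ' final size, allocated once

    p = 0
    For i = LBound(src) To UBound(src) Step 3
        first = src(i)
        If i + 1 <= UBound(src) Then second = src(i + 1) Else second = 0
        If i + 2 <= UBound(src) Then third = src(i + 2) Else third = 0
        ret(p) = map((first \ 4) And 63)
        ret(p + 1) = map(((first * 16) And 48) + ((second \ 16) And 15))
        ret(p + 2) = map(((second * 4) And 60) + ((third \ 64) And 3))
        ret(p + 3) = map(third And 63)
        p = p + 4
    Next i

    ' Pad with "=" (ASCII 61) when the input length is not a multiple of 3
    If n Mod 3 >= 1 Then ret(p - 1) = 61
    If n Mod 3 = 1 Then ret(p - 2) = 61

    Base64EncodeBytes = StrConv(ret, vbUnicode)
End Function
```

The bit arithmetic is the same as in the four assignments above; only the allocation strategy differs.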
'===========================================
In summary, string concatenation and trimming are the main culprits behind the poor performance of the code above.
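Where a result really does have to be built up as a string piece by piece, one common workaround is to write into a preallocated buffer with the Mid$ statement and grow it geometrically. The sketch below is meant for a standard module; buf, used, BufAppend and BufText are all hypothetical names, not part of the original tool.

```vb
' Hypothetical append buffer for a standard module (.bas).
Private buf As String     ' preallocated storage, usually longer than the content
Private used As Long      ' number of characters actually written

Public Sub BufAppend(ByRef piece As String)
    Dim need As Long, newCap As Long

    need = used + Len(piece)
    If need > Len(buf) Then
        ' Double the capacity (or jump straight to need) so regrowth stays rare
        newCap = Len(buf) * 2
        If newCap < need Then newCap = need
        buf = buf & Space$(newCap - Len(buf))
    End If

    ' The Mid$ statement overwrites characters in place, without reallocating buf
    If Len(piece) > 0 Then Mid$(buf, used + 1, Len(piece)) = piece
    used = need
End Sub

Public Function BufText() As String
    BufText = Left$(buf, used)    ' trim the unused tail
End Function
```

With this, a loop calls BufAppend for each piece and reads BufText once at the end, so the full string is copied only when the buffer grows rather than on every append.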
---------- End of note