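1. Installing Julia
First, go to the official Julia website and download the build that matches your machine; for an Apple M1 chip, pick the M-series (Apple Silicon) build.
Then start Julia and install IJulia: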
using Pkg
Pkg.add("IJulia")
After that, open Jupyter and you can create a new Julia notebook.
2. GPU programming library for the M1: Metal
Installation:
julia> import Pkg; Pkg.add("Metal")
julia> using Metal
Usage is very similar to the CUDA library; here is an example:
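# Kernel: each thread adds one pair of elements.
function vadd(a, b, c)
    i = thread_position_in_grid_1d()
    c[i] = a[i] + b[i]
    return
end

a = MtlArray([1]); b = MtlArray([2]); c = similar(a);
@metal threads=length(c) vadd(a, b, c)   # launch one thread per element
synchronize()                            # wait for the GPU to finish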
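To check that the kernel really produced the sum, the result can be copied back to the host. The following is a minimal sketch that reuses the same `vadd` kernel on three-element arrays; the input values are made up for illustration, and `Array(c)` is used to bring the data back to the CPU.

```julia
using Metal

# Same kernel as above: one thread per element.
function vadd(a, b, c)
    i = thread_position_in_grid_1d()
    c[i] = a[i] + b[i]
    return
end

# Illustrative inputs (not from the original post).
a = MtlArray(Float32[1, 2, 3])
b = MtlArray(Float32[10, 20, 30])
c = similar(a)

@metal threads=length(c) vadd(a, b, c)
synchronize()            # make sure the GPU has finished

host_c = Array(c)        # copy the result back to CPU memory
println(host_c)
```

Copying back with `Array` avoids indexing the `MtlArray` element by element from the CPU.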
Next, let's use polynomial evaluation to compare performance:
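using Metal
using Polynomials

# GPU kernel: each thread evaluates 9 + 8x + ... + x^8 at one element (Horner's rule).
function mpoly(a, c)
    i = thread_position_in_grid_1d()
    acc = 0f0                  # start from zero; `similar` leaves c uninitialized
    for j in 1:9
        acc = j + acc * a[i]
    end
    c[i] = acc
    return
end

# CPU reference: the same polynomial evaluated with Polynomials.jl.
function poly(a, c)
    p = Polynomial(9:-1:1)
    for j in eachindex(a)
        c[j] = p(a[j])
    end
    return
end

# Small inputs for a warm-up run (this triggers compilation).
testa = MtlArray{Float32}(rand(10))
testc = similar(testa)
cpu_testa = rand(10)
cpu_testc = similar(cpu_testa)

# Full-size inputs: Float64 on the CPU, Float32 on the GPU.
a = rand(1024*2048*64);
c = similar(a);
ma = MtlArray{Float32}(a);
mc = similar(ma);

# Note: newer Metal.jl releases name the `grid` keyword `groups`.
@metal threads=10 grid=1 mpoly(testa, testc)
poly(cpu_testa, cpu_testc)   # warm up on CPU arrays rather than on MtlArrays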
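The loop in `mpoly` is Horner's rule for the polynomial 9 + 8x + 7x^2 + ... + x^8, which is the same polynomial that `Polynomial(9:-1:1)` builds. Here is a small CPU-only sketch to convince yourself of that; `horner` and `direct` are helper names introduced just for this illustration, and the accumulator is assumed to start at zero.

```julia
# Horner's rule, written the same way as the loop inside `mpoly`.
function horner(x)
    acc = zero(x)            # the accumulator has to start at zero
    for j in 1:9
        acc = j + acc * x    # after step j: j + (j-1)x + ... + 1*x^(j-1)
    end
    return acc
end

# Direct term-by-term evaluation: the coefficient of x^k is 9 - k,
# matching the coefficient order of Polynomial(9:-1:1).
direct(x) = sum((9 - k) * x^k for k in 0:8)

x = 0.5
println(horner(x), " ", direct(x))
```

Both functions should agree up to floating-point rounding, which is why the GPU and CPU versions can be compared at all.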
With a small warm-up run done first, now compare the timings on the full-size arrays:
@time @metal threads=1024 grid=2048*64 mpoly(ma,mc)
@time poly(a,c)
The result:
0.000421 seconds (132 allocations: 3.680 KiB)
0.604166 seconds (71.97 k allocations: 3.755 MiB, 3.64% compilation time)
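The two timings differ by more than 1400x, which looks like an outstanding speedup~ Keep in mind, though, that `@metal` only launches the kernel and returns right away, and this run was timed without `synchronize()`, so the GPU figure mostly reflects launch overhead. The GPU also works in Float32 while the CPU version works in Float64 and still includes a little compilation time, so the real gap is likely smaller than this ratio suggests.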
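For a fairer GPU timing, the timed region can include a `synchronize()` call so the clock stops only after the kernel has finished. Below is a minimal sketch under some assumptions: the array is smaller than the one in the post to keep it quick, the kernel uses a local accumulator, and the launch uses the `groups` keyword of recent Metal.jl releases (older releases used `grid`, as in the post). No timing output is shown because it depends on the machine.

```julia
using Metal

# Horner-style kernel with a local accumulator.
function mpoly(a, c)
    i = thread_position_in_grid_1d()
    acc = 0f0
    for j in 1:9
        acc = Float32(j) + acc * a[i]
    end
    c[i] = acc
    return
end

n = 1024 * 2048                 # assumed size, smaller than in the post
nthreads = 1024
ngroups = n ÷ nthreads          # one thread per element

ma = MtlArray(rand(Float32, n))
mc = similar(ma)

# Warm-up launch so that compilation is not part of the measurement.
@metal threads=nthreads groups=ngroups mpoly(ma, mc)
synchronize()

# Timed run: wait for the GPU before the timer stops.
@time begin
    @metal threads=nthreads groups=ngroups mpoly(ma, mc)
    synchronize()
end

result = Array(mc)              # copy back to the host for inspection
```

Timing the CPU version on a `Float32` array of the same size would make the two numbers more directly comparable.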