Using the GPU in TensorFlow for MNIST Handwritten Digit Recognition

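Installing the CPU version of TensorFlow is easy, but after I installed the GPU version it stopped working for some unknown reason. Over the past few days I read a number of articles, and my best guess is that I accidentally upgraded the graphics driver and caused a version incompatibility. I'm not fully convinced by that explanation, though. So I decided to reinstall TensorFlow, only to find that the online installation I had used before no longer worked: the overseas package sources were unreachable, and the Tsinghua University mirror didn't seem to work either.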

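I uninstalled my current TensorFlow 1.13.x and reinstalled from a TensorFlow 1.10.0 package I found online. I installed the CPU version first, which went smoothly. The GPU version wasn't much harder: install each .whl file one at a time with pip install, running pip list along the way to check whether a package was already installed. Finally, install CUDA 10.0 and copy the cuDNN files into the CUDA 10.0 installation directory, and that's it.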
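The manual routine above (install each .whl with pip install, checking pip list for what is already there) can be partly scripted. The sketch below is a minimal illustration, assuming Python 3.8+ for importlib.metadata: it parses the distribution name and version out of each wheel filename following the PEP 427 naming pattern and skips wheels whose exact version is already installed. The helper names (parse_wheel_name, installed_version, plan_installs) are hypothetical, and how strictly "-" and "_" in names are normalized during lookup may vary between Python versions.

```python
import re
from importlib import metadata

# PEP 427 wheel filename layout:
# {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)"
    r"(?:-(?P<build>\d[^-]*))?"
    r"-(?P<python>[^-]+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)


def parse_wheel_name(filename):
    """Return (distribution, version) parsed from a wheel filename."""
    m = WHEEL_RE.match(filename)
    if m is None:
        raise ValueError("not a wheel filename: " + filename)
    return m.group("name"), m.group("version")


def installed_version(distribution):
    """Installed version of a distribution (as pip list would show it), or None."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def plan_installs(wheel_files):
    """Split wheel files into (still to install, already installed at that version)."""
    to_install, already = [], []
    for filename in wheel_files:
        name, version = parse_wheel_name(filename)
        if installed_version(name) == version:
            already.append(filename)
        else:
            to_install.append(filename)
    return to_install, already
```

For example, with a made-up file list such as ["tensorflow_gpu-1.10.0-cp36-cp36m-win_amd64.whl"], plan_installs keeps that file in the to-install list unless tensorflow_gpu 1.10.0 is already present; the remaining files can then be passed to pip install in order.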

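Then I built the MNIST test program and found that the GPU couldn't be used. I had experimented with TensorFlow 1.13.x before and had used "/gpu:0" as the GPU name; this time, when I ran the program, the system told me to change it to "/job:localhost/replica:0/task:0/device:GPU:0", and after the change it did run. I'll leave the deeper reasons for later; first I want to try a few experiments of my own design.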
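The two names refer to the same device: "/gpu:0" is the older shorthand, and "/job:localhost/replica:0/task:0/device:GPU:0" is the fully qualified name of the first GPU in a local, single-process session. TensorFlow has its own tf.DeviceSpec class for parsing these strings; the function below is not that class but a simplified, hypothetical stand-in (qualify_device) that shows how the short form expands, assuming the default local job, replica 0 and task 0.

```python
import re

# One path component of a device string, e.g. "job:localhost", "gpu:0", "device:GPU:0"
_COMPONENT_RE = re.compile(r"^(job|replica|task|device|cpu|gpu):(.+)$", re.IGNORECASE)


def qualify_device(spec, job="localhost", replica=0, task=0):
    """Expand a device string to /job:.../replica:.../task:.../device:TYPE:INDEX.

    Simplified, hypothetical helper (not tf.DeviceSpec): fields missing from
    `spec` are filled with the local single-process defaults.
    """
    fields = {"job": job, "replica": str(replica), "task": str(task)}
    device = None
    for part in spec.strip("/").split("/"):
        if not part:
            continue
        m = _COMPONENT_RE.match(part)
        if m is None:
            raise ValueError("unrecognized device component: " + part)
        key, value = m.group(1).lower(), m.group(2)
        if key in ("cpu", "gpu"):
            # Legacy shorthand: "gpu:0" means device type GPU, index 0
            device = key.upper() + ":" + value
        elif key == "device":
            # Newer form: "device:GPU:0"
            dev_type, _, index = value.partition(":")
            device = dev_type.upper() + ":" + index
        else:
            fields[key] = value
    if device is None:
        raise ValueError("no device type in: " + spec)
    return "/job:{job}/replica:{replica}/task:{task}/device:{dev}".format(dev=device, **fields)
```

Under these assumptions qualify_device("/gpu:0"), qualify_device("/device:GPU:0") and the fully qualified string all expand to the same name, which is why either spelling can address the first GPU.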

Appendix: Python source code for the MNIST handwritten digit recognition experiment

import tensorflow as tf
import pylab
from tensorflow.examples.tutorials.mnist import input_data

# Download MNIST into MNIST_data/ if needed and load it with one-hot labels
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

tf.reset_default_graph()

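# Placeholders: flattened 28x28 images (784 pixels) and one-hot labels (10 classes)
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
# Model parameters and the softmax-regression prediction
W = tf.Variable(tf.random_normal([784, 10]))
b = tf.Variable(tf.zeros([10]))
pred = tf.nn.softmax(tf.matmul(x, W) + b)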

# Loss function: mean cross-entropy between the labels and the predictions
cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred), axis=1))

# Training parameters
learning_rate = 0.01

# Plain gradient descent minimizing the cost
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
training_epochs = 25
batch_size = 100
display_step = 1

# Saver for writing the trained model to a checkpoint
saver = tf.train.Saver()
model_path = "log/521model.ckpt"

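# Start the training session
with tf.Session() as sess:
    # tf.device only pins ops created inside this block; W, b and the
    # optimizer were created above, outside it
    with tf.device("/job:localhost/replica:0/task:0/device:GPU:0"):
        sess.run(tf.global_variables_initializer())

        # Training loop
        for epoch in range(training_epochs):
            avg_cost = 0
            # Number of mini-batches per epoch
            total_batch = int(mnist.train.num_examples / batch_size)
            # Loop over all mini-batches
            for i in range(total_batch):
                batch_xs, batch_ys = mnist.train.next_batch(batch_size)
                # Run one optimization step and fetch this batch's loss
                _, c = sess.run([optimizer, cost], feed_dict={x: batch_xs, y: batch_ys})
                # Accumulate the average loss over the epoch
                avg_cost += c / total_batch
            # Print progress every display_step epochs
            if (epoch + 1) % display_step == 0:
                print("Epoch:", "%04d" % (epoch + 1), "cost = ", "{:.9f}".format(avg_cost))

        print("finished!")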

        # Test the model: compare predicted and true class indices
        correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
        # Accuracy = fraction of correctly classified test images
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        print("Accuracy: ", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
        # Save the trained model to a checkpoint
        save_path = saver.save(sess, model_path)
        print("Model saved in file: %s" % save_path)

print("Starting 2nd session...")
with tf.Session() as sess:
    with tf.device("/job:localhost/replica:0/task:0/device:GPU:0"):
        sess.run(tf.global_variables_initializer())
        # Restore the trained weights from the checkpoint
        saver.restore(sess, model_path)
        # Re-evaluate test accuracy with the restored model
        correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        print("Accuracy: ", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
        # Predict the classes of two training images
        output = tf.argmax(pred, 1)
        batch_xs, batch_ys = mnist.train.next_batch(2)
        outputval, predv = sess.run([output, pred], feed_dict={x: batch_xs})
        print(outputval, predv, batch_ys)

        # Display both images (each row of batch_xs is a flattened 28x28 image)
        for im in batch_xs:
            pylab.imshow(im.reshape(28, 28))
            pylab.show()
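The graph above is plain softmax regression: pred = softmax(xW + b), and cost is the batch mean of -sum_j y_j log(pred_j), trained with gradient descent. To make the arithmetic concrete, here is a minimal pure-Python sketch of the same computations on tiny lists (no TensorFlow), assuming zero initial weights instead of tf.random_normal so the results are reproducible. The function names are illustrative, not part of any library. The update uses the standard result that, for labels summing to 1, the gradient of the mean cross-entropy with respect to the logits is (pred - y) / batch_size. Like tf.log(pred) in the script, math.log fails if a probability underflows to exactly 0; TensorFlow's tf.nn.softmax_cross_entropy_with_logits_v2 avoids this by working on the logits directly.

```python
import math


def softmax(z):
    # Subtract the max logit first so math.exp cannot overflow
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(x, W, b):
    """pred = softmax(x W + b) for one example; W is a list of rows (inputs x classes)."""
    logits = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]
    return softmax(logits)


def cross_entropy(batch_x, batch_y, W, b):
    """Batch mean of -sum_j y_j * log(pred_j), the same formula as `cost`."""
    total = 0.0
    for x, y in zip(batch_x, batch_y):
        p = forward(x, W, b)
        total -= sum(yj * math.log(pj) for yj, pj in zip(y, p))
    return total / len(batch_x)


def sgd_step(batch_x, batch_y, W, b, lr):
    """One gradient-descent step, using d(cost)/d(logits) = (pred - y) / batch_size."""
    n = len(batch_x)
    grad_W = [[0.0] * len(b) for _ in W]
    grad_b = [0.0] * len(b)
    for x, y in zip(batch_x, batch_y):
        p = forward(x, W, b)
        for j in range(len(b)):
            d = (p[j] - y[j]) / n
            grad_b[j] += d
            for i in range(len(x)):
                grad_W[i][j] += x[i] * d
    new_W = [[W[i][j] - lr * grad_W[i][j] for j in range(len(b))] for i in range(len(W))]
    new_b = [b[j] - lr * grad_b[j] for j in range(len(b))]
    return new_W, new_b


def accuracy(batch_x, batch_y, W, b):
    """Fraction of examples where argmax(pred) == argmax(y), like the `accuracy` op."""
    def argmax(v):
        return max(range(len(v)), key=lambda k: v[k])
    hits = sum(argmax(forward(x, W, b)) == argmax(y) for x, y in zip(batch_x, batch_y))
    return hits / len(batch_x)
```

Calling sgd_step repeatedly and printing cross_entropy after each call mirrors the epoch loop in the script. On the two-example toy set X = [[1, 0], [0, 1]] with matching one-hot labels and zero weights, the loss starts at ln 2 and drops after the first step.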

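Addendum: for reasons I don't yet understand, the GPU now also works with the original form, with tf.device('/gpu:0'):. I'll look into why later. I feel that today's software frameworks are far less polished as products than the tools of 30 years ago; back then, the development tools and SDKs you got almost never had these kinds of messy problems.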

import tensorflow as tf

a = tf.constant(1)
b = tf.constant(2)
with tf.Session() as sess:
    # Only ops created inside this block (here: add) are pinned to /gpu:0
    with tf.device('/gpu:0'):
        add = tf.add(a, b)
        print(sess.run(add))

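I installed the GPU-Z monitoring tool to take a look: with the TensorFlow-GPU build, GPU:0 is used automatically even without with tf.device('/gpu:0'). Reportedly, though, with multiple GPUs you have to specify the device yourself.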
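A likely explanation for both observations: in TensorFlow 1.x graph mode, with tf.device(...) pins only the ops created inside that with block, and ops without an explicit device are assigned by TensorFlow's placer, which by default prefers the GPU with the lowest ID when the op has a GPU kernel. In the MNIST script, W, b, cost and the optimizer are all created before the with tf.device(...) line, so the device string there has little effect on where training runs. The toy model below (ToyGraph and place are hypothetical, not TensorFlow APIs) mimics only these two rules and ignores details such as ops without GPU kernels falling back to the CPU. To see the real placement, TensorFlow 1.x can log it via tf.Session(config=tf.ConfigProto(log_device_placement=True)).

```python
from contextlib import contextmanager


class ToyGraph:
    """Toy stand-in for a TF 1.x graph (hypothetical, not a TensorFlow API).

    It only records which device, if any, each op was pinned to when created.
    """

    def __init__(self):
        self.ops = {}                  # op name -> pinned device or None
        self._device_stack = [None]    # None means "no explicit device"

    @contextmanager
    def device(self, name):
        # Like `with tf.device(name):`, affects only ops created inside the block
        self._device_stack.append(name)
        try:
            yield
        finally:
            self._device_stack.pop()

    def op(self, op_name):
        # Record the innermost active device scope at creation time
        self.ops[op_name] = self._device_stack[-1]
        return op_name


def place(ops, available_gpus):
    """Toy placer: unpinned ops go to /gpu:0 if a GPU exists, else /cpu:0."""
    default = "/gpu:0" if available_gpus > 0 else "/cpu:0"
    return {name: dev if dev is not None else default for name, dev in ops.items()}


# Mirror the MNIST script: parameters and training ops created outside any device scope
g = ToyGraph()
for name in ("W", "b", "cost", "optimizer"):
    g.op(name)
with g.device("/job:localhost/replica:0/task:0/device:GPU:0"):
    g.op("init")
with g.device("/gpu:1"):  # a second GPU is only used when requested explicitly
    g.op("tower_1_loss")
print(place(g.ops, available_gpus=2))
```

In this toy model nothing reaches /gpu:1 unless it is wrapped in g.device("/gpu:1"), which matches the remark that with multiple GPUs the device has to be assigned by hand.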

quicmous
