Setting up Keras & TensorFlow (with GPU) on Windows with Anaconda

I bought a PC with a GPU, so naturally I wanted to try it out.
ossyaritoori.hatenablog.com

Preparation
Installing TensorFlow
Errors I ran into: TensorFlow
- cuDNN is not on PATH
- Behavior on (what seems to be) the first run
Installing Keras
Running the MNIST sample code

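Preparation

Hardware requirements

You need a reasonably powerful PC with a GeForce GPU.
Deep learning, and CNNs in particular, takes a long time to train.

What to install

Watch out for the dependencies between the pieces of software.
Check the following page carefully. Is the CUDA version right? cuDNN? Python? Unless all of them match, it will not work.
Installing TensorFlow on Windows | TensorFlow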

A concrete example environment (as of March 2018)

TensorFlow 1.6
CUDA 9.0
cuDNN 7.0
Python 3.5

You need to keep track of all of these.
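
As a rough aid for keeping track of these, here is a minimal sketch that compares the versions readable from inside Python against the example environment above. The names `EXPECTED`, `version_matches`, and `installed_versions` are made up for this illustration. The CUDA and cuDNN versions are not checked here, since they are not exposed as simple Python attributes in this setup.

```python
import sys

# Versions taken from the example environment above (March 2018).
# Treat them as a snapshot, not as a general rule.
EXPECTED = {"python": "3.5", "tensorflow": "1.6"}


def version_matches(found, wanted):
    """True if `found` (e.g. '1.6.0') starts with the components of `wanted` (e.g. '1.6')."""
    found_parts = found.split(".")
    wanted_parts = wanted.split(".")
    return found_parts[:len(wanted_parts)] == wanted_parts


def installed_versions():
    """Collect the versions that can be read from within Python."""
    versions = {"python": "%d.%d.%d" % sys.version_info[:3]}
    try:
        import tensorflow as tf
        versions["tensorflow"] = tf.__version__
    except ImportError:
        # Not installed, or a required DLL could not be loaded.
        versions["tensorflow"] = None
    return versions


if __name__ == "__main__":
    for name, found in installed_versions().items():
        ok = found is not None and version_matches(found, EXPECTED[name])
        print(name, found, "OK" if ok else "expected " + EXPECTED[name])
```

Comparing component by component (rather than with a plain string prefix) avoids treating something like 1.60 as a match for 1.6.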

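Installing the CUDA components

Download the CUDA Toolkit. The link below is for 9.0. Be careful not to pick the wrong version.
developer.nvidia.com

Everything should end up under the following path.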

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0

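Downloading cuDNN requires registering with NVIDIA first.
developer.nvidia.com

After extracting the ZIP, add the directory that contains 'cudnn64_7.dll' to PATH.
Placing the file in a directory that is already on PATH would probably work as well.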
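
To check whether the DLL is actually reachable, something like the following sketch can be used. It only walks the directories listed in PATH and looks for the file name mentioned above; `find_on_path` is a helper name invented for this example.

```python
import os


def find_on_path(filename, path=None):
    """Return every directory listed in PATH that contains `filename`."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # Skip empty entries produced by stray separators.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    hits = find_on_path("cudnn64_7.dll")
    print(hits if hits else "cudnn64_7.dll is not on PATH")
```

Remember to open a new console after editing the environment variables, since an already open console keeps the old PATH.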

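Installing Anaconda

The usual routine. If you are not picky, just keep clicking through the installer.
www.anaconda.com

When I installed it this time, the environment variables were not registered automatically,
so add "Anaconda\", "Anaconda\Scripts", and "C:\ProgramData\Anaconda3\Library\bin" to PATH.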

Installing TensorFlow

Creating a virtual environment

Create a virtual environment. Open a console from Anaconda Prompt and run

conda create --name=tf35 python=3.5

This creates a virtual environment named "tf35". Enter it as follows.

activate tf35
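
To confirm that the environment is really the active one, a small sketch like this can be run from the same console. It assumes that conda exports the `CONDA_DEFAULT_ENV` environment variable on activation; `active_conda_env` is a hypothetical helper name.

```python
import os
import sys


def active_conda_env(environ=None):
    """Name of the active conda environment, or None if none is reported."""
    if environ is None:
        environ = os.environ
    # Assumption: conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print("conda env:", active_conda_env())
    print("interpreter:", sys.executable)
    print("python:", sys.version.split()[0])
```

If the interpreter path does not point inside the tf35 environment, the packages installed in the next step will end up somewhere else.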

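Installation

Since we want the GPU version, install it as follows.
This differs slightly from the official instructions, but it is probably equivalent.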

pip install --ignore-installed --upgrade tensorflow-gpu

See the following article.
qiita.com

Verifying the installation

Start python and run a program like the following.

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

Hello world!

Errors I ran into: TensorFlow

cuDNN is not on PATH

During the verification step I got a remarkably helpful error message.
It tells you to set up the path to cuDNN.

ImportError: Could not find 'cudnn64_7.dll'. TensorFlow requires that this DLL be installed in a directory that is named in your %PATH% environment variable. Note that installing cuDNN is a separate step from installing CUDA, and this DLL is often found in a different directory from the CUDA DLLs. You may install the necessary DLL by downloading cuDNN 7 from this URL: https://developer.nvidia.com/cudnn

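Adding cuDNN's bin directory to PATH fixed it.

Behavior on (what seems to be) the first run

When the first session started, the following output appeared. It completed without any real problem, though.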

2018-03-27 12:11:06.853398: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-03-27 12:11:07.976741: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1212] Found device 0 with properties:
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.30GiB
2018-03-27 12:11:07.976947: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1312] Adding visible gpu devices: 0
2018-03-27 12:14:09.136681: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3033 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
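
The log above shows TensorFlow finding the GeForce GTX 1050. The same thing can be checked from Python with a sketch along these lines, which relies on `device_lib.list_local_devices()` from TensorFlow's Python client module. `gpu_device_names` is a name made up for this example, and the function simply returns an empty list if TensorFlow cannot be imported.

```python
def gpu_device_names():
    """Names of the GPU devices TensorFlow can see; [] if none or TF is missing."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    # Each entry describes one device; keep only those reported as GPUs.
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]


if __name__ == "__main__":
    names = gpu_device_names()
    print(names if names else "no GPU visible to TensorFlow")
```

An empty result with tensorflow-gpu installed usually points back to a CUDA or cuDNN version mismatch or to a PATH problem.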

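Installing Keras

Keras provides an even higher-level set of functions that run with TensorFlow as the backend. Apparently part of Keras has been ported into TensorFlow itself, but I install it separately anyway.
This is the only thing I tried.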

pip install keras

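In an Anaconda environment I think it would really be better to look for the package with conda and install it that way, but that got tedious, so I am using pip.
OpenCV worked when installed with pip too, after all...

<Reference: installing OpenCV with conda>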
ossyaritoori.hatenablog.com

Running the MNIST sample code

TensorFlow

I borrowed the code from the following author.
qiita.com

#TensorFlow Deep MNIST for Experts
#https://www.tensorflow.org/get_started/mnist/pros

# Download (on the first run) and load MNIST with one-hot labels
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

import tensorflow as tf
sess = tf.InteractiveSession()

# Softmax regression baseline (a single linear layer)
# x: flattened 28x28 images, y_: one-hot labels for the 10 digits
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
sess.run(tf.global_variables_initializer())
y = tf.matmul(x,W) + b
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# 1000 gradient descent steps on mini-batches of 100
for _ in range(1000):
  batch = mnist.train.next_batch(100)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})

# Accuracy of the linear model on the test set
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

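# Helpers for the convolutional network
# Weights start as small random values, biases slightly positive
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

# Stride-1 convolution; 'SAME' padding keeps the spatial size
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

# 2x2 max pooling halves the width and height
def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')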

# First conv layer: 5x5 kernels, 1 input channel -> 32 feature maps
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1,28,28,1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Second conv layer: 32 -> 64 feature maps
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# Fully connected layer with 1024 units on the flattened 7x7x64 features
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Dropout; keep_prob is fed at run time (0.5 for training, 1.0 for evaluation)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Readout layer: 10 logits, one per digit
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

# Loss, Adam optimizer (learning rate 1e-4), and accuracy metric
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.global_variables_initializer())

# 20000 training steps on mini-batches of 50;
# every 100 steps, report accuracy on the current batch with dropout disabled
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print("step %d, training accuracy %g"%(i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

# Final accuracy on the test set (dropout disabled)
print("test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

I forgot to copy the output, so I cannot post it, but after about 20,000 training steps the accuracy was 99.05%, if I remember correctly.
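
One detail in the code above that is easy to miss is where the 7*7*64 in the fully connected layer comes from. With padding='SAME', the output size along each axis is the input size divided by the stride, rounded up. The following sketch simply replays that arithmetic; `same_padding_out` is a helper name invented here.

```python
import math


def same_padding_out(size, stride):
    """Output length along one axis for padding='SAME': ceil(size / stride)."""
    return int(math.ceil(size / float(stride)))


size = 28  # MNIST images are 28x28
for layer in ("conv1", "pool1", "conv2", "pool2"):
    # Convolutions use stride 1, the 2x2 pooling layers use stride 2.
    stride = 2 if layer.startswith("pool") else 1
    size = same_padding_out(size, stride)
    print(layer, size)

flat = size * size * 64  # 64 feature maps come out of the second conv layer
print("flattened features:", flat)  # 7 * 7 * 64 = 3136
```

So each pooling step halves the image (28 to 14 to 7), and the flattened vector fed to the 1024-unit layer has 3136 entries.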

Keras

For now I copied the code here wholesale and checked how it performs.
github.com

Skimming the code should give a sense of how simple it is.
Until I get into the details of how CNNs work, I plan to play around with Keras.
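
To give a feel for that simplicity, here is a minimal sketch of a small CNN for MNIST written with the Keras Sequential API. The layer sizes, dropout rates, and optimizer are my own illustrative choices and are not necessarily those of the linked code. The input shape assumes the channels-last image format, which is what Keras uses with the TensorFlow backend by default.

```python
def build_model():
    """A small CNN for 28x28 grayscale digits; all sizes are illustrative."""
    # Imported here so the file can be loaded even without Keras installed.
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

    model = Sequential()
    # Two convolution layers followed by pooling (channels-last input assumed)
    model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
                     input_shape=(28, 28, 1)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    # Classifier head: flatten, one hidden layer, 10-way softmax
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='adadelta',
                  metrics=['accuracy'])
    return model


if __name__ == "__main__":
    build_model().summary()
```

Compared with the raw TensorFlow version, there are no placeholders, no hand-written weight shapes, and no explicit session. Training is then a single call to `model.fit` on the image and label arrays.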

Results

Training on the 60,000 training images for 12 epochs gave the following result.

Test loss: 0.02737301055721746
Test accuracy: 0.9913

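My impression is that each epoch took about 17 seconds, so the whole run finished in roughly 3 minutes.
On the CPU it took about 20 minutes, so this is probably at least 5 times faster.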
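
Taking those rough figures at face value, the speedup can be worked out as follows. All three inputs are the approximate numbers quoted above, not precise measurements.

```python
epochs = 12           # number of epochs in the run above
sec_per_epoch = 17.0  # approximate time per epoch on the GPU
cpu_minutes = 20.0    # approximate total time on the CPU

gpu_minutes = epochs * sec_per_epoch / 60.0
speedup = cpu_minutes / gpu_minutes

print("GPU run: %.1f min" % gpu_minutes)  # GPU run: 3.4 min
print("speedup: %.1fx" % speedup)         # speedup: 5.9x
```

So the estimate of at least 5 times faster holds up, with the usual caveat that both timings are from memory.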

Load on the PC

Training occupies about three quarters of the GPU's 4 GB of memory. The CPU is doing some work too. The RAM usage is Chrome's fault.
[Screenshot: GPU, CPU, and memory usage during training]

As an aside, I ran the training on my newly bought Zenbook Pro, and it finished smoothly without the fan roaring, so I expect it to serve me well. (Shameless plug.)

粗大メモ置き場

Personal notes on miscellaneous topics, written with visitors in mind only once in a while