ABCI usage tips, ver. 1

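# Log in to the access server (as.abci.ai)

yourpc$ ssh -i ~/.ssh/PRIVATE_KEY_FILE -L 10022:es:22 -l ACCOUNT_NAME as.abci.ai
# Leave this terminal open; the tunnel on local port 10022 exists only while this session is running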

# Log in to the interactive node (es) through the forwarded port

yourpc$ ssh -p 10022 -l ACCOUNT_NAME localhost
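
The two ssh commands above work as a pair: the first binds local port 10022 to port 22 on es by way of the access server, and the second logs in through that local port. As a minimal sketch of how the pieces fit together, the Python helper below only assembles the two command lines as argument lists. The function names, the account aaa12345xx, and the key file name are placeholders chosen for illustration, not part of ABCI.

```python
ACCESS_SERVER = "as.abci.ai"   # access server named in this document
LOCAL_PORT = 10022             # local end of the tunnel, as used above


def tunnel_command(account, key_file):
    """Command that opens the tunnel: localhost:10022 -> es:22 via the access server."""
    return ["ssh", "-i", key_file,
            "-L", f"{LOCAL_PORT}:es:22",
            "-l", account, ACCESS_SERVER]


def login_command(account):
    """Command that logs in to the interactive node through the open tunnel."""
    return ["ssh", "-p", str(LOCAL_PORT), "-l", account, "localhost"]


if __name__ == "__main__":
    # Hypothetical account and key file, for illustration only.
    print(" ".join(tunnel_command("aaa12345xx", "~/.ssh/PRIVATE_KEY_FILE")))
    print(" ".join(login_command("aaa12345xx")))
```

Keeping the port number in one constant makes it harder for the tunnel and the login command to drift apart; the scp commands in the following sections reuse the same port with -P.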

# Upload a file (local-filename) (from a separate terminal on your PC, after logging in to the interactive node)

yourpc$ scp -P 10022 local-filename ACCOUNT_NAME@localhost:/home/ACCOUNT_NAME/

# Download a file (remote-filename) (from a separate terminal on your PC, after logging in to the interactive node)

yourpc$ scp -P 10022 ACCOUNT_NAME@localhost:/home/ACCOUNT_NAME/remote-filename ./

# Run an interactive job (qrsh) (from the terminal logged in to the interactive node)

[username@es1 ~]$ qrsh -g GROUP_NAME -l rt_F=1 -l h_rt=01:00:00

# Run a batch job (qsub) (from the terminal logged in to the interactive node)

[username@es1 ~]$ qsub -g GROUP_NAME -l rt_C.small=1 sample.sh
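
The qrsh and qsub commands above share one shape: a group (-g), a resource type with a count (-l rt_F=1), and optionally an elapsed-time limit (-l h_rt=HH:MM:SS). The sketch below assembles such a command line from those parts, using only the options that appear in this document. The helper name job_command and the group gaa12345 are illustrative assumptions.

```python
def job_command(tool, group, resource, count=1, h_rt=None, script=None):
    """Assemble a qrsh or qsub argument list from the options used in this document."""
    if tool not in ("qrsh", "qsub"):
        raise ValueError("tool must be 'qrsh' or 'qsub'")
    if tool == "qsub" and script is None:
        raise ValueError("qsub needs a job script")
    args = [tool, "-g", group, "-l", f"{resource}={count}"]
    if h_rt is not None:
        args += ["-l", f"h_rt={h_rt}"]   # elapsed-time limit, HH:MM:SS
    if script is not None:
        args.append(script)
    return args


if __name__ == "__main__":
    # Hypothetical group name, for illustration only.
    print(" ".join(job_command("qrsh", "gaa12345", "rt_F", h_rt="01:00:00")))
    print(" ".join(job_command("qsub", "gaa12345", "rt_C.small", script="sample.sh")))
```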

# Setting up a tensorflow-gpu environment and running a program

# First, start an interactive job (qrsh)
[username@es1 ~]$ qrsh -g GROUP_NAME -l rt_F=1 -l h_rt=01:00:00

# Then run the following on the compute node
## Load the modules
[username@g0001 ~]$ module load python/3.6/3.6.5 cuda/9.0/9.0.176.4 cudnn/7.4/7.4.2

## Create and activate the Python virtual environment
[username@g0001 ~]$ python3 -m venv ~/Sample/v_tf_gpu
[username@g0001 ~]$ source ~/Sample/v_tf_gpu/bin/activate

## Install the required libraries (first time only)
(v_tf_gpu)[username@g0001 ~]$ pip3 uninstall tensorflow-gpu # uninstall any old tensorflow-gpu first
(v_tf_gpu)[username@g0001 ~]$ pip3 install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.12.0-cp36-cp36m-linux_x86_64.whl

## Run TensorFlow from Python
(v_tf_gpu)[username@g0001 ~]$ python
>>> import tensorflow as tf
>>> # use TensorFlow here

## Exit the Python interpreter with Ctrl-D (EOF), then leave the virtual environment
(v_tf_gpu)[username@g0001 ~]$ deactivate

# Running MNIST as a batch job (qsub)

## (1) Example script (run_mnist.sh)

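#!/bin/bash
# Initialize Environment Modules so that the module command is available
source /etc/profile.d/modules.sh
# Load the modules
module load python/3.6/3.6.5 cuda/9.0/9.0.176.4 cudnn/7.4/7.4.2
# Activate the virtual environment
source ~/Sample/v_tf_gpu/bin/activate
# Run the program from the home directory, where step (3) uploads sample_mnist.py
cd ~
python sample_mnist.py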

## (2) Example program (sample_mnist.py)

import tensorflow as tf

# Load MNIST and scale pixel values from [0, 255] to [0.0, 1.0]
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0


def create_model():
    """Build a small fully connected classifier for 28x28 images."""
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ])
    return model


model = create_model()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=3)
model.save("./mnist_model.hdf5")  # written to the current working directory
print(model.evaluate(x_test, y_test))  # [loss, accuracy] on the test set
print("finished")

## (3) Upload the script and the program with scp

yourpc$ scp -P 10022 run_mnist.sh aaa12345xx@localhost:run_mnist.sh
yourpc$ scp -P 10022 sample_mnist.py aaa12345xx@localhost:sample_mnist.py

## (4) Commands on the interactive node

# Give the script and the program execute permission
[username@es3 ~]$ chmod u+x run_mnist.sh
[username@es3 ~]$ chmod u+x sample_mnist.py
[username@es3 ~]$ ls -l
-rwxr----- 1 aaa12345xx aaa12345xx     417 Mar 27 11:13 run_mnist.sh
-rwxr----- 1 aaa12345xx aaa12345xx     720 Mar 27 10:05 sample_mnist.py
# Confirm that the permissions of both files begin with -rwx

[username@es3 ~]$ qsub -g gaa12345 -l rt_G.small=1 run_mnist.sh

# Using Jupyter Notebook (updated 2019/12/11)

## (1) Reserve one compute node, create a Python virtual environment, and install tensorflow-gpu and jupyter with pip (this only needs to be done once).

[username@es3 ~]$ qrsh -g gaa12345 -l rt_F=1
[username@g0001 ~]$ module load python/3.6/3.6.5 cuda/10.0/10.0.130.1 cudnn/7.4/7.4.2
[username@g0001 ~]$ python3 -m venv ~/jupyter_env
[username@g0001 ~]$ source ~/jupyter_env/bin/activate
(jupyter_env)[username@g0001 ~]$ pip3 install tensorflow-gpu jupyter numpy==1.16.4

## From the next session on, loading the modules and activating ~/jupyter_env is enough.

[username@es3 ~]$ qrsh -g gaa12345 -l rt_F=1
[username@g0001 ~]$ module load python/3.6/3.6.5 cuda/10.0/10.0.130.1 cudnn/7.4/7.4.2
[username@g0001 ~]$ source ~/jupyter_env/bin/activate
(jupyter_env)[username@g0001 ~]$

## (2) Start Jupyter Notebook.

(jupyter_env)[username@g0001 ~]$ jupyter notebook --ip=`hostname` --port=8888 --no-browser
..
[I 20:41:12.082 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 20:41:12.090 NotebookApp]

To access the notebook, open this file in a browser:
     file:///home/username/.local/share/jupyter/runtime/nbserver-xxxxxx-open.html
Or copy and paste one of these URLs:
     http://g0001.abci.local:8888/?token=token_string
  or http://127.0.0.1:8888/?token=token_string

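## (3) In a separate terminal, create an SSH tunnel that forwards port 8888 on your local PC to port 8888 on the compute node (replace g0001 with the host name shown in (2))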

yourpc$ ssh -L 8888:g0001:8888 -l username -p 10022 localhost

## (4) Open the following URL in a browser (copy the token from the output of (2)).

http://127.0.0.1:8888/?token=token_string
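
Jupyter also prints a URL that carries the compute node's host name, but through the tunnel from (3) the browser has to address the local end instead. A minimal sketch of that rewrite follows, assuming the token sits in the query string as in the output of (2). The name local_notebook_url is illustrative and not a Jupyter API; the local_port parameter only matters if you forward a local port other than 8888.

```python
from urllib.parse import urlsplit, urlunsplit


def local_notebook_url(printed_url, local_port=8888):
    """Rewrite a URL printed by Jupyter so it points at the local end of the SSH tunnel."""
    parts = urlsplit(printed_url)
    # Keep the path and the ?token=... query; replace only host and port.
    return urlunsplit((parts.scheme, f"127.0.0.1:{local_port}",
                       parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(local_notebook_url("http://g0001.abci.local:8888/?token=token_string"))
```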

# Notes on using Jupyter Notebook (updated 2019/12/11)

## notebook config file

Jupyter Notebook requires a notebook config file.
If jupyter_notebook_config.py is not present in the folder listed below for your OS, create it with the command that follows the list.
- Windows: C:\Users\USERNAME\.jupyter\jupyter_notebook_config.py
- OS X: /Users/USERNAME/.jupyter/jupyter_notebook_config.py
- Linux: /home/USERNAME/.jupyter/jupyter_notebook_config.py

yourpc$ jupyter notebook --generate-config
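
On all three systems listed above, the file sits in a .jupyter folder directly under the user's home directory, so the check can be written once. The sketch below assumes that default location; the function name notebook_config_path is chosen for illustration.

```python
from pathlib import Path


def notebook_config_path(home=None):
    """Return the expected location of jupyter_notebook_config.py under a home directory."""
    base = Path(home) if home is not None else Path.home()
    return base / ".jupyter" / "jupyter_notebook_config.py"


if __name__ == "__main__":
    path = notebook_config_path()
    if path.exists():
        print(f"found: {path}")
    else:
        print(f"missing: {path} (run: jupyter notebook --generate-config)")
```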