Set up machine learning development environment

4 minute read

background

Machine learning algorithm always need lots of computing resource, generally, the computer is big and noisy, and most of them are host on Linux system, so most of us run our code on a remote server. How to set up the remote development environment to make us work smoothly is really important.

There includes several stuffs we have to set up:

  • ssh
  • file transfer with server
  • UI on remote
  • machine learning environment
  • remote development tool
  • others

Notice: If you are in **mainland China **, you’d better set up a proxy to go through GFW, so that you can enjoy free network, you need to set up you proxy both on browser and terminal, since you need to download many packages on terminal when you setting up you environment, otherwise it may cost you lots of time to set up stuffs. For how to set up network proxy, refer to my post—set up VPS.

Special statement: This tutorial is only for learning and research, thanks.

ssh

basically, we should visit our remote sever through ssh connection. refer this for ssh without password.

on Linux PC, just follow the basic tutorial, on Windows, you first need a bash environment, I recommend some bash env. based on mintty, such as Cygwin, git bash or wsl-terminal,

you can rename your ssh sever by edit ~/.ssh/config, eg.

Host my_server                       
    HostName example.com           # ip or domain name
    User root                      # user name

then your can just visit your server by ssh my_server

In addition, if you need to connect your server through a jumper machine, refer my note that how to make your ssh more smoothly by adding ssh tunnel.

If visiting your server should through a SSL based VPN, and the VPN client only has Windows version, and your host machine is linux, then how to make it work on your Linux? refer my post—Enabling SSL VPN on Linux.

file transfer with server

just refer my another post Transfer files, using scp or sshfs.

machine learning environment

  1. basic tools

    just install some basic tools on Linux, such as

    git vim tmux htop etc.

    you can write a shell scrip to do all that, and I will publish a scrip on my Github to duo that later.

  2. development environment

    machine learning frameworks:

    how to set up environment

    ​ most remote servers are using python as high-level development language, there are several python package management tools, such as pip, conda, and python virtual environment managers, such as miniconda, anaconda, pyenv, and I personally recommend conda, your can use Miniconda or anaconda.

    ​ you’d better set up an python environment which separate your env with system, since there amy be other people also using your machine, mixing up stuffs may make your env heavy and out of your control. moreover, setting up a virtual env make your transplant your env easier.

    some basic conda command, for more, refer conda cmd.

     conda install numpy
     conda remove numpy
     conda create -n myenv
     conda create –n test_env python=3.6
     conda list
     conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
     conda config --set show_channel_urls yes
     conda activate $ENVIRONMENT_NAME
     conda deactivate
    

remote development tool

some developer do like just write code on vim via ssh to their server, so how to develop on remote? one is Jupyter notebook, you can rite code and view plot figures on browser, for IDE, I recommend Pytorch, it can write code on local and automatically synchronize to your remote server and run code on your remote server with your local Pycharm IDE.

  1. Jupyter

    looking for more about jupyter set up, refer my note jupyter and tensorboard configuration

    if you want to set specific GPU in your python program, please us export CUDA_VISIBLE_DEVICES=0,1 text.py , if you are in Jupyter, there is a way to set it, but sometime it doesn’t work, so I recommend you just add the following line to your ‘~/.bashrc’, then you needn’t to set it every time.

    export CUDA_VISIBLE_DEVICES=0,1 
    
  2. tensorborad

    tensorboard is a monitor for your to trace and debug your algorithm. also visited on browser after remote server set up it.

  3. pycharm

    for set up pycharm to work on remove server, refer this post for to to set up.

    on pycharm, you can edit, run and debug your code on remote server. and it also sport matplotlib to plot figures on remote and view them in IDE.

  4. vs code

    vs code can only edit remote file, but can not run it on remote via local interface, for details, referDeveloping on a remote server

view remote UI on local

basically, you can use X11 forward to do it, just use ssh -X name@domain, on Linux, you only need basic set up .ssh/config to enable X11 forward, and to test it, just run xclock on remote server, and there will be a clock ui pop up on your local machine.

for windows user, your need install and open a X11 server first, you can install xming.

Others

TBC.

Comments