Thomas Goossens

Geographer/Data-analyst/Coder

Using R with Docker
Mar 1, 2018
8 minutes read

Why should you do that? There are two main reasons to use R in conjunction with Docker. First, it allows you to quickly and easily share your work, whatever the OS and R configuration of your collaborators. Hassle-free collaboration! Second, it allows you to work in an isolated environment. This means that you will never pollute your OS and, for example, run into time-consuming re-installation procedures due to a broken configuration. In case of an OS crash, simply relaunch your Docker R container with a single command (more about containers below) and you are ready to work!

This tutorial is an introduction to R with Docker. It is not an extensive description of the enormous amount of features and all the complexity of Docker. It is rather a good base to get started, written from my own R development needs.

What is a Docker container?

Docker is the piece of software that allows you to run containers.

From the official Docker website:

A container image is a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings. Available for both Linux and Windows based apps, containerized software will always run the same, regardless of the environment. Containers isolate software from its surroundings, for example differences between development and staging environments and help reduce conflicts between teams running different software on the same infrastructure.

This container approach has many advantages compared to the use of virtual machines: containers are lightweight, quick to start and modular.

In the Docker terminology, a container actually means a running instance of an image.

Again, from the official Docker website:

Docker images are the basis of containers. An Image is an ordered collection of root filesystem changes and the corresponding execution parameters for use within a container runtime. An image typically contains a union of layered filesystems stacked on top of each other. An image does not have state and it never changes.

Docker installation instructions

You know why you should use Docker in the context of your R work and you want to install it now! To do so, simply follow the installation instructions on the official Docker website or follow this nice DigitalOcean tutorial.
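A quick way to check that the installation works is to ask Docker for its version and to run the tiny hello-world test image that Docker provides for this purpose:

$ docker --version
$ docker run hello-world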

Before we dive into the R part, you will need to understand some essential Docker concepts.

Essential Docker concepts & commands

Each image has its own name and ID. You can list all your available Docker images and get their names and IDs using the images command:

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
agrometeor          latest              fea4eeec5c2a        10 days ago         2.41GB

To run an image as a container, simply use the run command with the image ID or name:

$ docker run <IMAGE-NAME>

Note that this command can receive many optional parameters (we will see an example later).

You can also run a container from an image which is hosted on Docker Hub. Docker will automatically download (pull) the image to your computer and run it as a container once the download is complete. Pulling public images does not require an account; you will only need a Docker Hub account when you want to push your own images (more on that later).

For geospatial R work you could, for example, run the image named rocker/geospatial, which contains Linux, R, RStudio and the most famous R spatial packages together with their OS dependencies:

$ docker run rocker/geospatial

You can of course run multiple different images simultaneously, but you can also run a single image in multiple separate containers at the same time. To list all your running Docker containers and get their names and IDs, use the ps command. Note that the container name is randomly generated by Docker.

$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS                    NAMES
b18f77625a00        agrometeor          "/init"             About an hour ago   Up About an hour    0.0.0.0:8787->8787/tcp   silly_roentgen
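By the way, if you prefer to choose the container name yourself rather than relying on the randomly generated one, the run command accepts a --name option (the name my-geospatial below is just an arbitrary example):

$ docker run --name my-geospatial rocker/geospatial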

Running containers use computing resources. To stop and remove a running container, use the rm command with the -f (force) flag:

$ docker rm -f <CONTAINER-ID>
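If you only want to stop a container without removing it (so that you can resume it later), you can use the stop and start commands instead:

$ docker stop <CONTAINER-ID>
$ docker start <CONTAINER-ID>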

Pay attention: when you remove a container, all the work that has been done inside it is lost! This is on purpose, and we will later see the proper and efficient way to save the R developments you make inside a container. If you want to save modifications made to the container itself (e.g. adding an R library and its OS dependencies), you have to commit your container. But this is out of the scope of this tutorial. If you are interested, you can read the corresponding doc.

If you need to, you can explore the file system of a running container (similarly to what you do when you are connected to a server through an SSH connection):

$ docker exec -t -i <CONTAINER-ID> /bin/bash
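Once inside, you get a regular bash prompt in which you could, for example, list the content of the RStudio home folder (assuming a rocker-based image, where it lives at /home/rstudio) and then leave the container again:

ls /home/rstudio
exit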

Docker is not limited to images existing on Docker Hub. It allows you to create images with your own configuration. Creativity is the limit. Creating a Docker image requires a Dockerfile, which is simply a configuration file that tells Docker what to put in your image. For example, you can find the Dockerfile that was used to create the rocker/geospatial image mentioned earlier on GitHub. To build an image from a Dockerfile, open a terminal in the folder containing the Dockerfile and execute the build command with the name you want to give to your image (don't forget the "." at the end!):

$ docker build -t <IMAGE-NAME> .
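To give you an idea of what goes inside a Dockerfile, here is a minimal sketch that starts from the rocker/tidyverse image and adds one extra R package on top of it (data.table is just an arbitrary example; install2.r is a helper script shipped with the rocker images):

FROM rocker/tidyverse:latest

# add an extra R package on top of the tidyverse stack
RUN install2.r --error data.table

Save this as a file named Dockerfile in an empty folder and build it with the build command shown above.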

There are a lot of resources on the web that explain how to create your own images. Check my selection in the further reading section at the end of the post.

If you are sure you will no longer run an image as a container, you can delete it to save some space on your computer:

$ docker rmi <IMAGE_NAME:VERSION/IMAGE-ID>

And to delete all dangling (untagged) images in one go:

$ docker rmi $(docker images -qf "dangling=true")

Using RStudio inside a Docker container and saving your work

Let's dive into the last part of this tutorial: running R inside a container. It is actually pretty simple. It involves 2 steps:

  1. Choosing the pre-built R-oriented Docker image you want to use
  2. Running it as a container with optional parameters

Let's say your R developments would be made easier by the tidyverse family of packages. To do this, you will download the pre-built rocker/tidyverse image from Docker Hub using the pull command (note the similarity with git):

$ docker pull rocker/tidyverse

Remember that we have learned that, once removed, containers lose all the modifications you have made within them. So, how do you save the R developments made within a container? The trick is to mount your project folder from your host computer into the container. This is achieved by passing optional parameters to the run command.

If you want to run a container from the rocker/tidyverse image with an R project located on your host computer at /home/yourUsername/Rprojects/yourProject/ and work in RStudio, use the run command with these optional parameters:

$ docker run --rm -w /home/rstudio/ -p 8787:8787 -v /home/yourUsername/Rprojects/yourProject/:/home/rstudio/ rocker/tidyverse
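In case these options are new to you, here is what each of them does:

  -w /home/rstudio/ : sets the working directory inside the container
  --rm : automatically removes the container when it stops
  -p 8787:8787 : publishes port 8787 of the container (RStudio Server) on port 8787 of your host
  -v /home/yourUsername/Rprojects/yourProject/:/home/rstudio/ : mounts your host project folder into the home folder of the RStudio user inside the container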

Docker will instantiate a new container from the rocker/tidyverse image and make your project folder available to the container by mounting it. All the modifications you make to the mounted folder from within the container are actually made on your host machine. So once you stop your container, don't worry: your modifications are saved!

To open the RStudio instance running inside your container, open a web browser and navigate to http://localhost:8787. Your habitual RStudio interface will be loaded within a few seconds and your mounted folder will appear in the Files pane. Congratulations, you are now ready to work within a dockerized RStudio install!

In most cases, before running an image, you will need to customize it so that it reflects your own needs. Customizing an image requires editing its Dockerfile and rebuilding the image, as mentioned earlier.

To keep git-versioned Dockerfiles for your images, you can push them to GitHub. Hosting your Dockerfiles on GitHub offers you a nice feature: automated builds. Once enabled, each time you push a modification of your Dockerfile to GitHub, Docker Hub will rebuild your image and make it ready to be pulled by others.

You can share this very specific R environment with your co-workers. First, share this tutorial with them, then share your image. For this purpose, you have two solutions:

  1. Send them the corresponding Dockerfile and let them build the image on their machine (more complex)
  2. Upload your image to Docker Hub (manually or with the automated build feature) and simply send them the name of your image so that they can immediately use it with the run command (see the sketch below for the manual upload)
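For the manual upload, you log in to Docker Hub, tag your image with your Docker Hub username and push it (yourUser and yourImage are of course placeholders to adapt):

$ docker login
$ docker tag <IMAGE-NAME> yourUser/yourImage
$ docker push yourUser/yourImage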

Conclusion

You have learned how to use Docker to run your own customized, isolated R environment inside a container and how to share this specific environment with your colleagues.

If you want to try my pokyah/agrometeor image, have a look at its repository. There you will also learn how to create a custom bash command to launch your containers.

In a next tutorial, I'll explain how to run a container able to connect to an external PostgreSQL database.
