Shiny and the Ops long road

05 Jul 2020

Shiny and the Ops long journey

last update: 2020/07/05

Introduction

I am a (Dev)Ops working on a HTC and bioinformatic platform. R (maybe with python) is the favorite programming language for our users.

Let's be clear about that: shiny is not designed for production. That being said, R was not designed for production too. I remember some time ago, when I used R with environment modules on our cluster, and had some issues with Rscript (...): the PATH to the R binary seemed to be hardcoded within the Rscript binary during the install... We still advise our users to use R CMD BATCH instead of Rscript, even if our R version are now managed with both environment modules and singularity...

However, R is doing great jobs for data processing, statistics, or for bioinformatics, and more recently with packages like rmarkdown and knitr. Due to performance issues, main time processing packages are coded with C++, and then bring back to R, thanks to the Rcpp package. Rmpi or multithreaded R programs, thanks to the snow or parallel packages, are also a long way to go as a basic user, or even at a system administrator on a cluster.

Before shiny, some people used cgi-bin scripts or rserve to call R scripts remotely from a web server.

So, shiny emerged and the Web applications came into the hand of bioinformaticians and statisticians (who usually use R as their standard programing language). That was not a great move for webmasters, Ops, or even webdesigners or even frontend developpers (as the web interface is always the same).

When shiny has been launched by the rstudio team, it was the same problem as R: it is not designed for production. Like R, it is, by default, monothreaded. Nevertheless, some low level packages, like httpuv (used by shiny) or promises allowing asynchronous calls, with the help of future or httr (which provides an API to R), alleviate drastically this issue. It helped us to produce more production compliant web applications.

That is being said, it is not sufficiant to deploy a shiny application in a real production environment (even if a basic shiny application would nicely run if you have roughly only less than ~5 clients (!)).

Shiny needs stickiness and a web server that could manage both web services (which is quite new in the web world) + http standard requests.

These last years, I saw many try to remove the R/shiny locks.

The shiny basic web application

Rstudio team did a little howto on how to configure nginx or Apache for a R Shiny application.

This works for a demo application, but do not expect it to work as it is with a high load traffic.

We are using it locally, on our MBB platform, with apache on a shiny server, in order to provide a basic demo shiny server.

Apache basic configuration for shiny application under http://shiny.domain.tld/myapp URL

  RewriteEngine On

  ############ MyApp ################
  RewriteCond %{SERVER_NAME} =shiny.domain.tld
  RewriteRule ^/myapp/ https://%{SERVER_NAME}%{REQUEST_URI} [END,QSA,R=permanent]
  RewriteCond %{HTTP:Upgrade} =websocket
  RewriteRule /myapp/(.*)     ws://IP.IP.IP.IP:3838/$1  [P,L]
  RewriteCond %{HTTP:Upgrade} !=websocket
  RewriteRule /myapp/(.*)     http://IP.IP.IP.IP:3838/$1 [P,L]
  ProxyPass /myapp/ http://IP.IP.IP.IP:3838/
  ProxyPassReverse /myapp/ http://IP.IP.IP.IP:3838/
  ########## End MyApp ##############

  RedirectMatch permanent ^/myapp$ /myapp/
  ProxyRequests Off

You should also look carefully to timeout and keepalive configurations (value in seconds). Of course, that is also true for nginx.

Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5

You can imagine many applications by just changing the shiny default port and proxying the same way with apache. Then, you can have nice URLs, instead of having one port by shiny application and all the firewall issues.

A configuration more complicated to serve more clients

Ok, we have a working shiny server. That is not great, but hey, it works !

Now, you have many choices :

shinyproxy,
shiny-server,
the hard way.

We have tested all these solutions, staying with the free editions. If you don't want to put your hands inside complicated OpenSource stacks, you should stop right now and look into the enterprise plan editions of shiny-server or shinyproxy.

shiny-server works great for a single server application (physical or VM, that does not matter). If you need something that could scale great, with many time consuming R applications, then you should prefer shinyproxy. Note that https is not possible in shiny-server free edition, although it looks like a regular minimal nginx configuration (see bellow).

However, you can use those 2 solutions in free plans with quite nice results. Yet, with service level agreements (SLA) increasing, production constraints, like nowadays real web applications, that won't be enough.

A free shiny-server example

A basic shiny-server configuration

run_as shiny;
server {
  listen 8001;
  ## https not working : shiny pro only...
  #ssl /etc/shiny-server/server.key /etc/shiny-server/server.cert;
  ## neither nginx default conf.
  #ssl_certificate /etc/shiny-server/server.cert;
  #ssl_certificate_key /etc/shiny-server/server.key;
  # Define a location at the base URL
  location / {
    # Host the directory of Shiny Apps stored in this directory
    site_dir /opt/shiny;
    # Log all Shiny output to files in this directory
    log_dir /var/log/shiny-server;
    # When a user visits the base URL rather than a particular application,
    # an index of the applications available in this directory will be shown.
    directory_index off;
  }
}

A free shinyproxy

Shinyproxy is great to serve shiny application with LDAP authentication. We are using it here with the help of a docker swarm cluster made up of 6 machines.

I won't enter too much into the details right now, but we are using a basic docker swarm, with portainer, a docker registry and a NAS server with NFS + a local dedicated network + a web dedicated PHP page in order to access this service.

Shinyproxy is well documented. However, that is java, quite opaque, and it is not really fast, especially with containers. Moreover, even if it looks to be opensource, developpers are mainly concerned by solving the problem they want (eg: they seem to prefer diving into k8s issues, rather than swarm ones).

Multiple apache or nginx proxy with loadbalancing

Apache

Here comes the complicated stuff. Let's say you are on apache and want to serve at least 5 parallel R sessions. You will need to have 5 apache threads dedicated to shiny.

Configuration for apache here is quite longer

  Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED
  <Proxy "balancer://myhttpcluster">
    BalancerMember "http://IP.IP.IP.IP:9090" route=1
    BalancerMember "http://IP.IP.IP.IP:9091" route=2
    BalancerMember "http://IP.IP.IP.IP:9092" route=3
    BalancerMember "http://IP.IP.IP.IP:9093" route=4
    BalancerMember "http://IP.IP.IP.IP:9094" route=5
  </Proxy>
  
  <Proxy "balancer://mywscluster">
    BalancerMember ws://IP.IP.IP.IP:9090 route=1
    BalancerMember ws://IP.IP.IP.IP:9091 route=2
    BalancerMember ws://IP.IP.IP.IP:9092 route=3
    BalancerMember ws://IP.IP.IP.IP:9093 route=4
    BalancerMember ws://IP.IP.IP.IP:9094 route=5
  </Proxy>
  
  ProxyPass        "/myapp/" "balancer://myhttpcluster"  stickysession=ROUTEID
  ProxyPassReverse "/myapp/" "balancer://mywscluster"    stickysession=ROUTEID
  
  ############### myapp with load balancing ################
  RewriteCond %{SERVER_NAME} =www.domain.tld
  RewriteRule ^/myapp/ https://%{SERVER_NAME}%{REQUEST_URI} [END,QSA,R=permanent]
  RewriteCond %{HTTP:Upgrade} =websocket
  RewriteRule /myapp/(.*)     balancer://mywscluster/$1  [P,L]
  RewriteCond %{HTTP:Upgrade} !=websocket
  RewriteRule /myapp/(.*)     balancer://myhttpcluster/$1 [P,L]

As you can see, the multiple proxying threads for both web services and HTTP requests considerably increase this configuration. Now imagine doing this for 20 threads, and 20 shiny applications !!

Apache allows you to use lb (aka loadbalancer) with a list of servers in a txt file or even a script. My colleague tried all those solutions. We had three dedicated machines configured for this purpose.

Apache and `lb` scripts

Apache configuration:

  #RewriteMap lb "rnd:/var/www/serverlist.txt"
  #RewriteMap lb "prg:/var/www/get_a_working_server.sh"
  RewriteMap lb "prg:/var/www/get_a_working_shiny_server.sh"

serverlist.txt

servers www.domain.tld:8080|IP.IP.IP.IP:4123|IP.IP.IP.IP:4124|IP2.IP2.IP2.IP2:PORT

We extensively used netcat in the next scripts to check remote ports:

get_a_working_server.sh:

#!/bin/bash
stdbuf -i0 -o0
while read request
do 
    list=($(nc -znv IP.IP.IP.IP 8080-8090 2>&1 | grep open | cut -f3 -d' ' | awk '{print "www.domain.tld:"$1}') )
    list+=($(nc -znv IP1.IP1.IP1.IP1 4123-4140 2>&1 | grep open | cut -f3 -d' ' | awk '{print "www2.domain.tld:"$1}') )
    list+=($(nc -znv IP2.IP2.IP2.IP2 4123-4140 2>&1 | grep open | cut -f3 -d' ' | awk '{print "www1.domain.tld:"$1}') )

    RANGE=${#list[*]}

    number=$RANDOM
    let "number %= $RANGE"
    server=${list[$number]}
    echo $server
done < /dev/stdin

Finally, that is what we used during some times : get_a_working_shiny_server.sh:

#!/bin/bash

while read request
do
   list=()
   #list=($(nc -zv 127.0.0.1 4123-4140 2>&1 | grep succeeded | cut -f3,4 -d' ' --output-delimiter ":") )
   #list+=($(nc -zv www1.domain.tld 4123-4140 2>&1 | grep succeeded | cut -f3,4 -d' ' --output-delimiter ":") )
   #list+=($(nc -zv www2.domain.tld 4123-4140 2>&1 | grep succeeded | cut -f3,4 -d' ' --output-delimiter ":") )
   for host in www1.domain.tld www2.domain.tld
   #for host in www2.domain.tld
   #for host in www1.domain.tld
   do
    for app in `seq 1 10`
    do
     list+=($host:8080/myapp$app)
    done
   done

   RANGE=${#list[*]}

   number=$RANDOM
   let "number %= $RANGE"
   server=${list[$number]}
   echo $server


done < /dev/stdin

This solution is nice, but you can have https or firewall issues due to multiple servers and ports.

However, that being said, you may be able to solve this with another layer of frontend/proxy, like HAproxy (eg. see this interesting message here on stackoverflow).

Nginx

Here, we used one big host server. We will use ip_hash to loadbalance sessions, as it will keep sticky sessions, based on the ip address. For a true nginx loadbalancer, we would need Nginx Plus.

nginx configuration with a single shiny app served many times

nginx.conf:

user shiny;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;
events {
	worker_connections 768;
}
http {
	sendfile on;
	tcp_nopush on;
	tcp_nodelay on;
	keepalive_timeout 300;
	types_hash_max_size 2048;
	include /etc/nginx/mime.types;
	default_type application/octet-stream;
	ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
	ssl_prefer_server_ciphers on;
	access_log /var/log/nginx/access.log;
	error_log /var/log/nginx/error.log;
	gzip on;
    map $http_upgrade $connection_upgrade {
        default upgrade;
        ''      close;
      }
    upstream myapp {
        ip_hash;
        server IP.IP.IP.IP:8001;
        server IP.IP.IP.IP:8002;
        server IP.IP.IP.IP:8003;
        server IP.IP.IP.IP:8004;
        server IP.IP.IP.IP:8005;
        server IP.IP.IP.IP:8006;
        server IP.IP.IP.IP:8007;
        server IP.IP.IP.IP:8008;
        server IP.IP.IP.IP:8009;
        server IP.IP.IP.IP:8010;
        server IP.IP.IP.IP:8011;
        server IP.IP.IP.IP:8012;
        server IP.IP.IP.IP:8013;
        server IP.IP.IP.IP:8014;
        server IP.IP.IP.IP:8015;
        server IP.IP.IP.IP:8016;
        server IP.IP.IP.IP:8017;
        server IP.IP.IP.IP:8018;
        server IP.IP.IP.IP:8019;
        server IP.IP.IP.IP:8020;
        server IP.IP.IP.IP:8021;
        server IP.IP.IP.IP:8022;
        server IP.IP.IP.IP:8023;
        server IP.IP.IP.IP:8024;
        server IP.IP.IP.IP:8025;
        server IP.IP.IP.IP:8026;
        server IP.IP.IP.IP:8027;
        server IP.IP.IP.IP:8028;
        server IP.IP.IP.IP:8029;
        server IP.IP.IP.IP:8030;
        keepalive 300;
    }
	include /etc/nginx/conf.d/*.conf;
	include /etc/nginx/sites-enabled/*;
}

Site configuration:


server {
    listen 80 default_server;
    listen [::]:80 default_server;
    listen 443 ssl default_server;
    listen [::]:443 ssl default_server;
    ssl_certificate /etc/letsencrypt/live/www.domain.tld    /fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/www.domain.tld    /privkey.pem; # managed by Certbot
    root /opt/shinyy;
    index index.html index.htm index.nginx-debian.html;
    server_name www.domain.tld     myapp.domain.tld;
    location / {
        proxy_set_header   Host             $host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_buffering off;
        client_max_body_size       10m;
        client_body_buffer_size    128k;
        proxy_connect_timeout      300;
        proxy_send_timeout         300;
        proxy_read_timeout         300;
        proxy_buffer_size          4k;
        proxy_buffers              4 32k;
        proxy_busy_buffers_size    64k;
        proxy_temp_file_write_size 64k;
        
        proxy_pass http://myapp;
        proxy_redirect      / $scheme://$host/;
    }

Then, you will need to create all the 30 shiny servers (port 8000 to 8030) in /opt/shiny. You can have the same content served 30 times using symbolic links. Then, using a basic launch script like this:

#!/usr/bin/env bash

for i in {1..30}
do
    if [ $i -lt 10 ] ;then
        /sbin/runuser -s /bin/bash -l shiny -c ". /usr/local/shiny/.Renviron && LANG=en_US.UTF-8 /usr/bin/Rscript -e \"shiny::runApp('/opt/shiny/myapp_$i', port=800$i, host='IP.IP.IP.IP')\" 1>&2 >> /usr/local/shiny/myapp_${i}.log &"
    else
        /sbin/runuser -s /bin/bash -l shiny -c ". /usr/local/shiny/.Renviron && LANG=en_US.UTF-8 /usr/bin/Rscript -e \"shiny::runApp('/opt/shiny/myapp_$i', port=80$i, host='IP.IP.IP.IP')\" 1>&2 >> /usr/local/shiny/myapp_${i}.log &"
    fi
done

That solution is quite nice, but it will be hit by a major issue: the necessary timeout to remove the lock on each of nginx proxy threads.

Also keep in mind that your web user, here, shiny needs also to have full permissions on your DocumentRoot (root in nginx), all the /opt/shiny subdirectories, especially if you serve static contents in it (*.ccs, *.js, data, pictures...).

Singularity instance for simple shiny application

I also experienced shiny instance through singularity instance + nginx proxying. It works fine for a single instance but I am not sure if it could scale fine.

Conversion from a dockerfile is quite easy. Main advantage is to be able to launch it from a standard user (non root / sudo).

Please note that I still don't know if it could scale extensively as we used it only for old applications, but I thought it was a nice solution for a single shiny application.

Traefik came into the game

It is been a while I am looking at Traefik. Traefik is a modern web load balancer especially designed for containers that could handle both web services and http requests. The other usual loadbalancers or proxying solutions, like the previous one we saw, apache, nginx, or even HAproxy are not performing as smoothly as traefik in our case; that is to say with web services + http(s) requests + stickiness sessions + containers.

After reading someone that made a similar solution, I was definitively sure that was possible. I also commented this article as QuiPasseParLà, explaining my point of view about this. I wanted to remove the shinyproxy, as it is not intended to deliver fast small web applications (without intensive workloads).

Traefik could be really complex to configure. So here is my solution, referenced in this thread.

The downside of this configuration is the lack of authentication feature (except with TraefikEE), and the fact that it should run on one host. As I said before, for real complex applications with heavy load, we are using shinyproxy.

Nevertheless, here are some articles on how to bypass those issues with traefik :

Authentication:
Traefik and let's encrypt + domain wildcard certicates:

Or, maybe, you can use another ID solution provider, within a dedicated container...