Getting started¶
Jupyter Enterprise Gateway requires Python (Python 3.3 or greater, or Python 2.7) and is intended to be installed on an Apache Spark 2.x cluster.
The following Resource Managers are supported with the Jupyter Enterprise Gateway:
- Spark Standalone
- YARN Resource Manager - Client Mode
- YARN Resource Manager - Cluster Mode
The following kernels have been tested with the Jupyter Enterprise Gateway:
- Python/Apache Spark 2.x with IPython kernel
- Scala 2.11/Apache Spark 2.x with Apache Toree kernel
- R/Apache Spark 2.x with IRkernel
To support Scala kernels, Apache Toree must be installed. To run IPython and R kernels in YARN containers, various packages must be installed on each of the YARN data nodes. The simplest way to provide all data nodes with the required dependencies is to install Anaconda on all cluster nodes.
To take full advantage of security and user impersonation capabilities, a Kerberized cluster is recommended.
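As a quick sanity check of the Python requirement stated at the top of this section, the following sketch tests a version string against the supported range (Python 2.7, or Python 3.3 or greater). The function name `python_version_supported` is hypothetical and is not part of Enterprise Gateway.

```shell
#!/bin/bash
# Hypothetical helper (not part of Enterprise Gateway): succeeds when the
# given version string is Python 2.7.x or Python 3.3 and later.
python_version_supported() {
  local version="$1" major rest minor
  # Require at least a major.minor pair
  case "$version" in
    *.*) ;;
    *) return 1 ;;
  esac
  major="${version%%.*}"
  rest="${version#*.}"
  minor="${rest%%.*}"
  # Reject empty or non-numeric components
  case "$major" in ''|*[!0-9]*) return 1 ;; esac
  case "$minor" in ''|*[!0-9]*) return 1 ;; esac
  if [ "$major" -eq 2 ]; then
    [ "$minor" -eq 7 ]
  elif [ "$major" -eq 3 ]; then
    [ "$minor" -ge 3 ]
  else
    return 1
  fi
}

# Example: check the interpreter currently on the PATH
# python_version_supported "$(python -c 'import platform; print(platform.python_version())')" \
#   && echo "supported" || echo "unsupported"
```

Running the same check on every cluster node is one way to catch a node whose interpreter falls outside the supported range before kernels are launched there.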
Enterprise Gateway Features¶
Jupyter Enterprise Gateway exposes the following features and functionality:
- Launches kernels on different servers, thereby distributing resource utilization across the enterprise
- Pluggable framework allows for support of additional resource managers
- Secure communication from client to kernel
- Persistent kernel sessions (see Roadmap)
- Configuration profiles (see Roadmap)
- Feature parity with Jupyter Kernel Gateway
- A CLI for launching the enterprise gateway server:
jupyter enterprisegateway OPTIONS
- A Python 2.7 and 3.3+ compatible implementation
Installing Enterprise Gateway¶
For new users, we highly recommend installing Anaconda. Anaconda conveniently installs Python, the Jupyter Notebook, the IPython kernel and other commonly used packages for scientific computing and data science.
Use the following installation steps:
- Download Anaconda. We recommend downloading Anaconda’s latest Python version (currently Python 2.7 and Python 3.6).
- Install the version of Anaconda which you downloaded, following the instructions on the download page.
- Install the latest version of Jupyter Enterprise Gateway from PyPI using pip (part of Anaconda) along with its dependencies.
# install using pip from pypi
pip install --upgrade jupyter_enterprise_gateway
# install using conda from conda forge
conda install -c conda-forge jupyter_enterprise_gateway
At this point, the Jupyter Enterprise Gateway deployment provides local kernel support which is fully compatible with Jupyter Kernel Gateway.
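To confirm that the installation succeeded, a minimal sketch such as the following could be used. The function name `verify_enterprise_gateway_install` is hypothetical, and the sketch assumes the `--version` flag that Jupyter applications generally accept.

```shell
#!/bin/bash
# Hypothetical helper: confirm that the jupyter command is available and that
# the enterprisegateway subcommand responds.
verify_enterprise_gateway_install() {
  if ! command -v jupyter >/dev/null 2>&1; then
    echo "jupyter command not found on PATH" >&2
    return 1
  fi
  # Assumption: --version is honored, as it is by Jupyter applications in general.
  # A non-zero exit status here suggests the package is not installed.
  jupyter enterprisegateway --version
}

# Example:
# verify_enterprise_gateway_install && echo "Enterprise Gateway is installed"
```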
To uninstall Jupyter Enterprise Gateway…
# uninstall using pip
pip uninstall jupyter_enterprise_gateway
# uninstall using conda
conda uninstall jupyter_enterprise_gateway
Installing Kernels¶
Please follow the link below to learn more specific details about how to install/configure specific kernels with Jupyter Enterprise Gateway:
Configuring Spark Resource Managers¶
To leverage the full distributed capabilities of Spark, Jupyter Enterprise Gateway provides deep integration with the YARN resource manager. Enterprise Gateway (EG) also supports running in a pseudo-distributed mode, using either YARN client mode or Spark Standalone mode.
Please follow the links below to learn more specific details about how to enable/configure the different modes:
Starting Enterprise Gateway¶
Very few arguments are necessary to minimally start Enterprise Gateway. The following command could be considered a minimal command:
jupyter enterprisegateway --ip=0.0.0.0 --port_retries=0
where --ip=0.0.0.0 exposes Enterprise Gateway on the public network and --port_retries=0 ensures that a single instance will be started.
We recommend starting Enterprise Gateway as a background task. As a result, you might find it best to create a start script to maintain options, file redirection, etc.
The following script starts Enterprise Gateway with DEBUG tracing enabled (default is INFO) and idle kernel culling for any kernels idle for 12 hours, with idle checks occurring every minute. The Enterprise Gateway log can then be monitored via tail -F /var/log/enterprise_gateway.log and the server can be stopped via kill $(cat /var/run/enterprise_gateway.pid)
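#!/bin/bash
LOG=/var/log/enterprise_gateway.log
PIDFILE=/var/run/enterprise_gateway.pid
# Cull kernels idle for 12 hours (43200 seconds), checking every 60 seconds
jupyter enterprisegateway --ip=0.0.0.0 --port_retries=0 --log-level=DEBUG \
  --MappingKernelManager.cull_idle_timeout=43200 \
  --MappingKernelManager.cull_interval=60 > "$LOG" 2>&1 &
if [ "$?" -eq 0 ]; then
  # Record the PID of the background process so it can be stopped later
  echo $! > "$PIDFILE"
else
  exit 1
fi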
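A companion stop script can be slightly more defensive than a bare kill. The following is a minimal sketch, assuming the PID file written by the start script above. The function name `stop_gateway` is hypothetical and not part of Enterprise Gateway.

```shell
#!/bin/bash
# Hypothetical helper: stop the process whose PID is recorded in the given
# PID file, and clean up the file afterwards.
stop_gateway() {
  local pidfile="$1" pid
  if [ ! -r "$pidfile" ]; then
    echo "PID file not found: $pidfile" >&2
    return 1
  fi
  pid="$(cat "$pidfile")"
  # kill -0 sends no signal; it only checks that the process can be signalled
  if kill -0 "$pid" 2>/dev/null; then
    kill "$pid"
    rm -f "$pidfile"
  else
    echo "No running process with PID $pid; removing stale PID file" >&2
    rm -f "$pidfile"
    return 1
  fi
}

# Example:
# stop_gateway /var/run/enterprise_gateway.pid
```

Checking the process before signalling avoids a confusing error when the server has already exited and only a stale PID file remains.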
Connecting a Notebook to Enterprise Gateway¶
NB2KG is used to connect a Notebook from a local desktop or laptop to the Enterprise Gateway instance on the Spark/YARN cluster. We strongly recommend NB2KG v0.1.0, because our team has contributed enhancements that convey the notebook user (for configurations in which Enterprise Gateway runs behind a secured gateway) and allow increased request timeouts (needed because kernel startup takes longer when it involves the resource manager or distribution operations).
Extending the notebook launch command listed on the NB2KG repo, one might use the following…
export KG_URL=http://<ENTERPRISE_GATEWAY_HOST_IP>:8888
export KG_HTTP_USER=guest
export KG_HTTP_PASS=guest-password
export KG_REQUEST_TIMEOUT=30
export KERNEL_USERNAME=${KG_HTTP_USER}
jupyter notebook \
--NotebookApp.session_manager_class=nb2kg.managers.SessionManager \
--NotebookApp.kernel_manager_class=nb2kg.managers.RemoteKernelManager \
--NotebookApp.kernel_spec_manager_class=nb2kg.managers.RemoteKernelSpecManager
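Before launching the notebook with the settings above, it can help to confirm that the gateway is reachable. The sketch below assumes that the gateway serves the Kernel Gateway style `/api/kernelspecs` endpoint and that the `KG_*` variables exported above are set. The function names `kernelspecs_url` and `check_gateway` are hypothetical, and the basic-auth credentials only matter when a secured gateway sits in front of Enterprise Gateway.

```shell
#!/bin/bash
# Hypothetical helpers for checking that the gateway answers requests.

# Build the kernelspecs endpoint from a base URL, tolerating a trailing slash.
kernelspecs_url() {
  local base="${1%/}"
  echo "${base}/api/kernelspecs"
}

# Return 0 if the gateway responds successfully within the request timeout.
check_gateway() {
  local url
  url="$(kernelspecs_url "${KG_URL}")"
  curl --silent --fail --output /dev/null \
    --max-time "${KG_REQUEST_TIMEOUT:-30}" \
    --user "${KG_HTTP_USER}:${KG_HTTP_PASS}" \
    "$url"
}

# Example (requires the exports above and a reachable gateway):
# check_gateway && echo "gateway reachable" || echo "gateway not reachable"
```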
For your convenience, we have also built a Docker image (elyra/nb2kg) with Jupyter Notebook, Jupyter Lab and NB2KG, which can be launched with the command below:
docker run -t --rm \
-e KG_URL='http://<master ip>:8888' \
-e KG_HTTP_USER=guest \
-e KG_HTTP_PASS=guest-password \
-p 8888:8888 \
-e VALIDATE_KG_CERT='no' \
-e LOG_LEVEL=DEBUG \
-e KG_REQUEST_TIMEOUT=40 \
-e KG_CONNECT_TIMEOUT=40 \
-v ${HOME}/notebooks/:/tmp/notebooks \
-w /tmp/notebooks \
elyra/nb2kg
To invoke Jupyter Lab, simply add lab as the last option following the image name (e.g., elyra/nb2kg lab).