## Implementing a kernel specification
If you find yourself implementing a kernel launcher, you'll need a way to make that kernel and kernel launcher available to applications. This is accomplished via the kernel specification, or *kernelspec*.

Kernelspecs reside in well-known directories. For Enterprise Gateway, we generally recommend they reside in `/usr/local/share/jupyter/kernels`, where each entry in this directory is a directory representing the name of the kernel. The kernel specification is represented by the file `kernel.json`, the contents of which essentially indicate what environment variables should be present in the kernel process (via the `env` stanza) and which command (and arguments) should be issued to start the kernel process (via the `argv` stanza). The JSON also includes a `metadata` stanza that contains the `process_proxy` configuration, along with which process proxy class to instantiate to help manage the kernel process's lifecycle.
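For example, assuming the recommended location above and the `spark_python_yarn_cluster` kernel described below, the on-disk layout would look something like the following. The `bin/run.sh` path matches the `argv` stanza shown later; the exact set of files per kernel may vary:

```text
/usr/local/share/jupyter/kernels/
└── spark_python_yarn_cluster/    # directory name == kernel name
    ├── kernel.json               # the kernel specification
    └── bin/
        └── run.sh                # script referenced by the argv stanza
```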
One approach the sample Enterprise Gateway kernel specifications take is to include a shell script that actually issues the `spark-submit` request. It is this shell script (typically named `run.sh`) that is referenced in the `argv` stanza. Here's an example from the `spark_python_yarn_cluster` kernel specification:
```json
{
  "language": "python",
  "display_name": "Spark - Python (YARN Cluster Mode)",
  "metadata": {
    "process_proxy": {
      "class_name": "enterprise_gateway.services.processproxies.yarn.YarnClusterProcessProxy"
    },
    "debugger": true
  },
  "env": {
    "SPARK_HOME": "/usr/hdp/current/spark2-client",
    "PYSPARK_PYTHON": "/opt/conda/bin/python",
    "PYTHONPATH": "${HOME}/.local/lib/python3.8/site-packages:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip",
    "SPARK_OPTS": "--master yarn --deploy-mode cluster --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID} --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/home/${KERNEL_USERNAME}/.local --conf spark.yarn.appMasterEnv.PYTHONPATH=${HOME}/.local/lib/python3.8/site-packages:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip --conf spark.yarn.appMasterEnv.PATH=/opt/conda/bin:$PATH ${KERNEL_EXTRA_SPARK_OPTS}",
    "LAUNCH_OPTS": ""
  },
  "argv": [
    "/usr/local/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh",
    "--RemoteProcessProxy.kernel-id",
    "{kernel_id}",
    "--RemoteProcessProxy.response-address",
    "{response_address}",
    "--RemoteProcessProxy.public-key",
    "{public_key}",
    "--RemoteProcessProxy.port-range",
    "{port_range}",
    "--RemoteProcessProxy.spark-context-initialization-mode",
    "lazy"
  ]
}
```
where `run.sh` issues `spark-submit`, specifying the kernel launcher as the "application":
```bash
eval exec \
     "${SPARK_HOME}/bin/spark-submit" \
     "${SPARK_OPTS}" \
     "${IMPERSONATION_OPTS}" \
     "${PROG_HOME}/scripts/launch_ipykernel.py" \
     "${LAUNCH_OPTS}" \
     "$@"
```
For container-based environments, the `argv` may instead reference a script that is meant to create the container pod (for Kubernetes). For these, we use a template file that operators can adjust to meet the needs of their environment. Here's how that `kernel.json` looks:
```json
{
  "language": "python",
  "display_name": "Python on Kubernetes",
  "metadata": {
    "process_proxy": {
      "class_name": "enterprise_gateway.services.processproxies.k8s.KubernetesProcessProxy",
      "config": {
        "image_name": "elyra/kernel-py:VERSION"
      }
    },
    "debugger": true
  },
  "env": {},
  "argv": [
    "python",
    "/usr/local/share/jupyter/kernels/python_kubernetes/scripts/launch_kubernetes.py",
    "--RemoteProcessProxy.kernel-id",
    "{kernel_id}",
    "--RemoteProcessProxy.port-range",
    "{port_range}",
    "--RemoteProcessProxy.response-address",
    "{response_address}",
    "--RemoteProcessProxy.public-key",
    "{public_key}"
  ]
}
```
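The gist of such a launch script is to substitute kernel-specific values into the operator-adjustable template and submit the result to the Kubernetes API. The following is a minimal sketch of that idea using the official `kubernetes` Python client; the template path, placeholder names, and namespace are assumptions for illustration, not the actual `launch_kubernetes.py` implementation:

```python
import os
from string import Template

import yaml
from kubernetes import client, config

# Minimal sketch: render an operator-adjustable pod template, then create the
# pod.  The file name, placeholders, and namespace below are illustrative.
config.load_incluster_config()  # assumes we're running inside the cluster

with open("/usr/local/share/jupyter/kernels/python_kubernetes/scripts/kernel-pod.yaml") as f:
    template = Template(f.read())

# Substitute kernel-specific values, analogous to how the {kernel_id}-style
# placeholders are substituted into the argv stanza.
rendered = template.safe_substitute(
    kernel_id=os.environ["KERNEL_ID"],
    image_name="elyra/kernel-py:VERSION",
)

pod = yaml.safe_load(rendered)
client.CoreV1Api().create_namespaced_pod(namespace="enterprise-gateway", body=pod)
```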
When using the `launch_ipykernel` launcher (aka the Python kernel launcher), subclasses of `ipykernel.kernelbase.Kernel` can be launched. By default, this launcher uses the class name `"ipykernel.ipkernel.IPythonKernel"`, but other subclasses of `ipykernel.kernelbase.Kernel` can be specified by adding a `--kernel-class-name` parameter to the `argv` stanza. See *Invoking subclasses of `ipykernel.kernelbase.Kernel`* for more information.
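For example, an `argv` stanza that launches a custom kernel class might look like the following, where the kernel directory and the `custom_package.CustomKernel` class name are hypothetical:

```json
"argv": [
  "python",
  "/usr/local/share/jupyter/kernels/my_custom_kernel/scripts/launch_ipykernel.py",
  "--RemoteProcessProxy.kernel-id",
  "{kernel_id}",
  "--RemoteProcessProxy.response-address",
  "{response_address}",
  "--RemoteProcessProxy.public-key",
  "{public_key}",
  "--kernel-class-name",
  "custom_package.CustomKernel"
]
```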
As should be evident, kernel specifications are highly tuned to their runtime environment, so your needs may differ; whatever you build, however, should resemble the approaches we've taken so far.