Apache SPARK Up and Running FAST with Docker
https://www.youtube.com/watch?v=Zr_FqYKC6Qc&ab_channel=CoderGrammer
Here's a simple guide to getting Apache Spark up and running fast with Docker. We'll use docker-compose to manage the Docker containers.
Step 1: Install Docker and Docker Compose
If you haven't installed Docker and Docker Compose on your system yet, you can do it by following the official guides:
- Docker: https://docs.docker.com/engine/install/
- Docker Compose: https://docs.docker.com/compose/install/
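Once both are installed, it's worth a quick sanity check that the commands are on your PATH (version numbers will differ; if you installed Compose as a Docker CLI plugin rather than the standalone binary, run docker compose version instead):

docker --version
docker-compose --version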
Step 2: Set Up the Docker Compose File
Create a docker-compose.yml file in your working directory and paste the following code:
version: '3'
services:
  spark-master:
    image: bitnami/spark:latest
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '8080:8080'
  spark-worker:
    image: bitnami/spark:latest
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    depends_on:
      - spark-master
This configuration file defines two services: spark-master and spark-worker. Both use the bitnami/spark:latest Docker image, which has Spark pre-installed. The master's web UI port (8080) is published so that we can reach Spark from outside the Docker containers.
Step 3: Run Docker Compose
Run the following command in the same directory as the docker-compose file:
docker-compose up
This command starts two Docker containers, one for the Spark master and one for a Spark worker.
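If you'd rather get your terminal back, you can start the services in the background and follow the logs only when you need them; these are standard docker-compose flags, not specific to this setup:

docker-compose up -d
docker-compose logs -f spark-worker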
Step 4: Access Spark UI
You can access the Spark master UI in your web browser at http://localhost:8080. You should see one worker connected to the master.
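To confirm the cluster can actually run jobs, you can submit the SparkPi example that ships with the Spark distribution. This is a minimal sketch, assuming the Bitnami image keeps Spark under /opt/bitnami/spark with spark-submit on the PATH; adjust the jar path to match the Spark version in the image you pulled:

docker-compose exec spark-master sh -c \
  "spark-submit --class org.apache.spark.examples.SparkPi \
   --master spark://spark-master:7077 \
   /opt/bitnami/spark/examples/jars/spark-examples_*.jar 100"

If everything is wired up correctly, the job appears briefly in the master UI and the driver output reports an estimate of Pi.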
That's it! You now have a simple Apache Spark cluster running in Docker. This setup is fine for development and testing, but for more serious workloads, you would want to increase the resources available to Spark and add more worker nodes.
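On the "more worker nodes" point, the quickest way to experiment is Compose's --scale flag, which starts additional containers from the same spark-worker service definition (this works here because the worker doesn't publish any host ports that would collide):

docker-compose up -d --scale spark-worker=3

Refresh http://localhost:8080 and the master should list three workers.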
Remember to stop the services when you're done using the following command:
docker-compose down