Apache SPARK Up and Running FAST with Docker

https://www.youtube.com/watch?v=Zr_FqYKC6Qc&ab_channel=CoderGrammer

Here's a simple guide to get Apache Spark up and running fast with Docker. We'll use Docker Compose to define and manage the Docker containers.

Step 1: Install Docker and Docker Compose

If you haven't installed Docker and Docker Compose on your system yet, follow the official installation guides in the Docker documentation before continuing.
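
Once installed, a quick way to confirm both tools are on your PATH is to print their versions. If you installed Compose as the Docker CLI plugin rather than the standalone binary, the second command is docker compose version instead:

docker --version
docker-compose --version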

Step 2: Setup Docker Compose File

Create a docker-compose.yml file in your working directory and paste the following code:

version: '3'
services:
  spark-master:
    image: bitnami/spark:latest
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '8080:8080'

  spark-worker:
    image: bitnami/spark:latest
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    depends_on:
      - spark-master

This configuration file defines two services: spark-master and spark-worker. Both use the bitnami/spark:latest Docker image, which ships with Spark pre-installed. The master publishes port 8080 to the host so we can reach the Spark master web UI from outside the container; the worker reaches the master over the internal Compose network on port 7077, so it needs no published ports.
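
As an optional tweak (not required for this guide), if you want to run spark-submit from your host machine against this cluster, you could also publish the master's RPC port 7077 by extending the ports list of the spark-master service. Keep in mind that the driver on your host must then be reachable from the containers, which may need extra network configuration depending on your platform:

    ports:
      - '8080:8080'
      - '7077:7077'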

Step 3: Run Docker Compose

Run the following command in the same directory as the docker-compose file:

docker-compose up

This command starts two Docker containers, one for the Spark master and one for a Spark worker.
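
If you'd rather not keep a terminal attached to the logs, the same command with -d runs the containers in the background; docker-compose ps and docker-compose logs can then be used to inspect them, for example:

docker-compose up -d
docker-compose ps
docker-compose logs -f spark-master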

Step 4: Access Spark UI

You can access the Spark master UI in your web browser at http://localhost:8080. You should be able to see one worker connected to the master.
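
To check that the cluster actually runs jobs, you can submit one of the examples bundled with Spark from inside the master container. The path below assumes the Bitnami image's default Spark install location of /opt/bitnami/spark; adjust it if your image lays things out differently:

docker-compose exec spark-master spark-submit \
  --master spark://spark-master:7077 \
  /opt/bitnami/spark/examples/src/main/python/pi.py 10

If everything is wired up correctly, the output should include a line like "Pi is roughly 3.14...", and the completed application should appear in the master UI.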

That's it! You now have a simple Apache Spark cluster running in Docker. This setup is fine for development and testing, but for more serious workloads, you would want to increase the resources available to Spark and add more worker nodes.
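
One easy way to add workers without editing the file is Docker Compose's --scale flag, which starts extra containers from the same spark-worker definition; each one registers itself with the master (this works here because the worker service publishes no host ports that could clash):

docker-compose up --scale spark-worker=3 -d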

Remember to stop the services when you're done using the following command:

docker-compose down