How to Install Apache Spark on a Local Windows Machine

Prerequisites

Before you start, make sure you have the following software installed on your Windows machine:

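  1. Java Development Kit (JDK): Apache Spark runs on the JVM and requires Java 8 or later, but each Spark release supports only specific Java versions (recent Spark 3.x releases, for example, target Java 8, 11, and 17), so the newest JDK is not always a safe choice. Check the documentation for your Spark release, then download a supported JDK from Oracle's website and follow the installation instructions.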

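  2. Python (only needed for PySpark): Current Spark releases require Python 3; support for Python 2.7 and for older 3.x versions such as 3.4 has been dropped. Check the documentation for your Spark release for the minimum version, then download a supported version from Python's website and follow the installation instructions.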

  3. Winutils: Download the version of winutils.exe that matches the Hadoop version your Spark package is built for from this GitHub repository. Create a directory named hadoop on your C:\ drive and place the downloaded winutils.exe file in its bin subdirectory (i.e., C:\hadoop\bin\).
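Once the prerequisites are in place, a short script can confirm them. The following is a minimal sketch, not part of Spark: the function name check_prerequisites is made up for illustration, and it assumes winutils.exe was placed under C:\hadoop\bin as described above. Note that java may only show up after the Path variable is updated in Step 3.

```python
import os
import shutil


def check_prerequisites(hadoop_home=r"C:\hadoop"):
    """Return a dict mapping each prerequisite to True (found) or False."""
    # Assumed location, taken from the Winutils step above.
    winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
    return {
        # shutil.which searches the directories listed in PATH.
        "java": shutil.which("java") is not None,
        "winutils": os.path.isfile(winutils),
    }


for name, found in check_prerequisites().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Python itself is not checked here, because being able to run the script already shows that it is installed.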

Step 1: Download Apache Spark

  1. Visit the Apache Spark official website.
  2. Select the latest stable release of Apache Spark.
  3. Choose the package type as "Pre-built for Apache Hadoop".
  4. Click the "Download" button to download the Spark package.

Step 2: Extract Apache Spark

  1. Navigate to the downloaded Spark package (usually in the "Downloads" folder).
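  2. Extract the contents of the package using a tool like 7-Zip. The package is a .tgz archive, so with 7-Zip you may need to extract twice: once to get the .tar file and once more to get the Spark folder.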
  3. Move the extracted folder to a desired location (e.g., C:\Spark).
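If you prefer not to install an extraction tool, Python's standard tarfile module can unpack the package as well. This is a sketch under assumptions: extract_spark is a hypothetical helper name, and the file name in the example call is a placeholder for whatever you actually downloaded. Only extract archives you obtained from a trusted source.

```python
import os
import tarfile


def extract_spark(archive_path, dest_dir):
    """Extract a .tgz archive into dest_dir and return the entries now in dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    # "r:gz" handles the gzip and tar layers in one pass.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)
    return sorted(os.listdir(dest_dir))


# Example call (hypothetical file name; use the one you downloaded):
# extract_spark(r"C:\Users\you\Downloads\spark-x.y.z-bin-hadoop3.tgz", r"C:\Spark")
```

The Spark archive unpacks into a single top-level folder, so after extraction make sure the location you use in the next step is the folder that directly contains bin.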

Step 3: Set Environment Variables

  1. Right-click on "This PC" or "Computer" and select "Properties".
  2. Click on "Advanced system settings" and then the "Environment Variables" button.
  3. Click "New" under "System variables" to add the following variables:
    • Variable name: JAVA_HOME Variable value: (Path to your JDK installation, e.g., C:\Program Files\Java\jdk1.8.0_291)

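      If Java is installed under C:\Program Files (x86)\, the spaces and parentheses in that path can break Spark's startup scripts. In that case, use the folder's short (8.3) name instead, e.g., C:\progra~2\Java\jre1.8.0_361. progra~2 is typically the short name for "Program Files (x86)"; you can confirm it by running dir /x C:\ in a Command Prompt.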

    • Variable name: HADOOP_HOME Variable value: C:\hadoop

    • Variable name: SPARK_HOME Variable value: (Path to your Spark installation, e.g., C:\Spark)

  4. Edit the "Path" system variable and append the following paths:
    • %JAVA_HOME%\bin
    • %HADOOP_HOME%\bin
    • %SPARK_HOME%\bin
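After saving the variables, open a new Command Prompt (existing windows keep the old environment) and check that the settings took effect. The script below is an illustrative sketch: find_env_problems and REQUIRED_VARS are names invented here, and the checks simply mirror the three variables and three Path entries listed above.

```python
import os

REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME")


def find_env_problems(env):
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    # Path entries are separated by os.pathsep (";" on Windows).
    path_entries = [
        entry.rstrip("\\/").lower()
        for entry in env.get("PATH", "").split(os.pathsep)
    ]
    for name in REQUIRED_VARS:
        home = env.get(name)
        if not home:
            problems.append(f"{name} is not set")
            continue
        bin_dir = os.path.join(home, "bin")
        if not os.path.isdir(bin_dir):
            problems.append(f"{name}: {bin_dir} does not exist")
        # Compare case-insensitively, as Windows paths are case-insensitive.
        if bin_dir.rstrip("\\/").lower() not in path_entries:
            problems.append(f"{bin_dir} is not on PATH")
    return problems


for problem in find_env_problems(os.environ):
    print(problem)
```

If the script prints nothing, all three variables are set, each bin folder exists, and each one appears in Path.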

Step 4: Test Apache Spark Installation

  1. Open a new Command Prompt.
  2. Type spark-shell and press "Enter". If the installation is successful, you will see the Spark shell start and display a scala> prompt.
  3. To exit the Spark shell, type :quit and press "Enter".
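The same check can be done without opening the interactive shell by asking spark-submit (shipped in Spark's bin folder) for its version. This is a rough sketch, assuming %SPARK_HOME%\bin is on Path as set up in Step 3; spark_version_output is a hypothetical helper name.

```python
import shutil
import subprocess


def spark_version_output(command="spark-submit"):
    """Run `<command> --version`; return its combined output, or None if not found."""
    # shutil.which honors PATHEXT on Windows, so it can resolve spark-submit.cmd.
    executable = shutil.which(command)
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True
    )
    # The version banner may go to stderr rather than stdout, so keep both streams.
    return result.stdout + result.stderr


print(spark_version_output() or "spark-submit was not found on PATH")
```

If the message about Path appears, revisit the environment variables from Step 3 and open a new Command Prompt before trying again.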

Congratulations! You have successfully installed Apache Spark on your local Windows machine.