How to Install Apache Spark on a Local Machine using Windows

Introduction

Installing Apache Spark on Windows can be challenging due to compatibility issues and environment configuration requirements. This comprehensive guide walks you through every step of the installation process, from prerequisites to verification, with detailed troubleshooting tips for common errors.

By the end of this guide, you'll have a fully functional Apache Spark installation on your Windows machine, ready for local development and testing.

Understanding the Components

Before diving into installation, it's important to understand what you're installing:

Apache Spark: A unified analytics engine for large-scale data processing. It provides high-level APIs in Java, Scala, Python, and R.

Hadoop Winutils: Windows utilities for Hadoop that Spark requires on Windows systems. Even though you're not installing Hadoop, Spark needs these utilities to function properly on Windows.

Java Development Kit (JDK): Spark is built on the JVM (Java Virtual Machine) and requires Java to run.

Python (Optional but Recommended): Required if you want to use PySpark, Spark's Python API.

Prerequisites

System Requirements

  • Operating System: Windows 10 or Windows 11 (64-bit)
  • RAM: Minimum 4GB (8GB or more recommended)
  • Disk Space: At least 5GB free space
  • Administrator Access: Required for setting environment variables

Required Software

1. Java Development Kit (JDK)

Apache Spark requires JDK 8, 11, or 17. JDK 11 is recommended for most use cases.

Download and Install JDK:

  1. Visit Oracle JDK Downloads or Adoptium (OpenJDK)
  2. Download JDK 11 (Windows x64 Installer)
  3. Run the installer with default settings
  4. Note the installation path (e.g., C:\Program Files\Java\jdk-11.0.18)

Verify Java Installation:

Open Command Prompt and run:

java -version

You should see output like:

java version "11.0.18" 2023-01-17 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.18+9-LTS-195)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.18+9-LTS-195, mixed mode)

Important Notes:

  • If you have multiple Java versions installed, Spark will use the version specified in JAVA_HOME
  • Do not install Java in paths with spaces (e.g., avoid C:\Program Files (x86)\)
  • If you must use a path with spaces, use the short name format (progra~1 or progra~2)

2. Python (Optional but Recommended)

If you plan to use PySpark, install Python 3.8 or higher.

Download and Install Python:

  1. Visit Python Downloads
  2. Download Python 3.11 (Windows installer - 64-bit)
  3. IMPORTANT: Check "Add Python to PATH" during installation
  4. Choose "Install Now" or customize to select installation directory

Verify Python Installation:

python --version
pip --version

Expected output:

Python 3.11.5
pip 23.2.1 from C:\Users\YourName\AppData\Local\Programs\Python\Python311\lib\site-packages\pip (python 3.11)

Install PySpark Package (Optional):

pip install pyspark

This installs the PySpark Python package, but you'll still need the full Spark distribution for local development as described in the rest of this guide.
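
To confirm the pip package itself imports correctly, a quick check like the one below works; it is only a sanity check of the Python package, and the version shown is an example:

# sanity_check.py - confirm the pip-installed PySpark package imports
import pyspark

print(pyspark.__version__)  # e.g., 3.5.0 - ideally matching the distribution you download below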

3. Hadoop Winutils

Spark on Windows requires Hadoop's native Windows utilities (winutils.exe and hadoop.dll).

Download Winutils:

  1. Visit the winutils GitHub repository
  2. Navigate to the Hadoop version matching your Spark distribution:
    • For Spark 3.5.x: Use hadoop-3.3.6
    • For Spark 3.4.x: Use hadoop-3.3.5
    • For Spark 3.3.x: Use hadoop-3.3.1
  3. Download both winutils.exe and hadoop.dll from the bin folder

Create Hadoop Directory Structure:

# Create directories
mkdir C:\hadoop\bin

# Move downloaded files to:
# C:\hadoop\bin\winutils.exe
# C:\hadoop\bin\hadoop.dll

Set Permissions (Important):

  1. Right-click on C:\hadoop\bin\winutils.exe
  2. Select "Properties" → "Security" tab
  3. Click "Edit" → "Add" → Enter "Everyone" → "OK"
  4. Check "Full control" and click "OK"
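
As an optional sanity check, a small Python snippet like the following (illustrative, assuming the C:\hadoop layout above) confirms both files ended up where Spark will look for them:

import os

# Check that winutils.exe and hadoop.dll are present under HADOOP_HOME\bin
hadoop_bin = os.path.join(os.environ.get("HADOOP_HOME", r"C:\hadoop"), "bin")
for name in ("winutils.exe", "hadoop.dll"):
    path = os.path.join(hadoop_bin, name)
    print(path, "-", "found" if os.path.exists(path) else "MISSING")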

Step 1: Download Apache Spark

Choose Your Spark Version

Visit the Apache Spark Downloads page.

Version Selection:

  • Latest Stable Release: Spark 3.5.0+ (recommended for new projects)
  • Package Type: "Pre-built for Apache Hadoop 3.3 and later"

Download Options:

  1. Direct Download (Recommended):

    • Click the suggested mirror link
    • Download spark-3.5.0-bin-hadoop3.tgz
  2. Command Line Download:

    # Using curl (if available)
    curl -O https://dlcdn.apache.org/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3.tgz
    

Verify Download: Check the file size matches the website (approximately 400MB).

Step 2: Extract Apache Spark

Extract the Archive

Using 7-Zip (Recommended):

  1. Download and install 7-Zip
  2. Right-click on spark-3.5.0-bin-hadoop3.tgz
  3. Select "7-Zip" → "Extract Here"
  4. You'll get a .tar file - extract it again
  5. Final extracted folder: spark-3.5.0-bin-hadoop3

Using Windows Built-in Extraction:

  1. Rename .tgz to .tar.gz
  2. Right-click → "Extract All"
  3. May require extracting twice (tar.gz, then tar)

Move to Installation Directory

Create a clean installation path without spaces:

# Option 1: Simple path (Recommended)
# Do not create C:\spark first - move renames the extracted folder to C:\spark
move spark-3.5.0-bin-hadoop3 C:\spark

# Option 2: Versioned path
# Create only the parent folder; move then renames the extracted folder to 3.5.0
mkdir C:\spark
move spark-3.5.0-bin-hadoop3 C:\spark\3.5.0

Final Structure:

C:\spark\
├── bin\
│   ├── spark-shell.cmd
│   ├── spark-submit.cmd
│   ├── pyspark.cmd
│   └── ...
├── conf\
├── jars\
├── python\
└── ...

Step 3: Configure Environment Variables

Environment variables tell Windows where to find Spark, Java, and Hadoop.

Open Environment Variables Dialog

Method 1: GUI

  1. Right-click "This PC" or "Computer"
  2. Select "Properties"
  3. Click "Advanced system settings"
  4. Click "Environment Variables" button

Method 2: Command

  1. Press Win + R
  2. Type sysdm.cpl and press Enter
  3. Go to "Advanced" tab → "Environment Variables"

Method 3: Search

  1. Search for "Environment Variables" in Windows Search
  2. Select "Edit the system environment variables"

Create System Variables

Click "New" under "System variables" (not User variables) to create each:

JAVA_HOME

Variable name: JAVA_HOME
Variable value: C:\Program Files\Java\jdk-11.0.18

Important:

  • Use your actual JDK installation path
  • If path contains spaces, use short name format:
    C:\progra~1\Java\jdk-11.0.18  (for Program Files)
    C:\progra~2\Java\jdk-11.0.18  (for Program Files (x86))
    

Find Short Name:

dir /x C:\

HADOOP_HOME

Variable name: HADOOP_HOME
Variable value: C:\hadoop

SPARK_HOME

Variable name: SPARK_HOME
Variable value: C:\spark

Or if using versioned path:

Variable value: C:\spark\3.5.0

PYSPARK_PYTHON (Optional)

If you have multiple Python versions:

Variable name: PYSPARK_PYTHON
Variable value: C:\Users\YourName\AppData\Local\Programs\Python\Python311\python.exe

Update PATH Variable

  1. Find "Path" in "System variables"
  2. Click "Edit"
  3. Click "New" and add each path (one per line):
%JAVA_HOME%\bin
%HADOOP_HOME%\bin
%SPARK_HOME%\bin
%SPARK_HOME%\python

Order matters: Ensure these paths appear before any other Java or Python installations.

Verify Environment Variables

Open a new Command Prompt (important - existing windows won't see new variables):

# Verify variables are set
echo %JAVA_HOME%
echo %HADOOP_HOME%
echo %SPARK_HOME%

# Verify binaries are accessible
where java
where spark-shell
where winutils

Expected output:

C:\Program Files\Java\jdk-11.0.18
C:\hadoop
C:\spark

C:\Program Files\Java\jdk-11.0.18\bin\java.exe
C:\spark\bin\spark-shell
C:\spark\bin\spark-shell.cmd
C:\hadoop\bin\winutils.exe

Step 4: Configure Spark (Optional)

Create Spark Configuration

Navigate to Spark's conf directory:

cd C:\spark\conf

Copy template configuration:

copy spark-defaults.conf.template spark-defaults.conf
copy log4j2.properties.template log4j2.properties

Edit Spark Defaults

Open spark-defaults.conf and add basic configurations:

# Application Properties
spark.app.name              MySparkApp
spark.master                local[*]

# Memory Configuration
spark.driver.memory         2g
spark.executor.memory       2g

# Python Configuration (if using PySpark)
spark.pyspark.python        python
spark.pyspark.driver.python python

# UI Configuration
spark.ui.port               4040

# Logging Configuration
spark.eventLog.enabled      false
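
The same settings can also be supplied per application from code. The sketch below is illustrative (not required if you use spark-defaults.conf) and shows how builder-level configuration overrides the file for that session:

from pyspark.sql import SparkSession

# Per-application overrides of spark-defaults.conf (illustrative values)
spark = (
    SparkSession.builder
    .appName("MySparkApp")
    .master("local[*]")
    .config("spark.sql.shuffle.partitions", "4")
    .config("spark.ui.showConsoleProgress", "false")
    .getOrCreate()
)

print(spark.conf.get("spark.sql.shuffle.partitions"))  # 4
spark.stop()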

Configure Logging

Edit log4j2.properties to reduce verbosity:

# Change rootLogger level from INFO to WARN
rootLogger.level = warn

# Or for specific loggers
logger.spark.name = org.apache.spark
logger.spark.level = warn

Step 5: Test Spark Installation

Test Spark Shell (Scala)

Open Command Prompt and run:

spark-shell

Expected Output:

Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Setting default log level to "WARN".
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.0
      /_/

Using Scala version 2.12.18 (Java HotSpot(TM) 64-Bit Server VM, Java 11.0.18)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

Run Test Commands:

// Create a simple RDD
val data = 1 to 10
val rdd = sc.parallelize(data)

// Perform a transformation and action
val result = rdd.map(_ * 2).collect()
println(result.mkString(", "))

// Output: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20

Exit Spark Shell:

:quit

Test PySpark

Open Command Prompt and run:

pyspark

Expected Output:

Python 3.11.5 (tags/v3.11.5:cce6ba9, Aug 24 2023, 14:38:34) [MSC v.1936 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
      /_/

Using Python version 3.11.5 (tags/v3.11.5:cce6ba9, Aug 24 2023, 14:38:34)
Spark context Web UI available at http://localhost:4040
SparkSession available as 'spark'.

>>>

Run Test Commands:

# Create a DataFrame
data = [("Alice", 25), ("Bob", 30), ("Charlie", 35)]
df = spark.createDataFrame(data, ["name", "age"])

# Show the DataFrame
df.show()

# Output:
# +-------+---+
# |   name|age|
# +-------+---+
# |  Alice| 25|
# |    Bob| 30|
# |Charlie| 35|
# +-------+---+

# Perform a transformation
df.filter(df.age > 28).show()
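
If you also want to verify Spark SQL, an optional extra check like this (run in the same session; the view name is arbitrary) works:

# Register the DataFrame as a temporary view and query it with SQL
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 28").show()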

Exit PySpark:

exit()

Access Spark Web UI

While Spark is running, open a browser and navigate to:

http://localhost:4040

You should see the Spark Web UI showing:

  • Active jobs
  • Stages
  • Storage
  • Environment configuration
  • Executors
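
If you prefer to read the address programmatically (useful when port 4040 was busy and Spark bound to the next free port), a running PySpark session exposes it:

# Inside an active pyspark session: print the bound Web UI address
print(spark.sparkContext.uiWebUrl)  # e.g., http://localhost:4040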

Step 6: Run a Spark Application

Create a Simple PySpark Script

Create a file test_spark.py:

from pyspark.sql import SparkSession

# Create Spark session
spark = SparkSession.builder \
    .appName("WindowsSparkTest") \
    .master("local[*]") \
    .getOrCreate()

# Create sample data
data = [
    ("Data Engineering", 95),
    ("Machine Learning", 88),
    ("DevOps", 92),
    ("Cloud Computing", 90)
]

# Create DataFrame
df = spark.createDataFrame(data, ["Topic", "Score"])

# Show DataFrame
print("Original Data:")
df.show()

# Perform aggregations
print(f"Average Score: {df.agg({'Score': 'avg'}).collect()[0][0]:.2f}")
print(f"Max Score: {df.agg({'Score': 'max'}).collect()[0][0]}")
print(f"Min Score: {df.agg({'Score': 'min'}).collect()[0][0]}")

# Filter high scores
print("\nHigh Scores (>90):")
df.filter(df.Score > 90).show()

# Stop Spark session
spark.stop()
print("\nSpark application completed successfully!")

Run the Script

python test_spark.py

Or using spark-submit:

spark-submit test_spark.py

Expected Output:

Original Data:
+----------------+-----+
|           Topic|Score|
+----------------+-----+
|Data Engineering|   95|
|Machine Learning|   88|
|          DevOps|   92|
| Cloud Computing|   90|
+----------------+-----+

Average Score: 91.25
Max Score: 95
Min Score: 88

High Scores (>90):
+----------------+-----+
|           Topic|Score|
+----------------+-----+
|Data Engineering|   95|
|          DevOps|   92|
+----------------+-----+

Spark application completed successfully!

Troubleshooting Common Issues

Issue 1: "java is not recognized as an internal or external command"

Cause: JAVA_HOME not set or PATH not updated

Solution:

  1. Verify JAVA_HOME is set correctly
  2. Ensure %JAVA_HOME%\bin is in PATH
  3. Open a new Command Prompt window
  4. Run: echo %JAVA_HOME% and where java

Issue 2: "JAVA_HOME is set to an invalid directory"

Cause: JAVA_HOME points to wrong location

Solution:

# Find Java installation
where java

# Set JAVA_HOME to the directory ABOVE bin
# If java is at: C:\Program Files\Java\jdk-11.0.18\bin\java.exe
# Then JAVA_HOME should be: C:\Program Files\Java\jdk-11.0.18

Issue 3: "Could not find or load main class org.apache.spark.deploy.SparkSubmit"

Cause: SPARK_HOME not set or incorrect

Solution:

  1. Verify SPARK_HOME points to Spark installation directory
  2. Check that %SPARK_HOME%\bin\spark-shell.cmd exists
  3. Ensure no trailing slashes in SPARK_HOME path

Issue 4: "java.io.IOException: Could not locate Hadoop executable"

Cause: HADOOP_HOME not set or winutils.exe missing

Solution:

  1. Download winutils.exe and hadoop.dll
  2. Place in C:\hadoop\bin\
  3. Set HADOOP_HOME to C:\hadoop
  4. Verify: running where winutils should locate C:\hadoop\bin\winutils.exe

Issue 5: "java.io.FileNotFoundException: java.io.IOException: (null) entry in command string: null chmod 0644"

Cause: Missing hadoop.dll or incorrect permissions

Solution:

  1. Download hadoop.dll (same version as winutils.exe)
  2. Place in C:\hadoop\bin\hadoop.dll
  3. Set permissions on winutils.exe (Everyone → Full Control)
  4. Restart Command Prompt

Issue 6: "py4j.protocol.Py4JNetworkError: Answer from Java side is empty"

Cause: Python/Java version mismatch or port conflict

Solution:

  1. Verify Java version is 8, 11, or 17
  2. Check if port 4040 is already in use:
    netstat -ano | findstr 4040
    
  3. Kill the conflicting process, or start Spark on a different UI port:
    pyspark --conf spark.ui.port=4041
    

Issue 7: Spark Shell Starts but No Output

Cause: Logging level too verbose

Solution:

  1. Edit conf/log4j2.properties
  2. Change rootLogger.level = info to rootLogger.level = warn
  3. Or set at runtime:
    spark.sparkContext.setLogLevel("WARN")
    

Issue 8: "Exception in thread 'main' java.lang.UnsupportedClassVersionError"

Cause: Java version mismatch (code compiled with newer Java than runtime)

Solution:

  1. Check Java version: java -version
  2. Spark 3.5.x requires Java 8, 11, or 17
  3. Update Java or use compatible Spark version

Issue 9: PATH Too Long Error

Cause: Windows PATH variable exceeds 2048 character limit

Solution:

  1. Use short name format for paths (progra~1)
  2. Remove unnecessary entries from PATH
  3. Use user variables instead of system variables where possible

Issue 10: PySpark Cannot Find Python

Cause: PYSPARK_PYTHON not set or wrong Python version

Solution:

# Set Python path explicitly
set PYSPARK_PYTHON=C:\Users\YourName\AppData\Local\Programs\Python\Python311\python.exe

# Or add to Environment Variables permanently
# Variable: PYSPARK_PYTHON
# Value: [path to python.exe]
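
If you launch scripts with plain python rather than spark-submit, the interpreter can also be pinned from inside the script before the session is created. A small sketch (the path below is a placeholder for your actual python.exe):

import os
from pyspark.sql import SparkSession

# Must be set before the SparkSession/SparkContext is created
os.environ["PYSPARK_PYTHON"] = r"C:\Users\YourName\AppData\Local\Programs\Python\Python311\python.exe"

spark = SparkSession.builder.master("local[*]").getOrCreate()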

Performance Tuning for Windows

Memory Configuration

Edit spark-defaults.conf:

# Adjust based on your system RAM
# For 8GB system:
spark.driver.memory         2g
spark.executor.memory       2g

# For 16GB system:
spark.driver.memory         4g
spark.executor.memory       4g

Optimize for Local Development

# Use all available cores
spark.master                local[*]

# Disable unnecessary features for local dev
spark.eventLog.enabled      false
spark.ui.showConsoleProgress false

# Increase parallelism for better performance
spark.default.parallelism   4
spark.sql.shuffle.partitions 4

Windows-Specific Optimizations

# Relax the file-permission umask (Windows has no POSIX permissions)
spark.hadoop.fs.permissions.umask-mode 000

# Use local disk for temp files
spark.local.dir             C:/temp/spark
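
For per-application runs, the same scratch-directory setting can be passed from code. A short sketch (creating the folder up front simply to avoid permission surprises):

import os
from pyspark.sql import SparkSession

# Scratch directory for shuffle and spill files
os.makedirs(r"C:\temp\spark", exist_ok=True)

spark = (
    SparkSession.builder
    .master("local[*]")
    .config("spark.local.dir", r"C:\temp\spark")
    .getOrCreate()
)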

Verification Checklist

Before considering your installation complete, verify:

  • Java version is 8, 11, or 17: java -version
  • Python version is 3.8+: python --version
  • JAVA_HOME is set: echo %JAVA_HOME%
  • HADOOP_HOME is set: echo %HADOOP_HOME%
  • SPARK_HOME is set: echo %SPARK_HOME%
  • All binaries are in PATH: where java, where winutils, where spark-shell
  • Spark shell starts: spark-shell
  • PySpark shell starts: pyspark
  • Web UI is accessible: http://localhost:4040 (while Spark is running)
  • Sample application runs successfully
  • No Java exceptions in output

Alternative: Using Windows Subsystem for Linux (WSL)

If you encounter persistent issues with Windows, consider using WSL2:

Advantages:

  • Native Linux environment
  • No winutils.exe required
  • Better performance
  • Easier troubleshooting

Quick Setup:

# Install WSL2 with Ubuntu
wsl --install -d Ubuntu

# Inside WSL, install dependencies
sudo apt update
sudo apt install openjdk-11-jdk python3 python3-pip

# Download and extract Spark
wget https://dlcdn.apache.org/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3.tgz
tar -xzf spark-3.5.0-bin-hadoop3.tgz
sudo mv spark-3.5.0-bin-hadoop3 /opt/spark

# Set environment variables in ~/.bashrc
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin

# Test installation
spark-shell

Next Steps

Congratulations! You now have Apache Spark installed on your Windows machine. Here are recommended next steps:

Explore Development Options

  • Apache Spark with Docker - Containerized Spark development
  • Set up Jupyter Notebook with PySpark for interactive development
  • Install Visual Studio Code with Python extension for script development

Production Deployment

  • Databricks - Managed Spark platform for production workloads
  • Learn about Spark on cloud platforms (AWS EMR, Azure HDInsight, GCP Dataproc)
  • Understand cluster management and resource optimization

Conclusion

Installing Apache Spark on Windows requires careful attention to environment configuration and dependencies. While the process can be challenging due to Windows-specific requirements like winutils.exe, following this comprehensive guide ensures a successful installation.

Key takeaways:

  • Always use paths without spaces for installation directories
  • Set JAVA_HOME, HADOOP_HOME, and SPARK_HOME consistently (this guide uses System variables)
  • A new Command Prompt window is required after setting environment variables
  • Winutils.exe and hadoop.dll are mandatory for Spark on Windows
  • Java version compatibility is critical (use JDK 11 for best compatibility)

With your local Spark installation complete, you're ready to start learning and developing big data applications on your Windows machine!
