The power of Pipenv and Jupyter Notebook
Table of Contents
- Introduction
- Why Use Pipenv with Jupyter?
- Prerequisites
- Step 1: Install Pipenv
- Step 2: Create Your Project Directory
- Step 3: Install Jupyter and Essential Packages
- Step 4: Register Your Environment as a Jupyter Kernel
- Step 5: Launch Jupyter Notebook
- Step 6: Configure VSCode for Pipenv and Jupyter
- Complete Example: Data Analysis Project
- Best Practices
- Troubleshooting Common Issues
- Advanced Topics
- Performance Tips
- Comparison: Pipenv vs Other Tools
- Summary
- Related Topics
Introduction
Managing Python dependencies and environments can be challenging, especially for data science projects that require multiple packages with complex dependencies. Pipenv combines the best of pip and virtualenv into a single tool, providing deterministic builds and simplified dependency management. Jupyter Notebook remains the gold standard for interactive data analysis, featuring live code execution, rich visualizations, and narrative documentation.
This guide will show you how to integrate Pipenv with Jupyter Notebook and VSCode, creating a robust and reproducible development environment for your data science projects.
Why Use Pipenv with Jupyter?
Before diving into the setup, let's understand the benefits of this combination:
Benefits of Pipenv
- Deterministic Builds: Pipenv generates a Pipfile.lock that locks all dependencies to specific versions, ensuring consistent environments across different machines
- Automatic Virtual Environments: No need to manually create or activate virtual environments
- Security Scanning: Built-in vulnerability scanning with pipenv check
- Simplified Workflow: Combines pip and virtualenv commands into a single interface
- Dev vs Production Dependencies: Separate dependency groups for development and production
Benefits of Jupyter Notebook
- Interactive Computing: Execute code cells individually and see immediate results
- Rich Output: Display charts, tables, images, and interactive widgets inline
- Narrative Documentation: Mix code with Markdown for comprehensive analysis documentation
- Reproducible Research: Share notebooks containing both code and results
Prerequisites
Before starting, ensure you have:
- Python 3.8 or higher installed on your system
- pip package manager (usually comes with Python)
- Basic command line knowledge
- VSCode (optional, but recommended for enhanced development experience)
To verify your Python installation:
python --version
# or on some systems
python3 --version
Step 1: Install Pipenv
First, install Pipenv globally using pip:
# On Windows
pip install pipenv
# On macOS/Linux
pip3 install pipenv
# Alternative: Install using Homebrew (macOS)
brew install pipenv
Verify the installation:
pipenv --version
You should see output like: pipenv, version 2023.x.x
Troubleshooting Installation
If you encounter a "command not found" error, you may need to add Python's Scripts directory to your PATH:
Windows:
# Add to PATH (adjust Python version as needed)
setx PATH "%PATH%;%USERPROFILE%\AppData\Local\Programs\Python\Python311\Scripts"
macOS/Linux:
# Add to ~/.bashrc or ~/.zshrc
export PATH="$HOME/.local/bin:$PATH"
# Reload shell configuration
source ~/.bashrc # or source ~/.zshrc
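If adjusting PATH is not an option, you can usually invoke Pipenv through the interpreter it was installed with, since the package ships a module entry point; a quick sketch:
# Run Pipenv as a module, bypassing PATH lookup
python -m pipenv --version
# On systems where "python" is not on PATH or maps elsewhere
python3 -m pipenv --version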
Step 2: Create Your Project Directory
Create a dedicated directory for your project:
# Create project directory
mkdir data-science-project
cd data-science-project
Initialize a new Pipenv environment with a specific Python version:
# Use Python 3.11 (or your preferred version)
pipenv --python 3.11
This command:
- Creates a new virtual environment
- Generates a Pipfile in your project directory
- Links the virtual environment to your project
You can also use the system's default Python:
pipenv --python 3
Understanding the Pipfile
After initialization, you'll see a Pipfile in your directory:
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"
[packages]
[dev-packages]
[requires]
python_version = "3.11"
This file tracks your project dependencies and Python version requirements.
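The Pipfile accepts standard version specifiers in place of the "*" wildcard if you want tighter control. A sketch (package names and versions illustrative):
[packages]
# Exact pin
pandas = "==2.1.4"
# Compatible release: >=1.26, <2.0
numpy = "~=1.26"
# Explicit range
matplotlib = ">=3.8,<4.0"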
Step 3: Install Jupyter and Essential Packages
Install Jupyter Notebook and other essential data science packages:
# Install Jupyter with all necessary components
pipenv install jupyter ipykernel
# Install common data science packages
pipenv install numpy pandas matplotlib seaborn scikit-learn
# Install development packages (optional)
pipenv install --dev black pylint autopep8
The --dev flag installs packages only needed during development (linters, formatters, etc.) but not in production.
Understanding Dependency Groups
Your Pipfile now contains two sections:
[packages]
jupyter = "*"
ipykernel = "*"
numpy = "*"
pandas = "*"
matplotlib = "*"
seaborn = "*"
scikit-learn = "*"
[dev-packages]
black = "*"
pylint = "*"
autopep8 = "*"
To install only production dependencies on a deployment server:
pipenv install --deploy --ignore-pipfile
Step 4: Register Your Environment as a Jupyter Kernel
This is the critical step that many tutorials miss. Jupyter needs to know about your Pipenv virtual environment.
First, activate your Pipenv shell:
pipenv shell
You should see your prompt change to indicate the virtual environment is active, something like:
(data-science-project) C:\Users\YourName\data-science-project>
Now register the environment as a Jupyter kernel:
python -m ipykernel install --user --name=data-science-project --display-name="Python (data-science-project)"
Let's break down this command:
- python -m ipykernel install: Runs the ipykernel installer
- --user: Installs for your user account only (no admin rights needed)
- --name=data-science-project: Internal kernel name (used in kernel.json)
- --display-name="Python (data-science-project)": The name you'll see in Jupyter's kernel selector
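Under the hood, this writes a small kernel.json spec that tells Jupyter which interpreter to launch. It will look roughly like the following; the path and the hash suffix (shown here as XXXXXXXX) are placeholders for the values Pipenv generates on your machine:
{
 "argv": [
  "C:\\Users\\YourName\\.virtualenvs\\data-science-project-XXXXXXXX\\Scripts\\python.exe",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "Python (data-science-project)",
 "language": "python"
}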
Verify Kernel Installation
List all available Jupyter kernels:
jupyter kernelspec list
You should see output like:
Available kernels:
data-science-project C:\Users\YourName\AppData\Roaming\jupyter\kernels\data-science-project
python3 C:\Users\YourName\AppData\Roaming\jupyter\kernels\python3
Step 5: Launch Jupyter Notebook
With your Pipenv shell still active, launch Jupyter Notebook:
jupyter notebook
This will:
- Start the Jupyter server
- Open your default browser to http://localhost:8888
- Display the Jupyter file browser
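You can also launch without activating the shell first: pipenv run executes a single command inside the environment. The port and browser flags below are standard Jupyter options:
# One-off launch without pipenv shell
pipenv run jupyter notebook
# Pick a different port and skip opening a browser
pipenv run jupyter notebook --port 8889 --no-browser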
Create a New Notebook
- Click "New" in the top-right corner
- Select "Python (data-science-project)" from the dropdown
- A new notebook opens using your Pipenv environment
Verify Your Environment
In the first cell of your notebook, run:
import sys
print(f"Python version: {sys.version}")
print(f"Python executable: {sys.executable}")
# Check if packages are available
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
print(f"\nNumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
print("✓ All packages loaded successfully!")
You should see output showing your Pipenv environment's Python path and package versions.
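For a stricter check, you can verify that the notebook's interpreter is actually running inside a virtual environment, using the standard-library convention that sys.prefix differs from sys.base_prefix in a venv (this also holds for recent virtualenv versions):
import sys
# Inside a virtual environment, sys.prefix points at the env root
# and differs from the base interpreter's sys.base_prefix
in_venv = sys.prefix != sys.base_prefix
print(f"Running inside a virtual environment: {in_venv}")
print(f"Environment root: {sys.prefix}")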
Step 6: Configure VSCode for Pipenv and Jupyter
VSCode provides an excellent integrated experience for Jupyter notebooks. Here's how to set it up:
Install VSCode Extensions
Install these essential extensions:
- Python (by Microsoft) - Core Python support
- Jupyter (by Microsoft) - Jupyter notebook support
- Pylance (by Microsoft) - Advanced Python language features
To install via command line:
code --install-extension ms-python.python
code --install-extension ms-toolsai.jupyter
code --install-extension ms-python.vscode-pylance
Configure VSCode to Use Your Pipenv Environment
- Open your project folder in VSCode: code .
- Open the Command Palette (Ctrl+Shift+P or Cmd+Shift+P)
- Type "Python: Select Interpreter"
- Select the interpreter from your Pipenv virtual environment (it should show the path with "data-science-project" in it)
Alternatively, VSCode usually auto-detects Pipenv environments and shows a popup asking if you want to use the detected environment.
Verify VSCode Configuration
Create a .vscode/settings.json file in your project:
{
"python.defaultInterpreterPath": "${workspaceFolder}/.venv",
"jupyter.notebookFileRoot": "${workspaceFolder}",
"python.linting.enabled": true,
"python.linting.pylintEnabled": true,
"python.formatting.provider": "black",
"editor.formatOnSave": true
}
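One caveat: ${workspaceFolder}/.venv only resolves if the environment actually lives inside the project. By default, Pipenv stores virtual environments in a central per-user directory; setting PIPENV_VENV_IN_PROJECT=1 before creating the environment keeps it in a local .venv folder instead. A sketch:
# macOS/Linux: create the environment inside the project as .venv
export PIPENV_VENV_IN_PROJECT=1
pipenv --python 3.11
# Windows (PowerShell)
$env:PIPENV_VENV_IN_PROJECT = "1"
pipenv --python 3.11
# Either way, this prints where the environment lives
pipenv --venv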
Create and Run Notebooks in VSCode
- Create a new file with the .ipynb extension, e.g. analysis.ipynb
- VSCode opens it in notebook mode
- Click "Select Kernel" in the top-right
- Choose "Python (data-science-project)"
Start coding! VSCode provides:
- IntelliSense and autocomplete
- Inline documentation
- Variable explorer
- Interactive debugging
Complete Example: Data Analysis Project
Let's create a complete data analysis project to demonstrate the workflow:
Project Setup
# Create and initialize project
mkdir sales-analysis
cd sales-analysis
pipenv --python 3.11
# Install dependencies
pipenv install jupyter ipykernel pandas numpy matplotlib seaborn
# Register kernel
pipenv shell
python -m ipykernel install --user --name=sales-analysis --display-name="Python (Sales Analysis)"
Create Sample Data
Create a Python script generate_data.py:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
# Set random seed for reproducibility
np.random.seed(42)
# Generate sample sales data
dates = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')
products = ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones']
regions = ['North', 'South', 'East', 'West']
data = []
for date in dates:
for _ in range(np.random.randint(5, 20)):
record = {
'date': date,
'product': np.random.choice(products),
'region': np.random.choice(regions),
'quantity': np.random.randint(1, 10),
'unit_price': np.random.uniform(20, 500),
}
record['total_sales'] = record['quantity'] * record['unit_price']
data.append(record)
# Create DataFrame and save
df = pd.DataFrame(data)
df.to_csv('sales_data.csv', index=False)
print(f"Generated {len(df)} sales records")
print(df.head())
Run it to generate sample data:
pipenv run python generate_data.py
Create Analysis Notebook
Launch Jupyter and create sales_analysis.ipynb:
# Cell 1: Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
print("✓ Libraries loaded successfully")
# Cell 2: Load Data
df = pd.read_csv('sales_data.csv')
df['date'] = pd.to_datetime(df['date'])
print(f"Dataset shape: {df.shape}")
print(f"\nFirst few rows:")
df.head()
# Cell 3: Data Summary
print("=== Dataset Summary ===\n")
print(df.info())
print("\n=== Statistical Summary ===\n")
print(df.describe())
print("\n=== Missing Values ===\n")
print(df.isnull().sum())
# Cell 4: Sales by Product
product_sales = df.groupby('product')['total_sales'].sum().sort_values(ascending=False)
plt.figure(figsize=(10, 6))
product_sales.plot(kind='bar', color='steelblue')
plt.title('Total Sales by Product', fontsize=16, fontweight='bold')
plt.xlabel('Product', fontsize=12)
plt.ylabel('Total Sales ($)', fontsize=12)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
print("\nTop selling products:")
print(product_sales)
# Cell 5: Sales Trends Over Time
daily_sales = df.groupby('date')['total_sales'].sum()
plt.figure(figsize=(14, 6))
plt.plot(daily_sales.index, daily_sales.values, linewidth=2, color='darkgreen')
plt.title('Daily Sales Trend (2023)', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Total Sales ($)', fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Calculate monthly average
monthly_avg = daily_sales.resample('M').mean()
print("\nMonthly Average Sales:")
print(monthly_avg)
# Cell 6: Regional Performance
regional_sales = df.groupby('region').agg({
'total_sales': 'sum',
'quantity': 'sum',
'date': 'count'
}).rename(columns={'date': 'transaction_count'})
print("=== Regional Performance ===\n")
print(regional_sales)
# Create subplots
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
# Sales by region
regional_sales['total_sales'].plot(kind='pie', ax=axes[0], autopct='%1.1f%%')
axes[0].set_title('Sales Distribution by Region', fontsize=14, fontweight='bold')
axes[0].set_ylabel('')
# Transaction count by region
regional_sales['transaction_count'].plot(kind='bar', ax=axes[1], color='coral')
axes[1].set_title('Number of Transactions by Region', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Region', fontsize=12)
axes[1].set_ylabel('Transaction Count', fontsize=12)
axes[1].tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()
# Cell 7: Advanced Analysis - Correlation
# Calculate average metrics per transaction
avg_metrics = df.groupby('date').agg({
'quantity': 'mean',
'unit_price': 'mean',
'total_sales': 'sum'
})
correlation = avg_metrics.corr()
plt.figure(figsize=(8, 6))
sns.heatmap(correlation, annot=True, cmap='coolwarm', center=0,
square=True, linewidths=1, fmt='.2f')
plt.title('Correlation Matrix of Sales Metrics', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
print("\nCorrelation Matrix:")
print(correlation)
# Cell 8: Export Results
# Create summary report
summary = {
'total_revenue': df['total_sales'].sum(),
'total_transactions': len(df),
'average_transaction_value': df['total_sales'].mean(),
'best_selling_product': product_sales.index[0],
'best_performing_region': regional_sales['total_sales'].idxmax(),
'analysis_date': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
}
summary_df = pd.DataFrame([summary])
summary_df.to_csv('sales_summary_report.csv', index=False)
print("=== Analysis Summary ===\n")
for key, value in summary.items():
print(f"{key.replace('_', ' ').title()}: {value}")
print("\n✓ Summary report exported to 'sales_summary_report.csv'")
This complete example demonstrates:
- Loading and exploring data
- Performing statistical analysis
- Creating visualizations
- Generating summary reports
- Using Pandas, NumPy, Matplotlib, and Seaborn together
Best Practices
1. Use Specific Package Versions for Production
Lock your dependencies for reproducibility:
# Generate Pipfile.lock
pipenv lock
# Install from lock file (ensures exact versions)
pipenv sync
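In a CI pipeline, sync is the safer choice because it installs exactly what the lock file records instead of re-resolving dependencies. A minimal sketch (pytest assumed as the test runner):
# CI sketch: install exact locked versions, including dev tools, then test
pipenv sync --dev
pipenv run pytest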
2. Separate Development and Production Dependencies
# Development-only packages
pipenv install --dev pytest black pylint jupyter
# Production packages
pipenv install pandas numpy scikit-learn
3. Use Environment Variables for Configuration
Create a .env file:
DATABASE_URL=postgresql://localhost/mydb
API_KEY=your_secret_key
DEBUG=True
Pipenv automatically loads .env files:
import os
db_url = os.getenv('DATABASE_URL')
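Keep in mind the variables are only injected when the process is started through Pipenv (pipenv run or pipenv shell), so provide fallbacks for when the .env file isn't loaded. A small sketch (the default URL is illustrative):
import os

# Values come from .env only under "pipenv run" / "pipenv shell",
# so fall back to safe local defaults otherwise
db_url = os.getenv('DATABASE_URL', 'sqlite:///local.db')
debug = os.getenv('DEBUG', 'False').lower() == 'true'
print(f"database={db_url} debug={debug}")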
4. Share Your Environment
To share your project with others:
# Share Pipfile and Pipfile.lock (commit to git)
git add Pipfile Pipfile.lock
git commit -m "Add Python dependencies"
# Others can recreate the environment with:
pipenv install
5. Keep Your Environment Clean
Remove unused packages:
# Uninstall a package
pipenv uninstall package-name
# Clean up unused dependencies
pipenv clean
6. Regular Security Checks
# Check for security vulnerabilities
pipenv check
# Update packages to latest compatible versions
pipenv update
Troubleshooting Common Issues
Issue 1: Kernel Not Found in Jupyter
Symptoms: Your custom kernel doesn't appear in Jupyter's kernel list.
Solution:
# List installed kernels
jupyter kernelspec list
# If missing, reinstall
pipenv shell
python -m ipykernel install --user --name=myproject --display-name="Python (MyProject)"
# Restart Jupyter
jupyter notebook
Issue 2: Wrong Python Version
Symptoms: Notebook uses different Python version than expected.
Solution:
# Remove existing environment
pipenv --rm
# Recreate with specific Python version
pipenv --python 3.11
# Reinstall packages
pipenv install
# Re-register kernel
pipenv shell
python -m ipykernel install --user --name=myproject
Issue 3: Import Errors in Notebook
Symptoms: ModuleNotFoundError even though package is installed.
Solution:
# In notebook, check which Python is being used
import sys
print(sys.executable)
# Should point to your Pipenv environment
# If not, select the correct kernel from Kernel > Change Kernel
Alternatively, restart the kernel:
- In Jupyter: Kernel > Restart
- In VSCode: Click "Restart" in the notebook toolbar
Issue 4: VSCode Not Detecting Pipenv Environment
Symptoms: VSCode doesn't show your Pipenv environment in the interpreter list.
Solution:
# Find your Pipenv environment path
pipenv --venv
# Copy the path, then in VSCode:
# 1. Open Command Palette (Ctrl+Shift+P)
# 2. Type "Python: Select Interpreter"
# 3. Click "Enter interpreter path..."
# 4. Paste the path and add /bin/python (Linux/Mac) or \Scripts\python.exe (Windows)
Issue 5: Pipenv Lock Taking Too Long
Symptoms: pipenv install or pipenv lock hangs for extended periods.
Solution:
# Skip lock generation for faster installs (development only;
# note this flag has been removed in recent Pipenv releases)
pipenv install --skip-lock package-name
# Clear cache and retry
pipenv lock --clear
Issue 6: "Warning: Python X was not found on your system"
Symptoms: Pipenv can't find the specified Python version.
Solution:
# Check available Python versions
where python # Windows
which python3 # Linux/Mac
# Use absolute path
pipenv --python C:\Python311\python.exe # Windows
pipenv --python /usr/bin/python3.11 # Linux/Mac
# Or install the required Python version with pyenv
pyenv install 3.11.5
pipenv --python 3.11.5
Issue 7: Kernel Dies When Running Code
Symptoms: Notebook kernel crashes during execution.
Solution:
# Check available memory
# Increase memory limits for Jupyter
# In Jupyter config (~/.jupyter/jupyter_notebook_config.py):
# c.NotebookApp.iopub_data_rate_limit = 10000000
# Or disable the limit temporarily:
jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000
# For memory-intensive operations, process data in chunks:
# Instead of loading entire dataset
df = pd.read_csv('large_file.csv')
# Use chunking
chunks = pd.read_csv('large_file.csv', chunksize=10000)
for chunk in chunks:
    # process() stands in for your own per-chunk logic
    process(chunk)
Advanced Topics
Using JupyterLab Instead of Jupyter Notebook
JupyterLab is the next-generation interface with more features:
# Install JupyterLab
pipenv install jupyterlab
# Launch JupyterLab
pipenv shell
jupyter lab
Multiple Kernels for Different Projects
You can register multiple environments with different package sets:
# Project 1: Machine Learning
cd ml-project
pipenv install tensorflow scikit-learn
pipenv shell
python -m ipykernel install --user --name=ml-env --display-name="Python (ML)"
# Project 2: Data Visualization
cd ../viz-project
pipenv install plotly dash bokeh
pipenv shell
python -m ipykernel install --user --name=viz-env --display-name="Python (Viz)"
# Now both kernels are available in any Jupyter instance
jupyter kernelspec list
Remove Old Kernels
Clean up kernels you no longer need:
# List all kernels
jupyter kernelspec list
# Remove specific kernel
jupyter kernelspec uninstall myproject
# Remove all unused kernels
jupyter kernelspec list | tail -n +2 | grep -v python3 | awk '{print $1}' | xargs -I {} jupyter kernelspec uninstall {} -y
Using Pipenv Scripts
Define custom commands in your Pipfile:
[scripts]
notebook = "jupyter notebook"
lab = "jupyter lab"
test = "pytest tests/"
lint = "black . && pylint src/"
analyze = "python scripts/analyze.py"
Run scripts with:
pipenv run notebook
pipenv run test
pipenv run analyze
Integration with Requirements.txt
If you need to generate requirements.txt for legacy systems:
# Generate requirements.txt from Pipfile.lock
pipenv requirements > requirements.txt
# Generate dev requirements
pipenv requirements --dev > requirements-dev.txt
Performance Tips
1. Use NumPy and Pandas Efficiently
# Slow: Iterating over DataFrame rows
for index, row in df.iterrows():
df.at[index, 'new_col'] = row['a'] + row['b']
# Fast: Vectorized operations
df['new_col'] = df['a'] + df['b']
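To see the gap on your own machine, time both approaches; a quick sketch with synthetic columns:
import time
import numpy as np
import pandas as pd

# Synthetic data just for the benchmark
df = pd.DataFrame({'a': np.random.rand(100_000), 'b': np.random.rand(100_000)})

start = time.perf_counter()
for index, row in df.iterrows():
    df.at[index, 'slow'] = row['a'] + row['b']
print(f"iterrows:   {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
df['fast'] = df['a'] + df['b']
print(f"vectorized: {time.perf_counter() - start:.4f}s")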
2. Load Only Necessary Columns
# Load all columns (slower)
df = pd.read_csv('large_file.csv')
# Load specific columns only (faster)
df = pd.read_csv('large_file.csv', usecols=['date', 'price', 'quantity'])
3. Use Appropriate Data Types
# Convert to categorical for memory efficiency
df['category'] = df['category'].astype('category')
# Use downcast for numeric types
df['integer_col'] = pd.to_numeric(df['integer_col'], downcast='integer')
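You can quantify the savings with memory_usage; a sketch on a synthetic column:
import pandas as pd

# Repetitive strings are the ideal case for the 'category' dtype
df = pd.DataFrame({'region': ['North', 'South', 'East', 'West'] * 25_000})

before = df['region'].memory_usage(deep=True)
df['region'] = df['region'].astype('category')
after = df['region'].memory_usage(deep=True)
print(f"object dtype: {before:,} bytes -> category dtype: {after:,} bytes")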
4. Clear Outputs Before Committing
Notebook outputs can bloat your Git repository:
# Install nbstripout
pipenv install --dev nbstripout
# Configure for your repository
nbstripout --install
# Now all commits automatically strip outputs
Comparison: Pipenv vs Other Tools
Pipenv vs Conda
| Feature | Pipenv | Conda |
|---|---|---|
| Python-specific | Yes | No (multi-language) |
| Dependency resolution | Excellent | Good |
| System packages | No | Yes (includes compilers, libraries) |
| Non-Python binary packages | No (PyPI wheels only) | Yes |
| Learning curve | Lower | Higher |
| Use case | Python-only projects | Data science with complex dependencies |
When to use Conda: When you need non-Python dependencies (like CUDA, MKL, system libraries) or pre-compiled binary packages.
When to use Pipenv: For pure Python projects where you want simple, reproducible environments.
Pipenv vs Poetry
| Feature | Pipenv | Poetry |
|---|---|---|
| Project initialization | Basic | Rich (with pyproject.toml) |
| Dependency resolution | Good | Excellent |
| Build system | No | Yes |
| Publishing to PyPI | No | Yes |
| Speed | Slower | Faster |
When to use Poetry: When building and publishing Python packages.
When to use Pipenv: For application development and data science projects.
Summary
You now have a complete setup for using Pipenv with Jupyter Notebook and VSCode:
- Pipenv manages your Python dependencies and virtual environments
- Jupyter Notebook/Lab provides interactive computing capabilities
- VSCode offers a modern IDE experience with integrated notebook support
- ipykernel bridges your Pipenv environment with Jupyter
Key Takeaways
- Always register your Pipenv environment as a Jupyter kernel
- Use pipenv install for production dependencies and pipenv install --dev for development tools
- Lock your dependencies with pipenv lock for reproducibility
- VSCode automatically detects Pipenv environments
- Keep your environments project-specific for isolation
- Use pipenv check regularly for security updates
Quick Reference Commands
# Setup
pipenv --python 3.11 # Create environment
pipenv install jupyter ipykernel # Install Jupyter
pipenv shell # Activate environment
python -m ipykernel install --user --name=myproject # Register kernel
# Daily workflow
pipenv shell # Activate environment
jupyter notebook # Start Jupyter
# OR
code . # Open VSCode
# Maintenance
pipenv update # Update packages
pipenv check # Security check
pipenv clean # Remove unused packages
pipenv --rm # Delete environment
Next Steps
Now that you have your environment set up, explore these topics:
- Advanced Pandas: Data manipulation, merging, and reshaping
- Machine Learning with scikit-learn: Build predictive models
- Deep Learning with TensorFlow/PyTorch: Neural networks in Jupyter
- Interactive Visualizations: Plotly, Bokeh, and Altair
- Big Data with PySpark: Analyze massive datasets
- Automated Reporting: Papermill for notebook parameterization
Related Topics
- Pipenv Overview - Understanding Pipenv fundamentals
- Jupyter Notebooks - Learn about Jupyter's capabilities
- Pandas - Essential data analysis library for Jupyter
- Apache Spark with Docker - Scale your data processing
Last updated: October 2025