Comprehensive Data Science Environment Setup

A comprehensive guide for setting up your complete data science environment

This guide walks you through setting up a data science environment with Anaconda, Jupyter Notebook, Python, IntelliJ IDEA, R, and RStudio. By the end, you will be able to work in Python and R from both notebooks and an IDE.

1. Installing Anaconda

Anaconda is a distribution of Python and R for scientific computing that simplifies package management and deployment.

Download and Install Anaconda

Visit the Anaconda download page and get the appropriate version for your platform. This guide focuses on Linux, but the process is similar for Windows and macOS.
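
A minimal sketch of this step on Linux, under a few assumptions: the helper names anaconda_installer_url and install_anaconda are made up for illustration, the installer is assumed to be served from repo.anaconda.com/archive with the file name pattern shown, and the version string is something you copy from the download page rather than a value this guide prescribes.

```shell
# Build the installer URL from a version string copied from the download page.
# No version is hard-coded here; the URL pattern itself is an assumption.
anaconda_installer_url() {
    anaconda_version="$1"
    anaconda_arch="${2:-$(uname -m)}"    # x86_64 on most desktops
    echo "https://repo.anaconda.com/archive/Anaconda3-${anaconda_version}-Linux-${anaconda_arch}.sh"
}

# Download the installer and run it interactively (hypothetical helper).
install_anaconda() {
    installer_url="$(anaconda_installer_url "$1")"
    installer_file="/tmp/${installer_url##*/}"
    curl -fL -o "$installer_file" "$installer_url" || return 1
    bash "$installer_file"    # prompts for the license and the install location
}
```

Call install_anaconda with the version part of the installer file name (the text between "Anaconda3-" and "-Linux"). The installer asks whether it should initialize conda in your shell; answering yes is what makes the next step work.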

After installation, restart your terminal or run:
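
The command meant here is most likely a reload of your shell startup file, which matches the troubleshooting advice at the end of this guide. The sketch below assumes bash and that the installer wrote its conda setup to ~/.bashrc; both function names are illustrative.

```shell
# Reload bash startup settings in the current shell.
# Assumes the installer added its conda setup to ~/.bashrc.
reload_shell_config() {
    if [ -f "$HOME/.bashrc" ]; then
        . "$HOME/.bashrc"
    fi
}

# Report whether the conda command can be found on PATH.
conda_status() {
    if command -v conda >/dev/null 2>&1; then
        echo "found"
    else
        echo "missing"
    fi
}
```

Run reload_shell_config and then conda_status. If it still prints "missing", open a new terminal before continuing.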

2. Setting Up Jupyter Notebooks with Enhancements

Install Conda Extensions for Jupyter

These extensions allow you to manage conda environments directly from Jupyter:
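
One way to do this, assuming the package you want is nb_conda_kernels from the conda-forge channel, which makes each conda environment that has a kernel installed selectable inside Jupyter. The older nb_conda package, which adds a Conda tab to the classic Notebook interface, may be what was originally intended; treat the package choice and the function name as assumptions.

```shell
# Install the conda integration for Jupyter into the active environment.
# Assumption: nb_conda_kernels from conda-forge is the package wanted here.
install_conda_jupyter_extensions() {
    command -v conda >/dev/null 2>&1 || { echo "conda not found on PATH" >&2; return 1; }
    conda install -y -c conda-forge nb_conda_kernels
}
```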

Install Jupyter Extensions

Jupyter extensions add useful features like code formatting, table of contents, and more:
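
A sketch of a typical install, assuming the extensions in question come from the jupyter_contrib_nbextensions package. That package targets the classic Notebook interface (version 6.x) and does not work with Notebook 7, so the example pins the notebook version. The run helper and the function name are illustrative; set DRY_RUN=1 to preview the commands without executing them.

```shell
# Print each command; execute it unless DRY_RUN=1 is set (preview mode).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then "$@"; fi
}

# Assumption: jupyter_contrib_nbextensions from conda-forge, on classic Notebook 6.x.
install_notebook_extensions() {
    run conda install -y -c conda-forge "notebook<7" jupyter_contrib_nbextensions
    # Copy the extension files into your Jupyter data directory. The conda
    # package may already have done this; repeating it is harmless.
    run jupyter contrib nbextension install --user
}
```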

Customize Jupyter Appearance

You can customize the look and feel of Jupyter notebooks:
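
A possible approach, assuming the jupyterthemes package (which provides the jt command) is the customization tool. It styles the classic Notebook interface. The function names are illustrative, and onedork is only used as a default because it ships with jupyterthemes.

```shell
# Apply a jupyterthemes theme. Run "jt -l" to list the available theme names.
apply_notebook_theme() {
    theme_name="${1:-onedork}"
    if ! command -v jt >/dev/null 2>&1; then
        echo "jt not found; install it with: pip install jupyterthemes" >&2
        return 1
    fi
    jt -t "$theme_name"
}

# Restore the default look.
reset_notebook_theme() {
    jt -r
}
```

Reload any open notebook pages in the browser after changing the theme.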

3. Setting Up IntelliJ IDEA for Python Development

IntelliJ IDEA with the Python plugin provides a powerful IDE for Python development.

Download and Install IntelliJ IDEA

  1. Download IntelliJ IDEA from the official website
  2. Extract the downloaded archive (a .tar.gz file on Linux)
  3. Navigate to the bin directory inside the extracted folder and run the launcher script, idea.sh
  4. Follow the installation prompts
  5. Make sure to select "Create Desktop Entry" for easy access
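
Steps 2 and 3 can be sketched as a small helper. The function name extract_idea and the default destination ~/opt are assumptions for illustration; the archive path is whatever file you downloaded.

```shell
# Extract an IntelliJ IDEA archive and print the path of its launcher script.
# The archive path and destination directory are supplied by the caller.
extract_idea() {
    idea_archive="$1"
    idea_dest="${2:-$HOME/opt}"
    mkdir -p "$idea_dest" || return 1
    tar -xzf "$idea_archive" -C "$idea_dest" || return 1
    # The archive unpacks into one versioned folder that contains bin/idea.sh.
    find "$idea_dest" -maxdepth 3 -type f -path '*/bin/idea.sh' | head -n 1
}
```

Store the printed path in a variable and run it to start the IDE, for example by assigning the output of extract_idea to launcher and then executing "$launcher".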

Configure Python Support in IntelliJ IDEA

  1. Launch IntelliJ IDEA
  2. Install the Python Community Edition Plugin:
    • Go to File → Settings → Plugins
    • Search for "Python Community Edition"
    • Click Install and restart the IDE when prompted
  3. Configure your Python/Conda environment:
    • Go to File → Project Structure → SDKs
    • Click the "+" button and select "Python SDK"
    • Choose "Conda Environment" → "Existing environment"
    • Navigate to your Anaconda installation (typically in ~/anaconda3)
    • Or create a new virtual environment specific to your project
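
When the SDK dialog asks for an interpreter, it helps to know where conda keeps it. The helper below is a sketch that follows conda's usual Linux layout; the function name is hypothetical, and the default prefix is the ~/anaconda3 location mentioned above.

```shell
# Print the Python interpreter path for a conda environment.
# Layout: <prefix>/bin/python for base, <prefix>/envs/<name>/bin/python otherwise.
conda_python_path() {
    env_name="${1:-base}"
    install_prefix="${2:-$HOME/anaconda3}"
    if [ "$env_name" = "base" ]; then
        echo "$install_prefix/bin/python"
    else
        echo "$install_prefix/envs/$env_name/bin/python"
    fi
}
```

For a project-specific environment, create it first with conda create -n followed by a name of your choice and the python package, then pass that name to conda_python_path and paste the result into the interpreter field.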

4. Setting Up R and RStudio Support

Install R Kernel for Jupyter

To use R within Jupyter notebooks:
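
A sketch of one common route, assuming the R kernel comes from the r-irkernel package on conda-forge. The run helper and the function name are illustrative; set DRY_RUN=1 to preview the commands.

```shell
# Print each command; execute it unless DRY_RUN=1 is set (preview mode).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then "$@"; fi
}

# Assumption: r-base and r-irkernel from conda-forge provide R and its Jupyter kernel.
install_r_kernel() {
    run conda install -y -c conda-forge r-base r-irkernel
    # List registered kernels; an entry named "ir" should appear after installation.
    run jupyter kernelspec list
}
```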

Install RStudio (Optional)

If you prefer a dedicated R environment alongside Jupyter:
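
One possibility, assuming you want RStudio from conda's r channel so that it lives inside your Anaconda setup. That package tends to lag well behind current RStudio releases, so the installer from the RStudio website may be the better choice; treat the command below as an assumption rather than the recommended path.

```shell
# Install RStudio into the active conda environment.
# Assumption: the rstudio package from the "r" channel is acceptable for your needs.
install_rstudio_with_conda() {
    command -v conda >/dev/null 2>&1 || { echo "conda not found on PATH" >&2; return 1; }
    conda install -y -c r rstudio
}
```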

5. Launching Your Data Science Environment

Starting Jupyter Notebook
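
A minimal sketch of starting the server from a project directory. The run helper and the function name are illustrative; set DRY_RUN=1 to preview the command.

```shell
# Print each command; execute it unless DRY_RUN=1 is set (preview mode).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then "$@"; fi
}

# Start Jupyter Notebook with the given directory as its working directory.
start_notebook() {
    # The subshell keeps the directory change from leaking into your shell.
    ( cd "${1:-.}" && run jupyter notebook )
}
```

Activate the environment you want to work in before calling start_notebook. On a remote machine, adding --no-browser to the jupyter notebook command stops it from trying to open a browser.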

Your browser will open with the Jupyter interface. You can now create new notebooks with either Python or R kernels.

Using Jupyter Extensions

  1. In the Jupyter interface, navigate to the "Nbextensions" tab (it appears in the classic Notebook interface once the extensions from section 2 are installed)
  2. Enable the extensions you want to use
  3. Return to the "Files" tab to create or open notebooks
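
The same result can be reached from the terminal. This sketch assumes the extensions were installed through jupyter_contrib_nbextensions, where toc2/main identifies the table of contents and codefolding/main identifies code folding; the run helper and the function name are illustrative.

```shell
# Print each command; execute it unless DRY_RUN=1 is set (preview mode).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then "$@"; fi
}

# Enable each extension named on the command line.
enable_notebook_extensions() {
    for extension in "$@"; do
        run jupyter nbextension enable "$extension"
    done
}
```

For example, enable_notebook_extensions toc2/main codefolding/main turns on both features. Running jupyter nbextension list afterwards shows which extensions are enabled.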

Working with Projects in IntelliJ IDEA

  1. Launch IntelliJ IDEA
  2. Select "New Project" or "Open"
  3. Choose "Python" as the project type
  4. Select your configured Python/Conda interpreter
  5. Start developing your Python code with full IDE support

Troubleshooting

Common Issues

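  1. "Command not found" after installing Anaconda: Restart your terminal or run source ~/.bashrc so the updated PATH takes effect.
  2. Missing packages in Jupyter: Activate the environment that contains the packages before launching Jupyter, and check that the notebook is using that environment's kernel.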
  3. IntelliJ not recognizing Python: Verify your Python SDK configuration in Project Structure.
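
A quick check for the first two issues can be scripted. This is a sketch with a hypothetical function name; it relies on conda setting the CONDA_DEFAULT_ENV variable to the name of the active environment.

```shell
# Print a short report: where conda is, which environment is active, which python runs.
diagnose_environment() {
    if command -v conda >/dev/null 2>&1; then
        echo "conda: $(command -v conda)"
    else
        echo "conda: not on PATH (restart the terminal or source ~/.bashrc)"
    fi
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    echo "active environment: ${CONDA_DEFAULT_ENV:-none}"
    echo "python: $(command -v python3 || command -v python || echo 'not found')"
}
```

If the python path does not point inside your Anaconda installation, the wrong environment is active. The same path is what IntelliJ IDEA should show as the SDK for the project.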

Environment Management

Keep track of your environments and installed packages:
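
A sketch of the usual conda commands for this, wrapped in a hypothetical helper. The output file name environment.yml is a convention, not a requirement.

```shell
# List environments and packages, then save the active environment to a file.
snapshot_environment() {
    command -v conda >/dev/null 2>&1 || { echo "conda not found on PATH" >&2; return 1; }
    conda env list                               # all environments and their locations
    conda list                                   # packages in the active environment
    conda env export > "${1:-environment.yml}"   # reproducible description of the environment
}
```

The exported file can be used to recreate the environment on another machine with conda env create -f environment.yml.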

This setup gives you both GUI tools and command-line access. You can work with Python and R in notebooks or in an IDE, whichever suits the task at hand.