Useful links for data engineering. Why reinvent the wheel when you can build on the best practices and tools already available?
All Posts
- Apache Spark is one of the most popular big-data processing frameworks, and PySpark is its Python API. Both are used to process large datasets in a distributed manner.
- Connect to a PostgreSQL database using PySpark. Learn how to use PySpark's DataFrameReader to load data from a PostgreSQL database.
- How to install Apache Spark on a local Windows machine. This guide provides step-by-step instructions for installing Apache Spark on Windows.
- Data processing pipeline patterns. Linear, branching, looping, parallel, and hybrid pipeline patterns give data processing a structured shape, letting data flow efficiently from one stage to the next while minimizing bottlenecks and safeguarding the quality of the end result.
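As a taste of the PostgreSQL post above, here is a minimal sketch of the JDBC options PySpark's DataFrameReader typically needs for a PostgreSQL source. The host, database, and table names are placeholders, and the helper function is purely illustrative, not from the post itself:

```python
def postgres_jdbc_options(host: str, port: int, db: str,
                          table: str, user: str, password: str) -> dict:
    # Build the JDBC URL and option map that a PySpark read
    # of a PostgreSQL table generally expects (format "jdbc").
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{db}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }

# With an active SparkSession (and the PostgreSQL JDBC driver on the
# classpath), the options would be used roughly like this:
# df = (spark.read.format("jdbc")
#         .options(**postgres_jdbc_options("localhost", 5432,
#                                          "shop", "orders",
#                                          "me", "secret"))
#         .load())
```

Keeping connection details in one helper like this makes it easy to swap databases or credentials without touching the read logic.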
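The pipeline-pattern idea in the last post can be sketched in plain Python: a linear pipeline is just function composition, and a branching pipeline fans the same input out to several independent chains. The stage functions below are toy examples, not code from the post:

```python
from functools import reduce

def linear_pipeline(stages):
    # Compose stages left to right: each stage's output
    # becomes the next stage's input (the linear pattern).
    def run(data):
        return reduce(lambda acc, stage: stage(acc), stages, data)
    return run

def branching_pipeline(branches):
    # Fan one input out to several named linear chains that
    # run independently (the branching pattern).
    def run(data):
        return {name: linear_pipeline(stages)(data)
                for name, stages in branches.items()}
    return run

# Toy stages: normalize a raw string, or measure it.
clean = linear_pipeline([str.strip, str.lower])
split = branching_pipeline({
    "cleaned": [str.strip, str.lower],
    "length": [str.strip, len],
})
```

Looping, parallel, and hybrid patterns build on the same idea, e.g. running the branches of `branching_pipeline` in a thread or process pool instead of a dict comprehension.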
All Tags
- data-engineering (16)
- python (12)
- data-science (5)
- pyspark (5)
- apache-spark (4)
- react (4)
- spark (3)
- tutorial (3)
- big-data (3)
- data-processing (3)
- nlp (3)
- nextjs (3)
- pipenv (2)
- pandas (2)
- data-analysis (2)
- databricks (2)
- javascript (2)
- data-pipeline (1)
- jupyter (1)
- libraries (1)
- numpy (1)
- matplotlib (1)
- scikit-learn (1)
- tensorflow (1)
- pytorch (1)
- keras (1)
- seaborn (1)
- sqlalchemy (1)
- airflow (1)
- data-pipelines (1)
- docker (1)
- distributed-computing (1)
- postgresql (1)
- database (1)
- sql (1)
- dataframes (1)
- pipeline (1)
- patterns (1)
- machine-learning (1)
- data-analytics (1)
- redis (1)
- roadmaps (1)
- learning (1)
- software-development (1)
- nextui (1)
- ui (1)
- tailwindcss (1)
- webdev (1)