This course provides a detailed overview of data transformation and analysis using Apache Spark. You will learn the core knowledge and skills needed to develop Spark applications. The course covers the Apache Spark runtime and application architecture as well as the fundamental concepts of the RDD and DataFrame APIs in Spark.
Basic primers on the MapReduce processing pattern and functional programming in Python are provided as well.
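As a flavour of that primer, here is a minimal, framework-free sketch of the MapReduce pattern expressed with Python's functional tools; the word list and the counting task are hypothetical examples rather than actual course material.

```python
# A minimal sketch of the MapReduce pattern using plain Python functional tools.
# The sample words and the word-count task are hypothetical.
from functools import reduce

words = ["spark", "hadoop", "spark", "hive", "spark"]

# Map step: emit a (key, 1) pair for each word.
pairs = map(lambda w: (w, 1), words)

# Reduce step: aggregate the counts per key.
def merge(counts, pair):
    key, value = pair
    counts[key] = counts.get(key, 0) + value
    return counts

word_counts = reduce(merge, pairs, {})
print(word_counts)  # {'spark': 3, 'hadoop': 1, 'hive': 1}
```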
The course will teach Apache Spark programming using the transformations and actions available in the RDD and DataFrame APIs and within Spark SQL. Hands-on exercises are provided throughout the course to reinforce concepts.
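To give a sense of what this looks like in practice, below is a brief PySpark sketch of RDD transformations and actions; the input file path and the word-count task are assumptions made for illustration, not exercises from the course.

```python
# A hedged PySpark sketch of RDD transformations and actions.
# The input path "data/sample.txt" is hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("data/sample.txt")

# Transformations are lazy: they only build up the RDD lineage.
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# Actions trigger distributed execution.
print(counts.count())   # number of distinct words
print(counts.take(5))   # a sample of (word, count) pairs

spark.stop()
```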
Beyond basic programming skills, the course provides deeper dives into additional programming and runtime constructs such as broadcast variables, accumulators, and RDD and DataFrame storage and lineage options.
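The following sketch gives a rough idea of these constructs in PySpark; the lookup table, sample records and chosen storage level are illustrative assumptions only.

```python
# A hedged sketch of broadcast variables, accumulators and RDD persistence.
# The country-code lookup table and sample data are made up for illustration.
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("shared-variables-sketch").getOrCreate()
sc = spark.sparkContext

# Broadcast a small, read-only lookup table to every executor.
country_names = sc.broadcast({"AU": "Australia", "NZ": "New Zealand"})

# Accumulator for counting records with unknown codes.
unknown = sc.accumulator(0)

def resolve(code):
    name = country_names.value.get(code)
    if name is None:
        unknown.add(1)
    return (code, name)

resolved = sc.parallelize(["AU", "NZ", "US", "AU"]).map(resolve)

# Persist the RDD so later actions reuse it rather than recomputing the lineage.
resolved.persist(StorageLevel.MEMORY_AND_DISK)

print(resolved.collect())
print("unknown codes:", unknown.value)  # visible on the driver after an action

spark.stop()
```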
Topics covered include:
- Apache Spark introduction and background
- MapReduce processing pattern
- Spark deployment modes
- Spark runtime and application architecture
- Understanding Spark RDDs
- Using Spark with distributed file systems and object stores
- Functional programming with Python
- Using Spark RDD transformations and actions
- RDD storage levels
- Caching, persistence and checkpointing of Spark RDDs
- Broadcast variables and accumulators
- Partitioning in Spark
- Processing RDDs with external programs
- Improving Spark application performance
- Apache Hive metastore overview
- DataFrame API and Spark SQL architecture
- Using the DataFrameReader and DataFrameWriter APIs (see the sketch after this list)
- Utilising DataFrame API transformations and actions
- Using Apache Spark SQL
- Choosing between the RDD and DataFrame APIs
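As a rough illustration of the DataFrame-oriented topics above, the sketch below reads a file with the DataFrameReader, applies DataFrame transformations, runs an equivalent Spark SQL query and writes the result with the DataFrameWriter; the file paths, column names and aggregation are assumptions made for the example.

```python
# A hedged sketch of the DataFrameReader/Writer APIs and Spark SQL.
# The paths "data/orders.csv" and "output/customer_totals", and the columns
# customer_id and amount, are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dataframe-sketch").getOrCreate()

# DataFrameReader: load a CSV file with a header row, inferring the schema.
orders = (spark.read
               .option("header", "true")
               .option("inferSchema", "true")
               .csv("data/orders.csv"))

# DataFrame API transformations followed by an action.
totals = (orders.groupBy("customer_id")
                .sum("amount")
                .withColumnRenamed("sum(amount)", "total_amount"))
totals.show(5)

# The same aggregation expressed in Spark SQL via a temporary view.
orders.createOrReplaceTempView("orders")
spark.sql("SELECT customer_id, SUM(amount) AS total_amount "
          "FROM orders GROUP BY customer_id").show(5)

# DataFrameWriter: save the result as Parquet.
totals.write.mode("overwrite").parquet("output/customer_totals")

spark.stop()
```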
Developed by Jeffrey Aven, author of SAMS Teach Yourself Apache Spark in 24 Hours and Data Analytics with Spark using Python.
The Data Transformation and Analysis Using Apache Spark module is the first of three modules in the Big Data Development Using Apache Spark series, and lays the foundations for subsequent modules including “Stream and Event Processing using Apache Spark” and “Advanced Analytics using Apache Spark”.
See what former trainees are saying about AlphaZetta courses.
Additional Information
Audience | Expert. This course is suitable for developers and analysts who will be working with Spark. It is ideally suited for users transitioning to a Spark runtime environment from a relational database programming or analysis background (e.g., data warehouse/ETL developers or BI analysts).
Prerequisites |
Objective | Attendees should, by the end of the course:
Format | Class
Duration | 2 days
Trainer | Courses are taught by Jeffrey Aven. Jeffrey Aven is a big data, open source software and cloud computing consultant, author and instructor based in Melbourne, Australia. He has extensive experience as a technical instructor, having taught courses on Hadoop and HBase for Cloudera (awarded Cloudera Hadoop Instructor of the Year for APAC in 2013) and courses on Apache Kafka for Confluent, in addition to delivering his own courses. Jeffrey is also the author of several big data books, including SAMS Teach Yourself Hadoop in 24 Hours, SAMS Teach Yourself Apache Spark in 24 Hours and Data Analytics with Spark using Python. In addition to his credentials as an instructor and author, Jeff has over thirty years of industry experience and has held key roles in several major big data and cloud implementations in recent years.
Delivery Method | In-person at AlphaZetta Academy locations or on-premise for corporate groups
Private and Corporate Training
In addition to our public seminars, workshops and courses, AlphaZetta Academy can provide this training for your organisation in a private setting at your location or ours, or online. Please enquire to discuss your needs.