This course provides attendees with the practical knowledge required to perform statistical, machine learning and graph analysis operations at scale using Apache Spark.

The Apache Spark family includes APIs and libraries designed to implement machine learning and statistical analysis operations in a distributed processing environment, offering horizontal scalability and parallel computing power. The “Advanced Analytics using Apache Spark” module is designed to enable data scientists and statisticians who have experience in other statistical or machine learning frameworks to extend their knowledge and experience to the Spark runtime environment.

The course introduces R on Spark (using the SparkR package), covering common R functions within the Spark framework. This includes hands-on examples of how to use the Spark runtime with RStudio. The course then introduces the Spark MLlib and Spark ML APIs, with practical exercises implementing regression, classification and clustering algorithms, as well as feature extraction operations using Spark. Collaborative filtering applications such as recommendation engines are also covered.

Additionally, the course provides an introduction to graph processing and analysis using Spark.

Topics include:

  • Using the Spark R API
  • Using Spark with RStudio
  • Machine learning using the Spark MLlib API
  • Machine learning using the Spark ML API
  • Feature extraction using Spark
  • Linear algebra using Spark
  • Classification using Spark
  • Clustering using Spark
  • Regression using Spark
  • Building a recommender using Spark
  • Using Spark with Jupyter
  • Graph processing and analysis using Spark

Developed by Jeffrey Aven, author of SAMS Teach Yourself Apache Spark in 24 Hours and Data Analytics with Spark using Python, this course will provide the core knowledge and skills needed to develop applications using Apache Spark.

The “Advanced Analytics using Apache Spark” module is the third of three modules in the “Big Data Development using Apache Spark” series, following the “Data Transformation and Analysis using Apache Spark” and “Stream and Event Processing using Apache Spark” modules.

See what former trainees are saying about AlphaZetta courses.

Additional Information

This course is suitable for data scientists and statisticians working with data at scale using Apache Spark. Attendees should have a solid understanding of machine learning concepts and have implemented algorithms using other tools.
Objective: Attendees should, by the end of the course:

  • Understand the SparkR package and its capabilities
  • Understand the implementation of machine learning algorithms in Spark
  • Be able to train and deploy models using the Spark MLlib and Spark ML libraries
  • Understand graph analysis using Spark
Duration: 2 days
Trainer: Courses are taught by Jeffrey Aven.

Jeffrey Aven is a big data, open source software, and cloud computing consultant, author and instructor based in Melbourne, Australia.

Jeffrey has extensive experience as a technical instructor, having taught courses on Hadoop and HBase for Cloudera (awarded Cloudera Hadoop Instructor of the Year for APAC in 2013) and courses on Apache Kafka for Confluent in addition to delivering his own courses.

Jeffrey is also the author of several Big Data related books including SAMS Teach Yourself Hadoop in 24 Hours, SAMS Teach Yourself Apache Spark in 24 Hours and Data Analytics with Spark using Python.

In addition to his credentials as an instructor and author, Jeff has over thirty years of industry experience and has been involved in key roles with several major big data and cloud implementations over the last several years.

Delivery Method: In-person at AlphaZetta Academy locations, or on-premises for corporate groups

Our online courses run as live meetings, using Zoom for the video component and Microsoft virtual computers for the practical components. The benefit of having a live trainer for online training is that you can ask questions, obtain mentoring from the trainer and interact with classmates.

Course participants will require the following technologies and online accounts. Please check that your setup satisfies these requirements:

  • Reliable computer (Windows, Mac or Linux)
  • Webcam (to help facilitate the mentoring aspect of our training)
  • Reliable internet access
  • A quiet space
  • Zoom video conferencing software and a Zoom account (register and pre-install the software before the course)
  • Microsoft account to access the virtual lab PCs (an existing or new account; nothing needs to be installed, you just need an account to sign in with)

Meals and refreshments

Face-to-face courses: Catered morning tea and lunch are provided on both days of the course. Please notify us at least a week ahead if you have any special dietary requirements.


Please email us any questions about the course, including requests for more detail, specific content you would like to see covered, or queries regarding prerequisites and suitability.
If you would like to attend but for any reason cannot, please also let us know.


Course material may vary from what is advertised, depending on the needs and learning pace of attendees. Additional material may be presented alongside, or in place of, the advertised content.

Cancellations and refunds

You can get a full refund if you cancel 14 days or more before the course starts. No refunds will be issued for cancellations made less than 14 days before the course starts.

Frequently asked questions (FAQ)

Do I need to bring my own computer?
This is dependent on the venue. Please check the course event page.

Why do I need to provide a shipping address?
For online courses, we need an address to send you the course notes that you need for the course.

Private and Corporate Training

In addition to our public seminars, workshops and courses, AlphaZetta Academy can provide this training for your organisation in a private setting at your location or ours, or online. Please enquire to discuss your needs.


