This class builds on the introductory Python class. It covers advanced use and customisation of Jupyter Notebook, as well as configuring multiple environments and kernels. The NumPy package is introduced for working with arrays and matrices, and Pandas data analysis and manipulation methods are covered in greater depth, including working with time series data. Data exploration and advanced visualisations are taught using the Plotly and Seaborn libraries.
This course goes deeper into the tidyverse family of packages, with a focus on advanced data handling and on data structures such as list columns in tibbles and their application to model management. Other key topics are functional programming with the purrr package and advanced use of the pipe operator. Optional topics may include dplyr on databases and the use of rmarkdown and RStudio notebooks.
With big data expert and author Jeffrey Aven. The third module in the “Big Data Development Using Apache Spark” series, this course provides the practical knowledge needed to perform statistical, machine learning and graph analysis operations at scale using Apache Spark. It enables data scientists and statisticians with experience in other frameworks to extend their knowledge to the Spark runtime environment with its specific APIs and libraries designed to implement machine learning and statistical analysis in a distributed and scalable processing environment.
The detection of anomalies is one of the most eclectic and difficult activities in data analysis. This course builds on the basics introduced in the earlier course and provides more advanced methods, including supervised and unsupervised learning, advanced use of Benford’s Law, and further statistical anomaly detection techniques. Optional topics may include anomalies in time series, deception in text, and the use of social network analysis to detect fraud and other undesirable behaviours.
This course provides a more rigorous, mathematically based view of modern neural networks, their training, applications, strengths and weaknesses, focusing on key architectures such as convolutional nets for image processing and recurrent nets for text and time series. The course also covers the use of dedicated hardware such as GPUs and of multiple computing nodes in the cloud, and gives an overview of the most commonly available platforms for neural computation. Some topics touched on in the introductory course will be revisited in more detail. Optional advanced topics may include Generative Adversarial Networks, Reinforcement Learning, Transfer Learning and probabilistic neural networks.