Level 2

Best Practices in Enterprise Information Management

The effective management of enterprise information for analytics deployment requires best practices in the areas of people, processes, and technology. In this workshop we will share both successful and unsuccessful practices in these areas. The workshop covers five key areas of enterprise information management: (1) metadata management, (2) data quality management, (3) data security and privacy, (4) master data management, and (5) data integration.

Overcoming Information Overload with Advanced Practices in Data Visualisation

In this workshop, we explore best practices in deriving insight from vast amounts of data using visualisation techniques. We will work through examples from traditional data and take an in-depth look at the underlying technologies for visualisation in support of geospatial analytics. We will examine visualisation for both strategic and operational BI.

Stars, Flakes, Vaults and the Sins of Denormalisation

Performance and flexibility are often seen as contradictory goals in designing large-scale data implementations. In this talk we will discuss techniques for denormalisation and provide a framework for understanding the performance and flexibility implications of various design options. We will examine a variety of logical and physical design approaches and evaluate the trade-offs between them. Specific recommendations are made for guiding the translation from a normalised logical data model to an engineered-for-performance physical data model. Dimensional modelling and various physical design approaches are discussed in detail. Best practices in the use of surrogate keys are also discussed. The focus is on understanding the benefit (or not) of various denormalisation approaches commonly taken in analytic database designs.

Advanced Python 1

This class builds on the introductory Python class. Advanced use and customisation of Jupyter Notebook are covered, as well as configuring multiple environments and kernels. The NumPy package is introduced for working with arrays and matrices, and a deeper coverage of Pandas data analysis and manipulation methods is provided, including working with time series data. Data exploration and advanced visualisations are taught using the Plotly and Seaborn libraries.

Fraud and Anomaly Detection

This course presents statistical, computational and machine-learning techniques for predictive detection of fraud and security breaches. These methods are shown in the context of use cases for their application, and include the extraction of business rules and a framework for the interoperation of human, rule-based, predictive and outlier-detection methods. Methods presented include predictive tools that do not rely on explicit fraud labels and a range of outlier-detection techniques, including unsupervised learning methods. Notable among these is the random-forest algorithm, which can be used in both supervised and unsupervised settings. Cluster analysis, visualisation and fraud detection based on Benford’s law are also covered, as is the analysis and visualisation of social-network data. A basic knowledge of R and predictive analytics is advantageous.
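
As a rough illustration of the Benford’s law idea mentioned above, the sketch below compares the leading-digit distribution of a set of amounts with the distribution the law predicts, P(d) = log10(1 + 1/d). It is written in Python rather than R purely for brevity, and the function names and the simple deviation score are assumptions made for this example, not material from the course.

```python
import math
from collections import Counter


def benford_expected():
    """Expected leading-digit shares under Benford's law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(value):
    """Return the leading non-zero digit of a non-zero number."""
    # Scientific notation puts the leading digit first, e.g. '3.14...e+02'.
    return int(f"{abs(value):.15e}"[0])


def benford_deviation(values):
    """Sum of absolute gaps between observed and expected leading-digit shares.

    This is a hypothetical, illustrative score; larger values mean a worse fit.
    """
    digits = [first_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    expected = benford_expected()
    return sum(abs(counts.get(d, 0) / len(digits) - expected[d]) for d in range(1, 10))


# Powers of two are known to follow Benford's law closely;
# a list of identical amounts starting with 9 does not.
natural = [2 ** k for k in range(1, 200)]
suspicious = [9] * 100
print(benford_deviation(natural) < benford_deviation(suspicious))
```

A score like this only flags data sets worth a closer look; it does not prove fraud on its own.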

Advanced R 1

This class builds on “Intro to R (+data visualisation)” by providing students with powerful, modern R tools including pipes, the tidyverse, and many other packages that make coding for data analysis easier, more intuitive and more readable. The course will also provide a deeper view of functional programming in R, which also allows cleaner and more powerful coding, as well as R Markdown, R Notebooks, and the shiny package for interactive documentation, browser-based dashboards and GUIs for R code.

Deep Learning and AI

This course is an introduction to the highly celebrated area of Neural Networks, popularised as “deep learning” and “AI”. The course will cover the key concepts underlying neural network technology, as well as the unique capabilities of a number of advanced deep learning technologies, including Convolutional Neural Nets for image recognition, recurrent neural nets for time series and text modelling, and new Artificial Intelligence techniques including Generative Adversarial Networks and Reinforcement Learning. Practical exercises will present these methods in some of the most popular Deep Learning packages available in Python, including Keras and TensorFlow. Trainees are expected to be familiar with the basics of machine learning from the introductory course, as well as the Python language.

Stream and Event Processing using Apache Spark

With big data expert and author Jeffrey Aven. The second module in the “Big Data Development Using Apache Spark” series, this course provides the knowledge needed to develop real-time, event-driven or event-oriented processing applications using Apache Spark. It covers using Spark with NoSQL systems and popular messaging platforms like Apache Kafka and Amazon Kinesis. It also covers the Spark streaming architecture in depth, and uses practical hands-on exercises to reinforce the use of transformations and output operations, as well as more advanced stream-processing patterns.

Advanced Machine Learning Masterclass 1

This course is for experienced machine-learning practitioners who want to take their skills to the next level by using R to hone their abilities as predictive modellers. Trainees will learn essential techniques for real machine-learning model development, helping them to build more accurate models. In the masterclass, participants will work to deploy, test, and improve their models.

Advanced Machine Learning Masterclass 2: Random Forests

This course is for experienced machine-learning practitioners who want to take their skills to the next level by using R to hone their abilities as predictive modellers. Trainees will learn essential techniques for real machine-learning model development, helping them to build more accurate models. In the masterclass, participants will work to deploy, test, and improve their models.

Text and Language Analytics

Text analytics is a crucial skill set in nearly all contexts where data science has an impact, whether that be customer analytics, fraud detection, automation or fintech. In this course, you will learn a toolbox of skills and techniques, starting from effective data preparation and stretching right through to advanced modelling with deep-learning and neural-network approaches such as word2vec.

The Future of Analytics

This full-day workshop examines the trends in analytics deployment and developments in advanced technology. The implications of these technology developments for data foundation implementations will be discussed with examples in future architecture and deployment. This workshop presents best practices for deployment of a next-generation data management implementation as the realisation of analytic capability for mobile devices and consumer intelligence. We will also explore emerging trends related to big data analytics using content from Web 3.0 applications and other non-traditional data sources such as sensors and rich media.

Data Science and Big Data Analytics: Leveraging Best Practices and Avoiding Pitfalls

Data science is the key to business success in the information economy. This workshop will teach you about best practices in deploying a data science capability for your organisation. Technology is the easy part; the hard part is creating the right organisational and delivery framework in which data science can be successful in your organisation. We will discuss the necessary skill sets for a successful data scientist and the environment that will allow them to thrive. We will draw a strong distinction between “Data R&D” and “Data Product” capabilities within an enterprise and speak to the different skill sets, governance, and technologies needed across these areas. We will also explore the use of open data sets and open source software tools to enable best results from data science in large organisations. Advanced data visualisation will be described as a critical component of a big data analytics deployment strategy. We will also talk about the many pitfalls and how to avoid them.

Agile Data Management Architecture

This full-day workshop examines the trends in analytic technologies, methodologies, and use cases. The implications of these developments for deployment of analytic capabilities will be discussed with examples in future architecture and implementation. This workshop also presents best practices for deployment of next generation analytics.

Innovating with Best Practices to Modernise Delivery Architecture and Governance

Organisations often struggle with the conflicting goals of delivering production reporting with high reliability while at the same time creating new value propositions from their data assets. Gartner has observed that organisations that focus only on mode one (predictable) deployment of analytics in the construction of reliable, stable, and high-performance capabilities will very often lag the marketplace in delivering competitive insights because the domain is moving too fast for traditional SDLC methodologies. Explorative analytics requires a very different model for identifying analytic opportunities, managing teams, and deploying into production. Rapid progress in the areas of machine learning and artificial intelligence exacerbates the need for bi-modal deployment of analytics. In this workshop we will describe best practices in both architecture and governance necessary to modernise an enterprise to enable participation in the digital economy.

Modernising Your Data Warehouse and Analytic Ecosystem

This full-day workshop examines the emergence of new trends in data warehouse implementation and the deployment of analytic ecosystems. We will discuss new platform technologies such as columnar databases, in-memory computing, and cloud-based infrastructure deployment. We will also examine the concept of a “logical” data warehouse, including an ecosystem of both commercial and open source technologies. Real-time analytics and in-database analytics will also be covered. The implications of these developments for deployment of analytic capabilities will be discussed with examples in future architecture and implementation. This workshop also presents best practices for deployment of next generation analytics using AI and machine learning.

Cost-Based Optimisation: Obtaining the Best Execution Plan for Complex Queries

Optimiser choices in determining the execution plan for complex queries are a dominant factor in the performance delivery for a data foundation environment. The goal of this workshop is to de-mystify the inner workings of cost-based optimisation for complex query workloads. We will discuss the differences between rule-based optimisation and cost-based optimisation with a focus on how a cost-based optimiser enumerates and selects among possible execution plans for a complex query. The influences of parallelism and hardware configuration on plan selection will be discussed along with the importance of data demographics. Advanced statistics collection is discussed as the foundational input for decision-making within the cost-based optimiser. Performance characteristics and optimiser selection among different join and indexing opportunities will also be discussed with examples. The inner workings of the query re-write engine will be described along with the performance implications of various re-write strategies.
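
To make the idea of plan enumeration concrete, here is a deliberately simplified sketch in Python. It assumes a toy cost model (the cost of a left-deep join order is the sum of its estimated intermediate result sizes) and made-up table statistics. The table names, row counts, selectivities and function names are all hypothetical, and real optimisers use far richer statistics, cost formulas and search strategies than this brute-force enumeration.

```python
from itertools import permutations


def plan_cost(order, rows, selectivity):
    """Toy cost of a left-deep join order: sum of estimated intermediate sizes."""
    joined = [order[0]]
    size = rows[order[0]]
    cost = 0.0
    for table in order[1:]:
        size *= rows[table]
        # Apply the selectivity of every join predicate linking the new table
        # to tables already joined; no predicate means a cross product (1.0).
        for prev in joined:
            size *= selectivity.get(frozenset((prev, table)), 1.0)
        joined.append(table)
        cost += size
    return cost


def best_plan(rows, selectivity):
    """Enumerate every join order and keep the one with the lowest toy cost."""
    return min(permutations(rows), key=lambda order: plan_cost(order, rows, selectivity))


# Hypothetical statistics (the "data demographics" the optimiser relies on).
rows = {"orders": 1_000_000, "customers": 10_000, "regions": 10}
selectivity = {
    frozenset(("orders", "customers")): 1 / 10_000,
    frozenset(("customers", "regions")): 1 / 10,
}

best = best_plan(rows, selectivity)
print(best, plan_cost(best, rows, selectivity))
```

Under these assumed statistics the cheapest orders join the two small tables first and bring in the large orders table last, which is the kind of decision that depends directly on the quality of the collected statistics.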

Optimising Your Big Data Ecosystem

Big Data exploitation has the potential to revolutionise the analytic value proposition for organisations that are able to successfully harness these capabilities. However, the architectural components necessary for success in Big Data analytics are different from those used in traditional data warehousing. This workshop will provide a framework for Big Data exploitation along with recommendations for architectural deployment of Big Data solutions.

Social Network Analysis: Practical Use Cases and Implementation

Social networking via Web 2.0 applications such as LinkedIn and Facebook has created huge interest in understanding the connections between individuals to predict patterns of churn, influencers related to early adoption of new products and services, successful pricing strategies for certain kinds of services, and customer segmentation. We will explain how to use these advanced analytic techniques with mini case studies across a wide range of industries including telecommunications, financial services, health care, retailing, and government agencies.
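
As a small illustration of what understanding the connections between individuals can mean in practice, the Python sketch below computes degree centrality, one of the simplest measures sometimes used as a first pass at spotting potential influencers. The edge list, the names and the function are hypothetical and are not taken from the workshop's case studies.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Share of the other people in the network that each person is connected to."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-connections
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical connections between four customers.
edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
scores = degree_centrality(edges)
most_connected = max(scores, key=scores.get)
print(most_connected, scores[most_connected])
```

Degree centrality only counts direct connections; predicting churn or adoption generally calls for richer measures and additional behavioural data.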