With big data expert and author Jeffrey Aven. The first module in the “Big Data Development Using Apache Spark” series, this course provides a detailed overview of the spark runtime and application architecture, processing patterns, functional programming using Python, fundamental API concepts, basic programming skills and deep dives into additional constructs including broadcast variables, accumulators, and storage and lineage options. Attendees will learn to understand the Apache Spark framework and runtime architecture, fundamentals of programming for Spark, gain mastery of basic transformations, actions, and operations, and be prepared for advanced topics in Spark including streaming and machine learning.
The second module in the “Big Data Development Using Apache Spark” series, this course provides the Spark streaming knowledge needed to develop real-time, event-driven or event-oriented processing applications using Apache Spark. It covers using Spark with NoSQL systems and popular messaging platforms like Apache Kafka and Amazon Kinesis. It covers the Spark streaming architecture in depth, and uses practical hands-on exercises to reinforce the use of transformations and output operations, as well as more advanced stream-processing patterns. With big data expert and author Jeffrey Aven.
With big data expert and author Jeffrey Aven. The third module in the “Big Data Development Using Apache Spark” series, this course provides the practical knowledge needed to perform statistical, machine learning and graph analysis operations at scale using Apache Spark. It enables data scientists and statisticians with experience in other frameworks to extend their knowledge to the Spark runtime environment with its specific APIs and libraries designed to implement machine learning and statistical analysis in a distributed and scalable processing environment.
This is an introduction to the exciting new field of quantum computing, including programming actual quantum computers in the cloud. Quantum computing promises to revolutionise cryptography, machine learning, cyber security, weather forecasting and a host of other mathematical and high-performance computing fields. A practical component will include writing quantum programs and executing them on simulators as well as on actual quantum computers in the cloud.
The detection of anomalies is one of the most eclectic and difficult activities in data analysis. This course builds on the basics introduced in the earlier course, and provides more advanced methods including supervised and unsupervised learning, advanced use of Benford’s Law, and more on statistical anomaly detection. Optional topics may include anomalies in time series, deception in text and the use of social network analysis to detect fraud and other undesirable behaviours.
This course is an intuitive introduction to forecasting and analysis of time-series data. We will review a range of standard forecasting methods, including ARIMA and exponential smoothing, along with standard means of measuring forecast error and benchmarking with naive forecasts, and standard pre-processing/de-trending methods such as differencing and missing value imputation. Other topics will include trend/seasonality/noise decomposition, autocorrelation, visualisation of time series, and forecasting with uncertainty.
This course presents a process and methods for an agile analytics delivery. Agile Insights reflects the capabilities required by any organization to develop insights from data and validating potential business value.Content presented describes the process, how it is executed and how it can be deployed as a standard process inside an organization. The course will also share best practices, highlight potential tripwires to watch out for, as well as roles and resources required.
Text analytics is a crucial skill set in nearly all contexts where data science has an impact, whether that be customer analytics, fraud detection, automation or fintech. In this course, you will learn a toolbox of skills and techniques, starting from effective data preparation and stretching right through to advanced modelling with deep-learning and neural-network approaches such as word2vec.
This full day workshop examines the trends in analytics deployment and developments in advanced technology. The implications of these technology developments for data foundation implementations will be discussed with examples in future architecture and deployment. This workshop presents best practices for deployment of a next generation data management implementation as the realization of analytic capability for mobile devices and consumer intelligence. We will also explore emerging trends related to big data analytics using content from Web 3.0 applications and other non-traditional data sources such as sensors and rich media.
This full-day workshop examines the trends in analytic technologies, methodologies, and use cases. The implications of these developments for deployment of analytic capabilities will be discussed with examples in future architecture and implementation. This workshop also presents best practices for deployment of next generation analytics.
Organisations often struggle with the conflicting goals of both delivering production reporting with high reliability while at the same time creating new value propositions from their data assets. Gartner has observed that organizations that focus only on mode one (predictable) deployment of analytics in the construction of reliable, stable, and high-performance capabilities will very often lag the marketplace in delivering competitive insights because the domain is moving too fast for traditional SDLC methodologies. Explorative analytics requires a very different model for identifying analytic opportunities, managing teams, and deploying into production. Rapid progress in the areas of machine learning and artificial intelligence exacerbates the need for bi-modal deployment of analytics. In this workshop we will describe best practices in both architecture and governance necessary to modernise an enterprise to enable participation in the digital economy.
Optimiser choices in determining the execution plan for complex queries is a dominant factor in the performance delivery for a data foundation environment. The goal of this workshop is to de-mystify the inner workings of cost-based optimisation for complex query workloads. We will discuss the differences between rule-based optimisation and cost-based optimisation with a focus on how a cost-based optimization enumerates and selects among possible execution plans for a complex query. The influences of parallelism and hardware configuration on plan selection will be discussed along with the importance of data demographics. Advanced statistics collection is discussed as the foundational input for decision-making within the cost-based optimiser. Performance characteristics and optimiser selection among different join and indexing opportunities will also be discussed with examples. The inner workings of the query re-write engine will be described along with the performance implications of various re-write strategies.
Big Data exploitation has the potential to revolutionise the analytic value proposition for organisations that are able to successfully harness these capabilities. However, the architectural components necessary for success in Big Data analytics are different than those used in traditional data warehousing. This workshop will provide a framework for Big Data exploitation along with recommendations for architectural deployment of Big Data solutions.