Stephen Brobst

AlphaZetta Academy Course Author & Trainer

Stephen Brobst offers a range of workshops and seminars on analytics topics for AlphaZetta Academy. Stephen is a thoroughly engaging speaker and trainer, and his expertise is fascinating to witness. Contact us today to arrange a workshop or seminar at your organisation.

Stephen is the Chief Technology Officer for Teradata Corporation. He performed his graduate work in Computer Science at the Massachusetts Institute of Technology, where his Master's and PhD research focused on high-performance parallel processing. Stephen also completed an MBA with joint course and thesis work at the Harvard Business School and the MIT Sloan School of Management. He is a TDWI Fellow and has been on the faculty of The Data Warehousing Institute since 1996. During Barack Obama's first term he was appointed to the President's Council of Advisors on Science and Technology (PCAST) in the working group on Networking and Information Technology Research and Development (NITRD). In 2014 he was ranked by ExecRank as the #4 CTO in the United States (behind the CTOs from Tesla Motors and Intel) out of a pool of more than 10,000 CTOs.

Big Mistakes to Avoid When Performing Big Data Analytics

June 24th, 2019

Mark Twain, the famous American author, popularised the saying "Lies, Damned Lies, and Statistics." The phrase describes the persuasive power of numbers, particularly statistics, to lead people to draw incorrect conclusions. This workshop describes the subtle mistakes that can easily be made when interpreting the results of an analytic study or report. We describe logically sound processes for deciphering data, using methods designed to surface actionable information without distracting or misleading the knowledge worker from the facts needed for effective decision-making.

Best Practices in Data Lake Deployment

June 24th, 2019

The concept of a "Data Lake" is becoming a widely adopted best practice in constructing an analytic ecosystem. When well executed, a data lake strategy increases analytic agility for an organisation and provides a foundation for provisioning data into both discovery and integrated data platforms. However, there are many pitfalls with this approach that can be avoided through best-practice deployment. This workshop provides guidance on how to deploy toward the "data reservoir" concept rather than ending up with the all too common "data swamp".

The Third Wave of the Big Data Revolution: Sensors Everywhere and the Internet of Everything

June 24th, 2019

The next wave of Big Data Analytics (BDA) will be dominated by sensor data and interconnected devices. This third generation of BDA will dwarf first (weblog) and second (social media) generation data sources in both size and value. This seminar examines the opportunities and challenges associated with collecting, storing, and analysing data produced from the Internet of Things (IoT). We will also discuss best practices in creating value from IoT data using case study examples within both the B2C and B2B sectors.

Smart Cities Using Big Data

June 24th, 2019

The use of big data is a linchpin in the deployment of the smart cities of the future. The potential of big data goes far beyond the efficiencies that can be created in areas such as energy consumption, health care, education, and law enforcement. The true potential of big data lies in directly engaging society in governance through increased transparency and delivery of services built on access to data. This workshop provides mini-case studies in a range of settings, from developed countries such as the USA to developing countries such as Pakistan. The discussion will emphasise how city governments can use data to improve quality of life for their citizens.

Innovating with Best Practices to Modernise Delivery Architecture and Governance

May 20th, 2019

Organisations often struggle with the conflicting goals of delivering production reporting with high reliability while at the same time creating new value propositions from their data assets. Gartner has observed that organisations focusing only on mode-one (predictable) deployment of analytics, with its reliable, stable, and high-performance capabilities, will very often lag the marketplace in delivering competitive insights, because the domain moves too fast for traditional SDLC methodologies. Explorative analytics requires a very different model for identifying analytic opportunities, managing teams, and deploying into production. Rapid progress in machine learning and artificial intelligence heightens the need for bi-modal deployment of analytics. In this workshop we describe the best practices in both architecture and governance necessary to modernise an enterprise for participation in the digital economy.

Modernising Your Data Warehouse and Analytic Ecosystem

May 20th, 2019

This full-day workshop examines emerging trends in data warehouse implementation and the deployment of analytic ecosystems. We will discuss new platform technologies such as columnar databases, in-memory computing, and cloud-based infrastructure deployment. We will also examine the concept of a "logical" data warehouse, including an ecosystem of both commercial and open source technologies. Real-time analytics and in-database analytics will also be covered. The implications of these developments for the deployment of analytic capabilities will be discussed with examples of future architecture and implementation. This workshop also presents best practices for the deployment of next-generation analytics using AI and machine learning.

Cost-Based Optimisation: Obtaining the Best Execution Plan for Complex Queries

May 20th, 2019

Optimiser choices in determining the execution plan for complex queries are a dominant factor in the performance of a data foundation environment. The goal of this workshop is to de-mystify the inner workings of cost-based optimisation for complex query workloads. We will discuss the differences between rule-based optimisation and cost-based optimisation, with a focus on how a cost-based optimiser enumerates and selects among possible execution plans for a complex query. The influences of parallelism and hardware configuration on plan selection will be discussed, along with the importance of data demographics. Advanced statistics collection is discussed as the foundational input for decision-making within the cost-based optimiser. Performance characteristics and optimiser selection among different join and indexing opportunities will also be discussed with examples. The inner workings of the query re-write engine will be described, along with the performance implications of various re-write strategies.
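To make the enumerate-and-cost idea concrete, here is a minimal illustrative sketch (not any vendor's actual optimiser): it exhaustively enumerates left-deep join orders for a three-table query and picks the cheapest under a toy cost model driven by table statistics. The table names, row counts, and the fixed selectivity are invented for illustration.

```python
# Illustrative sketch of cost-based plan selection: enumerate candidate
# join orders, estimate each plan's cost from statistics (the "data
# demographics"), and keep the cheapest plan.

from itertools import permutations

# Hypothetical table statistics: row counts collected in advance.
STATS = {"orders": 1_000_000, "customers": 50_000, "regions": 100}

SELECTIVITY = 0.001  # assumed fraction of row pairs that satisfy each join


def estimated_join_cost(join_order):
    """Toy cost model for a left-deep join pipeline: the cost is the sum
    of the estimated intermediate result sizes produced along the way."""
    cost = 0.0
    rows = STATS[join_order[0]]
    for table in join_order[1:]:
        rows = rows * STATS[table] * SELECTIVITY  # estimated intermediate size
        cost += rows                              # pay for producing it
    return cost


def choose_plan(tables):
    """Enumerate all join orders (feasible only for small queries) and
    return the cheapest, as a cost-based optimiser would."""
    return min(permutations(tables), key=estimated_join_cost)


best = choose_plan(["orders", "customers", "regions"])
print(best)  # the cheapest plans join the two small tables first
```

Real optimisers prune this search space aggressively (for example with dynamic programming) and use far richer statistics, but the principle is the same: the quality of the chosen plan is only as good as the cost estimates feeding it.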

Optimising Your Big Data Ecosystem

May 18th, 2019

Big Data exploitation has the potential to revolutionise the analytic value proposition for organisations that successfully harness these capabilities. However, the architectural components necessary for success in Big Data analytics are different from those used in traditional data warehousing. This workshop provides a framework for Big Data exploitation, along with recommendations for the architectural deployment of Big Data solutions.
