Session Details

Finding Islands, Gaps, and Clusters in Complex Data

Edward Pollack
75 minutes
Business Intelligence and Data Warehousing
DBA, Database Developer, BI/Analytics Practitioner
Gaps and islands problems are often treated as academic exercises, discussed without relevant real-life applications. By subdividing a data set into meaningful groups of islands, we can perform analytics against it that would otherwise be slow, error-prone, or seemingly impossible! Data clustering can be applied to complex data sets to answer questions about winning streaks, financial performance, monitoring/alerting, goal achievement, and more! Data quality becomes critical when crunching data in this fashion, so we address duplicate data, bad data, NULL, and any other unexpected weirdness that a user might throw at us. This is a fast-paced session that delves into methods that can be applied to any data. A complete set of historical baseball records will be used as our imperfect sample data for everything from a player's hitting streak to the longest winning streak by a team on Wednesday nights.
In-depth experience with T-SQL. Comfort with common table expressions, subqueries, and recursion. Basic data analytics experience.
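
For readers new to the term, the idea of splitting data into islands can be sketched in a few lines. This is an illustration only, not the session's material: it is written in Python rather than T-SQL, the function name `find_islands` and the sample dates are hypothetical, and it treats a streak as a run of consecutive calendar days (a real hitting streak is counted in games, not days). It mirrors the common "value minus row number" approach, where consecutive dates share the same difference and that difference becomes the island key. Duplicates and missing (None) values are dropped first, in the spirit of the data quality concerns mentioned in the abstract.

```python
from datetime import date
from itertools import groupby


def find_islands(days):
    """Group dates into islands of consecutive calendar days.

    Returns a list of (first_day, last_day, length) tuples in date order.
    """
    # Clean the input: ignore missing values, collapse duplicates, and sort.
    ordered = sorted({d for d in days if d is not None})

    # For consecutive days, (day number - position) stays constant,
    # so that difference identifies the island each date belongs to.
    keyed = ((d.toordinal() - i, d) for i, d in enumerate(ordered))

    islands = []
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        run = [d for _, d in group]
        islands.append((run[0], run[-1], len(run)))
    return islands


# Hypothetical sample: days on which a made-up player recorded a hit,
# including one duplicate row and one missing value.
hit_dates = [
    date(2023, 4, 1), date(2023, 4, 2), date(2023, 4, 3),
    date(2023, 4, 3), None,
    date(2023, 4, 6), date(2023, 4, 7),
    date(2023, 4, 10),
]

islands = find_islands(hit_dates)
longest = max(islands, key=lambda island: island[2])
```

With this sample, `islands` holds three runs (April 1 to 3, April 6 to 7, and April 10 alone), and `longest` is the three-day run. The gaps are simply the spans between one island's last day and the next island's first day.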