When Your Big Data Seems Too Small: Accurate Inferences Beyond the Empirical Distribution 

March 09, 2017 | 10-11 am PT

A recording of this webinar will be made available within a week of the live session.

Many of the techniques and algorithms used in machine learning and data science assume that the empirical distribution of the available data is an accurate approximation of the underlying phenomenon being investigated. However, when the distribution is complex or high-dimensional, even large datasets can fail to represent it accurately. For example, in large genomic datasets many rare genetic variants are never observed, and in a large natural language corpus many plausible five-word sequences may never appear.

Join Stanford’s Dr. Gregory Valiant as he discusses the challenges of making accurate inferences in this regime, where the empirical distribution of the available data is misleading, and the techniques that address them. Learn how to extract accurate information about the underlying distribution, including the portion that has not been observed in the given dataset.


You will learn:
  • An intuitive approach for reasoning about the distribution that underlies a given dataset
  • Techniques that leverage this intuition to reveal the structure of the underlying distribution, including the structure of its unseen portion, for which no data points have been observed
  • Practical implications of these techniques for the analysis of genomic datasets, including how to estimate the value of sequencing additional human genomes

About the Speaker

Gregory Valiant, PhD, is an Assistant Professor in Stanford's Computer Science Department. Several of his recent projects focus on designing algorithms that accurately infer information about complex distributions from surprisingly little data. More broadly, his research interests include algorithms, learning, applied probability and statistics, and evolution. Before joining Stanford, Dr. Valiant was a postdoc at Microsoft Research New England. He received his PhD in Computer Science from Berkeley and his BA in Math from Harvard.

Presented By

Stanford's Databases and the Foundations in Computer Science graduate certificate programs


For questions, please contact us at scpd-customerservice@stanford.edu or 650-204-3984.
Free Stanford Webinar