About Me
I am currently a postdoc in CSE at University of Washington, Seattle, working with
Prof. Sham Kakade.
My research interests broadly span the areas of Machine Learning, Statistical Inference, and Crowdsourcing.
I graduated from the Department of Electrical Engineering, California Institute of Technology
(Caltech). I was adviced by Prof. Babak
Hassibi.
I was supported by the Faculty for the Future fellowship from 20132015, awarded by the Schlumberger Foundation.
Before Caltech,
I obtained my undergraduate degree (B. Tech) from Indian Institute of Technology Madras (IIT Madras) with a
major in Electrical Engineering and a minor in Physics.
News
 (Nov, 2019) Invited talk at U. Chicago Machine Learning Seminar Series.
 (Oct, 2019) Invited participant at the Cornell ORIE Annual Young Researchers Workshop 2019.
 (Oct, 2019) Invited participant at the Rising Stars in EECS 2019 workshop, UIUC.
 (Oct, 2019) Invited talk at UMD.
 (Oct, 2019) Invited talk at NYU.
 (Oct, 2019) Invited talk at Columbia.
 (Aug, 2019) Invited talk at Berkeley BLISS Seminar.
 (Aug, 2019) Invited talk at Stanford Theory Lunch.
 (Aug, 2019) Invited talk at Stanford Information Theory Forum.
 (Jun, 2019) Invited talk at Caltech EE Systems Seminar.
 (Jun, 2019) Invited talk at UC Riverside.
 (Apr, 2019) Our paper on optimal learning population of parameters has been accepted to ICML 2019.
 (Jan, 2019) Our paper on correcting for bias in inference from sparse data has been accepted to the Web Conference 2019.
 (Nov, 2018) Invited talk at CMU Machine Learning Seminar.
Publications
[Google Scholar Page]
Preprints
Peerreviewed conferences
 Maximum Likelihood Estimation for Learning Populations of Parameters
 Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, Sham Kakade
 International Conference on Machine Learning (ICML), 2019.
 [pdf] [Abstract]
 The Illusion of Change: Correcting for Bias when Inferring Changes in Sparse, SocietalScale Data
 Gabriel Cadamuro, Ramya Korlakai Vinayak, Joshua Blumenstock, Sham Kakade, Jacob N. Shapiro
 The Web Conference (WWW), 2019.
 [pdf] [Abstract]
 Also presented at the AI for Social Good ICML 2019 Workshop
 Tensorbased crowdsourced clustering via triangle queries
 Ramya Korlakai Vinayak, Tijana Zrnic, Babak Hassibi.
 International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2017.
 [pdf] [Abstract]
 Crowdsourced Clustering: Querying Edges vs Triangles
 Ramya Korlakai Vinayak, Babak Hassibi.
 Neural Information Processing Systems (NeurIPS), 2016.
 [pdf] [Abstract]
 Similarity Clustering in the Presence of Outliers: Exact Recovery via Convex Program
 Ramya Korlakai Vinayak, Babak Hassibi.
 International Symposium on Information Theory (ISIT), 2016.
 [pdf] [Abstract] [code]
 Graph Clustering With Missing Data: Convex Algorithms and Analysis
 Ramya Korlakai Vinayak, Samet Oymak, Babak Hassibi.
 Neural Information Processing Systems (NeurIPS), 2014.
 [pdf] [Abstract]
 Sharp Performance Bounds for Graph Clustering via Convex Optimization
 Ramya Korlakai Vinayak, Samet Oymak, Babak Hassibi.
 International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014.
 [pdf] [Abstract]
Workshops
Thesis
Abstracts
 Ramya Korlakai Vinayak, Weihao Kong, Sham Kakade.
 Arxiv 2019 (preprint).

Paired estimation of change in parameters of interest over a population plays a central role in several application domains including those in the social sciences, epidemiology, medicine and biology. In these domains, the size of the population under study is often very large, however, the number of observations available per individual in the population is very small (\emph{sparse observations}) which makes the problem challenging. Consider the setting with $N$ independent individuals, each with unknown parameters $(p_i, q_i)$ drawn from some unknown distribution on $[0, 1]^2$. We observe $X_i \sim \text{Bin}(t, p_i)$ before an event and $Y_i \sim \text{Bin}(t, q_i)$ after the event. Provided these paired observations, $\{(X_i, Y_i) \}_{i=1}^N$, our goal is to accurately estimate the \emph{distribution of the change in parameters}, $\delta_i := q_i  p_i$, over the population and properties of interest like the \emph{$\ell_1$magnitude of the change} with sparse observations ($t\ll N$).
We provide \emph{information theoretic lower bounds} on the error in estimating the distribution of change and the $\ell_1$magnitude of change. Furthermore, we show that the following two step procedure achieves the optimal error bounds: first, estimate the full joint distribution of the paired parameters using the maximum likelihood estimator (MLE) and then estimate the distribution of change and the $\ell_1$magnitude of change using the joint MLE.
Notably, and perhaps surprisingly, these error bounds are of the same order as the minimax optimal error bounds for learning the \emph{full} joint distribution itself (in Wasserstein1 distance);
in other words, estimating the magnitude of the change of parameters over the population is, in a minimax sense, as difficult as estimating the full joint distribution itself.
 Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, Sham Kakade.
 International Conference on Machine Learning (ICML) 2019, Long Beach, CA.

Consider a setting with $N$ independent individuals, each with an unknown
parameter, $p_i \in [0, 1]$ drawn from some unknown distribution $P^\star$.
After observing the outcomes of $t$ independent Bernoulli trials, i.e., $X_i
\sim \text{Binomial}(t, p_i)$ per individual, our objective is to
accurately estimate $P^\star$. This problem arises in numerous
domains, including the social sciences, psychology, healthcare, and
biology, where the size of the population under study is usually large
while the number of observations per individual is often limited.
Our main result shows that, in the regime where $t <<N$, the maximum
likelihood estimator (MLE) is both statistically minimax optimal and
efficiently computable. Precisely, for sufficiently large $N$, the MLE
achieves the information theoretic optimal error bound of
$\mathcal{O}(\frac{1}{t})$ for $t < c\log{N}$, with regards to the
earth mover's distance (between the estimated and true distributions).
More generally, in an exponentially large interval of $t$ beyond $c
\log{N}$, the MLE achieves the minimax error bound of
$\mathcal{O}(\frac{1}{\sqrt{t\log N}})$. In contrast, regardless of
how large $N$ is, the naive "plugin" estimator for this problem only
achieves the suboptimal error of $\Theta(\frac{1}{\sqrt{t}})$.
 Gabriel Cadamuro, Ramya Korlakai Vinayak, Joshua Blumenstock, Sham Kakade, Jacob N. Shapiro.
 The Web Conference 2019, San Francisco, CA.

Societalscale data is playing an increasingly prominent role in social science
research; examples from research on geopolitical events include questions on
how emergency events impact the diffusion of information or how new policies
change patterns of social interaction. Such research often draws critical
inferences from observing how an exogenous event changes meaningful metrics
like network degree or network entropy. However, as we show in this work,
standard estimation methodologies make systematically incorrect inferences when
the event also changes the sparsity of the data.
To address this issue, we provide a general framework for inferring changes in
social metrics when dealing with nonstationary sparsity. We propose a plugin
correction that can be applied to any estimator, including several recently
proposed procedures. Using both simulated and real data, we demonstrate that
the correction significantly improves the accuracy of the estimated change
under a variety of plausible data generating processes. In particular, using a
large dataset of calls from Afghanistan, we show that whereas traditional
methods substantially overestimate the impact of a violent event on social
diversity, the plugin correction reveals the true response to be much more
modest.
 Ramya Korlakai Vinayak, Babak Hassibi.
 Neural Information Processing Systems (NIPS) 2016, Barcelona, Spain.

We consider the task of clustering items using answers from nonexpert crowd
workers. In such cases, the workers are often not able to label the items
directly, however, it is reasonable to assume that they can compare items and
judge whether they are similar or not. An important question is what queries to
make, and we compare two types: random edge queries, where a pair of items is
revealed, and random triangles, where a triple is. Since it is far too expensive
to query all possible edges and/or triangles, we need to work with partial
observations subject to a fixed query budget constraint. When a generative model
for the data is available (and we consider a few of these) we determine the cost
of a query by its entropy; when such models do not exist we use the average
response time per query of the workers as a surrogate for the cost. In addition
to theoretical justification, through several simulations and experiments on two
real data sets on Amazon Mechanical Turk, we empirically demonstrate that, for a
fixed budget, triangle queries uniformly outperform edge queries. Even though,
in contrast to edge queries, triangle queries reveal dependent edges, they
provide more reliable edges and, for a fixed budget, many more of them. We also
provide a sufficient condition on the number of observations, edge densities
inside and outside the clusters and the minimum cluster size required for the
exact recovery of the true adjacency matrix via triangle queries using a convex
optimizationbased clustering algorithm.
 Ramya Korlakai Vinayak, Tijana Zrnic, Babak Hassibi.
 42nd International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2017, New Orleans.

We consider the problem of crowdsourced clustering of a set of items based on queries of the similarity of triple of objects. Such an approach, called triangle queries, where it was shown that, for a fixed query budget, it outperforms clustering based on edge queries (i.e, comparing pairs of objects). In [1] the clustering algorithm for triangle and edge queries was identical and each triangle query response was treated as 3 separate edge query responses. In this paper we directly exploit the triangle structure of the responses by embedding them into a 3way tensor. Since there are 5 possible responses to each triangle query, it is a priori not clear how best to embed them into the tensor. We give sufficient conditions on nontrivial embedding such that the resulting tensor has a rank equal to the underlying number of clusters (akin to what happens with the rank of the adjacency matrix). We then use an alternating least squares tensor decomposition algorithm to cluster a noisy and partially observed tensor and show, through extensive numerical simulations, that it significantly outperforms methods that make use only of the adjacency matrix.
 Ramya Korlakai Vinayak, Babak Hassibi.
 International Symposium on Information Theory (ISIT) 2016, Barcelona, Spain.

We study the problem of clustering a set of data points based on their
similarity matrix, each entry of which represents the similarity between the
corresponding pair of points. We propose a convexoptimizationbased algorithm for
clustering using the similarity matrix, which has provable recovery guarantees. It needs no prior knowledge of the number of clusters
and it behaves in a robust way in the presence of outliers and noise. Using a
generative stochastic model for the similarity matrix (which can be thought of
as a generalization of the classical Stochastic Block Model) we obtain \emph{precise bounds} (not orderwise)
on the sizes of the clusters, the number of outliers, the noise variance, separation between the mean similarities inside and outside the clusters
and the values of the regularization parameter that guarantee the exact recovery of the clusters with high probability. The theoretical findings are corroborated
with extensive evidence from simulations.
 Ramya Korlakai Vinayak, Samet Oymak, Babak Hassibi.
 Neural Information Processing Systems (NIPS) 2014, Montreal, Canada.

We consider the problem of finding clusters in graphs which are partially
observed. We analyze two programs based on regularized nuclear norm
minimization, with a goal to recover the low rank part of the adjacency
matrix. For the commonly used Stochastic Block Model, we provide explicit
conditions in terms of the size and sparsity of the clusters, amount of observed
data, and the regularization parameter of the convex program that guarantees
successful recovery of the clusters. We also corroborate our theoretical findings
with extensive simulations, and also run our algorithms on a real data set
obtained from crowdsourcing an image classification task on Amazon Mechanical
Turk.
 Ramya Korlakai Vinayak, Samet Oymak, Babak Hassibi.
 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2014, Florence, Italy.

We analyze a popular approach towards identifying clusters in graphs, that of a
sparse plus lowrank decomposition of the adjacency matrix of a graph via a
simple and intuitive convex program. We sharply characterize the conditions for
the success and failure of this program in identifying clusters. We also present
extensive simulations that corroborate our theoretical findings.
Contact
Ramya Korlakai Vinayak
Postdoctoral Researcher
Paul G. Allen School of Computer Science and Engineering
University of Washington, Seattle
email: ramya[at]cs[dot]washington[dot]edu