research interests

My research interests are in representation theory and integrable systems and applications to random matrices and probability. I have also worked on applications of stochastic processes to network methods in machine learning and axiomatic foundations of cost-sharing methods in economics.

See also my pages at arXiv and Google Scholar Citations.

mathematics research

Principal components in linear mixed models with general bulk (with Z. Fan and Z. Wang), preprint (2018). (pdf) (arXiv)
We study the outlier eigenvalues and eigenvectors in variance components estimates for high-dimensional mixed effects linear models using a free probability approach. We quantify the almost-sure limits of these eigenvalues and their eigenvector alignments, under a general bulk-plus-spikes assumption for the population covariances of the random effects, extending previous results in the identity-plus-spikes setting. Our analysis develops two tools in free probability and random matrix theory which are of independent interest—strong asymptotic freeness of GOE and deterministic matrices, and a method of proving deterministic anisotropic approximations to resolvents using free deterministic equivalents. Statistically, our results quantify bias and aliasing effects for the leading principal components of variance components estimates in modern high-dimensional applications.

Gaussian fluctuations for products of random matrices (with V. Gorin), preprint (2018). (pdf) (arXiv)
We study global fluctuations for singular values of \(M\)-fold products of several right-unitarily invariant \(N \times N\) random matrix ensembles. As \(N \to \infty\), we show the fluctuations of their height functions converge to an explicit Gaussian field, which is log-correlated for \(M\) fixed and has a white noise component for \(M \to \infty\) jointly with \(N\). Our technique centers on the study of the multivariate Bessel generating functions of these spectral measures, for which we prove a central limit theorem for global fluctuations via certain conditions on the generating functions. We apply our approach to a number of ensembles, including rectangular Ginibre matrices, truncated Haar-random unitary matrices, and right-unitarily invariant matrices with fixed singular values, using a detailed asymptotic analysis of multivariate Bessel functions to verify the necessary conditions.

Spiked covariances and principal components analysis in high-dimensional random effects models (with Z. Fan and I. Johnstone), preprint (2018). (arXiv)
We study principal components analyses in multivariate random and mixed effects linear models, assuming a spherical-plus-spikes structure for the covariance matrix of each random effect. We characterize the behavior of outlier sample eigenvalues and eigenvectors of MANOVA variance components estimators in such models under a high-dimensional asymptotic regime. Our results show that an aliasing phenomenon may occur in high dimensions, in which eigenvalues and eigenvectors of the MANOVA estimate for one variance component may be influenced by the other components. We propose an alternative procedure for estimating the true principal eigenvalues and eigenvectors that asymptotically corrects for this aliasing problem.

Affine Macdonald conjectures and special values of Felder-Varchenko functions (with E. Rains and A. Varchenko), Selecta Mathematica N. S. 24 (2018), 1549-1591. (pdf) (arXiv) (journal) (video)
We refine the statement of the denominator and evaluation conjectures for affine Macdonald polynomials proposed by Etingof-Kirillov Jr. and prove the first non-trivial cases of these conjectures. Our results provide a q-deformation of the computation of genus 1 conformal blocks via elliptic Selberg integrals by Felder-Stevens-Varchenko. They allow us to give precise formulations for the affine Macdonald conjectures in the general case which are consistent with computer computations.
Our method applies recent work of the second named author to relate these conjectures in the case of \(U_q(\widehat{\mathfrak{sl}}_2)\) to evaluations of certain theta hypergeometric integrals defined by Felder-Varchenko. We then evaluate the resulting integrals, which may be of independent interest, by well-chosen applications of the elliptic beta integral.

Laguerre and Jacobi analogues of the Warren process (with an appendix by A. Sarantsev), submitted (2016). (pdf) (arXiv) (video 1, video 2)
We define Laguerre and Jacobi analogues of the Warren process. That is, we construct local dynamics on a triangular array of particles so that the projections to each level recover the Laguerre and Jacobi eigenvalue processes of König-O'Connell and Doumerc and the fixed time distributions recover the joint distribution of eigenvalues in multilevel Laguerre and Jacobi random matrix ensembles. Our techniques extend and generalize the framework of intertwining diffusions developed by Pal-Shkolnikov. One consequence is the construction of particle systems with local interactions whose fixed time distribution recovers the hard edge of random matrix theory. An appendix by Andrey Sarantsev establishes strong existence and uniqueness for solutions to SDER's satisfied by these processes.

Traces of intertwiners for quantum affine algebras and difference equations (after Etingof-Schiffmann-Varchenko), Transformation Groups 23 (2018), 1167-1215. (pdf) (arXiv) (journal)
We modify and give complete proofs for the results of Etingof-Schiffmann-Varchenko on traces of intertwiners of untwisted quantum affine algebras in the opposite coproduct and the standard grading. More precisely, we show that certain normalized generalized traces \(F^{V_1, \ldots, V_n}(z_1, \ldots, z_n; \lambda, \omega, \mu, k)\) for \(U_q(\widehat{\mathfrak{g}})\) solve four commuting systems of q-difference equations: the Macdonald-Ruijsenaars, dual Macdonald-Ruijsenaars, q-KZB, and dual q-KZB equations. In addition, we show a symmetry property for these renormalized trace functions. Our modifications are motivated by their appearance in the recent work of the author.

Matrix models for multilevel Heckman-Opdam and multivariate Bessel measures, submitted (2016). (pdf) (arXiv) (video 1, video 2)
We study multilevel matrix ensembles at general beta by identifying them with a class of processes defined via the branching rules for multivariate Bessel and Heckman-Opdam hypergeometric functions. For beta = 1, 2, we express the joint multilevel density of the eigenvalues of a generalized beta-Wishart matrix as a multivariate Bessel ensemble, generalizing a result of Dieker-Warren. In the null case, we prove the conjecture of Borodin-Gorin that the joint multilevel density of the beta-Jacobi ensemble is given by a principally specialized Heckman-Opdam measure.

Traces of intertwiners for quantum affine \(\mathfrak{sl}_2\) and Felder-Varchenko functions, Communications in Mathematical Physics 347 (2016), 573-653. (pdf) (arXiv) (journal)
We show that the traces of \(U_q(\widehat{\mathfrak{sl}}_2)\)-intertwiners of Etingof-Schiffmann-Varchenko valued in the three-dimensional evaluation representation converge in a certain region of parameters and give a representation-theoretic construction of Felder-Varchenko's hypergeometric solutions to the \(q\)-KZB heat equation. This gives the first proof that such a trace function converges and resolves the first case of the Etingof-Varchenko conjecture.

As applications, we prove a symmetry property for traces of intertwiners and prove Felder-Varchenko's conjecture that their elliptic Macdonald polynomials are related to the affine Macdonald polynomials defined as traces over irreducible integrable \(U_q(\widehat{\mathfrak{sl}}_2)\)-modules by Etingof-Kirillov Jr. In the trigonometric and classical limits, we recover results of Etingof-Kirillov Jr. and Etingof-Varchenko. Our method relies on an interplay between the method of coherent states applied to the free field realization of the \(q\)-Wakimoto module of Matsuo, convergence properties given by the theta hypergeometric integrals of Felder-Varchenko, and rationality properties originating from the representation-theoretic definition of the trace function.

The polynomial representation of the type \(A_{n−1}\) rational Cherednik algebra in characteristic \(p\mid n\) (with S. Devadas), Communications in Algebra 45 (2017), 1926-1934. (pdf) (arXiv) (journal)
We study the polynomial representation of the rational Cherednik algebra of type \(A_{n−1}\) with generic parameter in characteristic \(p\) for \(p \mid n\). We give explicit formulas for generators for the maximal proper graded submodule, show that they cut out a complete intersection, and thus compute the Hilbert series of the irreducible quotient. Our methods are motivated by taking characteristic \(p\) analogues of existing characteristic \(0\) results.

A representation-theoretic proof of the branching rule for Macdonald polynomials, Mathematical Research Letters 23 (2016), 887-927. Extended abstract in FPSAC 2015. (pdf) (arXiv) (journal) (proceedings)
We give a new representation-theoretic proof of the branching rule for Macdonald polynomials using the Etingof-Kirillov Jr. expression for Macdonald polynomials as traces of intertwiners of \(U_q(\mathfrak{gl}_n)\). In the Gelfand-Tsetlin basis, we show that diagonal matrix elements of such intertwiners are given by application of Macdonald's operators to a simple kernel. An essential ingredient in the proof is a map between spherical parts of double affine Hecke algebras of different ranks based upon the Dunkl-Kasatani conjecture.

A new integral formula for Heckman-Opdam hypergeometric functions, Advances in Mathematics 289 (2016), 1157-1204. (pdf) (arXiv) (journal)
We provide Harish-Chandra type formulas for the multivariate Bessel functions and Heckman-Opdam hypergeometric functions as representation-valued integrals over dressing orbits. Our expression is the quasi-classical limit of the realization of Macdonald polynomials as traces of intertwiners of quantum groups given by Etingof and Kirillov Jr. Integration over the Liouville tori of the Gelfand-Tsetlin integrable system and adjunction for higher Calogero-Moser Hamiltonians relates our expression to the integral realization over Gelfand-Tsetlin polytopes which appeared in the recent work of Borodin and Gorin on the beta-Jacobi corners ensemble.

Finite dimensional representations of the rational Cherednik algebra for \(G_4\), Journal of Algebra 323 (2010), 2864-2887. (pdf) (arXiv) (journal)
In this paper, we study representations of the rational Cherednik algebra associated to the complex reflection group \(G_4\). In particular, we classify the irreducible finite dimensional representations and compute their characters.

computer science research

D. Kang\(^*\), Y. Sun\(^*\), T. Brown, D. Hendrycks, and J. Steinhardt, Transfer of adversarial robustness between perturbation types (arXiv)
We study the transfer of adversarial robustness of deep neural networks between different perturbation types. While most work on adversarial examples has focused on \(L_\infty\) and \(L_2\)-bounded perturbations, these do not capture all types of perturbations available to an adversary. The present work evaluates 32 attacks of 5 different types against models adversarially trained on a 100-class subset of ImageNet. Our empirical results suggest that evaluating on a wide range of perturbation sizes is necessary to understand whether adversarial robustness transfers between perturbation types. We further demonstrate that robustness against one perturbation type may not always imply and may sometimes hurt robustness against other perturbation types. In light of these results, we recommend evaluation of adversarial defenses take place on a diverse range of perturbation types and sizes.

T. Hashimoto, Y. Sun, and T. Jaakkola, From random walks to distances on unweighted graphs, NIPS 2015. (pdf) (arXiv) (supplement and code) (poster) (proceedings)
Large unweighted directed graphs are commonly used to capture relations between entities. A fundamental problem in the analysis of such networks is to properly define the similarity or dissimilarity between any two vertices. Despite the significance of this problem, statistical characterization of the proposed metrics has been limited.

We introduce and develop a class of techniques for analyzing random walks on graphs using stochastic calculus. Using these techniques we generalize results on the degeneracy of hitting times and analyze a metric based on the Laplace transformed hitting time (LTHT). The metric serves as a natural, provably well-behaved alternative to the expected hitting time. We establish a general correspondence between hitting times of the Brownian motion and analogous hitting times on the graph. We show that the LTHT is consistent with respect to the underlying metric of a geometric graph, preserves clustering tendency, and remains robust against random addition of non-geometric edges. Tests on simulated and real-world data show that the LTHT matches theoretical predictions and outperforms alternatives.

T. Hashimoto, Y. Sun, and T. Jaakkola, Metric recovery from directed unweighted graphs, AISTATS 2015. NIPS Networks Workshop 2014. (pdf) (arXiv) (supplement and code) (workshop) (poster) (proceedings)
We analyze directed, unweighted graphs obtained from \(x_i \in \mathbb{R}^d\) by connecting vertex \(i\) to \(j\) iff \(|x_i−x_j| < \varepsilon(x_i)\). Examples of such graphs include \(k\)-nearest neighbor graphs, where \(\varepsilon(x_i)\) varies from point to point, and, arguably, many real world graphs such as co-purchasing graphs. We ask whether we can recover the underlying Euclidean metric \(\varepsilon(x_i)\) and the associated density \(p(x_i)\) given only the directed graph and \(d\). We show that consistent recovery is possible up to isometric scaling when the vertex degree is at least \(\omega(n^{2/(2+d)}\log(n)^{d/(d+2)})\). Our estimator is based on a careful characterization of a random walk over the directed graph and the associated continuum limit. As an algorithm, it resembles the PageRank centrality metric. We demonstrate empirically that the estimator performs well on simulated examples as well as on real-world co-purchasing graphs even with a small number of points and degree scaling as low as \(\log(n)\).

Y. Sun and M. Sundararajan, Axiomatic attribution for multilinear functions, Electronic Commerce 2011, 177-178. (pdf) (arXiv) (extended abstract) (proceedings)
We study the attribution problem, that is, the problem of attributing a change in the value of a characteristic function to its independent variables. We make three contributions. First, we propose a formalization of the problem based on a standard cost sharing model. Second, we show that there is a unique attribution method that satisfies Dummy, Additivity, Conditional Nonnegativity, Affine Scale Invariance, and Anonymity for all characteristic functions that are the sum of a multilinear function and an additive function. We term this the Aumann-Shapley-Shubik method. Conversely, we show that such a uniqueness result does not hold for characteristic functions outside this class. Third, we study multilinear characteristic functions in detail; we describe a computationally efficient implementation of the Aumann-Shapley-Shubik method and discuss practical applications to pay-per-click advertising and portfolio analysis.


2019: UCSD, Virginia, OpenAI, ICML Workshop on Uncertainty and Robustness in Deep Learning

2018: Simons Society of Fellows Retreat, Yale

2017: ESI, Columbia-Princeton Probability Day, Rutgers, Perimeter Institute (video), Rochester, PCMI (video 1, video 2)

2016: HCM, MIT (x2), IESC, Columbia (x2)

2015: ICERM, AISTATS, CMI, FPSAC, Yale, Columbia, Northeastern, ETH Zurich (x2), UC Berkeley, NIPS

2014: MIT, IHP, UC Berkeley, NIPS Workshop on Networks