• Tingran Gao, Shahab Asoodeh, Yi Huang, James Evens, Wasserstein Soft Label Propagation on Hypergraphs: Algorithm and Generalization Error Bounds, (Supplemental Material), AAAI-AI 2019

    • Abstract: Inspired by recent interests in developing machine learning and data mining algorithms for hypergraphs, here we investigate the semi-supervised learning algorithm of propagating “soft labels” (e.g. probability distributions, class membership scores) over hypergraphs, by means of optimal transportation. Borrowing insights from Wasserstein propagation on graphs [Solomon et al. 2014], we reformulate the label propagation procedure as a message-passing algorithm, which renders itself naturally to a generalization applicable to hypergraphs through Wasserstein barycenters. Furthermore, in a PAC learning framework, we provide generalization error bounds for propagating 1-dimensional distributions on graphs and hypergraphs using 2-Wasserstein distance, by establishing the algorithmic stability of the proposed semi-supervised learning algorithm. These theoretical results also offer novel insight and deeper understanding about Wasserstein propagation on graphs.

    • Figure (a) and (b) compare the classification performance of label propagation v.s. those of AdaBoost, Random forest, and SVM on hypergraphs generated with the stochastic block model over 100 vertices. Figure (c) and (d) compare the classification performance of label propagation v.s. that of SVM on two UCI datastes.

Label Propagation Classification 
  • Shahab Asoodeh, Yi Huang, and Ishanu Chattopadhyay, On semi-universal tamper-free communication over deletion channels, CDC 2018

    • Abstract: We investigate the problem of reliable communication between two legitimate parties over deletion channels under an active eavesdropping (aka jamming) adversarial model. To this goal, we develop a theoretical framework based on probabilistic finite- state automata to define novel encoding and decoding schemes that ensure small error probability in both message decoding as well as tamper detecting. We then experimentally verify the reliability and tamper-detection property of our scheme.

    • The four images on left show how different deletion rates of the channel affect the space of M2 machines–the larger the deletion rate, the closer the resulting machine to being a single-state machine. The four images on the right show sets of 10 and 20 machines chosen by a hill-climbing algorithm to represent the messages, how they are affected by deletion, and how the error rate of recoving the message decreases with increasing sequence lengths.

Temper-free with PFSA 
  • Yi Huang, Mano Vikash Janardhanan, and Lev Reyzin, Network Construction with Ordered Constraints, FSTTCS 2017

    • Abstract: In this paper, we study the problem of constructing a network by observing ordered connectivity constraints, which we define herein. These ordered constraints are made to capture realistic properties of real-world problems that are not reflected in previous, more general models. We give hardness of approximation results and nearly-matching upper bounds for the offline problem, and we study the online problem in both general graphs and restricted sub-classes. In the online problem, for general graphs, we give exponentially better upper bounds than exist for algorithms for general connectivity problems. For the restricted classes of stars and paths we are able to find algorithms with optimal competitive ratios, the latter of which involve analysis using a potential function defined over PQ-trees.

    • The following figures show patterns and their replacements of the PQ-trees that show up in the analysis of the competitive ratio of path.

Network Construction with Ordered Constraints 
  • Benjamin Fish, Yi Huang, and Lev Reyzin, Recovering Social Networks by Observing Votes, AAMAS 2016

    • Abstract: We investigate how to reconstruct social networks from voting data. In particular, given a voting model that considers social network structure, we aim to find the network that best explains the agents’ votes. We study two plausible voting models, one edge-centric and the other vertex-centric. For these models, we give algorithms and lower bounds, characterizing cases where network recovery is possible and where it is computationally difficult. We also test our algorithms on United States Senate data. Despite the similarity of the two models, we show that their respective network recovery problems differ in complexity and involve distinct algorithmic challenges. Moreover, the networks produced when working under these models can also differ significantly. These results indicate that great care should be exercised when choosing a voting model for network recovery tasks.

    • The three images on the left are the networks we recovered using the edge-centric model on three Senates, and the three images on the right are those using the vertex-centric model on the same three Senates.

Voting Network 
  • Yi Huang, Brian Powers, and Lev Reyzin, Training-Time Optimization of a Budgeted Booster, IJCAI2015

    • Abstract: We consider the problem of feature-efficient prediction – a setting where features have costs and the learner is limited by a budget constraint on the total cost of the features it can examine in test time. We focus on solving this problem with boosting by optimizing the choice of base learners in the training phase and stopping the boosting process when the learner’s budget runs out. We experimentally show that our method improves upon the boosting approach AdaBoostRS [Reyzin, 2011] and in many cases also outperforms the recent algorithm SpeedBoost [Grubb and Bagnell, 2012]. We provide a theoretical justication for our optimization method via the margin bound. We also experimentally show that our method outperforms pruned decision trees, a natural budgeted classifier.

    • The image on the right shows the comparison of error rates v.s. costs of methods we metion in this paper on three datasets from the UCI machine learning repository, and the image on the right is that for Yahoo Webscop set 2 where the cost is real CPU time.

Budgeted Booster 


  • M.S. degree thesis: Study on 3-Dimensional Parallelizable Finite Difference Time-Dependent Scheme for Maxwell's Equations, available upon request