We investigate compositional structures in data embeddings from pre-trained vision-language models (VLMs). Traditionally, compositionality has been associated with algebraic operations on embeddings of words from a preexisting vocabulary. In contrast, we seek to approximate representations from an…
Predicting the click-through rate (CTR) of an item is a fundamental task in online advertising and recommender systems. CTR prediction models are typically trained on user click data from traffic logs. However, users are more likely to interact with items that were shown prominently on a website.…
In industrial settings, it is often necessary to achieve language-level accuracy targets. For example, Amazon business teams need to build multilingual product classifiers that operate accurately in all European languages. It is unacceptable for the accuracy of product classification to meet the…
Large Language Models (LLM) achieved considerable results on natural language understanding tasks. However, their sheer size causes a large memory consumption or high latency at inference time, which renders deployment on hardware-constrained applications challenging. Neural architecture search…
We introduce a novel class of sample-based explanations we term high-dimensional representers, that can be used to explain the predictions of a regularized high-dimensional model in terms of importance weights for each of the training samples. Our workhorse is a novel representer theorem for…
Finding that 70% of attention heads and 20% of feed-forward networks can be excised with minimal effect on in-context learning suggests that large language models are undertrained.
Recent studies have found that model performance has a smooth power-law relationship, or scaling laws, with training data and model size, for a wide range of problems. These scaling laws allow one to choose nearly optimal data and model sizes. We study whether this scaling property is also…
Large language models trained on code have shown great potential to increase productivity of software developers. Several execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems. Nevertheless, it is expensive to perform…
Recent work in scientific machine learning (SciML) has focused on incorporating partial differential equation (PDE) information into the learning process. Much of this work has focused on relatively “easy” PDE operators (e.g., elliptic and parabolic), with less emphasis on relatively “hard” PDE…
Despite their high levels of robustness, Contrastive Language-Image Models (CLIP) still require some form of downstream adaptation when applied to tasks sufficiently out-of-domain with respect to their training set. Recent methods propose light-weight adapters on the model features, primarily…
GNN models on heterogeneous graphs have achieved state-of-the-art (SOTA) performance in various graph tasks such as link prediction and node classification. Despite their success in providing SOTA results, popular GNN libraries, such as PyG and DGL, fail to provide fast and efficient solutions for…
We present HumanEvalX and MBXP, execution-based code completion benchmarks in 10+ programming languages. These datasets are generated by our conversion framework that transpiles prompts and test cases from original datasets (HumanEval and MBPP) to the corresponding data in a target language. Based…
On natural-language-understanding tasks, student models trained only on task-specific data outperform those trained on a mix that includes generic data.
To improve deep learning models’ robustness, adversarial training has been frequently used in computer vision with satisfying results. However, adversarial perturbation on text have turned out to be more challenging due to the discrete nature of text. The generated adversarial text might not sound…
Large pretrained language models (PLMs) are often domain- or task-adapted via finetuning or prompting. Finetuning requires modifying all of the parameters and having enough data to avoid overfitting while prompting requires no training and few examples but limits performance. Instead, we prepare…
Learn about the development, operational, and process improvements that can be incorporated by organizations to improve the explainability of models while adhering to regulatory requirements.
The fundamental challenge of drawing causal inference is that counterfactual outcomes are not fully observed for any unit. Furthermore, in observational studies, treatment assignment is likely to be confounded. Many statistical methods have emerged for causal inference under unconfoundedness…
Increasing concerns and regulations about data privacy and sparsity necessitate the study of privacy-preserving, decentralized learning methods for natural language processing (NLP) tasks. Federated learning (FL) provides promising approaches for a large number of clients (e.g., personal devices or…
The development of both the theory and the practice of neural network models has been significant in the last decade. Novel solutions including Dropout and Batch Normalization (BN) have been proposed and pervasively adopted to overcome issues like over-fitting and to improve the convergence of the…
Despite the increasing relevance of forecasting methods, causal implications of these algorithms remain largely unexplored. This is concerning considering that, even under simplifying assumptions such as causal sufficiency, the statistical risk of a model can differ significantly from its causal…