Impactful applications such as materials discovery, hardware design, neural architecture search, or portfolio optimization require optimizing high-dimensional black-box functions with mixed and combinatorial input spaces. While Bayesian optimization has recently made significant progress in solving…
Personalization, the ability to tailor a system to individual users, is an essential factor in user experience with natural language process- ing (NLP) systems. With the emergence of Large Language Models (LLMs), a key question is how to leverage these models to better personalize user experiences.…
To translate speech for automatic dubbing, machine translation needs to be isochronous, i.e. translated speech needs to be aligned with the source in terms of speech durations. We introduce target factors in a transformer model to predict durations jointly with target language phoneme sequences. We…
In e-commerce, product classification is widely used for various purposes. Misclassifying products can cause compliance issues and hurt the company’s reputation. To address this problem, we propose an automated system to proactively detect product misclassifications by overcoming several…
Machine learning systems at the edge may fail as the real world data can be noisy and have different distribution from the training dataset which the machine learning systems were developed on. However, it is very difficult to detect the system failures and identify root cause of the failures for…
Previous research has studied the task of segmenting cinematic videos into scenes and into narrative acts. However, these studies have overlooked the essential task of multimodal alignment and fusion for effectively and efficiently processing long-form videos (> 60min). In this paper, we introduce…
Answer sentence selection (AS2) in open-domain question answering finds answer for a question by ranking candidate sentences extracted from web documents. Recent work exploits answer context, i.e., sentences around a candidate, by incorporating them as additional input string to the Transformer…
People navigate a world that involves many different modalities and make decision on what they observe. Many of the classification problems that we face in the modern digital world are also multimodal in nature, where textual information on the web rarely occurs alone, and is often accompanied by…
Rewards-based programs are popular within e-commerce online stores, with the goal of providing serendipitous incentives to delight customers. These rewards (or incentives) could be in the form of cashback, free-shipping or discount coupons on purchases within specific categories. The success of…
This paper describes the speech translation system submitted as part of the IWSLT 2023 shared task on low resource speech translation. The low resource task aids in building models for language pairs where the training corpus is limited. In this paper, we focus on two language pairs, namely,…
We evaluated the effectiveness of machine learning (ML) and natural language processing (NLP) for automatically scoring a simulation requiring audio-based constructed responses. We administered the simulation to 3,174 recent new professional-level hires working in a large multinational technology…
Getting a good understanding of the user intent is vital for e-commerce applications to surface the right product to a given customer query. Query Understanding (QU) systems are essential for this purpose, and many e-commerce providers are working on complex solutions that need to be data efficient…
This work studies the threats of adversarial attack on multivariate probabilistic forecasting models and viable defense mechanisms. Our studies discover a new attack pattern that negatively impact the forecasting of a target time series via making strategic, sparse (imperceptible) modifications to…
Two recent trends in the theory of deep learning are examinations of the double-descent phenomenon and more-realistic approaches to neural kernel methods.
In this paper, we study how to use masked signal modeling in vision and language (V+L) representation learning. Instead of developing masked language modeling (MLM) and masked image modeling (MIM) independently, we propose to build joint masked vision and language modeling, where the masked signal…
In this study, we present an approach to train a single speech enhancement network that can perform both personalized and nonpersonalized speech enhancement. This is achieved by incorporating a frame-wise conditioning input that specifies the type of enhancement output. To improve the quality of…
Seeking causal explanations in panel (or longitudinal/multivariate time-series) data is a difficult problem of both academic and industrial importance. Although there exists a large amount of literature on forward causal inference, where the treatment/outcome/covariates variables are well-defined,…
Machine learning models used for predictive modeling tasks spanning across personalization, recommender systems, ad response prediction, fraud detection etc. typically require a variety of tabular as well as sequential activity features about the user. For tasks like click-through or conversion…