Pages found with the tag "benchmark": 14,151 in total

DWD Benchmark (Robust Object Detection) | Papers With Code

The current state-of-the-art on DWD is OA-DG. See a full comparison of 8 papers with code.

EgoSchema (subset) Benchmark (Zero-Shot Video Question Answer) | Papers With Code

The current state-of-the-art on EgoSchema (subset) is LangRepo (12B). See a full comparison of 5 papers with code.

EgoSchema (fullset) Benchmark (Zero-Shot Video Question Answer) | Papers With Code

The current state-of-the-art on EgoSchema (fullset) is LLoVi (GPT-3.5). See a full comparison of 11 papers with code.

TVQA Benchmark (Zero-Shot Video Question Answer) | Papers With Code

The current state-of-the-art on TVQA is FrozenBiLM (with speech). See a full comparison of 6 papers with code.

NExT-QA Benchmark (Zero-Shot Video Question Answer) | Papers With Code

The current state-of-the-art on NExT-QA is VideoAgent (GPT-4). See a full comparison of 13 papers with code.

STAR Benchmark (Zero-Shot Video Question Answer) | Papers With Code

The current state-of-the-art on STAR Benchmark is VideoChat2. See a full comparison of 6 papers with code.

MSRVTT-QA Dataset | Papers With Code

The MSR-VTT-QA dataset is a benchmark for Visual Question Answering (VQA) on the MSR-VTT (Microsoft Research Video to Text) dataset: models are evaluated on their ability to answer questions about these videos. The dataset is also used for Video Retrieval, Video Captioning, Zero-Shot Video Question Answering, Zero-Shot Video Retrieval, and Text-to-Video Generation.
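
A minimal sketch of how such a benchmark is typically scored, assuming MSR-VTT-QA-style open-ended answers (single words or short phrases) and a hypothetical JSON layout mapping question ids to answer strings; the file names and format are illustrative, not the official ones:

    import json

    def exact_match_accuracy(pred_path: str, gt_path: str) -> float:
        # Both files are assumed to map question id -> answer string,
        # e.g. {"q1": "cooking", ...}; this layout is hypothetical.
        with open(pred_path) as f:
            preds = json.load(f)
        with open(gt_path) as f:
            gold = json.load(f)
        # Case-insensitive exact match: the usual top-1 accuracy style
        # for open-ended video question answering.
        hits = sum(
            preds.get(qid, "").strip().lower() == ans.strip().lower()
            for qid, ans in gold.items()
        )
        return hits / len(gold)

    # Usage: print(exact_match_accuracy("preds.json", "msrvtt_qa_test.json"))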

ConceptARC Dataset | Papers With Code

The ConceptARC dataset is a benchmark for evaluating understanding and generalization in the Abstraction and Reasoning Corpus (ARC) domain. It was developed by Arseny Moskvichev, Victor Vikram Odouard, and Melanie Mitchell. The ability to form and abstract concepts is key to human intelligence, but such abilities remain lacking in state-of-the-art AI systems. There has been substantial research on conceptual abstraction in AI, particularly using idealized domains such as Raven's Progressive Matrices and Bongard problems.
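
To make the task format concrete: ARC-style tasks pair small input and output grids of color indices, and a prediction counts only on an exact, cell-for-cell match. A minimal sketch with a made-up task and a placeholder solver (both hypothetical, not from the dataset):

    from typing import Callable, List, Tuple

    Grid = List[List[int]]  # ARC grids: 2-D arrays of color indices 0-9
    Pair = Tuple[Grid, Grid]

    def score_task(test_pairs: List[Pair], solver: Callable[[Grid], Grid]) -> float:
        # ARC-style scoring: credit only for exact grid equality.
        correct = sum(solver(inp) == out for inp, out in test_pairs)
        return correct / len(test_pairs)

    def identity(grid: Grid) -> Grid:
        # Placeholder "solver" for illustration only.
        return grid

    # Tiny made-up task whose rule is "copy the input unchanged".
    test = [([[2, 2], [0, 0]], [[2, 2], [0, 0]])]
    print(score_task(test, identity))  # 1.0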

MARS (Multimodal Analogical Reasoning dataSet) Benchmark (Knowledge Graphs) | Papers With Code

The current state-of-the-art on MARS (Multimodal Analogical Reasoning dataSet) is MarT_MKGformer. See a full comparison of 8 papers with code.

TLDR9+ Benchmark (Extreme Summarization) | Papers With Code

The current state-of-the-art on TLDR9+ is ORACLE-EXT. See a full comparison of 4 papers with code.

MathVista Dataset | Papers With Code

MathVista is a consolidated mathematical reasoning benchmark within visual contexts. It consists of three newly created datasets, IQTest, FunctionQA, and PaperQA, which address the missing visual domains and are tailored to evaluate logical reasoning on puzzle test figures, algebraic reasoning over functional plots, and scientific reasoning with academic paper figures, respectively. It also incorporates 9 MathQA datasets and 19 VQA datasets from the literature, which significantly enrich the diversity and complexity of the benchmark.
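
Because MathVista aggregates many source datasets, results are usually reported per source as well as overall. A minimal sketch of that bookkeeping, assuming a hypothetical record layout with 'source', 'prediction', and 'answer' fields (not the official release format):

    from collections import defaultdict

    def per_source_accuracy(records):
        # records: iterable of dicts with assumed keys
        # 'source' (e.g. 'IQTest'), 'prediction', and 'answer'.
        hit, total = defaultdict(int), defaultdict(int)
        for r in records:
            total[r["source"]] += 1
            hit[r["source"]] += r["prediction"] == r["answer"]
        return {s: hit[s] / total[s] for s in total}

    records = [
        {"source": "IQTest", "prediction": "B", "answer": "B"},
        {"source": "FunctionQA", "prediction": "3", "answer": "5"},
    ]
    print(per_source_accuracy(records))  # {'IQTest': 1.0, 'FunctionQA': 0.0}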

VeRi-Wild Small Benchmark (Vehicle Re-Identification) | Papers With Code

The current state-of-the-art on VeRi-Wild Small is MBR-4B (without RK). See a full comparison of 3 papers with code.

VeRi-Wild Large Benchmark (Vehicle Re-Identification) | Papers With Code

The current state-of-the-art on VeRi-Wild Large is ANet. See a full comparison of 2 papers with code.

EgoSchema Dataset | Papers With Code

EgoSchema is a very long-form video question-answering dataset and benchmark for evaluating the long-video understanding capabilities of modern vision-and-language systems. Derived from Ego4D, EgoSchema consists of over 5,000 human-curated multiple-choice question-answer pairs spanning more than 250 hours of real video data, covering a very broad range of natural human activity and behavior.
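
Since EgoSchema is multiple-choice (five candidate answers per question), evaluation reduces to plain accuracy over the selected option indices. A minimal sketch, with the id-to-index mapping assumed for illustration rather than taken from the official release:

    def multiple_choice_accuracy(predictions: dict, answers: dict) -> float:
        # predictions / answers: question id -> chosen option index (0-4).
        # Unanswered questions count as wrong.
        correct = sum(predictions.get(q) == a for q, a in answers.items())
        return correct / len(answers)

    preds = {"q0001": 2, "q0002": 4}
    gold = {"q0001": 2, "q0002": 0}
    print(multiple_choice_accuracy(preds, gold))  # 0.5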

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | Papers With Code

SOTA for Zero-Shot Video Question Answer on STAR Benchmark (Accuracy metric)

UnAV-100 Benchmark (Audio-Visual Event Localization) | Papers With Code

The current state-of-the-art on UnAV-100 is UnAV. See a full comparison of 2 papers with code.

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline | Papers With Code

SOTA for Audio-Visual Event Localization on UnAV-100 (mAP metric)

AlgoPuzzleVQA Benchmark (Multimodal Reasoning) | Papers With Code

The current state-of-the-art on AlgoPuzzleVQA is GPT-4. See a full comparison of 1 paper with code.

MATH-V Benchmark (Multimodal Reasoning) | Papers With Code

The current state-of-the-art on MATH-V is GPT-4V. See a full comparison of 4 papers with code.

REBUS: A Robust Evaluation Benchmark of Understanding Symbols | Papers With Code

SOTA for Multimodal Reasoning on REBUS (Accuracy metric)

REBUS Benchmark (Multimodal Reasoning) | Papers With Code

The current state-of-the-art on REBUS is GPT-4V. See a full comparison of 8 papers with code.

VTAB-1k(Structured<8>) Benchmark (Visual Prompt Tuning) | Papers With Code

The current state-of-the-art on VTAB-1k(Structured<8>) is SPT-Deep(ViT-B/16_MAE_pretrained_ImageNet-1K). See a full comparison of 10 papers with code.

VTAB-1k(Specialized<4>) Benchmark (Visual Prompt Tuning) | Papers With Code

The current state-of-the-art on VTAB-1k(Specialized<4>) is SPT-Deep(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K). See a full comparison of 10 papers with code.

RLBench Benchmark (Robot Manipulation) | Papers With Code

The current state-of-the-art on RLBench is 3D Diffuser Actor. See a full comparison of 11 papers with code.