Referring Video Object Segmentation

ReVOS - J

Referring Video Object Segmentation

ReVOS - F

Referring Video Object Segmentation

ReVOS - J&F

Referring Video Object Segmentation

ReVOS - R

Referring Video Object Segmentation

MeViS - J&F

Referring Video Object Segmentation

MeViS - J

Referring Video Object Segmentation

MeViS - F

Referring Expression Segmentation

Refer-YouTube-VOS (2021 public validation) - J&F

Referring Expression Segmentation

Refer-YouTube-VOS (2021 public validation) - J

Referring Expression Segmentation

Refer-YouTube-VOS (2021 public validation) - F

Referring Expression Segmentation

RefCOCO - IoU

Multi-Person Pose Estimation

PoseTrack2018 - Mean mAP

RGB Salient Object Detection

ECSSD - mean F-Measure

RGB Salient Object Detection

HKU-IS - mean F-Measure

RGB Salient Object Detection

PASCAL-S - mean F-Measure

Anomaly Detection In Surveillance Videos

XD-Violence - AP

Unsupervised Action Segmentation

Youtube INRIA Instructional - Recall

Unsupervised Action Segmentation

IKEA ASM - Accuracy

Unsupervised Action Segmentation

IKEA ASM - F1

Unsupervised Action Segmentation

IKEA ASM - Recall

Time Series Forecasting

Weather (336) - MSE

Time Series Forecasting

Weather (336) - MAE

Time Series Forecasting

ETTm1 (96) Multivariate - MSE

Time Series Forecasting

ETTm1 (192) Multivariate - MSE

Time Series Forecasting

ETTm2 (192) Multivariate - MAE

Time Series Forecasting

ETTm2 (96) Multivariate - MSE

Time Series Forecasting

ETTm2 (96) Multivariate - MAE

Time Series Forecasting

ETTh2 (192) Multivariate - MAE

Time Series Forecasting

ETTh2 (96) Multivariate - MAE

Time Series Forecasting

ETTm2 (336) Multivariate - MAE

Time Series Forecasting

ETTh2 (336) Multivariate - MSE

Time Series Forecasting

ETTh2 (336) Multivariate - MAE

Time Series Forecasting

ETTm2 (720) Multivariate - MSE

Time Series Forecasting

ETTm2 (720) Multivariate - MAE

Skeleton Based Action Recognition

JHMDB (2D poses only) - Average accuracy of 3 splits

Image Rescaling

DIV2K val-q50-4x - PSNR

Image Rescaling

DIV2K val-q50-4x - SSIM

Image Rescaling

DIV2K val-q70-2x - PSNR

Image Rescaling

DIV2K val-q70-2x - SSIM

Image Rescaling

DIV2K val-q30-2x - PSNR

Image Rescaling

DIV2K val-q30-2x - SSIM

Image Rescaling

DIV2K val-q90-2x - PSNR

Image Rescaling

DIV2K val-q90-2x - SSIM

Image Rescaling

DIV2K val-q30-4x - PSNR

Image Rescaling

DIV2K val-q30-4x - SSIM

Image Rescaling

DIV2K val-q50-2x - PSNR

Image Rescaling

DIV2K val-q50-2x - SSIM

Image Rescaling

DIV2K val-q70-4x - PSNR

Image Rescaling

DIV2K val-q70-4x - SSIM

Image Rescaling

DIV2K val-q90-4x - PSNR

Image Rescaling

DIV2K val-q90-4x - SSIM

3D Hand Pose Estimation

HO-3D v3 - F@15mm

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (Ego4D) All

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (VISOR) Visible

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (Ego4D) Visible

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (NewDays) Occ

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (VISOR) Occ

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (Ego4D) Occ

CAD Reconstruction

DeepCAD - Chamfer Distance

CAD Reconstruction

DeepCAD - Chamfer Distance (median)

CAD Reconstruction

DeepCAD - IoU

CAD Reconstruction

Fusion 360 Gallery - Chamfer Distance

CAD Reconstruction

Fusion 360 Gallery - Chamfer Distance (median)

CAD Reconstruction

Fusion 360 Gallery - IoU

CAD Reconstruction

CC3D - Chamfer Distance (median)

Multi-modal Recommendation

Amazon Baby - NDCG@20

Multi-modal Recommendation

Amazon Clothing - NDCG@20

Multi-modal Recommendation

Amazon Sports - NDCG@20

Learning with noisy labels

CIFAR-10N-Random3 - Accuracy (mean)

Learning with noisy labels

CIFAR-10N-Random2 - Accuracy (mean)

Video Retrieval

VATEX - text-to-video R@1

Video Retrieval

VATEX - text-to-video R@10

Video Retrieval

VATEX - video-to-text R@10

Video Object Tracking

NT-VOT211 - AUC

Visual Object Tracking

TrackingNet - Precision

Visual Object Tracking

TrackingNet - Normalized Precision

Visual Object Tracking

TrackingNet - Accuracy

Visual Object Tracking

TNL2K - AUC

Video Inpainting

HQVI (240p) - PSNR

Video Inpainting

HQVI (240p) - SSIM

Video Inpainting

HQVI (240p) - LPIPS

Video Inpainting

HQVI (240p) - VFID

Video Inpainting

HQVI (480p) - PSNR

Video Inpainting

HQVI (480p) - LPIPS

Video Inpainting

HQVI (480p) - VFID

Text-based Image Editing

PIE-Bench - CLIPSIM

Action Classification

WiGesture - Accuracy (%)

Zero-Shot Video Question Answer

MSRVTT-QA - Confidence Score

Zero-Shot Video Question Answer

MSVD-QA - Confidence Score

Zero-Shot Video Question Answer

EgoSchema (fullset) - Accuracy

3D Hand Pose Estimation

FreiHAND - PA-MPJPE

Image Super-Resolution

DIV2K val - 4x upscaling - LPIPS

Image Super-Resolution

DIV2K val - 4x upscaling - DISTS

Image Super-Resolution

Urban100 - 4x upscaling - LPIPS

Motion Synthesis

HumanML3D - FID

Omnidirectional Stereo Depth Estimation

Helvipad - Depth-MAE

Image Super-Resolution

Manga109 - 3x upscaling - PSNR

Image Super-Resolution

Manga109 - 3x upscaling - SSIM

Image Super-Resolution

Set14 - 2x upscaling - SSIM

Image Super-Resolution

Set14 - 3x upscaling - PSNR

Image Super-Resolution

Set14 - 3x upscaling - SSIM

Image Super-Resolution

Set5 - 2x upscaling - SSIM

Image Super-Resolution

Manga109 - 4x upscaling - SSIM

Image Super-Resolution

Manga109 - 2x upscaling - PSNR

Image Super-Resolution

Manga109 - 2x upscaling - SSIM

Image Super-Resolution

Set5 - 4x upscaling - SSIM

Image Super-Resolution

Set5 - 3x upscaling - SSIM

Image Super-Resolution

Urban100 - 4x upscaling - PSNR

Image Super-Resolution

Urban100 - 3x upscaling - PSNR

Image Super-Resolution

BSD100 - 3x upscaling - PSNR

Image Super-Resolution

BSD100 - 3x upscaling - SSIM

JPEG Artifact Correction

LIVE1 (Quality 20 Color) - PSNR

JPEG Artifact Correction

Classic5 (Quality 20 Grayscale) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 30 Grayscale) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 10 Grayscale) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 20 Grayscale) - PSNR

JPEG Artifact Correction

Classic5 (Quality 30 Grayscale) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 40 Grayscale) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 40 Grayscale) - SSIM

JPEG Artifact Correction

Classic5 (Quality 10 Grayscale) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 30 Color) - PSNR

JPEG Artifact Correction

Classic5 (Quality 40 Grayscale) - PSNR

JPEG Artifact Correction

Classic5 (Quality 40 Grayscale) - SSIM

JPEG Artifact Correction

LIVE1 (Quality 10 Color) - PSNR

5-Degradation Blind All-in-One Image Restoration

5-Degradation Blind All-in-One Image Restoration - Average PSNR

Grayscale Image Denoising

Urban100 sigma50 - PSNR

Grayscale Image Denoising

Urban100 sigma25 - PSNR

Grayscale Image Denoising

Set12 sigma15 - PSNR

Semi-Supervised Video Object Segmentation

VOT2020 - EAO

Visual Object Tracking

DiDi - Tracking quality

Visual Object Tracking

VOT2022 - EAO

Referring Expression Segmentation

RefCOCO+ testA - Overall IoU

Referring Expression Segmentation

RefCOCO testA - Overall IoU

Referring Expression Segmentation

RefCOCO testB - Overall IoU

Referring Expression Segmentation

RefCOCO val - Overall IoU

Weakly Supervised Action Localization

ActivityNet-1.2 - Mean mAP

Zero-Shot Composed Image Retrieval (ZS-CIR)

CIRR - R@5

Video Salient Object Detection

FBMS-59 - S-Measure

Video Salient Object Detection

FBMS-59 - max F-Measure

Video Salient Object Detection

DAVIS-2016 - S-Measure

Video Salient Object Detection

DAVIS-2016 - Average MAE

Video Salient Object Detection

DAVIS-2016 - max F-Measure

Video Salient Object Detection

ViSal - S-Measure

Video Salient Object Detection

ViSal - Average MAE

Video Salient Object Detection

DAVSOD-easy35 - S-Measure

Video Salient Object Detection

DAVSOD-easy35 - Average MAE

Video Salient Object Detection

DAVSOD-easy35 - max F-Measure

Shadow Removal

ISTD+ - RMSE

Action Recognition

Diving-48 - Accuracy

Zero-Shot Video Question Answer

Video-MME (w/o subs) - Accuracy (%)

Image-Based Localization

cvact - Recall@1

Image-Based Localization

cvact - Recall@5

Image-Based Localization

cvact - Recall@10

Image-Based Localization

VIGOR Cross Area - Recall@1

Image-Based Localization

VIGOR Cross Area - Recall@5

Image-Based Localization

VIGOR Cross Area - Recall@10

Image-Based Localization

VIGOR Cross Area - Recall@1%

Image-Based Localization

VIGOR Cross Area - Hit Rate

Image-Based Localization

VIGOR Same Area - Recall@1

Image-Based Localization

VIGOR Same Area - Recall@5

Image-Based Localization

VIGOR Same Area - Recall@10

Image-Based Localization

VIGOR Same Area - Hit Rate

Drone-view target localization

University-1652 - Recall@1

Visual Object Tracking

LaSOT-ext - AUC

Visual Object Tracking

LaSOT-ext - Normalized Precision

Visual Object Tracking

LaSOT-ext - Precision

Visual Object Tracking

NeedForSpeed - AUC

Unsupervised Semantic Segmentation with Language-image Pre-training

Cityscapes val - mIoU

Unsupervised Semantic Segmentation with Language-image Pre-training

COCO-Stuff-171 - mIoU

Unsupervised Semantic Segmentation with Language-image Pre-training

PASCAL Context-59 - mIoU

Unsupervised Semantic Segmentation with Language-image Pre-training

PASCAL VOC - mIoU

Unsupervised Semantic Segmentation with Language-image Pre-training

COCO-Object - mIoU

Unsupervised Semantic Segmentation with Language-image Pre-training

PascalVOC-20 - mIoU

Text-To-SQL

Spider - Execution Accuracy (Test)

Text-To-SQL

BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) - Execution Accuracy % (Test)

Text-To-SQL

BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) - Execution Accuracy % (Dev)

Multiple Object Tracking

SportsMOT - HOTA

Multiple Object Tracking

SportsMOT - IDF1

Multiple Object Tracking

SportsMOT - AssA

Category-Agnostic Pose Estimation

MP100 - Mean [email protected] - 1shot

Video-based Generative Performance Benchmarking (Consistency)

VideoInstruct - gpt-score

Cross-Modal Retrieval

ChEBI-20 - Test MRR

Cross-Modal Retrieval

ChEBI-20 - Hits@1

3D Instance Segmentation

ScanNet(v2) - mRec

Cross-Modal Retrieval

ChEBI-20 - Mean Rank

Generalizable Person Re-identification

Market-1501 - MSMT17-All->mAP

Generalizable Person Re-identification

Market-1501 - MSMT17-All->Rank-1

Generalizable Person Re-identification

Market-1501 - RandPerson->mAP

Generalizable Person Re-identification

Market-1501 - RandPerson->Rank-1

Generalizable Person Re-identification

Market-1501 - MSMT17->mAP

Generalizable Person Re-identification

Market-1501 - MSMT17->Rank-1

Generalizable Person Re-identification

CUHK03-NP (detected) - MSMT17-All->mAP

Generalizable Person Re-identification

CUHK03-NP (detected) - MSMT17-All->Rank-1

Generalizable Person Re-identification

CUHK03-NP (detected) - MSMT17->mAP

Generalizable Person Re-identification

CUHK03-NP (detected) - MSMT17->Rank-1

FS-MEVQA

SME - BLEU-4

FS-MEVQA

SME - METEOR

FS-MEVQA

SME - ROUGE-L

FS-MEVQA

SME - CIDEr

FS-MEVQA

SME - SPICE

FS-MEVQA

SME - Detection

FS-MEVQA

SME - ACC

Zero-Shot Composed Image Retrieval (ZS-CIR)

CIRCO - mAP@10

Motion Synthesis

FineDance - fid_k

Motion Synthesis

FineDance - BAS

Zero-Shot Video Question Answer

Video-MME - Accuracy (%)

Time Series Forecasting

ETTm1 (336) Multivariate - MSE

Time Series Forecasting

ETTm1 (720) Multivariate - MSE

Time Series Forecasting

ETTm1 (720) Multivariate - MAE

Time Series Forecasting

ETTm2 (336) Multivariate - MSE

Conditional Image Generation

ImageNet 256x256 - FID

Video Frame Interpolation

X4K1000FPS - PSNR

Rgb-T Tracking

RGBT210 - Success

Text-based Image Editing

PIE-Bench - Background LPIPS

Semi-Supervised Semantic Segmentation

ADE20K 1/32 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

COCO 1/256 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 92 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

COCO 1/64 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 183 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

ADE20K 1/16 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 1464 labels - Validation mIoU

Semi-Supervised Semantic Segmentation

COCO 1/128 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

Cityscapes 6.25% labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 732 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

COCO 1/32 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 366 labeled - Validation mIoU

Semi-supervised Change Detection

LEVIR-CD - 5% labeled data - IoU

Semi-supervised Change Detection

LEVIR-CD - 20% labeled data - IoU

Semi-supervised Change Detection

LEVIR-CD - 10% labeled data - IoU

Semi-supervised Change Detection

WHU - 40% labeled data - IoU

Semi-supervised Change Detection

WHU - 20% labeled data - IoU

Low-Light Image Enhancement

LIME - NIQE

Low-Light Image Enhancement

MEF - NIQE

Low-Light Image Enhancement

DICM - NIQE

Low-Light Image Enhancement

NPE - NIQE

Low-Light Image Enhancement

VV - NIQE

Motion Synthesis

InterHuman - FID

Motion Synthesis

Inter-X - FID

Motion Synthesis

Inter-X - R-Precision Top3

Motion Synthesis

Inter-X - MMDist

Math Word Problem Solving

SVAMP - Execution Accuracy

Math Word Problem Solving

Math23K - Accuracy (5-fold)

Reflection Removal

SIR^2(Postcard) - PSNR

Reflection Removal

SIR^2(Postcard) - SSIM

Reflection Removal

SIR^2(Wild) - PSNR

Reflection Removal

SIR^2(Wild) - SSIM

Reflection Removal

Nature - PSNR

Reflection Removal

Nature - SSIM

Reflection Removal

Real20 - PSNR

Reflection Removal

Real20 - SSIM

Reflection Removal

SIR^2(Objects) - PSNR

Reflection Removal

SIR^2(Objects) - SSIM

Skeleton Based Action Recognition

H2O (2 Hands and Objects) - Accuracy

Video Prediction

Moving MNIST - MSE

Video Prediction

Moving MNIST - MAE

Deblurring

Beam-Splitter Deblurring (BSD) - PSNR

Zero-Shot Video Question Answer

Zero-shot Video Question Answering on LongVideoBench - Accuracy (%)

Abstractive Text Summarization

CNN/Daily Mail - ROUGE-1

Abstractive Text Summarization

CNN/Daily Mail - ROUGE-2

Abstractive Text Summarization

CNN/Daily Mail - ROUGE-L

Text-to-Image Generation

MS COCO - FID

Text-to-Image Generation

Oxford 102 Flowers - Inception score

Text-to-Image Generation

Oxford 102 Flowers - FID

Text-to-Image Generation

CUB - FID

Natural Language Moment Retrieval

TACoS - R@1,IoU=0.3

Natural Language Moment Retrieval

TACoS - R@1,IoU=0.5

Natural Language Moment Retrieval

TACoS - R@1,IoU=0.7

Natural Language Moment Retrieval

TACoS - mIoU

Multivariate Time Series Forecasting

ETTh1 (192) Multivariate - MSE

Robot Manipulation Generalization

GEMBench - Average Success Rate

Photo to Rest Generalization

PACS - Accuracy

Single-Source Domain Generalization

Digits-five - Accuracy

Visual Object Tracking

OTB-2015 - Precision

Visual Object Tracking

AVisT - Success Rate

Image Denoising

DND - PSNR (sRGB)

Image Denoising

DND - SSIM (sRGB)

Image-to-Image Translation

FLIR - PSNR

Image-to-Image Translation

FLIR - SSIM

Text-To-SQL

Spider - Exact Match Accuracy (Dev)

Text-To-SQL

Spider - Execution Accuracy (Dev)

Sequential Recommendation

Amazon-Beauty - HR@10

Sequential Recommendation

Amazon-Beauty - nDCG@10

Few-Shot 3D Point Cloud Classification

ModelNet40 5-way (20-shot) - Standard Deviation

Self-Supervised Human Action Recognition

NTU RGB+D 120 - xsub (%)

Self-Supervised Human Action Recognition

NTU RGB+D 120 - xset (%)

Molecular Property Prediction

BBBP - ROC-AUC

Molecular Property Prediction

ToxCast - ROC-AUC

Burst Image Super-Resolution

BurstSR - PSNR

Temporal Relation Extraction

Vinoground - Text Score

Math Word Problem Solving

MATH - Accuracy

3D Hand Pose Estimation

FreiHAND - PA-MPVPE

3D Hand Pose Estimation

FreiHAND - PA-F@5mm

3D Hand Pose Estimation

FreiHAND - PA-F@15mm

3D Hand Pose Estimation

HO-3D v2 - F@15mm

3D Hand Pose Estimation

HO-3D v2 - AUC_J

3D Lane Detection

OpenLane-V2 val - DET_l

3D Lane Detection

OpenLane-V2 val - OLS

3D Lane Detection

OpenLane-V2 val - TOP_ll

3D Lane Detection

OpenLane-V2 val - TOP_lt

Few-Shot Semantic Segmentation

COCO-20i -> Pascal VOC (1-shot) - Mean IoU

Image Super-Resolution

BSD100 - 2x upscaling - PSNR

Image Super-Resolution

BSD100 - 2x upscaling - SSIM

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - Sensitivity

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - S measure

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - mean E-measure

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - mean F-measure

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - Dice

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - Sensitivity

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - S-Measure

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - mean E-measure

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - mean F-measure

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - Dice

Multiple Object Tracking

SportsMOT - MOTA

Multiple Object Tracking

SportsMOT - DetA

Multi-Object Tracking

TAO - TETA

Multi-Object Tracking

TAO - LocA

Multi-Object Tracking

TAO - AssocA

Multi-Object Tracking

TAO - ClsA

Monocular Depth Estimation

ETH3D - Delta < 1.25

Image Dehazing

O-Haze - PSNR

Image Dehazing

O-Haze - SSIM

Image Dehazing

Haze4k - PSNR

Image Dehazing

SOTS Indoor - PSNR

Entity Resolution

WDC Products-80%cc-seen-medium - F1 (%)

3D Object Detection

nuScenes Camera-Radar - NDS

3D Object Detection

S3DIS - [email protected]

3D Object Detection

S3DIS - [email protected]

Zero-Shot Video Question Answer

NExT-GQA - Acc@GQA

3D Object Detection

ScanNet++ - [email protected]

3D Object Detection

ScanNet++ - [email protected]

3D Object Detection

MultiScan - [email protected]

3D Object Detection

MultiScan - [email protected]

3D Object Detection

ARKitScenes - [email protected]

Crack Segmentation

CrackVision12K - mIoU

Multiview Detection

MultiviewX - MODA

Multiview Detection

MultiviewX - Recall

Facial Expression Recognition (FER)

RAF-DB - Overall Accuracy

Facial Expression Recognition (FER)

FER2013 - Accuracy

Facial Expression Recognition (FER)

AffectNet - Accuracy (7 emotion)

Video Anomaly Detection

HR-ShanghaiTech - AUC

Video Anomaly Detection

ShanghaiTech Campus - AUC

Speech Emotion Recognition

MSP-Podcast (Activation) - CCC

Speech Emotion Recognition

MSP-Podcast (Valence) - CCC

Speech Emotion Recognition

MSP-Podcast (Dominance) - CCC

Dynamic Link Prediction

DBLP Temporal - AUC

Dynamic Link Prediction

DBLP Temporal - AP

Point Cloud Registration

RotKITTI Registration Benchmark - RR@(1.5,0.3)

Point Cloud Registration

RotKITTI Registration Benchmark - RR@(1,0.1)

Multivariate Time Series Forecasting

ETTh1 (96) Multivariate - MSE

Image Segmentation

PMD - MAE

Image Segmentation

PMD - IoU

Image Segmentation

PMD - F-measure

Image Segmentation

MSD (Mirror Segmentation Dataset) - MAE

Image Segmentation

MSD (Mirror Segmentation Dataset) - IoU

Image Segmentation

MSD (Mirror Segmentation Dataset) - F-measure

Image Segmentation

RMAS - S-measure

Image Segmentation

MAS3K - S-measure

Image Segmentation

MAS3K - mIoU

Image Segmentation

MAS3K - E-measure

Image Segmentation

MAS3K - MAE

Speech Synthesis

LibriTTS - Periodicity

Rgb-T Tracking

RGBT210 - Precision

Rgb-T Tracking

GTOT - Precision

Rgb-T Tracking

GTOT - Success

Temporal Relation Extraction

Vinoground - Video Score

Temporal Relation Extraction

Vinoground - Group Score

Thermal Image Segmentation

MFN Dataset - mIoU

Video Object Detection

ImageNet VID - mAP

Monocular Depth Estimation

NYU-Depth V2 - RMSE

Monocular Depth Estimation

NYU-Depth V2 - absolute relative error

Monocular Depth Estimation

NYU-Depth V2 - Delta < 1.25^2

Motion Synthesis

AIOZ-GDANCE - FID

Motion Synthesis

AIOZ-GDANCE - MMC

Motion Synthesis

AIOZ-GDANCE - GMC

3D Object Detection

nuScenes LiDAR only - NDS

3D Object Detection

nuScenes LiDAR only - mAP

3D Object Detection

nuScenes LiDAR only - NDS (val)

3D Object Detection

nuScenes LiDAR only - mAP (val)

Referring Expression Segmentation

RefCOCO+ testB - Overall IoU

Referring Expression Segmentation

RefCOCO+ val - Overall IoU

Referring Expression Segmentation

RefCOCOg-val - Overall IoU

Facial Action Unit Detection

DISFA - Average F1

Facial Expression Recognition (FER)

AffectNet - Accuracy (8 emotion)

Point Tracking

TAP-Vid-DAVIS - Average Jaccard

Point Tracking

TAP-Vid-DAVIS - Occlusion Accuracy

Head Pose Estimation

BIWI - MAE (trained with other data)

Temporal Action Localization

HACS - Average-mAP

Temporal Action Localization

HACS - [email protected]

Temporal Action Localization

HACS - [email protected]

Temporal Action Localization

HACS - [email protected]

Citation Intent Classification

SciCite - F1

Low-Light Image Enhancement

LOLv2-synthetic - SSIM

Low-Light Image Enhancement

LOLv2 - Average PSNR

Object Detection

PKU-DDD17-Car - mAP50

3D Semantic Scene Completion from a single RGB image

NYUv2 - mIoU

Overlapped 100-10

ADE20K - Mean IoU (test)

Video Quality Assessment

LIVE-VQC - PLCC

Unsupervised Video Object Segmentation

YouTube-Objects - J

Unsupervised Video Object Segmentation

FBMS test - J

Open Vocabulary Object Detection

LVIS v1.0 - AP novel-LVIS base training

Object Detection

STCrowd - AP

Object Detection

CrowdHuman (full body) - mMR

Object Detection

InOutDoor - AP

Object Detection

EventPed - AP

Story Visualization

Pororo - FID

Domain Generalization

GTA5-to-Cityscapes - mIoU

Domain Generalization

GTA-to-Avg(Cityscapes,BDD,Mapillary) - mIoU

3D Hand Pose Estimation

HO-3D v3 - PA-MPJPE

3D Hand Pose Estimation

HO-3D v3 - PA-MPVPE

3D Hand Pose Estimation

HO-3D v3 - F@5mm

3D Hand Pose Estimation

HO-3D v3 - AUC_J

3D Hand Pose Estimation

HO-3D v3 - AUC_V

3D Hand Pose Estimation

HO-3D v2 - PA-MPJPE (mm)

3D Hand Pose Estimation

HO-3D v2 - F@5mm

3D Hand Pose Estimation

HO-3D v2 - AUC_V

3D Hand Pose Estimation

HO-3D v2 - PA-MPVPE

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (New Days) All

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (VISOR) All

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (NewDays) Visible

Cross-modal retrieval with noisy correspondence

CC152K - Image-to-text R@1

Cross-modal retrieval with noisy correspondence

CC152K - Text-to-image R@5

Cross-modal retrieval with noisy correspondence

COCO-Noisy - Image-to-text R@10

Robot Manipulation Generalization

The COLOSSEUM - Average decrease average across all perturbations

Video Polyp Segmentation

SUN-SEG-Easy - Dice

Video Polyp Segmentation

SUN-SEG-Hard - Dice

Head Pose Estimation

AFLW2000 - MAE

Head Pose Estimation

AFLW2000 - Geodesic Error (GE)

Head Pose Estimation

BIWI - Geodesic Error (GE)

Head Pose Estimation

BIWI - MAE-aligned (trained with other data)

Head Pose Estimation

BIWI - Geodesic Error - aligned (GE)

Stereo Image Super-Resolution

Middlebury - 2x upscaling - PSNR

Stereo Image Super-Resolution

KITTI2012 - 2x upscaling - PSNR

Stereo Image Super-Resolution

KITTI2015 - 2x upscaling - PSNR

Stereo Image Super-Resolution

Flickr1024 - 2x upscaling - PSNR

Video Panoptic Segmentation

VIPSeg - VPQ

Object Detection In Aerial Images

HRSC2016 - mAP-07

Object Detection In Aerial Images

HRSC2016 - mAP-12

Video Frame Interpolation

Xiph-4k - SSIM

Video Frame Interpolation

Xiph-2K - PSNR

Video Frame Interpolation

SNU-FILM (easy) - SSIM

Video Frame Interpolation

X4K1000FPS-2K - PSNR

Video Frame Interpolation

X4K1000FPS-2K - SSIM

Few-Shot Semantic Segmentation

COCO-20i (2-way 1-shot) - mIoU

Style Transfer

StyleBench - CLIP Score

Self-supervised Scene Flow Estimation

Argoverse 2 - EPE 3-Way

Self-supervised Scene Flow Estimation

Argoverse 2 - EPE Background Static

Zero-Shot Video Question Answer

NExT-QA - Accuracy

Zero-Shot Video Question Answer

ActivityNet-QA - Accuracy

Zero-Shot Video Question Answer

TGIF-QA - Accuracy

Zero-Shot Video Question Answer

TGIF-QA - Confidence Score

Zero-Shot Video Question Answer

EgoSchema (subset) - Accuracy

Single Image Desnowing

CSD - Average PSNR (dB)

Knowledge Base Question Answering

WebQuestionsSP - Hits@1

Knowledge Base Question Answering

SimpleQuestionsWikiData - F1

Low-Light Image Enhancement

LOLv2-synthetic - Average PSNR

Image Dehazing

I-Haze - PSNR

Saliency Prediction

SALICON - AUC

Saliency Prediction

SALICON - KLD

Saliency Prediction

SALECI - KL

Multi-task Language Understanding

MMLU - Average (%)

Skeleton Based Action Recognition

First-Person Hand Action Benchmark - 1:1 Accuracy

Hand Gesture Recognition

DHG-28 - Accuracy

Hand Gesture Recognition

SHREC 2017 - 14 Gestures Accuracy

Hand Gesture Recognition

SHREC 2017 - 28 Gestures Accuracy

Hand Gesture Recognition

DHG-14 - Accuracy

Few-Shot Learning

DTD - 8-shot Accuracy

Few-Shot Learning

DTD - 4-shot Accuracy

Few-Shot Learning

DTD - 16-shot Accuracy

Mitigating Contextual Bias

FGVC Aircraft - Top-1 Accuracy (%)

Mitigating Contextual Bias

FGVC Aircraft - OOD Accuracy (%)

Video Super-Resolution

REDS4 - 4x upscaling - PSNR

Video Super-Resolution

REDS4 - 4x upscaling - SSIM

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - weighted F-measure

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - weighted F-measure

Lipreading

Lip Reading in the Wild - Top-1 Accuracy

Aspect Sentiment Triplet Extraction

ASTE-Data-V2 - F1

Visual Question Answering

MMBench - GPT-3.5 score

Zero-Shot Video Question Answer

IntentQA - Accuracy

Cross-Modal Retrieval

ChEBI-20 - Hits@10

Zero-Shot Composed Image Retrieval (ZS-CIR)

Fashion IQ - (Recall@10+Recall@50)/2

Deblurring

RealBlur-J - PSNR (sRGB)

Deblurring

RealBlur-J (trained on GoPro) - PSNR (sRGB)

Deblurring

RealBlur-J (trained on GoPro) - SSIM (sRGB)

Deblurring

RealBlur-R (trained on GoPro) - PSNR (sRGB)

Deblurring

RealBlur-R - PSNR (sRGB)

3D Lane Detection

Apollo Synthetic 3D Lane - F1

Zero-Shot Video Question Answer

MSRVTT-QA - Accuracy

Zero-Shot Video Question Answer

MSVD-QA - Accuracy

Deblurring

GoPro - PSNR

Deblurring

GoPro - SSIM

Deblurring

DVD - PSNR

Object Detection

AI-TOD - AP

Object Detection

AI-TOD - AP50

Object Detection

AI-TOD - AP75

Object Detection

AI-TOD - APvt

Object Detection

AI-TOD - APt

Object Detection

AI-TOD - APs

Generalized Zero-Shot Learning

SUN Attribute - Harmonic mean

3D Hand Pose Estimation

H3WB - Average MPJPE (mm)

Unsupervised Domain Adaptation

Duke to MSMT - mAP

Unsupervised Domain Adaptation

Duke to MSMT - rank-1

Unsupervised Domain Adaptation

Duke to MSMT - rank-10

Unsupervised Domain Adaptation

Duke to MSMT - rank-5

Unsupervised Domain Adaptation

Duke to Market - mAP

Unsupervised Domain Adaptation

Duke to Market - rank-1

Unsupervised Domain Adaptation

Duke to Market - rank-5

Unsupervised Domain Adaptation

Duke to Market - rank-10

Unsupervised Domain Adaptation

Market to Duke - mAP

Unsupervised Domain Adaptation

Market to Duke - rank-5

Unsupervised Domain Adaptation

Market to Duke - rank-10

Unsupervised Domain Adaptation

Market to MSMT - mAP

Unsupervised Domain Adaptation

Market to MSMT - rank-1

Unsupervised Domain Adaptation

Market to MSMT - rank-10

Unsupervised Domain Adaptation

Market to MSMT - rank-5

Unsupervised Person Re-Identification

Market-1501->DukeMTMC-reID - Rank-10

Few Shot Action Recognition

Kinetics-100 - Accuracy

Underwater Image Restoration

LSUI - PSNR

Single-View 3D Reconstruction

GSO - Chamfer Distance

Action Anticipation

EPIC-KITCHENS-100 - Recall@5

Multiview Detection

CVCS - MODA (1m)

Multiview Detection

CVCS - F1_score (1m)

Saliency Prediction

SALICON - CC

Saliency Prediction

SALICON - SIM

Source-Free Domain Adaptation

VisDA-2017 - Accuracy

Visual Question Answering

MM-Vet - GPT-4 score

Unsupervised Domain Adaptation

virtual KITTI to KITTI (MDE) - RMSE

Cross-modal retrieval with noisy correspondence

CC152K - Image-to-text R@10

Cross-modal retrieval with noisy correspondence

CC152K - R-Sum

Cross-Domain Few-Shot Object Detection

UODD - mAP

3D Semantic Scene Completion from a single RGB image

KITTI-360 - mIoU

3D Semantic Scene Completion from a single RGB image

SemanticKITTI - mIoU

Supervised Video Summarization

SumMe - Kendall's Tau

Supervised Video Summarization

SumMe - Spearman's Rho

Multiple Object Tracking

KITTI Test (Online Methods) - MOTA

Multiple Object Tracking

KITTI Test (Online Methods) - HOTA

Multiple Object Tracking

KITTI Test (Online Methods) - IDSW

3D Multi-Object Tracking

Waymo Open Dataset: Vehicle (Online Methods) - MOTA/L2

Crowd Counting

JHU-CROWD++ - MAE

Crowd Counting

ShanghaiTech A - MAE

Crowd Counting

ShanghaiTech A - MSE

Crowd Counting

UCF CC 50 - MAE

Few-Shot Object Detection

ODinW-35 - Average Score

Few-Shot Object Detection

ODinW-13 - Average Score

Zero-Shot Object Detection

LVIS v1.0 minival - AP

Zero-Shot Object Detection

ODinW - Average Score

Object Detection

ODinW Full-Shot 13 Tasks - AP

Anomaly Classification

GoodsAD - AUROC

Audio Classification

ICBHI Respiratory Sound Database - ICBHI Score

Audio Classification

ICBHI Respiratory Sound Database - Sensitivity

Crowd Counting

ShanghaiTech B - MAE

Crowd Counting

ShanghaiTech A - RMSE

Time Series Forecasting

ETTm1 (96) Multivariate - MAE

Time Series Forecasting

ETTm1 (192) Multivariate - MAE

Time Series Forecasting

ETTm1 (336) Multivariate - MAE

Time Series Forecasting

ETTm2 (192) Multivariate - MSE

Visual Instruction Following

LLaVA-Bench - avg score

Image Super-Resolution

Urban100 - 2x upscaling - PSNR

Image Super-Resolution

Manga109 - 4x upscaling - PSNR

Image Super-Resolution

Set5 - 4x upscaling - PSNR

Image Super-Resolution

Set5 - 3x upscaling - PSNR

Semi-supervised Change Detection

WHU - 10% labeled data - IoU

Semi-supervised Change Detection

WHU - 5% labeled data - IoU

Multivariate Time Series Forecasting

USHCN-Daily - MSE

Heterogeneous Node Classification

Freebase (Heterogeneous Node Classification) - Micro-F1

Heterogeneous Node Classification

ACM (Heterogeneous Node Classification) - Micro-F1

Generative 3D Object Classification

Objaverse - Objaverse (I)

Generative 3D Object Classification

Objaverse - Objaverse (Average)

Generative 3D Object Classification

Objaverse - Objaverse (C)

Audio Classification

SHD - Percentage correct

Audio Classification

SSC - Accuracy

Classify murmurs

CirCor DigiScope - Weighted Accuracy

Zero-Shot Video Question Answer

ActivityNet-QA - Confidence Score

Few-Shot 3D Point Cloud Classification

ModelNet40 10-way (10-shot) - Overall Accuracy

Image Segmentation

RMAS - mIoU

Image Segmentation

RMAS - E-measure

Image Segmentation

RMAS - MAE

Math Word Problem Solving

SVAMP - Accuracy

3D Human Pose Estimation

RICH - MPJPE

3D Human Pose Estimation

RICH - PA-MPJPE

Semi-supervised Change Detection

LEVIR-CD - 40% labeled data - IoU

Graph Classification

Peptides-func - AP

3D Human Pose Estimation

RICH - MPVPE

Motion Synthesis

HumanML3D - Multimodality

Monocular Depth Estimation

KITTI Eigen split unsupervised - absolute relative error

Monocular Depth Estimation

KITTI Eigen split unsupervised - RMSE log

Monocular Depth Estimation

KITTI Eigen split unsupervised - Delta < 1.25

Monocular Depth Estimation

KITTI Eigen split unsupervised - Delta < 1.25^2

Monocular Depth Estimation

Make3D - Sq Rel

Monocular Depth Estimation

KITTI Eigen split - absolute relative error

Monocular Depth Estimation

KITTI Eigen split - RMSE

Monocular Depth Estimation

KITTI Eigen split - RMSE log

Monocular Depth Estimation

KITTI Eigen split - Delta < 1.25

Monocular Depth Estimation

KITTI Eigen split - Delta < 1.25^2

Monocular Depth Estimation

KITTI Eigen split - Sq Rel

Generalized Few-Shot Semantic Segmentation

COCO-20i (1-shot) - Mean Base and Novel

Generalized Few-Shot Semantic Segmentation

COCO-20i (5-shot) - Mean Base and Novel

Generalized Few-Shot Semantic Segmentation

PASCAL-5i (5-Shot) - Mean Base and Novel

Generalized Few-Shot Semantic Segmentation

PASCAL-5i (1-Shot) - Mean Base and Novel

Brain Tumor Segmentation

BRATS-2017 val - Dice Score

Motion Synthesis

InterHuman - R-Precision Top3

Referring Expression Segmentation

RefCOCO+ val - Mean IoU

Visual Question Answering

VQA v2 test-dev - Accuracy

Dichotomous Image Segmentation

DIS-TE2 - max F-Measure

Dichotomous Image Segmentation

DIS-TE2 - weighted F-measure

Dichotomous Image Segmentation

DIS-TE2 - MAE

Dichotomous Image Segmentation

DIS-TE2 - S-Measure

Dichotomous Image Segmentation

DIS-TE2 - E-measure

Dichotomous Image Segmentation

DIS-TE1 - max F-Measure

Dichotomous Image Segmentation

DIS-TE1 - weighted F-measure

Dichotomous Image Segmentation

DIS-TE1 - MAE

Dichotomous Image Segmentation

DIS-TE1 - E-measure

Dichotomous Image Segmentation

DIS-TE3 - max F-Measure

Dichotomous Image Segmentation

DIS-TE3 - S-Measure

Dichotomous Image Segmentation

DIS-TE3 - E-measure

Dichotomous Image Segmentation

DIS-TE4 - max F-Measure

Dichotomous Image Segmentation

DIS-TE4 - E-measure

Dichotomous Image Segmentation

DIS-VD - max F-Measure

Dichotomous Image Segmentation

DIS-VD - weighted F-measure

Dichotomous Image Segmentation

DIS-VD - MAE

Dichotomous Image Segmentation

DIS-VD - S-Measure

Dichotomous Image Segmentation

DIS-VD - E-measure

3D Reconstruction

DTU - Comp

Scene Graph Generation

4D-OR - F1

Sleep Stage Detection

Sleep-EDFx (single-channel) - Macro-F1

Sleep Stage Detection

SHHS (single-channel) - Macro-F1

Sleep Stage Detection

Sleep-EDFx - Macro-F1

Emotion Recognition in Context

BoLD - Average mAP

Emotion Recognition in Context

BoLD - AUC

Emotion Recognition in Context

EMOTIC - mAP

3D Object Reconstruction

BEHAVE - Chamfer Distance

Thermal Image Segmentation

PST900 - mIoU

Thermal Image Segmentation

KP day-night - mIoU

UNET Segmentation

Munich Sentinel2 Crop Segmentation - Overall Accuracy

Few-Shot Class-Incremental Learning

CIFAR-100 - Average Accuracy

Few-Shot Class-Incremental Learning

CIFAR-100 - Last Accuracy

Few-Shot Class-Incremental Learning

CUB-200-2011 - Average Accuracy

Few-Shot Class-Incremental Learning

CUB-200-2011 - Last Accuracy

Few-Shot Class-Incremental Learning

mini-Imagenet - Average Accuracy

Few-Shot Class-Incremental Learning

mini-Imagenet - Last Accuracy

Unsupervised Action Segmentation

Youtube INRIA Instructional - F1

Unsupervised Action Segmentation

Youtube INRIA Instructional - Acc

Unsupervised Action Segmentation

Youtube INRIA Instructional - Precision

Unsupervised Action Segmentation

IKEA ASM - JSD

Unsupervised Action Segmentation

IKEA ASM - Precision

Image Super-Resolution

Set14 - 2x upscaling - PSNR

Image Super-Resolution

Set14 - 4x upscaling - PSNR

Image Super-Resolution

Urban100 - 2x upscaling - SSIM

Image Super-Resolution

Set5 - 2x upscaling - PSNR

Image Super-Resolution

BSD100 - 4x upscaling - PSNR

Temporal Action Localization

MultiTHUMOS - mAP [email protected]

Temporal Action Localization

MultiTHUMOS - mAP [email protected]

Temporal Action Localization

MultiTHUMOS - mAP [email protected]

Weakly-Supervised Semantic Segmentation

COCO 2014 val - mIoU

Multi-Label Text Classification

CC3M-TagMask - Precision

Multi-Label Text Classification

CC3M-TagMask - Accuracy

Blind Docking

PDBBind - Top-1 RMSD (%<2)

Deblurring

RealBlur-J - SSIM (sRGB)

Deblurring

RealBlur-R (trained on GoPro) - SSIM (sRGB)

Deblurring

RealBlur-R - SSIM (sRGB)

Zero-Shot Composed Image Retrieval (ZS-CIR)

ImageNet-R - (Recall@10+Recall@50)/2

Data-free Knowledge Distillation

SQuAD - Exact Match

Motion Synthesis

HumanML3D - R Precision Top3

Motion Synthesis

KIT Motion-Language - R Precision Top3

3D Multi-Person Mesh Recovery

AGORA - FB-NMVE

Egocentric Pose Estimation

UnrealEgo - Average MPJPE (mm)

Egocentric Pose Estimation

UnrealEgo - PA-MPJPE

Drug–drug Interaction Extraction

DrugBank - Accuracy

Drug–drug Interaction Extraction

DrugBank - AUROC

Drug–drug Interaction Extraction

DrugBank - F1 score

Multi-Object Tracking

DanceTrack - HOTA

Multi-Object Tracking

DanceTrack - AssA

Multi-Object Tracking

DanceTrack - IDF1

3D Lane Detection

OpenLane - F1 (all)

3D Lane Detection

OpenLane - Up & Down

3D Lane Detection

OpenLane - Extreme Weather

3D Lane Detection

OpenLane - Night

3D Lane Detection

OpenLane - Intersection

Aspect-Based Sentiment Analysis (ABSA)

MAMS - Acc

Point Cloud Registration

FP-R-M - RRE (degrees)

Point Cloud Registration

FP-R-M - RTE (cm)

Point Cloud Registration

FP-R-H - RRE (degrees)

Point Cloud Registration

FP-R-H - RTE (cm)

Point Cloud Registration

FP-T-E - RRE (degrees)

Point Cloud Registration

FP-T-E - RTE (cm)

Point Cloud Registration

FP-O-E - RRE (degrees)

Point Cloud Registration

FP-O-E - RTE (cm)

Point Cloud Registration

FP-R-E - RRE (degrees)

Point Cloud Registration

FP-R-E - RTE (cm)

Point Cloud Registration

FP-T-M - Recall (3cm, 10 degrees)

Point Cloud Registration

FP-T-M - RRE (degrees)

Point Cloud Registration

FP-T-M - RTE (cm)

Point Cloud Registration

ETH (trained on 3DMatch) - Recall (30cm, 5 degrees)

Point Cloud Registration

FP-O-H - RRE (degrees)

Point Cloud Registration

FP-O-H - RTE (cm)

Point Cloud Registration

FP-O-M - Recall (3cm, 10 degrees)

Point Cloud Registration

FP-O-M - RRE (degrees)

Point Cloud Registration

FP-O-M - RTE (cm)

Point Cloud Registration

FP-T-H - RRE (degrees)

Point Cloud Registration

FP-T-H - RTE (cm)

Monocular Depth Estimation

IBims-1 - δ1.25

Monocular Depth Estimation

NYU-Depth V2 - Delta < 1.25

Monocular Depth Estimation

NYU-Depth V2 - log 10

Action Classification

Kinetics-700 - Top-1 Accuracy

Video Retrieval

LSMDC - text-to-video R@1

Video Retrieval

LSMDC - video-to-text R@1

Video Retrieval

MSVD - text-to-video R@1

Video Retrieval

MSVD - video-to-text R@1

Video Retrieval

SSv2-label retrieval - text-to-video R@5

Building change detection for remote sensing images

LEVIR-CD - F1

Building change detection for remote sensing images

LEVIR-CD - Params(M)

Object Detection In Aerial Images

DIOR-R - mAP

Inverse Tone Mapping

VDS dataset: Multi exposure stack-based inverse tone mapping - Reinhard'TMO-PSNR

Inverse Tone Mapping

VDS dataset: Multi exposure stack-based inverse tone mapping - HDR-VDP-3

Inverse Tone Mapping

VDS dataset: Multi exposure stack-based inverse tone mapping - PU21-PSNR

Inverse Tone Mapping

VDS dataset: Multi exposure stack-based inverse tone mapping - PU21-SSIM

Age And Gender Classification

Adience Age - Accuracy (5-fold)

Chart Question Answering

ChartQA - 1:1 Accuracy

Diffusion Personalization Tuning Free

AgeDB - Cosine Similarity

Diffusion Personalization Tuning Free

AgeDB - FID

Unsupervised Domain Adaptation

SIM10K to Cityscapes - [email protected]

Cross-modal retrieval with noisy correspondence

CC152K - Image-to-text R@5

Cross-modal retrieval with noisy correspondence

COCO-Noisy - Image-to-text R@1

No-Reference Image Quality Assessment

UHD-IQA - SRCC

No-Reference Image Quality Assessment

UHD-IQA - PLCC

Image Super-Resolution

DIV2K val - 4x upscaling - LRPSNR

Efficient ViTs

ImageNet-1K (With LV-ViT-S) - Top 1 Accuracy

Efficient ViTs

ImageNet-1K (With LV-ViT-S) - GFLOPs

3D Face Animation

BEAT2 - MSE

Crowd Counting

UCF-QNRF - MAE

Few-Shot Image Classification

Meta-Dataset - Accuracy

Monocular Depth Estimation

DDAD - absolute relative error

Monocular Depth Estimation

DDAD - Sq Rel

Monocular Depth Estimation

DDAD - RMSE

Monocular Depth Estimation

DDAD - RMSE log

Emotion Recognition in Conversation

EmoryNLP - Weighted-F1

3D Object Detection

View-of-Delft (val) - mAP

Motion Synthesis

AIOZ-GDANCE - GenDiv

Motion Synthesis

AIOZ-GDANCE - PFC

Motion Synthesis

AIOZ-GDANCE - GMR

Motion Synthesis

AIOZ-GDANCE - TIF

Visual Object Tracking

TNL2K - precision

Visual Object Tracking

UAV123 - AUC

Cross-modal retrieval with noisy correspondence

CC152K - Text-to-image R@1

Age Estimation

IMDB-Clean - Average mean absolute error

Age Estimation

CACD - MAE

Facial Attribute Classification

FairFace - gender-top1

Facial Attribute Classification

FairFace - age-top1

Age And Gender Classification

Adience Gender - Accuracy (5-fold)

Semi-Supervised Semantic Segmentation

Pascal VOC 2012 6.25% labeled - Validation mIoU

Retinal Vessel Segmentation

DRIVE - AUC

Unsupervised Semantic Segmentation

Potsdam-3 - Accuracy

Video Panoptic Segmentation

VIPSeg - STQ

Burst Image Super-Resolution

BurstSR - SSIM

Image Harmonization

iHarmony4 - MSE

Image Harmonization

iHarmony4 - fMSE

Zero-Shot Transfer 3D Point Cloud Classification

ScanObjectNN - OBJ_ONLY Accuracy (%)

Few-Shot 3D Point Cloud Classification

ModelNet40 5-way (20-shot) - Overall Accuracy

Few-Shot 3D Point Cloud Classification

ModelNet40 10-way (20-shot) - Overall Accuracy

Image-to-Image Translation

ADE20K Labels-to-Photos - LPIPS

3D Hand Pose Estimation

DexYCB - Average MPJPE (mm)

3D Hand Pose Estimation

DexYCB - Procrustes-Aligned MPJPE

3D Hand Pose Estimation

DexYCB - MPVPE

3D Hand Pose Estimation

DexYCB - VAUC

3D Hand Pose Estimation

DexYCB - PA-MPVPE

3D Hand Pose Estimation

DexYCB - PA-VAUC

3D Question Answering (3D-QA)

ScanQA Test w/ objects - Exact Match

3D Question Answering (3D-QA)

ScanQA Test w/ objects - BLEU-4

3D Question Answering (3D-QA)

ScanQA Test w/ objects - ROUGE

3D Question Answering (3D-QA)

ScanQA Test w/ objects - CIDEr

Facial Landmark Detection

AFLW-Full - Mean NME

Face Alignment

AFLW-19 - NME_diag (%, Full)

Face Alignment

AFLW-19 - NME_diag (%, Frontal)

Face Alignment

AFLW-19 - NME_box (%, Full)

Face Alignment

AFLW-19 - [email protected] (%, Full)

Zero-shot Named Entity Recognition (NER)

CrossNER - AI

Zero-shot Named Entity Recognition (NER)

CrossNER - Literature

Zero-shot Named Entity Recognition (NER)

CrossNER - Music

Zero-shot Named Entity Recognition (NER)

CrossNER - Politics

Zero-shot Named Entity Recognition (NER)

CrossNER - Science

3D Multi-Person Mesh Recovery

AGORA - FB-MVE

3D Human Reconstruction

EHF - MPVPE

3D Human Reconstruction

EHF - PA V2V (mm), whole body

Weakly Supervised Object Detection

PASCAL VOC 2012 test - mAP

3D Human Pose Estimation

UBody - PVE-All

3D Human Pose Estimation

UBody - PVE-Hands

3D Human Pose Estimation

UBody - PVE-Face

3D Human Pose Estimation

UBody - PA-PVE-All

3D Human Pose Estimation

UBody - PA-PVE-Hands

3D Human Pose Estimation

UBody - PA-PVE-Face

Molecular Property Prediction

FreeSolv - RMSE

Pose Estimation

InLoc - [email protected],10°

Pose Estimation

InLoc - [email protected],10°

Pose Estimation

InLoc - [email protected],10°

Pose Estimation

InLoc - [email protected],10°

Pose Estimation

InLoc - [email protected],10°

Pose Estimation

InLoc - [email protected],10°

Math Word Problem Solving

MAWPS - Accuracy (%)

Trajectory Planning

ToolBench - Win rate

Low-Light Image Enhancement

LIME - BRISQUE

Low-Light Image Enhancement

MEF - BRISQUE

Low-Light Image Enhancement

DICM - BRISQUE

Low-Light Image Enhancement

Sony-Total-Dark - Average PSNR

Low-Light Image Enhancement

Sony-Total-Dark - SSIM

Low-Light Image Enhancement

Sony-Total-Dark - LPIPS

Low-Light Image Enhancement

VV - BRISQUE

Low-Light Image Enhancement

LOL-v2 - Average PSNR

Robot Pose Estimation

DREAM-dataset - AUC (avg. on 4 real DREAM datasets)

Robot Pose Estimation

DREAM-dataset - mean-ADD (avg. on 4 real DREAM datasets)

Low-light Image Deblurring and Enhancement

LOL-Blur - LPIPS

Few-Shot Object Detection

MS-COCO (30-shot) - AP

Few-Shot Object Detection

MS-COCO (10-shot) - AP

Cross-Domain Few-Shot Object Detection

Artaxor - mAP

Cross-Domain Few-Shot Object Detection

Clipark1k - mAP

Cross-Domain Few-Shot Object Detection

DeepFish - mAP

Unsupervised Few-Shot Image Classification

Tiered ImageNet 5-way (1-shot) - Accuracy

Unsupervised Few-Shot Image Classification

Mini-Imagenet 5-way (5-shot) - Accuracy

Unsupervised Few-Shot Image Classification

Mini-Imagenet 5-way (1-shot) - Accuracy

Unsupervised Few-Shot Image Classification

Tiered ImageNet 5-way (5-shot) - Accuracy

Time Series Forecasting

ETTh1 (720) Univariate - MSE

Time Series Forecasting

ETTh1 (720) Univariate - MAE

Time Series Forecasting

ETTh2 (720) Univariate - MSE

Point Tracking

TAP-Vid-DAVIS - Average PCK

Low-Light Image Enhancement

LOLv2-synthetic - LPIPS

Low-Light Image Enhancement

LOLv2 - LPIPS

Image Denoising

SIDD - PSNR (sRGB)

Music Source Separation

MUSDB18-HQ - SDR (drums)

Music Source Separation

MUSDB18-HQ - SDR (others)

Music Source Separation

MUSDB18-HQ - SDR (vocals)

Music Source Separation

MUSDB18-HQ - SDR (avg)

3D Reconstruction

DTU - Overall

Monocular Depth Estimation

NYU-Depth V2 - Delta < 1.25^3

Data-to-Text Generation

E2E NLG Challenge - METEOR

Network Intrusion Detection

CICIDS2017 - Recall

Head Pose Estimation

Panoptic - Geodesic Error (GE)

RGB Salient Object Detection

DUT-OMRON - MAE

RGB Salient Object Detection

DUT-OMRON - S-Measure

RGB Salient Object Detection

DUT-OMRON - mean F-Measure

RGB Salient Object Detection

DUT-OMRON - mean E-Measure

RGB Salient Object Detection

DAVIS-S - S-measure

RGB Salient Object Detection

DAVIS-S - F-measure

RGB Salient Object Detection

DAVIS-S - MAE

RGB Salient Object Detection

HRSOD - S-Measure

RGB Salient Object Detection

HRSOD - max F-Measure

RGB Salient Object Detection

HRSOD - MAE

RGB Salient Object Detection

UHRSD - S-Measure

RGB Salient Object Detection

UHRSD - max F-Measure

RGB Salient Object Detection

UHRSD - MAE

RGB Salient Object Detection

DUTS-TE - MAE

RGB Salient Object Detection

DUTS-TE - max F-measure

RGB Salient Object Detection

DUTS-TE - S-Measure

RGB Salient Object Detection

DUTS-TE - mean E-Measure

RGB Salient Object Detection

DUTS-TE - mean F-Measure

Dichotomous Image Segmentation

DIS-TE1 - S-Measure

Dichotomous Image Segmentation

DIS-TE1 - HCE

Dichotomous Image Segmentation

DIS-TE3 - weighted F-measure

Dichotomous Image Segmentation

DIS-TE3 - MAE

Dichotomous Image Segmentation

DIS-TE4 - weighted F-measure

Dichotomous Image Segmentation

DIS-TE4 - MAE

Camouflaged Object Segmentation

CHAMELEON - S-measure

Camouflaged Object Segmentation

CHAMELEON - weighted F-measure

Camouflaged Object Segmentation

CHAMELEON - MAE

Camouflaged Object Segmentation

NC4K - S-measure

Camouflaged Object Segmentation

NC4K - weighted F-measure

Camouflaged Object Segmentation

NC4K - MAE

Camouflaged Object Segmentation

COD - MAE

Camouflaged Object Segmentation

COD - Weighted F-Measure

Camouflaged Object Segmentation

COD - S-Measure

Audio Classification

Balanced Audio Set - Mean AP

Key-value Pair Extraction

SIBR - key-value pair F1

Key-value Pair Extraction

RFUND-EN - key-value pair F1

Video Object Tracking

NT-VOT211 - Precision

Visual Object Tracking

OTB-2015 - AUC

Action Segmentation

GTEA - F1@10%

Action Segmentation

GTEA - F1@25%

Action Segmentation

GTEA - Edit

Action Classification

Moments in Time - Top 1 Accuracy

Action Classification

MiT - Top 1 Accuracy

Chart Question Answering

PlotQA - 1:1 Accuracy

Action Anticipation

EGTEA - Top-1 Accuracy

3D Instance Segmentation

STPLS3D - AP50

3D Instance Segmentation

STPLS3D - AP25

3D Instance Segmentation

STPLS3D - AP

Spectral Reconstruction

CAVE - PSNR

Spectral Reconstruction

CAVE - SSIM

Spectral Reconstruction

KAIST - PSNR

Spectral Reconstruction

KAIST - SSIM

3D Point Cloud Classification

ModelNet40-C - Error Rate

Shadow Removal

SRD - SSIM

Shadow Removal

SRD - LPIPS

Scene Text Recognition

ICDAR2013 - Accuracy

Deblurring

RSBlur - Average PSNR

Video Quality Assessment

LIVE-FB LSVQ - PLCC

Skeleton Based Action Recognition

N-UCLA - Accuracy

Cross-modal retrieval with noisy correspondence

Flickr30K-Noisy - Image-to-text R@1

Cross-modal retrieval with noisy correspondence

Flickr30K-Noisy - Text-to-image R@1

Artifact Detection

HistoArtifacts - MCC

Image Dehazing

SOTS Outdoor - PSNR

Image-to-Image Translation

ADE20K Labels-to-Photos - mIoU

Image-to-Image Translation

ADE20K Labels-to-Photos - FID

Image-to-Image Translation

COCO-Stuff Labels-to-Photos - FID

Image-to-Image Translation

Cityscapes Labels-to-Photo - mIoU

Image-to-Image Translation

Cityscapes Labels-to-Photo - FID

Video Semantic Segmentation

VSPW - mIoU

Multiview Detection

Wildtrack - MODA

Multiview Detection

Wildtrack - Recall

Human Pose Forecasting

Human3.6M - Average MPJPE (mm) @ 1000 ms

Human Pose Forecasting

Human3.6M - Average MPJPE (mm) @ 400 ms

Robust Object Detection

DWD - mPC [AP50]

3D Instance Segmentation

ScanNet(v2) - mAP @ 50

3D Instance Segmentation

ScanNet(v2) - mAP

3D Instance Segmentation

ScanNet200 - mAP

Single Image Deraining

Rain100H - SSIM

Single Image Deraining

Rain100H - PSNR

Generalized Referring Expression Segmentation

gRefCOCO - gIoU

Generalized Referring Expression Segmentation

gRefCOCO - cIoU

Referring Video Object Segmentation

Refer-YouTube-VOS - J&F

Referring Video Object Segmentation

Refer-YouTube-VOS - J

Referring Video Object Segmentation

Refer-YouTube-VOS - F

Column Type Annotation

VizNet-Sato-Full - Macro-F1

Multi-Person Pose Estimation

CrowdPose - mAP @0.5:0.95

Multi-Person Pose Estimation

CrowdPose - AP Easy

Multi-Person Pose Estimation

CrowdPose - AP Medium

Multi-Person Pose Estimation

CrowdPose - AP Hard

Semi-Supervised Object Detection

COCO 2% labeled data - mAP

Semi-Supervised Object Detection

COCO 10% labeled data - mAP

Semi-Supervised Object Detection

COCO 100% labeled data - mAP

Video Prediction

Kinetics-600 12 frames, 64x64 - FVD

Science Question Answering

ScienceQA - Social Science

Time Series Forecasting

Electricity (720) - MSE

Image Inpainting

Places2 - FID

Image Inpainting

Places2 - P-IDS

Image Inpainting

Places2 - U-IDS

Image Inpainting

Places2 - LPIPS

Text-based Image Editing

PIE-Bench - Background PSNR

Motion Synthesis

InterHuman - MMDist

Weakly-Supervised Semantic Segmentation

PASCAL VOC 2012 train - Mean IoU

Image Manipulation Detection

COVERAGE - AUC

Image Manipulation Detection

COVERAGE - Balanced Accuracy

Image Manipulation Detection

CocoGlide - Balanced Accuracy

Image Manipulation Detection

DSO-1 - Balanced Accuracy

Image Manipulation Detection

Casia V1+ - Balanced Accuracy

3D Question Answering (3D-QA)

ScanQA Test w/ objects - BLEU-1

3D Question Answering (3D-QA)

ScanQA Test w/ objects - METEOR

Pose Estimation

AIC - AP

Referring Expression Segmentation

RefCOCOg-test - Overall IoU

Human Part Segmentation

CIHP - Mean IoU

Monocular Depth Estimation

NYU-Depth V2 self-supervised - Root mean square error (RMSE)

Monocular Depth Estimation

NYU-Depth V2 self-supervised - Absolute relative error (AbsRel)

Monocular Depth Estimation

NYU-Depth V2 self-supervised - delta_1

Monocular Depth Estimation

NYU-Depth V2 self-supervised - delta_2

Monocular Depth Estimation

NYU-Depth V2 self-supervised - delta_3

Bird's-Eye View Semantic Segmentation

Lyft Level 5 - IoU vehicle - 224x480 - Long

Bird's-Eye View Semantic Segmentation

Lyft Level 5 - IoU vehicle - 224x480 - Short

Action Classification

Toyota Smarthome dataset - CS

Action Classification

Toyota Smarthome dataset - CV1

Egocentric Pose Estimation

GlobalEgoMocap Test Dataset - Average MPJPE (mm)

Egocentric Pose Estimation

GlobalEgoMocap Test Dataset - PA-MPJPE

Egocentric Pose Estimation

SceneEgo - Average MPJPE (mm)

Egocentric Pose Estimation

SceneEgo - PA-MPJPE

Semantic Correspondence

SPair-71k - PCK

Semantic Correspondence

PF-PASCAL - PCK

Pose Estimation

COCO val2017 - AP

Pose Estimation

COCO val2017 - AP50

Pose Estimation

COCO val2017 - AP75

3D Dense Shape Correspondence

SHREC'19 - Euclidean Mean Error (EME)

3D Dense Shape Correspondence

SHREC'19 - Accuracy at 1%

3D Human Reconstruction

CustomHumans - Chamfer Distance P-to-S

3D Human Reconstruction

CustomHumans - Chamfer Distance S-to-P

Video Retrieval

VATEX - text-to-video R@50

Panoptic Scene Graph Generation

PSG Dataset - mR@20

Semi-Supervised Semantic Segmentation

Cityscapes 100 samples labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

COCO 1/512 labeled - Validation mIoU

Few-Shot Learning

MedConceptsQA - Accuracy

3D Instance Segmentation

ScanNet(v2) - mAP@25

Science Question Answering

ScienceQA - Natural Science

Science Question Answering

ScienceQA - Language Science

Science Question Answering

ScienceQA - Text Context

Science Question Answering

ScienceQA - Image Context

Science Question Answering

ScienceQA - No Context

Science Question Answering

ScienceQA - Grades 1-6

Science Question Answering

ScienceQA - Grades 7-12

Science Question Answering

ScienceQA - Avg. Accuracy

Synthetic-to-Real Translation

SYNTHIA-to-Cityscapes - mIoU (13 classes)

Synthetic-to-Real Translation

SYNTHIA-to-Cityscapes - mIoU (16 classes)

Synthetic-to-Real Translation

GTAV-to-Cityscapes Labels - mIoU

GZSL Video Classification

ActivityNet-GZSL (cls) - HM

GZSL Video Classification

ActivityNet-GZSL (cls) - ZSL

GZSL Video Classification

ActivityNet-GZSL(main) - HM

GZSL Video Classification

ActivityNet-GZSL(main) - ZSL

GZSL Video Classification

VGGSound-GZSL(main) - HM

GZSL Video Classification

VGGSound-GZSL(main) - ZSL

GZSL Video Classification

VGGSound-GZSL (cls) - HM

GZSL Video Classification

VGGSound-GZSL (cls) - ZSL

GZSL Video Classification

UCF-GZSL (cls) - HM

GZSL Video Classification

UCF-GZSL (cls) - ZSL

GZSL Video Classification

UCF-GZSL(main) - HM

GZSL Video Classification

UCF-GZSL(main) - ZSL

Low-Dose X-Ray CT Reconstruction

X3D - PSNR

Low-Dose X-Ray CT Reconstruction

X3D - SSIM

Multi-task Language Understanding

BBH-nlp - Average (%)

Depth Anomaly Detection and Segmentation

MVTEC 3D-AD - Segmentation AUPRO

Depth Anomaly Detection and Segmentation

MVTEC 3D-AD - Detection AUROC

Multiple Object Tracking

BDD100K test - mMOTA

Hateful Meme Classification

HarMeme - Accuracy

Image-to-Image Translation

horse2zebra - Frechet Inception Distance

Sports Ball Detection and Tracking

Volleyball - F1 (%)

Sports Ball Detection and Tracking

Volleyball - Accuracy (%)

Sports Ball Detection and Tracking

Volleyball - Average Precision (%)

Sports Ball Detection and Tracking

Soccer - F1 (%)

Sports Ball Detection and Tracking

Soccer - Average Precision (%)

Sports Ball Detection and Tracking

Soccer - Accuracy (%)

Sports Ball Detection and Tracking

Badminton - F1 (%)

Sports Ball Detection and Tracking

Badminton - Accuracy (%)

Sports Ball Detection and Tracking

Badminton - Average Precision (%)

Sports Ball Detection and Tracking

Basketball - F1 (%)

Sports Ball Detection and Tracking

Basketball - Accuracy (%)

Sports Ball Detection and Tracking

Basketball - Average Precision (%)

Sports Ball Detection and Tracking

Tennis - F1 (%)

Sports Ball Detection and Tracking

Tennis - Accuracy (%)

Sports Ball Detection and Tracking

Tennis - Average Precision (%)

Audio Classification

VGGSound - Top 1 Accuracy

Multimodal Emotion Recognition

IEMOCAP - F1

Multimodal Emotion Recognition

IEMOCAP - Weighted Accuracy (WA)

Image Dehazing

SOTS Outdoor - SSIM

Video Anomaly Detection

HR-Avenue - AUC

Video Story QA

MovieQA - Accuracy

Math Word Problem Solving

ASDiv-A - Execution Accuracy

Video Retrieval

QuerYD - text-to-video R@1

Video Retrieval

QuerYD - text-to-video R@10

Video Retrieval

QuerYD - text-to-video R@5

Video Retrieval

Condensed Movies - text-to-video R@1

Video Retrieval

Condensed Movies - text-to-video R@5

Video Retrieval

Condensed Movies - text-to-video R@10

Multi-Label Image Classification

BigEarthNet (official test set) - F1 Score

Aspect-Based Sentiment Analysis (ABSA)

SemEval 2014 Task 4 Subtask 1+2 - F1

Hand Gesture Recognition

LSA16 - Accuracy

Cross-modal retrieval with noisy correspondence

CC152K - Text-to-image R@10

Cross-modal retrieval with noisy correspondence

COCO-Noisy - Text-to-image R@1

Retinal Vessel Segmentation

CHASE_DB1 - mIoU

Indoor Scene Synthesis

PRO-teXt - CD

Indoor Scene Synthesis

PRO-teXt - EMD

Indoor Scene Synthesis

PRO-teXt - F1

Semi-Supervised Image Classification

ImageNet - 10% labeled data - Top 5 Accuracy

Single-View 3D Reconstruction

GSO - IoU

Video-Text Retrieval

Test-of-Time - 2-Class Accuracy

Heterogeneous Node Classification

Freebase (Heterogeneous Node Classification) - Macro-F1

Heterogeneous Node Classification

OAG-L1-Field - NDCG

Heterogeneous Node Classification

OAG-L1-Field - MRR

Heterogeneous Node Classification

IMDB (Heterogeneous Node Classification) - Macro-F1

Heterogeneous Node Classification

IMDB (Heterogeneous Node Classification) - Micro-F1

Heterogeneous Node Classification

DBLP (Heterogeneous Node Classification) - Macro-F1

Heterogeneous Node Classification

DBLP (Heterogeneous Node Classification) - Micro-F1

Multi-class Anomaly Detection

MVTec AD - Detection AUROC

Multi-class Anomaly Detection

MVTec AD - Segmentation AUROC

No-Reference Image Quality Assessment

CSIQ - PLCC

Analog Video Restoration

TAPE - LPIPS

Analog Video Restoration

TAPE - PSNR

Semi-Supervised Video Object Segmentation

MOSE - J&F

Semi-Supervised Video Object Segmentation

MOSE - J

Semi-Supervised Video Object Segmentation

MOSE - F

Semi-Supervised Video Object Segmentation

YouTube-VOS 2019 - Overall

Semi-Supervised Video Object Segmentation

YouTube-VOS 2019 - F-Measure (Unseen)

Vehicle Re-Identification

VeRi-Wild Small - mAP

Motion Synthesis

Motion-X - FID

Motion Synthesis

Motion-X - TMR-R-Precision Top3

Motion Synthesis

Motion-X - TMR-Matching Score

Motion Synthesis

Motion-X - MModality

Conditional Image Generation

ImageNet 128x128 - FID

Few-Shot Image Classification

CUB 200 5-way 5-shot - Accuracy

Few-Shot Image Classification

CIFAR-FS 5-way (5-shot) - Accuracy

Domain Generalization

TerraIncognita - Average Accuracy

No-Reference Image Quality Assessment

KADID-10k - SRCC

No-Reference Image Quality Assessment

KADID-10k - PLCC