Conditional Image Generation

ImageNet 256x256 - FID

Video Frame Interpolation

X4K1000FPS - PSNR

RGB-T Tracking

RGBT210 - Success

Semi-Supervised Semantic Segmentation

ADE20K 1/16 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

COCO 1/128 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 183 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

Cityscapes 6.25% labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

COCO 1/256 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

ADE20K 1/32 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

COCO 1/64 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 92 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 732 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

COCO 1/32 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 366 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 1464 labeled - Validation mIoU

Low-Light Image Enhancement

LIME - NIQE

Low-Light Image Enhancement

NPE - NIQE

Low-Light Image Enhancement

LOL - Average PSNR

Low-Light Image Enhancement

LOL - LPIPS

Low-Light Image Enhancement

MEF - NIQE

Low-Light Image Enhancement

VV - NIQE

Low-Light Image Enhancement

DICM - NIQE

Motion Synthesis

InterHuman - FID

Motion Synthesis

Inter-X - FID

Motion Synthesis

Inter-X - R-Precision Top3

Motion Synthesis

Inter-X - MMDist

Math Word Problem Solving

SVAMP - Execution Accuracy

Math Word Problem Solving

Math23K - Accuracy (5-fold)

Reflection Removal

SIR^2(Objects) - PSNR

Reflection Removal

Real20 - PSNR

Reflection Removal

Real20 - SSIM

Skeleton Based Action Recognition

H2O (2 Hands and Objects) - Accuracy

Video Prediction

Moving MNIST - MSE

Video Prediction

Moving MNIST - MAE

Text-to-Image Generation

Oxford 102 Flowers - Inception Score

Text-to-Image Generation

Oxford 102 Flowers - FID

Text-to-Image Generation

CUB - FID

Natural Language Moment Retrieval

TACoS - R@1,IoU=0.3

Natural Language Moment Retrieval

TACoS - R@1,IoU=0.5

Natural Language Moment Retrieval

TACoS - R@1,IoU=0.7

Natural Language Moment Retrieval

TACoS - mIoU

Robot Manipulation Generalization

GEMBench - Average Success Rate

Photo to Rest Generalization

PACS - Accuracy

Single-Source Domain Generalization

Digits-five - Accuracy

Visual Object Tracking

AVisT - Success Rate

Visual Object Tracking

OTB-2015 - Precision

Text-To-SQL

Spider - Exact Match Accuracy (Dev)

Text-To-SQL

Spider - Execution Accuracy (Dev)

Self-Supervised Human Action Recognition

NTU RGB+D 120 - xsub (%)

Self-Supervised Human Action Recognition

NTU RGB+D 120 - xset (%)

Molecular Property Prediction

BBBP - ROC-AUC

Molecular Property Prediction

ToxCast - ROC-AUC

Burst Image Super-Resolution

BurstSR - PSNR

Temporal Relation Extraction

Vinoground - Text Score

Math Word Problem Solving

MATH - Accuracy

3D Hand Pose Estimation

FreiHAND - PA-MPVPE

3D Hand Pose Estimation

FreiHAND - PA-F@5mm

3D Hand Pose Estimation

FreiHAND - PA-F@15mm

3D Hand Pose Estimation

FreiHAND - PA-MPJPE

Few-Shot Semantic Segmentation

COCO-20i -> Pascal VOC (1-shot) - Mean IoU

Image Super-Resolution

BSD100 - 2x upscaling - PSNR

Image Super-Resolution

BSD100 - 2x upscaling - SSIM

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - Sensitivity

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - S-measure

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - mean E-measure

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - mean F-measure

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - Dice

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - Sensitivity

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - S-measure

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - mean E-measure

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - mean F-measure

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - Dice

Multiple Object Tracking

SportsMOT - MOTA

Multiple Object Tracking

SportsMOT - DetA

Multi-Object Tracking

TAO - TETA

Multi-Object Tracking

TAO - LocA

Multi-Object Tracking

TAO - AssocA

Multi-Object Tracking

TAO - ClsA

Monocular Depth Estimation

ETH3D - Delta < 1.25

Image Dehazing

Haze4k - PSNR

Image Dehazing

O-Haze - PSNR

Image Dehazing

O-Haze - SSIM

Image Dehazing

SOTS Indoor - PSNR

Entity Resolution

WDC Products-80%cc-seen-medium - F1 (%)

3D Object Detection

nuScenes Camera-Radar - NDS

3D Object Detection

ARKitScenes - [email protected]

3D Object Detection

S3DIS - [email protected]

3D Object Detection

S3DIS - [email protected]

Zero-Shot Video Question Answer

NExT-GQA - Acc@GQA

3D Object Detection

ScanNetV2 - [email protected]

3D Object Detection

ScanNet++ - [email protected]

3D Object Detection

ScanNet++ - [email protected]

3D Object Detection

MultiScan - [email protected]

3D Object Detection

MultiScan - [email protected]

Crack Segmentation

CrackVision12K - mIoU

Multiview Detection

MultiviewX - MODA

Multiview Detection

MultiviewX - Recall

Facial Expression Recognition (FER)

RAF-DB - Overall Accuracy

Facial Expression Recognition (FER)

FER2013 - Accuracy

Facial Expression Recognition (FER)

AffectNet - Accuracy (7 emotion)

Video Anomaly Detection

HR-ShanghaiTech - AUC

Video Anomaly Detection

ShanghaiTech Campus - AUC

Speech Emotion Recognition

MSP-Podcast (Valence) - CCC

Speech Emotion Recognition

MSP-Podcast (Dominance) - CCC

Speech Emotion Recognition

MSP-Podcast (Activation) - CCC

Dynamic Link Prediction

DBLP Temporal - AUC

Dynamic Link Prediction

DBLP Temporal - AP

Point Cloud Registration

RotKITTI Registration Benchmark - RR@(1.5,0.3)

Point Cloud Registration

RotKITTI Registration Benchmark - RR@(1,0.1)

Time Series Forecasting

ETTm1 (96) Multivariate - MSE

Time Series Forecasting

Electricity (96) - MSE

Image Segmentation

MSD (Mirror Segmentation Dataset) - MAE

Image Segmentation

MSD (Mirror Segmentation Dataset) - IoU

Image Segmentation

MSD (Mirror Segmentation Dataset) - F-measure

Image Segmentation

PMD - MAE

Image Segmentation

PMD - IoU

Image Segmentation

PMD - F-measure

Image Segmentation

RMAS - S-measure

Image Segmentation

MAS3K - S-measure

Image Segmentation

MAS3K - mIoU

Image Segmentation

MAS3K - E-measure

Image Segmentation

MAS3K - MAE

Speech Synthesis

LibriTTS - Periodicity

RGB-T Tracking

RGBT210 - Precision

RGB-T Tracking

GTOT - Precision

RGB-T Tracking

GTOT - Success

Text-To-SQL

BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) - Execution Accuracy % (Test)

Text-To-SQL

BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) - Execution Accuracy % (Dev)

3D Object Detection

ScanNetV2 - [email protected]

Unsupervised Semantic Segmentation with Language-image Pre-training

Cityscapes val - mIoU

Unsupervised Semantic Segmentation with Language-image Pre-training

COCO-Stuff-171 - mIoU

Unsupervised Semantic Segmentation with Language-image Pre-training

PASCAL Context-59 - mIoU

Unsupervised Semantic Segmentation with Language-image Pre-training

COCO-Object - mIoU

Temporal Relation Extraction

Vinoground - Video Score

Temporal Relation Extraction

Vinoground - Group Score

Zero-Shot Video Question Answer

Video-MME (w/o subs) - Accuracy (%)

Thermal Image Segmentation

MFN Dataset - mIoU

Video Object Detection

ImageNet VID - mAP

Molecular Property Prediction

FreeSolv - RMSE

Monocular Depth Estimation

NYU-Depth V2 - absolute relative error

Motion Synthesis

AIOZ-GDANCE - FID

Motion Synthesis

AIOZ-GDANCE - MMC

Motion Synthesis

AIOZ-GDANCE - GMC

3D Object Detection

nuScenes LiDAR only - NDS

3D Object Detection

nuScenes LiDAR only - mAP

3D Object Detection

nuScenes LiDAR only - NDS (val)

3D Object Detection

nuScenes LiDAR only - mAP (val)

Video-based Generative Performance Benchmarking (Consistency)

VideoInstruct - gpt-score

Point Tracking

TAP-Vid-DAVIS - Average Jaccard

Point Tracking

TAP-Vid-DAVIS - Occlusion Accuracy

Head Pose Estimation

BIWI - MAE (trained with other data)

Temporal Action Localization

HACS - Average-mAP

Temporal Action Localization

HACS - [email protected]

Temporal Action Localization

HACS - [email protected]

Temporal Action Localization

HACS - [email protected]

Low-Light Image Enhancement

LOLv2-synthetic - Average PSNR

Low-Light Image Enhancement

LOLv2-synthetic - SSIM

Low-Light Image Enhancement

LOLv2 - Average PSNR

Object Detection

PKU-DDD17-Car - mAP50

3D Semantic Scene Completion from a single RGB image

NYUv2 - mIoU

Overlapped 100-10

ADE20K - Mean IoU (test)

Video Quality Assessment

LIVE-VQC - PLCC

Unsupervised Video Object Segmentation

FBMS test - J

Open Vocabulary Object Detection

LVIS v1.0 - AP novel-LVIS base training

Facial Action Unit Detection

DISFA - Average F1

Object Detection

CrowdHuman (full body) - mMR

Object Detection

InOutDoor - AP

Object Detection

EventPed - AP

Object Detection

STCrowd - AP

Story Visualization

Pororo - FID

Domain Generalization

GTA5-to-Cityscapes - mIoU

Domain Generalization

GTA-to-Avg(Cityscapes,BDD,Mapillary) - mIoU

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (New Days) All

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (VISOR) All

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (NewDays) Visible

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (VISOR) Visible

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (NewDays) Occ

3D Hand Pose Estimation

HO-3D v3 - PA-MPJPE

3D Hand Pose Estimation

HO-3D v3 - PA-MPVPE

3D Hand Pose Estimation

HO-3D v3 - F@5mm

3D Hand Pose Estimation

HO-3D v3 - F@15mm

3D Hand Pose Estimation

HO-3D v3 - AUC_J

3D Hand Pose Estimation

HO-3D v3 - AUC_V

Cross-modal retrieval with noisy correspondence

CC152K - Image-to-text R@1

Cross-modal retrieval with noisy correspondence

CC152K - Text-to-image R@5

Cross-modal retrieval with noisy correspondence

COCO-Noisy - Image-to-text R@10

Robot Manipulation Generalization

The COLOSSEUM - Average decrease average across all perturbations

Video Polyp Segmentation

SUN-SEG-Easy - Dice

Video Polyp Segmentation

SUN-SEG-Hard - Dice

Monocular Depth Estimation

KITTI Eigen split unsupervised - RMSE log

Monocular Depth Estimation

KITTI Eigen split unsupervised - Delta < 1.25

Monocular Depth Estimation

KITTI Eigen split unsupervised - Delta < 1.25^2

Head Pose Estimation

BIWI - Geodesic Error (GE)

Head Pose Estimation

BIWI - MAE-aligned (trained with other data)

Head Pose Estimation

BIWI - Geodesic Error - aligned (GE)

Video Panoptic Segmentation

VIPSeg - VPQ

Object Detection In Aerial Images

HRSC2016 - mAP-07

Object Detection In Aerial Images

HRSC2016 - mAP-12

Video Frame Interpolation

SNU-FILM (easy) - SSIM

Video Frame Interpolation

Xiph-4K - SSIM

Video Frame Interpolation

Xiph-2K - PSNR

Video Frame Interpolation

X4K1000FPS-2K - PSNR

Video Frame Interpolation

X4K1000FPS-2K - SSIM

Few-Shot Semantic Segmentation

COCO-20i (2-way 1-shot) - mIoU

Self-supervised Scene Flow Estimation

Argoverse 2 - EPE 3-Way

Self-supervised Scene Flow Estimation

Argoverse 2 - EPE Background Static

Style Transfer

StyleBench - CLIP Score

Zero-Shot Video Question Answer

NExT-QA - Accuracy

Zero-Shot Video Question Answer

ActivityNet-QA - Accuracy

Zero-Shot Video Question Answer

TGIF-QA - Accuracy

Zero-Shot Video Question Answer

TGIF-QA - Confidence Score

Zero-Shot Video Question Answer

MSRVTT-QA - Confidence Score

Zero-Shot Video Question Answer

EgoSchema (subset) - Accuracy

Single Image Desnowing

CSD - Average PSNR (dB)

Referring Expression Segmentation

RefCOCO testB - Overall IoU

Referring Expression Segmentation

RefCOCO+ testA - Overall IoU

Referring Expression Segmentation

RefCOCO+ val - Overall IoU

Referring Expression Segmentation

RefCOCO testA - Overall IoU

Referring Expression Segmentation

RefCOCO+ testB - Overall IoU

Image Dehazing

I-Haze - PSNR

Saliency Prediction

SALICON - AUC

Saliency Prediction

SALICON - KLD

Saliency Prediction

SALECI - KL

Skeleton Based Action Recognition

First-Person Hand Action Benchmark - 1:1 Accuracy

Hand Gesture Recognition

SHREC 2017 - 14 Gestures Accuracy