Referring Video Object Segmentation

ReVOS - J

Referring Video Object Segmentation

ReVOS - F

Referring Video Object Segmentation

ReVOS - J&F

Referring Video Object Segmentation

ReVOS - R

Referring Video Object Segmentation

MeViS - J&F

Referring Video Object Segmentation

MeViS - J

Referring Video Object Segmentation

MeViS - F

Referring Expression Segmentation

Refer-YouTube-VOS (2021 public validation) - J&F

Referring Expression Segmentation

Refer-YouTube-VOS (2021 public validation) - J

Referring Expression Segmentation

Refer-YouTube-VOS (2021 public validation) - F

Referring Expression Segmentation

RefCOCO - IoU

Multi-Person Pose Estimation

PoseTrack2018 - Mean mAP

RGB Salient Object Detection

ECSSD - mean F-Measure

RGB Salient Object Detection

HKU-IS - mean F-Measure

RGB Salient Object Detection

PASCAL-S - mean F-Measure

Anomaly Detection In Surveillance Videos

XD-Violence - AP

Unsupervised Action Segmentation

YouTube INRIA Instructional - Recall

Unsupervised Action Segmentation

IKEA ASM - Accuracy

Unsupervised Action Segmentation

IKEA ASM - F1

Unsupervised Action Segmentation

IKEA ASM - Recall

Time Series Forecasting

Weather (336) - MSE

Time Series Forecasting

Weather (336) - MAE

Time Series Forecasting

ETTm1 (96) Multivariate - MSE

Time Series Forecasting

ETTm1 (192) Multivariate - MSE

Time Series Forecasting

ETTm2 (192) Multivariate - MAE

Time Series Forecasting

ETTm2 (96) Multivariate - MSE

Time Series Forecasting

ETTm2 (96) Multivariate - MAE

Time Series Forecasting

ETTh2 (192) Multivariate - MAE

Time Series Forecasting

ETTh2 (96) Multivariate - MAE

Time Series Forecasting

ETTm2 (336) Multivariate - MAE

Time Series Forecasting

ETTh2 (336) Multivariate - MSE

Time Series Forecasting

ETTh2 (336) Multivariate - MAE

Time Series Forecasting

ETTm2 (720) Multivariate - MSE

Time Series Forecasting

ETTm2 (720) Multivariate - MAE

Skeleton Based Action Recognition

JHMDB (2D poses only) - Average accuracy of 3 splits

Image Rescaling

DIV2K val-q50-4x - PSNR

Image Rescaling

DIV2K val-q50-4x - SSIM

Image Rescaling

DIV2K val-q70-2x - PSNR

Image Rescaling

DIV2K val-q70-2x - SSIM

Image Rescaling

DIV2K val-q30-2x - PSNR

Image Rescaling

DIV2K val-q30-2x - SSIM

Image Rescaling

DIV2K val-q90-2x - PSNR

Image Rescaling

DIV2K val-q90-2x - SSIM

Image Rescaling

DIV2K val-q30-4x - PSNR

Image Rescaling

DIV2K val-q30-4x - SSIM

Image Rescaling

DIV2K val-q50-2x - PSNR

Image Rescaling

DIV2K val-q50-2x - SSIM

Image Rescaling

DIV2K val-q70-4x - PSNR

Image Rescaling

DIV2K val-q70-4x - SSIM

Image Rescaling

DIV2K val-q90-4x - PSNR

Image Rescaling

DIV2K val-q90-4x - SSIM

3D Hand Pose Estimation

HO-3D v3 - F@15mm

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (Ego4D) All

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (VISOR) Visible

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (Ego4D) Visible

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (NewDays) Occ

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (VISOR) Occ

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (Ego4D) Occ

CAD Reconstruction

DeepCAD - Chamfer Distance

CAD Reconstruction

DeepCAD - Chamfer Distance (median)

CAD Reconstruction

DeepCAD - IoU

CAD Reconstruction

Fusion 360 Gallery - Chamfer Distance

CAD Reconstruction

Fusion 360 Gallery - Chamfer Distance (median)

CAD Reconstruction

Fusion 360 Gallery - IoU

CAD Reconstruction

CC3D - Chamfer Distance (median)

Multi-modal Recommendation

Amazon Baby - NDCG@20

Multi-modal Recommendation

Amazon Clothing - NDCG@20

Multi-modal Recommendation

Amazon Sports - NDCG@20

Learning with noisy labels

CIFAR-10N-Random3 - Accuracy (mean)

Learning with noisy labels

CIFAR-10N-Random2 - Accuracy (mean)

Video Retrieval

VATEX - text-to-video R@1

Video Retrieval

VATEX - text-to-video R@10

Video Retrieval

VATEX - video-to-text R@10

Video Object Tracking

NT-VOT211 - AUC

Visual Object Tracking

TrackingNet - Precision

Visual Object Tracking

TrackingNet - Normalized Precision

Visual Object Tracking

TrackingNet - Accuracy

Visual Object Tracking

TNL2K - AUC

Video Inpainting

HQVI (240p) - PSNR

Video Inpainting

HQVI (240p) - SSIM

Video Inpainting

HQVI (240p) - LPIPS

Video Inpainting

HQVI (240p) - VFID

Video Inpainting

HQVI (480p) - PSNR

Video Inpainting

HQVI (480p) - LPIPS

Video Inpainting

HQVI (480p) - VFID

Text-based Image Editing

PIE-Bench - CLIPSIM

Action Classification

WiGesture - Accuracy (%)

Zero-Shot Video Question Answer

MSRVTT-QA - Confidence Score

Zero-Shot Video Question Answer

MSVD-QA - Confidence Score

Zero-Shot Video Question Answer

EgoSchema (fullset) - Accuracy

3D Hand Pose Estimation

FreiHAND - PA-MPJPE

Image Super-Resolution

DIV2K val - 4x upscaling - LPIPS

Image Super-Resolution

DIV2K val - 4x upscaling - DISTS

Image Super-Resolution

Urban100 - 4x upscaling - LPIPS

Motion Synthesis

HumanML3D - FID

Omnidirectional Stereo Depth Estimation

Helvipad - Depth-MAE

Image Super-Resolution

Manga109 - 3x upscaling - PSNR

Image Super-Resolution

Manga109 - 3x upscaling - SSIM

Image Super-Resolution

Set14 - 2x upscaling - SSIM

Image Super-Resolution

Set14 - 3x upscaling - PSNR

Image Super-Resolution

Set14 - 3x upscaling - SSIM

Image Super-Resolution

Set5 - 2x upscaling - SSIM

Image Super-Resolution

Manga109 - 4x upscaling - SSIM

Image Super-Resolution

Manga109 - 2x upscaling - PSNR

Image Super-Resolution

Manga109 - 2x upscaling - SSIM

Image Super-Resolution

Set5 - 4x upscaling - SSIM

Image Super-Resolution

Set5 - 3x upscaling - SSIM

Image Super-Resolution

Urban100 - 4x upscaling - PSNR

Image Super-Resolution

Urban100 - 3x upscaling - PSNR

Image Super-Resolution

BSD100 - 3x upscaling - PSNR

Image Super-Resolution

BSD100 - 3x upscaling - SSIM

JPEG Artifact Correction

LIVE1 (Quality 20 Color) - PSNR

JPEG Artifact Correction

Classic5 (Quality 20 Grayscale) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 30 Grayscale) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 10 Grayscale) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 20 Grayscale) - PSNR

JPEG Artifact Correction

Classic5 (Quality 30 Grayscale) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 40 Grayscale) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 40 Grayscale) - SSIM

JPEG Artifact Correction

Classic5 (Quality 10 Grayscale) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 30 Color) - PSNR

JPEG Artifact Correction

Classic5 (Quality 40 Grayscale) - PSNR

JPEG Artifact Correction

Classic5 (Quality 40 Grayscale) - SSIM

JPEG Artifact Correction

LIVE1 (Quality 10 Color) - PSNR

5-Degradation Blind All-in-One Image Restoration

5-Degradation Blind All-in-One Image Restoration - Average PSNR

Grayscale Image Denoising

Urban100 sigma50 - PSNR

Grayscale Image Denoising

Urban100 sigma25 - PSNR

Grayscale Image Denoising

Set12 sigma15 - PSNR

Semi-Supervised Video Object Segmentation

VOT2020 - EAO

Visual Object Tracking

DiDi - Tracking quality

Visual Object Tracking

VOT2022 - EAO

Referring Expression Segmentation

RefCOCO+ testA - Overall IoU

Referring Expression Segmentation

RefCOCO testA - Overall IoU

Referring Expression Segmentation

RefCOCO testB - Overall IoU

Referring Expression Segmentation

RefCOCO val - Overall IoU

Weakly Supervised Action Localization

ActivityNet-1.2 - Mean mAP

Zero-Shot Composed Image Retrieval (ZS-CIR)

CIRR - R@5

Video Salient Object Detection

FBMS-59 - S-Measure

Video Salient Object Detection

FBMS-59 - max F-Measure

Video Salient Object Detection

DAVIS-2016 - S-Measure

Video Salient Object Detection

DAVIS-2016 - Average MAE

Video Salient Object Detection

DAVIS-2016 - max F-Measure

Video Salient Object Detection

ViSal - S-Measure

Video Salient Object Detection

ViSal - Average MAE

Video Salient Object Detection

DAVSOD-easy35 - S-Measure

Video Salient Object Detection

DAVSOD-easy35 - Average MAE

Video Salient Object Detection

DAVSOD-easy35 - max F-Measure

Shadow Removal

ISTD+ - RMSE

Action Recognition

Diving-48 - Accuracy

Zero-Shot Video Question Answer

Video-MME (w/o subs) - Accuracy (%)

Image-Based Localization

CVACT - Recall@1

Image-Based Localization

CVACT - Recall@5

Image-Based Localization

CVACT - Recall@10

Image-Based Localization

VIGOR Cross Area - Recall@1

Image-Based Localization

VIGOR Cross Area - Recall@5

Image-Based Localization

VIGOR Cross Area - Recall@10

Image-Based Localization

VIGOR Cross Area - Recall@1%

Image-Based Localization

VIGOR Cross Area - Hit Rate

Image-Based Localization

VIGOR Same Area - Recall@1

Image-Based Localization

VIGOR Same Area - Recall@5

Image-Based Localization

VIGOR Same Area - Recall@10

Image-Based Localization

VIGOR Same Area - Hit Rate

Drone-view target localization

University-1652 - Recall@1

Visual Object Tracking

LaSOT-ext - AUC

Visual Object Tracking

LaSOT-ext - Normalized Precision

Visual Object Tracking

LaSOT-ext - Precision

Visual Object Tracking

NeedForSpeed - AUC

Unsupervised Semantic Segmentation with Language-image Pre-training

Cityscapes val - mIoU

Unsupervised Semantic Segmentation with Language-image Pre-training

COCO-Stuff-171 - mIoU

Unsupervised Semantic Segmentation with Language-image Pre-training

PASCAL Context-59 - mIoU

Unsupervised Semantic Segmentation with Language-image Pre-training

PASCAL VOC - mIoU

Unsupervised Semantic Segmentation with Language-image Pre-training

COCO-Object - mIoU

Unsupervised Semantic Segmentation with Language-image Pre-training

PascalVOC-20 - mIoU

Text-To-SQL

Spider - Execution Accuracy (Test)

Text-To-SQL

BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) - Execution Accuracy % (Test)

Text-To-SQL

BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) - Execution Accuracy % (Dev)

Multiple Object Tracking

SportsMOT - HOTA

Multiple Object Tracking

SportsMOT - IDF1

Multiple Object Tracking

SportsMOT - AssA

Category-Agnostic Pose Estimation

MP100 - Mean [email protected] - 1shot

Video-based Generative Performance Benchmarking (Consistency)

VideoInstruct - gpt-score

Cross-Modal Retrieval

ChEBI-20 - Test MRR

Cross-Modal Retrieval

ChEBI-20 - Hits@1

3D Instance Segmentation

ScanNet(v2) - mRec

Cross-Modal Retrieval

ChEBI-20 - Mean Rank

Generalizable Person Re-identification

Market-1501 - MSMT17-All->mAP

Generalizable Person Re-identification

Market-1501 - MSMT17-All->Rank-1

Generalizable Person Re-identification

Market-1501 - RandPerson->mAP

Generalizable Person Re-identification

Market-1501 - RandPerson->Rank-1

Generalizable Person Re-identification

Market-1501 - MSMT17->mAP

Generalizable Person Re-identification

Market-1501 - MSMT17->Rank-1

Generalizable Person Re-identification

CUHK03-NP (detected) - MSMT17-All->mAP

Generalizable Person Re-identification

CUHK03-NP (detected) - MSMT17-All->Rank-1

Generalizable Person Re-identification

CUHK03-NP (detected) - MSMT17->mAP

Generalizable Person Re-identification

CUHK03-NP (detected) - MSMT17->Rank-1

FS-MEVQA

SME - BLEU-4

FS-MEVQA

SME - METEOR

FS-MEVQA

SME - ROUGE-L

FS-MEVQA

SME - CIDEr

FS-MEVQA

SME - SPICE

FS-MEVQA

SME - Detection

FS-MEVQA

SME - ACC

Zero-Shot Composed Image Retrieval (ZS-CIR)

CIRCO - mAP@10

Motion Synthesis

FineDance - fid_k

Motion Synthesis

FineDance - BAS

Zero-Shot Video Question Answer

Video-MME - Accuracy (%)

Time Series Forecasting

ETTm1 (336) Multivariate - MSE

Time Series Forecasting

ETTm1 (720) Multivariate - MSE

Time Series Forecasting

ETTm1 (720) Multivariate - MAE

Time Series Forecasting

ETTm2 (336) Multivariate - MSE

Conditional Image Generation

ImageNet 256x256 - FID

Video Frame Interpolation

X4K1000FPS - PSNR

RGB-T Tracking

RGBT210 - Success

Text-based Image Editing

PIE-Bench - Background LPIPS

Semi-Supervised Semantic Segmentation

ADE20K 1/32 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

COCO 1/256 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 92 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

COCO 1/64 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 183 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

ADE20K 1/16 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 1464 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

COCO 1/128 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

Cityscapes 6.25% labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 732 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

COCO 1/32 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 366 labeled - Validation mIoU

Semi-supervised Change Detection

LEVIR-CD - 5% labeled data - IoU

Semi-supervised Change Detection

LEVIR-CD - 20% labeled data - IoU

Semi-supervised Change Detection

LEVIR-CD - 10% labeled data - IoU

Semi-supervised Change Detection

WHU - 40% labeled data - IoU

Semi-supervised Change Detection

WHU - 20% labeled data - IoU

Low-Light Image Enhancement

LIME - NIQE

Low-Light Image Enhancement

MEF - NIQE

Low-Light Image Enhancement

DICM - NIQE

Low-Light Image Enhancement

NPE - NIQE

Low-Light Image Enhancement

VV - NIQE

Motion Synthesis

InterHuman - FID

Motion Synthesis

Inter-X - FID

Motion Synthesis

Inter-X - R-Precision Top3

Motion Synthesis

Inter-X - MMDist

Math Word Problem Solving

SVAMP - Execution Accuracy