Temporal Action Localization

HACS - mAP@…

Video Panoptic Segmentation

VIPSeg - VPQ

Object Detection In Aerial Images

HRSC2016 - mAP-07

Object Detection In Aerial Images

HRSC2016 - mAP-12

Object Detection

GRAZPEDWRI-DX - mAP

Video Frame Interpolation

Xiph-2K - PSNR
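Many of the restoration and interpolation entries in this list report PSNR, defined as 10·log10(peak² / MSE) between a reconstructed frame and its reference. A minimal sketch, assuming pixel values arrive as flat sequences with an 8-bit peak of 255; official evaluation scripts may also convert to the Y channel or crop image borders before scoring.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(peak ** 2 / mse)

print(psnr([0, 128, 255, 64], [2, 126, 255, 60]))
```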

Video Frame Interpolation

Xiph-4K - SSIM

Video Frame Interpolation

SNU-FILM (easy) - SSIM

Video Frame Interpolation

X4K1000FPS-2K - PSNR

Video Frame Interpolation

X4K1000FPS-2K - SSIM

Few-Shot Semantic Segmentation

COCO-20i (2-way 1-shot) - mIoU
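The mIoU entries average intersection-over-union across classes. A minimal sketch over flattened per-pixel label lists, assuming classes absent from both prediction and ground truth are skipped; few-shot protocols such as COCO-20i typically accumulate intersections and unions over the whole test set and across folds, which this sketch leaves to the caller.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union over classes present in either labelling."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```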

Zero-Shot Video Question Answer

ActivityNet-QA - Accuracy

Zero-Shot Video Question Answer

EgoSchema (fullset) - Accuracy

Zero-Shot Video Question Answer

EgoSchema (subset) - Accuracy

Zero-Shot Video Question Answer

TGIF-QA - Accuracy

Zero-Shot Video Question Answer

TGIF-QA - Confidence Score

Zero-Shot Video Question Answer

MSRVTT-QA - Confidence Score

Zero-Shot Video Question Answer

NExT-QA - Accuracy

Single Image Desnowing

CSD - Average PSNR (dB)

Self-supervised Scene Flow Estimation

Argoverse 2 - EPE 3-Way

Self-supervised Scene Flow Estimation

Argoverse 2 - EPE Foreground Static

Self-supervised Scene Flow Estimation

Argoverse 2 - EPE Background Static
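End-point error (EPE) is the mean L2 norm of the difference between predicted and ground-truth 3D flow vectors. As far as we understand the Argoverse 2 scene flow protocol, "EPE 3-Way" is the unweighted mean of EPE computed separately on foreground-dynamic, foreground-static and background-static points. The sketch below assumes points are already assigned to those buckets; the bucket label strings are hypothetical, and the rules that decide whether a point counts as dynamic are not reproduced.

```python
import math

def epe(pred, gt):
    """Mean end-point error: average L2 norm of per-point 3D flow differences."""
    errs = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(errs) / len(errs)

def three_way_epe(pred, gt, buckets):
    """Unweighted mean of per-bucket EPE; buckets with no points are skipped."""
    per_bucket = {}
    for label in ("foreground_dynamic", "foreground_static", "background_static"):
        idx = [i for i, b in enumerate(buckets) if b == label]
        if idx:
            per_bucket[label] = epe([pred[i] for i in idx], [gt[i] for i in idx])
    return sum(per_bucket.values()) / len(per_bucket), per_bucket
```

Averaging buckets rather than points keeps the large number of static background points from dominating the score.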

Image Dehazing

SOTS Indoor - PSNR

Image Dehazing

O-Haze - PSNR

Image Dehazing

I-Haze - PSNR

Saliency Prediction

SALECI - KL

Saliency Prediction

SALICON - AUC

Saliency Prediction

SALICON - KLD
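The KL/KLD saliency entries measure how far a predicted saliency map diverges from the ground-truth fixation density (lower is better). A minimal sketch following the commonly used formulation sum G·log(eps + G / (S + eps)) after normalizing both maps to sum to one; the epsilon value is an assumption and benchmark code may use a different one.

```python
import math

def saliency_kld(gt_map, pred_map, eps=1e-7):
    """KL divergence of a predicted saliency map from the ground-truth density."""
    g_sum, p_sum = sum(gt_map), sum(pred_map)
    g = [v / g_sum for v in gt_map]   # normalize ground truth to a distribution
    p = [v / p_sum for v in pred_map]  # normalize prediction to a distribution
    return sum(gi * math.log(eps + gi / (pi + eps)) for gi, pi in zip(g, p))
```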

Skeleton Based Action Recognition

First-Person Hand Action Benchmark - 1:1 Accuracy

Hand Gesture Recognition

SHREC 2017 - 14 Gestures Accuracy

Hand Gesture Recognition

SHREC 2017 - 28 Gestures Accuracy

Hand Gesture Recognition

DHG-14 - Accuracy

Hand Gesture Recognition

DHG-28 - Accuracy

Few-Shot Learning

DTD - 8-shot Accuracy

Few-Shot Learning

DTD - 4-shot Accuracy

Few-Shot Learning

DTD - 16-shot Accuracy

Mitigating Contextual Bias

FGVC Aircraft - Top-1 Accuracy (%)

Mitigating Contextual Bias

FGVC Aircraft - OOD Accuracy (%)

Video Super-Resolution

REDS4 - 4x upscaling - PSNR

Video Super-Resolution

REDS4 - 4x upscaling - SSIM

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - Sensitivity

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - S-Measure

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - mean E-measure

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - weighted F-measure

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - mean F-measure

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - Dice

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - Sensitivity

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - S-Measure

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - mean E-measure

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - weighted F-measure

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - mean F-measure

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - Dice
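Two of the simpler polyp-segmentation metrics above, Dice and Sensitivity, can be computed directly from binary masks. A minimal sketch over flattened masks, assuming the per-frame scores are averaged over a video by the caller; the small epsilon (an assumption) avoids division by zero on empty masks.

```python
def dice(gt_mask, pred_mask, eps=1e-8):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks as flat sequences."""
    inter = sum(1 for g, p in zip(gt_mask, pred_mask) if g and p)
    total = sum(1 for g in gt_mask if g) + sum(1 for p in pred_mask if p)
    return (2.0 * inter + eps) / (total + eps)  # two empty masks score 1.0

def sensitivity(gt_mask, pred_mask, eps=1e-8):
    """True-positive rate TP / (TP + FN): share of ground-truth pixels that were found."""
    tp = sum(1 for g, p in zip(gt_mask, pred_mask) if g and p)
    positives = sum(1 for g in gt_mask if g)
    return tp / (positives + eps)
```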

Lipreading

Lip Reading in the Wild - Top-1 Accuracy

Multiple Object Tracking

SportsMOT - HOTA

Multiple Object Tracking

SportsMOT - IDF1

Multiple Object Tracking

SportsMOT - AssA

Visual Question Answering

MMBench - GPT-3.5 score

Zero-Shot Video Question Answer

IntentQA - Accuracy

Video-based Generative Performance Benchmarking (Consistency)

VideoInstruct - gpt-score

Zero-Shot Composed Image Retrieval (ZS-CIR)

Fashion IQ - (Recall@10+Recall@50)/2
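The Fashion IQ (and ImageNet-R) composed-retrieval score is the mean of Recall@10 and Recall@50, where Recall@K is the percentage of queries whose target appears among the top K results. A minimal sketch, assuming each query has a single target ID; Fashion IQ results are usually computed per garment category and then averaged, which is not shown here.

```python
def recall_at_k(ranked_ids, target_ids, k):
    """Percentage of queries whose ground-truth target is in the top-k ranked results."""
    hits = sum(1 for ranking, target in zip(ranked_ids, target_ids) if target in ranking[:k])
    return 100.0 * hits / len(target_ids)

def fashion_iq_score(ranked_ids, target_ids):
    """(Recall@10 + Recall@50) / 2."""
    return (recall_at_k(ranked_ids, target_ids, 10) + recall_at_k(ranked_ids, target_ids, 50)) / 2
```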

3D Lane Detection

Apollo Synthetic 3D Lane - F1

Zero-Shot Video Question Answer

MSVD-QA - Accuracy

Zero-Shot Video Question Answer

MSRVTT-QA - Accuracy

Audio Classification

ICBHI Respiratory Sound Database - ICBHI Score
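The ICBHI score is the mean of sensitivity and specificity. A minimal sketch, assuming the four-class setting (normal, crackle, wheeze, both): specificity is the accuracy on normal cycles and sensitivity is the exact-class accuracy on the three abnormal classes. Some papers use a two-class variant, which this sketch does not cover.

```python
def icbhi_score(y_true, y_pred, normal_label="normal"):
    """Return (score, sensitivity, specificity) for respiratory-cycle labels."""
    normal = [(t, p) for t, p in zip(y_true, y_pred) if t == normal_label]
    abnormal = [(t, p) for t, p in zip(y_true, y_pred) if t != normal_label]
    sp = sum(t == p for t, p in normal) / len(normal)      # specificity
    se = sum(t == p for t, p in abnormal) / len(abnormal)  # sensitivity, exact class
    return (se + sp) / 2, se, sp
```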

Object Detection

AI-TOD - AP

Object Detection

AI-TOD - AP50

Object Detection

AI-TOD - AP75

Object Detection

AI-TOD - APvt

Object Detection

AI-TOD - APt

Object Detection

AI-TOD - APs

Unsupervised Domain Adaptation

Market to Duke - mAP

Unsupervised Domain Adaptation

Market to Duke - rank-5

Unsupervised Domain Adaptation

Market to Duke - rank-10

Unsupervised Domain Adaptation

Duke to MSMT - mAP

Unsupervised Domain Adaptation

Duke to MSMT - rank-1

Unsupervised Domain Adaptation

Duke to MSMT - rank-10

Unsupervised Domain Adaptation

Duke to MSMT - rank-5

Unsupervised Domain Adaptation

Market to MSMT - mAP

Unsupervised Domain Adaptation

Market to MSMT - rank-1

Unsupervised Domain Adaptation

Market to MSMT - rank-10

Unsupervised Domain Adaptation

Market to MSMT - rank-5

Unsupervised Domain Adaptation

Duke to Market - mAP

Unsupervised Domain Adaptation

Duke to Market - rank-1

Unsupervised Domain Adaptation

Duke to Market - rank-5

Unsupervised Domain Adaptation

Duke to Market - rank-10

Few Shot Action Recognition

Kinetics-100 - Accuracy

Unsupervised Anomaly Detection

SMAP - F1

Single-View 3D Reconstruction

GSO - Chamfer Distance
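Chamfer Distance compares two point sets through nearest-neighbor distances in both directions. Conventions differ between papers (squared or plain distances, summing or averaging the two directions, scaling factors), so this minimal sketch, which sums the two directional means of squared distances by default, is an assumption rather than the GSO protocol. It uses brute-force O(N·M) search, which is fine only for small clouds.

```python
import math

def chamfer_distance(a, b, squared=True):
    """Sum of mean nearest-neighbor distances from a to b and from b to a."""
    def nn(p, cloud):
        d = min(math.dist(p, q) for q in cloud)  # brute-force nearest neighbor
        return d * d if squared else d
    a_to_b = sum(nn(p, b) for p in a) / len(a)
    b_to_a = sum(nn(q, a) for q in b) / len(b)
    return a_to_b + b_to_a
```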

Action Anticipation

EPIC-KITCHENS-100 - Recall@5

Source-Free Domain Adaptation

VisDA-2017 - Accuracy

Visual Question Answering

MM-Vet - GPT-4 score

Cross-Modal Retrieval with Noisy Correspondence

CC152K - Image-to-text R@10

Cross-Modal Retrieval with Noisy Correspondence

CC152K - R-Sum

Generalized Referring Expression Segmentation

gRefCOCO - gIoU

Generalized Referring Expression Segmentation

gRefCOCO - cIoU

3D Semantic Scene Completion from a Single RGB Image

KITTI-360 - mIoU

3D Semantic Scene Completion from a Single RGB Image

SemanticKITTI - mIoU

Multiple Object Tracking

KITTI Tracking test - MOTA

Multiple Object Tracking

KITTI Tracking test - HOTA

Few-Shot Object Detection

ODinW-13 - Average Score

Few-Shot Object Detection

ODinW-35 - Average Score

Zero-Shot Object Detection

ODinW - Average Score

Zero-Shot Object Detection

LVIS v1.0 minival - AP

Object Detection

ODinW Full-Shot 13 Tasks - AP

Crowd Counting

ShanghaiTech A - MAE

Crowd Counting

ShanghaiTech A - RMSE

Crowd Counting

ShanghaiTech B - MAE
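Crowd-counting MAE and RMSE are computed on per-image counts, not per-pixel density values. A minimal sketch, assuming one predicted and one ground-truth count per image; note that many crowd-counting papers report the root-mean-square error under the label "MSE", so an "MSE" column may in fact hold RMSE values.

```python
import math

def counting_errors(true_counts, pred_counts):
    """Return (MAE, RMSE) over per-image crowd counts."""
    diffs = [p - t for t, p in zip(true_counts, pred_counts)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse
```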

Time Series Forecasting

ETTm2 (720) Multivariate - MSE

Time Series Forecasting

ETTm2 (192) Multivariate - MSE

Time Series Forecasting

ETTm2 (96) Multivariate - MSE

Time Series Forecasting

ETTm1 (192) Multivariate - MSE

Time Series Forecasting

ETTm1 (720) Multivariate - MSE

Time Series Forecasting

ETTm1 (336) Multivariate - MSE

Time Series Forecasting

ETTm1 (96) Multivariate - MSE

Time Series Forecasting

ETTm2 (336) Multivariate - MSE

Visual Instruction Following

LLaVA-Bench - avg score

Semi-supervised Change Detection

LEVIR-CD - 5% labeled data - IoU

Semi-supervised Change Detection

WHU - 10% labeled data - IoU

Semi-supervised Change Detection

WHU - 5% labeled data - IoU

Crowd Counting

ShanghaiTech A - MSE

Zero-Shot Composed Image Retrieval (ZS-CIR)

MS COCO - Actions Recall@5

Zero-Shot Composed Image Retrieval (ZS-CIR)

ImageNet-R - (Recall@10+Recall@50)/2

RGB-T Tracking

GTOT - Success

RGB-T Tracking

RGBT210 - Precision

RGB-T Tracking

RGBT210 - Success

Generative 3D Object Classification

Objaverse - Objaverse (I)

Generative 3D Object Classification

Objaverse - Objaverse (Average)

Generative 3D Object Classification

Objaverse - Objaverse (C)

RGB-T Tracking

GTOT - Precision

Audio Classification

SSC - Accuracy

Audio Classification

SHD - Percentage correct

Classify Murmurs

CirCor DigiScope - Weighted Accuracy

Zero-Shot Video Question Answer

MSVD-QA - Confidence Score

Zero-Shot Video Question Answer

ActivityNet-QA - Confidence Score

Few-Shot 3D Point Cloud Classification

ModelNet40 10-way (10-shot) - Overall Accuracy

Skeleton Based Action Recognition

UAV-Human - CSv1 (%)

Skeleton Based Action Recognition

UAV-Human - CSv2 (%)

Semi-supervised Change Detection

WHU - 40% labeled data - IoU

Semi-supervised Change Detection

WHU - 20% labeled data - IoU

Semi-supervised Change Detection

LEVIR-CD - 20% labeled data - IoU

Semi-supervised Change Detection

LEVIR-CD - 10% labeled data - IoU

Semi-supervised Change Detection

LEVIR-CD - 40% labeled data - IoU

Shadow Removal

Adjusted ISTD - RMSE

Dichotomous Image Segmentation

DIS-TE1 - max F-Measure

Dichotomous Image Segmentation

DIS-TE1 - weighted F-measure

Dichotomous Image Segmentation

DIS-TE1 - MAE

Dichotomous Image Segmentation

DIS-TE1 - E-measure

Dichotomous Image Segmentation

DIS-TE2 - max F-Measure

Dichotomous Image Segmentation

DIS-TE2 - weighted F-measure

Dichotomous Image Segmentation

DIS-TE2 - MAE

Dichotomous Image Segmentation

DIS-TE2 - S-Measure

Dichotomous Image Segmentation

DIS-TE2 - E-measure

Dichotomous Image Segmentation

DIS-TE3 - max F-Measure

Dichotomous Image Segmentation

DIS-TE3 - S-Measure

Dichotomous Image Segmentation

DIS-TE3 - E-measure

Dichotomous Image Segmentation

DIS-TE4 - max F-Measure

Dichotomous Image Segmentation

DIS-TE4 - E-measure

Dichotomous Image Segmentation

DIS-VD - max F-Measure

Dichotomous Image Segmentation

DIS-VD - weighted F-measure

Dichotomous Image Segmentation

DIS-VD - MAE

Dichotomous Image Segmentation

DIS-VD - S-Measure

Dichotomous Image Segmentation

DIS-VD - E-measure

Scene Graph Generation

4D-OR - F1

Sleep Stage Detection

Sleep-EDFx (single-channel) - Macro-F1

Sleep Stage Detection

SHHS (single-channel) - Macro-F1

Sleep Stage Detection

Sleep-EDFx - Macro-F1
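Macro-F1 for sleep staging is the unweighted mean of per-class F1 = 2TP / (2TP + FP + FN), so rare stages such as N1 count as much as frequent ones. A minimal sketch, assuming epoch-level labels from the usual five AASM stages (W, N1, N2, N3, REM); a class with no true or predicted epochs is scored 0.0 here, which is one of several conventions.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

print(macro_f1(["W", "W", "N1", "N2"], ["W", "N1", "N1", "N2"], ["W", "N1", "N2"]))
```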

Emotion Recognition in Context

EMOTIC - mAP

Emotion Recognition in Context

BoLD - Average mAP

Emotion Recognition in Context

BoLD - AUC

Image Denoising

DND - PSNR (sRGB)

Image Denoising

DND - SSIM (sRGB)

3D Object Reconstruction

BEHAVE - Chamfer Distance

Math Word Problem Solving

MATH - Accuracy

Referring Video Object Segmentation

MeViS - J&F

Referring Video Object Segmentation

MeViS - J

Referring Video Object Segmentation

MeViS - F

Thermal Image Segmentation

MFN Dataset - mIoU

Thermal Image Segmentation

PST900 - mIoU

Thermal Image Segmentation

KP day-night - mIoU

Image Super-Resolution

BSD100 - 4x upscaling - PSNR

Image Super-Resolution

Urban100 - 4x upscaling - PSNR

Image Super-Resolution

Set5 - 4x upscaling - PSNR

Image Super-Resolution

Set5 - 4x upscaling - SSIM

Image Super-Resolution

Manga109 - 4x upscaling - PSNR

Image Super-Resolution

Manga109 - 4x upscaling - SSIM

Image Super-Resolution

Set14 - 4x upscaling - PSNR

Temporal Action Localization

MultiTHUMOS - mAP IOU@…

Temporal Action Localization

MultiTHUMOS - mAP IOU@…

Temporal Action Localization

MultiTHUMOS - mAP IOU@…

Unsupervised Semantic Segmentation with Language-Image Pre-Training

Cityscapes val - mIoU

Weakly-Supervised Semantic Segmentation

COCO 2014 val - mIoU

Unsupervised Semantic Segmentation with Language-Image Pre-Training

COCO-Object - mIoU

Multi-Label Text Classification

CC3M-TagMask - Precision

Multi-Label Text Classification

CC3M-TagMask - Accuracy

Visual Object Tracking

LaSOT-ext - Normalized Precision

3D Multi-Person Mesh Recovery

AGORA - FB-NMVE

Egocentric Pose Estimation

UnrealEgo - Average MPJPE (mm)

Egocentric Pose Estimation

UnrealEgo - PA-MPJPE

Drug–drug Interaction Extraction

DrugBank - Accuracy

Drug–drug Interaction Extraction

DrugBank - AUROC

Drug–drug Interaction Extraction

DrugBank - F1 score

Multi-Object Tracking

DanceTrack - HOTA

Multi-Object Tracking

DanceTrack - AssA

Multi-Object Tracking

DanceTrack - IDF1

Aspect-Based Sentiment Analysis (ABSA)

MAMS - Acc

Point Cloud Registration

ETH (trained on 3DMatch) - Recall (30 cm, 5 degrees)

Point Cloud Registration

FP-R-H - RRE (degrees)

Point Cloud Registration

FP-R-H - RTE (cm)

Point Cloud Registration

FP-T-E - RRE (degrees)

Point Cloud Registration

FP-T-E - RTE (cm)

Point Cloud Registration

FP-R-M - RRE (degrees)

Point Cloud Registration

FP-R-M - RTE (cm)

Point Cloud Registration

FP-R-E - RRE (degrees)

Point Cloud Registration

FP-R-E - RTE (cm)

Point Cloud Registration

FP-T-M - Recall (3 cm, 10 degrees)

Point Cloud Registration

FP-T-M - RRE (degrees)

Point Cloud Registration

FP-T-M - RTE (cm)

Point Cloud Registration

FP-O-H - RRE (degrees)

Point Cloud Registration

FP-O-H - RTE (cm)

Point Cloud Registration

FP-T-H - RRE (degrees)

Point Cloud Registration

FP-T-H - RTE (cm)

Point Cloud Registration

FP-O-E - RRE (degrees)

Point Cloud Registration

FP-O-E - RTE (cm)

Point Cloud Registration

FP-O-M - Recall (3 cm, 10 degrees)

Point Cloud Registration

FP-O-M - RRE (degrees)

Point Cloud Registration

FP-O-M - RTE (cm)
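The point cloud registration entries report relative rotation error (RRE, the angle of R_est^T R_gt in degrees), relative translation error (RTE, the Euclidean distance between translations, here in cm), and recall under joint thresholds. A minimal sketch, assuming 3x3 rotation matrices as nested lists and translations in metres; whether the recall thresholds are strict or inclusive is an assumption that may differ from the official protocol.

```python
import math

def rre_degrees(r_est, r_gt):
    """Relative rotation error: angle (degrees) of the rotation R_est^T R_gt."""
    # trace(R_est^T R_gt) equals the element-wise sum of R_est * R_gt
    tr = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # clamp rounding noise
    return math.degrees(math.acos(cos_angle))

def rte(t_est, t_gt, scale=100.0):
    """Relative translation error; scale=100 converts metres to centimetres."""
    return scale * math.dist(t_est, t_gt)

def registration_recall(errors, max_rte_cm=3.0, max_rre_deg=10.0):
    """Fraction of (rre_deg, rte_cm) pairs under both thresholds."""
    ok = sum(1 for rre, rte_cm in errors if rre < max_rre_deg and rte_cm < max_rte_cm)
    return ok / len(errors)
```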

Monocular Depth Estimation

KITTI Eigen split - absolute relative error
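Absolute relative error (AbsRel) averages |d_pred − d_gt| / d_gt over pixels with valid ground truth. A minimal sketch, assuming flat depth lists in metres and a valid range of (0.001, 80) m, a cap commonly used for the Eigen split; self-supervised methods often median-scale predictions before scoring, which is not shown.

```python
def abs_rel(pred_depth, gt_depth, min_depth=1e-3, max_depth=80.0):
    """Mean absolute relative depth error over pixels with valid ground truth."""
    pairs = [(p, g) for p, g in zip(pred_depth, gt_depth) if min_depth < g < max_depth]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)
```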