Performance Graph

Video Anomaly Detection

HR-ShanghaiTech - AUC

Video Anomaly Detection

ShanghaiTech Campus - AUC

Speech Emotion Recognition

MSP-Podcast (Valence) - CCC

Speech Emotion Recognition

MSP-Podcast (Dominance) - CCC

Speech Emotion Recognition

MSP-Podcast (Activation) - CCC

Dynamic Link Prediction

DBLP Temporal - AUC

Dynamic Link Prediction

DBLP Temporal - AP

Point Cloud Registration

RotKITTI Registration Benchmark - RR@(1.5,0.3)

Point Cloud Registration

RotKITTI Registration Benchmark - RR@(1,0.1)

Image Segmentation

MAS3K - S-measure

Image Segmentation

MAS3K - mIoU

Image Segmentation

MAS3K - E-measure

Image Segmentation

MAS3K - MAE

Image Segmentation

RMAS - S-measure

Image Segmentation

MSD (Mirror Segmentation Dataset) - MAE

Image Segmentation

MSD (Mirror Segmentation Dataset) - IoU

Image Segmentation

MSD (Mirror Segmentation Dataset) - F-measure

Image Segmentation

PMD - MAE

Image Segmentation

PMD - IoU

Image Segmentation

PMD - F-measure

Speech Synthesis

LibriTTS - Periodicity

3D Object Detection

ScanNetV2 - [email protected]

Unsupervised Semantic Segmentation with Language-image Pre-training

Cityscapes val - mIoU

Unsupervised Semantic Segmentation with Language-image Pre-training

COCO-Stuff-171 - mIoU

Unsupervised Semantic Segmentation with Language-image Pre-training

PASCAL Context-59 - mIoU

Unsupervised Semantic Segmentation with Language-image Pre-training

COCO-Object - mIoU

Zero-Shot Video Question Answer

Video-MME (w/o subs) - Accuracy (%)

Thermal Image Segmentation

MFN Dataset - mIOU

Video Object Detection

ImageNet VID - MAP

Molecular Property Prediction

BBBP - ROC-AUC

Molecular Property Prediction

FreeSolv - RMSE

3D Object Detection

nuScenes - mAAE

Monocular Depth Estimation

NYU-Depth V2 - absolute relative error

Motion Synthesis

AIOZ-GDANCE - FID

Motion Synthesis

AIOZ-GDANCE - MMC

Motion Synthesis

AIOZ-GDANCE - GMC

3D Object Detection

nuScenes LiDAR only - NDS

3D Object Detection

nuScenes LiDAR only - mAP

3D Object Detection

nuScenes LiDAR only - NDS (val)

3D Object Detection

nuScenes LiDAR only - mAP (val)

Point Tracking

TAP-Vid-DAVIS - Average Jaccard

Point Tracking

TAP-Vid-DAVIS - Occlusion Accuracy

Temporal Action Localization

HACS - Average-mAP

Temporal Action Localization

HACS - [email protected]

Temporal Action Localization

HACS - [email protected]

Temporal Action Localization

HACS - [email protected]

Generalized Zero Shot skeletal action recognition

NTU RGB+D 120 - Harmonic Mean (10 unseen classes)

Low-Light Image Enhancement

LOL-v2 - Average PSNR

Low-Light Image Enhancement

LOL-v2 - SSIM

Low-Light Image Enhancement

LOL-v2 - LPIPS

Low-Light Image Enhancement

LOL-v2-synthetic - PSNR

Low-Light Image Enhancement

LOL-v2-synthetic - SSIM

Object Detection

PKU-DDD17-Car - mAP50

3D Semantic Scene Completion from a single RGB image

NYUv2 - mIoU

Overlapped 100-10

ADE20K - Mean IoU (test)

Video Quality Assessment

LIVE-VQC - PLCC

Unsupervised Video Object Segmentation

FBMS test - J

Open Vocabulary Object Detection

LVIS v1.0 - AP novel-LVIS base training

Facial Action Unit Detection

DISFA - Average F1

Facial Expression Recognition (FER)

RAF-DB - Overall Accuracy

Object Detection

CrowdHuman (full body) - mMR

Object Detection

InOutDoor - AP

Object Detection

EventPed - AP

Object Detection

STCrowd - AP

Domain Generalization

GTA-to-Avg(Cityscapes,BDD,Mapillary) - mIoU

3D Hand Pose Estimation

FreiHAND - PA-MPVPE

3D Hand Pose Estimation

FreiHAND - PA-F@5mm

3D Hand Pose Estimation

FreiHAND - PA-F@15mm

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (New Days) All

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (VISOR) All

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (NewDays) Visible

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (VISOR) Visible

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (NewDays) Occ

3D Hand Pose Estimation

HO-3D v3 - PA-MPJPE

3D Hand Pose Estimation

HO-3D v3 - PA-MPVPE

3D Hand Pose Estimation

HO-3D v3 - F@5mm

3D Hand Pose Estimation

HO-3D v3 - F@15mm

3D Hand Pose Estimation

HO-3D v3 - AUC_J

3D Hand Pose Estimation

HO-3D v3 - AUC_V

Cross-modal retrieval with noisy correspondence

CC152K - Image-to-text R@1

Cross-modal retrieval with noisy correspondence

CC152K - Text-to-image R@5

Cross-modal retrieval with noisy correspondence

COCO-Noisy - Image-to-text R@10

Robot Manipulation Generalization

The COLOSSEUM - Average decrease average across all perturbations

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - Dice

Video Polyp Segmentation

SUN-SEG-Easy - Dice

Video Polyp Segmentation

SUN-SEG-Hard - Dice

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - Dice

Video Panoptic Segmentation

VIPSeg - VPQ

Object Detection In Aerial Images

HRSC2016 - mAP-07

Object Detection In Aerial Images

HRSC2016 - mAP-12

Video Frame Interpolation

Xiph-2K - PSNR

Video Frame Interpolation

Xiph-4k - SSIM

Video Frame Interpolation

SNU-FILM (easy) - SSIM

Video Frame Interpolation

X4K1000FPS-2K - PSNR

Video Frame Interpolation

X4K1000FPS-2K - SSIM

Few-Shot Semantic Segmentation

COCO-20i (2-way 1-shot) - mIoU

Self-supervised Scene Flow Estimation

Argoverse 2 - EPE 3-Way

Self-supervised Scene Flow Estimation

Argoverse 2 - EPE Foreground Static

Self-supervised Scene Flow Estimation

Argoverse 2 - EPE Background Static

Zero-Shot Video Question Answer

NExT-QA - Accuracy

Zero-Shot Video Question Answer

ActivityNet-QA - Accuracy

Zero-Shot Video Question Answer

TGIF-QA - Accuracy

Zero-Shot Video Question Answer

TGIF-QA - Confidence Score

Zero-Shot Video Question Answer

MSRVTT-QA - Confidence Score

Zero-Shot Video Question Answer

EgoSchema (subset) - Accuracy

Single Image Desnowing

CSD - Average PSNR (dB)

Referring Expression Segmentation

RefCOCO+ val - Overall IoU

Referring Expression Segmentation

RefCOCO+ testA - Overall IoU

Referring Expression Segmentation

RefCOCO testB - Overall IoU

Referring Expression Segmentation

RefCOCO testA - Overall IoU

Referring Expression Segmentation

RefCOCO+ test B - Overall IoU

Image Dehazing

SOTS Indoor - PSNR

Image Dehazing

O-Haze - PSNR

Image Dehazing

I-Haze - PSNR

Saliency Prediction

SALICON - AUC

Saliency Prediction

SALICON - KLD

Saliency Prediction

SALECI - KL

Skeleton Based Action Recognition

First-Person Hand Action Benchmark - 1:1 Accuracy

Hand Gesture Recognition

SHREC 2017 - 14 Gestures Accuracy

Hand Gesture Recognition

SHREC 2017 - 28 Gestures Accuracy

Hand Gesture Recognition

DHG-14 - Accuracy

Hand Gesture Recognition

DHG-28 - Accuracy

Few-Shot Learning

DTD - 8-shot Accuracy

Few-Shot Learning

DTD - 4-shot Accuracy

Few-Shot Learning

DTD - 16-shot Accuracy

Mitigating Contextual Bias

FGVC Aircraft - Top-1 Accuracy (%)

Mitigating Contextual Bias

FGVC Aircraft - OOD Accuracy (%)

Video Super-Resolution

REDS4- 4x upscaling - PSNR

Video Super-Resolution

REDS4- 4x upscaling - SSIM

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - Sensitivity

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - S-Measure

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - mean E-measure

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - weighted F-measure

Video Polyp Segmentation

SUN-SEG-Hard (Unseen) - mean F-measure

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - Sensitivity

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - S-Measure

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - mean E-measure

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - weighted F-measure

Video Polyp Segmentation

SUN-SEG-Easy (Unseen) - mean F-measure

Lipreading

Lip Reading in the Wild - Top-1 Accuracy

Multiple Object Tracking

SportsMOT - HOTA

Multiple Object Tracking

SportsMOT - IDF1

Multiple Object Tracking

SportsMOT - AssA

Visual Question Answering

MMBench - GPT-3.5 score

Zero-Shot Video Question Answer

IntentQA - Accuracy

Video-based Generative Performance Benchmarking (Consistency)

VideoInstruct - gpt-score

Zero-Shot Composed Image Retrieval (ZS-CIR)

Fashion IQ - (Recall@10+Recall@50)/2

3D Lane Detection

Apollo Synthetic 3D Lane - F1

Zero-Shot Video Question Answer

MSRVTT-QA - Accuracy

Zero-Shot Video Question Answer

MSVD-QA - Accuracy

Long-range modeling

LRA - Text

Long-range modeling

LRA - Retrieval

Long-range modeling

LRA - Image

Zero-Shot Video Question Answer

Video-MME - Accuracy (%)

Zero-Shot Video Question Answer

EgoSchema (fullset) - Accuracy

Audio Classification

ICBHI Respiratory Sound Database - ICBHI Score

Object Detection

AI-TOD - AP

Object Detection

AI-TOD - AP50

Object Detection

AI-TOD - AP75

Object Detection

AI-TOD - APvt

Object Detection

AI-TOD - APt

Object Detection

AI-TOD - APs

Generalized Zero-Shot Learning

SUN Attribute - Harmonic mean

Generalized Zero-Shot Learning

AwA2 - Harmonic mean

Unsupervised Domain Adaptation

Market to Duke - mAP

Unsupervised Domain Adaptation

Market to Duke - rank-5

Unsupervised Domain Adaptation

Market to Duke - rank-10

Unsupervised Domain Adaptation

Duke to MSMT - mAP

Unsupervised Domain Adaptation

Duke to MSMT - rank-1

Unsupervised Domain Adaptation

Duke to MSMT - rank-10

Unsupervised Domain Adaptation

Duke to MSMT - rank-5

Unsupervised Domain Adaptation

Market to MSMT - mAP

Unsupervised Domain Adaptation

Market to MSMT - rank-1

Unsupervised Domain Adaptation

Market to MSMT - rank-10

Unsupervised Domain Adaptation

Market to MSMT - rank-5

Unsupervised Domain Adaptation

Duke to Market - mAP

Unsupervised Domain Adaptation

Duke to Market - rank-1

Unsupervised Domain Adaptation

Duke to Market - rank-5

Unsupervised Domain Adaptation

Duke to Market - rank-10

Few Shot Action Recognition

Kinetics-100 - Accuracy

Single-View 3D Reconstruction

GSO - Chamfer Distance

Action Anticipation

EPIC-KITCHENS-100 - Recall@5

Saliency Prediction

SALICON - CC

Saliency Prediction

SALICON - SIM

Source-Free Domain Adaptation

VisDA-2017 - Accuracy

Visual Question Answering

MM-Vet - GPT-4 score

Cross-modal retrieval with noisy correspondence

CC152K - Image-to-text R@10

Cross-modal retrieval with noisy correspondence

CC152K - R-Sum

Generalized Referring Expression Segmentation

gRefCOCO - gIoU

Generalized Referring Expression Segmentation

gRefCOCO - cIoU

Cross-Domain Few-Shot Object Detection

UODD - mAP

3D Semantic Scene Completion from a single RGB image

KITTI-360 - mIoU

3D Semantic Scene Completion from a single RGB image

SemanticKITTI - mIoU

Supervised Video Summarization

SumMe - Kendall's Tau

Supervised Video Summarization

SumMe - Spearman's Rho

Multiple Object Tracking

KITTI Tracking test - MOTA

Multiple Object Tracking

KITTI Tracking test - HOTA

Crowd Counting

JHU-CROWD++ - MAE

Crowd Counting

UCF CC 50 - MAE

Crowd Counting

ShanghaiTech A - MAE

Crowd Counting

ShanghaiTech A - MSE

Few-Shot Object Detection

ODinW-13 - Average Score

Few-Shot Object Detection

ODinW-35 - Average Score

Zero-Shot Object Detection

ODinW - Average Score

Zero-Shot Object Detection

LVIS v1.0 minival - AP

Object Detection

ODinW Full-Shot 13 Tasks - AP

Crowd Counting

ShanghaiTech A - RMSE

Crowd Counting

ShanghaiTech B - MAE

Time Series Forecasting

ETTm2 (192) Multivariate - MSE

Time Series Forecasting

ETTm2 (96) Multivariate - MSE

Time Series Forecasting

ETTm1 (192) Multivariate - MSE

Time Series Forecasting

ETTm1 (720) Multivariate - MSE

Time Series Forecasting

ETTm1 (336) Multivariate - MSE

Time Series Forecasting

ETTm1 (96) Multivariate - MSE

Time Series Forecasting

ETTm2 (336) Multivariate - MSE

Time Series Forecasting

ETTm2 (720) Multivariate - MSE

visual instruction following

LLaVA-Bench - avg score

Semi-supervised Change Detection

LEVIR-CD - 5% labeled data - IoU

Semi-supervised Change Detection

WHU - 10% labeled data - IoU

Semi-supervised Change Detection

WHU - 5% labeled data - IoU

Rgb-T Tracking

GTOT - Success

Rgb-T Tracking

RGBT210 - Precision

Rgb-T Tracking

RGBT210 - Success

Heterogeneous Node Classification

ACM (Heterogeneous Node Classification) - Micro-F1

Heterogeneous Node Classification

Freebase (Heterogeneous Node Classification) - Micro-F1

Generative 3D Object Classification

Objaverse - Objaverse (I)

Generative 3D Object Classification

Objaverse - Objaverse (Average)

Generative 3D Object Classification

Objaverse - Objaverse (C)

Rgb-T Tracking

GTOT - Precision

Audio Classification

SSC - Accuracy

Audio Classification

SHD - Percentage correct

Classify murmurs

CirCor DigiScope - Weighted Accuracy

Zero-Shot Video Question Answer

ActivityNet-QA - Confidence Score

Zero-Shot Video Question Answer

MSVD-QA - Confidence Score

Few-Shot 3D Point Cloud Classification

ModelNet40 10-way (10-shot) - Overall Accuracy

Skeleton Based Action Recognition

UAV-Human - CSv1(%)

Skeleton Based Action Recognition

UAV-Human - CSv2(%)

Image Segmentation

RMAS - mIoU

Image Segmentation

RMAS - E-measure

Image Segmentation

RMAS - MAE

Math Word Problem Solving

SVAMP - Accuracy

3D Human Pose Estimation

RICH - MPJPE

3D Human Pose Estimation

RICH - PA-MPJPE

Semi-supervised Change Detection

WHU - 20% labeled data - IoU

Semi-supervised Change Detection

LEVIR-CD - 20% labeled data - IoU

Semi-supervised Change Detection

LEVIR-CD - 10% labeled data - IoU

Semi-supervised Change Detection

WHU - 40% labeled data - IoU

Semi-supervised Change Detection

LEVIR-CD - 40% labeled data - IoU

Graph Classification

MNIST - Accuracy

Graph Classification

Peptides-func - AP

3D Human Pose Estimation

RICH - MPVPE

Motion Synthesis

HumanML3D - Multimodality

Motion Synthesis

InterHuman - FID

Motion Synthesis

InterHuman - R-Precision Top3

Dichotomous Image Segmentation

DIS-TE1 - max F-Measure

Dichotomous Image Segmentation

DIS-TE1 - weighted F-measure

Dichotomous Image Segmentation

DIS-TE1 - MAE

Dichotomous Image Segmentation

DIS-TE1 - E-measure

Dichotomous Image Segmentation

DIS-TE2 - max F-Measure

Dichotomous Image Segmentation

DIS-TE2 - weighted F-measure

Dichotomous Image Segmentation

DIS-TE2 - MAE

Dichotomous Image Segmentation

DIS-TE2 - S-Measure

Dichotomous Image Segmentation

DIS-TE2 - E-measure

Dichotomous Image Segmentation

DIS-TE3 - max F-Measure

Dichotomous Image Segmentation

DIS-TE3 - S-Measure

Dichotomous Image Segmentation

DIS-TE3 - E-measure

Dichotomous Image Segmentation

DIS-TE4 - max F-Measure

Dichotomous Image Segmentation

DIS-TE4 - E-measure

Dichotomous Image Segmentation

DIS-VD - max F-Measure

Dichotomous Image Segmentation

DIS-VD - weighted F-measure

Dichotomous Image Segmentation

DIS-VD - MAE

Dichotomous Image Segmentation

DIS-VD - S-Measure

Dichotomous Image Segmentation

DIS-VD - E-measure

3D Reconstruction

DTU - Comp

Scene Graph Generation

4D-OR - F1

Sleep Stage Detection

SHHS (single-channel) - Macro-F1

Sleep Stage Detection

Sleep-EDFx (single-channel) - Macro-F1

Sleep Stage Detection

Sleep-EDFx - Macro-F1

Emotion Recognition in Context

EMOTIC - mAP

Emotion Recognition in Context

BoLD - Average mAP

Emotion Recognition in Context

BoLD - AUC

Image Denoising

DND - PSNR (sRGB)

Image Denoising

DND - SSIM (sRGB)

3D Object Reconstruction

BEHAVE - Chamfer Distance

Math Word Problem Solving

MATH - Accuracy

Referring Video Object Segmentation

MeViS - J&F

Referring Video Object Segmentation

MeViS - J

Referring Video Object Segmentation

MeViS - F

Thermal Image Segmentation

PST900 - mIoU

Thermal Image Segmentation

KP day-night - mIoU

UNET Segmentation

Munich Sentinel2 Crop Segmentation - Overall Accuracy

Image Super-Resolution

Manga109 - 3x upscaling - PSNR

Image Super-Resolution

Manga109 - 3x upscaling - SSIM

Image Super-Resolution

Urban100 - 3x upscaling - PSNR

Image Super-Resolution

Manga109 - 2x upscaling - PSNR

Image Super-Resolution

Manga109 - 2x upscaling - SSIM

Image Super-Resolution

Manga109 - 4x upscaling - PSNR

Image Super-Resolution

Manga109 - 4x upscaling - SSIM

Image Super-Resolution

Set5 - 3x upscaling - PSNR

Image Super-Resolution

Set5 - 3x upscaling - SSIM

Image Super-Resolution

BSD100 - 3x upscaling - PSNR

Image Super-Resolution

BSD100 - 3x upscaling - SSIM

Image Super-Resolution

Set14 - 3x upscaling - PSNR

Image Super-Resolution

Set14 - 3x upscaling - SSIM

Image Super-Resolution

Set5 - 2x upscaling - PSNR

Image Super-Resolution

Set14 - 4x upscaling - PSNR

Temporal Action Localization

MultiTHUMOS - mAP [email protected]

Temporal Action Localization

MultiTHUMOS - mAP [email protected]

Temporal Action Localization

MultiTHUMOS - mAP [email protected]

Weakly-Supervised Semantic Segmentation

COCO 2014 val - mIoU

Multi-Label Text Classification

CC3M-TagMask - Precision

Multi-Label Text Classification

CC3M-TagMask - Accuracy

Blind Docking

PDBBind - Top-1 RMSD (%<2)

Zero-Shot Composed Image Retrieval (ZS-CIR)

ImageNet-R - (Recall@10+Recall@50)/2

Zero-Shot Composed Image Retrieval (ZS-CIR)

CIRCO - mAP@10

Data-free Knowledge Distillation

SQuAD - Exact Match

Motion Synthesis

HumanML3D - R Precision Top3

Motion Synthesis

KIT Motion-Language - R Precision Top3

3D Multi-Person Mesh Recovery

AGORA - FB-NMVE

Egocentric Pose Estimation

UnrealEgo - Average MPJPE (mm)

Egocentric Pose Estimation

UnrealEgo - PA-MPJPE

Drug–drug Interaction Extraction

DrugBank - Accuracy

Drug–drug Interaction Extraction

DrugBank - AUROC

Drug–drug Interaction Extraction

DrugBank - F1 score

Multi-Object Tracking

DanceTrack - HOTA

Multi-Object Tracking

DanceTrack - AssA

Multi-Object Tracking

DanceTrack - IDF1

3D Lane Detection

OpenLane - F1 (all)

3D Lane Detection

OpenLane - Up & Down

3D Lane Detection

OpenLane - Extreme Weather

3D Lane Detection

OpenLane - Night

3D Lane Detection

OpenLane - Intersection

Aspect-Based Sentiment Analysis (ABSA)

MAMS - Acc

Point Cloud Registration

ETH (trained on 3DMatch) - Recall (30cm, 5 degrees)

Point Cloud Registration

FP-R-H - RRE (degrees)

Point Cloud Registration

FP-R-H - RTE (cm)

Point Cloud Registration

FP-T-E - RRE (degrees)

Point Cloud Registration

FP-T-E - RTE (cm)

Point Cloud Registration

FP-R-M - RRE (degrees)

Point Cloud Registration

FP-R-M - RTE (cm)

Point Cloud Registration

FP-R-E - RRE (degrees)

Point Cloud Registration

FP-R-E - RTE (cm)

Point Cloud Registration

FP-T-M - Recall (3cm, 10 degrees)

Point Cloud Registration

FP-T-M - RRE (degrees)

Point Cloud Registration

FP-T-M - RTE (cm)

Point Cloud Registration

FP-O-H - RRE (degrees)

Point Cloud Registration

FP-O-H - RTE (cm)

Point Cloud Registration

FP-T-H - RRE (degrees)

Point Cloud Registration

FP-T-H - RTE (cm)

Point Cloud Registration

FP-O-E - RRE (degrees)

Point Cloud Registration

FP-O-E - RTE (cm)

Point Cloud Registration

FP-O-M - Recall (3cm, 10 degrees)

Point Cloud Registration

FP-O-M - RRE (degrees)

Point Cloud Registration

FP-O-M - RTE (cm)

Monocular Depth Estimation

KITTI Eigen split - absolute relative error

Monocular Depth Estimation

IBims-1 - δ1.25

Monocular Depth Estimation

NYU-Depth V2 - RMSE

Monocular Depth Estimation

NYU-Depth V2 - Delta < 1.25

Monocular Depth Estimation

NYU-Depth V2 - log 10

Video Retrieval

MSVD - text-to-video R@1

Video Retrieval

MSVD - video-to-text R@1

Video Retrieval

LSMDC - text-to-video R@1

Video Retrieval

LSMDC - video-to-text R@1

Action Classification

Kinetics-700 - Top-1 Accuracy

Video Retrieval

SSv2-label retrieval - text-to-video R@5

Building change detection for remote sensing images

LEVIR-CD - F1

Building change detection for remote sensing images

LEVIR-CD - Params(M)

Object Detection In Aerial Images

DIOR-R - mAP

Age And Gender Classification

Adience Age - Accuracy (5-fold)

Chart Question Answering

ChartQA - 1:1 Accuracy

Multiview Detection

MultiviewX - MODA

Diffusion Personalization Tuning Free

AgeDB - Cosine Similarity

Diffusion Personalization Tuning Free

AgeDB - FID

Unsupervised Domain Adaptation

SIM10K to Cityscapes - [email protected]

Cross-modal retrieval with noisy correspondence

CC152K - Image-to-text R@5

Cross-modal retrieval with noisy correspondence

COCO-Noisy - Image-to-text R@1

No-Reference Image Quality Assessment

UHD-IQA - SRCC

No-Reference Image Quality Assessment

UHD-IQA - PLCC

Image Super-Resolution

DIV2K val - 4x upscaling - LRPSNR

Efficient ViTs

ImageNet-1K (With LV-ViT-S) - Top 1 Accuracy

Efficient ViTs

ImageNet-1K (With LV-ViT-S) - GFLOPs

Motion Synthesis

FineDance - fid_k

Motion Synthesis

FineDance - BAS

Crowd Counting

UCF-QNRF - MAE

Few-Shot Image Classification

Meta-Dataset - Accuracy

Monocular Depth Estimation

DDAD - absolute relative error

Monocular Depth Estimation

DDAD - Sq Rel

Monocular Depth Estimation

DDAD - RMSE

Monocular Depth Estimation

DDAD - RMSE log

3D Object Detection

nuscenes Camera-Radar - NDS

Emotion Recognition in Conversation

EmoryNLP - Weighted-F1

3D Object Detection

View-of-Delft (val) - mAP

Motion Synthesis

AIOZ-GDANCE - GenDiv

Motion Synthesis

AIOZ-GDANCE - PFC

Motion Synthesis

AIOZ-GDANCE - GMR

Motion Synthesis

AIOZ-GDANCE - TIF

Visual Object Tracking

TNL2K - AUC

Visual Object Tracking

TNL2K - precision

Visual Object Tracking

LaSOT-ext - AUC

Visual Object Tracking

LaSOT-ext - Normalized Precision

Visual Object Tracking

LaSOT-ext - Precision

Visual Object Tracking

UAV123 - AUC

Cross-modal retrieval with noisy correspondence

CC152K - Text-to-image R@1

Age Estimation

IMDB-Clean - Average mean absolute error

Age Estimation

CACD - MAE

Facial Attribute Classification

FairFace - gender-top1

Facial Attribute Classification

FairFace - age-top1

Age And Gender Classification

Adience Gender - Accuracy (5-fold)

Unsupervised Semantic Segmentation

Potsdam-3 - Accuracy

Video Panoptic Segmentation

VIPSeg - STQ

Burst Image Super-Resolution

BurstSR - PSNR

Burst Image Super-Resolution

BurstSR - SSIM

Image Harmonization

iHarmony4 - MSE

Image Harmonization

iHarmony4 - fMSE

Zero-Shot Transfer 3D Point Cloud Classification

ScanObjectNN - OBJ_ONLY Accuracy(%)

Few-Shot 3D Point Cloud Classification

ModelNet40 10-way (20-shot) - Overall Accuracy

Few-Shot 3D Point Cloud Classification

ModelNet40 5-way (20-shot) - Overall Accuracy

Few-Shot 3D Point Cloud Classification

ModelNet40 5-way (20-shot) - Standard Deviation

Image-to-Image Translation

ADE20K Labels-to-Photos - LPIPS

hand-object pose

DexYCB - Average MPJPE (mm)

hand-object pose

DexYCB - MCE

hand-object pose

DexYCB - Procrustes-Aligned MPJPE

hand-object pose

DexYCB - ADD-S

hand-object pose

HO-3D v2 - ST-MPJPE

hand-object pose

HO-3D v2 - OME

hand-object pose

HO-3D v2 - ADD-S

hand-object pose

HO-3D v2 - Average MPJPE (mm)

3D Question Answering (3D-QA)

ScanQA Test w/ objects - Exact Match

3D Question Answering (3D-QA)

ScanQA Test w/ objects - BLEU-4

3D Question Answering (3D-QA)

ScanQA Test w/ objects - ROUGE

3D Question Answering (3D-QA)

ScanQA Test w/ objects - CIDEr

Face Alignment

AFLW-19 - NME_diag (%, Full)

Face Alignment

AFLW-19 - NME_diag (%, Frontal)

Face Alignment

AFLW-19 - NME_box (%, Full)

Face Alignment

AFLW-19 - [email protected] (%, Full)

Facial Landmark Detection

AFLW-Full - Mean NME

Zero-shot Named Entity Recognition (NER)

CrossNER - AI

Zero-shot Named Entity Recognition (NER)

CrossNER - Literature

Zero-shot Named Entity Recognition (NER)

CrossNER - Music

Zero-shot Named Entity Recognition (NER)

CrossNER - Politics

Zero-shot Named Entity Recognition (NER)

CrossNER - Science

3D Human Reconstruction

EHF - MPVPE

3D Human Reconstruction

EHF - PA V2V (mm), whole body

3D Multi-Person Mesh Recovery

AGORA - FB-MVE

Weakly Supervised Object Detection

PASCAL VOC 2012 test - MAP

3D Human Pose Estimation

UBody - PVE-All

3D Human Pose Estimation

UBody - PVE-Hands

3D Human Pose Estimation

UBody - PVE-Face

3D Human Pose Estimation

UBody - PA-PVE-All

3D Human Pose Estimation

UBody - PA-PVE-Hands

3D Human Pose Estimation

UBody - PA-PVE-Face

Multiple Object Tracking

SportsMOT - MOTA

Math Word Problem Solving

MAWPS - Accuracy (%)

Trajectory Planning

ToolBench - Win rate

Low-Light Image Enhancement

DICM - NIQE

Low-Light Image Enhancement

DICM - BRISQUE

Low-Light Image Enhancement

NPE - NIQE

Low-Light Image Enhancement

LIME - NIQE

Low-Light Image Enhancement

LIME - BRISQUE

Low-Light Image Enhancement

MEF - BRISQUE

Low-Light Image Enhancement

LOL - Average PSNR

Low-Light Image Enhancement

Sony-Total-Dark - Average PSNR

Low-Light Image Enhancement

Sony-Total-Dark - SSIM

Low-Light Image Enhancement

Sony-Total-Dark - LPIPS

Low-Light Image Enhancement

LOLv2-synthetic - Average PSNR

Low-Light Image Enhancement

LOLv2-synthetic - SSIM

Low-Light Image Enhancement

VV - NIQE

Low-Light Image Enhancement

VV - BRISQUE

Robot Pose Estimation

DREAM-dataset - AUC (avg. on 4 real DREAM datasets)

Robot Pose Estimation

DREAM-dataset - mean-ADD (avg. on 4 real DREAM datasets)

Low-light Image Deblurring and Enhancement

LOL-Blur - LPIPS

Few-Shot Object Detection

MS-COCO (30-shot) - AP

Few-Shot Object Detection

MS-COCO (10-shot) - AP

Cross-Domain Few-Shot Object Detection

Artaxor - mAP

Cross-Domain Few-Shot Object Detection

Clipark1k - mAP

Cross-Domain Few-Shot Object Detection

DeepFish - mAP

Unsupervised Few-Shot Image Classification

Mini-Imagenet 5-way (1-shot) - Accuracy

Unsupervised Few-Shot Image Classification

Tiered ImageNet 5-way (5-shot) - Accuracy

Unsupervised Few-Shot Image Classification

Mini-Imagenet 5-way (5-shot) - Accuracy

Unsupervised Few-Shot Image Classification

Tiered ImageNet 5-way (1-shot) - Accuracy

Time Series Forecasting

ETTh1 (720) Univariate - MSE

Time Series Forecasting

ETTh1 (720) Univariate - MAE

Time Series Forecasting

ETTh2 (720) Univariate - MSE

Point Tracking

TAP-Vid-DAVIS - Average PCK

Low-Light Image Enhancement

LOL - LPIPS

Low-Light Image Enhancement

LOL - FLOPS (G)

Low-Light Image Enhancement

LOL - Params (M)

Low-Light Image Enhancement

LOLv2-synthetic - LPIPS

Low-Light Image Enhancement

LOLv2 - LPIPS

Image Denoising

SIDD - PSNR (sRGB)

Music Source Separation

MUSDB18-HQ - SDR (drums)

Music Source Separation

MUSDB18-HQ - SDR (others)

Music Source Separation

MUSDB18-HQ - SDR (vocals)

Music Source Separation

MUSDB18-HQ - SDR (avg)

3D Reconstruction

DTU - Overall

Monocular Depth Estimation

NYU-Depth V2 - Delta < 1.25^3

Monocular Depth Estimation

NYU-Depth V2 - Delta < 1.25^2

Data-to-Text Generation

E2E NLG Challenge - METEOR

Network Intrusion Detection

CICIDS2017 - Recall

RGB Salient Object Detection

DUT-OMRON - MAE

RGB Salient Object Detection

DUT-OMRON - S-Measure

RGB Salient Object Detection

DUT-OMRON - mean F-Measure

RGB Salient Object Detection

DAVIS-S - S-measure

RGB Salient Object Detection

DAVIS-S - F-measure

RGB Salient Object Detection

DAVIS-S - MAE

RGB Salient Object Detection

UHRSD - S-Measure

RGB Salient Object Detection

UHRSD - max F-Measure

RGB Salient Object Detection

UHRSD - MAE

RGB Salient Object Detection

DUTS-TE - MAE

RGB Salient Object Detection

DUTS-TE - max F-measure

RGB Salient Object Detection

DUTS-TE - S-Measure

RGB Salient Object Detection

DUTS-TE - mean E-Measure

RGB Salient Object Detection

DUTS-TE - mean F-Measure

RGB Salient Object Detection

HRSOD - S-Measure

RGB Salient Object Detection

HRSOD - max F-Measure

RGB Salient Object Detection

HRSOD - MAE

Dichotomous Image Segmentation

DIS-TE1 - S-Measure

Dichotomous Image Segmentation

DIS-TE1 - HCE

Dichotomous Image Segmentation

DIS-TE3 - weighted F-measure

Dichotomous Image Segmentation

DIS-TE3 - MAE

Dichotomous Image Segmentation

DIS-TE4 - weighted F-measure

Dichotomous Image Segmentation

DIS-TE4 - MAE

Camouflaged Object Segmentation

CHAMELEON - S-measure

Camouflaged Object Segmentation

CHAMELEON - weighted F-measure

Camouflaged Object Segmentation

CHAMELEON - MAE

Camouflaged Object Segmentation

NC4K - S-measure

Camouflaged Object Segmentation

NC4K - weighted F-measure

Camouflaged Object Segmentation

NC4K - MAE

Camouflaged Object Segmentation

COD - MAE

Camouflaged Object Segmentation

COD - Weighted F-Measure

Camouflaged Object Segmentation

COD - S-Measure

Key-value Pair Extraction

RFUND-EN - key-value pair F1

Key-value Pair Extraction

SIBR - key-value pair F1

Visual Object Tracking

OTB-2015 - AUC

Facial Expression Recognition (FER)

AffectNet - Accuracy (7 emotion)

Action Classification

MiT - Top 1 Accuracy

Action Classification

Moments in Time - Top 1 Accuracy

Action Anticipation

EGTEA - Top-1 Accuracy

3D Point Cloud Classification

ModelNet40-C - Error Rate

3D Face Animation

BEAT2 - MSE

Scene Text Recognition

ICDAR2013 - Accuracy

Video Quality Assessment

LIVE-FB LSVQ - PLCC

Skeleton Based Action Recognition

N-UCLA - Accuracy

Zero-Shot Video Question Answer

NExT-GQA - Acc@GQA

Visual Object Tracking

NeedForSpeed - AUC

Visual Object Tracking

TrackingNet - Normalized Precision

Cross-modal retrieval with noisy correspondence

Flickr30K-Noisy - Image-to-text R@1

Cross-modal retrieval with noisy correspondence

Flickr30K-Noisy - Text-to-image R@1

Artifact Detection

HistoArtifacts - MCC

Weakly Supervised Action Localization

ActivityNet-1.2 - Mean mAP

Unsupervised Semantic Segmentation with Language-image Pre-training

PascalVOC-20 - mIoU

Image Dehazing

SOTS Outdoor - PSNR

Image-to-Image Translation

Cityscapes Labels-to-Photo - mIoU

Image-to-Image Translation

Cityscapes Labels-to-Photo - FID

Image-to-Image Translation

ADE20K Labels-to-Photos - mIoU

Image-to-Image Translation

ADE20K Labels-to-Photos - FID

Image-to-Image Translation

COCO-Stuff Labels-to-Photos - FID

Video Semantic Segmentation

VSPW - mIoU

Unsupervised Semantic Segmentation with Language-image Pre-training

PASCAL VOC - mIoU

Multiview Detection

Wildtrack - MODA

Multiview Detection

Wildtrack - Recall

Multiview Detection

MultiviewX - Recall

3D Facial Landmark Localization

H3WB - Average MPJPE (mm)

Multi-task Language Understanding

MMLU - Average (%)

Robust Object Detection

DWD - mPC [AP50]

3D Instance Segmentation

ScanNet(v2) - mAP @ 50

3D Instance Segmentation

ScanNet(v2) - mAP

3D Instance Segmentation

ScanNet200 - mAP

Single Image Deraining

Rain100H - SSIM

Single Image Deraining

Rain100H - PSNR

3D Semantic Segmentation

ScanNet++ - Top-1 IoU

3D Semantic Segmentation

ScanNet++ - Top-3 IoU

Referring Video Object Segmentation

Refer-YouTube-VOS - J&F

Referring Video Object Segmentation

Refer-YouTube-VOS - J

Referring Video Object Segmentation

Refer-YouTube-VOS - F

Multi-Object Tracking

TAO - TETA

Multi-Object Tracking

TAO - LocA

Multi-Object Tracking

TAO - AssocA

Referring Expression Segmentation

Refer-YouTube-VOS (2021 public validation) - J&F

Referring Expression Segmentation

Refer-YouTube-VOS (2021 public validation) - J

Referring Expression Segmentation

Refer-YouTube-VOS (2021 public validation) - F

Column Type Annotation

VizNet-Sato-Full - Macro-F1

Multi-Person Pose Estimation

CrowdPose - mAP @0.5:0.95

Multi-Person Pose Estimation

CrowdPose - AP Easy

Multi-Person Pose Estimation

CrowdPose - AP Medium

Multi-Person Pose Estimation

CrowdPose - AP Hard

Semi-Supervised Object Detection

COCO 1% labeled data - mAP

Semi-Supervised Object Detection

COCO 100% labeled data - mAP

Semi-Supervised Object Detection

COCO 10% labeled data - mAP

Semi-Supervised Object Detection

COCO 2% labeled data - mAP

Video Prediction

Kinetics-600 12 frames, 64x64 - FVD

Science Question Answering

ScienceQA - Social Science

Time Series Forecasting

Electricity (720) - MSE

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (Ego4D) All

Text-based Image Editing

PIE-Bench - Background PSNR

Text-based Image Editing

PIE-Bench - Background LPIPS

Motion Synthesis

InterHuman - MMDist

Weakly-Supervised Semantic Segmentation

PASCAL VOC 2012 train - Mean IoU

Domain Generalization

GTA5-to-Cityscapes - mIoU

Image Manipulation Detection

DSO-1 - Balanced Accuracy

Image Manipulation Detection

COVERAGE - AUC

Image Manipulation Detection

COVERAGE - Balanced Accuracy

Image Manipulation Detection

CocoGlide - Balanced Accuracy

Image Manipulation Detection

Casia V1+ - Balanced Accuracy

3D Question Answering (3D-QA)

ScanQA Test w/ objects - BLEU-1

3D Question Answering (3D-QA)

ScanQA Test w/ objects - METEOR

Referring Expression Segmentation

RefCOCOg-val - Overall IoU

Referring Expression Segmentation

RefCOCOg-test - Overall IoU

Human Part Segmentation

CIHP - Mean IoU

Monocular Depth Estimation

NYU-Depth V2 self-supervised - Root mean square error (RMSE)

Monocular Depth Estimation

NYU-Depth V2 self-supervised - Absolute relative error (AbsRel)

Monocular Depth Estimation

NYU-Depth V2 self-supervised - delta_1

Monocular Depth Estimation

NYU-Depth V2 self-supervised - delta_2

Monocular Depth Estimation

NYU-Depth V2 self-supervised - delta_3

Long-range modeling

LRA - Pathfinder-X

Bird's-Eye View Semantic Segmentation

Lyft Level 5 - IoU vehicle - 224x480 - Long

Bird's-Eye View Semantic Segmentation

Lyft Level 5 - IoU vehicle - 224x480 - Short

Action Classification

Toyota Smarthome dataset - CS

Action Classification

Toyota Smarthome dataset - CV1

Natural Language Moment Retrieval

TACoS - R@1,IoU=0.3

Natural Language Moment Retrieval

TACoS - R@1,IoU=0.5

Natural Language Moment Retrieval

TACoS - R@1,IoU=0.7

Natural Language Moment Retrieval

TACoS - mIoU

Motion Synthesis

HumanML3D - FID

Egocentric Pose Estimation

GlobalEgoMocap Test Dataset - Average MPJPE (mm)

Egocentric Pose Estimation

GlobalEgoMocap Test Dataset - PA-MPJPE

Egocentric Pose Estimation

SceneEgo - Average MPJPE (mm)

Egocentric Pose Estimation

SceneEgo - PA-MPJPE

Semantic correspondence

SPair-71k - PCK

Semantic correspondence

PF-PASCAL - PCK

3D Dense Shape Correspondence

SHREC'19 - Euclidean Mean Error (EME)

3D Dense Shape Correspondence

SHREC'19 - Accuracy at 1%

3D Human Reconstruction

CustomHumans - Chamfer Distance P-to-S

3D Human Reconstruction

CustomHumans - Chamfer Distance S-to-P

Video Retrieval

VATEX - text-to-video R@50

Panoptic Scene Graph Generation

PSG Dataset - mR@20

Semi-Supervised Semantic Segmentation

COCO 1/256 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 183 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 732 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

COCO 1/32 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 366 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 1464 labels - Validation mIoU

Semi-Supervised Semantic Segmentation

ADE20K 1/16 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

COCO 1/128 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

COCO 1/64 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

ADE20K 1/32 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

Cityscapes 6.25% labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

COCO 1/512 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 92 labeled - Validation mIoU

Semi-Supervised Semantic Segmentation

Cityscapes 100 samples labeled - Validation mIoU

Few-Shot Learning

MedConceptsQA - Accuracy

3D Instance Segmentation

ScanNet(v2) - mAP@25

Science Question Answering

ScienceQA - Natural Science

Science Question Answering

ScienceQA - Language Science

Science Question Answering

ScienceQA - Text Context

Science Question Answering

ScienceQA - Image Context

Science Question Answering

ScienceQA - No Context

Science Question Answering

ScienceQA - Grades 1-6

Science Question Answering

ScienceQA - Grades 7-12

Science Question Answering

ScienceQA - Avg. Accuracy

Synthetic-to-Real Translation

SYNTHIA-to-Cityscapes - MIoU (13 classes)

Synthetic-to-Real Translation

SYNTHIA-to-Cityscapes - MIoU (16 classes)

Synthetic-to-Real Translation

GTAV-to-Cityscapes Labels - mIoU

GZSL Video Classification

VGGSound-GZSL(main) - HM

GZSL Video Classification

VGGSound-GZSL(main) - ZSL

GZSL Video Classification

UCF-GZSL (cls) - HM

GZSL Video Classification

UCF-GZSL (cls) - ZSL

GZSL Video Classification

ActivityNet-GZSL (cls) - HM

GZSL Video Classification

ActivityNet-GZSL (cls) - ZSL

GZSL Video Classification

ActivityNet-GZSL(main) - HM

GZSL Video Classification

ActivityNet-GZSL(main) - ZSL

GZSL Video Classification

VGGSound-GZSL (cls) - HM

GZSL Video Classification

VGGSound-GZSL (cls) - ZSL

GZSL Video Classification

UCF-GZSL(main) - HM

GZSL Video Classification

UCF-GZSL(main) - ZSL

3D Reconstruction

ShapeNet - IoU

Low-Dose X-Ray CT Reconstruction

X3D - PSNR

Low-Dose X-Ray CT Reconstruction

X3D - SSIM

Multi-task Language Understanding

BBH-nlp - Average (%)

Multiple Object Tracking

BDD100K test - mMOTA

Image-to-Image Translation

horse2zebra - Frechet Inception Distance

Sports Ball Detection and Tracking

Volleyball - F1 (%)

Sports Ball Detection and Tracking

Volleyball - Accuracy (%)

Sports Ball Detection and Tracking

Volleyball - Average Precision (%)

Sports Ball Detection and Tracking

Badminton - F1 (%)

Sports Ball Detection and Tracking

Badminton - Accuracy (%)

Sports Ball Detection and Tracking

Badminton - Average Precision (%)

Sports Ball Detection and Tracking

Soccer - F1 (%)

Sports Ball Detection and Tracking

Soccer - Average Precision (%)

Sports Ball Detection and Tracking

Soccer - Accuracy (%)

Sports Ball Detection and Tracking

Basketball - F1 (%)

Sports Ball Detection and Tracking

Basketball - Accuracy (%)

Sports Ball Detection and Tracking

Basketball - Average Precision (%)

Sports Ball Detection and Tracking

Tennis - F1 (%)

Sports Ball Detection and Tracking

Tennis - Accuracy (%)

Sports Ball Detection and Tracking

Tennis - Average Precision (%)

Audio Classification

VGGSound - Top 1 Accuracy

Multimodal Emotion Recognition

IEMOCAP - F1

Multimodal Emotion Recognition

IEMOCAP - Weighted Accuracy (WA)

Image Dehazing

SOTS Outdoor - SSIM

Video Anomaly Detection

HR-Avenue - AUC

Video Story QA

MovieQA - Accuracy

Math Word Problem Solving

ASDiv-A - Execution Accuracy

Video Retrieval

Condensed Movies - text-to-video R@1

Video Retrieval

Condensed Movies - text-to-video R@5

Video Retrieval

Condensed Movies - text-to-video R@10

Video Retrieval

QuerYD - text-to-video R@1

Video Retrieval

QuerYD - text-to-video R@10

Video Retrieval

QuerYD - text-to-video R@5

Multi-Label Image Classification

BigEarthNet (official test set) - F1 Score

Aspect-Based Sentiment Analysis (ABSA)

SemEval 2014 Task 4 Subtask 1+2 - F1

Hand Gesture Recognition

LSA16 - Accuracy

Cross-modal retrieval with noisy correspondence

CC152K - Text-to-image R@10

Cross-modal retrieval with noisy correspondence

COCO-Noisy - Text-to-image R@1

Retinal Vessel Segmentation

CHASE_DB1 - mIOU

Indoor Scene Synthesis

PRO-teXt - CD

Indoor Scene Synthesis

PRO-teXt - EMD

Indoor Scene Synthesis

PRO-teXt - F1

Semi-Supervised Image Classification

ImageNet - 10% labeled data - Top 5 Accuracy

Single-View 3D Reconstruction

GSO - IoU

Video-Text Retrieval

Test-of-Time - 2-Class Accuracy

Visual Question Answering

VQA v2 test-dev - Accuracy

Heterogeneous Node Classification

DBLP (Heterogeneous Node Classification) - Macro-F1

Heterogeneous Node Classification

DBLP (Heterogeneous Node Classification) - Micro-F1

Heterogeneous Node Classification

IMDB (Heterogeneous Node Classification) - Macro-F1

Heterogeneous Node Classification

IMDB (Heterogeneous Node Classification) - Micro-F1

Heterogeneous Node Classification

ACM (Heterogeneous Node Classification) - Macro-F1

Heterogeneous Node Classification

Freebase (Heterogeneous Node Classification) - Macro-F1

Heterogeneous Node Classification

OAG-L1-Field - NDCG

Heterogeneous Node Classification

OAG-L1-Field - MRR

No-Reference Image Quality Assessment

CSIQ - PLCC

Analog Video Restoration

TAPE - LPIPS

Analog Video Restoration

TAPE - PSNR

Semi-Supervised Video Object Segmentation

YouTube-VOS 2019 - Overall

Semi-Supervised Video Object Segmentation

YouTube-VOS 2019 - F-Measure (Unseen)

Semi-Supervised Video Object Segmentation

MOSE - J&F

Semi-Supervised Video Object Segmentation

MOSE - J

Semi-Supervised Video Object Segmentation

MOSE - F

Vehicle Re-Identification

VeRi-Wild Small - mAP

Motion Synthesis

Motion-X - FID

Motion Synthesis

Motion-X - TMR-R-Precision Top3

Motion Synthesis

Motion-X - TMR-Matching Score

Motion Synthesis

Motion-X - MModality

Conditional Image Generation

ImageNet 128x128 - FID

Few-Shot Image Classification

CUB 200 5-way 5-shot - Accuracy

Few-Shot Image Classification

CIFAR-FS 5-way (5-shot) - Accuracy

Domain Generalization

TerraIncognita - Average Accuracy

No-Reference Image Quality Assessment

KADID-10k - SRCC

No-Reference Image Quality Assessment

KADID-10k - PLCC

No-Reference Image Quality Assessment

CSIQ - SRCC

No-Reference Image Quality Assessment

TID2013 - SRCC

No-Reference Image Quality Assessment

TID2013 - PLCC

Visual Question Answering

BenchLMM - GPT-3.5 score

Knowledge Base Question Answering

ComplexWebQuestions - Accuracy

Knowledge Base Question Answering

WebQuestionsSP - Hits@1

Knowledge Base Question Answering

WebQuestionsSP - F1

Hateful Meme Classification

HarMeme - AUROC

Unsupervised Semantic Segmentation

COCO-Stuff-81 - mIoU

Unsupervised Semantic Segmentation

COCO-Stuff-81 - Pixel Accuracy

3D Lane Detection

OpenLane-V2 val - DET_l

3D Lane Detection

OpenLane-V2 val - TOP_lt

3D Object Detection

V2XSet - AP0.7 (Perfect)

Zero-Shot Transfer 3D Point Cloud Classification

ModelNet40 - Accuracy (%)

Zero-Shot Object Detection

PASCAL VOC'07 - mAP

Semi-Supervised Image Classification

CIFAR-100, 400 Labels - Percentage error

Vehicle Re-Identification

VeRi-776 - mAP

Vehicle Re-Identification

VeRi-776 - Rank-1

Vehicle Re-Identification

VehicleID Small - mAP

Type prediction

ManyTypes4TypeScript - Average Accuracy

Automated Theorem Proving

miniF2F-valid - Pass@100

Time Series Forecasting

ETTh2 (192) Multivariate - MSE

Time Series Forecasting

ETTh2 (192) Multivariate - MAE

Time Series Forecasting

ETTh2 (336) Multivariate - MSE

Time Series Forecasting

ETTh2 (336) Multivariate - MAE

Time Series Forecasting

ETTh1 (192) Univariate - MSE

Time Series Forecasting

ETTh1 (192) Univariate - MAE

Time Series Forecasting

ETTh2 (96) Multivariate - MSE

Time Series Forecasting

ETTh2 (96) Multivariate - MAE

Time Series Forecasting

ETTh2 (192) Univariate - MSE

Time Series Forecasting

ETTh2 (192) Univariate - MAE

Time Series Forecasting

ETTh2 (96) Univariate - MSE

3D Multi-Person Mesh Recovery

AGORA - B-NMVE

3D Multi-Person Mesh Recovery

AGORA - F-MVE

Video Retrieval

MSVD - video-to-text R@10

TDC ADMET Benchmarking Group

tdcommons - TDC.Caco2_Wang

Open Vocabulary Object Detection

LVIS v1.0 - AP novel-Unrestricted open-vocabulary training

Image-text matching

CommercialAdsDataset - ADD(S) AUC

Multi-Label Text Classification

CC3M-TagMask - F1

Multi-Label Text Classification

CC3M-TagMask - Recall

Photo geolocation estimation

Im2GPS3k - Street level (1 km)

Photo geolocation estimation

YFCC26k - Street level (1 km)

Automated Theorem Proving

miniF2F-test - Pass@100

Graph Classification

REDDIT-B - Accuracy

MRI Reconstruction

fastMRI Knee Val 8x - SSIM

MRI Reconstruction

fastMRI Knee Val 8x - PSNR

MRI Reconstruction

fastMRI Knee Val 8x - NMSE

Cross-modal retrieval with noisy correspondence

COCO-Noisy - Text-to-image R@5

Cross-modal retrieval with noisy correspondence

COCO-Noisy - Text-to-image R@10

Cross-modal retrieval with noisy correspondence

COCO-Noisy - R-Sum

Cross-modal retrieval with noisy correspondence

Flickr30K-Noisy - Image-to-text R@5

Cross-Domain Few-Shot Object Detection

NEU-DET - mAP

Semi-Supervised Semantic Segmentation

PASCAL VOC 2012 25% labeled - Validation mIoU

3D Object Detection

DAIR-V2X-I - AP|R40(moderate)

3D Object Detection

DAIR-V2X-I - AP|R40(easy)

3D Object Detection

DAIR-V2X-I - AP|R40(hard)

3D Object Detection

Rope3D - [email protected]

Session-Based Recommendations

Diginetica - MRR@20

Session-Based Recommendations

Diginetica - Hit@20

Session-Based Recommendations

yoochoose1/64 - HR@20

Long-tail Learning

ImageNet-LT - Top-1 Accuracy

Generalized Zero Shot skeletal action recognition

NTU RGB+D - Harmonic Mean (5 unseen classes)

Generalized Zero Shot skeletal action recognition

NTU RGB+D - Harmonic Mean (12 unseen classes)

Generalized Zero Shot skeletal action recognition

NTU RGB+D 120 - Harmonic Mean (24 unseen classes)

RGB-D Salient Object Detection

NLPR - S-Measure

RGB-D Salient Object Detection

NLPR - Average MAE

RGB-D Salient Object Detection

NLPR - max F-Measure

RGB-D Salient Object Detection

NLPR - max E-Measure

RGB-D Salient Object Detection

SIP - S-Measure

RGB-D Salient Object Detection

SIP - max E-Measure

RGB-D Salient Object Detection

SIP - max F-Measure

RGB-D Salient Object Detection

SIP - Average MAE

RGB-D Salient Object Detection

STERE - S-Measure

RGB-D Salient Object Detection

STERE - Average MAE

RGB-D Salient Object Detection

STERE - max F-Measure

RGB-D Salient Object Detection

STERE - max E-Measure

RGB-D Salient Object Detection

NJU2K - S-Measure

RGB-D Salient Object Detection

NJU2K - Average MAE

RGB-D Salient Object Detection

NJU2K - max E-Measure

RGB-D Salient Object Detection

NJU2K - max F-Measure

RGB-D Salient Object Detection

DES - S-Measure

RGB-D Salient Object Detection

DES - Average MAE

RGB-D Salient Object Detection

DES - max E-Measure

RGB-D Salient Object Detection

DES - max F-Measure

Long-tail Learning

CIFAR-100-LT (ρ=50) - Error Rate

Long-tail Learning

Places-LT - Top-1 Accuracy

Long-tail Learning

CIFAR-100-LT (ρ=10) - Error Rate

RGB Salient Object Detection

ECSSD - MAE

RGB Salient Object Detection

HKU-IS - MAE

Sleep Stage Detection

MASS SS3 - Macro-F1

Head Pose Estimation

BIWI - MAE (trained with other data)

Temporal Action Localization

MultiTHUMOS - Average mAP

Temporal Action Localization

MultiTHUMOS - mAP [email protected]

Temporal Action Localization

MultiTHUMOS - mAP [email protected]

Temporal Action Localization

MultiTHUMOS - mAP [email protected]

Human Pose Forecasting

AMASS - Average MPJPE (mm) 1000 msec

Face Swapping

FaceForensics++ - ID retrieval

Unsupervised Video Object Segmentation

DAVIS 2017 (test-dev) - J&F

Unsupervised Video Object Segmentation

DAVIS 2017 (val) - J&F

Unsupervised Video Object Segmentation

DAVIS 2017 (val) - Jaccard (Mean)

Unsupervised Video Object Segmentation

DAVIS 2017 (val) - F-measure (Mean)

Unsupervised Video Object Segmentation

DAVIS 2016 val - F

Single Image Deraining

Test1200 - SSIM

Single Image Deraining

Test1200 - PSNR

6D Pose Estimation using RGBD

YCB-Video - Mean ADD

Monocular Depth Estimation

KITTI Eigen split unsupervised - absolute relative error

Monocular Depth Estimation

KITTI Eigen split unsupervised - RMSE log

Monocular Depth Estimation

KITTI Eigen split unsupervised - Delta < 1.25

Monocular Depth Estimation

KITTI Eigen split unsupervised - Delta < 1.25^2

Monocular Depth Estimation

KITTI Eigen split - RMSE

Monocular Depth Estimation

KITTI Eigen split - Sq Rel

3D Question Answering (3D-QA)

3D MM-Vet - Overall Accuracy

Image Manipulation Detection

CASIA (OSN-transmitted - Facebook) - AUC

Image Manipulation Detection

CASIA (OSN-transmitted - Facebook) - F-score

Image Manipulation Detection

CASIA (OSN-transmitted - Facebook) - Intersection over Union

Image Manipulation Detection

CASIA (OSN-transmitted - Wechat) - AUC

Image Manipulation Detection

CASIA (OSN-transmitted - Wechat) - f-Score

Image Manipulation Detection

CASIA (OSN-transmitted - Wechat) - Intersection over Union

Image Manipulation Detection

CASIA (OSN-transmitted - Whatsapp) - AUC

Image Manipulation Detection

CASIA (OSN-transmitted - Whatsapp) - f-Score

Image Manipulation Detection

CASIA (OSN-transmitted - Whatsapp) - Intersection over Union

Image Manipulation Detection

CASIA (OSN-transmitted - Weibo) - AUC

Image Manipulation Detection

CASIA (OSN-transmitted - Weibo) - f-Score

Image Manipulation Detection

CASIA (OSN-transmitted - Weibo) - Intersection over Union

Image Manipulation Detection

Columbia (OSN-transmitted - Wechat) - f-Score

Image Manipulation Detection

Columbia (OSN-transmitted - Wechat) - Intersection over Union

Image Manipulation Detection

Columbia (OSN-transmitted - Whatsapp) - f-Score

Image Manipulation Detection

Columbia (OSN-transmitted - Whatsapp) - Intersection over Union

Image Manipulation Detection

Columbia (OSN-transmitted - Weibo) - AUC

Image Manipulation Detection

Columbia (OSN-transmitted - Weibo) - f-Score

Image Manipulation Detection

Columbia (OSN-transmitted - Weibo) - Intersection over Union

Image Manipulation Detection

DSO (OSN-transmitted - Whatsapp) - f-Score

Image Manipulation Detection

DSO (OSN-transmitted - Whatsapp) - Intersection over Union

Image Manipulation Detection

NIST (OSN-transmitted - Facebook) - AUC

Image Manipulation Detection

NIST (OSN-transmitted - Wechat) - AUC

Image Manipulation Detection

NIST (OSN-transmitted - Wechat) - f-Score

Image Manipulation Detection

NIST (OSN-transmitted - Wechat) - Intersection over Union

Image Manipulation Detection

NIST (OSN-transmitted - Whatsapp) - AUC

Image Manipulation Detection

NIST (OSN-transmitted - Whatsapp) - f-Score

Image Manipulation Detection

NIST (OSN-transmitted - Whatsapp) - Intersection over Union

Image Manipulation Detection

NIST (OSN-transmitted - Weibo) - AUC

Image Manipulation Detection

NIST (OSN-transmitted - Weibo) - f-Score

Image Manipulation Detection

NIST (OSN-transmitted - Weibo) - Intersection over Union

Image Manipulation Detection

Columbia (OSN-transmitted - Facebook) - f-Score

Image Manipulation Detection

Columbia (OSN-transmitted - Facebook) - Intersection over Union

Scene Text Recognition

ICDAR2015 - Accuracy

Scene Text Recognition

SVTP - Accuracy

Action Segmentation

GTEA - Edit

Facial Expression Recognition (FER)

AffectNet - Accuracy (8 emotion)

Weakly Supervised Action Localization

BEOID - [email protected]:0.7

Weakly Supervised Action Localization

BEOID - [email protected]

Weakly Supervised Action Localization

THUMOS 2014 - [email protected]

Weakly Supervised Action Localization

THUMOS 2014 - [email protected]:0.7

Weakly Supervised Action Localization

THUMOS 2014 - [email protected]:0.5

Data-free Knowledge Distillation

QNLI - Accuracy

Small Object Detection

SOD4SB Private Test - AP50

Small Object Detection

SOD4SB Public Test - AP50

Time Series Forecasting

Weather (96) - MAE

Time Series Forecasting

Weather (720) - MSE

Time Series Forecasting

Weather (720) - MAE

Time Series Forecasting

Weather (192) - MSE

Time Series Forecasting

Weather (192) - MAE

Time Series Forecasting

Weather (336) - MSE

Time Series Forecasting

Weather (336) - MAE

Time Series Forecasting

ETTh1 (336) Univariate - MSE

Time Series Forecasting

ETTh1 (336) Univariate - MAE

Action Segmentation

COIN - Frame accuracy

Reflection Removal

SIR^2(Wild) - PSNR

Reflection Removal

SIR^2(Wild) - SSIM

Reflection Removal

Real20 - PSNR

Reflection Removal

SIR^2(Objects) - PSNR

Reflection Removal

SIR^2(Postcard) - PSNR

Reflection Removal

SIR^2(Postcard) - SSIM

3D Object Detection

nuScenes Camera Only - NDS

3D Object Detection

3D Object Detection on Argoverse2 Camera Only - Average mAP

Human Pose Forecasting

3DPW - Average MPJPE (mm) 1000 msec

3D Object Detection

nuScenes - mAOE

Diffusion Personalization Tuning Free

AgeDB - LPIPS

Semi-Supervised Image Classification

SVHN, 250 Labels - Accuracy

Edge Detection

UDED - ODS

3D Lane Detection

OpenLane - Curve

3D Lane Detection

OpenLane - Merge & Split

3D Object Detection

ScanNetV2 - [email protected]

Image Harmonization

HAdobe5k (1024×1024) - MSE

Image Harmonization

HAdobe5k (1024×1024) - fMSE

Chart Question Answering

RealCQA - 1:1 Accuracy

3D Object Detection

V2X-SIM - mAP

Object Detection

OoDIS - AP

Object Detection

OoDIS - AP50

Drone-view target localization

University-1652 - AP

Drone-view target localization

University-1652 - Recall@1

Multimodal Sentiment Analysis

CMU-MOSI - F1

Multimodal Sentiment Analysis

CMU-MOSI - MAE

Multimodal Sentiment Analysis

CMU-MOSI - Corr

Multimodal Sentiment Analysis

CMU-MOSI - Acc-7

Multimodal Sentiment Analysis

CMU-MOSI - Acc-2

Multimodal Sentiment Analysis

MOSI - Accuracy

Multimodal Sentiment Analysis

MOSI - F1 score

Multi-modal Entity Alignment

UMVM-dbp-zh-en - Hits@1

Multi-modal Entity Alignment

UMVM-oea-d-w-v1 - Hits@1

Multi-modal Entity Alignment

UMVM-dbp-fr-en - Hits@1

Multi-modal Entity Alignment

UMVM-oea-en-de - Hits@1

Multi-modal Entity Alignment

UMVM-oea-en-fr - Hits@1

Multi-modal Entity Alignment

UMVM-oea-d-w-v2 - Hits@1

Multi-modal Entity Alignment

UMVM-dbp-ja-en - Hits@1

Human Pose Forecasting

HumanEva-I - MMFDE@2000ms

Human Pose Forecasting

AMASS - ADE

Human Pose Forecasting

AMASS - APD

Text Simplification

ASSET - SARI (EASSE>=0.2.1)

Text Simplification

TurkCorpus - SARI (EASSE>=0.2.1)

Text Simplification

TurkCorpus - FKGL

Scene Segmentation

StreetHazards - Open-mIoU

Referring Expression Segmentation

A2D Sentences - [email protected]

Referring Expression Segmentation

A2D Sentences - [email protected]

Referring Expression Segmentation

A2D Sentences - [email protected]

Referring Expression Segmentation

A2D Sentences - AP

Extracting Buildings In Remote Sensing Images

Massachusetts building dataset - IoU

License Plate Recognition

UFPR-ALPR - Rank-1 Recognition Rate

Drivable Area Detection

BDD100K val - mIoU

Video Polyp Segmentation

SUN-SEG-Easy - IoU

Video Polyp Segmentation

SUN-SEG-Hard - IoU

Conditional Text-to-Image Synthesis

COCO-MIG - instance success rate

Conditional Text-to-Image Synthesis

COCO-MIG - mIoU

Single-View 3D Reconstruction

Common Objects in 3D - Avg. F1

Multiple Choice Question Answering (MCQA)

MMLU (Professional medicine) - Accuracy

Aspect Term Extraction and Sentiment Classification

SemEval - Avg F1

Aspect Term Extraction and Sentiment Classification

SemEval - Restaurant 2014 (F1)

Aspect Term Extraction and Sentiment Classification

SemEval - Laptop 2014 (F1)

Face Verification

BTS3.1 - TAR @ FAR=0.01

Face Identification

DroneSURF - Rank1

Semi-Supervised Object Detection

COCO 5% labeled data - mAP

Action Recognition

Diving-48 - Accuracy

Object Detection

WaterScenes - mAP@50-95

Photo geolocation estimation

GWS15k - City level (25 km)

Photo geolocation estimation

GWS15k - Region level (200 km)

Photo geolocation estimation

GWS15k - Country level (750 km)

Photo geolocation estimation

GWS15k - Continent level (2500 km)

Photo geolocation estimation

Im2GPS3k - City level (25 km)

Photo geolocation estimation

Im2GPS3k - Region level (200 km)

Photo geolocation estimation

Im2GPS3k - Country level (750 km)

Photo geolocation estimation

Im2GPS3k - Continent level (2500 km)

Photo geolocation estimation

YFCC26k - City level (25 km)

Photo geolocation estimation

YFCC26k - Region level (200 km)

Photo geolocation estimation

YFCC26k - Country level (750 km)

Photo geolocation estimation

YFCC26k - Continent level (2500 km)

Photo geolocation estimation

Im2GPS - Region level (200 km)

Photo geolocation estimation

Im2GPS - Country level (750 km)

Photo geolocation estimation

Im2GPS - Continent level (2500 km)

Age Estimation

ChaLearn 2016 - MAE

All-day Semantic Segmentation

All-day CityScapes - mIoU

3D Lane Detection

OpenLane-V2 val - OLS

3D Lane Detection

OpenLane-V2 val - DET_t

3D Lane Detection

OpenLane-V2 val - TOP_ll

Referring Expression Segmentation

RefCoCo val - Overall IoU

Grounded Situation Recognition

SWiG - Top-1 Verb

Grounded Situation Recognition

SWiG - Top-1 Verb & Value

Grounded Situation Recognition

SWiG - Top-5 Verbs

Grounded Situation Recognition

SWiG - Top-5 Verbs & Value

Grounded Situation Recognition

SWiG - Top-1 Verb & Grounded-Value

Grounded Situation Recognition

SWiG - Top-5 Verbs & Grounded-Value

Image Super-Resolution

BSD100 - 2x upscaling - SSIM

Aspect-Based Sentiment Analysis (ABSA)

ASQP - F1 (R16)

Aspect-Based Sentiment Analysis (ABSA)

ASTE - F1 (R16)

Semi-Supervised Video Object Segmentation

YouTube-VOS 2019 - Jaccard (Seen)

Semi-Supervised Video Object Segmentation

YouTube-VOS 2019 - F-Measure (Seen)

Visual Question Answering

ViP-Bench - GPT-4 score (bbox)

Visual Storytelling

VIST - BLEU-1

Visual Storytelling

VIST - BLEU-2

3D Face Animation

Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 - Lip Vertex Error

3D Face Animation

Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 - FDD

6D Pose Estimation using RGBD

REAL275 - mAP 10, 5cm

6D Pose Estimation using RGBD

REAL275 - mAP 10, 2cm

6D Pose Estimation using RGBD

REAL275 - mAP 5, 2cm

Facial Expression Recognition (FER)

FER+ - Accuracy

Time Series Forecasting

Electricity (336) - MSE

CARLA MAP Leaderboard

CARLA - Driving score

CARLA MAP Leaderboard

CARLA - Route completion

CARLA MAP Leaderboard

CARLA - Infraction penalty

Animal Pose Estimation

TriMouse-161 - mAP

Zero-Shot Composed Image Retrieval (ZS-CIR)

CIRR - R@5

Formation Energy

Materials Project - MAE

Formation Energy

JARVIS-DFT - MAE

Formation Energy

QM9 - MAE

Few-Shot 3D Point Cloud Classification

ScanObjectNN 10-way (10-shot) - Overall Accuracy

Supervised Anomaly Detection

MVTec AD - Detection AUROC

Supervised Anomaly Detection

MVTec AD - Segmentation AUROC

Supervised Anomaly Detection

MVTec AD - Segmentation AUPRO

Supervised Anomaly Detection

MVTec AD - Segmentation AP

Semi-Supervised Image Classification

CIFAR-10, 400 Labels (OpenSet, 6/4) - Accuracy

Semi-Supervised Image Classification

CIFAR-10, 50 Labels (OpenSet, 6/4) - Accuracy

Semi-Supervised Image Classification

CIFAR-10, 100 Labels (OpenSet, 6/4) - Accuracy

Low-Light Image Enhancement

LOLv2 - Average PSNR

Chinese Word Segmentation

CTB6 - F1

Cloud Removal

SEN12MS-CR-TS - RMSE

Cloud Removal

SEN12MS-CR-TS - PSNR

3D Object Detection

S3DIS - [email protected]

3D Object Detection

S3DIS - [email protected]

3D Object Detection

SUN-RGBD val - [email protected]

3D Object Detection

SUN-RGBD val - [email protected]

Weakly Supervised Action Localization

ActivityNet-1.2 - [email protected]

Video Retrieval

MSR-VTT - text-to-video R@1

Video Retrieval

MSR-VTT - text-to-video R@5

Video Retrieval

VATEX - text-to-video R@1

Video Retrieval

VATEX - text-to-video R@10

Video Retrieval

VATEX - text-to-video R@5

Cross-Modal Retrieval

COCO 2014 - Text-to-image R@1

Cross-Modal Retrieval

COCO 2014 - Text-to-image R@10

Fine-Grained Image Recognition

OVEN - Accuracy

Utterance-level pronunciation scoring

speechocean762 - Pearson correlation coefficient (PCC)

Image Dehazing

SOTS Indoor - SSIM

Image Dehazing

Haze4k - PSNR

Image Dehazing

Haze4k - SSIM

Graph Classification

CIFAR10 100k - Accuracy (%)

Lane Detection

TuSimple - Accuracy

Nested Named Entity Recognition

ACE 2004 - F1

Referring Expression Segmentation

A2D Sentences - [email protected]

Referring Expression Segmentation

A2D Sentences - IoU overall

Referring Expression Segmentation

A2D Sentences - IoU mean

Referring Expression Segmentation

A2D Sentences - [email protected]

Multimodal Intent Recognition

PhotoChat - F1

Multimodal Intent Recognition

PhotoChat - Precision

Multimodal Intent Recognition

PhotoChat - Recall

Multimodal Intent Recognition

MMDialog - F1

Open-Domain Question Answering

ELI5 - Rouge-L

Semantic correspondence

PF-WILLOW - PCK

Math Word Problem Solving

SVAMP - Execution Accuracy

Scene Text Recognition

SVT - Accuracy

Scene Text Recognition

CUTE80 - Accuracy

Scene Text Recognition

IIIT5k - Accuracy

Audio Classification

ICBHI Respiratory Sound Database - Specificity

Semi-Supervised Video Object Segmentation

Long Video Dataset - J&F

Few-Shot Learning

DTD - 12-shot Accuracy

Multivariate Time Series Forecasting

USHCN-Daily - MSE

Aspect-Based Sentiment Analysis (ABSA)

ASQP - F1 (R15)

Aspect-Based Sentiment Analysis (ABSA)

ASTE - F1 (L14)

Aspect-Based Sentiment Analysis (ABSA)

ASTE - F1 (R14)

Aspect-Based Sentiment Analysis (ABSA)

ASTE - F1 (R15)

Aspect-Based Sentiment Analysis (ABSA)

ACOS - F1 (Laptop)

Aspect-Based Sentiment Analysis (ABSA)

ACOS - F1 (Restaurant)

Aspect-Based Sentiment Analysis (ABSA)

TASD - F1 (R15)

Aspect-Based Sentiment Analysis (ABSA)

TASD - F1 (R16)

Time Series Forecasting

Weather (96) - MSE

Multi-modal Named Entity Recognition

Twitter-15 - F1

Multi-modal Named Entity Recognition

Twitter-2017 - F1

Few-Shot 3D Point Cloud Classification

ModelNet40 5-way (10-shot) - Overall Accuracy

Molecular Property Prediction

BACE - ROC-AUC

Autonomous Driving

CARLA Leaderboard - Driving Score

Autonomous Driving

CARLA Leaderboard - Infraction penalty

Multi-task Language Understanding

MGSM - Average (%)

Multiple Choice Question Answering (MCQA)

BIG-bench (Ruin Names) - Accuracy

Multiple Choice Question Answering (MCQA)

BIG-bench (Navigate) - Accuracy

Multiple Choice Question Answering (MCQA)

BIG-bench (Movie Recommendation) - Accuracy

Generative Visual Question Answering

PMC-VQA - BLEU-1

Cross-Lingual Transfer

XCOPA - Accuracy

Toxic Comment Classification

Civil Comments - AUROC

Self-supervised Scene Flow Estimation

Argoverse 2 - EPE Foreground Dynamic

Multiple Choice Question Answering (MCQA)

MedMCQA - Test Set (Acc-%)

Lane Detection

CULane - F1 score

Long-tail Learning

CIFAR-10-LT (ρ=10) - Error Rate

Long-tail Learning

CIFAR-10-LT (ρ=100) - Error Rate

Visual Question Answering

ViP-Bench - GPT-4 score (human)

Video Reconstruction

MVSEC - Mean Squared Error

Video Reconstruction

MVSEC - LPIPS

Grayscale Image Denoising

Set12 sigma50 - PSNR

Grayscale Image Denoising

Set12 sigma25 - PSNR

Atari Games

Atari-57 - Human World Record Breakthrough

3D Object Detection

waymo vehicle - APH/L2

Long-tail Learning

VOC-MLT - Average mAP

Long-tail Learning

COCO-MLT - Average mAP

Facial Expression Recognition (FER)

SFEW - Accuracy

Gesture Generation

TED Gesture Dataset - FGD

Semi-supervised Medical Image Segmentation

ACDC 10% labeled data - Dice (Average)

Semi-supervised Medical Image Segmentation

ACDC 20% labeled data - Dice (Average)

Semi-supervised Medical Image Segmentation

ACDC 5% labeled data - Dice (Average)

Skeleton Based Action Recognition

SHREC 2017 track on 3D Hand Gesture Recognition - 28 gestures accuracy

Skeleton Based Action Recognition

SHREC 2017 track on 3D Hand Gesture Recognition - 14 gestures accuracy

Co-Salient Object Detection

CoSal2015 - MAE

Co-Salient Object Detection

CoSal2015 - S-measure

Co-Salient Object Detection

CoSal2015 - max F-measure

Co-Salient Object Detection

CoSal2015 - max E-measure

Co-Salient Object Detection

CoSal2015 - mean E-measure

Co-Salient Object Detection

CoSal2015 - mean F-measure

Co-Salient Object Detection

CoSOD3k - S-measure

Co-Salient Object Detection

CoSOD3k - max F-measure

Co-Salient Object Detection

CoSOD3k - mean E-measure

Co-Salient Object Detection

CoSOD3k - mean F-measure

Dialog Relation Extraction

DialogRE - F1 (v2)

RGB-T Tracking

RGBT234 - Precision

RGB-T Tracking

LasHeR - Precision

RGB-T Tracking

LasHeR - Success

Image Relighting

Stanford-ORB - HDR-PSNR

Image Relighting

Stanford-ORB - SSIM

Image Relighting

Stanford-ORB - LPIPS

Video Frame Interpolation

MSU Video Frame Interpolation - VMAF

Video Frame Interpolation

MSU Video Frame Interpolation - LPIPS

Object Detection

GEN1 Detection - mAP

Object Detection

GEN1 Detection - Params

Object Detection

COCO minival - AP75

Object Detection

COCO minival - APS

Object Detection

COCO minival - APM

Object Detection

COCO minival - Params (M)

Lane Detection

CurveLanes - Recall

Efficient ViTs

ImageNet-1K (with DeiT-T) - Top 1 Accuracy

Face Alignment

FaceScape - NME

3D Face Reconstruction

Florence - RMSE Cooperative

Pose Transfer

Deep-Fashion - FID

Object Detection

VisDrone-DET2019 - AP50

Video Retrieval

MSR-VTT - text-to-video R@10

Unsupervised Object Segmentation

SegTrack-v2 - mIoU

Object Detection

India Driving Dataset - [email protected]

Layout-to-Image Generation

LayoutBench-COCO - Position - AP

Layout-to-Image Generation

LayoutBench-COCO - Size - AP

Layout-to-Image Generation

LayoutBench-COCO - Combination - AP

Cross-modal retrieval with noisy correspondence

COCO-Noisy - Image-to-text R@5

Multi-Person Pose forecasting

ExPI - common actions split - Average MPJPE (mm) @ 1000 ms

Multi-Person Pose forecasting

ExPI - common actions split - Average MPJPE (mm) @ 600 ms

Multi-Person Pose forecasting

ExPI - common actions split - Average MPJPE (mm) @ 400 ms

Multi-Person Pose forecasting

ExPI - common actions split - Average MPJPE (mm) @ 200 ms

Multi-Person Pose forecasting

ExPI - unseen actions split - Average MPJPE (mm) @ 800 ms

Multi-Person Pose forecasting

ExPI - unseen actions split - Average MPJPE (mm) @ 600 ms

Multi-Person Pose forecasting

ExPI - unseen actions split - Average MPJPE (mm) @ 400 ms

Few-Shot 3D Point Cloud Classification

ModelNet40 5-way (10-shot) - Standard Deviation

Few-Shot 3D Point Cloud Classification

ModelNet40 10-way (10-shot) - Standard Deviation

Camera shot boundary detection

ClipShots - F1 score

Motion Synthesis

Inter-X - MModality

Cloud Removal

SEN12MS-CR-TS - SSIM

Cloud Removal

SEN12MS-CR-TS - SAM

Cloud Removal

SEN12MS-CR - MAE

Cloud Removal

SEN12MS-CR - PSNR

Cloud Removal

SEN12MS-CR - SAM

Video Retrieval

FIVR-200K - mAP (ISVR)

Video Retrieval

FIVR-200K - mAP (DSVR)

Video Retrieval

FIVR-200K - mAP (CSVR)

Motion Synthesis

AIST++ - FID

Temporal Action Proposal Generation

THUMOS'14 - AR@100

Temporal Action Proposal Generation

THUMOS'14 - AR@200

Temporal Action Proposal Generation

THUMOS'14 - AR@50

Motion Synthesis

KIT Motion-Language - FID

Photo to Rest Generalization

PACS - Accuracy

Single-Source Domain Generalization

Digits-five - Accuracy

3D Object Detection

nuScenes - NDS

3D Object Detection

nuScenes - mAP

3D Object Detection

nuScenes - mATE

3D Object Detection

nuScenes - mASE

3D Place Recognition

Oxford RobotCar Dataset - AR@1%

3D Place Recognition

Oxford RobotCar Dataset - AR@1

3D Place Recognition

CS-Campus3D - AR@1%

3D Place Recognition

CS-Campus3D - AR@1

3D Place Recognition

CS-Campus3D - AR@1% cross-source

3D Place Recognition

CS-Campus3D - AR@1 cross-source

Layout-to-Image Generation

Visual Genome 128x128 - FID

Layout-to-Image Generation

Visual Genome 256x256 - FID

Layout-to-Image Generation

COCO-Stuff 256x256 - FID

Multiple Choice Question Answering (MCQA)

BIG-bench (Hyperbaton) - Accuracy

Image Super-Resolution

DIV2K val - 4x upscaling - PSNR

Image Super-Resolution

DIV2K val - 4x upscaling - SSIM

Spatio-Temporal Action Localization

AVA-Kinetics - val mAP

Text-based Image Editing

PIE-Bench - Structure Distance

Video Retrieval

SSv2-template retrieval - text-to-video R@1

Video Retrieval

SSv2-label retrieval - text-to-video R@1

Action Classification

MiT - Top 5 Accuracy

Action Classification

Kinetics-700 - Top-5 Accuracy

Panoptic Scene Graph Generation

PSG Dataset - R@20

3D Human Reconstruction

CustomHumans - f-Score

Few-Shot Class-Incremental Learning

CIFAR-100 - Average Accuracy

Few-Shot Class-Incremental Learning

CIFAR-100 - Last Accuracy

Skeleton Based Action Recognition

Kinetics-Skeleton dataset - Accuracy

Few-Shot Class-Incremental Learning

CUB-200-2011 - Last Accuracy

Few-Shot Class-Incremental Learning

mini-ImageNet - Average Accuracy

Few-Shot Class-Incremental Learning

mini-ImageNet - Last Accuracy

Cross-Modal Person Re-Identification

SYSU-MM01 - mAP (All-search & Single-shot)

Depth Anomaly Detection and Segmentation

MVTec 3D-AD - Segmentation AUPRO

Depth Anomaly Detection and Segmentation

MVTec 3D-AD - Detection AUROC

Described Object Detection

Description Detection Dataset - Intra-scenario FULL mAP

Described Object Detection

Description Detection Dataset - Intra-scenario PRES mAP

Described Object Detection

Description Detection Dataset - Intra-scenario ABS mAP

Image-Based Localization

CVUSA - Recall@10

Image-Based Localization

CVUSA - Recall@1

Image-Based Localization

CVUSA - Recall@5

Image-Based Localization

CVUSA - Recall@top1%

Image-Based Localization

VIGOR Cross Area - Recall@1

Image-Based Localization

VIGOR Cross Area - Recall@5

Image-Based Localization

VIGOR Cross Area - Recall@10

Image-Based Localization

VIGOR Cross Area - Recall@1%

Image-Based Localization

VIGOR Cross Area - Hit Rate

Image-Based Localization

VIGOR Same Area - Recall@1

Image-Based Localization

VIGOR Same Area - Recall@5

Image-Based Localization

VIGOR Same Area - Recall@10

Image-Based Localization

VIGOR Same Area - Hit Rate

Image-Based Localization

CVACT - Recall@1

Image-Based Localization

CVACT - Recall@5

Image-Based Localization

CVACT - Recall@10

Image-Based Localization

CVACT - Recall@1 (%)

Unsupervised Image Classification

CIFAR-20 - Accuracy

3D Face Reconstruction

REALY - all

3D Face Reconstruction

REALY (side-view) - all

Emotion Recognition in Conversation

EmoryNLP - Micro-F1

Human Pose Forecasting

HARPER - Average MPJPE (mm) @ 400ms

Human Pose Forecasting

HARPER - Average MPJPE (mm) @ 1000ms

Human Pose Forecasting

HARPER - Last Frame MPJPE (mm) @ 400ms

Human Pose Forecasting

HARPER - Last Frame MPJPE (mm) @ 1000ms

Unsupervised Video Object Segmentation

YouTube-Objects - J

Information Threading

NewSHead - NMI

Image Super-Resolution

Urban100 - 3x upscaling - SSIM

Unsupervised Video Object Segmentation

DAVIS 2016 val - G

Unsupervised Video Object Segmentation

DAVIS 2016 val - J

Semi-Supervised Video Object Segmentation

DAVIS 2016 - Speed (FPS)

Video Prediction

Moving MNIST - MAE

Vehicle Re-Identification

VehicleID Large - Rank-5

Low-Light Image Enhancement

DICM - User Study Score

Low-Light Image Enhancement

MEF - User Study Score

Low-Light Image Enhancement

LOLv2 - SSIM

Low-Light Image Enhancement

VV - User Study Score

Visual Object Tracking

TrackingNet - Precision

Multi-Label Image Classification

BigEarthNet - mAP (micro)

Low-light Image Deblurring and Enhancement

LOL-Blur - SSIM

Low-light Image Deblurring and Enhancement

LOL-Blur - Average PSNR

Sequential Image Classification

Sequential CIFAR-10 - Unpermuted Accuracy

Pedestrian Attribute Recognition

PA-100K - Accuracy

Photo geolocation estimation

GWS15k - Street level (1 km)

Blocking

Abt-Buy - Recall

Blocking

Amazon-Google - Recall

Blocking

Amazon-Google - Candidate Set Size

Single Image Deraining

Test2800 - PSNR

Color Image Denoising

CBSD68 sigma25 - PSNR

Grayscale Image Denoising

BSD68 sigma25 - PSNR

Color Image Denoising

McMaster sigma50 - PSNR

Grayscale Image Denoising

Set12 sigma15 - PSNR

Video Panoptic Segmentation

KITTI-STEP - STQ

Video Panoptic Segmentation

KITTI-STEP - AQ

Video Panoptic Segmentation

KITTI-STEP - SQ

Thermal Image Segmentation

Noisy RS RGB-T Dataset - mIoU

Motion Synthesis

Inter-X - FID

Motion Synthesis

Inter-X - R-Precision Top3

Video Frame Interpolation

Xiph-2K - SSIM

Video Frame Interpolation

Xiph-4K - PSNR

Video Frame Interpolation

MSU Video Frame Interpolation - PSNR

Video Frame Interpolation

MSU Video Frame Interpolation - SSIM

Video Frame Interpolation

MSU Video Frame Interpolation - MS-SSIM

Crack Segmentation

khanhha's dataset - 4x upscaling (blind) - IoU_max

Crack Segmentation

khanhha's dataset - 4x upscaling (blind) - Average IoU

Crack Segmentation

khanhha's dataset - 4x upscaling (blind) - AHD95

Crack Segmentation

khanhha's dataset - 4x upscaling (blind) - HD95_min

Causal Emotion Entailment

RECCON - Pos. F1

Causal Emotion Entailment

RECCON - Neg. F1

Hyperspectral Image Classification

Kennedy Space Center - OA@15perclass

Hyperspectral Image Classification

Indian Pines - OA@15perclass

Cross-modal retrieval with noisy correspondence

Flickr30K-Noisy - Image-to-text R@10

Cross-modal retrieval with noisy correspondence

Flickr30K-Noisy - Text-to-image R@5

Cross-modal retrieval with noisy correspondence

Flickr30K-Noisy - Text-to-image R@10

Cross-modal retrieval with noisy correspondence

Flickr30K-Noisy - R-Sum

Cross-Domain Few-Shot

Places - 5 shot

Cross-Domain Few-Shot

Cars - 5 shot

Cross-Domain Few-Shot

ISIC2018 - 5 shot

Cross-Domain Few-Shot

CropDisease - 5 shot

Lipreading

LRS3-TED - Word Error Rate (WER)

Aspect-Based Sentiment Analysis (ABSA)

SemEval 2014 Task 4 Laptop - F1

Layout-to-Image Generation

LayoutBench-COCO - Number - AP

Human Pose Forecasting

HumanEva-I - MMADE@2000ms

Hateful Meme Classification

HarMeme - Accuracy

Text-based Image Editing

PIE-Bench - CLIPSIM

Visual Object Tracking

TrackingNet - Accuracy

Zero-Shot Transfer 3D Point Cloud Classification

ModelNet10 - Accuracy (%)

Image-Based Localization

VIGOR Same Area - Recall@1%

Shadow Removal

ISTD - RMSE

Entity Alignment

YAGO-WIKI50K - Hit@1

Entity Alignment

DICEWS-1K - Hit@1

Complex Query Answering

NELL-995 - MRR 3i

Complex Query Answering

NELL-995 - MRR pi

Crowd Counting

ShanghaiTech B - MSE

3D Object Detection

Waymo cyclist - APH/L2

3D Object Detection

Waymo pedestrian - APH/L2

Motion Synthesis

KIT Motion-Language - Diversity

Dialogue Generation

Persona-Chat - Avg F1

Video Panoptic Segmentation

Cityscapes-VPS - VPQ

Action Recognition

Charades-Ego - mAP

Polyp Segmentation

Kvasir-SEG - mDice

Learning with noisy labels

Clothing1M - Test Accuracy

Monocular Depth Estimation

KITTI Eigen split - RMSE log

Monocular Depth Estimation

KITTI Eigen split - Delta < 1.25

Facial Expression Recognition (FER)

RAF-DB - Avg. Accuracy

Motion Forecasting

Argoverse CVPR 2020 - MR (K=1)

Motion Forecasting

Argoverse CVPR 2020 - minADE (K=6)

Motion Forecasting

Argoverse CVPR 2020 - minFDE (K=6)

Motion Forecasting

Argoverse CVPR 2020 - brier-minFDE (K=6)

Pedestrian Detection

TJU-Ped-traffic - R (miss rate)

Pedestrian Detection

TJU-Ped-traffic - RS (miss rate)

Pedestrian Detection

TJU-Ped-traffic - HO (miss rate)

Pedestrian Detection

Caltech - Reasonable Miss Rate

Pedestrian Detection

Caltech - Heavy MR^-2

Lane Detection

CurveLanes - F1 score

Multi-Hypotheses 3D Human Pose Estimation

AH36M - Best-Hypothesis PMPJPE (n = 25)

Low-Light Image Enhancement

NPE - BRISQUE

Generalized Few-Shot Semantic Segmentation

COCO-20i (1-shot) - Mean IoU

Generalized Few-Shot Semantic Segmentation

COCO-20i (1-shot) - Mean Base and Novel

Generalized Few-Shot Semantic Segmentation

PASCAL-5i (5-Shot) - Mean IoU

Generalized Few-Shot Semantic Segmentation

PASCAL-5i (5-Shot) - Mean Base and Novel

Generalized Few-Shot Semantic Segmentation

COCO-20i (5-shot) - Mean IoU

Generalized Few-Shot Semantic Segmentation

COCO-20i (5-shot) - Mean Base and Novel

Generalized Few-Shot Semantic Segmentation

PASCAL-5i (1-Shot) - Mean IoU

Generalized Few-Shot Semantic Segmentation

PASCAL-5i (1-Shot) - Mean Base and Novel

Explanatory Visual Question Answering

GQA-REX - BLEU-4

Explanatory Visual Question Answering

GQA-REX - METEOR

Explanatory Visual Question Answering

GQA-REX - ROUGE-L

Explanatory Visual Question Answering

GQA-REX - CIDEr

Explanatory Visual Question Answering

GQA-REX - SPICE

Explanatory Visual Question Answering

GQA-REX - Grounding

Explanatory Visual Question Answering

GQA-REX - GQA-val

Explanatory Visual Question Answering

GQA-REX - GQA-test

Saliency Prediction

SALICON - NSS

Action Segmentation

50 Salads - F1@10%

Action Segmentation

50 Salads - Acc

Action Segmentation

50 Salads - F1@25%

Action Segmentation

50 Salads - F1@50%

Hand-Object Pose

DexYCB - OCE

Hand-Object Pose

HO-3D v2 - PA-MPJPE

Image Denoising

SID SonyA7S2 x250 - PSNR (Raw)

Image Denoising

SID x100 - PSNR (Raw)

Image Denoising

SID x300 - PSNR (Raw)

Image Denoising

ELD SonyA7S2 x100 - PSNR (Raw)

Image Denoising

ELD SonyA7S2 x100 - SSIM (Raw)

Image Denoising

ELD SonyA7S2 x200 - PSNR (Raw)

Image Denoising

ELD SonyA7S2 x200 - SSIM (Raw)

Reflection Removal

Real20 - SSIM

Video Retrieval

MSVD - video-to-text Mean Rank

Video Retrieval

VATEX - video-to-text R@1

Object Detection

VEDAI - mAP50

Video Retrieval

SSv2-label retrieval - text-to-video R@10

Entity Alignment

DBP15k ja-en - Hits@1

Entity Alignment

DBP15k zh-en - Hits@1

Entity Alignment

DBP15k fr-en - Hits@1

Long-tail Learning

CIFAR-10-LT (ρ=50) - Error Rate

Long-tail Learning

iNaturalist 2018 - Top-1 Accuracy

Multiple Choice Question Answering (MCQA)

MedMCQA - Dev Set (Acc-%)

Video Reconstruction

UVG - Average PSNR (dB)

Zero-shot Relation Triplet Extraction

FewRel - Avg. F1

Lesion Segmentation

ISIC 2018 - Mean IoU

Image Manipulation Detection

Columbia - AUC

Image Manipulation Detection

Columbia - Balanced Accuracy

Chart Question Answering

PlotQA - 1:1 Accuracy

Complex Query Answering

FB15k-237 - MRR 1p

Complex Query Answering

FB15k-237 - MRR 2p

Complex Query Answering

FB15k-237 - MRR 3p

Complex Query Answering

FB15k-237 - MRR 2i

Complex Query Answering

FB15k-237 - MRR 3i

Complex Query Answering

FB15k-237 - MRR pi

Complex Query Answering

FB15k-237 - MRR ip

Complex Query Answering

FB15k-237 - MRR 2u

Complex Query Answering

FB15k-237 - MRR up

Complex Query Answering

FB15k - MRR 1p

Complex Query Answering

FB15k - MRR 3p

Complex Query Answering

FB15k - MRR 2i

Complex Query Answering

FB15k - MRR 3i

Complex Query Answering

FB15k - MRR pi

Complex Query Answering

FB15k - MRR ip

Complex Query Answering

FB15k - MRR 2u

Complex Query Answering

FB15k - MRR up

Complex Query Answering

NELL-995 - MRR 1p

Complex Query Answering

NELL-995 - MRR 2p

Complex Query Answering

NELL-995 - MRR 3p

Complex Query Answering

NELL-995 - MRR ip

Complex Query Answering

NELL-995 - MRR 2u

Complex Query Answering

NELL-995 - MRR up

Audio Classification

Balanced Audio Set - Mean AP

Image Segmentation

Pascal Panoptic Parts - mIoUPartS

Video Generation

Sky Time-lapse - FVD 16

Semi-Supervised Video Object Segmentation

DAVIS 2016 - Jaccard (Mean)

Semi-Supervised Video Object Segmentation

DAVIS 2016 - J&F

Video Retrieval

MSR-VTT - video-to-text R@1

Video Retrieval

SSv2-template retrieval - text-to-video R@5

Video Retrieval

SSv2-template retrieval - text-to-video R@10

Shadow Removal

Adjusted ISTD - RMSE

Cross-Modal Retrieval

Recipe1M - Image-to-text R@1

Cross-Modal Retrieval

Recipe1M - Text-to-image R@1

Motion Synthesis

Motion-X - Diversity

Motion Synthesis

KIT Motion-Language - Multimodality

Animal Pose Estimation

AP-10K - AP

Change detection for remote sensing images

CDD Dataset (season-varying) - F1-Score

Multi-Object Tracking

DanceTrack - DetA

Multi-Object Tracking

DanceTrack - MOTA

Multiple Object Tracking

BDD100K test - mIDF1

Multiple Object Tracking

BDD100K test - mHOTA

Zero-Shot Video Question Answer

STAR Benchmark - Accuracy

Multi-modal Named Entity Recognition

SNAP (MNER) - F1

Change detection for remote sensing images

CDD Dataset (season-varying) - IoU

3D Object Detection

nuScenes - mAVE

Text Classification

R8 - Accuracy

Domain Generalization

VLCS - Average Accuracy

SMAC

SMAC 6h_vs_8z - Median Win Rate

SMAC

SMAC 3s5z_vs_3s6z - Median Win Rate

SMAC

SMAC MMM2 - Median Win Rate

SMAC

SMAC corridor - Median Win Rate

Point Cloud Registration

FPv1 - Recall (3cm, 10 degrees)

Point Cloud Registration

FPv1 - RRE (degrees)

Point Cloud Registration

FPv1 - RTE (cm)

Time Series Forecasting

Electricity (192) - MSE

Time Series Forecasting

Electricity (96) - MSE

Time Series Forecasting

ETTh1 (96) Univariate - MSE

Time Series Forecasting

ETTh1 (96) Univariate - MAE

Time Series Forecasting

ETTh2 (96) Univariate - MAE

Time Series Forecasting

ETTh2 (720) Multivariate - MSE

Time Series Forecasting

ETTh2 (720) Multivariate - MAE

Unsupervised Semantic Segmentation

COCO-Stuff-3 - Pixel Accuracy

Action Anticipation

EPIC-KITCHENS-100 (test) - recall@5

Human Pose Forecasting

AMASS - FDE

Vehicle Re-Identification

VehicleID Small - Rank-5

Image Super-Resolution

DIV2K val - 4x upscaling - LPIPS

Layout-to-Image Generation

LayoutBench - AP

Video Prediction

BAIR Robot Pushing - FVD

Unsupervised Person Re-Identification

DukeMTMC-reID - Rank-1

Unsupervised Person Re-Identification

DukeMTMC-reID - Rank-5

Unsupervised Person Re-Identification

DukeMTMC-reID - mAP

Video Prediction

Moving MNIST - MSE

Object Detection

COCO minival - box AP

Cross-Modal Retrieval

Flickr30k - Image-to-text R@1

Cross-Modal Retrieval

COCO 2014 - Image-to-text R@10

Open-World Semi-Supervised Learning

CIFAR-10 - Novel accuracy (50% Labeled)

Open-World Semi-Supervised Learning

CIFAR-10 - All accuracy (50% Labeled)

Video Retrieval

MSR-VTT-1kA - text-to-video Mean Rank

Video Retrieval

MSR-VTT-1kA - video-to-text Mean Rank

Open-World Semi-Supervised Learning

ImageNet-100 - Seen accuracy (50% Labeled)

Open-World Semi-Supervised Learning

ImageNet-100 - All accuracy (50% Labeled)

Efficient ViTs

ImageNet-1K (with DeiT-S) - Top 1 Accuracy

Efficient ViTs

ImageNet-1K (with DeiT-S) - GFLOPs

Zero-Shot Transfer 3D Point Cloud Classification

ScanObjectNN - PB_T50_RS Accuracy (%)

Zero-Shot Transfer 3D Point Cloud Classification

ScanObjectNN - OBJ_BG Accuracy (%)

Motion Synthesis

AIST++ - Beat alignment score

Object Detection

Visual Genome - mAP

Text to 3D

T³Bench - Avg

Image Harmonization

HAdobe5k (1024×1024) - PSNR

Image Harmonization

HAdobe5k (1024×1024) - SSIM

Image Harmonization

iHarmony4 - PSNR

TDC ADMET Benchmarking Group

tdcommons - TDC.BBB_Martins

Molecular Property Prediction

HIV dataset - AUC

Molecular Property Prediction

ClinTox - ROC-AUC

Molecular Property Prediction

SIDER - ROC-AUC

Molecular Property Prediction

Tox21 - ROC-AUC

Object Detection

COCO minival - APL

Monocular 3D Object Detection

KITTI Pedestrian Easy - AP Easy

Object Detection

COCO-O - Average mAP

Object Detection

COCO-O - Effective Robustness

Polyp Segmentation

Kvasir-SEG - mIoU

Visual Object Tracking

UAV123 - Precision

Session-Based Recommendations

yoochoose1/4 - MRR@20

Neural Architecture Search

NAS-Bench-201, ImageNet-16-120 - Accuracy (Test)

Panoptic Segmentation

Mapillary val - PQ

Panoptic Segmentation

Mapillary val - mIoU

Panoptic Segmentation

Mapillary val - PQst

Object Detection

CrowdHuman (full body) - AP

Video Quality Assessment

KoNViD-1k - PLCC

Video Quality Assessment

YouTube-UGC - PLCC

Multimodal Machine Translation

Multi30K - BLEU (EN-DE)

Multimodal Emotion Recognition

IEMOCAP - Unweighted Accuracy (UA)

Image Inpainting

FFHQ 512 x 512 - FID

Few-Shot Image Classification

ImageNet-FS (1-shot, novel) - Top-5 Accuracy (%)

Few-Shot Image Classification

ImageNet-FS (5-shot, all) - Top-5 Accuracy (%)

Few-Shot Image Classification

ImageNet-FS (2-shot, novel) - Top-5 Accuracy (%)

Few-Shot Image Classification

ImageNet-FS (5-shot, novel) - Top-5 Accuracy (%)

Participant Intervention Comparison Outcome Extraction

EBM-NLP - F1

Citation Intent Classification

ACL-ARC - F1

Chinese Word Segmentation

MSR - F1

Monocular Depth Estimation

KITTI Eigen split unsupervised - Delta < 1.25^3

Video Frame Interpolation

X4K1000FPS - PSNR

Video Frame Interpolation

X4K1000FPS - SSIM

Video Frame Interpolation

SNU-FILM (medium) - SSIM

3D Semantic Segmentation

SensatUrban - mIoU

Cross-Lingual NER

CoNLL Spanish - F1

Motion Synthesis

HumanML3D - Diversity

Math Word Problem Solving

Math23K - Accuracy (5-fold)

Math Word Problem Solving

Math23K - Accuracy (training-test)

Short Text Clustering

Biomedical - Acc

Cross-Lingual Question Answering

TyDiQA-GoldP - F1

Document Classification

HOC - Micro F1

Text Classification

WeeBit (Readability Assessment) - Accuracy (5-fold)

Text-to-Image Generation

Multi-Modal-CelebA-HQ - FID

Text-to-Image Generation

CUB - Inception score

Semi-Supervised Video Object Segmentation

YouTube-VOS 2019 - Jaccard (Unseen)

Semi-Supervised Video Object Segmentation

DAVIS 2016 - F-measure (Mean)

Semi-Supervised Video Object Segmentation

VOT2020 - EAO (real-time)

Math Word Problem Solving

MathQA - Answer Accuracy

Semi-Supervised Semantic Segmentation

Pascal VOC 2012 6.25% labeled - Validation mIoU

Video Emotion Recognition

Ekman6 - Accuracy

Few-Shot Image Classification

Caltech-256 5-way (1-shot) - Accuracy

Face Alignment

COFW-68 - NME (inter-ocular)

Face Alignment

300W Split 2 - NME (inter-ocular)

Face Alignment

300W Split 2 - AUC@8 (inter-ocular)

Face Alignment

300W Split 2 - FR@8 (inter-ocular)

Face Alignment

300W Split 2 - NME (box)

Face Alignment

300W Split 2 - AUC@7 (box)

Few-Shot Semantic Segmentation

COCO-20i -> Pascal VOC (5-shot) - Mean IoU

Open-Domain Question Answering

ELI5 - Rouge-2

Overlapped 5-3

PASCAL VOC 2012 - Mean IoU (test)

D4RL

D4RL - Average Reward

3D Lane Detection

Apollo Synthetic 3D Lane - X error near

3D Lane Detection

Apollo Synthetic 3D Lane - X error far

Human Pose Forecasting

HumanEva-I - APD@2000ms

Human Pose Forecasting

HumanEva-I - ADE@2000ms

Human Pose Forecasting

HumanEva-I - FDE@2000ms

Multimodal Machine Translation

Multi30K - Meteor (EN-DE)

Multimodal Machine Translation

Multi30K - Meteor (EN-FR)

Image-guided Story Ending Generation

VIST-E - BLEU-1

Image-guided Story Ending Generation

VIST-E - BLEU-2

Image-guided Story Ending Generation

VIST-E - METEOR

Image-guided Story Ending Generation

VIST-E - CIDEr

Image-guided Story Ending Generation

VIST-E - ROUGE-L

Image-guided Story Ending Generation

LSMDC-E - BLEU-1

Image-guided Story Ending Generation

LSMDC-E - BLEU-3

Image-guided Story Ending Generation

LSMDC-E - BLEU-4

Image-guided Story Ending Generation

LSMDC-E - METEOR

Image-guided Story Ending Generation

LSMDC-E - CIDEr

3D Object Detection

SUN-RGBD - [email protected]

Image Dehazing

Dense-Haze - SSIM

3D Instance Segmentation

STPLS3D - AP50

3D Instance Segmentation

STPLS3D - AP25

3D Instance Segmentation

STPLS3D - AP

Temporal Action Proposal Generation

ActivityNet-1.3 - AUC (val)

Temporal Action Proposal Generation

ActivityNet-1.3 - AR@100

Temporal Action Proposal Generation

ActivityNet-1.3 - AUC (test)

Temporal Action Proposal Generation

THUMOS'14 - AR@1000

Photoplethysmography (PPG) heart rate estimation

MMSE-HR - MAE

Photoplethysmography (PPG) heart rate estimation

MMSE-HR - RMSE

Photoplethysmography (PPG) heart rate estimation

MMSE-HR - Pearson Correlation

6D Pose Estimation using RGBD

REAL275 - mAP 3DIou@75

Long-tail Learning

CIFAR-100-LT (ρ=100) - Error Rate

Few-Shot Image Classification

ORBIT Clutter Video Evaluation - Frame accuracy

Cross-Modal Retrieval

Flickr30k - Text-to-image R@1

Cross-Modal Retrieval

Flickr30k - Text-to-image R@10

Cross-Modal Retrieval

Flickr30k - Text-to-image R@5

3D Object Detection

V2X-SIM - mATE

3D Object Detection

V2X-SIM - mASE

3D Object Detection

V2X-SIM - mAOE

Audio Super-Resolution

VCTK Multi-Speaker - Log-Spectral Distance

Audio Classification

VGGSound - Top 5 Accuracy

Multi-modal Classification

VGG-Sound - Top-1 Accuracy

Video Deinterlacing

MSU Deinterlacer Benchmark - FPS on CPU

Long-range modeling

LRA - ListOps

Long-range modeling

LRA - Avg

Sleep Stage Detection

Sleep-EDF - Macro-F1

Sleep Stage Detection

Sleep-EDF - Cohen's kappa

Sleep Stage Detection

SHHS - Cohen's kappa

Sleep Stage Detection

SHHS - Macro-F1

RGB Salient Object Detection

ECSSD - S-Measure

Point Cloud Completion

ShapeNet-ViPC - Chamfer Distance

RGB Salient Object Detection

DAVIS-S - mBA

RGB Salient Object Detection

HRSOD - mBA

RGB Salient Object Detection

HKU-IS - F-measure

RGB Salient Object Detection

HKU-IS - S-Measure

Dichotomous Image Segmentation

DIS-TE2 - HCE

Dichotomous Image Segmentation

DIS-TE3 - HCE

Dichotomous Image Segmentation

DIS-TE4 - HCE

Dichotomous Image Segmentation

DIS-TE4 - S-Measure

Dichotomous Image Segmentation

DIS-VD - HCE

Few-Shot Image Classification

Dirichlet Mini-ImageNet (5-way, 1-shot) - 1:1 Accuracy

Few-Shot Image Classification

Dirichlet Tiered-ImageNet (5-way, 1-shot) - 1:1 Accuracy

Few-Shot Image Classification

FC100 5-way (5-shot) - Accuracy

Few-Shot Image Classification

Dirichlet CUB-200 (5-way, 1-shot) - 1:1 Accuracy

Few-Shot Image Classification

Dirichlet CUB-200 (5-way, 5-shot) - 1:1 Accuracy

Few-Shot Image Classification

FC100 5-way (1-shot) - Accuracy

Few-Shot Image Classification

Dirichlet Mini-ImageNet (5-way, 5-shot) - 1:1 Accuracy

Building change detection for remote sensing images

LEVIR-CD - IoU

Domain Generalization

Stylized-ImageNet - Top 1 Accuracy

Action Segmentation

GTEA - F1@10%

Action Segmentation

GTEA - F1@50%

Action Segmentation

GTEA - Acc

Action Segmentation

GTEA - F1@25%

Action Segmentation

50 Salads - Edit

Action Anticipation

EPIC-KITCHENS-55 (Unseen test set (S2)) - Top 1 Accuracy - Verb

Action Anticipation

EPIC-KITCHENS-55 (Unseen test set (S2)) - Top 1 Accuracy - Noun

Action Anticipation

EPIC-KITCHENS-55 (Unseen test set (S2)) - Top 1 Accuracy - Act.

Action Anticipation

EPIC-KITCHENS-55 (Unseen test set (S2)) - Top 5 Accuracy - Verb

Action Anticipation

EPIC-KITCHENS-55 (Unseen test set (S2)) - Top 5 Accuracy - Noun

Action Anticipation

EPIC-KITCHENS-55 (Seen test set (S1)) - Top 1 Accuracy - Noun

Action Anticipation

EPIC-KITCHENS-55 (Seen test set (S1)) - Top 1 Accuracy - Act.

Action Anticipation

EPIC-KITCHENS-55 (Seen test set (S1)) - Top 5 Accuracy - Verb

Action Anticipation

EPIC-KITCHENS-55 (Seen test set (S1)) - Top 5 Accuracy - Noun

Action Anticipation

EPIC-KITCHENS-55 (Seen test set (S1)) - Top 5 Accuracy - Act.

Multimodal Intent Recognition

MIntRec - Accuracy (20 classes)

Molecular Property Prediction

ToxCast - ROC-AUC

Molecular Property Prediction

MUV - ROC-AUC

Molecular Property Prediction

ESOL - RMSE

Molecular Property Prediction

Lipophilicity - RMSE

Molecular Property Prediction

QM8 - MAE

Molecular Property Prediction

QM7 - MAE

Pedestrian Detection

DVTOD - mAP

RGB Salient Object Detection

PASCAL-S - S-Measure

Few-Shot Object Detection

MS-COCO (1-shot) - AP

Molecular Property Prediction

ClinTox - Molecules (M)

Nested Named Entity Recognition

ACE 2005 - F1

Dialog Relation Extraction

DialogRE - F1 (v1)

Dialog Relation Extraction

DialogRE - F1c (v1)

Video Denoising

Set8 sigma10 - PSNR

Video Denoising

DAVIS sigma20 - PSNR

Video Denoising

Set8 sigma50 - PSNR

Video Denoising

DAVIS sigma30 - PSNR

Video Denoising

Set8 sigma20 - PSNR

Video Denoising

DAVIS sigma40 - PSNR

Video Denoising

Set8 sigma40 - PSNR

Video Denoising

DAVIS sigma10 - PSNR

Video Denoising

Set8 sigma30 - PSNR

Video Denoising

DAVIS sigma50 - PSNR

Stereo Image Super-Resolution

Middlebury - 2x upscaling - PSNR

Stereo Image Super-Resolution

Middlebury - 4x upscaling - PSNR

Stereo Image Super-Resolution

KITTI2012 - 4x upscaling - PSNR

Stereo Image Super-Resolution

Flickr1024 - 2x upscaling - PSNR

Stereo Image Super-Resolution

KITTI2015 - 2x upscaling - PSNR

Stereo Image Super-Resolution

Flickr1024 - 4x upscaling - PSNR

Drivable Area Detection

BDD100K val - Params (M)

Cross-Modal Retrieval

COCO 2014 - Image-to-text R@1

Cross-Modal Retrieval

COCO 2014 - Image-to-text R@5

Cross-Modal Retrieval

COCO 2014 - Text-to-image R@5

Few-Shot Image Classification

Mini-ImageNet-CUB 5-way (1-shot) - Accuracy

Few-Shot Image Classification

Mini-ImageNet-CUB 5-way (5-shot) - Accuracy

Earth Surface Forecasting

EarthNet2021 Extreme Track - EarthNetScore

Visual Storytelling

VIST - BLEU-3

Semi-Supervised Image Classification

ImageNet - 1% labeled data - Top 5 Accuracy

Anomaly Detection In Surveillance Videos

XD-Violence - AP

3D Instance Segmentation

PartNet - mAP50

Few-Shot Semantic Segmentation

FSS-1000 (1-shot) - Mean IoU

Few-Shot Semantic Segmentation

FSS-1000 (5-shot) - Mean IoU

Multi-Object Tracking

TAO - ClsA

Video Quality Assessment

MSU SR-QA Dataset - KLCC

Retinal Vessel Segmentation

ROSE-1 SVC - Dice Score

Retinal Vessel Segmentation

ROSE-2 - Dice Score

Retinal Vessel Segmentation

ROSE-1 SVC-DVC - Dice Score

Learning with noisy labels

ANIMAL - Accuracy

New Product Sales Forecasting

VISUELLE - MAE

New Product Sales Forecasting

VISUELLE - WAPE

Low-Light Image Enhancement

LOL - SSIM

3D Object Detection From Monocular Images

Waymo Open Dataset - 3D mAPH Vehicle (Front Camera Only)

Category-Agnostic Pose Estimation

MP100 - Mean [email protected] - 1shot

Learning with noisy labels

CIFAR-10N-Random1 - Accuracy (mean)

Learning with noisy labels

CIFAR-10N-Aggregate - Accuracy (mean)

Learning with noisy labels

CIFAR-10N-Worst - Accuracy (mean)

Garment Reconstruction

4D-DRESS - Chamfer (cm)

Garment Reconstruction

4D-DRESS - IOU

Facial Attribute Classification

bFFHQ - Bias-Conflicting Accuracy

Text-to-Image Generation

LHQC - Block-FID

3D Room Layouts From A Single RGB Panorama

PanoContext - 3DIoU

3D Room Layouts From A Single RGB Panorama

Stanford2D3D Panoramic - 3DIoU

3D Room Layouts From A Single RGB Panorama

Stanford2D3D Panoramic - Pixel Error

Facial Attribute Classification

LFWA - Error Rate

Monocular 3D Object Detection

KITTI Cars Hard - AP Hard

Monocular 3D Object Detection

KITTI Cars Easy - AP Easy

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - FPS

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D17 test (G)

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D17 test (J)

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D17 test (F)

Point-interactive Image Colorization

ImageNet ctest10k - PSNR@10

Point-interactive Image Colorization

ImageNet ctest10k - PSNR@1

Image Denoising

SID SonyA7S2 x100 - PSNR (Raw)

Age Estimation

ChaLearn 2016 - e-error

Age Estimation

ChaLearn 2015 - e-error

Age Estimation

ChaLearn 2015 - MAE

Earth Surface Forecasting

EarthNet2021 OOD Track - EarthNetScore

Earth Surface Forecasting

EarthNet2021 IID Track - EarthNetScore

Group Activity Recognition

Collective Activity - Accuracy

Audio Classification

Speech Commands - Accuracy

3D Reconstruction

ShapeNet - Chamfer Distance

Retinal Vessel Segmentation

DRIVE - sensitivity

Blocking

Abt-Buy - Candidate Set Size

Open Vocabulary Attribute Detection

OVAD benchmark - mean average precision

Heterogeneous Node Classification

OAG-Venue - NDCG

Heterogeneous Node Classification

OAG-Venue - MRR

Graph Classification

Mutagenicity - Accuracy

Retinal Vessel Segmentation

CHASE_DB1 - Sensitivity

Retinal Vessel Segmentation

DRIVE - AUC

Video Anomaly Detection

HR-UBnormal - AUC

Multi-label zero-shot learning

Open Images V4 - mAP

Multi-label zero-shot learning

NUS-WIDE - mAP

Network Intrusion Detection

CICIDS2017 - Avg F1

Network Intrusion Detection

CICIDS2017 - Precision

Face Anti-Spoofing

SiW (Protocol 3) - ACER

Object Detection In Indoor Scenes

SUN RGB-D - AP 0.5

Semi-Supervised Image Classification

STL-10, 1000 Labels - Accuracy

Surgical tool detection

Cholec80 - mAP

Semi-Supervised Semantic Segmentation

ScribbleKITTI - mIoU (20% Labels)

Semi-Supervised Semantic Segmentation

ScribbleKITTI - mIoU (50% Labels)

Spoken Language Understanding

Snips-SmartLights - Accuracy (%)

Spoken Language Understanding

Fluent Speech Commands - Accuracy (%)

Video Quality Assessment

LIVE-ETRI - SRCC

Math Word Problem Solving

MATH - Parameters (Billions)

Edge Detection

BRIND - ODS

Text Classification

arXiv-10 - Accuracy

Conditional Image Generation

ImageNet 256x256 - FID

Text-to-Image Generation

MS COCO - FID

3D Lane Detection

Apollo Synthetic 3D Lane - Z error near

Video Frame Interpolation

SNU-FILM (hard) - PSNR

Video Frame Interpolation

SNU-FILM (hard) - SSIM

Video Frame Interpolation

SNU-FILM (extreme) - PSNR

Video Frame Interpolation

SNU-FILM (extreme) - SSIM

Object Detection

UA-DETRAC - mAP

Zero-Shot Video Question Answer

TVQA - Accuracy

Emotion Recognition in Conversation

DailyDialog - Macro F1

Unsupervised Semantic Segmentation with Language-image Pre-training

COCO-Stuff-27 - mIoU

Hand Pose Estimation

MSRA Hands - Average 3D Error

Face Anti-Spoofing

Replay-Attack - EER

Face Anti-Spoofing

Replay-Attack - HTER

JPEG Artifact Correction

LIVE1 (Quality 30 Grayscale) - SSIM

Color Image Denoising

CBSD68 sigma75 - PSNR

Long-range modeling

LRA - Pathfinder

Photoplethysmography (PPG) heart rate estimation

UBFC-rPPG - MAE

Photoplethysmography (PPG) heart rate estimation

UBFC-rPPG - RMSE

Photoplethysmography (PPG) heart rate estimation

UBFC-rPPG - Pearson Correlation

Referring Expression Segmentation

PhraseCut - Mean IoU

Referring Image Matting (RefMatte-RW100)

RefMatte - SAD

Referring Image Matting (RefMatte-RW100)

RefMatte - MSE

Referring Image Matting (RefMatte-RW100)

RefMatte - MAD

Referring Image Matting (RefMatte-RW100)

RefMatte - SAD(E)

Referring Image Matting (RefMatte-RW100)

RefMatte - MSE(E)

Referring Image Matting (RefMatte-RW100)

RefMatte - MAD(E)

Speech Synthesis

LibriTTS - MCD

Speech Synthesis

LibriTTS - V/UV F1

Speech Synthesis

LibriTTS - M-STFT

Text-to-Image Generation

Conceptual Captions - FID

Online Action Detection

THUMOS'14 - mAP

3D Semantic Segmentation

OpenTrench3D - mIoU

3D Semantic Segmentation

OpenTrench3D - mAcc

Online Action Detection

TVSeries - mCAP

Monocular Depth Estimation

KITTI Eigen split unsupervised - RMSE

Monocular Depth Estimation

KITTI Eigen split unsupervised - Sq Rel

Document-level Event Extraction

ChFinAnn - F1

Atari Games

Atari game - Human World Record Breakthrough

Atari Games

Atari 2600 Phoenix - Score

Atari Games

Atari 2600 Space Invaders - Score

Atari Games

Atari 2600 Pitfall! - Score

Atari Games

Atari 2600 Atlantis - Score

Atari Games

Atari 2600 Gopher - Score

Atari Games

Atari 2600 Breakout - Score

Atari Games

Atari 2600 Road Runner - Score

Atari Games

Atari 2600 Asterix - Score

Atari Games

Atari 2600 Kung-Fu Master - Score

Atari Games

Atari 2600 Ice Hockey - Score

Atari Games

Atari 2600 Krull - Score

Atari Games

Atari 2600 Asteroids - Score

Atari Games

Atari 2600 Seaquest - Score

Atari Games

Atari 2600 James Bond - Score

Atari Games

Atari 2600 Demon Attack - Score

Age Estimation

Adience - Accuracy

Aesthetics Quality Assessment

Image Aesthetics dataset - Accuracy

Aesthetics Quality Assessment

Image Aesthetics dataset - MAE

Cloud Removal

SEN12MS-CR - SSIM

Hyperspectral Image Classification

Pavia University - Overall Accuracy

Video Super-Resolution

Vid4 - 4x upscaling - BD degradation - PSNR

Video Super-Resolution

Vid4 - 4x upscaling - BD degradation - SSIM

Entity Resolution

WDC Watches-xlarge - F1 (%)

Named Entity Recognition In Vietnamese

PhoNER COVID19 - F1 (%)

Unsupervised Facial Landmark Detection

MAFL - NME

Co-Salient Object Detection

CoCA - S-measure

Co-Salient Object Detection

CoCA - max F-measure

Co-Salient Object Detection

CoCA - mean E-measure

Co-Salient Object Detection

CoCA - Mean F-measure

Co-Salient Object Detection

CoCA - max E-measure

Co-Salient Object Detection

CoCA - MAE

Co-Salient Object Detection

CoSOD3k - max E-measure

Co-Salient Object Detection

CoSOD3k - MAE

Video Frame Interpolation

Middlebury - Interpolation Error

Monocular Depth Estimation

KITTI Eigen split - Delta < 1.25^2

Monocular Depth Estimation

KITTI Eigen split - Delta < 1.25^3

Automated Theorem Proving

miniF2F-test - Pass@1

Unsupervised Facial Landmark Detection

MAFL Unaligned - NME

Relation Classification

TACRED - F1

Video Super-Resolution

Vimeo90K - PSNR

Spectral Reconstruction

KAIST - PSNR

Spectral Reconstruction

KAIST - SSIM

Spectral Reconstruction

Real HSI - User Study Score

Spectral Reconstruction

CAVE - PSNR

Spectral Reconstruction

CAVE - SSIM

Speech Dereverberation

WHAMR! - SI-SDR

Speech Dereverberation

WHAMR! - PESQ

Complex Query Answering

FB15k - MRR 2p

Dialogue State Tracking

CoSQL - question match accuracy

Dialogue State Tracking

CoSQL - interaction match accuracy

Scene Recognition

AID - Accuracy

Domain Generalization

ImageNet-Sketch - Top-1 accuracy

Hate Speech Detection

Ethos Binary - F1-score

Multi-Object Tracking

HiEve - IDF1

Causal Emotion Entailment

RECCON - Macro F1

Spoken Language Understanding

Spoken-SQuAD - F1 score

Action Recognition

RareAct - mWAP

Grayscale Image Denoising

BSD68 sigma15 - PSNR

Color Image Denoising

CBSD68 sigma35 - PSNR

Color Image Denoising

CBSD68 sigma15 - PSNR

Grayscale Image Denoising

BSD68 sigma50 - PSNR

Few Shot Action Recognition

Something-Something-100 - 1:1 Accuracy

Aspect Sentiment Triplet Extraction

ASTE-Data-V2 - F1

Motion Synthesis

InterHuman - MModality

Motion Synthesis

Inter-X - MMDist

Dynamic Link Prediction

Enron Emails - AP

Few-Shot Image Classification

Dirichlet Tiered-Imagenet (5-way, 5-shot) - 1:1 Accuracy

Text-to-Image Generation

Oxford 102 Flowers - Inception score

Stereo Image Super-Resolution

KITTI2015 - 4x upscaling - PSNR

Burst Image Super-Resolution

BurstSR - LPIPS

Burst Image Super-Resolution

SyntheticBurst - PSNR

Burst Image Super-Resolution

SyntheticBurst - SSIM

Burst Image Super-Resolution

SyntheticBurst - LPIPS

Spectral Reconstruction

ARAD-1K - PSNR

Spectral Reconstruction

ARAD-1K - MRAE

Spectral Reconstruction

ARAD-1K - RMSE

Cross-Lingual Natural Language Inference

XNLI - Accuracy

Multiview Gait Recognition

CASIA-B - Accuracy (Cross-View, Avg)

Multiview Gait Recognition

CASIA-B - NM#5-6

Multiview Gait Recognition

CASIA-B - BG#1-2

Multiview Gait Recognition

CASIA-B - CL#1-2

KG-to-Text Generation

WebNLG 2.0 (Unconstrained) - BLEU

KG-to-Text Generation

WebNLG 2.0 (Unconstrained) - ROUGE

Motion Forecasting

Argoverse CVPR 2020 - minADE (K=1)

Motion Forecasting

Argoverse CVPR 2020 - minFDE (K=1)

Thermal Image Segmentation

RGB-T-Glass-Segmentation - MAE

Action Triplet Recognition

CholecT50 (Challenge) - mAP

Video Panoptic Segmentation

Cityscapes-VPS - VPQ (thing)

Image Super-Resolution

IXI - SSIM for 2x T2w

Image Super-Resolution

IXI - PSNR 2x T2w

Image Super-Resolution

IXI - PSNR 4x T2w

Image Dehazing

RS-Haze - PSNR

Image Dehazing

RS-Haze - SSIM

Image Dehazing

RESIDE-6K - PSNR

Image Dehazing

RESIDE-6K - SSIM

Image Denoising

SID SonyA7S2 x250 - SSIM (Raw)

Image Denoising

SID x100 - SSIM

Image Denoising

SID x300 - SSIM

Video Generation

UCF-101 16 frames, 64x64, Unconditional - Inception Score

Video Generation

UCF-101 16 frames, 64x64, Unconditional - FID

Video Retrieval

MSVD - text-to-video R@10

Video Retrieval

MSVD - text-to-video Mean Rank

Video Retrieval

MSVD - video-to-text R@5

Video Retrieval

LSMDC - text-to-video R@5

Video Retrieval

LSMDC - text-to-video R@10

Video Retrieval

LSMDC - text-to-video Median Rank

Video Retrieval

LSMDC - video-to-text R@5

Video Retrieval

LSMDC - video-to-text R@10

Video Retrieval

LSMDC - video-to-text Median Rank

Video Retrieval

LSMDC - text-to-video Mean Rank

Video Retrieval

LSMDC - video-to-text Mean Rank

Video Retrieval

MSR-VTT-1kA - text-to-video R@1

Video Retrieval

MSR-VTT-1kA - text-to-video R@5

Video Retrieval

MSR-VTT-1kA - text-to-video R@10

Video Retrieval

MSR-VTT-1kA - video-to-text R@1

Video Retrieval

MSR-VTT-1kA - video-to-text R@5

Video Retrieval

MSR-VTT-1kA - video-to-text R@10

3D Face Reconstruction

Florence - RMSE Indoor

3D Face Reconstruction

Florence - RMSE Outdoor

3D Face Reconstruction

NoW Benchmark - Mean Reconstruction Error (mm)

3D Face Reconstruction

NoW Benchmark - Stdev Reconstruction Error (mm)

3D Face Reconstruction

NoW Benchmark - Median Reconstruction Error

3D Object Detection From Stereo Images

KITTI Cars Moderate - AP75

3D Object Detection From Stereo Images

KITTI Cyclists Moderate - AP50

3D Object Detection From Stereo Images

KITTI Pedestrians Moderate - AP50

Few-Shot Image Classification

CUB 200 5-way 1-shot - Accuracy

Few-Shot Image Classification

CIFAR-FS 5-way (1-shot) - Accuracy

Multiple Choice Question Answering (MCQA)

BIG-bench (Novel Concepts) - Accuracy

Temporal Relation Classification

MATRES - F1

Dense Object Detection

SKU-110K - AP

Multiple Object Tracking

CroHD - MOTA

Online Multi-Object Tracking

MOT16 - MOTA

Image Inpainting

CelebA-HQ - FID

Image Inpainting

Places2 - P-IDS

Image Inpainting

Places2 - U-IDS

Hand Pose Estimation

HANDS 2019 - Average 3D Error

Hand Pose Estimation

NYU Hands - Average 3D Error

Hand Pose Estimation

ICVL Hands - Average 3D Error

Face Verification

CFP-FP - Accuracy

Face Verification

AgeDB-30 - Accuracy

Knowledge Graph Completion

DBP-5L (Greek) - MRR

Knowledge Graph Completion

DBP-5L (French) - MRR

Knowledge Graph Completion

DBP-5L (English) - MRR

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (Ego4D) Visible

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (VISOR) Occ

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (Ego4D) Occ

Affordance Recognition

HICO-DET(Unknown Concepts) - COCO-Val2017

Affordance Recognition

HICO-DET(Unknown Concepts) - Obj365

Affordance Recognition

HICO-DET(Unknown Concepts) - HICO

Affordance Recognition

HICO-DET(Unknown Concepts) - Novel Classes

Age Estimation

MORPH album2 (Caucasian) - MAE

Face Verification

IJB-C - TAR @ FAR=1e-3

Autonomous Driving

CARLA Leaderboard - Route Completion

Image Inpainting

Places2 - FID

Image Inpainting

Places2 - LPIPS

Nested Named Entity Recognition

GENIA - F1

3D Object Detection

V2XSet - AP0.5 (Perfect)

3D Object Detection

V2XSet - AP0.5 (Noisy)

3D Object Detection

V2XSet - AP0.7 (Noisy)

Lane Detection

TuSimple - F1 score

Lane Detection

LLAMAS - F1

Video Prediction

KTH - LPIPS

Video Prediction

KTH - SSIM

Video Prediction

SynpickVP - LPIPS

Video Prediction

SynpickVP - PSNR

Video Prediction

Moving MNIST - SSIM

Video Prediction

Moving MNIST - LPIPS

Document Image Classification

Tobacco-3482 - Accuracy

Neural Architecture Search

NAS-Bench-201, CIFAR-10 - Accuracy (Test)

Neural Architecture Search

NAS-Bench-201, CIFAR-10 - Accuracy (Val)

Neural Architecture Search

NAS-Bench-201, ImageNet-16-120 - Accuracy (Val)

AMR Parsing

The Little Prince - Smatch

AMR Parsing

New3 - Smatch

6D Pose Estimation using RGBD

REAL275 - mAP 10, 10cm

MRI Reconstruction

fastMRI Knee Val 8x - Params (M)

MRI Reconstruction

fastMRI Knee 8x - PSNR

Video Retrieval

MSR-VTT - text-to-video Mean Rank

Semi-Supervised Image Classification

CIFAR-10, 4000 Labels - Percentage error

Domain Generalization

ImageNet-A - Top-1 accuracy %

Gesture Generation

BEAT - FID

Video Salient Object Detection

DAVIS-2016 - S-Measure

Video Salient Object Detection

DAVIS-2016 - AVERAGE MAE

Video Salient Object Detection

DAVIS-2016 - MAX F-MEASURE

Video Salient Object Detection

SegTrack v2 - S-Measure

Video Salient Object Detection

SegTrack v2 - AVERAGE MAE

Video Salient Object Detection

SegTrack v2 - MAX F-MEASURE

Video Salient Object Detection

ViSal - S-Measure

Video Salient Object Detection

ViSal - max E-measure

Video Salient Object Detection

ViSal - Average MAE

Video Salient Object Detection

FBMS-59 - S-Measure

Video Salient Object Detection

FBMS-59 - AVERAGE MAE

Video Salient Object Detection

FBMS-59 - MAX F-MEASURE

Camouflaged Object Segmentation

PCOD_1200 - S-Measure

6D Pose Estimation using RGBD

REAL275 - mAP 3DIou@50

Stereo Depth Estimation

Spring - 1px total

Layout-to-Image Generation

Visual Genome 128x128 - Inception Score

Semi-Supervised Image Classification

CIFAR-100, 2500 Labels - Percentage error

Semi-Supervised Image Classification

CIFAR-100, 10000 Labels - Percentage error

Video Frame Interpolation

SNU-FILM (medium) - PSNR

Video Frame Interpolation

SNU-FILM (easy) - PSNR

Data-to-Text Generation

MLB Dataset (Relation Generation) - Precision

Data-to-Text Generation

MLB Dataset (Content Ordering) - DLD

Data-to-Text Generation

MLB Dataset - BLEU

Data-to-Text Generation

RotoWire (Relation Generation) - count

Learning with noisy labels

CIFAR-10N-Random3 - Accuracy (mean)

Learning with noisy labels

CIFAR-10N-Random2 - Accuracy (mean)

Small Data Image Classification

CIFAR-10, 500 Labels - Accuracy (%)

Text-to-Image Generation

CUB - FID

Point Cloud Completion

Completion3D - Chamfer Distance

Point Cloud Registration

KITTI (FCGF setting) - Recall (0.6m, 5 degrees)

Point Cloud Registration

3DLoMatch (10-30% overlap) - Recall (correspondence RMSE below 0.2)

Motion Forecasting

Argoverse CVPR 2020 - DAC (K=6)

Image Dehazing

I-Haze - SSIM

Image Dehazing

Dense-Haze - PSNR

Visual Entailment

SNLI-VE val - Accuracy

Visual Entailment

SNLI-VE test - Accuracy

Active Learning

CIFAR10 (10,000) - Accuracy

Multi-Frame Super-Resolution

PROBA-V - Normalized cPSNR

Entity Resolution

WDC Computers-xlarge - F1 (%)

Video Super-Resolution

MSU Video Super Resolution Benchmark: Detail Restoration - Subjective score

Video Super-Resolution

MSU Video Super Resolution Benchmark: Detail Restoration - ERQAv1.0

Video Super-Resolution

MSU Video Super Resolution Benchmark: Detail Restoration - QRCRv1.0

Video Super-Resolution

MSU Video Super Resolution Benchmark: Detail Restoration - SSIM

Video Super-Resolution

MSU Video Super Resolution Benchmark: Detail Restoration - PSNR

Video Super-Resolution

MSU Video Upscalers: Quality Enhancement - PSNR

Point Cloud Registration

3DMatch (at least 30% overlapped - sample 5k interest points) - Recall (correspondence RMSE below 0.2)

Point Cloud Registration

3DMatch (at least 30% overlapped - FCGF setting) - RE (all)

Point Cloud Registration

3DMatch (at least 30% overlapped - FCGF setting) - TE (all)

Motion Synthesis

LaFAN1 - L2Q@5

Motion Synthesis

LaFAN1 - L2Q@15

Motion Synthesis

LaFAN1 - L2Q@30

Motion Synthesis

LaFAN1 - L2P@5

Motion Synthesis

LaFAN1 - L2P@15

Motion Synthesis

LaFAN1 - NPSS@5

Motion Synthesis

LaFAN1 - NPSS@15

Motion Synthesis

LaFAN1 - NPSS@30

Motion Synthesis

LaFAN1 - L2P@30

Task-Oriented Dialogue Systems

KVRET - Entity F1

Audio Classification

ICBHI Respiratory Sound Database - Sensitivity

Image Super-Resolution

BSD100 - 8x upscaling - PSNR

Few-Shot Semantic Segmentation

FSS-1000 - Mean IoU

Long-range modeling

SCROLLS - Qspr

Long-range modeling

SCROLLS - Nrtv

Long-range modeling

SCROLLS - CNLI

Long-range modeling

SCROLLS - Avg.

Open-World Semi-Supervised Learning

CIFAR-10 - Seen accuracy (50% Labeled)

Open-World Semi-Supervised Learning

ImageNet-100 - Novel accuracy (50% Labeled)

Face Anti-Spoofing

SiW-Enroll5 - AUC

Medical Object Detection

DeepLesion - Sensitivity

Image Manipulation Detection

Columbia (OSN-transmitted - Whatsapp) - AUC

Image Manipulation Detection

DSO (OSN-transmitted - Wechat) - AUC

Image Manipulation Detection

DSO (OSN-transmitted - Whatsapp) - AUC

Image Manipulation Detection

DSO (OSN-transmitted - Weibo) - AUC

Image Manipulation Detection

Columbia (OSN-transmitted - Facebook) - AUC

Supervised Video Summarization

SumMe - F1-score (Augmented)

Point Cloud Registration

FP-T-E - Recall (3cm, 10 degrees)

Point Cloud Registration

FP-O-H - Recall (3cm, 10 degrees)

Point Cloud Registration

FP-O-E - Recall (3cm, 10 degrees)

3D Lane Detection

Apollo Synthetic 3D Lane - Z error far

Supervised Video Summarization

TvSum - Spearman's Rho

Emotion Recognition in Conversation

DailyDialog - Micro-F1

Data-to-Text Generation

MLB Dataset (Relation Generation) - count

Data-to-Text Generation

MLB Dataset (Content Selection) - Precision

Data-to-Text Generation

MLB Dataset (Content Selection) - Recall

3D Human Reconstruction

CAPE - Chamfer (cm)

3D Human Reconstruction

CAPE - P2S (cm)

3D Human Reconstruction

CAPE - NC

3D Human Reconstruction

CustomHumans - Normal Consistency

Hand Gesture Recognition

NVGesture - Accuracy

Multi-Document Summarization

Multi-News - ROUGE-SU4

RGB Salient Object Detection

ECSSD - F-measure

RGB Salient Object Detection

DUT-OMRON - F-measure

AMR Parsing

LDC2020T02 - Smatch

AMR Parsing

Bio - Smatch

AMR Parsing

LDC2017T10 - Smatch

RGB Salient Object Detection

PASCAL-S - F-measure

Monocular Depth Estimation

Make3D - Abs Rel

Monocular Depth Estimation

Make3D - Sq Rel

Few Shot Action Recognition

HMDB51 - 1:1 Accuracy

3D Point Cloud Classification

IntrA - F1 score (5-fold)

Grayscale Image Denoising

Urban100 sigma50 - PSNR

Grayscale Image Denoising

Urban100 sigma25 - PSNR

Face Parsing

CelebAMask-HQ - Mean F1

Face Parsing

LaPa - Mean F1

3D Hand Pose Estimation

FreiHAND - PA-MPJPE

Monocular Depth Estimation

VA (Virtual Apartment) - Root mean square error (RMSE)

Monocular Depth Estimation

VA (Virtual Apartment) - Log root mean square error (RMSE_log)

Monocular Depth Estimation

VA (Virtual Apartment) - Mean average error (MAE)

Monocular Depth Estimation

VA (Virtual Apartment) - Absolute relative error (AbsRel)

Edge Detection

BIPED - ODS

Edge Detection

MDBD - ODS

Low-Light Image Enhancement

MEF - NIQE

Text-to-Image Generation

MS COCO - Inception score

Learning with noisy labels

CIFAR-100N - Accuracy (mean)

Partial Domain Adaptation

VisDA2017 - Accuracy (%)

Partial Domain Adaptation

ImageNet-Caltech - Accuracy (%)

Supervised Video Summarization

SumMe - F1-score (Canonical)

End-To-End Dialogue Modelling

MULTIWOZ 2.0 - MultiWOZ (Success)

End-To-End Dialogue Modelling

MULTIWOZ 2.0 - MultiWOZ (Inform)

Text-to-Image Generation

Oxford 102 Flowers - FID

Music Source Separation

MUSDB18-HQ - SDR (bass)

Text-to-Image Generation

MS COCO - SOA-C

Text-to-Image Generation

MS COCO - FID-8

Text-to-Image Generation

MS COCO - FID-4

Neural Architecture Search

NATS-Bench Topology, CIFAR-10 - Test Accuracy

Neural Architecture Search

NATS-Bench Topology, ImageNet16-120 - Test Accuracy

Neural Architecture Search

NATS-Bench Topology, CIFAR-100 - Test Accuracy

Semi-supervised Medical Image Classification

Chest X-Ray14 2% labeled - AUC

Underwater Image Restoration

LSUI - PSNR

Text-to-Image Generation

MS COCO - FID-1

Text-to-Image Generation

MS COCO - FID-2

Object Proposal Generation

PASCAL VOC 2012, 60 proposals per image - Average Recall

Point Cloud Registration

3DMatch Benchmark - Feature Matching Recall

Single Image Deraining

Test2800 - SSIM

Single Image Deraining

Test100 - SSIM

Single Image Deraining

Test100 - PSNR

Grayscale Image Denoising

Urban100 sigma15 - PSNR

Pedestrian Detection

TJU-Ped-traffic - R+HO (miss rate)

Pedestrian Detection

TJU-Ped-traffic - ALL (miss rate)

Pedestrian Detection

TJU-Ped-campus - R (miss rate)

Pedestrian Detection

TJU-Ped-campus - HO (miss rate)

Pedestrian Detection

TJU-Ped-campus - R+HO (miss rate)

Pedestrian Detection

TJU-Ped-campus - ALL (miss rate)

Cross-Modal Retrieval

Flickr30k - Image-to-text R@5

Open Vocabulary Attribute Detection

OVAD-Box benchmark - mean average precision

Image Inpainting

Places2 val - FID

Image Inpainting

Places2 val - PD

Video Retrieval

MSR-VTT-1kA - text-to-video Median Rank

Video Retrieval

MSR-VTT-1kA - video-to-text Median Rank

Face Alignment

300W Split 2 (300W-LP) - NME (bbox)

Face Alignment

300W Split 2 (300W-LP) - AUC@7 (bbox)

Face Alignment

COFW-68 (300WLP) - NME (box)

Face Alignment

COFW-68 (300WLP) - AUC@7

Face Alignment

WFW (Extra Data) - NME (inter-ocular)

Face Alignment

WFW (Extra Data) - AUC@10 (inter-ocular)

Face Alignment

WFW (Extra Data) - FR@10 (inter-ocular)

Text Retrieval

Image-Chat - R@1

Text Retrieval

Image-Chat - R@5

Text Retrieval

Image-Chat - Sum(R@1,5)

Conditional Image Generation

ArtBench-10 (32x32) - FID

Dialogue State Tracking

Wizard-of-Oz - Joint

Unsupervised Domain Adaptation

HMDB-UCF - Accuracy

Object Detection

AI-TOD - APm

Human action generation

NTU RGB+D 2D - MMDa (CS)

Human action generation

NTU RGB+D 2D - MMDs (CS)

Human action generation

NTU RGB+D 2D - MMDa (CV)

Human action generation

NTU RGB+D 2D - MMDs (CV)

Face Alignment

AFLW - Mean NME

Multi-Document Summarization

Multi-News - ROUGE-2

Multi-Document Summarization

Multi-News - ROUGE-1

Data-to-Text Generation

Cleaned E2E NLG Challenge - BLEU (Test set)

Data-to-Text Generation

WebNLG Full - BLEU

Vehicle Re-Identification

VeRi-776 - Rank1

Text Simplification

ASSET - FKGL

Fine-Grained Image Classification

Bird-225 - Accuracy

RGB Salient Object Detection

PASCAL-S - MAE

Passage Retrieval

EntityQuestions - Recall@20

Image Denoising

SID SonyA7S2 x100 - SSIM (Raw)

Few-Shot Semantic Segmentation

COCO-20i -> Pascal VOC (1-shot) - Mean IoU

Few-Shot Semantic Segmentation

PASCAL-5i (10-Shot) - Mean IoU

Few-Shot Semantic Segmentation

COCO-20i (10-shot) - Mean IoU

Unsupervised Semantic Segmentation

COCO-Stuff-15 - Pixel Accuracy

Depth Estimation

eBDtheque - Abs Rel

Depth Estimation

eBDtheque - Sq Rel

Depth Estimation

eBDtheque - RMSE

Depth Estimation

eBDtheque - RMSE log

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D16 val (F)

Suspicious (BIRADS 4,5) vs. non-suspicious (BIRADS 1,2,3) per-image classification

InBreast - AUC

Lesion Segmentation

ISIC 2018 - Dice Score

Liver Segmentation

LiTS2017 - IoU

Video Deinterlacing

MSU Deinterlacer Benchmark - PSNR

Video Deinterlacing

MSU Deinterlacer Benchmark - SSIM

Video Deinterlacing

MSU Deinterlacer Benchmark - Subjective

Video Deinterlacing

MSU Deinterlacer Benchmark - VMAF

JPEG Artifact Correction

ICB (Quality 30 Color) - PSNR

JPEG Artifact Correction

ICB (Quality 20 Color) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 20 Color) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 10 Color) - PSNR

JPEG Artifact Correction

Classic5 (Quality 40 Grayscale) - PSNR

JPEG Artifact Correction

Classic5 (Quality 10 Grayscale) - PSNR

JPEG Artifact Correction

Classic5 (Quality 30 Grayscale) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 20 Grayscale) - PSNR

JPEG Artifact Correction

ICB (Quality 10 Color) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 10 Grayscale) - PSNR

JPEG Artifact Correction

Classic5 (Quality 20 Grayscale) - PSNR

SMAC+

Def_Outnumbered_parallel - Median Win Rate

SMAC+

Def_Outnumbered_sequential - Median Win Rate

SMAC+

Off_Near_sequential - Median Win Rate

SMAC+

Off_Hard_parallel - Median Win Rate

SMAC+

Off_Distant_parallel - Median Win Rate

SMAC+

Off_Superhard_sequential - Median Win Rate

SMAC+

Off_Complicated_sequential - Median Win Rate

SMAC+

Def_Armored_sequential - Median Win Rate

SMAC+

Off_Distant_sequential - Median Win Rate

SMAC+

Off_Complicated_parallel - Median Win Rate

Temporal Action Localization

CrossTask - Recall

Multimodal Activity Recognition

MMAct - F1-Score (Cross-Subject)

Unsupervised Person Re-Identification

DukeMTMC-reID - Rank-10

Text Classification

OneStopEnglish (Readability Assessment) - Accuracy (5-fold)

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D16 val (G)

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D16 val (J)

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D17 val (G)

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D17 val (J)

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D17 val (F)

Supervised Video Summarization

TvSum - Kendall's Tau

Video Quality Assessment

LIVE Livestream - SRCC

Conditional Image Generation

ImageNet 64x64 - FID

Conditional Image Generation

ImageNet 256x256 - Inception score

Multi-Label Text Classification

Reuters-21578 - Micro-F1

Dialog Relation Extraction

DialogRE - F1c (v2)

Video Retrieval

MSR-VTT - text-to-video Median Rank

Video Retrieval

MSR-VTT - video-to-text R@5

Video Retrieval

MSR-VTT - video-to-text R@10

Video Retrieval

MSR-VTT - video-to-text Mean Rank

Video Retrieval

MSVD - text-to-video R@5

Video Retrieval

MSVD - text-to-video Median Rank

Semi-Supervised Semantic Segmentation

ScribbleKITTI - mIoU (1% Labels)

Semi-Supervised Semantic Segmentation

ScribbleKITTI - mIoU (10% Labels)

Action Triplet Recognition

CholecT50 - Mean AP

3D Object Detection

Waymo vehicle - L1 mAP

Domain Generalization

NICO Animal - Accuracy

Domain Generalization

NICO Vehicle - Accuracy

Coherence Evaluation

GCDC + RST - Accuracy - Accuracy

Face Recognition

CASIA-WebFace+masks - Accuracy

Face Recognition

CelebA+masks - Accuracy

Speech Synthesis

LJSpeech - Mean Opinion Score

Image Manipulation Detection

Casia V1+ - AUC

Multi-Hypotheses 3D Human Pose Estimation

AH36M - H36M PMPJPE (n = 25)

Multi-Hypotheses 3D Human Pose Estimation

AH36M - Most-Likely Hypothesis PMPJPE (n = 1)

Multi-Hypotheses 3D Human Pose Estimation

AH36M - H36M PMPJPE (n = 1)

Audio Super-Resolution

Piano - Log-Spectral Distance

Audio Super-Resolution

Voice Bank corpus (VCTK) - Log-Spectral Distance

Vehicle Re-Identification

VehicleID Medium - Rank-5

Analog Video Restoration

TAPE - VMAF

Analog Video Restoration

TAPE - SSIM

Video Super-Resolution

MSU Super-Resolution for Video Compression - BSQ-rate over ERQA

Video Super-Resolution

MSU Super-Resolution for Video Compression - BSQ-rate over VMAF

Video Super-Resolution

MSU Super-Resolution for Video Compression - BSQ-rate over LPIPS

Action Recognition

BAR - Accuracy

Stereo Depth Estimation

KITTI2015 - three pixel error

Vehicle Re-Identification

VehicleID Small - Rank-1

Vehicle Re-Identification

VehicleID Medium - mAP

Vehicle Re-Identification

VehicleID Medium - Rank-1

Vehicle Re-Identification

VehicleID Large - mAP

Vehicle Re-Identification

VehicleID Large - Rank-1

3D Object Reconstruction From A Single Image

RenderPeople - Point-to-surface distance (cm)

3D Object Reconstruction From A Single Image

RenderPeople - Chamfer (cm)

3D Object Reconstruction From A Single Image

RenderPeople - Surface normal consistency

3D Object Reconstruction From A Single Image

BUFF - Chamfer (cm)

3D Object Reconstruction From A Single Image

BUFF - Surface normal consistency

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - TR V2V (mm), left hand

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - PA V2V (mm), whole body

Monocular 3D Object Detection

KITTI Pedestrian Moderate - AP Medium

Monocular 3D Object Detection

KITTI Pedestrian Hard - AP Hard

Multiview Detection

Wildtrack - MODP

Multiview Detection

MultiviewX - MODP

Poll Generation

WeiboPolls - ROUGE-1

Poll Generation

WeiboPolls - ROUGE-L

Poll Generation

WeiboPolls - BLEU-1

Poll Generation

WeiboPolls - BLEU-3

6D Pose Estimation using RGBD

REAL275 - mAP 3DIou@25

6D Pose Estimation using RGBD

REAL275 - mAP 5, 5cm

Multivariate Time Series Imputation

Beijing Multi-Site Air-Quality Dataset - MAE (PM2.5)

Scene Text Recognition

ICDAR 2003 - Accuracy

Video Prediction

KTH - FVD

Video Prediction

KTH - Params (M)

Rgb-T Tracking

RGBT234 - Success

Skeleton Based Action Recognition

UPenn Action - Accuracy

Action Classification

Toyota Smarthome dataset - CV2

Time Series Forecasting

ETTh2 (168) Univariate - MSE

Time Series Forecasting

ETTh2 (168) Univariate - MAE

Time Series Forecasting

ETTh2 (336) Univariate - MSE

Time Series Forecasting

ETTh2 (336) Univariate - MAE

Time Series Forecasting

ETTh2 (720) Univariate - MAE

Visual Storytelling

VIST - METEOR

Visual Storytelling

VIST - CIDEr

Visual Storytelling

VIST - BLEU-4

Multi-Object Tracking

MOTS20 - IDF1

Supervised Video Summarization

TvSum - F1-score (Augmented)

Zero-Shot Cross-Lingual Transfer

XTREME - Sentence-pair Classification

Zero-Shot Cross-Lingual Transfer

XTREME - Structured Prediction

Zero-Shot Cross-Lingual Transfer

XTREME - Question Answering

Zero-Shot Cross-Lingual Transfer

XTREME - Sentence Retrieval

Zero-Shot Cross-Lingual Transfer

XTREME - Avg

Time Series Forecasting

PeMSD7 - 9 steps MAE

Video Retrieval

MSR-VTT - video-to-text Median Rank

Image Super-Resolution

BSD100 - 4x upscaling - LPIPS

Image Super-Resolution

Urban100 - 4x upscaling - LPIPS

Image Super-Resolution

IXI - SSIM 4x T2w

KG-to-Text Generation

WebNLG 2.0 (Unconstrained) - METEOR

Self-Supervised Action Recognition

UCF101 (finetuned) - 3-fold Accuracy

Session-Based Recommendations

Last.FM - HR@20

Session-Based Recommendations

Last.FM - MRR@20

Time Series Forecasting

ETTh1 (24) Multivariate - MSE

Time Series Forecasting

ETTh1 (24) Multivariate - MAE

Time Series Forecasting

ETTh1 (48) Multivariate - MSE

Time Series Forecasting

ETTh1 (48) Multivariate - MAE

Time Series Forecasting

ETTh2 (168) Multivariate - MSE

Time Series Forecasting

ETTh2 (168) Multivariate - MAE

Time Series Forecasting

ETTh2 (48) Multivariate - MSE

Time Series Forecasting

ETTh2 (48) Multivariate - MAE

Time Series Forecasting

ETTh2 (24) Multivariate - MSE

Time Series Forecasting

ETTh2 (24) Multivariate - MAE

Time Series Forecasting

ETTh1 (24) Univariate - MSE

Time Series Forecasting

ETTh1 (24) Univariate - MAE

Time Series Forecasting

ETTh1 (48) Univariate - MSE

Time Series Forecasting

ETTh1 (48) Univariate - MAE

Time Series Forecasting

ETTh1 (168) Univariate - MSE

Time Series Forecasting

ETTh1 (168) Univariate - MAE

Time Series Forecasting

ETTh1 (168) Multivariate - MSE

Time Series Forecasting

ETTh1 (168) Multivariate - MAE

Time Series Forecasting

ETTh2 (48) Univariate - MSE

Time Series Forecasting

ETTh2 (48) Univariate - MAE

Time Series Forecasting

ETTh2 (24) Univariate - MSE

Time Series Forecasting

ETTh2 (24) Univariate - MAE

3D Multi-Person Pose Estimation

Campus - PCP3D

3D Multi-Person Pose Estimation

Shelf - PCP3D

Sequential Image Classification

Sequential MNIST - Permuted Accuracy

Medical Named Entity Recognition

ShARe/CLEF eHealth corpus - F1

Program Repair

DeepFix - Average Success Rate

Atari Games

Atari 2600 Chopper Command - Score

Atari Games

Atari 2600 Tennis - Score

Atari Games

Atari 2600 Surround - Score

Atari Games

Atari 2600 Up and Down - Score

Atari Games

Atari 2600 Enduro - Score

Unsupervised 3D Human Pose Estimation

MPI-INF-3DHP - PCK

Semi-Supervised Video Object Segmentation

DAVIS 2016 - Jaccard (Recall)

Semi-Supervised Video Object Segmentation

DAVIS 2016 - F-measure (Recall)

Aspect-oriented Opinion Extraction

SemEval-2014 Task-4 - Restaurant 2014 (F1)

Aspect-oriented Opinion Extraction

SemEval-2014 Task-4 - Laptop 2014 (F1)

Aspect-oriented Opinion Extraction

SemEval-2014 Task-4 - Restaurant 2015 (F1)

Aspect-oriented Opinion Extraction

SemEval-2014 Task-4 - Restaurant 2016 (F1)

Aspect Term Extraction and Sentiment Classification

SemEval - Restaurant 2015 (F1)

Unsupervised Semantic Segmentation

ImageNet-S-50 - mIoU (val)

Unsupervised Semantic Segmentation

ImageNet-S-50 - mIoU (test)

Semi-Supervised Video Object Segmentation

VOT2020 - EAO

Fine-Grained Image Classification

Oxford-IIIT Pets - Accuracy

Action Anticipation

EPIC-KITCHENS-55 (Unseen test set (S2)) - Top 5 Accuracy - Act.

Stereo Image Super-Resolution

KITTI2012 - 2x upscaling - PSNR

Entity Resolution

WDC Computers-small - F1 (%)

Entity Resolution

WDC Watches-small - F1 (%)

Relationship Extraction (Distant Supervised)

New York Times Corpus - P@10%

Relationship Extraction (Distant Supervised)

New York Times Corpus - P@30%

Aspect-Based Sentiment Analysis (ABSA)

MAMS - Macro-F1

Generalizable Person Re-identification

Market-1501 - MSMT17-All->mAP

Generalizable Person Re-identification

Market-1501 - MSMT17-All->Rank-1

Generalizable Person Re-identification

MSMT17 - Market-1501->Rank1

Generalizable Person Re-identification

MSMT17 - Market-1501->mAP

Generalizable Person Re-identification

CUHK03-NP (detected) - MSMT17-All->mAP

Generalizable Person Re-identification

CUHK03-NP (detected) - MSMT17-All->Rank-1

Generalizable Person Re-identification

CUHK03-NP (detected) - Market-1501->mAP

Generalizable Person Re-identification

CUHK03-NP (detected) - Market-1501->Rank-1

Unsupervised Anomaly Detection with Specified Settings -- 30% anomaly

STL-10 - AUC-ROC

Unsupervised Anomaly Detection with Specified Settings -- 30% anomaly

ASSIRA Cat Vs Dog - AUC-ROC

Cross-Lingual Question Answering

TyDiQA-GoldP - EM

Document Classification

HOC - F1

Relationship Extraction (Distant Supervised)

New York Times Corpus - AUC

Motion Forecasting

Argoverse CVPR 2020 - MR (K=6)

Point Cloud Registration

ETH (trained on 3DMatch) - Feature Matching Recall

Point Cloud Registration

FP-R-H - Recall (3cm, 10 degrees)

Point Cloud Registration

FP-R-M - Recall (3cm, 10 degrees)

Point Cloud Registration

KITTI (trained on 3DMatch) - Success Rate

Point Cloud Registration

3DMatch (trained on KITTI) - Recall

Point Cloud Registration

FP-R-E - Recall (3cm, 10 degrees)

Point Cloud Registration

FP-T-H - Recall (3cm, 10 degrees)

Retinal Vessel Segmentation

DRIVE - Accuracy

Text Classification

R52 - Accuracy

Text Classification

Ohsumed - Accuracy

3D Multi-Person Mesh Recovery

AGORA - FB-NMJE

3D Multi-Person Mesh Recovery

AGORA - B-NMJE

3D Multi-Person Mesh Recovery

AGORA - B-MVE

3D Multi-Person Mesh Recovery

AGORA - FB-MPJPE

3D Multi-Person Mesh Recovery

AGORA - B-MPJPE

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - PA V2V (mm), face

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - TR V2V (mm), body only

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - MPJPE-14

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - MPJPE, left hand

Mathematical Question Answering

Geometry3K - Accuracy (%)

Grammatical Error Detection

FCE - F0.5

Grammatical Error Detection

CoNLL-2014 A2 - F0.5

Grammatical Error Detection

CoNLL-2014 A1 - F0.5

Chinese Named Entity Recognition

Weibo NER - F1

Chinese Named Entity Recognition

Resume NER - F1

Facial Expression Recognition (FER)

FER2013 - Accuracy

Image Relighting

VIDIT’20 validation set - PSNR

Image Relighting

VIDIT’20 validation set - SSIM

Image Relighting

VIDIT’20 validation set - LPIPS

Image Relighting

VIDIT’20 validation set - MPS

Cross-Domain Few-Shot

Plantae - 5 shot

Cross-Domain Few-Shot

ChestX - 5 shot

AMR Parsing

LDC2014T12 - F1 Full

Facial Expression Recognition (FER)

JAFFE - Accuracy

3D Object Detection

KITTI Cyclist Easy val - AP

3D Object Detection

KITTI Cyclist Moderate val - AP

3D Object Detection

KITTI Cars Hard val - AP

3D Object Detection

KITTI Cyclist Hard val - AP

Supervised Video Summarization

TvSum - F1-score (Canonical)

Homography Estimation

PDS-COCO - MACE

Text-to-Image Generation

Multi-Modal-CelebA-HQ - Acc

Self-Supervised Person Re-Identification

SYSU-30k - Rank-1

Earth Surface Forecasting

EarthNet2021 Seasonal Track - EarthNetScore

Semantic Image Matting

Semantic Image Matting Dataset - SAD

Semantic Image Matting

Semantic Image Matting Dataset - MSE(10^3)

Semantic Image Matting

Semantic Image Matting Dataset - Grad

Semantic Image Matting

Semantic Image Matting Dataset - Conn

Face Anti-Spoofing

OULU-NPU - ACER

Visual Dialog

Visual Dialog v1.0 test-std - MRR (x 100)

Visual Dialog

Visual Dialog v1.0 test-std - R@1

Fundus to Angiography Generation

Fundus Fluorescein Angiogram Photographs & Colour Fundus Images of Diabetic Patients - FID

Fundus to Angiography Generation

Fundus Fluorescein Angiogram Photographs & Colour Fundus Images of Diabetic Patients - Kernel Inception Distance

Pose Transfer

Deep-Fashion - SSIM

Image Manipulation Detection

DSO-1 - AUC

Text Classification

TREC-6 - Error

Atari Games

Atari 2600 Bank Heist - Score

6D Pose Estimation using RGBD

LineMOD - Mean ADD

3D Multi-Person Pose Estimation (root-relative)

MuPoTS-3D - 3DPCK

RGB-D Salient Object Detection

LFSD - max E-Measure

RGB-D Salient Object Detection

LFSD - max F-Measure

Video Quality Assessment

MSU FR VQA Database - SRCC

Video Quality Assessment

MSU FR VQA Database - PLCC

Video Quality Assessment

MSU FR VQA Database - KLCC

Visual Object Tracking

VOT2019 - Expected Average Overlap (EAO)

Video Frame Interpolation

X4K1000FPS - tOF

Self-Supervised Action Recognition

HMDB51 (finetuned) - Top-1 Accuracy

Unsupervised Domain Adaptation

Cityscapes-to-OxfordCar - mIoU

Video Super-Resolution

MSU Video Upscalers: Quality Enhancement - LPIPS

Age Estimation

Adience - MAE

Image Inpainting

CelebA-HQ - P-IDS

Image Inpainting

CelebA-HQ - U-IDS

3D Room Layouts From A Single RGB Panorama

Stanford2D3D Panoramic - Corner Error

Stereo-LiDAR Fusion

KITTI Depth Completion Validation - RMSE

Layout-to-Image Generation

COCO-Stuff 128x128 - FID

Layout-to-Image Generation

COCO-Stuff 128x128 - Inception Score

Vehicle Re-Identification

VeRi-776 - Rank5

Unsupervised Domain Adaptation

Market to Duke - rank-1

Few-Shot Image Classification

Stanford Dogs 5-way (5-shot) - Accuracy

Few-Shot Image Classification

Stanford Dogs 5-way (1-shot) - Accuracy

Efficient ViTs

ImageNet-1K (with DeiT-T) - GFLOPs

Video Frame Interpolation

Middlebury - SSIM

Video Frame Interpolation

Middlebury - PSNR

Interactive Video Object Segmentation

DAVIS 2017 - AUC-J&F

Interactive Video Object Segmentation

DAVIS 2017 - AUC-J

Interactive Video Object Segmentation

DAVIS 2017 - J@60s

Multi-Person Pose Estimation

PoseTrack2017 - Mean mAP

Multi-Person Pose Estimation

PoseTrack2018 - Mean mAP

Video Denoising

CRVD - PSNR (Raw)

Video Denoising

CRVD - SSIM (Raw)

Sequential Image Classification

noise padded CIFAR-10 - % Test Accuracy

Text Simplification

PWKP / WikiSmall - SARI

Retinal Vessel Segmentation

CHASE_DB1 - AUC

Retinal Vessel Segmentation

DRIVE - F1 score

Interactive Video Object Segmentation

DAVIS 2017 - J&F@60s

Video Prediction

Kinetics-600 12 frames, 64x64 - Cond

Video Prediction

Kinetics-600 12 frames, 64x64 - Pred

Video Retrieval

MSVD - video-to-text Median Rank

Image Relighting

VIDIT’20 validation set - Runtime(s)

Object Detection

DSEC - mAP

Video Semantic Segmentation

CamVid - Mean IoU

Neural Architecture Search

NAS-Bench-201, CIFAR-10 - Search time (s)

Neural Architecture Search

NAS-Bench-201, CIFAR-100 - Search time (s)

SMAC

SMAC 27m_vs_30m - Median Win Rate

SMAC+

Def_Armored_parallel - Median Win Rate

SMAC+

Def_Infantry_parallel - Median Win Rate

Action Spotting

SoccerNet - Average-mAP

Graph Classification

PROTEINS - Accuracy

Cross-Modal Retrieval

Flickr30k - Image-to-text R@10

Vehicle Re-Identification

VehicleID Small - Rank1

Emotion Classification

SemEval 2018 Task 1E-c - Macro-F1

Motion Synthesis

BRACE - Beat DTW cost

Motion Synthesis

BRACE - Toprock average

Motion Synthesis

BRACE - Footwork average

Motion Synthesis

BRACE - Powermove average

Facial Expression Recognition (FER)

FERPlus - Accuracy(pretrained)

Action Anticipation

EPIC-KITCHENS-55 (Seen test set (S1)) - Top 1 Accuracy - Verb

Retinal Vessel Segmentation

CHASE_DB1 - F1 score

Semi-supervised Anomaly Detection

UBI-Fights - AUC

Semi-supervised Anomaly Detection

UBI-Fights - Decidability

Semi-supervised Anomaly Detection

UBI-Fights - EER

Text Classification

MR - Accuracy

Image Denoising

SIDD - SSIM (sRGB)

Face Alignment

AFLW2000 - Error rate

Trajectory Forecasting

TrajNet++ - FDE

Trajectory Forecasting

TrajNet++ - COL

Unsupervised 3D Human Pose Estimation

MPI-INF-3DHP - AUC

3D Part Segmentation

ShapeNet-Part - Class Average IoU

Head Pose Estimation

BIWI - Geodesic Error (GE)

Video Saliency Detection

MSU Video Saliency Prediction - SIM

Video Saliency Detection

MSU Video Saliency Prediction - CC

Video Saliency Detection

MSU Video Saliency Prediction - NSS

Video Saliency Detection

MSU Video Saliency Prediction - AUC-J

Video Saliency Detection

MSU Video Saliency Prediction - KLDiv

Video Saliency Detection

DHF1K - NSS

Cross-Modal Person Re-Identification

SYSU-MM01 - rank1

Video Panoptic Segmentation

Cityscapes-VPS - VPQ (stuff)

Image-to-Image Translation

Cityscapes Labels-to-Photo - LPIPS

Image-to-Image Translation

COCO-Stuff Labels-to-Photos - mIoU

Image-to-Image Translation

ADE20K-Outdoor Labels-to-Photos - mIoU

3D Object Detection

KITTI Cars Moderate val - AP

3D Object Detection

KITTI Cars Easy val - AP

3D Face Reconstruction

Stirling-LQ (FG2018 3D face reconstruction challenge) - Mean Reconstruction Error (mm)

3D Face Reconstruction

Stirling-HQ (FG2018 3D face reconstruction challenge) - Mean Reconstruction Error (mm)

Text-to-Image Generation

Multi-Modal-CelebA-HQ - Real

Object Detection

UAVDT - mAP

3D Reconstruction

DTU - Acc

Video Super-Resolution

TbD-3D - SSIM

Video Super-Resolution

TbD-3D - PSNR

Video Super-Resolution

TbD-3D - TIoU

Video Super-Resolution

TbD - SSIM

Video Super-Resolution

TbD - PSNR

Video Super-Resolution

TbD - TIoU

Video Super-Resolution

Falling Objects - SSIM

Video Super-Resolution

Falling Objects - PSNR

Video Super-Resolution

Falling Objects - TIoU

Single Image Deraining

Rain100L - PSNR

Panoptic Segmentation

Mapillary val - PQth

Few-Shot Image Classification

Stanford Cars 5-way (1-shot) - Accuracy

Few-Shot Image Classification

Stanford Cars 5-way (5-shot) - Accuracy

Conditional Image Generation

ImageNet 128x128 - Inception score

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - PA V2V (mm), left hand

Dialogue State Tracking

Second dialogue state tracking challenge - Joint

Video Frame Interpolation

MSU Video Frame Interpolation - Subjective score

Video Frame Interpolation

MSU Video Frame Interpolation - FPS

Video Quality Assessment

MSU NR VQA Database - SRCC

Video Quality Assessment

MSU NR VQA Database - PLCC

Video Quality Assessment

MSU NR VQA Database - KLCC

DRS Parsing

PMB-3.0.0 - F1

DRS Parsing

PMB-2.2.0 - F1

Video Generation

UCF-101 16 frames, 128x128, Unconditional - Inception Score

Complex Query Answering

NELL-995 - MRR 2i

3D Multi-Person Pose Estimation (root-relative)

MuPoTS-3D - MPJPE

Video Quality Assessment

LIVE-YT-HFR - SRCC

Drug–drug Interaction Extraction

DDI extraction 2013 corpus - F1

Drug–drug Interaction Extraction

DDI extraction 2013 corpus - Micro F1

Multi-Object Tracking

HiEve - MOTA

Session-Based Recommendations

yoochoose1/64 - MRR@20

MRI Reconstruction

fastMRI Knee 8x - SSIM

Task-Oriented Dialogue Systems

KVRET - BLEU

Semantic Dependency Parsing

PSD - In-domain

Semantic Dependency Parsing

PSD - Out-of-domain

Semantic Dependency Parsing

DM - In-domain

Semantic Dependency Parsing

DM - Out-of-domain

Semantic Dependency Parsing

PAS - In-domain

Semantic Dependency Parsing

PAS - Out-of-domain

Chunking

Penn Treebank - F1 score

Chunking

CoNLL 2000 - Exact Span F1

Single Image Deraining

RainCityscapes - PSNR

Single Image Deraining

RainCityscapes - SSIM

Single Image Deraining

Rain100L - SSIM

Open-Domain Question Answering

SQuAD1.1 dev - EM

Dialogue Act Classification

ICSI Meeting Recorder Dialog Act (MRDA) corpus - Accuracy

Emotion Recognition in Conversation

SEMAINE - MAE (Arousal)

Emotion Recognition in Conversation

SEMAINE - MAE (Expectancy)

Type prediction

ManyTypes4TypeScript - Average Precision

Type prediction

ManyTypes4TypeScript - Average Recall

Type prediction

ManyTypes4TypeScript - Average F1

3D Human Pose Estimation

3D Poses in the Wild Challenge - MPJAE

Open-Domain Question Answering

SearchQA - EM

RGB Salient Object Detection

SOC - Average MAE

RGB-D Salient Object Detection

LFSD - S-Measure

RGB-D Salient Object Detection

LFSD - Average MAE

Automated Theorem Proving

Metamath set.mm - Percentage correct

Anomaly Detection In Surveillance Videos

UCSD Peds2 - AUC

Few-Shot Image Classification

Mini-Imagenet 20-way (5-shot) - Accuracy

Few-Shot Image Classification

Mini-Imagenet 20-way (1-shot) - Accuracy

Image Super-Resolution

Set14 - 8x upscaling - SSIM

Image Super-Resolution

Set5 - 8x upscaling - PSNR

Image Super-Resolution

Set5 - 8x upscaling - SSIM

Emotion Classification

SemEval 2018 Task 1E-c - Micro-F1

Emotion Classification

SemEval 2018 Task 1E-c - Accuracy

Image Super-Resolution

BSD100 - 8x upscaling - SSIM

3D Multi-Person Mesh Recovery

AGORA - F-MPJPE

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - PA V2V (mm), body only

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - TR V2V (mm), whole body

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - TR V2V (mm), face

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - mean P2S

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - median P2S

Image Dehazing

O-Haze - SSIM

Text-to-Image Generation

Multi-Modal-CelebA-HQ - LPIPS

Speech Synthesis

LibriTTS - PESQ

Self-Supervised Action Recognition

Kinetics-600 - Top-1 Accuracy

Video Super-Resolution

MSU Video Super Resolution Benchmark: Detail Restoration - 1 - LPIPS

Lane Detection

CurveLanes - Precision

Lane Detection

CurveLanes - GFLOPs

Hand Pose Estimation

HANDS 2017 - Average 3D Error

Pose Transfer

Deep-Fashion - IS

Pose Transfer

Deep-Fashion - PCKh

Cross-Domain Few-Shot

EuroSAT - 5 shot

Visual Object Tracking

OTB-2013 - AUC

KG-to-Text Generation

AGENDA - BLEU

Brain Tumor Segmentation

BRATS-2013 - Dice Score

Few-Shot Semantic Segmentation

Pascal5i - meanIOU

3D Human Pose Estimation

3D Poses in the Wild Challenge - MPJPE

Visual Object Tracking

OTB-2015 - Precision

Multi-Object Tracking

MOTS20 - sMOTSA

Video Super-Resolution

Ultra Video Group HD - 4x upscaling - Average PSNR

RGB-D Salient Object Detection

RGBD135 - max E-Measure

Chinese Word Segmentation

PKU - F1

Nested Named Entity Recognition

NNE - Micro F1

Video Super-Resolution

MSU Video Upscalers: Quality Enhancement - SSIM

Few-Shot Image Classification

Meta-Dataset Rank - Mean Rank

Video Super-Resolution

MSU Super-Resolution for Video Compression - BSQ-rate over Subjective Score

Video Super-Resolution

MSU Super-Resolution for Video Compression - BSQ-rate over PSNR

Video Super-Resolution

MSU Super-Resolution for Video Compression - BSQ-rate over MS-SSIM

Neural Architecture Search

NAS-Bench-201, CIFAR-100 - Accuracy (Test)

Neural Architecture Search

NAS-Bench-201, CIFAR-100 - Accuracy (Val)

Few-Shot Image Classification

Tiered ImageNet 10-way (5-shot) - Accuracy

Few-Shot Image Classification

Mini-Imagenet 10-way (5-shot) - Accuracy

Few-Shot Image Classification

Mini-Imagenet 10-way (1-shot) - Accuracy

Few-Shot Image Classification

Tiered ImageNet 10-way (1-shot) - Accuracy

Neural Architecture Search

CIFAR-10 Image Classification - Search Time (GPU days)

Neural Architecture Search

CIFAR-10 Image Classification - Params

Motion Synthesis

BRACE - Beat alignment score

Motion Synthesis

BRACE - Frechet Inception Distance

Unsupervised Person Re-Identification

DukeMTMC-reID->Market-1501 - mAP

Unsupervised Person Re-Identification

DukeMTMC-reID->Market-1501 - Rank-1

Unsupervised Person Re-Identification

Market-1501->MSMT17 - mAP

Unsupervised Person Re-Identification

Market-1501->MSMT17 - Rank-1

Unsupervised Person Re-Identification

DukeMTMC-reID->MSMT17 - mAP

Unsupervised Person Re-Identification

DukeMTMC-reID->MSMT17 - Rank-1

Unsupervised Person Re-Identification

MSMT17->Market-1501 - Rank-1

Unsupervised Person Re-Identification

MSMT17->Market-1501 - mAP

Unsupervised Person Re-Identification

Market-1501->DukeMTMC-reID - mAP

Unsupervised Person Re-Identification

Market-1501->DukeMTMC-reID - Rank-1

Neural Architecture Search

NAS-Bench-201, ImageNet-16-120 - Search time (s)

Few-Shot Image Classification

Mini-Imagenet 5-way (10-shot) - Accuracy

Session-Based Recommendations

yoochoose1 - MRR@20

Session-Based Recommendations

yoochoose1 - Precision@20

Monocular Depth Estimation

Make3D - RMSE

RGB-D Salient Object Detection

RGBD135 - S-Measure

RGB-D Salient Object Detection

RGBD135 - Average MAE

RGB-D Salient Object Detection

RGBD135 - max F-Measure

Unsupervised Machine Translation

WMT2016 English-German - BLEU

Unsupervised Machine Translation

WMT2016 Romanian-English - BLEU

Unsupervised Machine Translation

WMT2014 French-English - BLEU

Unsupervised Machine Translation

WMT2016 English-Romanian - BLEU

Unsupervised Machine Translation

WMT2016 German-English - BLEU

Data-to-Text Generation

MULTIWOZ 2.1 - BLEU

Data-to-Text Generation

ToTTo - BLEU

Data-to-Text Generation

ToTTo - PARENT

K-complex detection

MASS SS2 - F1-score (@IoU = 0.3)

Face Swapping

FaceForensics++ - pose

Neural Architecture Search

CIFAR-10 Image Classification - Percentage error

Image Super-Resolution

FFHQ 1024 x 1024 - 4x upscaling - FID

Image Super-Resolution

FFHQ 1024 x 1024 - 4x upscaling - MS-SSIM

Image Super-Resolution

FFHQ 256 x 256 - 4x upscaling - FID

Image Super-Resolution

FFHQ 256 x 256 - 4x upscaling - MS-SSIM

Image Super-Resolution

FFHQ 512 x 512 - 4x upscaling - PSNR

Image Super-Resolution

FFHQ 512 x 512 - 4x upscaling - SSIM

Image Super-Resolution

FFHQ 512 x 512 - 4x upscaling - MS-SSIM

Image Super-Resolution

FFHQ 512 x 512 - 4x upscaling - FED

Image Super-Resolution

FFHQ 512 x 512 - 4x upscaling - FID

Image Super-Resolution

FFHQ 512 x 512 - 4x upscaling - LPIPS

Image Super-Resolution

FFHQ 512 x 512 - 4x upscaling - NIQE

Face Hallucination

FFHQ 512 x 512 - 16x upscaling - FID

Face Hallucination

FFHQ 512 x 512 - 16x upscaling - LPIPS

Face Hallucination

FFHQ 512 x 512 - 16x upscaling - NIQE

Citation Intent Classification

SciCite - F1

Text Simplification

Newsela - SARI

End-To-End Dialogue Modelling

MULTIWOZ 2.0 - BLEU

Atari Games

Atari 2600 Berzerk - Score

Atari Games

Atari 2600 Private Eye - Score

Cross-Lingual NER

CoNLL German - F1

Extractive Text Summarization

CNN / Daily Mail - ROUGE-2

Extractive Text Summarization

CNN / Daily Mail - ROUGE-1

Extractive Text Summarization

CNN / Daily Mail - ROUGE-L

JPEG Artifact Correction

ICB (Quality 30 Color) - PSNR-B

JPEG Artifact Correction

ICB (Quality 30 Color) - SSIM

JPEG Artifact Correction

ICB (Quality 20 Color) - PSNR-B

JPEG Artifact Correction

ICB (Quality 20 Color) - SSIM

JPEG Artifact Correction

ICB (Quality 10 Grayscale) - PSNR

JPEG Artifact Correction

ICB (Quality 10 Grayscale) - PSNR-B

JPEG Artifact Correction

ICB (Quality 10 Grayscale) - SSIM

JPEG Artifact Correction

LIVE1 (Quality 10 Color) - SSIM

JPEG Artifact Correction

Classic5 (Quality 10 Grayscale) - SSIM

JPEG Artifact Correction

Classic5 (Quality 30 Grayscale) - SSIM

JPEG Artifact Correction

LIVE1 (Quality 20 Grayscale) - SSIM

JPEG Artifact Correction

ICB (Quality 20 Grayscale) - PSNR

JPEG Artifact Correction

ICB (Quality 20 Grayscale) - PSNR-B

JPEG Artifact Correction

ICB (Quality 10 Color) - PSNR-B

JPEG Artifact Correction

ICB (Quality 10 Color) - SSIM

JPEG Artifact Correction

LIVE1 (Quality 10 Grayscale) - SSIM

JPEG Artifact Correction

Classic5 (Quality 20 Grayscale) - SSIM

Image-to-Image Translation

ADE20K-Outdoor Labels-to-Photos - FID

Weakly Supervised Object Detection

COCO test-dev - AP50

Atari Games

Atari 2600 Boxing - Score

Video Super-Resolution

MSU Video Super Resolution Benchmark: Detail Restoration - FPS

Panoptic Segmentation

Indian Driving Dataset - PQ

Panoptic Segmentation

KITTI Panoptic Segmentation - PQ

Long-tail learning with class descriptors

SUN-LT - Per-Class Accuracy

Long-tail learning with class descriptors

SUN-LT - Long-Tailed Accuracy

Long-tail learning with class descriptors

AWA-LT - Long-Tailed Accuracy

Long-tail learning with class descriptors

CUB-LT - Per-Class Accuracy

Long-tail learning with class descriptors

CUB-LT - Long-Tailed Accuracy

Long-tail learning with class descriptors

ImageNet-LT-d - Per-Class Accuracy

Conversation Disentanglement

irc-disentanglement - VI

Conversation Disentanglement

irc-disentanglement - R

Conversation Disentanglement

irc-disentanglement - F

Multiple Object Tracking

SportsMOT - DetA

3D Object Reconstruction From A Single Image

BUFF - Point-to-surface distance (cm)

Atari Games

Atari 2600 Crazy Climber - Score

Atari Games

Atari 2600 HERO - Score

Atari Games

Atari 2600 Amidar - Score

Atari Games

Atari 2600 Venture - Score

Atari Games

Atari 2600 Yars Revenge - Score

Atari Games

Atari 2600 Gravitar - Score

Atari Games

Atari 2600 Kangaroo - Score

Atari Games

Atari 2600 Tutankham - Score

Atari Games

Atari 2600 Battle Zone - Score

Atari Games

Atari 2600 Solaris - Score

Atari Games

Atari 2600 Q*Bert - Score

Atari Games

Atari 2600 Star Gunner - Score

Image Super-Resolution

Urban100 - 4x upscaling - SSIM

Vehicle Speed Estimation

BrnoCompSpeed - Mean Speed Measurement Error (km/h)

Vehicle Speed Estimation

BrnoCompSpeed - Median Speed Measurement Error (km/h)

Graph Classification

ENZYMES - Accuracy

Text Classification

Amazon-2 - Error

Superpixel Image Classification

75 Superpixel MNIST - Classification Error

Layout-to-Image Generation

Visual Genome 256x256 - Inception Score

Layout-to-Image Generation

COCO-Stuff 64x64 - FID

Layout-to-Image Generation

COCO-Stuff 64x64 - Inception Score

Layout-to-Image Generation

COCO-Stuff 128x128 - SceneFID

Crowd Counting

TRANCOS - MAE

Video Saliency Detection

MSU Video Saliency Prediction - FPS

Unsupervised Video Object Segmentation

DAVIS 2017 (test-dev) - Jaccard (Decay)

Unsupervised Video Object Segmentation

DAVIS 2017 (test-dev) - F-measure (Decay)

Document Classification

Twitter - Accuracy

Keypoint Detection

COCO test-challenge - AR

Keypoint Detection

COCO test-challenge - ARM

Keypoint Detection

COCO test-challenge - AP

Keypoint Detection

COCO test-challenge - AP50

Keypoint Detection

COCO test-challenge - AP75

Keypoint Detection

COCO test-challenge - AR50

Keypoint Detection

COCO test-challenge - AR75

Keypoint Detection

COCO test-challenge - ARL

Video Prediction

SynpickVP - SSIM

Video Prediction

KTH - PSNR

Video Prediction

Cityscapes 128x128 - Cond.

Video Prediction

Cityscapes 128x128 - Pred

Dialogue Act Classification

Switchboard corpus - Accuracy

Action Segmentation

JIGSAWS - Edit Distance

Unsupervised Video Object Segmentation

DAVIS 2017 (val) - F-measure (Recall)

Unsupervised Machine Translation

WMT2014 English-French - BLEU

Sentence Compression

Google Dataset - F1

Generative Question Answering

CoQA - F1-Score

Unsupervised Video Object Segmentation

DAVIS 2017 (test-dev) - Jaccard (Mean)

Unsupervised Video Object Segmentation

DAVIS 2017 (test-dev) - Jaccard (Recall)

Unsupervised Video Object Segmentation

DAVIS 2017 (test-dev) - F-measure (Mean)

Unsupervised Video Object Segmentation

DAVIS 2017 (test-dev) - F-measure (Recall)

Unsupervised Video Object Segmentation

DAVIS 2017 (val) - Jaccard (Recall)

Negation Scope Resolution

*sem 2012 Shared Task: Sherlock Dataset - F1

Data-to-Text Generation

RotoWire (Content Ordering) - BLEU

Video Generation

UCF-101 16 frames, Unconditional, Single GPU - Inception Score

Gesture-to-Gesture Translation

Senz3D - PSNR

Gesture-to-Gesture Translation

Senz3D - IS

Gesture-to-Gesture Translation

Senz3D - AMT

Gesture-to-Gesture Translation

NTU Hand Digit - PSNR

Gesture-to-Gesture Translation

NTU Hand Digit - IS

Gesture-to-Gesture Translation

NTU Hand Digit - AMT

Scene Recognition

YUP++ - Accuracy (%)

Few-Shot Image Classification

OMNIGLOT - 1-Shot, 20-way - Accuracy

Multimodal Unsupervised Image-To-Image Translation

AFHQ - FID

Document Classification

Amazon - Accuracy

Document Classification

BBCSport - Accuracy

Synthetic-to-Real Translation

Syn2Real-C - Accuracy

Visual Dialog

Visual Dialog v1.0 test-std - NDCG (x 100)

Semi-Supervised Image Classification

SVHN, 1000 labels - Accuracy

Semi-Supervised Image Classification

cifar10, 250 Labels - Percentage correct

Image Manipulation Detection

CocoGlide - AUC

Cross-Lingual NER

CoNLL Dutch - F1

Atari Games

Atari 2600 Beam Rider - Score

Atari Games

Atari 2600 Bowling - Score

Atari Games

Atari 2600 Assault - Score

Atari Games

Atari 2600 River Raid - Score

Atari Games

Atari 2600 Frostbite - Score

Atari Games

Atari 2600 Zaxxon - Score

Atari Games

Atari 2600 Name This Game - Score

Atari Games

Atari 2600 Robotank - Score

Atari Games

Atari 2600 Alien - Score

Atari Games

Atari 2600 Fishing Derby - Score

Atari Games

Atari 2600 Time Pilot - Score

Satellite Image Classification

SAT-4 - Accuracy

Automated Theorem Proving

HolStep (Conditional) - Classification Accuracy

Reflection Removal

SIR^2(Objects) - SSIM

6D Pose Estimation using RGBD

YCB-Video - Mean ADD-S

Camera shot boundary detection

MSU Shot Boundary Detection Benchmark - F score

Camera shot boundary detection

MSU Shot Boundary Detection Benchmark - FPS

Chinese Named Entity Recognition

OntoNotes 4 - F1

Emotion Cause Extraction

ECE - F1

Open-Domain Question Answering

ELI5 - Rouge-1

Dialogue State Tracking

Wizard-of-Oz - Request

Image Super-Resolution

FFHQ 1024 x 1024 - 4x upscaling - SSIM

Grayscale Image Denoising

BSD200 sigma50 - PSNR

Grayscale Image Denoising

BSD200 sigma70 - PSNR

Grayscale Image Denoising

BSD200 sigma30 - PSNR

Atari Games

Atari 2600 Pong - Score

Unsupervised Person Re-Identification

MSMT17->DukeMTMC-reID - Rank-1

Unsupervised Person Re-Identification

MSMT17->DukeMTMC-reID - Rank-10

Unsupervised Person Re-Identification

MSMT17->DukeMTMC-reID - Rank-5

Unsupervised Person Re-Identification

MSMT17->DukeMTMC-reID - mAP

Unsupervised Person Re-Identification

DukeMTMC-reID->Market-1501 - Rank-5

Retinal OCT Disease Classification

OCT2017 - Acc

Retinal OCT Disease Classification

OCT2017 - Sensitivity

Retinal OCT Disease Classification

Srinivasan2014 - Acc

3D Object Detection

SUN-RGBD val - Inference Speed (s)

Image Super-Resolution

Urban100 - 8x upscaling - SSIM

License Plate Recognition

AOLP-RP - Average Recall

Text Simplification

TurkCorpus - BLEU

MS-SSIM

DocUNet - MS-SSIM

6D Pose Estimation using RGB

Occlusion LineMOD - Mean ADD

Skeleton Based Action Recognition

PKU-MMD - [email protected] (CV)

Skeleton Based Action Recognition

PKU-MMD - [email protected] (CS)

Text Classification

Sogou News - Accuracy

Cross-Lingual Document Classification

MLDoc Zero-Shot English-to-Chinese - Accuracy

Cross-Lingual Document Classification

MLDoc Zero-Shot English-to-French - Accuracy

Cross-Lingual Document Classification

MLDoc Zero-Shot English-to-Spanish - Accuracy

Cross-Lingual Document Classification

MLDoc Zero-Shot English-to-Russian - Accuracy

Homography Estimation

S-COCO - MACE

Image Super-Resolution

BSD100 - 4x upscaling - SSIM

Image Super-Resolution

Set14 - 4x upscaling - SSIM

Visual Storytelling

VIST - ROUGE-L

Cross-Lingual Document Classification

MLDoc Zero-Shot English-to-Italian - Accuracy

Session-Based Recommendations

yoochoose1/4 - HR@20

Emotion Recognition in Conversation

EC - Micro-F1

Lung Nodule Segmentation

LUNA - F1 score

Lung Nodule Segmentation

LUNA - AUC

Emotion Recognition in Conversation

SEMAINE - MAE (Valence)

Emotion Recognition in Conversation

SEMAINE - MAE (Power)

Text Style Transfer

Yelp Review Dataset (Small) - G-Score (BLEU, Accuracy)

Image Super-Resolution

PIRM-test - NIQE

Unsupervised Facial Landmark Detection

300W - NME

Unsupervised Facial Landmark Detection

AFLW-MTFL - NME

3D Semantic Segmentation

DALES - mIoU

Lesion Segmentation

Anatomical Tracings of Lesions After Stroke (ATLAS) - Dice

Lesion Segmentation

Anatomical Tracings of Lesions After Stroke (ATLAS) - Recall

Video Salient Object Detection

VOS-T - S-Measure

Video Salient Object Detection

VOS-T - max E-measure

Video Salient Object Detection

VOS-T - Average MAE

Multivariate Time Series Imputation

KDD CUP Challenge 2018 - MSE (10% missing)

Action Recognition In Videos

Something-Something V1 - Top 1 Accuracy

Visual Object Tracking

VOT2017 - Expected Average Overlap (EAO)

Open-Domain Question Answering

SearchQA - F1

Skeleton Based Action Recognition

J-HMDB - Accuracy (pose)

Skeleton Based Action Recognition

JHMDB (2D poses only) - Average accuracy of 3 splits

Image Super-Resolution

BSD100 - 4x upscaling - PSNR

Image Super-Resolution

Urban100 - 4x upscaling - PSNR

Image Super-Resolution

BSD100 - 2x upscaling - PSNR

Image Super-Resolution

Set5 - 4x upscaling - PSNR

Image Super-Resolution

Set5 - 4x upscaling - SSIM

Image Super-Resolution

Urban100 - 2x upscaling - PSNR

Image Super-Resolution

Urban100 - 2x upscaling - SSIM

Image Super-Resolution

Set5 - 2x upscaling - SSIM

Image Super-Resolution

Set14 - 2x upscaling - PSNR

Image Super-Resolution

Set14 - 2x upscaling - SSIM

Lesion Segmentation

Anatomical Tracings of Lesions After Stroke (ATLAS) - IoU

Lesion Segmentation

Anatomical Tracings of Lesions After Stroke (ATLAS) - Precision

Human Part Segmentation

PASCAL-Part - mIoU

Multivariate Time Series Forecasting

MuJoCo - MSE (10^-2, 50% missing)

Visual Object Tracking

VOT2017/18 - Expected Average Overlap (EAO)

3D Object Detection

KITTI Pedestrian Moderate val - AP

3D Object Detection

KITTI Pedestrian Easy val - AP

3D Object Detection

KITTI Pedestrian Hard val - AP

Grayscale Image Denoising

BSD68 sigma70 - PSNR

Color Image Denoising

CBSD68 sigma5 - PSNR

Dialogue State Tracking

SIMMC2.0 - Slot F1

Dialogue State Tracking

SIMMC2.0 - Act F1

Semi-Supervised Image Classification

CIFAR-10, 2000 Labels - Accuracy

Semi-Supervised Image Classification

SVHN, 500 Labels - Accuracy

Text Classification

Amazon-5 - Error

Text Classification

DBpedia - Error

Long-tail learning with class descriptors

AWA-LT - Per-Class Accuracy

Image Super-Resolution

VggFace2 - 8x upscaling - PSNR

Image Super-Resolution

Urban100 - 8x upscaling - PSNR

Image Super-Resolution

Manga109 - 8x upscaling - PSNR

Image Super-Resolution

Manga109 - 8x upscaling - SSIM

Multi-Label Text Classification

EUR-Lex - nDCG@5

Multi-Label Text Classification

EUR-Lex - P@5

Video Classification

YouTube-8M - Hit@1

Image Manipulation Detection

Columbia (OSN-transmitted - Wechat) - AUC

Image Manipulation Detection

DSO (OSN-transmitted - Facebook) - AUC

Image Manipulation Detection

DSO (OSN-transmitted - Facebook) - f-Score

Image Manipulation Detection

DSO (OSN-transmitted - Facebook) - Intersection over Union

Image Manipulation Detection

DSO (OSN-transmitted - Wechat) - f-Score

Image Manipulation Detection

DSO (OSN-transmitted - Wechat) - Intersection over Union

Image Manipulation Detection

DSO (OSN-transmitted - Weibo) - f-Score

Image Manipulation Detection

DSO (OSN-transmitted - Weibo) - Intersection over Union

Image Manipulation Detection

NIST (OSN-transmitted - Facebook) - f-Score

Image Manipulation Detection

NIST (OSN-transmitted - Facebook) - Intersection over Union

Video Salient Object Detection

DAVSOD-Difficult20 - S-Measure

Video Salient Object Detection

MCL - MAX F-MEASURE

Video Salient Object Detection

DAVSOD-Normal25 - S-Measure

Video Salient Object Detection

DAVSOD-Normal25 - max E-measure

Video Salient Object Detection

DAVSOD-Normal25 - Average MAE

Video Salient Object Detection

DAVSOD-easy35 - S-Measure

Video Salient Object Detection

DAVSOD-easy35 - max E-Measure

Video Salient Object Detection

DAVSOD-easy35 - Average MAE

Video Salient Object Detection

DAVSOD-easy35 - max F-Measure

Video Salient Object Detection

FBMS-59 - MAX E-MEASURE

UCCA Parsing

SemEval 2019 Task 1 - English-Wiki (open) F1

UCCA Parsing

SemEval 2019 Task 1 - English-20K (open) F1

Head Pose Estimation

BIWI - MAE-aligned (trained with other data)

Head Pose Estimation

BIWI - Geodesic Error - aligned (GE)

RGB Salient Object Detection

SOC - S-Measure

RGB Salient Object Detection

SOC - mean E-Measure

Object Detection

iSAID - Average Precision

Word Sense Induction

SemEval 2010 WSI - F-Score

Word Sense Induction

SemEval 2010 WSI - V-Measure

Word Sense Induction

SemEval 2010 WSI - AVG

Medical Image Classification

NCT-CRC-HE-100K - Accuracy (%)

Medical Image Classification

NCT-CRC-HE-100K - F1-Score

Medical Image Classification

NCT-CRC-HE-100K - Specificity

Automated Theorem Proving

HOList benchmark - Percentage correct

AMR Parsing

LDC2014T12 - F1 Newswire

AMR Parsing

LDC2014T12 - F1 Full

Action Recognition In Videos

Jester (Gesture Recognition) - Val

Point-interactive Image Colorization

ImageNet ctest10k - PSNR@100

Atari Games

Atari 2600 Skiing - Score

Atari Games

Atari 2600 Video Pinball - Score

Graph Classification

NEURON-MULTI - Accuracy

Graph Classification

NEURON-BINARY - Accuracy

Graph Classification

NEURON-Average - Accuracy

Stress-Strain Relation

Non-Linear Elasticity Benchmark - Time (ms)

Face Anti-Spoofing

CelebA-Spoof-Enroll5 - AUC

RGB Salient Object Detection

SOD - MAE

RGB Salient Object Detection

SBU - Balanced Error Rate

3D Semantic Segmentation

DALES - Overall Accuracy

RGB Salient Object Detection

UCF - Balanced Error Rate

Graph Classification

FRANKENSTEIN - Accuracy

Cross-View Image-to-Image Translation

Dayton (64×64) - ground-to-aerial - SSIM

Cross-View Image-to-Image Translation

Dayton (256×256) - ground-to-aerial - SSIM

Cross-View Image-to-Image Translation

Dayton (256×256) - aerial-to-ground - SSIM

Cross-View Image-to-Image Translation

Dayton (64×64) - aerial-to-ground - SSIM

Cross-View Image-to-Image Translation

Ego2Top - SSIM

Unsupervised Person Re-Identification

DukeMTMC-reID->Market-1501 - Rank-10

Unsupervised Person Re-Identification

Market-1501->MSMT17 - Rank-10

Unsupervised Person Re-Identification

DukeMTMC-reID->MSMT17 - Rank-10

Unsupervised Person Re-Identification

Market-1501->DukeMTMC-reID - Rank-10

Unsupervised Person Re-Identification

Market-1501->DukeMTMC-reID - Rank-5

Visual Dialog

VisDial v0.9 val - MRR

Visual Dialog

VisDial v0.9 val - Mean Rank

Visual Dialog

VisDial v0.9 val - R@1

Visual Dialog

VisDial v0.9 val - R@10

Visual Dialog

VisDial v0.9 val - R@5

Visual Dialog

Visual Dialog v1.0 test-std - R@5

Visual Dialog

Visual Dialog v1.0 test-std - R@10

Visual Dialog

Visual Dialog v1.0 test-std - Mean

Age Estimation

FGNET - MAE

Pose Transfer

Deep-Fashion - Retrieval Top10 Recall

Image Super-Resolution

Set14 - 8x upscaling - PSNR

Chinese Named Entity Recognition

OntoNotes 4 - Precision

Chinese Named Entity Recognition

OntoNotes 4 - Recall

Chinese Named Entity Recognition

Resume NER - Precision

Chinese Named Entity Recognition

Resume NER - Recall

Skeleton Based Action Recognition

JHMDB Pose Tracking - [email protected]

Skeleton Based Action Recognition

JHMDB Pose Tracking - [email protected]

Skeleton Based Action Recognition

JHMDB Pose Tracking - [email protected]

Skeleton Based Action Recognition

JHMDB Pose Tracking - [email protected]

Skeleton Based Action Recognition

JHMDB Pose Tracking - [email protected]

Weakly Supervised Object Detection

Charades - MAP

Data-to-Text Generation

E2E NLG Challenge - BLEU

Data-to-Text Generation

E2E NLG Challenge - NIST

Weakly Supervised Object Detection

HICO-DET - MAP

Unsupervised Anomaly Detection with Specified Settings -- 30% anomaly

Fashion-MNIST - AUC-ROC

Face Verification

Oulu-CASIA NIR-VIS - TAR @ FAR=0.001

Face Verification

Oulu-CASIA NIR-VIS - TAR @ FAR=0.01

Face Verification

CASIA NIR-VIS 2.0 - TAR @ FAR=0.001

Face Verification

BUAA-VisNir - TAR @ FAR=0.001

Aspect-Based Sentiment Analysis (ABSA)

Sentihood - Aspect

Aspect-Based Sentiment Analysis (ABSA)

Sentihood - Sentiment

Anomaly Detection In Surveillance Videos

ShanghaiTech Weakly Supervised - AUC-ROC

Point Cloud Registration

3DMatch (at least 30% overlapped - FCGF setting) - Recall (0.3m, 15 degrees)

Diffeomorphic Medical Image Registration

OASIS+ADIBE+ADHD200+MCIC+PPMI+HABS+HarvardGSP - Dice (Average)

Retinal Vessel Segmentation

ROSE-1 DVC - Dice Score

Dynamic Link Prediction

Enron Emails - AUC

Multimodal Activity Recognition

UTD-MHAD - Accuracy (CS)

Semi-Supervised Video Object Segmentation

YouTube - mIoU

Graph Classification

NCI109 - Accuracy

3D Face Reconstruction

Florence - Average 3D Error

Few-Shot Image Classification

OMNIGLOT - 1-Shot, 5-way - Accuracy

Image-guided Story Ending Generation

VIST-E - BLEU-3

Image-guided Story Ending Generation

VIST-E - BLEU-4

Visual Relationship Detection

VRD Predicate Detection - R@50

Multivariate Time Series Imputation

Basketball Players Movement - Path Length

Multivariate Time Series Imputation

Basketball Players Movement - OOB Rate (10^-3)

Multivariate Time Series Imputation

Basketball Players Movement - Step Change (10^-3)

Multivariate Time Series Imputation

PEMS-SF - L2 Loss (10^-4)

Hand Gesture Recognition

EgoGesture - Accuracy

Keypoint Detection

COCO test-challenge - APL

Face Identification

Trillion Pairs Dataset - Accuracy

Cross-Lingual Document Classification

MLDoc Zero-Shot English-to-Japanese - Accuracy

JPEG Artifact Correction

LIVE1 (Quality 40 Grayscale) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 30 Grayscale) - PSNR

Diffeomorphic Medical Image Registration

Automatic Cardiac Diagnosis Challenge (ACDC) - Grad Det-Jac

Multivariate Time Series Imputation

Basketball Players Movement - Player Distance

Color Image Denoising

Darmstadt Noise Dataset - PSNR (sRGB)

Color Image Denoising

Darmstadt Noise Dataset - SSIM (sRGB)

Crowd Counting

WorldExpo’10 - Average MAE

Crowd Counting

Venice - MAE

Video Super-Resolution

MSU Video Upscalers: Quality Enhancement - VMAF

Distant Speech Recognition

DIRHA English WSJ - Word Error Rate (WER)

Object Detection

COCO minival - AP50

Open-Domain Question Answering

SearchQA - Unigram Acc

Open-Domain Question Answering

SearchQA - N-gram F1

Data-to-Text Generation

E2E NLG Challenge - ROUGE-L

Data-to-Text Generation

E2E NLG Challenge - CIDEr

Conversation Disentanglement

irc-disentanglement - P

Face Detection

FDDB - AP

JPEG Artifact Correction

LIVE1 (Quality 20 Color) - PSNR-B

JPEG Artifact Correction

LIVE1 (Quality 20 Color) - SSIM

JPEG Artifact Correction

LIVE1 (Quality 10 Color) - PSNR-B

JPEG Artifact Correction

LIVE1 (Quality 20 Grayscale) - PSNR-B

JPEG Artifact Correction

ICB (Quality 20 Grayscale) - SSIM

JPEG Artifact Correction

LIVE1 (Quality 10 Grayscale) - PSNR-B

AMR Parsing

LDC2014T12 - F1 Newswire

Graph Classification

COX2 - Accuracy(10-fold)

Nuclear Segmentation

Cell17 - F1-score

Nuclear Segmentation

Cell17 - Dice

Nuclear Segmentation

Cell17 - Hausdorff

Diffeomorphic Medical Image Registration

OASIS+ADIBE+ADHD200+MCIC+PPMI+HABS+HarvardGSP - CPU (sec)

Face Detection

PASCAL Face - AP

Face Detection

Annotated Faces in the Wild - AP

Face Verification

BUAA-VisNir - TAR @ FAR=0.01

Image Super-Resolution

FFHQ 1024 x 1024 - 4x upscaling - PSNR

Image Super-Resolution

FFHQ 256 x 256 - 4x upscaling - PSNR

Image Super-Resolution

FFHQ 256 x 256 - 4x upscaling - SSIM

Video Salient Object Detection

DAVIS-2016 - MAX E-MEASURE

Video Salient Object Detection

DAVSOD-Difficult20 - Average MAE

Video Salient Object Detection

MCL - S-Measure

Video Salient Object Detection

MCL - MAX E-MEASURE

Video Salient Object Detection

MCL - AVERAGE MAE

Video Salient Object Detection

UVSD - S-Measure

Video Salient Object Detection

UVSD - max E-measure

Video Salient Object Detection

UVSD - Average MAE

Salt-And-Pepper Noise Removal

Kodak24 Noise Level 70% - PSNR

Salt-And-Pepper Noise Removal

Kodak24 Noise Level 50% - PSNR

Salt-And-Pepper Noise Removal

Kodak24 Noise Level 30% - PSNR

Photo geolocation estimation

Im2GPS - Street level (1 km)

Photo geolocation estimation

Im2GPS - City level (25 km)

3D Reconstruction

Data3D-R2N2 - 3DIoU

Weakly-Supervised Object Localization

ILSVRC 2016 - Top-5 Error

Multi-view Subspace Clustering

ORL - Accuracy

Unsupervised Facial Landmark Detection

AFLW (Zhang CVPR 2018 crops) - NME

Visual Object Tracking

VOT2016 - Expected Average Overlap (EAO)

Text Simplification

PWKP / WikiSmall - BLEU

Text Simplification

Newsela - BLEU

Atari Games

Atari 2600 Defender - Score

Video Quality Assessment

MSU SR-QA Dataset - SROCC

Video Quality Assessment

MSU SR-QA Dataset - PLCC

Video Alignment

MSU Video Alignment and Retrieval Benchmark Suite - Accuracy w/ 3 frames error (Light)

Video Alignment

MSU Video Alignment and Retrieval Benchmark Suite - Accuracy w/ 3 frames error (Medium geometric)

Video Alignment

MSU Video Alignment and Retrieval Benchmark Suite - Accuracy w/ 3 frames error (Medium color)

Video Alignment

MSU Video Alignment and Retrieval Benchmark Suite - Accuracy w/ 3 frames error (Hard)

Skeleton Based Action Recognition

J-HMDB - Accuracy (RGB+pose)

Video Salient Object Detection

DAVSOD-Difficult20 - max E-measure

Timex normalization

PNT - F1-Score

Multivariate Time Series Imputation

UCI localization data - MAE (10% missing)

Multivariate Time Series Imputation

Basketball Players Movement - Path Difference

Hyperspectral Image Classification

Pavia University - OA@15perclass

Semantic Role Labeling (predicted predicates)

CoNLL 2012 - F1

Video Prediction

SynpickVP - MSE

Multimodal Unsupervised Image-To-Image Translation

Edge-to-Handbags - Diversity

Multimodal Unsupervised Image-To-Image Translation

Edge-to-Shoes - Diversity

3D Shape Classification

Pix3D - R@1

3D Shape Classification

Pix3D - R@16

3D Shape Classification

Pix3D - R@2

3D Shape Classification

Pix3D - R@32

3D Shape Classification

Pix3D - R@4

3D Shape Classification

Pix3D - R@8

SMAC+

Off_Hard_sequential - Median Win Rate

SMAC+

Off_Near_parallel - Median Win Rate

Noisy Speech Recognition

CHiME real - Percentage error

3D Point Cloud Classification

Sydney Urban Objects - F1

Visual Object Tracking

OTB-50 - AUC

Video Prediction

Cityscapes 128x128 - Train

Skin Cancer Segmentation

Kaggle Skin Lesion Segmentation - F1 score

Skin Cancer Segmentation

Kaggle Skin Lesion Segmentation - AUC

Few-Shot 3D Point Cloud Classification

ModelNet40 10-way (20-shot) - Standard Deviation

Speech Synthesis

North American English - Mean Opinion Score

Table-to-Text Generation

WikiBio - BLEU

Unsupervised Image-To-Image Translation

Freiburg Forest Dataset - PSNR

Open-Domain Question Answering

Quasar - EM (Quasar-T)

Open-Domain Question Answering

Quasar - F1 (Quasar-T)

Video Prediction

KTH - Pred

Video Prediction

KTH - Train

Head Pose Estimation

BIWI - MAE (trained with BIWI data)

Liver Segmentation

LiTS2017 - Dice

Pancreas Segmentation

TCIA Pancreas-CT Dataset - Dice Score

Sketch-Based Image Retrieval

Handbags - R@1

Sketch-Based Image Retrieval

Handbags - R@10

Sketch-Based Image Retrieval

Chairs - R@1

Pedestrian Detection

TJU-Ped-campus - RS (miss rate)

Color Image Denoising

CBSD68 sigma50 - PSNR

Visual Relationship Detection

VRD Predicate Detection - R@100

Visual Relationship Detection

VRD Phrase Detection - R@100

Visual Relationship Detection

VRD Phrase Detection - R@50

Visual Relationship Detection

VRD Relationship Detection - R@100

Visual Relationship Detection

VRD Relationship Detection - R@50

Image Super-Resolution

FFHQ 512 x 512 - 4x upscaling - LLE

Semantic Similarity

SICK - MSE

Semantic Similarity

SICK - Pearson Correlation

Semantic Similarity

SICK - Spearman Correlation

Image Super-Resolution

WebFace - 8x upscaling - PSNR

Sentence Compression

Google Dataset - CR

Image-guided Story Ending Generation

LSMDC-E - BLEU-2

Image-guided Story Ending Generation

LSMDC-E - ROUGE-L

SMAC+

Def_Infantry_sequential - Median Win Rate

SMAC+

Off_Superhard_parallel - Median Win Rate

Action Segmentation

JIGSAWS - Accuracy

AMR Parsing

LDC2015E86 - Smatch

Multimodal Activity Recognition

EV-Action - Accuracy

3D Part Segmentation

ShapeNet-Part - Instance Average IoU

Multimodal Unsupervised Image-To-Image Translation

Cats-and-Dogs - CIS

Multimodal Unsupervised Image-To-Image Translation

Cats-and-Dogs - IS

Few-Shot Image Classification

CUB 200 50-way (0-shot) - Accuracy

Keypoint Detection

Pascal3D+ - Mean PCK

Hand Gesture Recognition

ChaLearn val - Accuracy

Malware Detection

Android Malware Dataset - Accuracy

Conversation Disentanglement

Linux IRC (Ch2 Elsner) - 1-1

Conversation Disentanglement

Linux IRC (Ch2 Elsner) - Local

Conversation Disentanglement

Linux IRC (Ch2 Elsner) - Shen F-1

Conversation Disentanglement

Linux IRC (Ch2 Kummerfeld) - 1-1

Skeleton Based Action Recognition

Gaming 3D (G3D) - Accuracy

Hypernym Discovery

Music domain - MAP

Hypernym Discovery

Music domain - MRR

Hypernym Discovery

Music domain - P@5

Hypernym Discovery

General - MAP

Hypernym Discovery

General - MRR

Hypernym Discovery

General - P@5

Hypernym Discovery

Medical domain - MAP

Hypernym Discovery

Medical domain - MRR

Hypernym Discovery

Medical domain - P@5

Open-Domain Question Answering

SQuAD1.1 - EM

Lesion Segmentation

Anatomical Tracings of Lesions After Stroke (ATLAS) - IoU

Lesion Segmentation

Anatomical Tracings of Lesions After Stroke (ATLAS) - Precision

Image-to-Image Translation

Cityscapes Labels-to-Photo - Class IOU

Image-to-Image Translation

Cityscapes Photo-to-Labels - Class IOU

Atari Games

Atari 2600 Freeway - Score

Math Word Problem Solving

ALG514 - Accuracy (%)

Image Super-Resolution

BSD100 - 4x upscaling - MOS

Image Super-Resolution

Set14 - 4x upscaling - MOS

Medical Image Classification

NCT-CRC-HE-100K - Precision

Relation Classification

SemEval 2010 Task 8 - F1

Few-Shot Image Classification

OMNIGLOT - 5-Shot, 5-way - Accuracy

Semi-Supervised Video Object Segmentation

DAVIS 2016 - Jaccard (Decay)

Semi-Supervised Video Object Segmentation

DAVIS 2016 - F-measure (Decay)

Video Salient Object Detection

SegTrack v2 - max E-measure

Sketch-Based Image Retrieval

Chairs - R@10

Cross-lingual zero-shot dependency parsing

Universal Dependency Treebank - LAS

Atari Games

Atari 2600 Wizard of Wor - Score

Photo geolocation estimation

Im2GPS - Reference images

Atari Games

Atari 2600 Centipede - Score

Atari Games

Atari 2600 Ms. Pacman - Score

Multimodal Activity Recognition

MSR Daily Activity3D dataset - Accuracy

Video Prediction

KTH - Cond

Atari Games

Atari 2600 Double Dunk - Score

Cross-Lingual Document Classification

Reuters RCV1/RCV2 German-to-English - Accuracy

Cross-Lingual Document Classification

Reuters RCV1/RCV2 English-to-German - Accuracy

Document Dating

APW - Accuracy

BIRL

CIMA-10k - MMrTRE

BIRL

CIMA-10k - AMrTRE