ActivityNet-1.3 - AR@100

Temporal Action Proposal Generation

ActivityNet-1.3 - AUC (test)

Temporal Action Proposal Generation

THUMOS'14 - AR@1000

Photoplethysmography (PPG) heart rate estimation


Photoplethysmography (PPG) heart rate estimation


Photoplethysmography (PPG) heart rate estimation

MMSE-HR - Pearson Correlation

6D Pose Estimation using RGBD

REAL275 - mAP 3DIoU@75

Long-tail Learning

CIFAR-100-LT (ρ=100) - Error Rate

Few-Shot Image Classification

ORBIT Clutter Video Evaluation - Frame accuracy

Cross-Modal Retrieval

Flickr30k - Text-to-image R@1

Cross-Modal Retrieval

Flickr30k - Text-to-image R@10

Cross-Modal Retrieval

Flickr30k - Text-to-image R@5

3D Object Detection


3D Object Detection


3D Object Detection


Audio Super-Resolution

VCTK Multi-Speaker - Log-Spectral Distance

Audio Classification

VGGSound - Top 5 Accuracy

Multi-modal Classification

VGGSound - Top-1 Accuracy

Video Deinterlacing

MSU Deinterlacer Benchmark - FPS on CPU

Long-range modeling

LRA - ListOps

Long-range modeling

LRA - Avg

Sleep Stage Detection

Sleep-EDF - Macro-F1

Sleep Stage Detection

Sleep-EDF - Cohen's kappa

Sleep Stage Detection

SHHS - Cohen's Kappa

Sleep Stage Detection

SHHS - Macro-F1

RGB Salient Object Detection

ECSSD - S-Measure

Point Cloud Completion

ShapeNet-ViPC - Chamfer Distance

RGB Salient Object Detection


RGB Salient Object Detection


RGB Salient Object Detection

HKU-IS - F-measure

RGB Salient Object Detection

HKU-IS - S-Measure

Dichotomous Image Segmentation


Dichotomous Image Segmentation


Dichotomous Image Segmentation


Dichotomous Image Segmentation

DIS-TE4 - S-Measure

Dichotomous Image Segmentation


Few-Shot Image Classification

Dirichlet Mini-Imagenet (5-way, 1-shot) - 1:1 Accuracy

Few-Shot Image Classification

Dirichlet Tiered-Imagenet (5-way, 1-shot) - 1:1 Accuracy

Few-Shot Image Classification

FC100 5-way (5-shot) - Accuracy

Few-Shot Image Classification

Dirichlet CUB-200 (5-way, 1-shot) - 1:1 Accuracy

Few-Shot Image Classification

Dirichlet CUB-200 (5-way, 5-shot) - 1:1 Accuracy

Few-Shot Image Classification

FC100 5-way (1-shot) - Accuracy

Few-Shot Image Classification

Dirichlet Mini-Imagenet (5-way, 5-shot) - 1:1 Accuracy

Building change detection for remote sensing images


Domain Generalization

Stylized-ImageNet - Top 1 Accuracy

Action Segmentation

GTEA - F1@10%

Action Segmentation

GTEA - F1@50%

Action Segmentation

GTEA - Acc

Action Segmentation

GTEA - F1@25%

Action Segmentation

50 Salads - Edit

Action Anticipation

EPIC-KITCHENS-55 (Unseen test set (S2)) - Top 1 Accuracy - Verb

Action Anticipation

EPIC-KITCHENS-55 (Unseen test set (S2)) - Top 1 Accuracy - Noun

Action Anticipation

EPIC-KITCHENS-55 (Unseen test set (S2)) - Top 1 Accuracy - Act.

Action Anticipation

EPIC-KITCHENS-55 (Unseen test set (S2)) - Top 5 Accuracy - Verb

Action Anticipation

EPIC-KITCHENS-55 (Unseen test set (S2)) - Top 5 Accuracy - Noun

Action Anticipation

EPIC-KITCHENS-55 (Seen test set (S1)) - Top 1 Accuracy - Noun

Action Anticipation

EPIC-KITCHENS-55 (Seen test set (S1)) - Top 1 Accuracy - Act.

Action Anticipation

EPIC-KITCHENS-55 (Seen test set (S1)) - Top 5 Accuracy - Verb

Action Anticipation

EPIC-KITCHENS-55 (Seen test set (S1)) - Top 5 Accuracy - Noun

Action Anticipation

EPIC-KITCHENS-55 (Seen test set (S1)) - Top 5 Accuracy - Act.

Multimodal Intent Recognition

MIntRec - Accuracy (20 classes)

Molecular Property Prediction

ToxCast - ROC-AUC

Molecular Property Prediction


Molecular Property Prediction


Molecular Property Prediction

Lipophilicity - RMSE

Molecular Property Prediction


Molecular Property Prediction


Pedestrian Detection


RGB Salient Object Detection

PASCAL-S - S-Measure

Few-Shot Object Detection

MS-COCO (1-shot) - AP

Molecular Property Prediction

ClinTox - Molecules (M)

Nested Named Entity Recognition

ACE 2005 - F1

Dialog Relation Extraction

DialogRE - F1 (v1)

Dialog Relation Extraction

DialogRE - F1c (v1)

Video Denoising

Set8 sigma10 - PSNR

Video Denoising

DAVIS sigma20 - PSNR

Video Denoising

Set8 sigma50 - PSNR

Video Denoising

DAVIS sigma30 - PSNR

Video Denoising

Set8 sigma20 - PSNR

Video Denoising

DAVIS sigma40 - PSNR

Video Denoising

Set8 sigma40 - PSNR

Video Denoising

DAVIS sigma10 - PSNR

Video Denoising

Set8 sigma30 - PSNR

Video Denoising

DAVIS sigma50 - PSNR

Stereo Image Super-Resolution

Middlebury - 2x upscaling - PSNR

Stereo Image Super-Resolution

Middlebury - 4x upscaling - PSNR

Stereo Image Super-Resolution

KITTI2012 - 4x upscaling - PSNR

Stereo Image Super-Resolution

Flickr1024 - 2x upscaling - PSNR

Stereo Image Super-Resolution

KITTI2015 - 2x upscaling - PSNR

Stereo Image Super-Resolution

Flickr1024 - 4x upscaling - PSNR

Drivable Area Detection

BDD100K val - Params (M)

Cross-Modal Retrieval

COCO 2014 - Image-to-text R@1

Cross-Modal Retrieval

COCO 2014 - Image-to-text R@5

Cross-Modal Retrieval

COCO 2014 - Text-to-image R@5

Few-Shot Image Classification

Mini-ImageNet-CUB 5-way (1-shot) - Accuracy

Few-Shot Image Classification

Mini-ImageNet-CUB 5-way (5-shot) - Accuracy

Earth Surface Forecasting

EarthNet2021 Extreme Track - EarthNetScore

Visual Storytelling


Semi-Supervised Image Classification

ImageNet - 1% labeled data - Top 5 Accuracy

Anomaly Detection In Surveillance Videos

XD-Violence - AP

3D Instance Segmentation

PartNet - mAP50

Few-Shot Semantic Segmentation

FSS-1000 (1-shot) - Mean IoU

Few-Shot Semantic Segmentation

FSS-1000 (5-shot) - Mean IoU

Multi-Object Tracking

TAO - ClsA

Video Quality Assessment

MSU SR-QA Dataset - KLCC

Retinal Vessel Segmentation

ROSE-1 SVC - Dice Score

Retinal Vessel Segmentation

ROSE-2 - Dice Score

Retinal Vessel Segmentation

ROSE-1 SVC-DVC - Dice Score

Learning with noisy labels

ANIMAL - Accuracy

New Product Sales Forecasting


New Product Sales Forecasting


Low-Light Image Enhancement


3D Object Detection From Monocular Images

Waymo Open Dataset - 3D mAPH Vehicle (Front Camera Only)

Category-Agnostic Pose Estimation

MP100 - Mean PCK@0.2 - 1shot

Learning with noisy labels

CIFAR-10N-Random1 - Accuracy (mean)

Learning with noisy labels

CIFAR-10N-Aggregate - Accuracy (mean)

Learning with noisy labels

CIFAR-10N-Worst - Accuracy (mean)

Garment Reconstruction

4D-DRESS - Chamfer (cm)

Garment Reconstruction


Facial Attribute Classification

bFFHQ - Bias-Conflicting Accuracy

Text-to-Image Generation

LHQC - Block-FID

3D Room Layouts From A Single RGB Panorama

PanoContext - 3DIoU

3D Room Layouts From A Single RGB Panorama

Stanford2D3D Panoramic - 3DIoU

3D Room Layouts From A Single RGB Panorama

Stanford2D3D Panoramic - Pixel Error

Facial Attribute Classification

LFWA - Error Rate

Monocular 3D Object Detection

KITTI Cars Hard - AP Hard

Monocular 3D Object Detection

KITTI Cars Easy - AP Easy

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - FPS

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D17 test (G)

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D17 test (J)

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D17 test (F)

Point-interactive Image Colorization

ImageNet ctest10k - PSNR@10

Point-interactive Image Colorization

ImageNet ctest10k - PSNR@1

Image Denoising

SID SonyA7S2 x100 - PSNR (Raw)

Age Estimation

ChaLearn 2016 - e-error

Age Estimation

ChaLearn 2015 - e-error

Age Estimation

ChaLearn 2015 - MAE

Earth Surface Forecasting

EarthNet2021 OOD Track - EarthNetScore

Earth Surface Forecasting

EarthNet2021 IID Track - EarthNetScore

Group Activity Recognition

Collective Activity - Accuracy

Audio Classification

Speech Commands - Accuracy

3D Reconstruction

ShapeNet - Chamfer Distance

Retinal Vessel Segmentation

DRIVE - sensitivity


Abt-Buy - Candidate Set Size

Open Vocabulary Attribute Detection

OVAD benchmark - mean average precision

Heterogeneous Node Classification

OAG-Venue - NDCG

Heterogeneous Node Classification

OAG-Venue - MRR

Graph Classification

Mutagenicity - Accuracy

Retinal Vessel Segmentation

CHASE_DB1 - Sensitivity

Retinal Vessel Segmentation


Video Anomaly Detection

HR-UBnormal - AUC

Multi-label zero-shot learning

Open Images V4 - MAP

Multi-label zero-shot learning


Network Intrusion Detection

CICIDS2017 - Avg F1

Network Intrusion Detection

CICIDS2017 - Precision

Face Anti-Spoofing

SiW (Protocol 3) - ACER

Object Detection In Indoor Scenes

SUN RGB-D - AP 0.5

Semi-Supervised Image Classification

STL-10, 1000 Labels - Accuracy

Surgical tool detection

Cholec80 - mAP

Semi-Supervised Semantic Segmentation

ScribbleKITTI - mIoU (20% Labels)

Semi-Supervised Semantic Segmentation

ScribbleKITTI - mIoU (50% Labels)

Spoken Language Understanding

Snips-SmartLights - Accuracy (%)

Spoken Language Understanding

Fluent Speech Commands - Accuracy (%)

Video Quality Assessment


Math Word Problem Solving

MATH - Parameters (Billions)

Edge Detection


Text Classification

arXiv-10 - Accuracy

Conditional Image Generation

ImageNet 256x256 - FID

Text-to-Image Generation


3D Lane Detection

Apollo Synthetic 3D Lane - Z error near

Video Frame Interpolation

SNU-FILM (hard) - PSNR

Video Frame Interpolation

SNU-FILM (hard) - SSIM

Video Frame Interpolation

SNU-FILM (extreme) - PSNR

Video Frame Interpolation

SNU-FILM (extreme) - SSIM

Object Detection


Zero-Shot Video Question Answer

TVQA - Accuracy

Emotion Recognition in Conversation

DailyDialog - Macro F1

Unsupervised Semantic Segmentation with Language-image Pre-training

COCO-Stuff-27 - mIoU

Hand Pose Estimation

MSRA Hands - Average 3D Error

Face Anti-Spoofing

Replay-Attack - EER

Face Anti-Spoofing

Replay-Attack - HTER

JPEG Artifact Correction

LIVE1 (Quality 30 Grayscale) - SSIM

Color Image Denoising

CBSD68 sigma75 - PSNR

Long-range modeling

LRA - Pathfinder

Photoplethysmography (PPG) heart rate estimation


Photoplethysmography (PPG) heart rate estimation


Photoplethysmography (PPG) heart rate estimation

UBFC-rPPG - Pearson Correlation

Referring Expression Segmentation

PhraseCut - Mean IoU

Referring Image Matting (RefMatte-RW100)

RefMatte - SAD

Referring Image Matting (RefMatte-RW100)

RefMatte - MSE

Referring Image Matting (RefMatte-RW100)

RefMatte - MAD

Referring Image Matting (RefMatte-RW100)

RefMatte - SAD(E)

Referring Image Matting (RefMatte-RW100)

RefMatte - MSE(E)

Referring Image Matting (RefMatte-RW100)

RefMatte - MAD(E)

Speech Synthesis

LibriTTS - MCD

Speech Synthesis

LibriTTS - V/UV F1

Speech Synthesis


Text-to-Image Generation

Conceptual Captions - FID

Online Action Detection


3D Semantic Segmentation

OpenTrench3D - mIoU

3D Semantic Segmentation

OpenTrench3D - mAcc

Online Action Detection

TVSeries - mCAP

Monocular Depth Estimation

KITTI Eigen split unsupervised - RMSE

Monocular Depth Estimation

KITTI Eigen split unsupervised - Sq Rel

Document-level Event Extraction

ChFinAnn - F1

Atari Games

atari game - Human World Record Breakthrough

Atari Games

Atari 2600 Phoenix - Score

Atari Games

Atari 2600 Space Invaders - Score

Atari Games

Atari 2600 Pitfall! - Score

Atari Games

Atari 2600 Atlantis - Score

Atari Games

Atari 2600 Gopher - Score

Atari Games

Atari 2600 Breakout - Score

Atari Games

Atari 2600 Road Runner - Score

Atari Games

Atari 2600 Asterix - Score

Atari Games

Atari 2600 Kung-Fu Master - Score

Atari Games

Atari 2600 Ice Hockey - Score

Atari Games

Atari 2600 Krull - Score

Atari Games

Atari 2600 Asteroids - Score

Atari Games

Atari 2600 Seaquest - Score

Atari Games

Atari 2600 James Bond - Score

Atari Games

Atari 2600 Demon Attack - Score

Age Estimation

Adience - Accuracy

Aesthetics Quality Assessment

Image Aesthetics dataset - Accuracy

Aesthetics Quality Assessment

Image Aesthetics dataset - MAE

Cloud Removal


Hyperspectral Image Classification

Pavia University - Overall Accuracy

Video Super-Resolution

Vid4 - 4x upscaling - BD degradation - PSNR

Video Super-Resolution

Vid4 - 4x upscaling - BD degradation - SSIM

Entity Resolution

WDC Watches-xlarge - F1 (%)

Named Entity Recognition In Vietnamese

PhoNER COVID19 - F1 (%)

Unsupervised Facial Landmark Detection


Co-Salient Object Detection

CoCA - S-measure

Co-Salient Object Detection

CoCA - max F-measure

Co-Salient Object Detection

CoCA - mean E-measure

Co-Salient Object Detection

CoCA - Mean F-measure

Co-Salient Object Detection

CoCA - max E-measure

Co-Salient Object Detection


Co-Salient Object Detection

CoSOD3k - max E-measure

Co-Salient Object Detection


Video Frame Interpolation

Middlebury - Interpolation Error

Monocular Depth Estimation

KITTI Eigen split - Delta < 1.25^2

Monocular Depth Estimation

KITTI Eigen split - Delta < 1.25^3

Automated Theorem Proving

miniF2F-test - Pass@1

Unsupervised Facial Landmark Detection

MAFL Unaligned - NME

Relation Classification


Video Super-Resolution

Vimeo90K - PSNR

Spectral Reconstruction


Spectral Reconstruction


Spectral Reconstruction

Real HSI - User Study Score

Spectral Reconstruction


Spectral Reconstruction


Speech Dereverberation


Speech Dereverberation


Complex Query Answering

FB15k - MRR 2p

Dialogue State Tracking

CoSQL - question match accuracy

Dialogue State Tracking

CoSQL - interaction match accuracy

Scene Recognition

AID - Accuracy

Domain Generalization

ImageNet-Sketch - Top-1 accuracy

Hate Speech Detection

Ethos Binary - F1-score

Multi-Object Tracking

HiEve - IDF1

Causal Emotion Entailment

RECCON - Macro F1

Spoken Language Understanding

Spoken-SQuAD - F1 score

Action Recognition

RareAct - mWAP

Grayscale Image Denoising

BSD68 sigma15 - PSNR

Color Image Denoising

CBSD68 sigma35 - PSNR

Color Image Denoising

CBSD68 sigma15 - PSNR

Grayscale Image Denoising

BSD68 sigma50 - PSNR

Few Shot Action Recognition

Something-Something-100 - 1:1 Accuracy

Aspect Sentiment Triplet Extraction

ASTE-Data-V2 - F1

Motion Synthesis

InterHuman - MModality

Motion Synthesis

Inter-X - MMDist

Dynamic Link Prediction

Enron Emails - AP

Few-Shot Image Classification

Dirichlet Tiered-Imagenet (5-way, 5-shot) - 1:1 Accuracy

Text-to-Image Generation

Oxford 102 Flowers - Inception score

Stereo Image Super-Resolution

KITTI2015 - 4x upscaling - PSNR

Burst Image Super-Resolution


Burst Image Super-Resolution

SyntheticBurst - PSNR

Burst Image Super-Resolution

SyntheticBurst - SSIM

Burst Image Super-Resolution

SyntheticBurst - LPIPS

Spectral Reconstruction


Spectral Reconstruction


Spectral Reconstruction


Cross-Lingual Natural Language Inference

XNLI - Accuracy

Multiview Gait Recognition

CASIA-B - Accuracy (Cross-View, Avg)

Multiview Gait Recognition

CASIA-B - NM#5-6

Multiview Gait Recognition

CASIA-B - BG#1-2

Multiview Gait Recognition

CASIA-B - CL#1-2

KG-to-Text Generation

WebNLG 2.0 (Unconstrained) - BLEU

KG-to-Text Generation

WebNLG 2.0 (Unconstrained) - ROUGE

Motion Forecasting

Argoverse CVPR 2020 - minADE (K=1)

Motion Forecasting

Argoverse CVPR 2020 - minFDE (K=1)

Thermal Image Segmentation

RGB-T-Glass-Segmentation - MAE

Action Triplet Recognition

CholecT50 (Challenge) - mAP

Video Panoptic Segmentation

Cityscapes-VPS - VPQ (thing)

Image Super-Resolution

IXI - SSIM for 2x T2w

Image Super-Resolution

IXI - PSNR 2x T2w

Image Super-Resolution

IXI - PSNR 4x T2w

Image Dehazing

RS-Haze - PSNR

Image Dehazing

RS-Haze - SSIM

Image Dehazing


Image Dehazing


Image Denoising

SID SonyA7S2 x250 - SSIM (Raw)

Image Denoising

SID x100 - SSIM

Image Denoising

SID x300 - SSIM

Video Generation

UCF-101 16 frames, 64x64, Unconditional - Inception Score

Video Generation

UCF-101 16 frames, 64x64, Unconditional - FID

Video Retrieval

MSVD - text-to-video R@10

Video Retrieval

MSVD - text-to-video Mean Rank

Video Retrieval

MSVD - video-to-text R@5

Video Retrieval

LSMDC - text-to-video R@5

Video Retrieval

LSMDC - text-to-video R@10

Video Retrieval

LSMDC - text-to-video Median Rank

Video Retrieval

LSMDC - video-to-text R@5

Video Retrieval

LSMDC - video-to-text R@10

Video Retrieval

LSMDC - video-to-text Median Rank

Video Retrieval

LSMDC - text-to-video Mean Rank

Video Retrieval

LSMDC - video-to-text Mean Rank

Video Retrieval

MSR-VTT-1kA - text-to-video R@1

Video Retrieval

MSR-VTT-1kA - text-to-video R@5

Video Retrieval

MSR-VTT-1kA - text-to-video R@10

Video Retrieval

MSR-VTT-1kA - video-to-text R@1

Video Retrieval

MSR-VTT-1kA - video-to-text R@5

Video Retrieval

MSR-VTT-1kA - video-to-text R@10

3D Face Reconstruction

Florence - RMSE Indoor

3D Face Reconstruction

Florence - RMSE Outdoor

3D Face Reconstruction

NoW Benchmark - Mean Reconstruction Error (mm)

3D Face Reconstruction

NoW Benchmark - Stdev Reconstruction Error (mm)

3D Face Reconstruction

NoW Benchmark - Median Reconstruction Error

3D Object Detection From Stereo Images

KITTI Cars Moderate - AP75

3D Object Detection From Stereo Images

KITTI Cyclists Moderate - AP50

3D Object Detection From Stereo Images

KITTI Pedestrians Moderate - AP50

Few-Shot Image Classification

CUB 200 5-way 1-shot - Accuracy

Few-Shot Image Classification

CIFAR-FS 5-way (1-shot) - Accuracy

Multiple Choice Question Answering (MCQA)

BIG-bench (Novel Concepts) - Accuracy

Temporal Relation Classification


Dense Object Detection

SKU-110K - AP

Multiple Object Tracking


Online Multi-Object Tracking


Image Inpainting

CelebA-HQ - FID

Image Inpainting

Places2 - P-IDS

Image Inpainting

Places2 - U-IDS

Hand Pose Estimation

HANDS 2019 - Average 3D Error

Hand Pose Estimation

NYU Hands - Average 3D Error

Hand Pose Estimation

ICVL Hands - Average 3D Error

Face Verification

CFP-FP - Accuracy

Face Verification

AgeDB-30 - Accuracy

Knowledge Graph Completion

DBP-5L (Greek) - MRR

Knowledge Graph Completion

DBP-5L (French) - MRR

Knowledge Graph Completion

DBP-5L (English) - MRR

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (Ego4D) Visible

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (VISOR) Occ

3D Hand Pose Estimation

HInt: Hand Interactions in the wild - [email protected] (Ego4D) Occ

Affordance Recognition

HICO-DET (Unknown Concepts) - COCO-Val2017

Affordance Recognition

HICO-DET (Unknown Concepts) - Obj365

Affordance Recognition

HICO-DET (Unknown Concepts) - HICO

Affordance Recognition

HICO-DET (Unknown Concepts) - Novel Classes

Age Estimation

MORPH album2 (Caucasian) - MAE

Face Verification

IJB-C - TAR @ FAR=1e-3

Autonomous Driving

CARLA Leaderboard - Route Completion

Image Inpainting

Places2 - FID

Image Inpainting

Places2 - LPIPS

Nested Named Entity Recognition


3D Object Detection

V2XSet - AP0.5 (Perfect)

3D Object Detection

V2XSet - AP0.5 (Noisy)

3D Object Detection

V2XSet - AP0.7 (Noisy)

Lane Detection

TuSimple - F1 score

Lane Detection


Video Prediction


Video Prediction


Video Prediction

SynpickVP - LPIPS

Video Prediction

SynpickVP - PSNR

Video Prediction


Video Prediction


Document Image Classification

Tobacco-3482 - Accuracy

Neural Architecture Search

NAS-Bench-201, CIFAR-10 - Accuracy (Test)

Neural Architecture Search

NAS-Bench-201, CIFAR-10 - Accuracy (Val)

Neural Architecture Search

NAS-Bench-201, ImageNet-16-120 - Accuracy (Val)

AMR Parsing

The Little Prince - Smatch

AMR Parsing

New3 - Smatch

6D Pose Estimation using RGBD

REAL275 - mAP 10, 10cm

MRI Reconstruction

fastMRI Knee Val 8x - Params (M)

MRI Reconstruction

fastMRI Knee 8x - PSNR

Video Retrieval

MSR-VTT - text-to-video Mean Rank

Semi-Supervised Image Classification

CIFAR-10, 4000 Labels - Percentage error

Domain Generalization

ImageNet-A - Top-1 accuracy %

Gesture Generation


Video Salient Object Detection

DAVIS-2016 - S-Measure

Video Salient Object Detection


Video Salient Object Detection


Video Salient Object Detection

SegTrack v2 - S-Measure

Video Salient Object Detection

SegTrack v2 - Average MAE

Video Salient Object Detection

SegTrack v2 - max F-measure

Video Salient Object Detection

ViSal - S-Measure

Video Salient Object Detection

ViSal - max E-measure

Video Salient Object Detection

ViSal - Average MAE

Video Salient Object Detection

FBMS-59 - S-Measure

Video Salient Object Detection


Video Salient Object Detection


Camouflaged Object Segmentation

PCOD_1200 - S-Measure

6D Pose Estimation using RGBD

REAL275 - mAP 3DIoU@50

Stereo Depth Estimation

Spring - 1px total

Layout-to-Image Generation

Visual Genome 128x128 - Inception Score

Semi-Supervised Image Classification

CIFAR-100, 2500 Labels - Percentage error

Semi-Supervised Image Classification

CIFAR-100, 10000 Labels - Percentage error

Video Frame Interpolation

SNU-FILM (medium) - PSNR

Video Frame Interpolation

SNU-FILM (easy) - PSNR

Data-to-Text Generation

MLB Dataset (Relation Generation) - Precision

Data-to-Text Generation

MLB Dataset (Content Ordering) - DLD

Data-to-Text Generation

MLB Dataset - BLEU

Data-to-Text Generation

RotoWire (Relation Generation) - count

Learning with noisy labels

CIFAR-10N-Random3 - Accuracy (mean)

Learning with noisy labels

CIFAR-10N-Random2 - Accuracy (mean)

Small Data Image Classification

CIFAR-10, 500 Labels - Accuracy (%)

Text-to-Image Generation


Point Cloud Completion

Completion3D - Chamfer Distance

Point Cloud Registration

KITTI (FCGF setting) - Recall (0.6m, 5 degrees)

Point Cloud Registration

3DLoMatch (10-30% overlap) - Recall (correspondence RMSE below 0.2)

Motion Forecasting

Argoverse CVPR 2020 - DAC (K=6)

Image Dehazing

I-Haze - SSIM

Image Dehazing

Dense-Haze - PSNR

Visual Entailment

SNLI-VE val - Accuracy

Visual Entailment

SNLI-VE test - Accuracy

Active Learning

CIFAR10 (10,000) - Accuracy

Multi-Frame Super-Resolution

PROBA-V - Normalized cPSNR

Entity Resolution

WDC Computers-xlarge - F1 (%)

Video Super-Resolution

MSU Video Super Resolution Benchmark: Detail Restoration - Subjective score

Video Super-Resolution

MSU Video Super Resolution Benchmark: Detail Restoration - ERQAv1.0

Video Super-Resolution

MSU Video Super Resolution Benchmark: Detail Restoration - QRCRv1.0

Video Super-Resolution

MSU Video Super Resolution Benchmark: Detail Restoration - SSIM

Video Super-Resolution

MSU Video Super Resolution Benchmark: Detail Restoration - PSNR

Video Super-Resolution

MSU Video Upscalers: Quality Enhancement - PSNR

Point Cloud Registration

3DMatch (at least 30% overlapped - sample 5k interest points) - Recall (correspondence RMSE below 0.2)

Point Cloud Registration

3DMatch (at least 30% overlapped - FCGF setting) - RE (all)

Point Cloud Registration

3DMatch (at least 30% overlapped - FCGF setting) - TE (all)

Motion Synthesis

LaFAN1 - L2Q@5

Motion Synthesis

LaFAN1 - L2Q@15

Motion Synthesis

LaFAN1 - L2Q@30

Motion Synthesis

LaFAN1 - L2P@5

Motion Synthesis

LaFAN1 - L2P@15

Motion Synthesis


Motion Synthesis

LaFAN1 - NPSS@15

Motion Synthesis

LaFAN1 - NPSS@30

Motion Synthesis

LaFAN1 - L2P@30

Task-Oriented Dialogue Systems

KVRET - Entity F1

Audio Classification

ICBHI Respiratory Sound Database - Sensitivity

Image Super-Resolution

BSD100 - 8x upscaling - PSNR

Few-Shot Semantic Segmentation

FSS-1000 - Mean IoU

Long-range modeling


Long-range modeling


Long-range modeling


Long-range modeling


Open-World Semi-Supervised Learning

CIFAR-10 - Seen accuracy (50% Labeled)

Open-World Semi-Supervised Learning

ImageNet-100 - Novel accuracy (50% Labeled)

Face Anti-Spoofing

SiW-Enroll5 - AUC

Medical Object Detection

DeepLesion - Sensitivity

Image Manipulation Detection

Columbia (OSN-transmitted - Whatsapp) - AUC

Image Manipulation Detection

DSO (OSN-transmitted - Wechat) - AUC

Image Manipulation Detection

DSO (OSN-transmitted - Whatsapp) - AUC

Image Manipulation Detection

DSO (OSN-transmitted - Weibo) - AUC

Image Manipulation Detection

Columbia (OSN-transmitted - Facebook) - AUC

Supervised Video Summarization

SumMe - F1-score (Augmented)

Point Cloud Registration

FP-T-E - Recall (3cm, 10 degrees)

Point Cloud Registration

FP-O-H - Recall (3cm, 10 degrees)

Point Cloud Registration

FP-O-E - Recall (3cm, 10 degrees)

3D Lane Detection

Apollo Synthetic 3D Lane - Z error far

Supervised Video Summarization

TvSum - Spearman's Rho

Emotion Recognition in Conversation

DailyDialog - Micro-F1

Data-to-Text Generation

MLB Dataset (Relation Generation) - count

Data-to-Text Generation

MLB Dataset (Content Selection) - Precision

Data-to-Text Generation

MLB Dataset (Content Selection) - Recall

3D Human Reconstruction

CAPE - Chamfer (cm)

3D Human Reconstruction

CAPE - P2S (cm)

3D Human Reconstruction


3D Human Reconstruction

CustomHumans - Normal Consistency

Hand Gesture Recognition

NVGesture - Accuracy

Multi-Document Summarization

Multi-News - ROUGE-SU4

RGB Salient Object Detection

ECSSD - F-measure

RGB Salient Object Detection

DUT-OMRON - F-measure

AMR Parsing

LDC2020T02 - Smatch

AMR Parsing

Bio - Smatch

AMR Parsing

LDC2017T10 - Smatch

RGB Salient Object Detection

PASCAL-S - F-measure

Monocular Depth Estimation

Make3D - Abs Rel

Monocular Depth Estimation

Make3D - Sq Rel

Few Shot Action Recognition

HMDB51 - 1:1 Accuracy

3D Point Cloud Classification

IntrA - F1 score (5-fold)

Grayscale Image Denoising

Urban100 sigma50 - PSNR

Grayscale Image Denoising

Urban100 sigma25 - PSNR

Face Parsing

CelebAMask-HQ - Mean F1

Face Parsing

LaPa - Mean F1

3D Hand Pose Estimation


Monocular Depth Estimation

VA (Virtual Apartment) - Root mean square error (RMSE)

Monocular Depth Estimation

VA (Virtual Apartment) - Log root mean square error (RMSE_log)

Monocular Depth Estimation

VA (Virtual Apartment) - Mean average error (MAE)

Monocular Depth Estimation

VA (Virtual Apartment) - Absolute relative error (AbsRel)

Edge Detection


Edge Detection


Low-Light Image Enhancement


Text-to-Image Generation

MS COCO - Inception score

Learning with noisy labels

CIFAR-100N - Accuracy (mean)

Partial Domain Adaptation

VisDA2017 - Accuracy (%)

Partial Domain Adaptation

ImageNet-Caltech - Accuracy (%)

Supervised Video Summarization

SumMe - F1-score (Canonical)

End-To-End Dialogue Modelling

MULTIWOZ 2.0 - MultiWOZ (Success)

End-To-End Dialogue Modelling

MULTIWOZ 2.0 - MultiWOZ (Inform)

Text-to-Image Generation

Oxford 102 Flowers - FID

Music Source Separation

MUSDB18-HQ - SDR (bass)

Text-to-Image Generation


Text-to-Image Generation


Text-to-Image Generation


Neural Architecture Search

NATS-Bench Topology, CIFAR-10 - Test Accuracy

Neural Architecture Search

NATS-Bench Topology, ImageNet16-120 - Test Accuracy

Neural Architecture Search

NATS-Bench Topology, CIFAR-100 - Test Accuracy

Semi-supervised Medical Image Classification

Chest X-Ray14 2% labeled - AUC

Underwater Image Restoration


Text-to-Image Generation


Text-to-Image Generation


Object Proposal Generation

PASCAL VOC 2012, 60 proposals per image - Average Recall

Point Cloud Registration

3DMatch Benchmark - Feature Matching Recall

Single Image Deraining

Test2800 - SSIM

Single Image Deraining

Test100 - SSIM

Single Image Deraining

Test100 - PSNR

Grayscale Image Denoising

Urban100 sigma15 - PSNR

Pedestrian Detection

TJU-Ped-traffic - R+HO (miss rate)

Pedestrian Detection

TJU-Ped-traffic - ALL (miss rate)

Pedestrian Detection

TJU-Ped-campus - R (miss rate)

Pedestrian Detection

TJU-Ped-campus - HO (miss rate)

Pedestrian Detection

TJU-Ped-campus - R+HO (miss rate)

Pedestrian Detection

TJU-Ped-campus - ALL (miss rate)

Cross-Modal Retrieval

Flickr30k - Image-to-text R@5

Open Vocabulary Attribute Detection

OVAD-Box benchmark - mean average precision

Image Inpainting

Places2 val - FID

Image Inpainting

Places2 val - PD

Video Retrieval

MSR-VTT-1kA - text-to-video Median Rank

Video Retrieval

MSR-VTT-1kA - video-to-text Median Rank

Face Alignment

300W Split 2 (300W-LP) - NME (bbox)

Face Alignment

300W Split 2 (300W-LP) - AUC@7 (bbox)

Face Alignment

COFW-68 (300WLP) - NME (box)

Face Alignment

COFW-68 (300WLP) - AUC@7

Face Alignment

WFW (Extra Data) - NME (inter-ocular)

Face Alignment

WFW (Extra Data) - AUC@10 (inter-ocular)

Face Alignment

WFW (Extra Data) - FR@10 (inter-ocular)

Text Retrieval

Image-Chat - R@1

Text Retrieval

Image-Chat - R@5

Text Retrieval

Image-Chat - Sum(R@1,5)

Conditional Image Generation

ArtBench-10 (32x32) - FID

Dialogue State Tracking

Wizard-of-Oz - Joint

Unsupervised Domain Adaptation

HMDB-UCF - Accuracy

Object Detection


Human action generation


Human action generation


Human action generation


Human action generation


Face Alignment


Multi-Document Summarization

Multi-News - ROUGE-2

Multi-Document Summarization

Multi-News - ROUGE-1

Data-to-Text Generation

Cleaned E2E NLG Challenge - BLEU (Test set)

Data-to-Text Generation

WebNLG Full - BLEU

Vehicle Re-Identification

VeRi-776 - Rank1

Text Simplification


Fine-Grained Image Classification

Bird-225 - Accuracy

RGB Salient Object Detection


Passage Retrieval

EntityQuestions - Recall@20

Image Denoising

SID SonyA7S2 x100 - SSIM (Raw)

Few-Shot Semantic Segmentation

COCO-20i -> Pascal VOC (1-shot) - Mean IoU

Few-Shot Semantic Segmentation

PASCAL-5i (10-Shot) - Mean IoU

Few-Shot Semantic Segmentation

COCO-20i (10-shot) - Mean IoU

Unsupervised Semantic Segmentation

COCO-Stuff-15 - Pixel Accuracy

Depth Estimation

eBDtheque - Abs Rel

Depth Estimation

eBDtheque - Sq Rel

Depth Estimation

eBDtheque - RMSE

Depth Estimation

eBDtheque - RMSE log

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D16 val (F)

Suspicious (BIRADS 4,5)-no suspicious (BIRADS 1,2,3) per image classification

InBreast - AUC

Lesion Segmentation

ISIC 2018 - Dice Score

Liver Segmentation

LiTS2017 - IoU

Video Deinterlacing

MSU Deinterlacer Benchmark - PSNR

Video Deinterlacing

MSU Deinterlacer Benchmark - SSIM

Video Deinterlacing

MSU Deinterlacer Benchmark - Subjective

Video Deinterlacing

MSU Deinterlacer Benchmark - VMAF

JPEG Artifact Correction

ICB (Quality 30 Color) - PSNR

JPEG Artifact Correction

ICB (Quality 20 Color) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 20 Color) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 10 Color) - PSNR

JPEG Artifact Correction

Classic5 (Quality 40 Grayscale) - PSNR

JPEG Artifact Correction

Classic5 (Quality 10 Grayscale) - PSNR

JPEG Artifact Correction

Classic5 (Quality 30 Grayscale) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 20 Grayscale) - PSNR

JPEG Artifact Correction

ICB (Quality 10 Color) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 10 Grayscale) - PSNR

JPEG Artifact Correction

Classic5 (Quality 20 Grayscale) - PSNR


Def_Outnumbered_parallel - Median Win Rate


Def_Outnumbered_sequential - Median Win Rate


Off_Near_sequential - Median Win Rate


Off_Hard_parallel - Median Win Rate


Off_Distant_parallel - Median Win Rate


Off_Superhard_sequential - Median Win Rate


Off_Complicated_sequential - Median Win Rate


Def_Armored_sequential - Median Win Rate


Off_Distant_sequential - Median Win Rate


Off_Complicated_parallel - Median Win Rate

Temporal Action Localization

CrossTask - Recall

Multimodal Activity Recognition

MMAct - F1-Score (Cross-Subject)

Unsupervised Person Re-Identification

DukeMTMC-reID - Rank-10

Text Classification

OneStopEnglish (Readability Assessment) - Accuracy (5-fold)

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D16 val (G)

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D16 val (J)

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D17 val (G)

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D17 val (J)

Semi-Supervised Video Object Segmentation

DAVIS (no YouTube-VOS training) - D17 val (F)

Supervised Video Summarization

TvSum - Kendall's Tau

Video Quality Assessment

LIVE Livestream - SRCC

Conditional Image Generation

ImageNet 64x64 - FID

Conditional Image Generation

ImageNet 256x256 - Inception score

Multi-Label Text Classification

Reuters-21578 - Micro-F1

Dialog Relation Extraction

DialogRE - F1c (v2)

Video Retrieval

MSR-VTT - text-to-video Median Rank

Video Retrieval

MSR-VTT - video-to-text R@5

Video Retrieval

MSR-VTT - video-to-text R@10

Video Retrieval

MSR-VTT - video-to-text Mean Rank

Video Retrieval

MSVD - text-to-video R@5

Video Retrieval

MSVD - text-to-video Median Rank

Semi-Supervised Semantic Segmentation

ScribbleKITTI - mIoU (1% Labels)

Semi-Supervised Semantic Segmentation

ScribbleKITTI - mIoU (10% Labels)

Action Triplet Recognition

CholecT50 - Mean AP

3D Object Detection

waymo vehicle - L1 mAP

Domain Generalization

NICO Animal - Accuracy

Domain Generalization

NICO Vehicle - Accuracy

Coherence Evaluation

GCDC + RST - Accuracy - Accuracy

Face Recognition

CASIA-WebFace+masks - Accuracy

Face Recognition

CelebA+masks - Accuracy

Speech Synthesis

LJSpeech - Mean Opinion Score

Image Manipulation Detection

Casia V1+ - AUC

Multi-Hypotheses 3D Human Pose Estimation

AH36M - H36M PMPJPE (n = 25)

Multi-Hypotheses 3D Human Pose Estimation

AH36M - Most-Likely Hypothesis PMPJPE (n = 1)

Multi-Hypotheses 3D Human Pose Estimation

AH36M - H36M PMPJPE (n = 1)

Audio Super-Resolution

Piano - Log-Spectral Distance

Audio Super-Resolution

Voice Bank corpus (VCTK) - Log-Spectral Distance

Vehicle Re-Identification

VehicleID Medium - Rank-5

Analog Video Restoration


Analog Video Restoration


Video Super-Resolution

MSU Super-Resolution for Video Compression - BSQ-rate over ERQA

Video Super-Resolution

MSU Super-Resolution for Video Compression - BSQ-rate over VMAF

Video Super-Resolution

MSU Super-Resolution for Video Compression - BSQ-rate over LPIPS

Action Recognition

BAR - Accuracy

Stereo Depth Estimation

KITTI2015 - three pixel error

Vehicle Re-Identification

VehicleID Small - Rank-1

Vehicle Re-Identification

VehicleID Medium - mAP

Vehicle Re-Identification

VehicleID Medium - Rank-1

Vehicle Re-Identification

VehicleID Large - mAP

Vehicle Re-Identification

VehicleID Large - Rank-1

3D Object Reconstruction From A Single Image

RenderPeople - Point-to-surface distance (cm)

3D Object Reconstruction From A Single Image

RenderPeople - Chamfer (cm)

3D Object Reconstruction From A Single Image

RenderPeople - Surface normal consistency

3D Object Reconstruction From A Single Image

BUFF - Chamfer (cm)

3D Object Reconstruction From A Single Image

BUFF - Surface normal consistency

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - TR V2V (mm), left hand

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - PA V2V (mm), whole body

Monocular 3D Object Detection

KITTI Pedestrian Moderate - AP Medium

Monocular 3D Object Detection

KITTI Pedestrian Hard - AP Hard

Multiview Detection

Wildtrack - MODP

Multiview Detection

MultiviewX - MODP

Poll Generation

WeiboPolls - ROUGE-1

Poll Generation

WeiboPolls - ROUGE-L

Poll Generation

WeiboPolls - BLEU-1

Poll Generation

WeiboPolls - BLEU-3

6D Pose Estimation using RGBD

REAL275 - mAP 3DIou@25

6D Pose Estimation using RGBD

REAL275 - mAP 5, 5cm

Multivariate Time Series Imputation

Beijing Multi-Site Air-Quality Dataset - MAE (PM2.5)

Scene Text Recognition

ICDAR 2003 - Accuracy

Video Prediction


Video Prediction

KTH - Params (M)

Rgb-T Tracking

RGBT234 - Success

Skeleton Based Action Recognition

UPenn Action - Accuracy

Action Classification

Toyota Smarthome dataset - CV2

Time Series Forecasting

ETTh2 (168) Univariate - MSE

Time Series Forecasting

ETTh2 (168) Univariate - MAE

Time Series Forecasting

ETTh2 (336) Univariate - MSE

Time Series Forecasting

ETTh2 (336) Univariate - MAE

Time Series Forecasting

ETTh2 (720) Univariate - MAE

Visual Storytelling


Visual Storytelling


Visual Storytelling


Multi-Object Tracking


Supervised Video Summarization

TvSum - F1-score (Augmented)

Zero-Shot Cross-Lingual Transfer

XTREME - Sentence-pair Classification

Zero-Shot Cross-Lingual Transfer

XTREME - Structured Prediction

Zero-Shot Cross-Lingual Transfer

XTREME - Question Answering

Zero-Shot Cross-Lingual Transfer

XTREME - Sentence Retrieval

Zero-Shot Cross-Lingual Transfer


Time Series Forecasting

PeMSD7 - 9 steps MAE

Video Retrieval

MSR-VTT - video-to-text Median Rank

Image Super-Resolution

BSD100 - 4x upscaling - LPIPS

Image Super-Resolution

Urban100 - 4x upscaling - LPIPS

Image Super-Resolution

IXI - SSIM 4x T2w

KG-to-Text Generation

WebNLG 2.0 (Unconstrained) - METEOR

Self-Supervised Action Recognition

UCF101 (finetuned) - 3-fold Accuracy

Session-Based Recommendations

Last.FM - HR@20

Session-Based Recommendations

Last.FM - MRR@20

Time Series Forecasting

ETTh1 (24) Multivariate - MSE

Time Series Forecasting

ETTh1 (24) Multivariate - MAE

Time Series Forecasting

ETTh1 (48) Multivariate - MSE

Time Series Forecasting

ETTh1 (48) Multivariate - MAE

Time Series Forecasting

ETTh2 (168) Multivariate - MSE

Time Series Forecasting

ETTh2 (168) Multivariate - MAE

Time Series Forecasting

ETTh2 (48) Multivariate - MSE

Time Series Forecasting

ETTh2 (48) Multivariate - MAE

Time Series Forecasting

ETTh2 (24) Multivariate - MSE

Time Series Forecasting

ETTh2 (24) Multivariate - MAE

Time Series Forecasting

ETTh1 (24) Univariate - MSE

Time Series Forecasting

ETTh1 (24) Univariate - MAE

Time Series Forecasting

ETTh1 (48) Univariate - MSE

Time Series Forecasting

ETTh1 (48) Univariate - MAE

Time Series Forecasting

ETTh1 (168) Univariate - MSE

Time Series Forecasting

ETTh1 (168) Univariate - MAE

Time Series Forecasting

ETTh1 (168) Multivariate - MSE

Time Series Forecasting

ETTh1 (168) Multivariate - MAE

Time Series Forecasting

ETTh2 (48) Univariate - MSE

Time Series Forecasting

ETTh2 (48) Univariate - MAE

Time Series Forecasting

ETTh2 (24) Univariate - MSE

Time Series Forecasting

ETTh2 (24) Univariate - MAE

3D Multi-Person Pose Estimation

Campus - PCP3D

3D Multi-Person Pose Estimation

Shelf - PCP3D

Sequential Image Classification

Sequential MNIST - Permuted Accuracy

Medical Named Entity Recognition

ShARe/CLEF eHealth corpus - F1

Program Repair

DeepFix - Average Success Rate

Atari Games

Atari 2600 Chopper Command - Score

Atari Games

Atari 2600 Tennis - Score

Atari Games

Atari 2600 Surround - Score

Atari Games

Atari 2600 Up and Down - Score

Atari Games

Atari 2600 Enduro - Score

Unsupervised 3D Human Pose Estimation


Semi-Supervised Video Object Segmentation

DAVIS 2016 - Jaccard (Recall)

Semi-Supervised Video Object Segmentation

DAVIS 2016 - F-measure (Recall)

Aspect-oriented Opinion Extraction

SemEval-2014 Task-4 - Restaurant 2014 (F1)

Aspect-oriented Opinion Extraction

SemEval-2014 Task-4 - Laptop 2014 (F1)

Aspect-oriented Opinion Extraction

SemEval-2014 Task-4 - Restaurant 2015 (F1)

Aspect-oriented Opinion Extraction

SemEval-2014 Task-4 - Restaurant 2016 (F1)

Aspect Term Extraction and Sentiment Classification

SemEval - Restaurant 2015 (F1)

Unsupervised Semantic Segmentation

ImageNet-S-50 - mIoU (val)

Unsupervised Semantic Segmentation

ImageNet-S-50 - mIoU (test)

Semi-Supervised Video Object Segmentation

VOT2020 - EAO

Fine-Grained Image Classification

Oxford-IIIT Pets - Accuracy

Action Anticipation

EPIC-KITCHENS-55 (Unseen test set (S2)) - Top 5 Accuracy - Act.

Stereo Image Super-Resolution

KITTI2012 - 2x upscaling - PSNR

Entity Resolution

WDC Computers-small - F1 (%)

Entity Resolution

WDC Watches-small - F1 (%)

Relationship Extraction (Distant Supervised)

New York Times Corpus - P@10%

Relationship Extraction (Distant Supervised)

New York Times Corpus - P@30%

Aspect-Based Sentiment Analysis (ABSA)

MAMS - Macro-F1

Generalizable Person Re-identification

Market-1501 - MSMT17-All->mAP

Generalizable Person Re-identification

Market-1501 - MSMT17-All->Rank-1

Generalizable Person Re-identification

MSMT17 - Market-1501->Rank1

Generalizable Person Re-identification

MSMT17 - Market-1501->mAP

Generalizable Person Re-identification

CUHK03-NP (detected) - MSMT17-All->mAP

Generalizable Person Re-identification

CUHK03-NP (detected) - MSMT17-All->Rank-1

Generalizable Person Re-identification

CUHK03-NP (detected) - Market-1501->mAP

Generalizable Person Re-identification

CUHK03-NP (detected) - Market-1501->Rank-1

Unsupervised Anomaly Detection with Specified Settings -- 30% anomaly


Unsupervised Anomaly Detection with Specified Settings -- 30% anomaly


Cross-Lingual Question Answering

TyDiQA-GoldP - EM

Document Classification

HOC - F1

Relationship Extraction (Distant Supervised)

New York Times Corpus - AUC

Motion Forecasting

Argoverse CVPR 2020 - MR (K=6)

Point Cloud Registration

ETH (trained on 3DMatch) - Feature Matching Recall

Point Cloud Registration

FP-R-H - Recall (3cm, 10 degrees)

Point Cloud Registration

FP-R-M - Recall (3cm, 10 degrees)

Point Cloud Registration

KITTI (trained on 3DMatch) - Success Rate

Point Cloud Registration

3DMatch (trained on KITTI) - Recall

Point Cloud Registration

FP-R-E - Recall (3cm, 10 degrees)

Point Cloud Registration

FP-T-H - Recall (3cm, 10 degrees)

Retinal Vessel Segmentation

DRIVE - Accuracy

Text Classification

R52 - Accuracy

Text Classification

Ohsumed - Accuracy

3D Multi-Person Mesh Recovery


3D Multi-Person Mesh Recovery


3D Multi-Person Mesh Recovery


3D Multi-Person Mesh Recovery


3D Multi-Person Mesh Recovery


3D Human Reconstruction

Expressive hands and faces dataset (EHF) - PA V2V (mm), face

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - TR V2V (mm), body only

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - MPJPE-14

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - MPJPE, left hand

Mathematical Question Answering

Geometry3K - Accuracy (%)

Grammatical Error Detection

FCE - F0.5

Grammatical Error Detection

CoNLL-2014 A2 - F0.5

Grammatical Error Detection

CoNLL-2014 A1 - F0.5

Chinese Named Entity Recognition

Weibo NER - F1

Chinese Named Entity Recognition

Resume NER - F1

Facial Expression Recognition (FER)

FER2013 - Accuracy

Image Relighting

VIDIT’20 validation set - PSNR

Image Relighting

VIDIT’20 validation set - SSIM

Image Relighting

VIDIT’20 validation set - LPIPS

Image Relighting

VIDIT’20 validation set - MPS

Cross-Domain Few-Shot

Plantae - 5 shot

Cross-Domain Few-Shot

ChestX - 5 shot

AMR Parsing

LDC2014T12 - F1 Full

Facial Expression Recognition (FER)

JAFFE - Accuracy

3D Object Detection

KITTI Cyclist Easy val - AP

3D Object Detection

KITTI Cyclist Moderate val - AP

3D Object Detection

KITTI Cars Hard val - AP

3D Object Detection

KITTI Cyclist Hard val - AP

Supervised Video Summarization

TvSum - F1-score (Canonical)

Homography Estimation


Text-to-Image Generation

Multi-Modal-CelebA-HQ - Acc

Self-Supervised Person Re-Identification

SYSU-30k - Rank-1

Earth Surface Forecasting

EarthNet2021 Seasonal Track - EarthNetScore

Semantic Image Matting

Semantic Image Matting Dataset - SAD

Semantic Image Matting

Semantic Image Matting Dataset - MSE(10^3)

Semantic Image Matting

Semantic Image Matting Dataset - Grad

Semantic Image Matting

Semantic Image Matting Dataset - Conn

Face Anti-Spoofing


Visual Dialog

Visual Dialog v1.0 test-std - MRR (x 100)

Visual Dialog

Visual Dialog v1.0 test-std - R@1

Fundus to Angiography Generation

Fundus Fluorescein Angiogram Photographs & Colour Fundus Images of Diabetic Patients - FID

Fundus to Angiography Generation

Fundus Fluorescein Angiogram Photographs & Colour Fundus Images of Diabetic Patients - Kernel Inception Distance

Pose Transfer

Deep-Fashion - SSIM

Image Manipulation Detection


Text Classification

TREC-6 - Error

Atari Games

Atari 2600 Bank Heist - Score

6D Pose Estimation using RGBD

LineMOD - Mean ADD

3D Multi-Person Pose Estimation (root-relative)


RGB-D Salient Object Detection

LFSD - max E-Measure

RGB-D Salient Object Detection

LFSD - max F-Measure

Video Quality Assessment

MSU FR VQA Database - SRCC

Video Quality Assessment

MSU FR VQA Database - PLCC

Video Quality Assessment

MSU FR VQA Database - KLCC

Visual Object Tracking

VOT2019 - Expected Average Overlap (EAO)

Video Frame Interpolation

X4K1000FPS - tOF

Self-Supervised Action Recognition

HMDB51 (finetuned) - Top-1 Accuracy

Unsupervised Domain Adaptation

Cityscapes-to-OxfordCar - mIoU

Video Super-Resolution

MSU Video Upscalers: Quality Enhancement - LPIPS

Age Estimation

Adience - MAE

Image Inpainting

CelebA-HQ - P-IDS

Image Inpainting

CelebA-HQ - U-IDS

3D Room Layouts From A Single RGB Panorama

Stanford2D3D Panoramic - Corner Error

Stereo-LiDAR Fusion

KITTI Depth Completion Validation - RMSE

Layout-to-Image Generation

COCO-Stuff 128x128 - FID

Layout-to-Image Generation

COCO-Stuff 128x128 - Inception Score

Vehicle Re-Identification

VeRi-776 - Rank5

Unsupervised Domain Adaptation

Market to Duke - rank-1

Few-Shot Image Classification

Stanford Dogs 5-way (5-shot) - Accuracy

Few-Shot Image Classification

Stanford Dogs 5-way (1-shot) - Accuracy

Efficient ViTs

ImageNet-1K (with DeiT-T) - GFLOPs

Video Frame Interpolation

Middlebury - SSIM

Video Frame Interpolation

Middlebury - PSNR

Interactive Video Object Segmentation

DAVIS 2017 - AUC-J&F

Interactive Video Object Segmentation

DAVIS 2017 - AUC-J

Interactive Video Object Segmentation

DAVIS 2017 - J@60s

Multi-Person Pose Estimation

PoseTrack2017 - Mean mAP

Multi-Person Pose Estimation

PoseTrack2018 - Mean mAP

Video Denoising


Video Denoising


Sequential Image Classification

noise padded CIFAR-10 - % Test Accuracy

Text Simplification

PWKP / WikiSmall - SARI

Retinal Vessel Segmentation


Retinal Vessel Segmentation

DRIVE - F1 score

Interactive Video Object Segmentation

DAVIS 2017 - J&F@60s

Video Prediction

Kinetics-600 12 frames, 64x64 - Cond

Video Prediction

Kinetics-600 12 frames, 64x64 - Pred

Video Retrieval

MSVD - video-to-text Median Rank

Image Relighting

VIDIT’20 validation set - Runtime(s)

Object Detection


Video Semantic Segmentation

CamVid - Mean IoU

Neural Architecture Search

NAS-Bench-201, CIFAR-10 - Search time (s)

Neural Architecture Search

NAS-Bench-201, CIFAR-100 - Search time (s)


SMAC 27m_vs_30m - Median Win Rate


Def_Armored_parallel - Median Win Rate


Def_Infantry_parallel - Median Win Rate

Action Spotting

SoccerNet - Average-mAP

Graph Classification

PROTEINS - Accuracy

Cross-Modal Retrieval

Flickr30k - Image-to-text R@10

Vehicle Re-Identification

VehicleID Small - Rank1

Emotion Classification

SemEval 2018 Task 1E-c - Macro-F1

Motion Synthesis

BRACE - Beat DTW cost

Motion Synthesis

BRACE - Toprock average

Motion Synthesis

BRACE - Footwork average

Motion Synthesis

BRACE - Powermove average

Facial Expression Recognition (FER)

FERPlus - Accuracy(pretrained)

Action Anticipation

EPIC-KITCHENS-55 (Seen test set (S1)) - Top 1 Accuracy - Verb

Retinal Vessel Segmentation

CHASE_DB1 - F1 score

Semi-supervised Anomaly Detection

UBI-Fights - AUC

Semi-supervised Anomaly Detection

UBI-Fights - Decidability

Semi-supervised Anomaly Detection

UBI-Fights - EER

Text Classification

MR - Accuracy

Image Denoising


Face Alignment

AFLW2000 - Error rate

Trajectory Forecasting

TrajNet++ - FDE

Trajectory Forecasting

TrajNet++ - COL

Unsupervised 3D Human Pose Estimation


3D Part Segmentation

ShapeNet-Part - Class Average IoU

Head Pose Estimation

BIWI - Geodesic Error (GE)

Video Saliency Detection

MSU Video Saliency Prediction - SIM

Video Saliency Detection

MSU Video Saliency Prediction - CC

Video Saliency Detection

MSU Video Saliency Prediction - NSS

Video Saliency Detection

MSU Video Saliency Prediction - AUC-J

Video Saliency Detection

MSU Video Saliency Prediction - KLDiv

Video Saliency Detection


Cross-Modal Person Re-Identification

SYSU-MM01 - rank1

Video Panoptic Segmentation

Cityscapes-VPS - VPQ (stuff)

Image-to-Image Translation

Cityscapes Labels-to-Photo - LPIPS

Image-to-Image Translation

COCO-Stuff Labels-to-Photos - mIoU

Image-to-Image Translation

ADE20K-Outdoor Labels-to-Photos - mIoU

3D Object Detection

KITTI Cars Moderate val - AP

3D Object Detection

KITTI Cars Easy val - AP

3D Face Reconstruction

Stirling-LQ (FG2018 3D face reconstruction challenge) - Mean Reconstruction Error (mm)

3D Face Reconstruction

Stirling-HQ (FG2018 3D face reconstruction challenge) - Mean Reconstruction Error (mm)

Text-to-Image Generation

Multi-Modal-CelebA-HQ - Real

Object Detection


3D Reconstruction

DTU - Acc

Video Super-Resolution


Video Super-Resolution


Video Super-Resolution

TbD-3D - TIoU

Video Super-Resolution


Video Super-Resolution


Video Super-Resolution

TbD - TIoU

Video Super-Resolution

Falling Objects - SSIM

Video Super-Resolution

Falling Objects - PSNR

Video Super-Resolution

Falling Objects - TIoU

Single Image Deraining

Rain100L - PSNR

Panoptic Segmentation

Mapillary val - PQth

Few-Shot Image Classification

Stanford Cars 5-way (1-shot) - Accuracy

Few-Shot Image Classification

Stanford Cars 5-way (5-shot) - Accuracy

Conditional Image Generation

ImageNet 128x128 - Inception score

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - PA V2V (mm), left hand

Dialogue State Tracking

Second dialogue state tracking challenge - Joint

Video Frame Interpolation

MSU Video Frame Interpolation - Subjective score

Video Frame Interpolation

MSU Video Frame Interpolation - FPS

Video Quality Assessment

MSU NR VQA Database - SRCC

Video Quality Assessment

MSU NR VQA Database - PLCC

Video Quality Assessment

MSU NR VQA Database - KLCC

DRS Parsing

PMB-3.0.0 - F1

DRS Parsing

PMB-2.2.0 - F1

Video Generation

UCF-101 16 frames, 128x128, Unconditional - Inception Score

Complex Query Answering

NELL-995 - MRR 2i

3D Multi-Person Pose Estimation (root-relative)


Video Quality Assessment


Drug–drug Interaction Extraction

DDI extraction 2013 corpus - F1

Drug–drug Interaction Extraction

DDI extraction 2013 corpus - Micro F1

Multi-Object Tracking

HiEve - MOTA

Session-Based Recommendations

yoochoose1/64 - MRR@20

MRI Reconstruction

fastMRI Knee 8x - SSIM

Task-Oriented Dialogue Systems


Semantic Dependency Parsing

PSD - In-domain

Semantic Dependency Parsing

PSD - Out-of-domain

Semantic Dependency Parsing

DM - In-domain

Semantic Dependency Parsing

DM - Out-of-domain

Semantic Dependency Parsing

PAS - In-domain

Semantic Dependency Parsing

PAS - Out-of-domain


Penn Treebank - F1 score


CoNLL 2000 - Exact Span F1

Single Image Deraining

RainCityscapes - PSNR

Single Image Deraining

RainCityscapes - SSIM

Single Image Deraining

Rain100L - SSIM

Open-Domain Question Answering

SQuAD1.1 dev - EM

Dialogue Act Classification

ICSI Meeting Recorder Dialog Act (MRDA) corpus - Accuracy

Emotion Recognition in Conversation

SEMAINE - MAE (Arousal)

Emotion Recognition in Conversation

SEMAINE - MAE (Expectancy)

Type prediction

ManyTypes4TypeScript - Average Precision

Type prediction

ManyTypes4TypeScript - Average Recall

Type prediction

ManyTypes4TypeScript - Average F1

3D Human Pose Estimation

3D Poses in the Wild Challenge - MPJAE

Open-Domain Question Answering

SearchQA - EM

RGB Salient Object Detection

SOC - Average MAE

RGB-D Salient Object Detection

LFSD - S-Measure

RGB-D Salient Object Detection

LFSD - Average MAE

Automated Theorem Proving

Metamath - Percentage correct

Anomaly Detection In Surveillance Videos

UCSD Peds2 - AUC

Few-Shot Image Classification

Mini-Imagenet 20-way (5-shot) - Accuracy

Few-Shot Image Classification

Mini-Imagenet 20-way (1-shot) - Accuracy

Image Super-Resolution

Set14 - 8x upscaling - SSIM

Image Super-Resolution

Set5 - 8x upscaling - PSNR

Image Super-Resolution

Set5 - 8x upscaling - SSIM

Emotion Classification

SemEval 2018 Task 1E-c - Micro-F1

Emotion Classification

SemEval 2018 Task 1E-c - Accuracy

Image Super-Resolution

BSD100 - 8x upscaling - SSIM

3D Multi-Person Mesh Recovery


3D Human Reconstruction

Expressive hands and faces dataset (EHF) - PA V2V (mm), body only

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - TR V2V (mm), whole body

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - TR V2V (mm), face

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - mean P2S

3D Human Reconstruction

Expressive hands and faces dataset (EHF) - median P2S

Image Dehazing

O-Haze - SSIM

Text-to-Image Generation

Multi-Modal-CelebA-HQ - LPIPS

Speech Synthesis


Self-Supervised Action Recognition

Kinetics-600 - Top-1 Accuracy

Video Super-Resolution

MSU Video Super Resolution Benchmark: Detail Restoration - 1 - LPIPS

Lane Detection

CurveLanes - Precision

Lane Detection

CurveLanes - GFLOPs

Hand Pose Estimation

HANDS 2017 - Average 3D Error

Pose Transfer

Deep-Fashion - IS

Pose Transfer

Deep-Fashion - PCKh

Cross-Domain Few-Shot

EuroSAT - 5 shot

Visual Object Tracking

OTB-2013 - AUC

KG-to-Text Generation


Brain Tumor Segmentation

BRATS-2013 - Dice Score

Few-Shot Semantic Segmentation

Pascal5i - meanIOU

3D Human Pose Estimation

3D Poses in the Wild Challenge - MPJPE

Visual Object Tracking

OTB-2015 - Precision

Multi-Object Tracking


Video Super-Resolution

Ultra Video Group HD - 4x upscaling - Average PSNR

RGB-D Salient Object Detection

RGBD135 - max E-Measure

Chinese Word Segmentation

PKU - F1

Nested Named Entity Recognition

NNE - Micro F1

Video Super-Resolution

MSU Video Upscalers: Quality Enhancement - SSIM

Few-Shot Image Classification

Meta-Dataset Rank - Mean Rank

Video Super-Resolution

MSU Super-Resolution for Video Compression - BSQ-rate over Subjective Score

Video Super-Resolution

MSU Super-Resolution for Video Compression - BSQ-rate over PSNR

Video Super-Resolution

MSU Super-Resolution for Video Compression - BSQ-rate over MS-SSIM

Neural Architecture Search

NAS-Bench-201, CIFAR-100 - Accuracy (Test)

Neural Architecture Search

NAS-Bench-201, CIFAR-100 - Accuracy (Val)

Few-Shot Image Classification

Tiered ImageNet 10-way (5-shot) - Accuracy

Few-Shot Image Classification

Mini-Imagenet 10-way (5-shot) - Accuracy

Few-Shot Image Classification

Mini-Imagenet 10-way (1-shot) - Accuracy

Few-Shot Image Classification

Tiered ImageNet 10-way (1-shot) - Accuracy

Neural Architecture Search

CIFAR-10 Image Classification - Search Time (GPU days)

Neural Architecture Search

CIFAR-10 Image Classification - Params

Motion Synthesis

BRACE - Beat alignment score

Motion Synthesis

BRACE - Frechet Inception Distance

Unsupervised Person Re-Identification

DukeMTMC-reID->Market-1501 - mAP

Unsupervised Person Re-Identification

DukeMTMC-reID->Market-1501 - Rank-1

Unsupervised Person Re-Identification

Market-1501->MSMT17 - mAP

Unsupervised Person Re-Identification

Market-1501->MSMT17 - Rank-1

Unsupervised Person Re-Identification

DukeMTMC-reID->MSMT17 - mAP

Unsupervised Person Re-Identification

DukeMTMC-reID->MSMT17 - Rank-1

Unsupervised Person Re-Identification

MSMT17->Market-1501 - Rank-1

Unsupervised Person Re-Identification

MSMT17->Market-1501 - mAP

Unsupervised Person Re-Identification

Market-1501->DukeMTMC-reID - mAP

Unsupervised Person Re-Identification

Market-1501->DukeMTMC-reID - Rank-1

Neural Architecture Search

NAS-Bench-201, ImageNet-16-120 - Search time (s)

Few-Shot Image Classification

Mini-Imagenet 5-way (10-shot) - Accuracy

Session-Based Recommendations

yoochoose1 - MRR@20

Session-Based Recommendations

yoochoose1 - Precision@20

Monocular Depth Estimation

Make3D - RMSE

RGB-D Salient Object Detection

RGBD135 - S-Measure

RGB-D Salient Object Detection

RGBD135 - Average MAE

RGB-D Salient Object Detection

RGBD135 - max F-Measure

Unsupervised Machine Translation

WMT2016 English-German - BLEU

Unsupervised Machine Translation

WMT2016 Romanian-English - BLEU

Unsupervised Machine Translation

WMT2014 French-English - BLEU

Unsupervised Machine Translation

WMT2016 English-Romanian - BLEU

Unsupervised Machine Translation

WMT2016 German-English - BLEU

Data-to-Text Generation


Data-to-Text Generation


Data-to-Text Generation


K-complex detection

MASS SS2 - F1-score (@IoU = 0.3)

Face Swapping

FaceForensics++ - pose

Neural Architecture Search

CIFAR-10 Image Classification - Percentage error

Image Super-Resolution

FFHQ 1024 x 1024 - 4x upscaling - FID

Image Super-Resolution

FFHQ 1024 x 1024 - 4x upscaling - MS-SSIM

Image Super-Resolution

FFHQ 256 x 256 - 4x upscaling - FID

Image Super-Resolution

FFHQ 256 x 256 - 4x upscaling - MS-SSIM

Image Super-Resolution

FFHQ 512 x 512 - 4x upscaling - PSNR

Image Super-Resolution

FFHQ 512 x 512 - 4x upscaling - SSIM

Image Super-Resolution

FFHQ 512 x 512 - 4x upscaling - MS-SSIM

Image Super-Resolution

FFHQ 512 x 512 - 4x upscaling - FED

Image Super-Resolution

FFHQ 512 x 512 - 4x upscaling - FID

Image Super-Resolution

FFHQ 512 x 512 - 4x upscaling - LPIPS

Image Super-Resolution

FFHQ 512 x 512 - 4x upscaling - NIQE

Face Hallucination

FFHQ 512 x 512 - 16x upscaling - FID

Face Hallucination

FFHQ 512 x 512 - 16x upscaling - LPIPS

Face Hallucination

FFHQ 512 x 512 - 16x upscaling - NIQE

Citation Intent Classification

SciCite - F1

Text Simplification

Newsela - SARI

End-To-End Dialogue Modelling


Atari Games

Atari 2600 Berzerk - Score

Atari Games

Atari 2600 Private Eye - Score

Cross-Lingual NER

CoNLL German - F1

Extractive Text Summarization

CNN / Daily Mail - ROUGE-2

Extractive Text Summarization

CNN / Daily Mail - ROUGE-1

Extractive Text Summarization

CNN / Daily Mail - ROUGE-L

JPEG Artifact Correction

ICB (Quality 30 Color) - PSNR-B

JPEG Artifact Correction

ICB (Quality 30 Color) - SSIM

JPEG Artifact Correction

ICB (Quality 20 Color) - PSNR-B

JPEG Artifact Correction

ICB (Quality 20 Color) - SSIM

JPEG Artifact Correction

ICB (Quality 10 Grayscale) - PSNR

JPEG Artifact Correction

ICB (Quality 10 Grayscale) - PSNR-B

JPEG Artifact Correction

ICB (Quality 10 Grayscale) - SSIM

JPEG Artifact Correction

LIVE1 (Quality 10 Color) - SSIM

JPEG Artifact Correction

Classic5 (Quality 10 Grayscale) - SSIM

JPEG Artifact Correction

Classic5 (Quality 30 Grayscale) - SSIM

JPEG Artifact Correction

LIVE1 (Quality 20 Grayscale) - SSIM

JPEG Artifact Correction

ICB (Quality 20 Grayscale) - PSNR

JPEG Artifact Correction

ICB (Quality 20 Grayscale) - PSNR-B

JPEG Artifact Correction

ICB (Quality 10 Color) - PSNR-B

JPEG Artifact Correction

ICB (Quality 10 Color) - SSIM

JPEG Artifact Correction

Live1 (Quality 10 Grayscale) - SSIM

JPEG Artifact Correction

Classic5 (Quality 20 Grayscale) - SSIM

Image-to-Image Translation

ADE20K-Outdoor Labels-to-Photos - FID

Weakly Supervised Object Detection

COCO test-dev - AP50

Atari Games

Atari 2600 Boxing - Score

Video Super-Resolution

MSU Video Super Resolution Benchmark: Detail Restoration - FPS

Panoptic Segmentation

Indian Driving Dataset - PQ

Panoptic Segmentation

KITTI Panoptic Segmentation - PQ

Long-tail learning with class descriptors

SUN-LT - Per-Class Accuracy

Long-tail learning with class descriptors

SUN-LT - Long-Tailed Accuracy

Long-tail learning with class descriptors

AWA-LT - Long-Tailed Accuracy

Long-tail learning with class descriptors

CUB-LT - Per-Class Accuracy

Long-tail learning with class descriptors

CUB-LT - Long-Tailed Accuracy

Long-tail learning with class descriptors

ImageNet-LT-d - Per-Class Accuracy

Conversation Disentanglement

irc-disentanglement - VI

Conversation Disentanglement

irc-disentanglement - R

Conversation Disentanglement

irc-disentanglement - F

Multiple Object Tracking

SportsMOT - DetA

3D Object Reconstruction From A Single Image

BUFF - Point-to-surface distance (cm)

Atari Games

Atari 2600 Crazy Climber - Score

Atari Games

Atari 2600 HERO - Score

Atari Games

Atari 2600 Amidar - Score

Atari Games

Atari 2600 Venture - Score

Atari Games

Atari 2600 Yars Revenge - Score

Atari Games

Atari 2600 Gravitar - Score

Atari Games

Atari 2600 Kangaroo - Score

Atari Games

Atari 2600 Tutankham - Score

Atari Games

Atari 2600 Battle Zone - Score

Atari Games

Atari 2600 Solaris - Score

Atari Games

Atari 2600 Q*Bert - Score

Atari Games

Atari 2600 Star Gunner - Score

Image Super-Resolution

Urban100 - 4x upscaling - SSIM

Vehicle Speed Estimation

BrnoCompSpeed - Mean Speed Measurement Error (km/h)

Vehicle Speed Estimation

BrnoCompSpeed - Median Speed Measurement Error (km/h)

Graph Classification

ENZYMES - Accuracy

Text Classification

Amazon-2 - Error

Superpixel Image Classification

75 Superpixel MNIST - Classification Error

Layout-to-Image Generation

Visual Genome 256x256 - Inception Score

Layout-to-Image Generation

COCO-Stuff 64x64 - FID

Layout-to-Image Generation

COCO-Stuff 64x64 - Inception Score

Layout-to-Image Generation

COCO-Stuff 128x128 - SceneFID

Crowd Counting


Video Saliency Detection

MSU Video Saliency Prediction - FPS

Unsupervised Video Object Segmentation

DAVIS 2017 (test-dev) - Jaccard (Decay)

Unsupervised Video Object Segmentation

DAVIS 2017 (test-dev) - F-measure (Decay)

Document Classification

Twitter - Accuracy

Keypoint Detection

COCO test-challenge - AR

Keypoint Detection

COCO test-challenge - ARM

Keypoint Detection

COCO test-challenge - AP

Keypoint Detection

COCO test-challenge - AP50

Keypoint Detection

COCO test-challenge - AP75

Keypoint Detection

COCO test-challenge - AR50

Keypoint Detection

COCO test-challenge - AR75

Keypoint Detection

COCO test-challenge - ARL

Video Prediction

SynpickVP - SSIM

Video Prediction


Video Prediction

Cityscapes 128x128 - Cond.

Video Prediction

Cityscapes 128x128 - Pred

Dialogue Act Classification

Switchboard corpus - Accuracy

Action Segmentation

JIGSAWS - Edit Distance

Unsupervised Video Object Segmentation

DAVIS 2017 (val) - F-measure (Recall)

Unsupervised Machine Translation

WMT2014 English-French - BLEU

Sentence Compression

Google Dataset - F1

Generative Question Answering

CoQA - F1-Score

Unsupervised Video Object Segmentation

DAVIS 2017 (test-dev) - Jaccard (Mean)

Unsupervised Video Object Segmentation

DAVIS 2017 (test-dev) - Jaccard (Recall)

Unsupervised Video Object Segmentation

DAVIS 2017 (test-dev) - F-measure (Mean)

Unsupervised Video Object Segmentation

DAVIS 2017 (test-dev) - F-measure (Recall)

Unsupervised Video Object Segmentation

DAVIS 2017 (val) - Jaccard (Recall)

Negation Scope Resolution

*sem 2012 Shared Task: Sherlock Dataset - F1

Data-to-Text Generation

RotoWire (Content Ordering) - BLEU

Video Generation

UCF-101 16 frames, Unconditional, Single GPU - Inception Score

Gesture-to-Gesture Translation

Senz3D - PSNR

Gesture-to-Gesture Translation

Senz3D - IS

Gesture-to-Gesture Translation

Senz3D - AMT

Gesture-to-Gesture Translation

NTU Hand Digit - PSNR

Gesture-to-Gesture Translation

NTU Hand Digit - IS

Gesture-to-Gesture Translation

NTU Hand Digit - AMT

Scene Recognition

YUP++ - Accuracy (%)

Few-Shot Image Classification

OMNIGLOT - 1-Shot, 20-way - Accuracy

Multimodal Unsupervised Image-To-Image Translation


Document Classification

Amazon - Accuracy

Document Classification

BBCSport - Accuracy

Synthetic-to-Real Translation

Syn2Real-C - Accuracy

Visual Dialog

Visual Dialog v1.0 test-std - NDCG (x 100)

Semi-Supervised Image Classification

SVHN, 1000 labels - Accuracy

Semi-Supervised Image Classification

cifar10, 250 Labels - Percentage correct

Image Manipulation Detection

CocoGlide - AUC

Cross-Lingual NER

CoNLL Dutch - F1

Atari Games

Atari 2600 Beam Rider - Score

Atari Games

Atari 2600 Bowling - Score

Atari Games

Atari 2600 Assault - Score

Atari Games

Atari 2600 River Raid - Score

Atari Games

Atari 2600 Frostbite - Score

Atari Games

Atari 2600 Zaxxon - Score

Atari Games

Atari 2600 Name This Game - Score

Atari Games

Atari 2600 Robotank - Score

Atari Games

Atari 2600 Alien - Score

Atari Games

Atari 2600 Fishing Derby - Score

Atari Games

Atari 2600 Time Pilot - Score

Satellite Image Classification

SAT-4 - Accuracy

Automated Theorem Proving

HolStep (Conditional) - Classification Accuracy

Reflection Removal

SIR^2(Objects) - SSIM

6D Pose Estimation using RGBD

YCB-Video - Mean ADD-S

Camera shot boundary detection

MSU Shot Boundary Detection Benchmark - F score

Camera shot boundary detection

MSU Shot Boundary Detection Benchmark - FPS

Chinese Named Entity Recognition

OntoNotes 4 - F1

Emotion Cause Extraction

ECE - F1

Open-Domain Question Answering

ELI5 - Rouge-1

Dialogue State Tracking

Wizard-of-Oz - Request

Image Super-Resolution

FFHQ 1024 x 1024 - 4x upscaling - SSIM

Grayscale Image Denoising

BSD200 sigma50 - PSNR

Grayscale Image Denoising

BSD200 sigma70 - PSNR

Grayscale Image Denoising

BSD200 sigma30 - PSNR

Atari Games

Atari 2600 Pong - Score

Unsupervised Person Re-Identification

MSMT17->DukeMTMC-reID - Rank-1

Unsupervised Person Re-Identification

MSMT17->DukeMTMC-reID - Rank-10

Unsupervised Person Re-Identification

MSMT17->DukeMTMC-reID - Rank-5

Unsupervised Person Re-Identification

MSMT17->DukeMTMC-reID - mAP

Unsupervised Person Re-Identification

DukeMTMC-reID->Market-1501 - Rank-5

Retinal OCT Disease Classification

OCT2017 - Acc

Retinal OCT Disease Classification

OCT2017 - Sensitivity

Retinal OCT Disease Classification

Srinivasan2014 - Acc

3D Object Detection

SUN-RGBD val - Inference Speed (s)

Image Super-Resolution

Urban100 - 8x upscaling - SSIM

License Plate Recognition

AOLP-RP - Average Recall

Text Simplification

TurkCorpus - BLEU



6D Pose Estimation using RGB

Occlusion LineMOD - Mean ADD

Skeleton Based Action Recognition

PKU-MMD - [email protected] (CV)

Skeleton Based Action Recognition

PKU-MMD - [email protected] (CS)

Text Classification

Sogou News - Accuracy

Cross-Lingual Document Classification

MLDoc Zero-Shot English-to-Chinese - Accuracy

Cross-Lingual Document Classification

MLDoc Zero-Shot English-to-French - Accuracy

Cross-Lingual Document Classification

MLDoc Zero-Shot English-to-Spanish - Accuracy

Cross-Lingual Document Classification

MLDoc Zero-Shot English-to-Russian - Accuracy

Homography Estimation


Image Super-Resolution

BSD100 - 4x upscaling - SSIM

Image Super-Resolution

Set14 - 4x upscaling - SSIM

Visual Storytelling


Cross-Lingual Document Classification

MLDoc Zero-Shot English-to-Italian - Accuracy

Session-Based Recommendations

yoochoose1/4 - HR@20

Emotion Recognition in Conversation

EC - Micro-F1

Lung Nodule Segmentation

LUNA - F1 score

Lung Nodule Segmentation


Emotion Recognition in Conversation

SEMAINE - MAE (Valence)

Emotion Recognition in Conversation


Text Style Transfer

Yelp Review Dataset (Small) - G-Score (BLEU, Accuracy)

Image Super-Resolution

PIRM-test - NIQE

Unsupervised Facial Landmark Detection

300W - NME

Unsupervised Facial Landmark Detection


3D Semantic Segmentation


Lesion Segmentation

Anatomical Tracings of Lesions After Stroke (ATLAS) - Dice

Lesion Segmentation

Anatomical Tracings of Lesions After Stroke (ATLAS) - Recall

Video Salient Object Detection

VOS-T - S-Measure

Video Salient Object Detection

VOS-T - max E-measure

Video Salient Object Detection

VOS-T - Average MAE

Multivariate Time Series Imputation

KDD CUP Challenge 2018 - MSE (10% missing)

Action Recognition In Videos

Something-Something V1 - Top 1 Accuracy

Visual Object Tracking

VOT2017 - Expected Average Overlap (EAO)

Open-Domain Question Answering

SearchQA - F1

Skeleton Based Action Recognition

J-HMDB - Accuracy (pose)

Skeleton Based Action Recognition

JHMDB (2D poses only) - Average accuracy of 3 splits

Image Super-Resolution

BSD100 - 4x upscaling - PSNR

Image Super-Resolution

Urban100 - 4x upscaling - PSNR

Image Super-Resolution

BSD100 - 2x upscaling - PSNR

Image Super-Resolution

Set5 - 4x upscaling - PSNR

Image Super-Resolution

Set5 - 4x upscaling - SSIM

Image Super-Resolution

Urban100 - 2x upscaling - PSNR

Image Super-Resolution

Urban100 - 2x upscaling - SSIM

Image Super-Resolution

Set5 - 2x upscaling - SSIM

Image Super-Resolution

Set14 - 2x upscaling - PSNR

Image Super-Resolution

Set14 - 2x upscaling - SSIM

Lesion Segmentation

Anatomical Tracings of Lesions After Stroke (ATLAS) - IoU

Lesion Segmentation

Anatomical Tracings of Lesions After Stroke (ATLAS) - Precision

Human Part Segmentation

PASCAL-Part - mIoU

Multivariate Time Series Forecasting

MuJoCo - MSE (10^-2, 50% missing)

Visual Object Tracking

VOT2017/18 - Expected Average Overlap (EAO)

3D Object Detection

KITTI Pedestrian Moderate val - AP

3D Object Detection

KITTI Pedestrian Easy val - AP

3D Object Detection

KITTI Pedestrian Hard val - AP

Grayscale Image Denoising

BSD68 sigma70 - PSNR

Color Image Denoising

CBSD68 sigma5 - PSNR

Dialogue State Tracking

SIMMC2.0 - Slot F1

Dialogue State Tracking

SIMMC2.0 - Act F1

Semi-Supervised Image Classification

CIFAR-10, 2000 Labels - Accuracy

Semi-Supervised Image Classification

SVHN, 500 Labels - Accuracy

Text Classification

Amazon-5 - Error

Text Classification

DBpedia - Error

Long-tail learning with class descriptors

AWA-LT - Per-Class Accuracy

Image Super-Resolution

VggFace2 - 8x upscaling - PSNR

Image Super-Resolution

Urban100 - 8x upscaling - PSNR

Image Super-Resolution

Manga109 - 8x upscaling - PSNR

Image Super-Resolution

Manga109 - 8x upscaling - SSIM

Multi-Label Text Classification

EUR-Lex - nDCG@5

Multi-Label Text Classification

EUR-Lex - P@5

Video Classification

YouTube-8M - Hit@1

Image Manipulation Detection

Columbia (OSN-transmitted - Wechat) - AUC

Image Manipulation Detection

DSO (OSN-transmitted - Facebook) - AUC

Image Manipulation Detection

DSO (OSN-transmitted - Facebook) - f-Score

Image Manipulation Detection

DSO (OSN-transmitted - Facebook) - Intersection over Union

Image Manipulation Detection

DSO (OSN-transmitted - Wechat) - f-Score

Image Manipulation Detection

DSO (OSN-transmitted - Wechat) - Intersection over Union

Image Manipulation Detection

DSO (OSN-transmitted - Weibo) - f-Score

Image Manipulation Detection

DSO (OSN-transmitted - Weibo) - Intersection over Union

Image Manipulation Detection

NIST (OSN-transmitted - Facebook) - f-Score

Image Manipulation Detection

NIST (OSN-transmitted - Facebook) - Intersection over Union

Video Salient Object Detection

DAVSOD-Difficult20 - S-Measure

Video Salient Object Detection


Video Salient Object Detection

DAVSOD-Normal25 - S-Measure

Video Salient Object Detection

DAVSOD-Normal25 - max E-measure

Video Salient Object Detection

DAVSOD-Normal25 - Average MAE

Video Salient Object Detection

DAVSOD-easy35 - S-Measure

Video Salient Object Detection

DAVSOD-easy35 - max E-Measure

Video Salient Object Detection

DAVSOD-easy35 - Average MAE

Video Salient Object Detection

DAVSOD-easy35 - max F-Measure

Video Salient Object Detection


UCCA Parsing

SemEval 2019 Task 1 - English-Wiki (open) F1

UCCA Parsing

SemEval 2019 Task 1 - English-20K (open) F1

Head Pose Estimation

BIWI - MAE-aligned (trained with other data)

Head Pose Estimation

BIWI - Geodesic Error - aligned (GE)

RGB Salient Object Detection

SOC - S-Measure

RGB Salient Object Detection

SOC - mean E-Measure

Object Detection

iSAID - Average Precision

Word Sense Induction

SemEval 2010 WSI - F-Score

Word Sense Induction

SemEval 2010 WSI - V-Measure

Word Sense Induction

SemEval 2010 WSI - AVG

Medical Image Classification

NCT-CRC-HE-100K - Accuracy (%)

Medical Image Classification

NCT-CRC-HE-100K - F1-Score

Medical Image Classification

NCT-CRC-HE-100K - Specificity

Automated Theorem Proving

HOList benchmark - Percentage correct

AMR Parsing

LDC2014T12 - F1 Newswire

AMR Parsing

LDC2014T12 - F1 Full

Action Recognition In Videos

Jester (Gesture Recognition) - Val

Point-interactive Image Colorization

ImageNet ctest10k - PSNR@100

Atari Games

Atari 2600 Skiing - Score

Atari Games

Atari 2600 Video Pinball - Score

Graph Classification


Graph Classification


Graph Classification

NEURON-Average - Accuracy

Stress-Strain Relation

Non-Linear Elasticity Benchmark - Time (ms)

Face Anti-Spoofing

CelebA-Spoof-Enroll5 - AUC

RGB Salient Object Detection


RGB Salient Object Detection

SBU - Balanced Error Rate

3D Semantic Segmentation

DALES - Overall Accuracy

RGB Salient Object Detection

UCF - Balanced Error Rate

Graph Classification


Cross-View Image-to-Image Translation

Dayton (64×64) - ground-to-aerial - SSIM

Cross-View Image-to-Image Translation

Dayton (256×256) - ground-to-aerial - SSIM

Cross-View Image-to-Image Translation

Dayton (256×256) - aerial-to-ground - SSIM

Cross-View Image-to-Image Translation

Dayton (64×64) - aerial-to-ground - SSIM

Cross-View Image-to-Image Translation

Ego2Top - SSIM

Unsupervised Person Re-Identification

DukeMTMC-reID->Market-1501 - Rank-10

Unsupervised Person Re-Identification

Market-1501->MSMT17 - Rank-10

Unsupervised Person Re-Identification

DukeMTMC-reID->MSMT17 - Rank-10

Unsupervised Person Re-Identification

Market-1501->DukeMTMC-reID - Rank-10

Unsupervised Person Re-Identification

Market-1501->DukeMTMC-reID - Rank-5

Visual Dialog

VisDial v0.9 val - MRR

Visual Dialog

VisDial v0.9 val - Mean Rank

Visual Dialog

VisDial v0.9 val - R@1

Visual Dialog

VisDial v0.9 val - R@10

Visual Dialog

VisDial v0.9 val - R@5

Visual Dialog

Visual Dialog v1.0 test-std - R@5

Visual Dialog

Visual Dialog v1.0 test-std - R@10

Visual Dialog

Visual Dialog v1.0 test-std - Mean

Age Estimation


Pose Transfer

Deep-Fashion - Retrieval Top10 Recall

Image Super-Resolution

Set14 - 8x upscaling - PSNR

Chinese Named Entity Recognition

OntoNotes 4 - Precision

Chinese Named Entity Recognition

OntoNotes 4 - Recall

Chinese Named Entity Recognition

Resume NER - Precision

Chinese Named Entity Recognition

Resume NER - Recall

Skeleton Based Action Recognition

JHMDB Pose Tracking - [email protected]

Skeleton Based Action Recognition

JHMDB Pose Tracking - [email protected]

Skeleton Based Action Recognition

JHMDB Pose Tracking - [email protected]

Skeleton Based Action Recognition

JHMDB Pose Tracking - [email protected]

Skeleton Based Action Recognition

JHMDB Pose Tracking - [email protected]

Weakly Supervised Object Detection

Charades - MAP

Data-to-Text Generation

E2E NLG Challenge - BLEU

Data-to-Text Generation

E2E NLG Challenge - NIST

Weakly Supervised Object Detection


Unsupervised Anomaly Detection with Specified Settings -- 30% anomaly


Face Verification

Oulu-CASIA NIR-VIS - TAR @ FAR=0.001

Face Verification


Face Verification

CASIA NIR-VIS 2.0 - TAR @ FAR=0.001

Face Verification

BUAA-VisNir - TAR @ FAR=0.001

Aspect-Based Sentiment Analysis (ABSA)

Sentihood - Aspect

Aspect-Based Sentiment Analysis (ABSA)

Sentihood - Sentiment

Anomaly Detection In Surveillance Videos

ShanghaiTech Weakly Supervised - AUC-ROC

Point Cloud Registration

3DMatch (at least 30% overlapped - FCGF setting) - Recall (0.3m, 15 degrees)

Diffeomorphic Medical Image Registration


Retinal Vessel Segmentation

ROSE-1 DVC - Dice Score

Dynamic Link Prediction

Enron Emails - AUC

Multimodal Activity Recognition

UTD-MHAD - Accuracy (CS)

Semi-Supervised Video Object Segmentation

YouTube - mIoU

Graph Classification

NCI109 - Accuracy

3D Face Reconstruction

Florence - Average 3D Error

Few-Shot Image Classification

OMNIGLOT - 1-Shot, 5-way - Accuracy

Image-guided Story Ending Generation


Image-guided Story Ending Generation


Visual Relationship Detection

VRD Predicate Detection - R@50

Multivariate Time Series Imputation

Basketball Players Movement - Path Length

Multivariate Time Series Imputation

Basketball Players Movement - OOB Rate (10^−3)

Multivariate Time Series Imputation

Basketball Players Movement - Step Change (10^−3)

Multivariate Time Series Imputation

PEMS-SF - L2 Loss (10^-4)

Hand Gesture Recognition

EgoGesture - Accuracy

Keypoint Detection

COCO test-challenge - APL

Face Identification

Trillion Pairs Dataset - Accuracy

Cross-Lingual Document Classification

MLDoc Zero-Shot English-to-Japanese - Accuracy

JPEG Artifact Correction

LIVE1 (Quality 40 Grayscale) - PSNR

JPEG Artifact Correction

LIVE1 (Quality 30 Grayscale) - PSNR

Diffeomorphic Medical Image Registration

Automatic Cardiac Diagnosis Challenge (ACDC) - Grad Det-Jac

Multivariate Time Series Imputation

Basketball Players Movement - Player Distance

Color Image Denoising

Darmstadt Noise Dataset - PSNR (sRGB)

Color Image Denoising

Darmstadt Noise Dataset - SSIM (sRGB)

Crowd Counting

WorldExpo’10 - Average MAE

Crowd Counting

Venice - MAE

Video Super-Resolution

MSU Video Upscalers: Quality Enhancement - VMAF

Distant Speech Recognition

DIRHA English WSJ - Word Error Rate (WER)

Object Detection

COCO minival - AP50

Open-Domain Question Answering

SearchQA - Unigram Acc

Open-Domain Question Answering

SearchQA - N-gram F1

Data-to-Text Generation

E2E NLG Challenge - ROUGE-L

Data-to-Text Generation

E2E NLG Challenge - CIDEr

Conversation Disentanglement

irc-disentanglement - P

Face Detection


JPEG Artifact Correction

LIVE1 (Quality 20 Color) - PSNR-B

JPEG Artifact Correction

LIVE1 (Quality 20 Color) - SSIM

JPEG Artifact Correction

LIVE1 (Quality 10 Color) - PSNR-B

JPEG Artifact Correction

LIVE1 (Quality 20 Grayscale) - PSNR-B

JPEG Artifact Correction

ICB (Quality 20 Grayscale) - SSIM

JPEG Artifact Correction

LIVE1 (Quality 10 Grayscale) - PSNR-B

AMR Parsing

LDC2014T12 - F1 Newswire

Graph Classification

COX2 - Accuracy(10-fold)

Nuclear Segmentation

Cell17 - F1-score

Nuclear Segmentation

Cell17 - Dice

Nuclear Segmentation

Cell17 - Hausdorff

Diffeomorphic Medical Image Registration


Face Detection


Face Detection

Annotated Faces in the Wild - AP

Face Verification

BUAA-VisNir - TAR @ FAR=0.01

Image Super-Resolution

FFHQ 1024 x 1024 - 4x upscaling - PSNR

Image Super-Resolution

FFHQ 256 x 256 - 4x upscaling - PSNR

Image Super-Resolution

FFHQ 256 x 256 - 4x upscaling - SSIM

Video Salient Object Detection


Video Salient Object Detection

DAVSOD-Difficult20 - Average MAE

Video Salient Object Detection

MCL - S-Measure

Video Salient Object Detection


Video Salient Object Detection


Video Salient Object Detection

UVSD - S-Measure

Video Salient Object Detection

UVSD - max E-measure

Video Salient Object Detection

UVSD - Average MAE

Salt-And-Pepper Noise Removal

Kodak24 Noise Level 70% - PSNR

Salt-And-Pepper Noise Removal

Kodak24 Noise Level 50% - PSNR

Salt-And-Pepper Noise Removal

Kodak24 Noise Level 30% - PSNR

Photo geolocation estimation

Im2GPS - Street level (1 km)

Photo geolocation estimation

Im2GPS - City level (25 km)

3D Reconstruction

Data3D-R2N2 - 3DIoU

Weakly-Supervised Object Localization

ILSVRC 2016 - Top-5 Error

Multi-view Subspace Clustering

ORL - Accuracy

Unsupervised Facial Landmark Detection

AFLW (Zhang CVPR 2018 crops) - NME

Visual Object Tracking

VOT2016 - Expected Average Overlap (EAO)

Text Simplification

PWKP / WikiSmall - BLEU

Text Simplification

Newsela - BLEU

Atari Games

Atari 2600 Defender - Score

Video Quality Assessment


Video Quality Assessment

MSU SR-QA Dataset - PLCC

Video Alignment

MSU Video Alignment and Retrieval Benchmark Suite - Accuracy w/ 3 frames error (Light)

Video Alignment

MSU Video Alignment and Retrieval Benchmark Suite - Accuracy w/ 3 frames error (Medium geometric)

Video Alignment

MSU Video Alignment and Retrieval Benchmark Suite - Accuracy w/ 3 frames error (Medium color)

Video Alignment

MSU Video Alignment and Retrieval Benchmark Suite - Accuracy w/ 3 frames error (Hard)

Skeleton Based Action Recognition

J-HMDB - Accuracy (RGB+pose)

Video Salient Object Detection

DAVSOD-Difficult20 - max E-measure

Timex normalization

PNT - F1-Score

Multivariate Time Series Imputation

UCI localization data - MAE (10% missing)

Multivariate Time Series Imputation

Basketball Players Movement - Path Difference

Hyperspectral Image Classification

Pavia University - OA@15perclass

Semantic Role Labeling (predicted predicates)

CoNLL 2012 - F1

Video Prediction

SynpickVP - MSE

Multimodal Unsupervised Image-To-Image Translation

Edge-to-Handbags - Diversity

Multimodal Unsupervised Image-To-Image Translation

Edge-to-Shoes - Diversity

3D Shape Classification

Pix3D - R@1

3D Shape Classification

Pix3D - R@16

3D Shape Classification

Pix3D - R@2

3D Shape Classification

Pix3D - R@32

3D Shape Classification

Pix3D - R@4

3D Shape Classification

Pix3D - R@8


Off_Hard_sequential - Median Win Rate


Off_Near_parallel - Median Win Rate

Noisy Speech Recognition

CHiME real - Percentage error

3D Point Cloud Classification

Sydney Urban Objects - F1

Visual Object Tracking

OTB-50 - AUC

Video Prediction

Cityscapes 128x128 - Train

Skin Cancer Segmentation

Kaggle Skin Lesion Segmentation - F1 score

Skin Cancer Segmentation

Kaggle Skin Lesion Segmentation - AUC

Few-Shot 3D Point Cloud Classification

ModelNet40 10-way (20-shot) - Standard Deviation

Speech Synthesis

North American English - Mean Opinion Score

Table-to-Text Generation

WikiBio - BLEU

Unsupervised Image-To-Image Translation

Freiburg Forest Dataset - PSNR

Open-Domain Question Answering

Quasar - EM (Quasar-T)

Open-Domain Question Answering

Quasar - F1 (Quasar-T)

Video Prediction

KTH - Pred

Video Prediction

KTH - Train

Head Pose Estimation

BIWI - MAE (trained with BIWI data)

Liver Segmentation

LiTS2017 - Dice

Pancreas Segmentation

TCIA Pancreas-CT Dataset - Dice Score

Sketch-Based Image Retrieval

Handbags - R@1

Sketch-Based Image Retrieval

Handbags - R@10

Sketch-Based Image Retrieval

Chairs - R@1

Pedestrian Detection

TJU-Ped-campus - RS (miss rate)

Color Image Denoising

CBSD68 sigma50 - PSNR

Visual Relationship Detection

VRD Predicate Detection - R@100

Visual Relationship Detection

VRD Phrase Detection - R@100

Visual Relationship Detection

VRD Phrase Detection - R@50

Visual Relationship Detection

VRD Relationship Detection - R@100

Visual Relationship Detection

VRD Relationship Detection - R@50

Image Super-Resolution

FFHQ 512 x 512 - 4x upscaling - LLE

Semantic Similarity


Semantic Similarity

SICK - Pearson Correlation

Semantic Similarity

SICK - Spearman Correlation

Image Super-Resolution

WebFace - 8x upscaling - PSNR

Sentence Compression

Google Dataset - CR

Image-guided Story Ending Generation


Image-guided Story Ending Generation



Def_Infantry_sequential - Median Win Rate


Off_Superhard_parallel - Median Win Rate

Action Segmentation

JIGSAWS - Accuracy

AMR Parsing

LDC2015E86 - Smatch

Multimodal Activity Recognition

EV-Action - Accuracy

3D Part Segmentation

ShapeNet-Part - Instance Average IoU

Multimodal Unsupervised Image-To-Image Translation

Cats-and-Dogs - CIS

Multimodal Unsupervised Image-To-Image Translation

Cats-and-Dogs - IS

Few-Shot Image Classification

CUB 200 50-way (0-shot) - Accuracy

Keypoint Detection

Pascal3D+ - Mean PCK

Hand Gesture Recognition

ChaLearn val - Accuracy

Malware Detection

Android Malware Dataset - Accuracy

Conversation Disentanglement

Linux IRC (Ch2 Elsner) - 1-1

Conversation Disentanglement

Linux IRC (Ch2 Elsner) - Local

Conversation Disentanglement

Linux IRC (Ch2 Elsner) - Shen F-1

Conversation Disentanglement

Linux IRC (Ch2 Kummerfeld) - 1-1

Skeleton Based Action Recognition

Gaming 3D (G3D) - Accuracy

Hypernym Discovery

Music domain - MAP

Hypernym Discovery

Music domain - MRR

Hypernym Discovery

Music domain - P@5

Hypernym Discovery

General - MAP

Hypernym Discovery

General - MRR

Hypernym Discovery

General - P@5

Hypernym Discovery

Medical domain - MAP

Hypernym Discovery

Medical domain - MRR

Hypernym Discovery

Medical domain - P@5

Open-Domain Question Answering

SQuAD1.1 - EM

Image-to-Image Translation

Cityscapes Labels-to-Photo - Class IOU

Image-to-Image Translation

Cityscapes Photo-to-Labels - Class IOU

Atari Games

Atari 2600 Freeway - Score

Math Word Problem Solving

ALG514 - Accuracy (%)

Image Super-Resolution

BSD100 - 4x upscaling - MOS

Image Super-Resolution

Set14 - 4x upscaling - MOS

Medical Image Classification

NCT-CRC-HE-100K - Precision

Relation Classification

SemEval 2010 Task 8 - F1

Few-Shot Image Classification

OMNIGLOT - 5-Shot, 5-way - Accuracy

Semi-Supervised Video Object Segmentation

DAVIS 2016 - Jaccard (Decay)

Semi-Supervised Video Object Segmentation

DAVIS 2016 - F-measure (Decay)

Video Salient Object Detection

SegTrack v2 - max E-measure

Sketch-Based Image Retrieval

Chairs - R@10

Cross-lingual zero-shot dependency parsing

Universal Dependency Treebank - LAS

Atari Games

Atari 2600 Wizard of Wor - Score

Photo geolocation estimation

Im2GPS - Reference images

Atari Games

Atari 2600 Centipede - Score

Atari Games

Atari 2600 Ms. Pacman - Score

Multimodal Activity Recognition

MSR Daily Activity3D dataset - Accuracy

Video Prediction

KTH - Cond

Atari Games

Atari 2600 Double Dunk - Score

Cross-Lingual Document Classification

Reuters RCV1/RCV2 German-to-English - Accuracy

Cross-Lingual Document Classification

Reuters RCV1/RCV2 English-to-German - Accuracy

Document Dating

APW - Accuracy