The field of artificial intelligence is advancing quickly. Reading research papers, and their source code where released, is essential to stay up to date. This is a list of articles I came across and consider interesting. That does not necessarily mean all of them are groundbreaking or hyped; I simply found them interesting while reading.

Q4/2024

  • Cheng et al. (2024): Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss. arXiv:2410.17243

  • Huang et al. (2024): LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation. arXiv:2411.04997

  • Li et al. (2024): AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions. arXiv:2410.20424
  • Lin et al. (2024): FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors. arXiv:2410.16271

  • Xu et al. (2024): LLaVA-CoT: Let Vision Language Models Reason Step-by-Step. arXiv:2411.10440
  • Xu et al. (2024): No More Adam: Learning Rate Scaling at Initialization is All You Need. arXiv:2412.11768

Q3/2024

  • Fleischer et al. (2024): RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation. arXiv:2408.02545

  • Ruiz et al. (2024): Magic Insert: Style-Aware Drag-and-Drop. arXiv:2407.02489

  • Xiao et al. (2024): Enhancing HNSW Index for Real-Time Updates: Addressing Unreachable Points and Performance Degradation. arXiv:2407.07871

Q2/2024

  • Castells et al. (2024): EdgeFusion: On-Device Text-to-Image Generation. arXiv:2404.11925

  • Deep et al. (2024): DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling. arXiv:2406.11617

  • Faysse et al. (2024): ColPali: Efficient Document Retrieval with Vision Language Models. arXiv:2407.01449

  • Gagrani et al. (2024): On Speculative Decoding for Multimodal Large Language Models. arXiv:2404.08856

Q1/2024

  • Han et al. (2024): COCO is “ALL” You Need for Visual Instruction Fine-tuning. arXiv:2401.08968

  • Lu et al. (2024): From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities. arXiv:2401.15071

  • Ma et al. (2024): The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits. arXiv:2402.17764

  • Sun et al. (2024): EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters. arXiv:2402.04252

  • Wang et al. (2024): YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv:2402.13616

Q4/2023

  • Alizadeh et al. (2023): LLM in a flash: Efficient Large Language Model Inference with Limited Memory. arXiv:2312.11514

  • Garza and Mergenthaler-Canseco (2023): TimeGPT-1. arXiv:2310.03589

  • Li et al. (2023): Domain Generalization of 3D Object Detection by Density-Resampling. arXiv:2311.10845

  • Seras et al. (2023): Efficient Object Detection in Autonomous Driving using Spiking Neural Networks: Performance, Energy Consumption Analysis, and Insights into Open-set Object Discovery. arXiv:2312.07466

  • Wang et al. (2023): BitNet: Scaling 1-bit Transformers for Large Language Models. arXiv:2310.11453

  • Zhou et al. (2023): WaterHE-NeRF: Water-ray Tracing Neural Radiance Fields for Underwater Scene Reconstruction. arXiv:2312.06946

Q3/2023

  • Ding et al. (2023): LongNet: Scaling Transformers to 1,000,000,000 Tokens. arXiv:2307.02486

  • Karaev et al. (2023): CoTracker: It is Better to Track Together. arXiv:2307.07635

  • Sun et al. (2023): Retentive Network: A Successor to Transformer for Large Language Models. arXiv:2307.08621

  • Touvron et al. (2023): Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv:2307.09288

Q2/2023

  • Barchid et al. (2023): Spiking-Fer: Spiking Neural Network for Facial Expression Recognition With Event Cameras. arXiv:2304.10211
  • Bulatov et al. (2023): Scaling Transformer to 1M tokens and beyond with RMT. arXiv:2304.11062

  • Ducoffe et al. (2023): LARD – Landing Approach Runway Detection – Dataset for Vision Based Landing. arXiv:2304.09938

  • Kirillov et al. (2023): Segment Anything. arXiv:2304.02643

  • Lv et al. (2023): DETRs Beat YOLOs on Real-time Object Detection. arXiv:2304.08069

  • Pernias et al. (2023): Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models. arXiv:2306.00637

  • Zhou et al. (2023): LIMA: Less Is More for Alignment. arXiv:2305.11206

Q1/2023

  • Bauer et al. (2023): Human-Timescale Adaptation in an Open-Ended Task Space. arXiv:2301.07608

  • Cuadrado et al. (2023): Optical Flow estimation with Event-based Cameras and Spiking Neural Networks. arXiv:2302.06492

  • Li et al. (2023): BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. arXiv:2301.12597

  • Sahak et al. (2023): Denoising Diffusion Probabilistic Models for Robust Image Super-Resolution in the Wild. arXiv:2302.07864
  • Sauer et al. (2023): StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis. arXiv:2301.09515
  • Serych and Matas (2023): Planar Object Tracking via Weighted Optical Flow. arXiv:2301.10057
  • Shinn et al. (2023): Reflexion: an autonomous agent with dynamic memory and self-reflection. arXiv:2303.11366

  • Trabucco et al. (2023): Effective Data Augmentation With Diffusion Models. arXiv:2302.07944

  • Vallés-Pérez et al. (2023): Empirical study of the modulus as activation function in computer vision applications. arXiv:2301.05993

  • Wen et al. (2023): BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects. arXiv:2303.14158

  • Yang et al. (2023): Event Camera Data Pre-training. arXiv:2301.01928

  • Zhang et al. (2023): Multimodal Chain-of-Thought Reasoning in Language Models. arXiv:2302.00923

Q4/2022

  • Beyer et al. (2022): FlexiViT: One Model for All Patch Sizes. arXiv:2212.08013

  • Défossez et al. (2022): High Fidelity Neural Audio Compression. arXiv:2210.13438

  • Fan et al. (2022): Rolling Shutter Inversion: Bring Rolling Shutter Images to High Framerate Global Shutter Video. arXiv:2210.03040

  • Ghiasi et al. (2022): What do Vision Transformers Learn? A Visual Exploration. arXiv:2212.06727

  • Hinton (2022): The Forward-Forward Algorithm: Some Preliminary Investigations. arXiv:2212.13345

  • Li et al. (2022): Rethinking Vision Transformers for MobileNet Size and Speed. arXiv:2212.08059
  • Liu et al. (2022): Event-based Monocular Dense Depth Estimation with Recurrent Transformers. arXiv:2212.02791

  • Radford et al. (2022): Robust Speech Recognition via Large-Scale Weak Supervision. arXiv:2212.04356

  • Shaker et al. (2022): UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation. arXiv:2212.04497

  • Taylor et al. (2022): Galactica: A Large Language Model for Science. arXiv:2211.09085

Q3/2022

  • Boegner et al. (2022): Large Scale Radio Frequency Signal Classification. arXiv:2207.09918

  • Hu and Li (2022): Early Stopping for Iterative Regularization with General Loss Functions. JMLR 23.

  • Izacard et al. (2022): Few-shot Learning with Retrieval Augmented Language Models. arXiv:2208.03299

  • Renzulli and Grangetto (2022): Towards Efficient Capsule Networks. arXiv:2208.09203

  • Singer et al. (2022): Make-A-Video: Text-to-Video Generation without Text-Video Data. arXiv:2209.14792

  • Thai et al. (2022): Riesz-Quincunx-UNet Variational Auto-Encoder for Satellite Image Denoising. arXiv:2208.12810

  • Wang et al. (2022): YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv:2207.02696
  • Wen et al. (2022): CTL-MTNet: A Novel CapsNet and Transfer Learning-Based Mixed Task Net for the Single-Corpus and Cross-Corpus Speech Emotion Recognition. arXiv:2207.10644
  • Wu et al. (2022): TinyViT: Fast Pretraining Distillation for Small Vision Transformers. arXiv:2207.10666

  • Yao et al. (2022): Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning. arXiv:2207.04978

Q2/2022

  • Balestriero et al. (2022): The Effects of Regularization and Data Augmentation are Class Dependent. arXiv:2204.03632
  • Boutros et al. (2022): SFace: Privacy-friendly and Accurate Face Recognition using Synthetic Data. arXiv:2206.10520

  • De Sousa Ribeiro et al. (2022): Learning with Capsules: A Survey. arXiv:2206.02664

  • Gava et al. (2022): PUCK: Parallel Surface and Convolution-kernel Tracking for Event-Based Cameras. arXiv:2205.07657

  • Imbiriba et al. (2022): Hybrid Neural Network Augmented Physics-based Models for Nonlinear Filtering. arXiv:2204.06471

  • Lee et al. (2022): Fix the Noise: Disentangling Source Feature for Transfer Learning of StyleGAN. arXiv:2204.14079

  • Marchisio et al. (2022): Enabling Capsule Networks at the Edge through Approximate Softmax and Squash Operations. arXiv:2206.10200

  • Öztürk et al. (2022): Zero-Shot AutoML with Pretrained Models. arXiv:2206.08476

  • Reed et al. (2022): A Generalist Agent. arXiv:2205.06175
  • Renzulli et al. (2022): REM: Routing Entropy Minimization for Capsule Networks. arXiv:2204.01298
  • Rombach et al. (2022): High-Resolution Image Synthesis with Latent Diffusion Models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 10684-10695.

  • Scholl (2022): RF Signal Classification with Synthetic Training Data and its Real-World Performance. arXiv:2206.12967
  • Sun and Boning (2022): FreDo: Frequency Domain-based Long-Term Time Series Forecasting. arXiv:2205.12301

  • Wang et al. (2022): Recent Advances in Embedding Methods for Multi-Object Tracking: A Survey. arXiv:2205.10766

  • Zhang et al. (2022): MiniViT: Compressing Vision Transformers with Weight Multiplexing. arXiv:2204.07154
  • Zhang et al. (2022): OPT: Open Pre-trained Transformer Language Models. arXiv:2205.01068

Q1/2022

  • Akyon et al. (2022): Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection. arXiv:2202.06934
  • An et al. (2022): Killing Two Birds with One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC. arXiv:2203.15565

  • Bright et al. (2022): ME-CapsNet: A Multi-Enhanced Capsule Networks with Routing Mechanism. arXiv:2203.15547

  • Cao et al. (2022): Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking. arXiv:2203.14360

  • Drefs et al. (2022): Evolutionary Variational Optimization of Generative Models. JMLR 23(21).
  • Du et al. (2022): StrongSORT: Make DeepSORT Great Again. arXiv:2202.13514

  • Huang et al. (2022): 1000x Faster Camera and Machine Vision with Ordinary Devices. arXiv:2201.09302
  • Huang et al. (2022): Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. arXiv:2201.07207

  • Jin et al. (2022): Full RGB Just Noticeable Difference (JND) Modelling. arXiv:2203.00629

  • Lämsä et al. (2022): Video2IMU: Realistic IMU features and signals from videos. arXiv:2202.06547
  • Li et al. (2022): Brain-inspired Multilayer Perceptron with Spiking Neurons. arXiv:2203.14679
  • Li et al. (2022): SimpleTrack: Rethinking and Improving the JDE Approach for Multi-Object Tracking. arXiv:2203.03985
  • Liu et al. (2022): A ConvNet for the 2020s. arXiv:2201.03545

  • Manita et al. (2022): Universal Approximation in Dropout Neural Networks. JMLR 23.

  • Roros et al. (2022): maskGRU: Tracking Small Objects in the Presence of Large Background Motions. arXiv:2201.00467

  • Wang et al. (2022): OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework. arXiv:2202.03052

  • Yang et al. (2022): Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer. arXiv:2203.03466
  • Yu et al. (2022): HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network. arXiv:2203.10699

  • Zhou et al. (2022): TransVOD: End-to-end Video Object Detection with Spatial-Temporal Transformers. arXiv:2201.05047

Q4/2021

  • Chen and Shrivastava (2021): HR-RCNN: Hierarchical Relational Reasoning for Object Detection. arXiv:2110.13892

  • Datta and Beerel (2021): Can Deep Neural Networks be Converted to Ultra Low-Latency Spiking Neural Networks? arXiv:2112.12133
  • Du et al. (2021): Learning Signal-Agnostic Manifolds of Neural Fields. arXiv:2111.06387

  • Eichenberg et al. (2021): MAGMA – Multimodal Augmentation of Generative Models through Adapter-based Finetuning. arXiv:2112.05253

  • Kirby et al. (2021): Reliability of Event Timing in Silicon Neurons. arXiv:2112.14134
  • Koutini et al. (2021): Efficient Training of Audio Transformers with Patchout. arXiv:2110.05069
  • Kovachki et al. (2021): On Universal Approximation and Error Bounds for Fourier Neural Operators. JMLR 22(290)

  • Laakom et al. (2021): Learning to ignore: rethinking attention in CNNs. arXiv:2111.05684

  • Vinci et al. (2021): Self-consistent stochastic dynamics for finite-size networks of spiking neurons. arXiv:2112.14867

  • Yuan et al. (2021): Florence: A New Foundation Model for Computer Vision. arXiv:2111.11432

Q3/2021

  • Chae et al. (2021): SiamEvent: Event-based Object Tracking via Edge-aware Similarity Learning with Siamese Networks. arXiv:2109.13456

  • Guo et al. (2021): Eyes Tell All: Irregular Pupil Shapes Reveal GAN-generated Faces. arXiv:2109.00162

  • He et al. (2021): Integrating Circle Kernels into Convolutional Neural Networks. arXiv:2107.02451

  • Keller and Welling (2021): Topographic VAEs learn Equivariant Capsules. arXiv:2109.01394

  • Liu et al. (2021): Infrared Small-Dim Target Detection with Transformer under Complex Backgrounds. arXiv:2109.14379

  • Machado et al. (2021): HSMD: An object motion detection algorithm using a Hybrid Spiking Neural Network Architecture. arXiv:2109.04119

  • Park et al. (2021): Is Pseudo-Lidar needed for Monocular 3D Object detection? arXiv:2108.06417
  • Peng et al. (2021): Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation. arXiv:2109.12484

  • Shi et al. (2021): Reinforcement Learning with Evolutionary Trajectory Generator: A General Approach for Quadrupedal Locomotion. arXiv:2109.06409

  • Yao et al. (2021): Temporal-wise Attention Spiking Neural Networks for Event Streams Classification. arXiv:2107.11711

  • Zhao and Cheng (2021): Capsule networks with non-iterative cluster routing. arXiv:2109.09213
  • Zheng and Zhang (2021): RockGPT: Reconstructing three-dimensional digital rocks from single two-dimensional slice from the perspective of video generation. arXiv:2108.03132

Q2/2021

  • Bonnaerens et al. (2021): Anchor Pruning for Object Detection. arXiv:2104.00432
  • Bykov et al. (2021): NoiseGrad: enhancing explanations by introducing stochasticity to model weights. arXiv:2106.10185

  • Chakraborty et al. (2021): A Fully Spiking Hybrid Neural Network for Energy-Efficient Object Detection. arXiv:2104.10719
  • Chen et al. (2021): How to Accelerate Capsule Convolutions in Capsule Networks. arXiv:2104.02621
  • Chen et al. (2021): “BNN - BN = ?”: Training Binary Neural Networks without Batch Normalization. arXiv:2104.08215

  • Liu et al. (2021): Pay Attention to MLPs. arXiv:2105.08050
  • Liu et al. (2021): Video Swin Transformer. arXiv:2106.13230

  • Ney et al. (2021): HALF: Holistic Auto Machine Learning for FPGAs. arXiv:2106.14771

  • Wu et al. (2021): Poisoning the Search Space in Neural Architecture Search. arXiv:2106.14406

  • Xiao et al. (2021): Early Convolutions Help Transformers See Better. arXiv:2106.14881

  • Zhang et al. (2021): Hallucination Improves Few-Shot Object Detection. arXiv:2105.01294
  • Zhao et al. (2021): Neko: a Library for Exploring Neuromorphic Learning Rules. arXiv:2105.00324
  • Zhao et al. (2021): TrTr: Visual Tracking with Transformer. arXiv:2105.03817

Q1/2021

  • Ding et al. (2021): Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges. arXiv:2102.12219

  • Han et al. (2021): ReDet: A Rotation-equivariant Detector for Aerial Object Detection. arXiv:2103.07733

  • Jaegle et al. (2021): Perceiver: General Perception with Iterative Attention. arXiv:2103.03206
  • Joseph et al. (2021): Towards Open World Object Detection. arXiv:2103.02603

  • Lee et al. (2021): Detecting Micro Fractures with X-ray Computed Tomography. arXiv:2103.12821
  • Li et al. (2021): Involution: Inverting the Inherence of Convolution for Visual Recognition. arXiv:2103.06255
  • Liu et al. (2021): Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. arXiv:2103.14030

  • Mazzia et al. (2021): Efficient-CapsNet: Capsule Network with Self-Attention Routing. arXiv:2101.12491

  • Northcutt et al. (2021): Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks. arXiv:2103.14749

  • Ren et al. (2021): Deep Texture-Aware Features for Camouflaged Object Detection. arXiv:2102.02996
  • Runkel et al. (2021): Depthwise Separable Convolutions Allow for Fast and Memory-Efficient Spectral Normalization. arXiv:2102.06496

  • Titirsha et al. (2021): Endurance-Aware Mapping of Spiking Neural Networks to Neuromorphic Hardware. arXiv:2103.05707
  • Tuggener et al. (2021): Is it Enough to Optimize CNN Architectures on ImageNet? arXiv:2103.09108

  • Zhou et al. (2021): Probabilistic two-stage detection. arXiv:2103.07461

Q4/2020

  • Awad et al. (2020): Differential Evolution for Neural Architecture Search. arXiv:2012.06400

  • Chen et al. (2020): A Group-Theoretic Framework for Data Augmentation. JMLR 21(245): 1-71

  • Gerg and Monga (2020): Deep Autofocus for Synthetic Aperture Sonar. arXiv:2010.15687

  • Hu et al. (2020): Multi-objective Neural Architecture Search with Almost No Training. arXiv:2011.13591

  • Kedziora et al. (2020): AutonoML: Towards an Integrated Framework for Autonomous Machine Learning. arXiv:2012.12600
  • Keller et al. (2020): Self Normalizing Flows. arXiv:2011.07248
  • Kileel et al. (2020): Manifold learning with arbitrary norms. arXiv:2012.14172

  • Li and Jordan (2020): Stochastic Approximation for Online Tensorial Independent Component Analysis. arXiv:2012.14415
  • Li et al. (2020): Underwater image filtering: methods, datasets and evaluation. arXiv:2012.12258
  • Lindauer and Hutter (2020): Best Practices for Scientific Research on Neural Architecture Search. JMLR 21(243): 1-18
  • Liu et al. (2020): YolactEdge: Real-time Instance Segmentation on the Edge (Jetson AGX Xavier: 30 FPS, RTX 2080 Ti: 170 FPS). arXiv:2012.12259
  • Luo and Jennings (2020): A Differential Privacy Mechanism that Accounts for Network Effects for Crowdsourcing Systems. JAIR 69, 1127-1164. doi: 10.1613/jair.1.12158

  • Neekhara et al. (2020): Adversarial Threats to DeepFake Detection: A Practical Perspective. arXiv:2011.09957

  • Pang et al. (2020): TROJANZOO: Everything you ever wanted to know about neural backdoors (but were afraid to ask). arXiv:2012.09302

  • Rock et al. (2020): Quantized Neural Networks for Radar Interference Mitigation. arXiv:2011.12706

  • Salman et al. (2020): Unadversarial Examples: Designing Objects for Robust Vision. arXiv:2012.12235
  • Schrittwieser et al. (2020): Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588: 604-609. doi:10.1038/s41586-020-03051-4
  • Sheeny (2020): All-Weather Object Recognition Using Radar and Infrared Sensing. arXiv:2010.16285
  • Shen et al. (2020): DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation. arXiv:2011.09876
  • Sushko et al. (2020): You Only Need Adversarial Supervision for Semantic Image Synthesis. arXiv:2012.04781
  • Sun et al. (2020): Extreme Value Preserving Networks. arXiv:2011.08367
  • Sun et al. (2020): Identifying Invariant Texture Violation for Robust Deepfake Detection. arXiv:2012.10580
  • Svendsen et al. (2020): Deep Gaussian Processes for geophysical parameter retrieval. arXiv:2012.12099

  • Wandel et al. (2020): Fast Fluid Simulations in 3D with Physics-Informed Deep Learning. arXiv:2012.11893
  • Weston et al. (2020): There and Back Again: Learning to Simulate Radar Data for Real-World Applications. arXiv:2011.14389

  • Xie et al. (2020): Skillearn: Machine Learning Inspired by Humans’ Learning Skills. arXiv:2012.04863

  • Yu et al. (2020): HMFlow: Hybrid Matching Optical Flow Network for Small and Fast-Moving Objects. arXiv:2011.09654
  • Yue et al. (2020): Effective, Efficient and Robust Neural Architecture Search. arXiv:2011.09820

  • Zhang et al. (2020): FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations. arXiv:2012.12206
  • Zhu et al. (2020): Integrating Deep Neural Networks with Full-waveform Inversion: Reparametrization, Regularization, and Uncertainty Quantification. arXiv:2012.11149

Q3/2020

  • Agrawal et al. (2020): Wide Neural Networks with Bottlenecks are Deep Gaussian Processes. JMLR 21(175)

  • Bonald et al. (2020): Scikit-network: Graph Analysis in Python. JMLR 21(185)

  • Chen et al. (2020): Learning Deep ReLU Networks Is Fixed-Parameter Tractable. arXiv:2009.13512
  • Chen et al. (2020): WaveGrad: Estimating Gradients for Waveform Generation. arXiv:2009.00713

  • Davies et al. (2020): Overfit Neural Networks as a Compact Shape Representation. arXiv:2009.09808

  • Feurer et al. (2020): Auto-Sklearn 2.0: The Next Generation. arXiv:2007.04074
  • Fuchs and Pernkopf (2020): Wasserstein Routed Capsule Networks. arXiv:2007.11465

  • Guo et al. (2020): Variational Temporal Deep Generative Model for Radar HRRP Target Recognition. arXiv:2009.13011

  • Kidger et al. (2020): “Hey, that’s not an ODE”: Faster ODE Adjoints with 12 Lines of Code. arXiv:2009.09457

  • Long et al. (2020): PP-YOLO: An Effective and Efficient Implementation of Object Detector. arXiv:2007.12099

  • Morrill et al. (2020): Neural CDEs for Long Time-Series via the Log-ODE Method. arXiv:2009.08295

  • Nguyen et al. (2020): Quaternion Graph Neural Networks. arXiv:2008.05089

  • Obukhov et al. (2020): T-Basis: a Compact Representation for Neural Networks. arXiv:2007.06631

  • Perot et al. (2020): Learning to Detect Objects with a 1 Megapixel Event Camera. arXiv:2009.13436

  • Reuther et al. (2020): Survey of Machine Learning Accelerators. arXiv:2009.00993

  • Shen and Savvides (2020): MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks. arXiv:2009.08453

  • Tek et al. (2020): Adaptive Convolution Kernel for Artificial Neural Networks. arXiv:2009.06385

  • Wunderlich and Pehle (2020): EventProp: Backpropagation for Exact Gradients in Spiking Neural Networks. arXiv:2009.08378

  • Xiang et al. (2020): KIT MOMA: A Mobile Machines Dataset. arXiv:2007.04198

Q2/2020

  • Ahmed et al. (2020): Reinforcement Learning based Beamforming for Massive MIMO Radar Multi-target Detection. arXiv:2005.04708

  • Brown et al. (2020): Language Models are Few-Shot Learners. arXiv:2005.14165

  • Carion et al. (2020): End-to-End Object Detection with Transformers. arXiv:2005.12872
  • Cheng et al. (2020): Detecting and Tracking Communal Bird Roosts in Weather Radar Data. arXiv:2004.12819
  • Cui et al. (2020): Fully Convolutional Online Tracking. arXiv:2004.07109

  • Dogra and Redman (2020): Optimizing Neural Networks via Koopman Operator Theory. arXiv:2006.02361

  • Geirhos et al. (2020): Shortcut Learning in Deep Neural Networks. arXiv:2004.07780

  • Hernandez and Brown (2020): Measuring the Algorithmic Efficiency of Neural Networks. arXiv:2005.04305
  • Huang et al. (2020): SQE: a Self Quality Evaluation Metric for Parameters Optimization in Multi-Object Tracking. arXiv:2004.07472
  • Hupkes et al. (2020): Compositionality Decomposed: How do Neural Networks Generalise? JAIR (67), 757-795. doi:10.1613/jair.1.11674

  • Lee et al. (2020): Continual Learning with Extended Kronecker-factored Approximate Curvature. arXiv:2004.07507
  • Lelekas et al. (2020): Top-Down Networks: A coarse-to-fine reimagination of CNNs. arXiv:2004.07629
  • Li et al. (2020): SmallBigNet: Integrating Core and Contextual Views for Video Classification. arXiv:2006.14582

  • Marchisio et al. (2020): Q-CapsNets: A Specialized Framework for Quantizing Capsule Networks. arXiv:2004.07116
  • Marvasti-Zadeh et al. (2020): COMET: Context-Aware IoU-Guided Network for Small Object Tracking. arXiv:2006.02597
  • Mobiny et al. (2020): Radiologist-Level COVID-19 Detection Using CT Scans with Detail-Oriented Capsule Networks. arXiv:2004.07407

  • Palffy et al. (2020): CNN based Road User Detection using the 3D Radar Cube. arXiv:2004.12165
  • Park et al. (2020): Variational Bayes In Private Settings (VIPS). JAIR (68), 109-157. doi:10.1613/jair.1.11763

  • Ouaknine et al. (2020): CARRADA Dataset: Camera and Automotive Radar with Range-Angle-Doppler Annotations. arXiv:2005.01456
  • Qiu et al. (2020): Quaternion Neural Networks for Multi-channel Distant Speech Recognition. arXiv:2005.08566

  • Scheiner et al. (2020): Off-the-shelf sensor vs. experimental radar – How much resolution is necessary in automotive radar classification? arXiv:2006.05485
  • Shuai et al. (2020): Multi-Object Tracking with Siamese Track-RCNN. arXiv:2004.07786
  • Sitzmann et al. (2020): Implicit Neural Representations with Periodic Activation Functions. arXiv:2006.09661

  • Thornton et al. (2020): Deep Reinforcement Learning Control for Radar Detection and Tracking in Congested Spectral Environments. arXiv:2006.13173
  • Toyer et al. (2020): ASNets: Deep Learning for Generalised Planning. JAIR (68), 1-68. doi:10.1613/jair.1.11633

  • Dewil et al. (2020): Self-Supervised training for blind multi-frame video denoising. arXiv:2004.06957

  • Wang et al. (2020): Residual-driven Fuzzy C-Means Clustering for Image Segmentation. arXiv:2004.07160
  • Wiedemann et al. (2020): Dithered backprop: A sparse and quantized backpropagation algorithm for more efficient deep neural network training. arXiv:2004.04729

  • Zhang et al. (2020): Ocean: Object-aware Anchor-free Tracking. arXiv:2006.10721
  • Zhao et al. (2020): TSDM: Tracking by SiamRPN++ with a Depth-refiner and a Mask-generator. arXiv:2005.04063

Q1/2020

  • Arias-Castro et al. (2020): Perturbation Bounds for Procrustes, Classical Scaling, and Trilateration, with Applications to Manifold Learning. JMLR 21

  • Blondel et al. (2020): Learning with Fenchel-Young losses. JMLR 21(35):1-69

  • Danelljan et al. (2020): Probabilistic Regression for Visual Tracking. arXiv:2003.12565
  • Deng et al. (2020): Self-attention-based BiGRU and capsule network for named entity recognition. arXiv:2002.00735

  • Edraki et al. (2020): Subspace Capsule Network. arXiv:2002.02924v1

  • Hadjeres and Nielsen (2020): Schoenberg-Rao distances: Entropy-based and geometry-aware statistical Hilbert distances. arXiv:2002.08345

  • Jia et al. (2020): Entangled Watermarks as a Defense against Model Extraction. arXiv:2002.12200

  • Kadeethum et al. (2020): Physics-informed Neural Networks for Solving Nonlinear Diffusivity and Biot’s equations. arXiv:2002.08235

  • Liu et al. (2020): Are Labels Necessary for Neural Architecture Search? arXiv:2003.12056

  • Manchev and Spratling (2020): Target Propagation in Recurrent Neural Networks. JMLR 21(7): 1-33.
  • Molnar and Culurciello (2020): Capsule Network Performance with Autonomous Navigation. arXiv:2002.03181v1

  • Punjabi et al. (2020): Examining the Benefits of Capsule Neural Networks. arXiv:2001.10964

  • Radosavovic et al. (2020): Designing Network Design Spaces. arXiv:2003.13678
  • Rogers et al. (2020): A Primer in BERTology: What we know about how BERT works. arXiv:2002.12327
  • Romero et al. (2020): Attentive Group Equivariant Convolutional Networks. arXiv:2002.03830
  • Ruby et al. (2020): The Mertens Unrolled Network (MU-Net): A High Dynamic Range Fusion Neural Network for Through the Windshield Driver Recognition. arXiv:2002.12257

  • Schmitt et al. (2020): Weakly Supervised Semantic Segmentation of Satellite Images for Land Cover Mapping – Challenges and Opportunities. arXiv:2002.08254v1

  • Tang et al. (2020): RSL-Net: Localising in Satellite Images From a Radar on the Ground. arXiv:2001.03233
  • Thornton et al. (2020): Experimental Analysis of Reinforcement Learning Techniques for Spectrum Sharing Radar. arXiv:2001.01799
  • Tsai et al. (2020): Capsules with Inverted Dot-Product Attention Routing. ICLR 2020

  • Vecchi et al. (2020): Compressing deep quaternion neural networks with targeted regularization. arXiv:1907.11546v2

  • Wang et al. (2020): Multi-wavelet residual dense convolutional neural network for image denoising. arXiv:2002.08254

  • Yoo and Owhadi (2020): Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows. arXiv:2002.08335

Q4/2019

  • Dovesi et al. (2019): Real-Time Semantic Stereo Matching. arXiv:1910.00541

  • Gu and Tresp (2019): Improving the Robustness of Capsule Networks to Image Affine Transformations. arXiv:1911.07968

  • Hoogi et al. (2019): Self-Attention Capsule Networks for Object Classification. arXiv:1904.12483
  • Hwang et al. (2019): SegSort: Segmentation by Discriminative Sorting of Segments. arXiv:1910.06962

  • Jegorova et al. (2019): Full-Scale Continuous Synthetic Sonar Data Generation with Markov Conditional Generative Adversarial Networks. arXiv:1910.06750

  • Liu et al. (2019): GPRInvNet: Deep Learning-Based Ground Penetrating Radar Data Inversion for Tunnel Lining. arXiv:1912.05759

  • Nguyen et al. (2019): Use of a Capsule Network to Detect Fake Images and Videos. arXiv:1910.12467

  • Scheiner et al. (2019): Seeing Around Street Corners: Non-Line-of-Sight Detection and Tracking In-the-Wild Using Doppler Radar. arXiv:1912.06613

  • Varadarajan et al. (2019): Benchmark for Generic Product Detection: A strong baseline for Dense Object Detection. arXiv:1912.09476

  • Wang et al. (2019): CSPNet: A New Backbone that can Enhance Learning Capability of CNN. arXiv:1911.11929
  • Weissman et al. (2019): JackHammer: Efficient Rowhammer on Heterogeneous FPGA-CPU Platform. arXiv:1912.11523

  • Zhang et al. (2019): 3D-Rotation-Equivariant Quaternion Neural Networks. arXiv:1911.09040
  • Zhao et al. (2019): Quaternion Equivariant Capsule Networks for 3D Point Clouds. arXiv:1912.12098

Q3/2019

  • Andraghetti et al. (2019): Enhancing self-supervised monocular depth estimation with traditional visual odometry. arXiv:1908.03127

  • Caliva et al. (2019): Distance Map Loss Penalty Term for Semantic Segmentation. arXiv:1908.03679
  • Chen et al. (2019): Fast Point R-CNN. arXiv:1908.02990
  • Choi et al. (2019): Attention routing between capsules. arXiv:1907.01750

  • Duggal et al. (2019): DeepPruner: Learning Efficient Stereo Matching via Differentiable PatchMatch. arXiv:1909.05845

  • Garnier et al. (2019): A review on Deep Reinforcement Learning for Fluid Mechanics. arXiv:1908.04127
  • Gong et al. (2019): AutoGAN: Neural Architecture Search for Generative Adversarial Networks. arXiv:1908.03835

  • He et al. (2019): Constructing an Associative Memory System Using Spiking Neural Network. Front. Neurosci., DOI:10.3389/fnins.2019.00650
  • Huegle et al. (2019): Dynamic Input for Deep Reinforcement Learning in Autonomous Driving. arXiv:1907.10994

  • Kim and Ganapathi (2019): LumièreNet: Lecture Video Synthesis from Audio. arXiv:1907.02253
  • Kulhánek et al. (2019): Vision-based Navigation Using Deep Reinforcement Learning. arXiv:1908.03627

  • Lee et al. (2019): On-Device Neural Net Inference with Mobile GPUs. arXiv:1907.01989
  • Li et al. (2019): Deformable Tube Network for Action Detection in Videos. arXiv:1907.01847
  • Li et al. (2019): Overfitting of neural nets under class imbalance: Analysis and improvements for segmentation. arXiv:1907.10982
  • Li et al. (2019): Differentially Private Meta-Learning. arXiv:1909.05830
  • Liu et al. (2019): On the Variance of the Adaptive Learning Rate and Beyond. arXiv:1908.03265

  • Misra et al. (2019): Mish: A Self Regularized Non-Monotonic Neural Activation Function. arXiv:1908.08681

  • Qin et al. (2019): Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions. arXiv:1907.02957

  • Soures and Kudithipudi (2019): Deep Liquid State Machines With Neural Plasticity for Video Activity Recognition. Front. Neurosci., DOI:10.3389/fnins.2019.00686

  • Wang and Shen (2019): Flow-Motion and Depth Network for Monocular Stereo and Beyond. arXiv:1909.05452

  • You et al. (2019): Tracking system of Mine Patrol Robot for Low Illumination Environment. arXiv:1907.01806
  • You et al. (2019): Gate Decorator: Global Filter Pruning Method for Accelerating Deep Convolutional Neural Networks. arXiv:1909.08174

  • Zhang et al. (2019): Lookahead Optimizer: k steps forward, 1 step back. arXiv:1907.08610
  • Zhang et al. (2019): SlimYOLOv3: Narrower, Faster and Better for Real-Time UAV Applications. arXiv:1907.11093
  • Zhao et al. (2019): UER: An Open-Source Toolkit for Pre-training Models. arXiv:1909.05658
  • Zhou et al. (2019): One-stage Shape Instantiation from a Single 2D Image to 3D Point Cloud. arXiv:1907.10763

Q2/2019

  • Alekseev and Bobe (2019): GaborNet: Gabor filters with learnable parameters in deep convolutional neural networks. arXiv:1904.13204
  • Ardila et al. (2019): End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. DOI:10.1038/s41591-019-0447-x

  • Bai et al. (2019): Deep Learning Based Robot for Automatically Picking up Garbage on the Grass. arXiv:1904.13034
  • Balog et al. (2019): Fast Training of Sparse Graph Neural Networks on Dense Hardware. arXiv:1906.11786
  • Becker et al. (2019): Deep Optimal Stopping. Journal of Machine Learning Research 20 (2019) 1-25
  • Berner et al. (2019): How degenerate is the parametrization of neural networks with the ReLU activation function? arXiv:1905.09803
  • Brandt (2019): Spatio-temporal crop classification of low-resolution satellite imagery with capsule layers and distributed attention. arXiv:1904.10130

  • Danzer et al. (2019): 2D Car Detection in Radar Data with PointNets. arXiv:1904.08414
  • Drori et al. (2019): Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar. arXiv:1905.10345

  • Eggensperger (2019): Pitfalls and Best Practices in Algorithm Configuration. Journal of Artificial Intelligence Research 64 (2019) 861-893

  • Harikrishnan and Nagaraj (2019): A Novel Chaos Theory Inspired Neural Architecture. arXiv:1905.12601
  • Hoogi et al. (2019): Self-Attention Capsule Networks for Image Classification. arXiv:1904.12483
  • Hu et al. (2019): Optimal Sparse Decision Trees. arXiv:1904.12847
  • Hughes et al. (2019): Wave Physics as an Analog Recurrent Neural Network. arXiv:1904.12831

  • Jia et al. (2019): Direct speech-to-speech translation with a sequence-to-sequence model. arXiv:1904.06037

  • Klemmer et al. (2019): Augmenting correlation structures in spatial data using deep generative models. arXiv:1905.09796
  • Kosiorek et al. (2019): Stacked Capsule Autoencoders. arXiv:1906.06818

  • Leite and Enembreck (2019): Using Collective Behavior of Coupled Oscillators for Solving DCOP. Journal of Artificial Intelligence Research 64 (2019) 987-1023
  • Li (2019): Graph Matching Networks for Learning the Similarity of Graph Structured Objects. arXiv:1904.12787

  • Nguyen and Holmes (2019): Ten quick tips for effective dimensionality reduction. PLoS Comput Biol 15(6): e1006907. DOI:10.1371/journal.pcbi.1006907

  • Oh et al. (2019): Speech2Face: Learning the Face Behind a Voice. arXiv:1905.09773

  • Park et al. (2019): SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. arXiv:1904.08779

  • Rajasegaran et al. (2019): DeepCaps: Going Deeper with Capsule Networks. arXiv:1904.09546

  • Sanyal et al. (2019): Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision. arXiv:1905.06817
  • Sherry et al. (2019): Learning the Sampling Pattern for MRI. arXiv:1906.08754
  • Shin (2019): Encoding Database Schemas with Relation-Aware Self-Attention for Text-to-SQL Parsers. arXiv:11790
  • Sun et al. (2019): GeoCapsNet: Aerial to Ground view Image Geo-localization using Capsule Network. arXiv:1904.06281

  • Thomas et al. (2019): DeLiO: Decoupled LiDAR Odometry. arXiv:1904.12667

  • Valade et al. (2019): Towards Global Volcano Monitoring Using Multisensor Sentinel Missions and Artificial Intelligence: The MOUNTS Monitoring System. DOI:10.3390/rs11131528

  • Wang et al. (2019): Monocular Plan View Networks for Autonomous Driving. arXiv:1905.06937

  • Zhang (2019): Making Convolutional Networks Shift-Invariant Again. arXiv:1904.11486
  • Zhang et al. (2019): Quaternion Knowledge Graph Embedding. arXiv:1904.10281
  • Zhang et al. (2019): You Only Propagate Once: Accelerate Adversarial Training via Maximal Principle. arXiv:1905.00877
  • Zhao et al. (2019): Fast Inference in Capsule Networks Using Accumulated Routing Coefficients. arXiv:1904.07304
  • Zhao et al. (2019): PyOD: A Python Toolbox for Scalable Outlier Detection. JMLR 20(96):1-7. http://jmlr.org/papers/v20/19-011.html
  • Zhu et al. (2019): Transferable Clean-Label Poisoning Attacks on Deep Neural Nets. arXiv:1905.05897

Q1/2019

  • Barz and Denzler (2019): Deep Learning on Small Datasets without Pre-Training using Cosine Loss. arXiv:1901.09054

  • Cheng et al. (2019): MeshGAN: Non-linear 3D Morphable Models of Faces. arXiv:1903.10384

  • Duarte et al. (2019): Wav2Pix: Speech-conditioned Face Generation using Generative Adversarial Networks. arXiv:1903.10195

  • Elser et al. (2019): Monotone Learning with Rectified Wire Networks. Journal of Machine Learning Research 20, 1-42

  • Fey and Lenssen (2019): Fast Graph Representation Learning with PyTorch Geometric. arXiv:1903.02428
  • Francis et al. (2019): Long-Range Indoor Navigation with PRM-RL. arXiv:1902.09458

  • Ge et al. (2019): DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images. arXiv:1901.07973

  • Hawkins et al. (2019): A Framework for Intelligence and Cortical Function Based on Grid Cells in the Neocortex. DOI:10.3389/fncir.2018.00121

  • Kreiss et al. (2019): PifPaf: Composite Fields for Human Pose Estimation. arXiv:1903.06593

  • Li et al. (2019): Rethinking on Multi-Stage Networks for Human Pose Estimation. arXiv:1901.00148

  • Mirsky et al. (2019): CT-GAN: Malicious Tampering of 3D Medical Imagery using Deep Learning. arXiv:1901.03597

  • Sonoda and Murata (2019): Transport Analysis of Infinitely Deep Neural Network. Journal of Machine Learning Research 20, 1-52
  • Sun et al. (2019): Deep High-Resolution Representation Learning for Human Pose Estimation. arXiv:1902.09212

  • Tang and Hwang (2019): MOANA: An Online Learned Adaptive Appearance Model for Robust Multiple Object Tracking in 3D. arXiv:1901.02626

  • Voigtlaender et al. (2019): FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation. arXiv:1902.09513

  • Wofk et al. (2019): FastDepth: Fast Monocular Depth Estimation on Embedded Systems. arXiv:1903.03273
  • Wu et al. (2019): Simplifying Graph Convolutional Networks. arXiv:1902.07153

  • Xinyi and Chen (2019): Capsule Graph Neural Network. ICLR 2019
  • Xu et al. (2019): Graph Wavelet Neural Network. ICLR 2019

Q4/2018