The field of artificial intelligence is advancing quickly. Reading research papers, and their source code where released, is essential to stay up to date. This is a list of articles I came across and consider interesting. That does not necessarily mean all of them are groundbreaking or hyped; I simply found them interesting while reading.

Q4/2024

  • Cheng et al. (2024): Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss. arXiv:2410.17243

  • Huang et al. (2024): LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation. arXiv:2411.04997

  • Li et al. (2024): AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions. arXiv:2410.20424
  • Lin et al. (2024): FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors. arXiv:2410.16271

  • Xu et al. (2024): LLaVA-CoT: Let Vision Language Models Reason Step-by-Step. arXiv:2411.10440
  • Xu et al. (2024): No More Adam: Learning Rate Scaling at Initialization is All You Need. arXiv:2412.11768

Q3/2024

  • Fleischer et al. (2024): RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation. arXiv:2408.02545

  • Ruiz et al. (2024): Magic Insert: Style-Aware Drag-and-Drop. arXiv:2407.02489

  • Xiao et al. (2024): Enhancing HNSW Index for Real-Time Updates: Addressing Unreachable Points and Performance Degradation. arXiv:2407.07871

Q2/2024

  • Castells et al. (2024): EdgeFusion: On-Device Text-to-Image Generation. arXiv:2404.11925

  • Deep et al. (2024): DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling. arXiv:2406.11617

  • Faysse et al. (2024): ColPali: Efficient Document Retrieval with Vision Language Models. arXiv:2407.01449

  • Gagrani et al. (2024): On Speculative Decoding for Multimodal Large Language Models. arXiv:2404.08856

Q1/2024

  • Han et al. (2024): COCO is “ALL” You Need for Visual Instruction Fine-tuning. arXiv:2401.08968

  • Lu et al. (2024): From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities. arXiv:2401.15071

  • Ma et al. (2024): The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits. arXiv:2402.17764

  • Sun et al. (2024): EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters. arXiv:2402.04252

  • Wang et al. (2024): YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv:2402.13616

Q4/2023

  • Alizadeh et al. (2023): LLM in a flash: Efficient Large Language Model Inference with Limited Memory. arXiv:2312.11514

  • Garza and Mergenthaler-Canseco (2023): TimeGPT-1. arXiv:2310.03589

  • Li et al. (2023): Domain Generalization of 3D Object Detection by Density-Resampling. arXiv:2311.10845

  • Seras et al. (2023): Efficient Object Detection in Autonomous Driving using Spiking Neural Networks: Performance, Energy Consumption Analysis, and Insights into Open-set Object Discovery. arXiv:2312.07466

  • Wang et al. (2023): BitNet: Scaling 1-bit Transformers for Large Language Models. arXiv:2310.11453

  • Zhou et al. (2023): WaterHE-NeRF: Water-ray Tracing Neural Radiance Fields for Underwater Scene Reconstruction. arXiv:2312.06946

Q3/2023

  • Ding et al. (2023): LongNet: Scaling Transformers to 1,000,000,000 Tokens. arXiv:2307.02486

  • Karaev et al. (2023): CoTracker: It is Better to Track Together. arXiv:2307.07635

  • Sun et al. (2023): Retentive Network: A Successor to Transformer for Large Language Models. arXiv:2307.08621

  • Touvron et al. (2023): Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv:2307.09288

Q2/2023

  • Barchid et al. (2023): Spiking-Fer: Spiking Neural Network for Facial Expression Recognition With Event Cameras. arXiv:2304.10211
  • Bulatov et al. (2023): Scaling Transformer to 1M tokens and beyond with RMT. arXiv:2304.11062

  • Ducoffe et al. (2023): LARD – Landing Approach Runway Detection – Dataset for Vision Based Landing. arXiv:2304.09938

  • Kirillov et al. (2023): Segment Anything. arXiv:2304.02643

  • Lv et al. (2023): DETRs Beat YOLOs on Real-time Object Detection. arXiv:2304.08069

  • Pernias et al. (2023): Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models. arXiv:2306.00637

  • Zhou et al. (2023): LIMA: Less Is More for Alignment. arXiv:2305.11206

Q1/2023

  • Bauer et al. (2023): Human-Timescale Adaptation in an Open-Ended Task Space. arXiv:2301.07608

  • Cuadrado et al. (2023): Optical Flow estimation with Event-based Cameras and Spiking Neural Networks. arXiv:2302.06492

  • Li et al. (2023): BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. arXiv:2301.12597

  • Sahak et al. (2023): Denoising Diffusion Probabilistic Models for Robust Image Super-Resolution in the Wild. arXiv:2302.07864
  • Sauer et al. (2023): StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis. arXiv:2301.09515
  • Serych and Matas (2023): Planar Object Tracking via Weighted Optical Flow. arXiv:2301.10057
  • Shinn et al. (2023): Reflexion: an autonomous agent with dynamic memory and self-reflection. arXiv:2303.11366

  • Trabucco et al. (2023): Effective Data Augmentation With Diffusion Models. arXiv:2302.07944

  • Vallés-Pérez et al. (2023): Empirical study of the modulus as activation function in computer vision applications. arXiv:2301.05993

  • Wen et al. (2023): BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects. arXiv:2303.14158

  • Yang et al. (2023): Event Camera Data Pre-training. arXiv:2301.01928

  • Zhang et al. (2023): Multimodal Chain-of-Thought Reasoning in Language Models. arXiv:2302.00923

Q4/2022

  • Beyer et al. (2022): FlexiViT: One Model for All Patch Sizes. arXiv:2212.08013

  • Défossez et al. (2022): High Fidelity Neural Audio Compression. arXiv:2210.13438

  • Fan et al. (2022): Rolling Shutter Inversion: Bring Rolling Shutter Images to High Framerate Global Shutter Video. arXiv:2210.03040

  • Ghiasi et al. (2022): What do Vision Transformers Learn? A Visual Exploration. arXiv:2212.06727

  • Hinton (2022): The Forward-Forward Algorithm: Some Preliminary Investigations. arXiv:2212.13345

  • Li et al. (2022): Rethinking Vision Transformers for MobileNet Size and Speed. arXiv:2212.08059
  • Liu et al. (2022): Event-based Monocular Dense Depth Estimation with Recurrent Transformers. arXiv:2212.02791

  • Radford et al. (2022): Robust Speech Recognition via Large-Scale Weak Supervision. arXiv:2212.04356

  • Shaker et al. (2022): UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation. arXiv:2212.04497

  • Taylor et al. (2022): Galactica: A Large Language Model for Science. arXiv:2211.09085

Q3/2022

  • Boegner et al. (2022): Large Scale Radio Frequency Signal Classification. arXiv:2207.09918

  • Hu and Li (2022): Early Stopping for Iterative Regularization with General Loss Functions. JMLR 23.

  • Izacard et al. (2022): Few-shot Learning with Retrieval Augmented Language Models. arXiv:2208.03299

  • Renzulli and Grangetto (2022): Towards Efficient Capsule Networks. arXiv:2208.09203

  • Singer et al. (2022): Make-A-Video: Text-to-Video Generation without Text-Video Data. arXiv:2209.14792

  • Thai et al. (2022): Riesz-Quincunx-UNet Variational Auto-Encoder for Satellite Image Denoising. arXiv:2208.12810

  • Wang et al. (2022): YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv:2207.02696
  • Wen et al. (2022): CTL-MTNet: A Novel CapsNet and Transfer Learning-Based Mixed Task Net for the Single-Corpus and Cross-Corpus Speech Emotion Recognition. arXiv:2207.10644
  • Wu et al. (2022): TinyViT: Fast Pretraining Distillation for Small Vision Transformers. arXiv:2207.10666

  • Yao et al. (2022): Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning. arXiv:2207.04978

Q2/2022

  • Balestriero et al. (2022): The Effects of Regularization and Data Augmentation are Class Dependent. arXiv:2204.03632
  • Boutros et al. (2022): SFace: Privacy-friendly and Accurate Face Recognition using Synthetic Data. arXiv:2206.10520

  • De Sousa Ribeiro et al. (2022): Learning with Capsules: A Survey. arXiv:2206.02664

  • Gava et al. (2022): PUCK: Parallel Surface and Convolution-kernel Tracking for Event-Based Cameras. arXiv:2205.07657

  • Imbiriba et al. (2022): Hybrid Neural Network Augmented Physics-based Models for Nonlinear Filtering. arXiv:2204.06471

  • Lee et al. (2022): Fix the Noise: Disentangling Source Feature for Transfer Learning of StyleGAN. arXiv:2204.14079

  • Marchisio et al. (2022): Enabling Capsule Networks at the Edge through Approximate Softmax and Squash Operations. arXiv:2206.10200

  • Öztürk et al. (2022): Zero-Shot AutoML with Pretrained Models. arXiv:2206.08476

  • Reed et al. (2022): A Generalist Agent. arXiv:2205.06175
  • Renzulli et al. (2022): REM: Routing Entropy Minimization for Capsule Networks. arXiv:2204.01298
  • Rombach et al. (2022): High-Resolution Image Synthesis with Latent Diffusion Models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 10684-10695.

  • Scholl (2022): RF Signal Classification with Synthetic Training Data and its Real-World Performance. arXiv:2206.12967
  • Sun and Boning (2022): FreDo: Frequency Domain-based Long-Term Time Series Forecasting. arXiv:2205.12301

  • Wang et al. (2022): Recent Advances in Embedding Methods for Multi-Object Tracking: A Survey. arXiv:2205.10766

  • Zhang et al. (2022): MiniViT: Compressing Vision Transformers with Weight Multiplexing. arXiv:2204.07154
  • Zhang et al. (2022): OPT: Open Pre-trained Transformer Language Models. arXiv:2205.01068

Q1/2022

  • Akyon et al. (2022): Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection. arXiv:2202.06934
  • An et al. (2022): Killing Two Birds with One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC. arXiv:2203.15565

  • Bright et al. (2022): ME-CapsNet: A Multi-Enhanced Capsule Networks with Routing Mechanism. arXiv:2203.15547

  • Cao et al. (2022): Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking. arXiv:2203.14360

  • Drefs et al. (2022): Evolutionary Variational Optimization of Generative Models. JMLR 23(21).
  • Du et al. (2022): StrongSORT: Make DeepSORT Great Again. arXiv:2202.13514

  • Huang et al. (2022): 1000x Faster Camera and Machine Vision with Ordinary Devices. arXiv:2201.09302
  • Huang et al. (2022): Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. arXiv:2201.07207

  • Jin et al. (2022): Full RGB Just Noticeable Difference (JND) Modelling. arXiv:2203.00629

  • Lämsä et al. (2022): Video2IMU: Realistic IMU features and signals from videos. arXiv:2202.06547
  • Li et al. (2022): Brain-inspired Multilayer Perceptron with Spiking Neurons. arXiv:2203.14679
  • Li et al. (2022): SimpleTrack: Rethinking and Improving the JDE Approach for Multi-Object Tracking. arXiv:2203.03985
  • Liu et al. (2022): A ConvNet for the 2020s. arXiv:2201.03545

  • Manita et al. (2022): Universal Approximation in Dropout Neural Networks. JMLR 23.

  • Roros et al. (2022): maskGRU: Tracking Small Objects in the Presence of Large Background Motions. arXiv:2201.00467

  • Wang et al. (2022): OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework. arXiv:2202.03052

  • Yang et al. (2022): Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer. arXiv:2203.03466
  • Yu et al. (2022): HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network. arXiv:2203.10699

  • Zhou et al. (2022): TransVOD: End-to-end Video Object Detection with Spatial-Temporal Transformers. arXiv:2201.05047

Q4/2021

  • Chen and Shrivastava (2021): HR-RCNN: Hierarchical Relational Reasoning for Object Detection. arXiv:2110.13892

  • Datta and Beerel (2021): Can Deep Neural Networks be Converted to Ultra Low-Latency Spiking Neural Networks? arXiv:2112.12133
  • Du et al. (2021): Learning Signal-Agnostic Manifolds of Neural Fields. arXiv:2111.06387

  • Eichenberg et al. (2021): MAGMA – Multimodal Augmentation of Generative Models through Adapter-based Finetuning. arXiv:2112.05253

  • Kirby et al. (2021): Reliability of Event Timing in Silicon Neurons. arXiv:2112.14134
  • Koutini et al. (2021): Efficient Training of Audio Transformers with Patchout. arXiv:2110.05069
  • Kovachki et al. (2021): On Universal Approximation and Error Bounds for Fourier Neural Operators. JMLR 22(290)

  • Laakom et al. (2021): Learning to ignore: rethinking attention in CNNs. arXiv:2111.05684

  • Vinci et al. (2021): Self-consistent stochastic dynamics for finite-size networks of spiking neurons. arXiv:2112.14867

  • Yuan et al. (2021): Florence: A New Foundation Model for Computer Vision. arXiv:2111.11432

Q3/2021

  • Chae et al. (2021): SiamEvent: Event-based Object Tracking via Edge-aware Similarity Learning with Siamese Networks. arXiv:2109.13456

  • Guo et al. (2021): Eyes Tell All: Irregular Pupil Shapes Reveal GAN-generated Faces. arXiv:2109.00162

  • He et al. (2021): Integrating Circle Kernels into Convolutional Neural Networks. arXiv:2107.02451

  • Keller and Welling (2021): Topographic VAEs learn Equivariant Capsules. arXiv:2109.01394

  • Liu et al. (2021): Infrared Small-Dim Target Detection with Transformer under Complex Backgrounds. arXiv:2109.14379

  • Machado et al. (2021): HSMD: An object motion detection algorithm using a Hybrid Spiking Neural Network Architecture. arXiv:2109.04119

  • Park et al. (2021): Is Pseudo-Lidar needed for Monocular 3D Object detection? arXiv:2108.06417
  • Peng et al. (2021): Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation. arXiv:2109.12484

  • Shi et al. (2021): Reinforcement Learning with Evolutionary Trajectory Generator: A General Approach for Quadrupedal Locomotion. arXiv:2109.06409

  • Yao et al. (2021): Temporal-wise Attention Spiking Neural Networks for Event Streams Classification. arXiv:2107.11711

  • Zhao and Cheng (2021): Capsule networks with non-iterative cluster routing. arXiv:2109.09213
  • Zheng and Zhang (2021): RockGPT: Reconstructing three-dimensional digital rocks from single two-dimensional slice from the perspective of video generation. arXiv:2108.03132

Q2/2021

  • Bonnaerens et al. (2021): Anchor Pruning for Object Detection. arXiv:2104.00432
  • Bykov et al. (2021): NoiseGrad: enhancing explanations by introducing stochasticity to model weights. arXiv:2106.10185

  • Chakraborty et al. (2021): A Fully Spiking Hybrid Neural Network for Energy-Efficient Object Detection. arXiv:2104.10719
  • Chen et al. (2021): How to Accelerate Capsule Convolutions in Capsule Networks. arXiv:2104.02621
  • Chen et al. (2021): “BNN - BN = ?”: Training Binary Neural Networks without Batch Normalization. arXiv:2104.08215

  • Liu et al. (2021): Pay Attention to MLPs. arXiv:2105.08050
  • Liu et al. (2021): Video Swin Transformer. arXiv:2106.13230

  • Ney et al. (2021): HALF: Holistic Auto Machine Learning for FPGAs. arXiv:2106.14771

  • Wu et al. (2021): Poisoning the Search Space in Neural Architecture Search. arXiv:2106.14406

  • Xiao et al. (2021): Early Convolutions Help Transformers See Better. arXiv:2106.14881

  • Zhang et al. (2021): Hallucination Improves Few-Shot Object Detection. arXiv:2105.01294
  • Zhao et al. (2021): Neko: a Library for Exploring Neuromorphic Learning Rules. arXiv:2105.00324
  • Zhao et al. (2021): TrTr: Visual Tracking with Transformer. arXiv:2105.03817

Q1/2021

  • Ding et al. (2021): Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges. arXiv:2102.12219

  • Han et al. (2021): ReDet: A Rotation-equivariant Detector for Aerial Object Detection. arXiv:2103.07733

  • Jaegle et al. (2021): Perceiver: General Perception with Iterative Attention. arXiv:2103.03206
  • Joseph et al. (2021): Towards Open World Object Detection. arXiv:2103.02603

  • Lee et al. (2021): Detecting Micro Fractures with X-ray Computed Tomography. arXiv:2103.12821
  • Li et al. (2021): Involution: Inverting the Inherence of Convolution for Visual Recognition. arXiv:2103.06255
  • Liu et al. (2021): Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. arXiv:2103.14030

  • Mazzia et al. (2021): Efficient-CapsNet: Capsule Network with Self-Attention Routing. arXiv:2101.12491

  • Northcutt et al. (2021): Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks. arXiv:2103.14749

  • Ren et al. (2021): Deep Texture-Aware Features for Camouflaged Object Detection. arXiv:2102.02996
  • Runkel et al. (2021): Depthwise Separable Convolutions Allow for Fast and Memory-Efficient Spectral Normalization. arXiv:2102.06496

  • Titirsha et al. (2021): Endurance-Aware Mapping of Spiking Neural Networks to Neuromorphic Hardware. arXiv:2103.05707
  • Tuggener et al. (2021): Is it Enough to Optimize CNN Architectures on ImageNet? arXiv:2103.09108

  • Zhou et al. (2021): Probabilistic two-stage detection. arXiv:2103.07461

Q4/2020

  • Awad et al. (2020): Differential Evolution for Neural Architecture Search. arXiv:2012.06400

  • Chen et al. (2020): A Group-Theoretic Framework for Data Augmentation. JMLR 21(245): 1-71

  • Gerg and Monga (2020): Deep Autofocus for Synthetic Aperture Sonar. arXiv:2010.15687

  • Hu et al. (2020): Multi-objective Neural Architecture Search with Almost No Training. arXiv:2011.13591

  • Kedziora et al. (2020): AutonoML: Towards an Integrated Framework for Autonomous Machine Learning. arXiv:2012.12600
  • Keller et al. (2020): Self Normalizing Flows. arXiv:2011.07248
  • Kileel et al. (2020): Manifold learning with arbitrary norms. arXiv:2012.14172

  • Li and Jordan (2020): Stochastic Approximation for Online Tensorial Independent Component Analysis. arXiv:2012.14415
  • Li et al. (2020): Underwater image filtering: methods, datasets and evaluation. arXiv:2012.12258
  • Lindauer and Hutter (2020): Best Practices for Scientific Research on Neural Architecture Search. JMLR 21(243): 1-18
  • Liu et al. (2020): YolactEdge: Real-time Instance Segmentation on the Edge (Jetson AGX Xavier: 30 FPS, RTX 2080 Ti: 170 FPS). arXiv:2012.12259
  • Luo and Jennings (2020): A Differential Privacy Mechanism that Accounts for Network Effects for Crowdsourcing Systems. JAIR 69, 1127-1164. doi: 10.1613/jair.1.12158

  • Neekhara et al. (2020): Adversarial Threats to DeepFake Detection: A Practical Perspective. arXiv:2011.09957

  • Pang et al. (2020): TROJANZOO: Everything you ever wanted to know about neural backdoors (but were afraid to ask). arXiv:2012.09302

  • Rock et al. (2020): Quantized Neural Networks for Radar Interference Mitigation. arXiv:2011.12706

  • Salman et al. (2020): Unadversarial Examples: Designing Objects for Robust Vision. arXiv:2012.12235
  • Schrittwieser et al. (2020): Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588: 604-609. doi:10.1038/s41586-020-03051-4
  • Sheeny (2020): All-Weather Object Recognition Using Radar and Infrared Sensing. arXiv:2010.16285
  • Shen et al. (2020): DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation. arXiv:2011.09876
  • Sushko et al. (2020): You Only Need Adversarial Supervision for Semantic Image Synthesis. arXiv:2012.04781
  • Sun et al. (2020): Extreme Value Preserving Networks. arXiv:2011.08367
  • Sun et al. (2020): Identifying Invariant Texture Violation for Robust Deepfake Detection. arXiv:2012.10580
  • Svendsen et al. (2020): Deep Gaussian Processes for geophysical parameter retrieval. arXiv:2012.12099

  • Wandel et al. (2020): Fast Fluid Simulations in 3D with Physics-Informed Deep Learning. arXiv:2012.11893
  • Weston et al. (2020): There and Back Again: Learning to Simulate Radar Data for Real-World Applications. arXiv:2011.14389

  • Xie et al. (2020): Skillearn: Machine Learning Inspired by Humans’ Learning Skills. arXiv:2012.04863

  • Yu et al. (2020): HMFlow: Hybrid Matching Optical Flow Network for Small and Fast-Moving Objects. arXiv:2011.09654
  • Yue et al. (2020): Effective, Efficient and Robust Neural Architecture Search. arXiv:2011.09820

  • Zhang et al. (2020): FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations. arXiv:2012.12206
  • Zhu et al. (2020): Integrating Deep Neural Networks with Full-waveform Inversion: Reparametrization, Regularization, and Uncertainty Quantification. arXiv:2012.11149

Q3/2020

  • Agrawal et al. (2020): Wide Neural Networks with Bottlenecks are Deep Gaussian Processes. JMLR 21(175)

  • Bonald et al. (2020): Scikit-network: Graph Analysis in Python. JMLR 21(185)

  • Chen et al. (2020): Learning Deep ReLU Networks Is Fixed-Parameter Tractable. arXiv:2009.13512
  • Chen et al. (2020): WaveGrad: Estimating Gradients for Waveform Generation. arXiv:2009.00713

  • Davies et al. (2020): Overfit Neural Networks as a Compact Shape Representation. arXiv:2009.09808

  • Feurer et al. (2020): Auto-Sklearn 2.0: The Next Generation. arXiv:2007.04074
  • Fuchs and Pernkopf (2020): Wasserstein Routed Capsule Networks. arXiv:2007.11465

  • Guo et al. (2020): Variational Temporal Deep Generative Model for Radar HRRP Target Recognition. arXiv:2009.13011

  • Kidger et al. (2020): “Hey, that’s not an ODE”: Faster ODE Adjoints with 12 Lines of Code. arXiv:2009.09457

  • Long et al. (2020): PP-YOLO: An Effective and Efficient Implementation of Object Detector. arXiv:2007.12099

  • Morrill et al. (2020): Neural CDEs for Long Time-Series via the Log-ODE Method. arXiv:2009.08295

  • Nguyen et al. (2020): Quaternion Graph Neural Networks. arXiv:2008.05089

  • Obukhov et al. (2020): T-Basis: a Compact Representation for Neural Networks. arXiv:2007.06631

  • Perot et al. (2020): Learning to Detect Objects with a 1 Megapixel Event Camera. arXiv:2009.13436

  • Reuther et al. (2020): Survey of Machine Learning Accelerators. arXiv:2009.00993

  • Shen and Savvides (2020): MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks. arXiv:2009.08453

  • Tek et al. (2020): Adaptive Convolution Kernel for Artificial Neural Networks. arXiv:2009.06385

  • Wunderlich and Pehle (2020): EventProp: Backpropagation for Exact Gradients in Spiking Neural Networks. arXiv:2009.08378

  • Xiang et al. (2020): KIT MOMA: A Mobile Machines Dataset. arXiv:2007.04198

Q2/2020

  • Ahmed et al. (2020): Reinforcement Learning based Beamforming for Massive MIMO Radar Multi-target Detection. arXiv:2005.04708

  • Brown et al. (2020): Language Models are Few-Shot Learners. arXiv:2005.14165

  • Carion et al. (2020): End-to-End Object Detection with Transformers. arXiv:2005.12872
  • Cheng et al. (2020): Detecting and Tracking Communal Bird Roosts in Weather Radar Data. arXiv:2004.12819
  • Cui et al. (2020): Fully Convolutional Online Tracking. arXiv:2004.07109

  • Dogra and Redman (2020): Optimizing Neural Networks via Koopman Operator Theory. arXiv:2006.02361

  • Geirhos et al. (2020): Shortcut Learning in Deep Neural Networks. arXiv:2004.07780

  • Hernandez and Brown (2020): Measuring the Algorithmic Efficiency of Neural Networks. arXiv:2005.04305
  • Huang et al. (2020): SQE: a Self Quality Evaluation Metric for Parameters Optimization in Multi-Object Tracking. arXiv:2004.07472
  • Hupkes et al. (2020): Compositionality Decomposed: How do Neural Networks Generalise? JAIR (67), 757-795. doi:10.1613/jair.1.11674

  • Lee et al. (2020): Continual Learning with Extended Kronecker-factored Approximate Curvature. arXiv:2004.07507
  • Lelekas et al. (2020): Top-Down Networks: A coarse-to-fine reimagination of CNNs. arXiv:2004.07629
  • Li et al. (2020): SmallBigNet: Integrating Core and Contextual Views for Video Classification. arXiv:2006.14582

  • Marchisio et al. (2020): Q-CapsNets: A Specialized Framework for Quantizing Capsule Networks. arXiv:2004.07116
  • Marvasti-Zadeh et al. (2020): COMET: Context-Aware IoU-Guided Network for Small Object Tracking. arXiv:2006.02597
  • Mobiny et al. (2020): Radiologist-Level COVID-19 Detection Using CT Scans with Detail-Oriented Capsule Networks. arXiv:2004.07407

  • Palffy et al. (2020): CNN based Road User Detection using the 3D Radar Cube. arXiv:2004.12165
  • Park et al. (2020): Variational Bayes In Private Settings (VIPS). JAIR (68), 109-157. doi:10.1613/jair.1.11763

  • Ouaknine et al. (2020): CARRADA Dataset: Camera and Automotive Radar with Range-Angle-Doppler Annotations. arXiv:2005.01456
  • Qiu et al. (2020): Quaternion Neural Networks for Multi-channel Distant Speech Recognition. arXiv:2005.08566

  • Scheiner et al. (2020): Off-the-shelf sensor vs. experimental radar – How much resolution is necessary in automotive radar classification? arXiv:2006.05485
  • Shuai et al. (2020): Multi-Object Tracking with Siamese Track-RCNN. arXiv:2004.07786
  • Sitzmann et al. (2020): Implicit Neural Representations with Periodic Activation Functions. arXiv:2006.09661

  • Thornton et al. (2020): Deep Reinforcement Learning Control for Radar Detection and Tracking in Congested Spectral Environments. arXiv:2006.13173
  • Toyer et al. (2020): ASNets: Deep Learning for Generalised Planning. JAIR (68), 1-68. doi:10.1613/jair.1.11633

  • Dewil et al. (2020): Self-Supervised training for blind multi-frame video denoising. arXiv:2004.06957

  • Wang et al. (2020): Residual-driven Fuzzy C-Means Clustering for Image Segmentation. arXiv:2004.07160
  • Wiedemann et al. (2020): Dithered backprop: A sparse and quantized backpropagation algorithm for more efficient deep neural network training. arXiv:2004.04729

  • Zhang et al. (2020): Ocean: Object-aware Anchor-free Tracking. arXiv:2006.10721
  • Zhao et al. (2020): TSDM: Tracking by SiamRPN++ with a Depth-refiner and a Mask-generator. arXiv:2005.04063

Q1/2020

  • Arias-Castro et al. (2020): Perturbation Bounds for Procrustes, Classical Scaling, and Trilateration, with Applications to Manifold Learning. JMLR 21

  • Blondel et al. (2020): Learning with Fenchel-Young losses. JMLR 21(35):1-69

  • Danelljan et al. (2020): Probabilistic Regression for Visual Tracking. arXiv:2003.12565
  • Deng et al. (2020): Self-attention-based BiGRU and capsule network for named entity recognition. arXiv:2002.00735

  • Edraki et al. (2020): Subspace Capsule Network. arXiv:2002.02924v1

  • Hadjeres and Nielsen (2020): Schoenberg-Rao distances: Entropy-based and geometry-aware statistical Hilbert distances. arXiv:2002.08345

  • Jia et al. (2020): Entangled Watermarks as a Defense against Model Extraction. arXiv:2002.12200

  • Kadeethum et al. (2020): Physics-informed Neural Networks for Solving Nonlinear Diffusivity and Biot’s equations. arXiv:2002.08235

  • Liu et al. (2020): Are Labels Necessary for Neural Architecture Search? arXiv:2003.12056

  • Manchev and Spratling (2020): Target Propagation in Recurrent Neural Networks. JMLR 21(7): 1-33.
  • Molnar and Culurciello (2020): Capsule Network Performance with Autonomous Navigation. arXiv:2002.03181v1

  • Punjabi et al. (2020): Examining the Benefits of Capsule Neural Networks. arXiv:2001.10964

  • Radosavovic et al. (2020): Designing Network Design Spaces. arXiv:2003.13678
  • Rogers et al. (2020): A Primer in BERTology: What we know about how BERT works. arXiv:2002.12327
  • Romero et al. (2020): Attentive Group Equivariant Convolutional Networks. arXiv:2002.03830
  • Ruby et al. (2020): The Mertens Unrolled Network (MU-Net): A High Dynamic Range Fusion Neural Network for Through the Windshield Driver Recognition. arXiv:2002.12257

  • Schmitt et al. (2020): Weakly Supervised Semantic Segmentation of Satellite Images for Land Cover Mapping – Challenges and Opportunities. arXiv:2002.08254v1

  • Tang et al. (2020): RSL-Net: Localising in Satellite Images From a Radar on the Ground. arXiv:2001.03233
  • Thornton et al. (2020): Experimental Analysis of Reinforcement Learning Techniques for Spectrum Sharing Radar. arXiv:2001.01799
  • Tsai et al. (2020): Capsules with Inverted Dot-Product Attention Routing. ICLR 2020

  • Vecchi et al. (2020): Compressing deep quaternion neural networks with targeted regularization. arXiv:1907.11546v2

  • Wang et al. (2020): Multi-wavelet residual dense convolutional neural network for image denoising. arXiv:2002.08254

  • Yoo and Owhadi (2020): Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows. arXiv:2002.08335

Q4/2019

  • Dovesi et al. (2019): Real-Time Semantic Stereo Matching. arXiv:1910.00541

  • Gu and Tresp (2019): Improving the Robustness of Capsule Networks to Image Affine Transformations. arXiv:1911.07968

  • Hoogi et al. (2019): Self-Attention Capsule Networks for Object Classification. arXiv:1904.12483
  • Hwang et al. (2019): SegSort: Segmentation by Discriminative Sorting of Segments. arXiv:1910.06962

  • Jegorova et al. (2019): Full-Scale Continuous Synthetic Sonar Data Generation with Markov Conditional Generative Adversarial Networks. arXiv:1910.06750

  • Liu et al. (2019): GPRInvNet: Deep Learning-Based Ground Penetrating Radar Data Inversion for Tunnel Lining. arXiv:1912.05759

  • Nguyen et al. (2019): Use of a Capsule Network to Detect Fake Images and Videos. arXiv:1910.12467

  • Scheiner et al. (2019): Seeing Around Street Corners: Non-Line-of-Sight Detection and Tracking In-the-Wild Using Doppler Radar. arXiv:1912.06613

  • Varadarajan et al. (2019): Benchmark for Generic Product Detection: A strong baseline for Dense Object Detection. arXiv:1912.09476

  • Wang et al. (2019): CSPNet: A New Backbone that can Enhance Learning Capability of CNN. arXiv:1911.11929
  • Weissman et al. (2019): JackHammer: Efficient Rowhammer on Heterogeneous FPGA-CPU Platform. arXiv:1912.11523

  • Zhang et al. (2019): 3D-Rotation-Equivariant Quaternion Neural Networks. arXiv:1911.09040
  • Zhao et al. (2019): Quaternion Equivariant Capsule Networks for 3D Point Clouds. arXiv:1912.12098

Q3/2019

  • Andraghetti et al. (2019): Enhancing self-supervised monocular depth estimation with traditional visual odometry. arXiv:1908.03127

  • Caliva et al. (2019): Distance Map Loss Penalty Term for Semantic Segmentation. arXiv:1908.03679
  • Chen et al. (2019): Fast Point R-CNN. arXiv:1908.02990
  • Choi et al. (2019): Attention routing between capsules. arXiv:1907.01750

  • Duggal et al. (2019): DeepPruner: Learning Efficient Stereo Matching via Differentiable PatchMatch. arXiv:1909.05845

  • Garnier et al. (2019): A review on Deep Reinforcement Learning for Fluid Mechanics. arXiv:1908.04127
  • Gong et al. (2019): AutoGAN: Neural Architecture Search for Generative Adversarial Networks. arXiv:1908.03835

  • He et al. (2019): Constructing an Associative Memory System Using Spiking Neural Network. Front. Neurosci., DOI:10.3389/fnins.2019.00650
  • Huegle et al. (2019): Dynamic Input for Deep Reinforcement Learning in Autonomous Driving. arXiv:1907.10994

  • Kim and Ganapathi (2019): LumièreNet: Lecture Video Synthesis from Audio. arXiv:1907.02253
  • Kulhánek et al. (2019): Vision-based Navigation Using Deep Reinforcement Learning. arXiv:1908.03627

  • Lee et al. (2019): On-Device Neural Net Inference with Mobile GPUs. arXiv:1907.01989
  • Li et al. (2019): Deformable Tube Network for Action Detection in Videos. arXiv:1907.01847
  • Li et al. (2019): Overfitting of neural nets under class imbalance: Analysis and improvements for segmentation. arXiv:1907.10982
  • Li et al. (2019): Differentially Private Meta-Learning. arXiv:1909.05830
  • Liu et al. (2019): On the Variance of the Adaptive Learning Rate and Beyond. arXiv:1908.03265

  • Misra et al. (2019): Mish: A Self Regularized Non-Monotonic Neural Activation Function. arXiv:1908.08681

  • Qin et al. (2019): Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions. arXiv:1907.02957

  • Soures and Kudithipudi (2019): Deep Liquid State Machines With Neural Plasticity for Video Activity Recognition. Front. Neurosci., DOI:10.3389/fnins.2019.00686

  • Wang and Shen (2019): Flow-Motion and Depth Network for Monocular Stereo and Beyond. arXiv:1909.05452

  • You et al. (2019): Tracking system of Mine Patrol Robot for Low Illumination Environment. arXiv:1907.01806
  • You et al. (2019): Gate Decorator: Global Filter Pruning Method for Accelerating Deep Convolutional Neural Networks. arXiv:1909.08174

  • Zhang et al. (2019): Lookahead Optimizer: k steps forward, 1 step back. arXiv:1907.08610
  • Zhang et al. (2019): SlimYOLOv3: Narrower, Faster and Better for Real-Time UAV Applications. arXiv:1907.11093
  • Zhao et al. (2019): UER: An Open-Source Toolkit for Pre-training Models. arXiv:1909.05658
  • Zhou et al. (2019): One-stage Shape Instantiation from a Single 2D Image to 3D Point Cloud. arXiv:1907.10763

Q2/2019

  • Alekseev and Bobe (2019): GaborNet: Gabor filters with learnable parameters in deep convolutional neural networks. arXiv:1904.13204
  • Ardila et al. (2019): End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. DOI:10.1038/s41591-019-0447-x

  • Bai et al. (2019): Deep Learning Based Robot for Automatically Picking up Garbage on the Grass. arXiv:1904.13034
  • Balog et al. (2019): Fast Training of Sparse Graph Neural Networks on Dense Hardware. arXiv:1906.11786
  • Becker et al. (2019): Deep Optimal Stopping. Journal of Machine Learning Research 20 (2019) 1-25
  • Berner et al. (2019): How degenerate is the parametrization of neural networks with the ReLU activation function? arXiv:1905.09803
  • Brandt (2019): Spatio-temporal crop classification of low-resolution satellite imagery with capsule layers and distributed attention. arXiv:1904.10130

  • Danzer et al. (2019): 2D Car Detection in Radar Data with PointNets. arXiv:1904.08414
  • Drori et al. (2019): Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar. arXiv:1905.10345

  • Eggensperger (2019): Pitfalls and Best Practices in Algorithm Configuration. Journal of Artificial Intelligence Research 64 (2019) 861-893

  • Harikrishnan and Nagaraj (2019): A Novel Chaos Theory Inspired Neural Architecture. arXiv:1905.12601
  • Hoogi et al. (2019): Self-Attention Capsule Networks for Image Classification. arXiv:1904.12483
  • Hu et al. (2019): Optimal Sparse Decision Trees. arXiv:1904.12847
  • Hughes et al. (2019): Wave Physics as an Analog Recurrent Neural Network. arXiv:1904.12831

  • Jia et al. (2019): Direct speech-to-speech translation with a sequence-to-sequence model. arXiv:1904.06037

  • Klemmer et al. (2019): Augmenting correlation structures in spatial data using deep generative models. arXiv:1905.09796
  • Kosiorek et al. (2019): Stacked Capsule Autoencoders. arXiv:1906.06818

  • Leite and Enembreck (2019): Using Collective Behavior of Coupled Oscillators for Solving DCOP. Journal of Artificial Intelligence Research 64 (2019) 987-1023
  • Li (2019): Graph Matching Networks for Learning the Similarity of Graph Structured Objects. arXiv:1904.12787

  • Nguyen and Holmes (2019): Ten quick tips for effective dimensionality reduction. PLoS Comput Biol 15(6): e1006907. DOI:10.1371/journal.pcbi.1006907

  • Oh et al. (2019): Speech2Face: Learning the Face Behind a Voice. arXiv:1905.09773

  • Park et al. (2019): SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. arXiv:1904.08779

  • Rajasegaran et al. (2019): DeepCaps: Going Deeper with Capsule Networks. arXiv:1904.09546

  • Sanyal et al. (2019): Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision. arXiv:1905.06817
  • Sherry et al. (2019): Learning the Sampling Pattern for MRI. arXiv:1906.08754
  • Shin (2019): Encoding Database Schemas with Relation-Aware Self-Attention for Text-to-SQL Parsers. arXiv:11790
  • Sun et al. (2019): GeoCapsNet: Aerial to Ground view Image Geo-localization using Capsule Network. arXiv:1904.06281

  • Thomas et al. (2019): DeLiO: Decoupled LiDAR Odometry. arXiv:1904.12667

  • Valade et al. (2019): Towards Global Volcano Monitoring Using Multisensor Sentinel Missions and Artificial Intelligence: The MOUNTS Monitoring System. DOI:10.3390/rs11131528

  • Wang et al. (2019): Monocular Plan View Networks for Autonomous Driving. arXiv:1905.06937

  • Zhang (2019): Making Convolutional Networks Shift-Invariant Again. arXiv:1904.11486
  • Zhang et al. (2019): Quaternion Knowledge Graph Embedding. arXiv:1904.10281
  • Zhang et al. (2019): You Only Propagate Once: Accelerate Adversarial Training via Maximal Principle. arXiv:1905.00877
  • Zhao et al. (2019): Fast Inference in Capsule Networks Using Accumulated Routing Coefficients. arXiv:1904.07304
  • Zhao et al. (2019): PyOD: A Python Toolbox for Scalable Outlier Detection. JMLR 20(96):1-7. http://jmlr.org/papers/v20/19-011.html
  • Zhu et al. (2019): Transferable Clean-Label Poisoning Attacks on Deep Neural Nets. arXiv:1905.05897

Q1/2019

  • Barz and Denzler (2019): Deep Learning on Small Datasets without Pre-Training using Cosine Loss. arXiv:1901.09054

  • Cheng et al. (2019): MeshGAN: Non-linear 3D Morphable Models of Faces. arXiv:1903.10384

  • Duarte et al. (2019): Wav2Pix: Speech-conditioned Face Generation using Generative Adversarial Networks. arXiv:1903.10195

  • Elser et al. (2019): Monotone Learning with Rectified Wire Networks. Journal of Machine Learning Research 20, 1-42

  • Fey and Lenssen (2019): Fast Graph Representation Learning with PyTorch Geometric. arXiv:1903.02428
  • Francis et al. (2019): Long-Range Indoor Navigation with PRM-RL. arXiv:1902.09458

  • Ge et al. (2019): DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images. arXiv:1901.07973

  • Hawkins et al. (2019): A Framework for Intelligence and Cortical Function Based on Grid Cells in the Neocortex. DOI:10.3389/fncir.2018.00121

  • Kreiss et al. (2019): PifPaf: Composite Fields for Human Pose Estimation. arXiv:1903.06593

  • Li et al. (2019): Rethinking on Multi-Stage Networks for Human Pose Estimation. arXiv:1901.00148

  • Mirsky et al. (2019): CT-GAN: Malicious Tampering of 3D Medical Imagery using Deep Learning. arXiv:1901.03597

  • Sonoda and Murata (2019): Transport Analysis of Infinitely Deep Neural Network. Journal of Machine Learning Research 20, 1-52
  • Sun et al. (2019): Deep High-Resolution Representation Learning for Human Pose Estimation. arXiv:1902.09212

  • Tang and Hwang (2019): MOANA: An Online Learned Adaptive Appearance Model for Robust Multiple Object Tracking in 3D. arXiv:1901.02626

  • Voigtlaender et al. (2019): FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation. arXiv:1902.09513

  • Wofk et al. (2019): FastDepth: Fast Monocular Depth Estimation on Embedded Systems. arXiv:1903.03273
  • Wu et al. (2019): Simplifying Graph Convolutional Networks. arXiv:1902.07153

  • Xinyi and Chen (2019): Capsule Graph Neural Network. ICLR 2019
  • Xu et al. (2019): Graph Wavelet Neural Network. ICLR 2019

Q4/2018