This section covers methods for recognizing, analyzing, and transforming images, speech, and other data patterns. Select a subsection for a more precise classification.


289 publications


Patient-Level Diagnosis of Acute Myeloid Leukemia via Deep Learning Analysis of Bone Marrow Smear
DD-INR: Dynamics-Driven Implicit Neural Representation for Accelerated Whole-Brain Functional MRI Reconstruction
ZODS-RS -- Zero-training Oriented Detection & Segmentation for Remote Sensing
Spatially Selective Self-Training for Unsupervised Building Change Detection
Depth from Dual Differential Defocus and Stereo Consensus
AtlasGS: Brain MRI Spatial Resolution Harmonization With Shared Gaussian Geometry
DexPIE: Stable Dexterous Policy Improvement from Real-World Experience
ATN3D: Density-Aware LiDAR-Radar Early 3D Object Detection Under Extreme Sparsity
CineDance: Towards Next-Generation Multi-Shot Long-Form Cinematic Audio-Video Generation
MAVIS: Multi-Agent Video Retrieval via Structured Video Understanding
X-Palm: Paired Multispectral-to-Smartphone Dataset for Cross-Domain Palmprint Authentication
Active Source-free Domain Adaptation in Open-set Medical Image Segmentation via Decomposed Uncertainty and Prototype Discrepancy
Dynamic XR Rendering Offloading Based on Feature-Based Quality Assessment
Vendor-agnostic 4D Phase Contrast MRI: a complete open-source pipeline for velocities, displacement, and strain analysis
EgoTactile: Learning Grasp Pressure for Everyday Objects from Egocentric Video
Proposal Refinement for Few-Shot Object Detection
SOMA: From Surface Observations to Muscle Anatomy
Temporal-Aware Reasoning Optimization for Video Temporal Grounding
RFDT-Channel: RGB-LiDAR-Based RF Digital Twin Scene Construction for 28 GHz Indoor Ray-Tracing Channel Simulation
ResNet-34 with Lightweight Decoder for Accurate and Efficient Segmentation of Fetal Brain MRI
Leaf Spectral Reflectance Prediction Using Multi-Head Attention Neural Networks
Training Strategies for Vision Transformers for Object Detection
Mathematical framework for perception-driven parameter choice in image denoising
A unified deep learning framework for contrast-phase-specific virtual monochromatic imaging
Detecting Temporally Localized Manipulations in Authentic Video Streams
GP-Adapter: Gaussian Process CLIP-Adapter for Few-Shot Out-of-Distribution Detection
Advanced Flood Prediction with Physics-Guided Deep Learning: Combining UNet, FNO, and SAR/Optical Imagery
Co-optimization of Diffusive and Tomographic Blur in Computed Axial Lithography via Experimental Kernel Identification
Prospective Dynamic 3D MRI Reconstruction via Latent-Space Motion Tracking from Single Measurement
FUSE-Flow: A Decoupled Framework for Calibration and Stateless Real-Time Multi-View Point Cloud Fusion
Closed-Form Spectral Regularization for Multi-Task Model Merging
CULTURESCORE: Evaluating Cultural Faithfulness in Video Generation Models
AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization
Varifold Moment Invariants for Sustainable and Explainable Contour Feature Extraction
Differences in Detection: Explainability Where it Matters
Streaming Video Generation with Streaming Force Control
MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism
UniSHARP: Universal Sharp Monocular View Synthesis
Prospective evaluation of multimodal respiratory failure prediction: Do chest X-rays improve performance beyond EHR signals?
FM-fMRI: Event Conditioned Flow Matching for Rest-to-Task fMRI Time-Series Synthesis
Measuring Prediction Uncertainty in Neural Cellular Automata
In-Context Multiple Instance Learning
Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators
Complexity-Balanced Diffusion Splitting
PAR3D: A Unified 3D-MLLM with Part-Aware Representation for Scene Understanding
Boosting Image Quality Assessment Performance: Unsupervised Score Fusion by Deep Maximum a Posteriori Estimation
3D-GlioPREDICT: 3D Latent Diffusion for Post-Radiotherapy Brain MRI Prediction in Patients with Glioma
Anti-Hyperspectral Anomaly Detection: A First Study on Stealthy Lipschitz-Forcing Perturbations Against Unknown Detectors
A turbo-inference strategy for object detection and instance segmentation
SC-MFJ: A Simple Haptic Quality Metric for Medical Image Segmentation
Symb-xMIL: Symbolic Explanations for Multiple Instance Learning in Digital Pathology
SAM-Flow: Source-Anchored Masked Flow for Training-Free Image Editing
Next-Generation Parallel Decoder for LPDR: Architectural Optimization and Class-Balanced GAN-Augmentation
Emotion-Aware Image Generation from Korean Diary Text via LLM-based Prompt Translation and LoRA Fine-Tuning
Gender Artifacts from Art History to Text-to-Image Generation
Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models
Global-Local Monte Carlo Tree Search in Vision-Language Models for Text-to-3D Indoor Scene Generation
ReSAGE-PAR: Representational Similarity Assessment for Generative Expansion in Pedestrian Attribute Recognition
Texture-preserving implicit neural representation for Cone beam CT truncated reconstruction
LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing
Geometry Gaussians: Decoupling Appearance and Geometry in Gaussian Splatting
GeM-NR: Geometry-Aware Multi-View Editing for Nonrigid Scene Changes
An Open-Source Two-Stage Computer Vision Pipeline for Fine-Grained Vehicle Classification using Vision Transformers
Controllable Dynamic 3D Shape Generation via 3D Trajectories and Text
Source-free Domain Adaptation for Video Object Detection Under Adverse Image Conditions
Closing the Alignment-Maturity Gap in Federated Prototype Learning
Symmetry-Aware 9D Pose Estimation with Sim(3)-Consistent Feature and Spherical Inception Convolution
CORE-MTL: Rethinking Gradient Balancing via Causal Orthogonal Representations
Face Liveness Detection Using RGB and Thermal Image Fusion
Towards 3D-Aware Video Diffusion Models: Render-Free Human Motion Control with Mesh Tokenization
LALE: Lightweight-Transformer Architecture for Land-Cover Estimation
Predicting the risk of colorectal anastomotic leak based on preoperative mapping of the blood supply of the bowel
Absorption and Phase-Contrast Microtomography Using Direct X-ray Detection With COTS CMOS Sensors
A Novel Computer Vision Approach for Assessing Fish Responses to Intrusive Objects in Aquaculture
Bounding Global and Local Compression Error of Signal Parameterizations
MORPHOS: Autoregressive 4D Generation with Temporal Structured Latents
GloResNet: A lightweight 3D CNN with global topological features for preterm brain injury prediction
Question-Aware Evidence Ledgers for Video Relational Reasoning
Not All Points Are Equal: Uncertainty-Aware 4D LiDAR Scene Synthesis
VEDAL: Variational Error-Driven Asynchronous Learning for 3D Gaussian Splatting Pruning
TROPHIES: Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos
Multi-modal Video Representation Alignment for Robust Self-supervised Driver Distraction Detection
Do Multimodal Agents Really Benefit from Tool Use? A Systematic Study of Capability Gains
Do Synthetic Brain MRIs Reliably Improve Tumour Classification? A StyleGAN2-ADA Class-Plane Augmentation Study on BRISC 2025
STAMBRIDGE: Spectral-Temporal Amplitude-aware Mid-Feature Bridge for EEG Visual Decoding
LANCE: Locally Adaptive Neural Context Estimation for Overfitted Image Compression
Motion-Robust Deep Reconstruction for Free-Breathing Cardiac Cine MRI
E-ReCON: An Energy- and Resource-Efficient Precision-Configurable Sparse nvCIM Macro for Conventional and Spiking Neural Edge Inference
Parallel Context Modeling for Sliding Window Attention in Neural Video Coding
SOCO: Benchmarking Semantic Object Correspondence in Vision Foundation Models
Linear Scaling Video VLMs for Long Video Understanding
Lumos-Nexus: Efficient Frequency Bridging with Homogeneous Latent Space for Video Unified Models
Representation Forcing for Bottleneck-Free Unified Multimodal Models
GMENet: Generative Mixture of Experts Network for Multi-Center Glioma Diagnosis with Incomplete Imaging Sequences
DriveMA: Driving Vision-Language-Action Models with verifiable Meta-Actions
Topologically Consistent Multi-view 3D Head Reconstruction via Coarse-Guided Layered Surface Sampling
SAM for Robust Mitochondria Instance Segmentation in Fluorescence Microscopy
VolFill: Single-View Amodal 3D Scene Reconstruction with Volumetric Flow Matching
Enhancing Computer Vision Model Generalization in Warehouse Facilities: A Case Study on Anomaly Detection in Vertical Material Handling Systems
How can embedding models bind concepts?
Internalizing Temporal Consistency in Video Object-Centric Learning without Explicit Regularization
Parameter-Efficient CT Reconstruction via Deep Graph Laplacian Regularization
A Clinically Validated Foundation Model for Comprehensive Lung Pathology Interpretation
How Accurate are Video Quality Models for Diffusion-Based Video Super-Resolution?
NL-MambaXCT: Self-Supervised Nested-Learning Mamba for Nomex Honeycomb X-ray CT Defect Classification
Disambiguation in Unknown Object Detection by Integrating Image and Speech Recognition Confidences
Mesh-Aware Epipolar Matching for Multi-View Multi-Person 3D Pose Estimation in Basketball
SwInception -- Local Attention Meets Convolutions
EVL-ECG: Efficient ECG Interpretation With Multi-Aspect Heterogeneous Knowledge Distillation
Genetically Aligned Patient Representations Improve Hematological Diagnosis
NeuROK: Generative 4D Neural Object Kinematics
AdaState: Self-Evolving Anchors for Streaming Video Generation
VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion
GMOS: Grounding Moving Object Segmentation in 3D Space and Time
Large Depth Completion Model from Sparse Observations
SGMD: Score Gradient Matching Distillation for Few-Step Video Diffusion Distillation
PARCEL: Pool-Anchored Resampling with Conditioned Elastic Queries for Efficient Vision-Language Understanding
CCS: Clinical Consensus Selection for Radiology Report Generation
Stable-Layers: Fine-Tuning Image Layer Decomposition Models with VLM-Scored Reinforcement Learning
How LoRA Remembers? A Parametric Memory Law for LLM Finetuning
minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models
LoMo: Local Modality Substitution for Deeper Vision-Language Fusion
A Survey Of Free-Form Object Representation and Recognition Techniques
Deep Learning Strain Estimation: Is Physics-Based Simulation the Solution?
Janus-LoRA: A Balanced Low-Rank Adaptation for Continual Learning
DriveWAM: Video Generative Priors Enable Scalable World-Action Modeling for Autonomous Driving
GEM: Generative Supervision Helps Embodied Intelligence
Object recognition and segmentation in videos by connecting heterogeneous visual features
Saliency Sequential Surface Organization for Free-Form Object Recognition
7 Tesla Quantitative MRI and Machine Learning for Exploratory Motor Subtype Stratification and Diagnosis in Parkinson's Disease
Causal Evaluation of Contributing Factors to Urban Heat Island
A Signal Extraction Approach for Remote Heart Rate Variability Assessment Using Proxy Measure in a Driving Simulator
Leveraging pretrained RGB denoisers for hyperspectral image restoration
LV-OSD: Language-Vision-Complementary Open-Set Object Detection
EchoAvatar: Real-time Generative Avatar Animation from Audio Streams
EventShiftFlow: Towards Hardware-efficient FPGA-based Flow Estimation
Inpainting-Style Conditional Diffusion for Multivariable Time Series Forecasting
SAM-Enhanced Segmentation on Road Datasets: Balancing Critical Classes in Autonomous Driving
No Safe Dose: How Training Data Drives Unsafe Image Generation
A novel ordinal multi-view aggregation scheme for oak defoliation
Intra-YOLO: A Small Object Detection Model for Caries and Molar-Incisor Hypomineralization in Intraoral Photography Based on Transfer Learning with Reinforcement Learning
CodecCap: High-Fidelity Codec-Inspired Residual Modeling for Dense Video Captioning
On the Robustness of Machine Unlearning for Vision-Language Models
ChartAct: A Benchmark for Dynamic Chart Understanding
Timestep-Aware SVDQuant-GPTQ for W4A4 Quantization of Wan2.2-I2V
When Search Becomes Memory: Turning Robot Design Trials into Transferable Skills
SAM3-Assisted Training of Lightweight YOLO Models for Precision Pig Farming
WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation
F-RNG: Feed-Forward Relightable Neural Gaussians
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence
Towards 3D heart mesh generation using contactless radar imaging and physics-informed neural network
MAGIC: Multimodal Alignment & Grounding-aware Instruction Coreset for Vision-Language Models
Astronomical Image Data Reduction for Moving Object Detection
RiGS: Rigid-aware 4D Gaussian Splatting from a Single Monocular Video
CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models
Revitalizing Dense Material Segmentation: Stabilized Vision Transformers and the Generalization Paradox
Evaluating the Effect of Compression on Video Temporal Consistency Using Objective Quality Metrics
Using a Digital Twin for Fringe Projection Profilometry Optimisation
Mixtac: A Novel Bio-Inspired Hybrid Tactile Sensor with Synergistic Event-Frame Perception
Dynamic MRI Reconstruction Via Dual Deep Priors and Low-Rank Plus Sparse Modeling
Spatio-Temporal Similarity Volume Aggregation for Open-Vocabulary Action Recognition
General Hazard Detection
Efficient Learned Image Compression without Entropy Coding
Enhancing Blood Cells Classification using Hybrid Quantum Neural Networks
GFSR: Geometric Fidelity and Spatial Refinement for Reliable Lane Detection
CHASD: Language Increment-Calibrated Contrastive Decoding against Hallucination in LVLMs
SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models
Decoupling Spatio-Temporal Adapter for Fine-Grained Badminton Action Localization
IPG-Net: Image Pyramid Guidance Network for Small Object Detection
Sketchable Histograms of Oriented Gradients for Object Detection
Rethinking image formats for computer vision: JPEG sRGB, linear, and log RGB; object detection and shadow removal
Robustness of breast lesion segmentation under MRI undersampling improves with k-space-aware deep learning
3D LULC classification using multispectral LiDAR and deep learning: current and prospective schemes
4D-GSW: Kinematic-Aware Spatio-Temporal Consistent Watermarking for 4D Gaussian Splatting
Identifying visual attributes for object recognition from text and taxonomy
MotiMotion: Motion-Controlled Video Generation with Visual Reasoning
Cambrian-P: Pose-Grounded Video Understanding
Which Way Did It Move? Diagnosing and Overcoming Directional Motion Blindness in Video-LLMs
Computer Vision in Agriculture: Object Detection, Recognition, and Image Segmentation Techniques and Advanced Image Analysis
RDDM: A Residual-Driven Drifting Model for High-Fidelity Low-Dose CT Denoising
EchoSR: Efficient Context Harnessing for Lightweight Image Super-Resolution
LUMEN: Low-light Unified Multi-stage Enhancement Network using depth-guided flash, clustering, and attention-based Transformers
See Silhouettes in Motion with Neuromorphic Vision
SdcNet for object recognition
DIPA: Distilled Preconditioned Algorithms for Solving Imaging Inverse Problems
Learning Normalized Energy Models for Linear Inverse Problems
Dynamic resolution switching for live streaming
Text-RSIR: A Text-Guided Framework for Efficient Remote Sensing Image Transmission and Reconstruction
Disentangling Sampling from Training Budget in Class-Imbalanced CT Body Composition Segmentation
NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding
PEMark: Watermarking API Responses Based on Proxy Gateways and Position Encoding
Entropy-Guided Self-Supervised Learning for Medical Image Classification
Time-varying rPPG signal separation via block-sparse signal model
SegCompass: Exploring Interpretable Alignment with Sparse Autoencoders for Enhanced Reasoning Segmentation
Computer Vision Based Object Detection and Recognition System for Image Searching
Part-based deformable object detection with a single sketch
Towards Few-Annotation Learning in Computer Vision : Application to Image Classification and Object Detection tasks
AtomicMotion: Learning Human Motion From Different Human Parts
The Double Dilemma in Multi-Task Radiology Report Generation: A Gradient Dynamics Analysis and Solution
From Baseline to Follow-Up: Counterfactual Spine DXA Image Synthesis in UK Biobank Using a Causal Hierarchical Variational Autoencoder
What Does the Caption Really Say? Counterfactual Phrase Intervention for Compositional Data Selection in Vision-Language Pretraining
Matching with Deliberation: Test-Time Evolutionary Hierarchical Multi-Agents for Zero-Shot Compositional Image Retrieval
Supervised Classification Heads as Semantic Prototypes: Unlocking Vision-Language Alignment via Weight Recycling
Training-Free Fine-Grained Semantic Segmentations in Low Data Regimes: A FungiTastic Baseline
LACO: Adaptive Latent Communication for Collaborative Driving
Learning object motion patterns for anomaly detection and improved object detection
RGB-D Salient Object Detection: A Review
Efficient Object Detection and Segmentation for Fine-Grained Recognition
Reducing Object Hallucination in LVLMs via Emphasizing Image-negative Tokens
Automatic Discovery of Disease Subgroups by Contrasting with Healthy Controls
Deformba: Vision State Space Model with Adaptive State Fusion
Hyper-V2X: Hypernetworks for Estimating Epistemic and Aleatoric Uncertainty in Cooperative Bird's-Eye-View Semantic Segmentation
Object detection based on spatiotemporal background models
Fast features for time constrained object detection
Spectral gradients for color-based object recognition and indexing
Diffusion Graph Posterior Sampling for Nonlinear Inverse Problems with Application to Electrical Impedance Tomography
Set Shaping Theory as a Complementary Payload-Shaping Layer for Steganography
FGSVQA: Frequency-Guided Short-form Video Quality Assessment
Probability-Conserving Flow Guidance
Fast moving-object detection in H.264/AVC compressed domain for video surveillance
A framework for abandoned object detection from video surveillance
Adaptive object detection and recognition based on a feedback strategy
Physics-informed simulation framework for realistic sonar image generation and statistical validation
Physics-in-the-Loop: A Hybrid Agentic Architecture for Validated CAD Engineering Design
Efficient Long-Context Modeling in Diffusion Language Models via Block Approximate Sparse Attention
Tango3D: Towards Alignment for Global and Local 2D-3D Correspondence
CADENet: Condition-Adaptive Asynchronous Dual-Stream Enhancement Network for Adverse Weather Perception in Autonomous Driving
When Preference Labels Fall Short: Aligning Diffusion Models from Real Data
FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding
A Framework for Evaluating Zero-Shot Image Generation in Concept-based Explainability
An object detection and recognition system for weld bead extraction from digital radiographs
Aurora: Unified Video Editing with a Tool-Using Agent
WavFlow: Audio Generation in Waveform Space
Can These Views Be One Scene? Evaluating Multiview 3D Consistency when 3D Foundation Models Hallucinate
Special issue on 3D representation for object and scene recognition
Finite asymmetric generalized Gaussian mixture models learning for infrared object detection
Computer vision object detection and image recognition algorithm optimization based on self-supervised learning
CATA: Continual Machine Unlearning via Conflict-Averse Task Arithmetic
ManiSoft: Towards Vision-Language Manipulation for Soft Continuum Robotics
CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark
SPIKE: An Adaptive Dual Controller Framework for Cost-Efficient Long-Horizon Game Agents
Object Recognition
Recent Progress on Object Classification and Detection
Object detection with vector quantized binary features
Automatic Representation and Classifier Optimization for Image-based Object Recognition
AWADA: Foreground-focused adversarial learning for cross-domain object detection
The MixCount Dataset: Bridging the Data Gap for Open-Vocabulary Object Counting
SENSE: Satellite-based ENergy Synthesis for Sustainable Environment
DanceHMR: Hand-Aware Whole-Body Human Mesh Recovery from Monocular Videos
TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning
Spatial Competition for Low-Complexity Learned Image Compression
Learning to Optimize Radiotherapy Plans via Fluence Maps Diffusion Model Generation and LSTM-based Optimization
An Underwater Dehazing Network with Implicit Transmission Estimation
Keyed Nonlinear Transform: Lightweight Privacy-Enhancing Feature Sharing for Medical Image Analysis
Application of Computer Vision Algorithms in Image Recognition and Object Detection
Boosting masked dominant orientation templates for efficient object detection
Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models
WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation
Deterministic Event-Graph Substrates as World Models for Counterfactual Reasoning
Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization
Evaluating a color-based active basis model for object recognition
Learning-Based Night-Vision Image Recognition and Object Detection
Research on electrical equipment fault detection by combining object detection and image segmentation algorithms
Introduction to the CVIU special issue on “Parts and Attributes: Mid-level representation for object recognition, scene classification and object detection”
Object recognition with uncertain geometry and uncertain part detection
Detection and matching of object using proposed signature
AI in Computer Vision: Image Processing, Object Detection, and Recognition Techniques
Object recognition using discriminative parts
Visual Object Recognition with Image Retrieval
Evaluation of Anatomical Shape Priors in Deep Learning-Based Cardiac Multi-Compartment Segmentation
3D Segmentation Using Viewpoint-Dependent Spatial Relationships
EntropyScan: Towards Model-level Backdoor Detection in LVLMs via Visual Attention Entropy
Semi-MedRef: Semi-Supervised Medical Referring Image Segmentation with Cross-Modal Alignment
Multiview feature distributions for object detection and continuous pose estimation
Discriminative Training for Object Recognition Using Image Patches
The Velocity Deficit: Initial Energy Injection for Flow Matching
HDRFace: Rethinking Face Restoration with High-Dimensional Representation
Learning Direct Control Policies with Flow Matching for Autonomous Driving
Multi-proposal Collaboration and Multi-task Training for Weakly-supervised Video Moment Retrieval
50 Years of object recognition: Directions forward
The 3dSOBS+ algorithm for moving object detection
Holistic object detection and image understanding
Pick-Object-Attack: Type-specific adversarial attack for object detection
Fast quality-guided phase unwrapping algorithm for 3D profilometry based on object image edge detection
Histogram of Radon Projections: A new descriptor for object detection
Automated Preparation of Images for Recognition by Robotic Systems in Real Time
Partitioning the Contour of a Graphic Object Image into Fragments in Classification Tasks
A Study of the Sensitivity of Feature Vectors Formed from Multiscale Transforms of Processed Images
Algorithms for Video Image Analysis in Diagnosing Disease Types Based on Wavelet Wave Functions

8 more articles in subsections