Bio

I’m a Senior Research Engineer at Borealis AI (RBC Research) in Montreal, where I work on Foundation Models and LLMs for Capital Markets and Credit Modeling within the AI Solutions team led by Prof. Greg Mori.

Previously, I was a Research Engineer at Samsung AI Center Toronto, where I worked with Dr. Alex Levinshtein and Prof. Allan Jepson on computer vision research, focusing on burst photography, neural implicit representations, and image enhancement and synthesis. Before that, I was a Software Engineer at Broadcom Inc., where I developed behavior-based malware classifiers for Norton AntiVirus using machine learning.

I have over 8 years of full-time experience in applied AI and software engineering, and I serve as a Conference Reviewer / Program Committee member for NeurIPS, CVPR, ICLR, ICML, and AAAI.

★ News

May 2026 – Our Curiosity-Critic work has been accepted to the EIML Workshop @ ICML 2026! 🎉
Apr 2026 – Our paper on Cumulative Training Progress as Curiosity is out on arXiv! Read the blog! 🥳
Mar 2026 – Patent filed for our Fast, Robust, Diverse-Retrieval Method for RAG (US App. No. 19/575,161) at Borealis AI. 🎉

Misc: I enjoy Kaggle challenges in healthcare and medicine, and I rank as a Competitions Expert (Top 5% globally, 5 medals). I custom-built an NVIDIA GeForce RTX 3090 Ti workstation for these projects – check out its detailed specs and benchmarks on PCPartPicker!

Current Research Interests: Calibration, uncertainty estimation, and robustness of large language models (LLMs), particularly for risk-sensitive applications in healthcare and finance.

Contact: Reach out at vin.bhaskara@gmail.com or bhaskara@cs.toronto.edu.

Education

M.Sc. Applied Computing (Deep Learning)
Department of Computer Science, University of Toronto			2018 - 2020
Grade: A+ [4.0/4.0], Vector Scholar in AI
Research Topic: Robust Single-Shot Object Detection for Computer Vision

B.Tech. Engineering Physics
Indian Institute of Technology Guwahati (IIT Guwahati)			2012 - 2016
Institute Silver Medalist, IQC Research Visitor

Publications

11 papers · 3 patents (1 granted, 2 pending) · 324 citations · h-index 8

(^* denotes equal contribution)

Machine Learning

Apr
2026

Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training
Vin Bhaskara, Haicheng Wang
ICML 2026 Workshop on Epistemic Intelligence in Machine Learning
arXiv:2604.18701 [cs.LG]

Arxiv Blog Video Code Cite

TL;DR:
Curiosity-Critic grounds its intrinsic reward in the improvement of cumulative world-model error, which reduces to the per-step reducible (epistemic) error above a learned irreducible (aleatoric) baseline. It recovers prior methods as special cases and outperforms them on a stochastic grid world.

Full Abstract:
Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it admits a tractable per-step surrogate: the difference between the current prediction error and the asymptotic error baseline of the current state transition. We estimate this error baseline online with a learned critic co-trained alongside the world model; regressing a single scalar, the critic converges well before the world model saturates, redirecting exploration toward learnable transitions without oracle knowledge of the noise floor. The reward is higher for learnable transitions and collapses toward the error baseline for stochastic ones, effectively separating epistemic (reducible) from aleatoric (irreducible) prediction error online. Prior prediction-error curiosity formulations, from Schmidhuber (1991) to learned-feature-space variants, emerge as special cases corresponding to specific approximations of this error baseline. Experiments on a stochastic grid world show that Curiosity-Critic outperforms prediction-error, visitation-count, and Random Network Distillation methods in training speed and final world model accuracy.
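A minimal sketch of the per-step surrogate reward described in the abstract: the current prediction error minus the estimated asymptotic error of the same transition. The function name and the numbers are illustrative assumptions. How the critic is trained is not spelled out here, so the baseline values are simply passed in as given.

```python
import numpy as np

def curiosity_critic_reward(prediction_error, error_baseline):
    """Per-step surrogate reward: current prediction error minus the
    estimated asymptotic (irreducible) error of the same transition.

    Both arguments are illustrative inputs; in the paper the baseline
    comes from a learned critic, which is not modeled in this sketch.
    """
    prediction_error = np.asarray(prediction_error, dtype=float)
    error_baseline = np.asarray(error_baseline, dtype=float)
    return prediction_error - error_baseline

# Made-up numbers for two transitions:
#   index 0: learnable (its error can still fall toward zero)
#   index 1: stochastic (its error already sits at the noise floor)
errors = [0.8, 0.5]
baselines = [0.0, 0.5]
rewards = curiosity_critic_reward(errors, baselines)
```

In this toy case the learnable transition keeps a positive reward, while the stochastic one earns no reward even though its raw prediction error is large.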

Jan
2022

GraN-GAN: Piecewise Gradient Normalization for Generative Adversarial Networks
Vin Bhaskara^*, Tristan Aumentado-Armstrong^*, Allan Jepson, Alex Levinshtein
Winter Conference on Applications of Computer Vision (WACV 2022)

Paper Arxiv Poster Slides Video Code Cite

We introduce Gradient Normalization (GraN), an input-dependent normalization for GAN discriminators that strictly enforces a piecewise K-Lipschitz constraint by dividing the network's output by its own input gradient norm. Unlike spectral normalization, GraN acts on the full network rather than layer-by-layer, avoiding gradient attenuation from loose compositional bounds. Unlike gradient penalties, it's a hard constraint, not a soft one on sampled points. We show improved image generation quality (FID, KID, IS) across multiple datasets and loss functions, and reveal that tuning the Lipschitz constant K (usually left at 1) interacts with Adam's epsilon on loss plateaus, yielding further gains.
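A rough NumPy sketch of the idea of dividing a piecewise-linear network's output by its own input-gradient norm. The tiny two-layer ReLU network, its weights, the function names, and the `eps` stabilizer are all assumptions made for illustration. This is not the paper's implementation.

```python
import numpy as np

def relu_net(x, W1, b1, w2, b2):
    """Scalar-output ReLU network; returns f(x) and its input gradient."""
    pre = W1 @ x + b1
    f = w2 @ np.maximum(pre, 0.0) + b2
    mask = (pre > 0).astype(float)      # active ReLU units at x
    grad = W1.T @ (w2 * mask)           # exact gradient inside a linear region
    return f, grad

def gran_output(x, params, K=1.0, eps=1e-8):
    """Output scaled by K and divided by the input-gradient norm.
    eps is an assumed stabilizer for near-zero gradients."""
    f, grad = relu_net(x, *params)
    return K * f / (np.linalg.norm(grad) + eps)

def finite_diff_grad(fn, x, h=1e-5):
    """Central-difference gradient, used only to check the sketch."""
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (fn(x + step) - fn(x - step)) / (2 * h)
    return g

# Hypothetical fixed weights; x lies well inside one linear region.
params = (
    np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),  # W1
    np.array([0.5, -0.5, 0.1]),                      # b1
    np.array([1.0, -2.0, 0.5]),                      # w2
    0.3,                                             # b2
)
x = np.array([1.0, 2.0])
K = 2.0
numeric_grad = finite_diff_grad(lambda z: gran_output(z, params, K=K), x)
grad_norm = np.linalg.norm(numeric_grad)
```

Within a single linear region the gradient of the raw network is constant, so the normalized output has gradient norm approximately K there, which is the piecewise K-Lipschitz behavior the summary refers to.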

Jan
2021

Efficient Super-Resolution Using MobileNetV3
Haicheng Wang^*, Vin Bhaskara^*, Alex Levinshtein^*, Stavros Tsogkas, Allan Jepson
European Conference on Computer Vision (ECCV 2020) Workshops

Paper PDF Cite

We adapt MobileNetV3 blocks (originally designed for classification, detection, and segmentation) to build an efficient 4× single-image super-resolution network suitable for on-device mobile deployment. Our model approaches the PSNR of heavy state-of-the-art SR methods like ESRGAN and RCAN while being 40–1000× more efficient in FLOPs and 30–1000× smaller in parameters. We also present an extremely lightweight variant (18K params, 2.6G FLOPs) capable of generating a 12MP image in ~1.4s on a mobile phone, making real-world deployment feasible at the cost of a barely perceptible ~0.5–1 dB PSNR drop.

Apr
2020

Part-based Auxiliary Objectives with No Extra Labels for Robust Single-Shot Object Detection
Vin Bhaskara, Stavros Tsogkas, Kosta Derpanis, Alex Levinshtein
Preprint. DOI: 10.13140/RG.2.2.10079.47521

Link PDF Cite

We introduce annotation-free, part-based auxiliary objectives to improve single-stage object detection, applied on top of CenterNet. Specifically, we add two self-supervised tasks derived purely from existing bounding box labels: class-agnostic corner keypoint heatmaps and pixel-wise vector fields that cast votes for object part directions. We also fix a train-test discrepancy in CenterNet's regression head by predicting edge offsets within a 3×3 window around centers, making the detector robust to single-pixel center localization errors. Together, these yield ~2.6–3.9 AP improvements on MS COCO test-dev across backbones, with our best model reaching 39.4 AP at 31 FPS (DLA34) and our fastest reaching 32.2 AP at 71 FPS (ResNet-18).

May
2019

Exploiting Uncertainty of Loss Landscape for Stochastic Optimization
Vin Bhaskara, Sneha Desai
Preprint. arXiv:1905.13200 [cs.LG]

Arxiv Code Cite

We introduce novel variants of momentum for stochastic optimization that use the gradient of the mini-batch loss variance to quantify uncertainty in the loss landscape. We apply this "variance-gradient" in two ways: as a bias (MomentumUCB) that steers updates toward regions where the loss landscape's features conform across mini-batches, improving generalization; and as a stochastic exploration signal (MomentumS) that randomly injects noise along high-variance directions, yielding an unbiased full-gradient estimate. We also introduce a model-agnostic stochastic regularizer through the update rule itself, establish a connection between the variance-gradient and REINFORCE with baseline, and incorporate these into Adam variants that show faster convergence on MNIST and CIFAR-10.

Apr
2019

Risk Prediction in the General Internal Medicine Ward at St. Michael's Hospital
Vin Bhaskara, Yingying Fu, Sindhu Gowda
Preprint. DOI: 10.13140/RG.2.2.27695.55205

Link PDF Cite

We build an early warning system to predict critical care outcomes (ICU transfer, palliative care, death) for General Internal Medicine patients using only the first 24 hours of admission data from St. Michael's Hospital. We explore feed-forward and recurrent architectures, and propose a data-driven regularization technique that uses ICD-10 diagnosis codes as intermediate training labels to improve patient representations without requiring them at inference time. We also incorporate positional embeddings to disentangle temporal from feature information, and show that ensembling diverse model families yields the best overall performance (~0.82 AUC-ROC). Neural models generalize substantially better than XGBoost despite similar test AUC, and a performance analysis stratified by gender, length of stay, and diagnosis codes reveals where each model excels.

Jul
2018

Emulating Malware Authors for Proactive Protection using GANs over File Behaviors
Vin Bhaskara, Debanjan Bhattacharyya
Preprint. arXiv:1807.07525 [stat.ML]

Arxiv Code Cite

We propose a reversible method for encoding dynamic file behavior (API call sequences) into RGB images using the Fourier transform. TF-IDF scores of API-call n-grams are mapped to amplitudes in frequency space and first-invocation order to phase, producing distributed image textures that are visually distinctive across software categories and fully decodable back to the original sequence information. We train a WGAN-GP on these malware behavior images to act as a malware author emulator, generating synthetic malware behaviors that can be used to proactively test and harden ML-based threat detection before zero-day attacks appear in the wild. We also propose perceptual hash-based "smart definitions" over these images as a replacement for cryptographic hashes, offering meaningful similarity metrics that generalize across malware variants, and validate that even a simple XGBoost classifier over raw image pixels achieves 0.97 AUC for malware detection on our dataset.

Quantum Information

May
2022

Generalized Entanglement Measure for Continuous-Variable Systems
Nibedita Swain^*, Vin Bhaskara^*, Prasanta K. Panigrahi
Physical Review A 105, 052441

Paper PDF Arxiv Cite

We extend the wedge product and Lagrange-Brahmagupta identity framework (previously proposed for discrete-variable systems by Bhaskara et al.) to construct a family of faithful entanglement measures for general pure and mixed continuous-variable (CV) states across arbitrary bipartitions and degrees of freedom. The resulting generalized entanglement measure (GEM) provides necessary and sufficient conditions for separability — unlike prior CV criteria (Simon, Duan et al., Hillery-Zubairy) which are only necessary for general non-Gaussian states — and for pure CV states is computationally simpler than von Neumann entropy as it avoids diagonalization of the infinite-dimensional reduced density matrix, requiring only integrals over the wave function. We validate the measure on Gaussian states, pair-coherent states, non-Gaussian CV Bell states, and superpositions of squeezed states, recovering known results, and establish equivalences to the Hilbert-Schmidt distance and von Neumann entropy as entanglement measures.

Mar
2017

Generalized Entanglement Measure for Multiparticle Pure States in Arbitrary Dimensions
Vin Bhaskara, Prasanta K. Panigrahi
Quantum Information Processing, Volume 16, Article number: 118

Paper PDF Arxiv Cite

We present a new framework based on the wedge product and the generalized Lagrange's (Brahmagupta–Fibonacci) identity to extend concurrence as a faithful entanglement measure to multiparticle pure states in arbitrary dimensions. The key geometric insight is that separability across any bipartition is equivalent to the post-measurement vectors being parallel, captured by vanishing wedge products, and the entanglement measure corresponds to the area of the complex parallelotope formed by these vectors. Applying Lagrange's identity converts the O(m²) wedge product norm computation into the O(m) expression E²_M = 2(1 − tr(ρ_M²)), yielding necessary and sufficient separability conditions across arbitrary bipartitions. The resulting measure coincides with the I-concurrence of Rungta et al. (derived independently via a universal inverter superoperator), but our geometric derivation exposes the underlying structure and naturally extends to the continuous-variable case addressed in subsequent work.
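A small numerical sketch of the closed-form expression E²_M = 2(1 − tr(ρ_M²)) for a bipartite pure state, alongside a pairwise wedge-product sum for comparison. The function names, the row-major reshape convention, and the prefactor 4 on the pairwise sum (chosen so that the two forms agree under Lagrange's identity in this sketch) are assumptions, not the paper's notation.

```python
import numpy as np
from itertools import combinations

def entanglement_squared(psi, dim_a, dim_b):
    """E^2 = 2 * (1 - tr(rho_A^2)) for a pure state on C^dim_a (x) C^dim_b."""
    psi = np.asarray(psi, dtype=complex)
    C = (psi / np.linalg.norm(psi)).reshape(dim_a, dim_b)
    rho_a = C @ C.conj().T                      # reduced density matrix of A
    purity = np.real(np.trace(rho_a @ rho_a))
    return 2.0 * (1.0 - purity)

def entanglement_squared_wedge(psi, dim_a, dim_b):
    """Same quantity as a sum over pairwise 2x2 minors of the
    coefficient matrix (squared wedge-product components)."""
    psi = np.asarray(psi, dtype=complex)
    C = (psi / np.linalg.norm(psi)).reshape(dim_a, dim_b)
    total = 0.0
    for i, j in combinations(range(dim_a), 2):
        for k, l in combinations(range(dim_b), 2):
            total += abs(C[i, k] * C[j, l] - C[i, l] * C[j, k]) ** 2
    return 4.0 * total

# Two-qubit examples in the basis |00>, |01>, |10>, |11>.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # maximally entangled
product = np.array([1.0, 0.0, 0.0, 0.0])             # separable |00>

e2_bell = entanglement_squared(bell, 2, 2)
e2_product = entanglement_squared(product, 2, 2)
```

The separable state gives a vanishing value because all pairwise minors are zero, which matches the geometric picture of parallel post-measurement vectors.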

Mar
2017

Implementing Bragg Mirrors in a Hollow-Core Photonic-Crystal Fiber
Jeremy Flannery, Golam Bappi, Vin Bhaskara, Omar Alshehri, Michal Bajcsy
Optical Materials Express, Volume 7, Issue 4, pp. 1198-1210

Paper Cite

We propose and numerically simulate two methods for implementing Bragg gratings in hollow-core photonic-crystal fibers (HCPCFs) that crucially leave the hollow core unobstructed, enabling continued loading of atoms for quantum optics experiments. The first method coats the inner hollow-core wall with photoresist followed by UV interference lithography, while the second selectively fills photonic-crystal cladding holes with UV-curable epoxy. Numerical simulations predict that the hollow-core coating approach can achieve reflectivities >99.99% with only ~300 Bragg periods (~100 µm penetration depth), compared to ~10⁵ periods (~5 cm) needed if modulating only the silica material itself. The hole-filling method achieves lower reflectivity (~99.8%) due to weaker effective index contrast, but relies on selective injection techniques that have already been experimentally demonstrated, making it more immediately practical.

Sep
2016

Mesoscale Cavities in Hollow-Core Waveguides for Quantum Optics with Atomic Ensembles
C.M. Haapamaki, J. Flannery, G. Bappi, R. Al Maruf, Vin Bhaskara, O. Alshehri, T. Yoon, M. Bajcsy
Nanophotonics, Volume 5, No. 3, pp. 392-408

Paper Cite

We review and propose approaches for incorporating Bragg gratings, mirrors, and Fabry–Pérot cavities into hollow-core photonic-crystal (HCPC) fibers and hollow-core ARROW waveguides without obstructing their cores, enabling loading of atomic ensembles for enhanced light–matter interactions. We analyze two Bragg grating methods for HCPC fibers (hollow-core wall coating and selective photonic-crystal hole filling with photosensitive polymers), gratings etched into ARROW cladding layers, and photonic-crystal membrane metasurfaces as compact broadband mirrors — showing that the membrane approach can reach near-unity reflectivity sufficient for both the strong-coupling and high-cooperativity cavity QED regimes with cesium atoms in ~1–4 cm fiber cavities. We propose applications of these "mesoscale" cavities (effective lengths from hundreds of microns to centimeters, transverse confinement at the micron scale) including single-photon transistors and superradiant lasers, and discuss on-chip integration pathways for both ARROW and fiber-based platforms.

Patents

Mar
2026

Systems, Methods, and Techniques for Performing Retrieval Augmented Generation (RAG) with a Diverse-Retrieval Method
Lorne Schell and Vin Bhaskara (Borealis AI)
US Patent Application No. 19/575,161 (Filed Mar 23, 2026)
Patent-Pending

Mar
2025

Generating Credit Capacity Estimates
S. H. Hajimirsadeghi, M. O. Ahmed, E. J. Smith, Vin Bhaskara, et al. (Borealis AI)
US Patent Application No. 19/091,539 (Filed Mar 17, 2025)
Patent-Pending

Jan
2025

Unsupervised Super-Resolution Training Data Construction
Haicheng Wang, Xinyu Sun, Vin Bhaskara, Stavros Tsogkas, Allan Jepson, Alex Levinshtein (Samsung AI Center)
US Patent 12,210,587 (Granted Jan 28, 2025)

Patent Cite

We propose a multi-stage unsupervised pipeline for constructing super-resolution training data that avoids the domain gap between synthetically downsampled images used during training and real low-resolution images encountered at test time. Given a real low-resolution image, a first model (blind SR with deep image prior) generates an initial high-resolution estimate by jointly estimating the unknown blur kernel, then a second model (CycleGAN-based domain adaptation) refines this estimate using an unpaired dataset of real high-resolution images to produce natural-looking pseudo-ground-truth, with a low-frequency content preservation loss to maintain structural fidelity. The resulting paired dataset of real low-resolution inputs and synthetic high-resolution targets is used to train a final supervised super-resolution network, eliminating the need for paired training data and ensuring that the network sees real low-resolution images during training — not synthetic ones. Since no fixed degradation kernel (e.g., bicubic) is assumed, the method generalizes to real-world images with unknown and varying degradations.

Vin Bhaskara he/him
