Jaimin Bhoi | Computer Vision Engineer

Professional Experience

Graduate Research Assistant

Center for Research in Computer Vision @UCF | May 2024 - Jan 2025

Proposed a novel neurosymbolic AI approach to generating dynamic scene graphs from videos using off-the-shelf Multi-Modal Large Language Models (MLLMs), advancing video understanding through automated scene graph generation.

Individual Contributor

UCF College of Medicine | Dec 2023 - Apr 2024

Designed and implemented a Computer Vision solution that automates angle measurement on MRI data (DICOMs) for bone alignment using the Segment Anything Model (SAM) + Image Classifier Heads, streamlining assessment and reducing manual workload for clinicians.

Systems Engineer

Tata Consultancy Services (TCS) | Jun 2018 - Sep 2023

Delivered three projects to production: an IVI system, Computer Vision on the Qualcomm RB5 board, and Container Image Analytics, which directly impacted human lives and saved billions in USD.

Cutting-Edge Computer Vision Research

What can off-the-shelf MLLMs do for Dynamic Scene Graph Generation?

Designed and implemented an MLLM-based method for dynamic video scene graph generation, improving performance by 10-40% across different top-K settings and achieving state-of-the-art (SOTA) performance on benchmark datasets.

EEG Signal Analysis & Visualization

Developing novel algorithms to transform brain activity into actionable visual data, with applications in neuroscience and human-computer interaction.

Industrial and Research Projects

Dynamic Scene Graph Generation (DSGG)

Proposed a novel solution for Dynamic Scene Graph Generation (DSGG) with MLLMs, demonstrating a 10-40% performance improvement using just 5-10% of training data across varying top-K metrics, while maintaining the recall-precision balance.

Container Image Analytics

Developed and deployed Computer Vision algorithms that saved $4M in container repair and cleaning costs and reduced lead time from 12 days to 1 day for 10% of repair volume while ensuring high accuracy and performance.

Computer Vision on Qualcomm RB5 Development Board

Addressed the challenge of static-image action recognition by fine-tuning a CLIP model, enabling accurate classification of human activities and improving interpretability using self-attention visualization.

Academic Projects

DumbVLMs (Visual Language Models)

Created a novel dataset of 2D/3D shapes and real images to evaluate reasoning limits in MLLMs/VLMs (LLaVA-OneVision, InternVL3, Qwen2-VL), revealing critical biases and failure cases in the geometric and in-context understanding of SOTA VLMs.

Human Activity Recognition on Static Images (HAR)

Designed and implemented a video analytics solution to prevent losses in retail self-checkout environments, addressing an industry-wide annual loss of $90B.

DistilledDINO

Distilled DINO models into smaller EfficientNets and ViTs for efficient inference.

SSL with DINO on X-ray Images

Tackled the lack of annotated X-ray data by fine-tuning a DINO self-supervised model on chest X-rays for pneumonia classification, achieving 95.5% test accuracy and demonstrating strong generalization.

RetailEye

Built a real-time customer behavior analysis system using action classification and state tracking to prevent self-checkout theft.

EEG Visual Stimuli

Achieved near image-level representation of EEG visual stimuli features by training a self-supervised learning model using the DINO framework, enabling better alignment with visual representations.

Drunk and Drowsiness Alert System (DADAS)

Implemented a drunk-driving and drowsiness alert system for long-route drivers to support monitoring and safety.

What I Bring to the Table

Cutting-Edge CV

Expertise in MLLMs, training and fine-tuning LLMs, Stable Diffusion models, and LoRA.

Classical Computer Vision

Expertise in object detection, image segmentation, and facial recognition using OpenCV and YOLO.

Deep Learning

Proficient in CNN and RNN architectures with TensorFlow and PyTorch for real-world applications.

Research

During my master's, I gained hands-on experience applying research methodologies, exploring novel ideas, and leveraging HPC resources to train and evaluate large-scale models.

Image Processing

Skilled in classical techniques for enhancement, restoration, and feature extraction.

Azure Cloud

Experience deploying ML models and APIs using Azure AI and Cognitive Services.

Academics

Master's in Computer Vision

University of Central Florida | August 2023 - May 2025

Developed strong expertise in computer vision through rigorous coursework that emphasized state-of-the-art methods, robust system design, and practical applications.

Bachelor's in Computer Engineering

A D Patel Institute of Technology | 2014 - 2018

Let's Connect

Send Me an Email

I'm open to opportunities and collaborations in computer vision and AI. Reach out to discuss ideas or projects!