Shounak Naik

Bringing Pixels to Life with the magic of Computer Vision!

Greetings! I am Shounak Naik, a Computer Vision Engineer with keen interests in Machine Learning, Graphics, Embedded Systems, and Robotics. I recently graduated with a Master's in Robotics from Worcester Polytechnic Institute. Earlier, I graduated with a Bachelor's degree in Computer Science from BITS Pilani in India.

I am currently working in the Generative AI field at the intersection of Neural Rendering and Computer Vision at Aireal.

My professional journey has been marked by enriching experiences, notably as a Computer Vision Intern at Cognex Corporation, where I explored the nuances of multicamera systems and developed an epipolar-geometry-based system for detecting extrinsic calibration errors. At Carnegie Robotics, I implemented and deployed a high-performance object detection pipeline on an FPGA. As a Machine Learning Engineer at Bloomreach, Inc., I designed and analyzed multi-modal (images and text) Neural Recommendation Engines.

Email / Github / Linkedin / Resume

Education

Worcester Polytechnic Institute
Master's in Robotics Engineering

Relevant Coursework: Computer Vision, Deep Learning, Embedded Deep Learning, Reinforcement Learning

Affiliations: PeAR Lab, VisLab, BASH Lab

GPA 4.0/4.0

Birla Institute of Technology and Science
Bachelor's in Computer Science, Master's in Biological Sciences

During my undergraduate degree, I loved working on system- and hardware-level projects. I worked on a simulation of the complete memory hierarchy of a computer. I also worked on an 8086-based intoxication detector.

During the final years of my undergraduate degree, I started working on Deep Learning and Computer Vision and absolutely loved working in this field. So much so that I chose to pursue a Master's focusing on Computer Vision.

I have always been fascinated by the life sciences, so I additionally pursued a Master's in Biological Sciences here. Any technology project involving the life sciences genuinely excites me!

Work Experience

	Aireal Generative Machine Learning Engineer Working on camera tracking and on diffusion models for generating novel scenes of a room with custom furniture.
	Cognex Corporation Computer Vision Intern Studied the effect of adding relative pose constraints to the Perspective-n-Point step for a multicamera system. Prototyped an epipolar-geometry-based system for detecting errors in the extrinsic calibration and the motion model of a tunnel.
	Carnegie Robotics Computer Vision Intern Implemented SSD300, quantized it to int8, and deployed it on an FPGA using Xilinx Vitis AI, achieving 24 FPS. Designed a ROS-based error flagging system for a length-measuring product that uses stereo matching and Mask R-CNN.
	Bloomreach Machine Learning Engineer Designed, trained, and analyzed multi-modal RankNets (images + text) to build a Neural Recommendation Engine. Trained networks (across multiple GPUs) using the BYOL self-supervised technique with ResNet as the base encoder. Improved network performance (up to 10% on certain classes) by evaluating attention maps generated by Grad-CAM.

Research Experience

VisLab
Graduate Research Assistant

Used depth derived from COLMAP point clouds to formulate a novel depth loss for the Generalizable NeRF Transformer.

PeAR Lab
Graduate Research Assistant

Generated synthetic optical flow, depth, and surface normal datasets using the Blender Python API.

Designed an aleatoric-uncertainty-based perception stack that enabled a Tello drone to dodge static obstacles in the scene. The perception stack relied on the uncertainty of optical flow network predictions to detect free space in the scene. The free space is shown as black pixels in the last section of the following video. At each timestep, the drone is directed towards the red dot.

CLSNet Lab
Undergraduate Research Assistant

Studied semantic grounding in CodeBERT, a language model of code by Microsoft. We studied how semantic grounding varies across the layers of CodeBERT, with the amount of fine-tuning, and across different programming languages.

Github, Paper, Talk

Computer Vision Projects

	Semantic Point Cloud Painting Github Built a map from LiDAR point clouds (taken directly from the KITTI-360 dataset) using point-to-point ICP. Semantically segmented images using DeepLabv3 and then transferred these semantic labels to the generated ICP map.
	Structure from Motion - 3D Reconstruction Github Implemented Non-Linear Triangulation, PnP and Bundle Adjustment to reconstruct the 3D structure of a building at WPI.
	Camera Calibration Github Implemented Zhang's camera calibration method from scratch to estimate the camera intrinsics and distortion parameters. Used SVD and MLE for estimating the camera calibration parameters.
	Panorama Stitching Github Used adaptive non-maximal suppression to retain high-quality features. Matched features between image pairs and then estimated the homography. Used RANSAC to remove outliers among the feature matches.
	Face Swap Github Successfully swapped faces between 2 people by using Delaunay triangulation and inverse barycentric coordinates.
	Probability based edge detection Github Implemented an edge detector that works by searching for texture, color, and intensity discontinuities across multiple scales. Essentially, it is a simpler implementation of Pablo Arbelaez's paper.

Machine Learning/Deep Learning Projects

	Zero Shot Semantic Style Transfer Github Implemented AdaAttN for diverse style application on images with text-based image segmentation using CLIPSeg
	Mobile NeRF Github Using the PyTorch API, optimised the NeRF model for embedded deployment. Performed pruning and quantization-aware iterative training. Deployed the model on an M1 chip using Snapchat's Lens Studio and ONNX. Achieved real-time scene rendering on the iPad.
	Neural Radiance Field (NeRF) Github Implemented the original NeRF paper which includes ray sampling, positional encoding and volume rendering.
	Embedded Deep Learning Projects Pruning and quantization for optimizing the VGG-16 network for CIFAR-10 classification. Neural Architecture Search for microcontroller deployment from the MCUNet super-network by evolutionary search. Dynamic network inference on BranchyNet to achieve entropy-based early exit.
	La Liga Match Outcome Predictor One of my first ML projects :) Scraped 5 years of team and player statistics from the Spanish League (La Liga) site. Experimented with Random Forest and XGBoost to build a win-draw-lose classifier for any given fixture.

System level or Embedded Systems Projects

Memory Hierarchy Simulation
Github

Simulated the complete memory hierarchy with the following specification:

TLB, L1 Cache with an LRU replacement policy, L2 Cache with a FIFO replacement policy.
Main memory with hierarchical paging, using page fault frequency as the thrashing-control policy.

Intoxication Level Detector

Built with an 8086 microprocessor, an 8259 interrupt controller, an 8253 Programmable Interval Timer, and RAM and ROM chips.

This machine measures a person's reaction time to a predefined stimulus. Based on the reaction time, the person's intoxication level is shown on a 7-segment LED display.

Robotics Projects

	Visual Inertial Odometry using Extended Kalman Filter Github Implemented an EKF to track drone pose in 3D space using Visual Inertial Odometry. Designed and implemented the process model to take in IMU acceleration. The observation model corrected the estimate after its independent pose estimation step using AprilTags.
	Custom Robot Navigation using Nav2 Github Built a LiDAR-equipped bot with URDF/SDF. Set up odometry plugins in the URDF as well. Navigated the bot through set waypoints in the map using the LiDAR and Nav2.
	PD Control with Gazebo and ROS Github Implemented a position controller (PD) for the 3 robot joints. A reference value for the joints is passed through a service. The calculated joint efforts are published (continuously with high sampling rates) so that the joints finally reach the reference value.
	Trajectory Tracking Using Sliding Mode Control Github Generated fifth-order trajectories between waypoints. Implemented Sliding Mode Control to track the generated trajectories.

Thank You to Jon Barron for this template!