Bringing Pixels to Life with the magic of Computer Vision!
Greetings! I am Shounak Naik, a Computer Vision Engineer with keen interests in Machine Learning, Graphics, Embedded Systems, and Robotics. I recently graduated with a Master's in Robotics from Worcester Polytechnic Institute. Earlier, I graduated with a Bachelor's degree in Computer Science from BITS Pilani in India.
I am currently working in the Generative AI field at the intersection of Neural Rendering and Computer Vision at Aireal.
My professional journey has been marked by enriching experiences, notably as a Computer Vision Intern at Cognex Corporation, where I explored the nuances of multicamera systems and developed an epipolar-geometry-based system for detecting extrinsic calibration errors. At Carnegie Robotics, I implemented and deployed a high-performance object detection pipeline on an FPGA. As a Machine Learning Engineer at Bloomreach, Inc., I designed and analyzed multi-modal (images and text) Neural Recommendation Engines.
Birla Institute of Technology and Science Bachelor's in Computer Science, Master's in Biological Sciences
During my undergraduate degree, I loved working on system- and hardware-level projects. I worked on a simulation of the complete memory hierarchy of a computer. I also worked on an 8086-based intoxication detector.
During the final years of my undergraduate degree, I started working on Deep Learning and Computer Vision, and I absolutely loved working in this field. So much so that I chose to pursue a Master's focusing on Computer Vision.
I have always been fascinated by the life sciences. Thus I additionally pursued a Master's in Biological Sciences here. Any technology project involving the life sciences genuinely excites me!
Work Experience
Aireal Generative Machine Learning Engineer
Working on camera tracking and diffusion models for generating novel scenes of a room with custom furniture.
Cognex Corporation Computer Vision Intern
Studied the effect of adding relative pose constraints to the Perspective-n-Point step for a multicamera system.
Prototyped an epipolar-geometry-based system for detecting errors in the extrinsic calibration and motion model of a tunnel.
Carnegie Robotics Computer Vision Intern
Implemented SSD300, quantized it to int8, and deployed it on an FPGA using Xilinx Vitis AI, achieving 24 FPS.
Designed a ROS-based error-flagging system for a length-measuring product that uses stereo matching and Mask R-CNN.
Bloomreach Machine Learning Engineer
Designed, trained and analyzed multi-modal RankNets (images+text) to build a Neural Recommendation Engine.
Trained networks (across multiple GPUs) using the BYOL self-supervised technique with a ResNet as the base encoder.
Improved network performance (up to 10% on certain classes) by evaluating attention maps generated by Grad-CAM.
Research Experience
VisLab Graduate Research Assistant
Used COLMAP Point Cloud based depth to formulate a novel depth loss for the Generalizable NeRF Transformer.
PeAR Lab Graduate Research Assistant
Generated synthetic optical flow, depth, and surface normal datasets using the Blender Python API.
Designed an aleatoric-uncertainty-based perception stack that allowed a Tello drone to dodge static obstacles in the scene.
The perception stack relied on uncertainty of optical flow network predictions to detect free space in the scene. The free space is shown as black pixels in the last section in the following video. At each timestep, the drone is directed towards the red dot.
CLSNet Lab Undergraduate Research Assistant
Studied semantic grounding in CodeBERT, a language model for code from Microsoft.
We studied how semantic grounding varies across the layers of CodeBERT, with the amount of fine-tuning, and with different programming languages.
Scraped 5 years of team and player statistics from the Spanish League (La Liga) site. Experimented with Random Forest and XGBoost to build a win-draw-lose classifier for any given fixture.
Simulated the complete memory hierarchy with the following specification:
TLB, L1 cache with an LRU replacement policy, L2 cache with a FIFO replacement policy.
Main memory with hierarchical paging, using page fault frequency as the thrashing-control policy.
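The difference between the two replacement policies above can be illustrated with a minimal sketch. This is not the original simulator: the class names, the fully associative layout, and the access trace are assumptions made for illustration only.

```python
from collections import OrderedDict, deque


class LRUCache:
    """Fully associative cache that evicts the least recently used block."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # least recently used first

    def access(self, block):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
        self.blocks[block] = True
        return False


class FIFOCache:
    """Fully associative cache that evicts the block loaded earliest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()  # load order, oldest first
        self.resident = set()

    def access(self, block):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        if block in self.resident:
            return True  # a hit does not change the eviction order
        if len(self.queue) >= self.capacity:
            self.resident.discard(self.queue.popleft())
        self.queue.append(block)
        self.resident.add(block)
        return False


def count_hits(cache, trace):
    """Replay a trace of block ids and count the hits."""
    return sum(cache.access(block) for block in trace)


trace = [1, 2, 3, 1, 4, 1, 2]  # hypothetical access pattern
lru_hits = count_hits(LRUCache(3), trace)
fifo_hits = count_hits(FIFOCache(3), trace)
```

On this trace the LRU cache keeps block 1 resident because it was reused recently, while the FIFO cache evicts it purely by load order.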
Intoxication Level Detector
Built with an 8086 microprocessor, 8259 interrupt controller, 8253 programmable interval timer, and RAM and ROM chips.
This machine measures the reaction time of a person with respect to a predefined stimulus. Based on the reaction time, the intoxication level of the person is displayed on a 7-segment LED display.
Robotics Projects
Visual Inertial Odometry using Extended Kalman Filter Github
Implemented an EKF to track drone pose in 3D space using Visual Inertial Odometry.
Designed and implemented the process model to take in IMU acceleration. The observation model corrected the estimate after its independent pose estimation step using AprilTags.
Implemented a position controller (PD) for the 3 robot joints.
A reference value for the joints is passed through a service. The calculated joint efforts are published continuously at a high sampling rate so that the joints finally reach the reference value.
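The PD law described above can be sketched for a single joint as follows. This is a simplified illustration rather than the original ROS node: the gains, the time step, and the unit-inertia joint model in `simulate` are all assumptions, and the service and publisher plumbing is omitted.

```python
class PDController:
    """Proportional-derivative position controller for one joint."""

    def __init__(self, kp, kd):
        self.kp = kp
        self.kd = kd
        self.prev_error = None

    def effort(self, reference, position, dt):
        """Compute the joint effort from the current position error."""
        error = reference - position
        if self.prev_error is None:
            d_error = 0.0  # no derivative estimate on the first sample
        else:
            d_error = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.kd * d_error


def simulate(reference, kp=20.0, kd=6.0, dt=0.001, steps=5000):
    """Drive a hypothetical unit-inertia joint (acceleration = effort)."""
    controller = PDController(kp, kd)
    position, velocity = 0.0, 0.0
    for _ in range(steps):
        u = controller.effort(reference, position, dt)
        velocity += u * dt          # integrate acceleration
        position += velocity * dt   # integrate velocity
    return position


final_position = simulate(1.0)  # settles close to the reference of 1.0
```

The small time step stands in for the high publishing rate: the finite-difference derivative term is only a reasonable velocity estimate when samples are closely spaced.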
Trajectory Tracking Using Sliding Mode Control Github
Generated fifth-order trajectories between waypoints. Implemented Sliding Mode Control to track the generated trajectories.
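As a rough illustration of a fifth-order segment between two waypoints, the sketch below uses the closed-form quintic for the special case of zero velocity and acceleration at both ends. That boundary condition, and the function name `quintic_rest_to_rest`, are assumptions; the project may have used different boundary conditions.

```python
def quintic_rest_to_rest(q0, qf, duration):
    """Return position, velocity, and acceleration functions of time
    for a quintic segment that starts and ends at rest."""
    dq = qf - q0

    def position(t):
        s = t / duration  # normalized time in [0, 1]
        return q0 + dq * (10 * s**3 - 15 * s**4 + 6 * s**5)

    def velocity(t):
        s = t / duration
        return dq * (30 * s**2 - 60 * s**3 + 30 * s**4) / duration

    def acceleration(t):
        s = t / duration
        return dq * (60 * s - 180 * s**2 + 120 * s**3) / duration**2

    return position, velocity, acceleration


# Hypothetical waypoints: move a coordinate from 1.0 to 3.0 in 2 seconds.
pos, vel, acc = quintic_rest_to_rest(1.0, 3.0, 2.0)
samples = [pos(0.1 * k) for k in range(21)]
```

The position, velocity, and acceleration functions together provide the reference signals a tracking controller such as a sliding mode controller typically needs.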