
Image-Set Visual Question Answering

Human-Object Interaction Detection

Semi-supervised Learning

Internship at Amazon Rekognition, Seattle, WA (May 2018-August 2018)

Zero-shot Object Detection

Internship at SRI International, Princeton, NJ (May 2017-August 2017)


A large scale dataset for face recognition and verification. It includes over 360,000 annotated still images, 22,000 videos, and over 3.7 million annotated video frames.

Comparison of Supervised and Unsupervised Neural Networks

Course Project for CMSC828L (Deep Learning), UMD College Park (August 2016-December 2016)
Compared the networks learned using Generative Adversarial Networks [1] against supervised networks for classification. [report]

Real-time (30 fps) Face Detection Using YOLO (You Only Look Once)

Trained a real-time (30 fps) face detector using the You Only Look Once method [2] to find faces in about 15 million images.

Detecting Handles of Kitchen Appliances Using Faster R-CNN

Course Project for CMSC828K (Perception for Robotics), UMD College Park (January 2016-April 2016)
Obtained data annotations using Amazon Mechanical Turk and trained Faster R-CNN [3] models to detect the handles of common kitchen appliances.

Estimating the Number of People in Images of Very High-Density Crowds

Master's thesis, IIT Kanpur (May 2014-June 2015)
The task was to estimate the number of people in images of crowds taken at concerts, rallies, demonstrations, sports events, religious processions and ceremonies, fairs, and similar gatherings. Exploited multiple sources of information: head counts, Fourier analysis, GLCM (gray-level co-occurrence matrix) features, and interest-point-based counting. Used deformable part models (DPM) to detect heads and to estimate their count and confidence. Implemented Fourier-analysis-based counting by finding local maxima in reconstructed gradient images of crowds. Trained a support vector regression (SVR) model on interest points (SIFT features) to obtain a count estimate. Obtained the final count by fusing the counts from the different sources with support vector regression. Paper based on this work.
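The Fourier-analysis counting step described above can be illustrated with a minimal sketch. This is a hypothetical illustration, not the thesis code: it assumes the "reconstruction" is a simple band-pass filter that keeps only spatial frequencies near the expected head spacing, and the band limits `lo`/`hi`, the relative peak threshold, and all function names are illustrative assumptions. It uses a naive pure-Python DFT so it runs without dependencies; in practice an FFT routine such as `numpy.fft.fft2` would replace it. The DPM head-detection, SIFT-based SVR, and SVR fusion stages are not shown.

```python
import cmath
import math


def dft1(seq, inverse=False):
    """Naive 1-D discrete Fourier transform (the inverse includes the 1/n factor)."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [
        sum(seq[k] * cmath.exp(sign * 2j * math.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]
    return [v / n for v in out] if inverse else out


def dft2(img, inverse=False):
    """Separable 2-D DFT: transform every row, then every column."""
    rows = [dft1(row, inverse) for row in img]
    h, w = len(rows), len(rows[0])
    cols = [dft1([rows[y][x] for y in range(h)], inverse) for x in range(w)]
    return [[cols[x][y] for x in range(w)] for y in range(h)]


def gradient_magnitude(img):
    """Central-difference gradient magnitude, wrapping around at the borders."""
    h, w = len(img), len(img[0])
    return [
        [
            math.hypot(
                (img[y][(x + 1) % w] - img[y][(x - 1) % w]) / 2,
                (img[(y + 1) % h][x] - img[(y - 1) % h][x]) / 2,
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def band_pass_reconstruct(img, lo, hi):
    """Zero all frequencies whose radius (cycles/pixel) lies outside [lo, hi], then invert.

    Assumption: this band-pass filter stands in for the thesis's reconstruction step.
    """
    h, w = len(img), len(img[0])
    spec = dft2(img)
    for u in range(h):
        fu = (u if u <= h // 2 else u - h) / h  # signed vertical frequency
        for v in range(w):
            fv = (v if v <= w // 2 else v - w) / w  # signed horizontal frequency
            if not lo <= math.hypot(fu, fv) <= hi:
                spec[u][v] = 0j
    # The filter is symmetric in +/- frequency, so the result is real up to rounding.
    return [[c.real for c in row] for row in dft2(spec, inverse=True)]


def count_local_maxima(img, rel_threshold=0.5):
    """Count pixels above rel_threshold * max that exceed all 8 (wrapped) neighbours."""
    h, w = len(img), len(img[0])
    floor = rel_threshold * max(max(row) for row in img)
    count = 0
    for y in range(h):
        for x in range(w):
            v = img[y][x]
            if v <= floor:
                continue
            neighbours = (
                img[(y + dy) % h][(x + dx) % w]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
            )
            if all(v > n for n in neighbours):
                count += 1
    return count


def fourier_count(patch, lo=0.1, hi=0.25, rel_threshold=0.5):
    """Hypothetical pipeline: gradient image -> band-pass reconstruction -> peak count."""
    recon = band_pass_reconstruct(gradient_magnitude(patch), lo, hi)
    return count_local_maxima(recon, rel_threshold)
```

On a synthetic 24×24 lattice with 16 peaks (period 6) plus added period-2 noise, `band_pass_reconstruct(noisy, 0.1, 0.25)` removes the noise and `count_local_maxima` then returns 16. Neighbours wrap around because a DFT reconstruction is periodic, so border pixels are treated the same way as interior ones.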


[1] Radford, Alec, Luke Metz, and Soumith Chintala. “Unsupervised representation learning with deep convolutional generative adversarial networks.” arXiv preprint arXiv:1511.06434 (2015).
[2] Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. “You only look once: Unified, real-time object detection.” arXiv preprint arXiv:1506.02640 (2015).
[3] Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. “Faster R-CNN: Towards real-time object detection with region proposal networks.” In Advances in Neural Information Processing Systems, pp. 91-99 (2015).