While few-shot learning as a transfer learning paradigm has gained significant traction for scenarios with limited data, it has primarily been explored in the context of building unimodal and unilingual models. Furthermore, a significant part of the existing literature on few-shot multitask learning performs in-context learning, which requires manually generated prompts as input, yielding varying outcomes depending on the level of manual prompt engineering. In addition, in-context learning incurs substantial computational, memory, and storage costs, which ultimately lead to high inference latency because every prediction requires running all of the prompt's examples through the model. In contrast, methods based on transfer learning via fine-tuning avoid the aforementioned issues at a one-time cost of fine-tuning weights on a per-task basis. However, such methods lack exposure to few-shot multimodal multitask learning. In this paper, we propose few-shot learning for a multimodal multitask multilingual (FM3) setting by adapting pre-trained vision and language models using task-specific hypernetworks and contrastively fine-tuning them to enable few-shot learning. FM3's architecture combines the best of both worlds of in-context and fine-tuning based learning and consists of three major components: (i) multimodal contrastive fine-tuning to enable few-shot learning, (ii) hypernetwork task adaptation to perform multitask learning, and (iii) task-specific output heads to cater to a plethora of diverse tasks. FM3 learns the most prominent tasks in the vision and language domains along with their intersections, namely visual entailment (VE), visual question answering (VQA), and natural language understanding (NLU) tasks such as named entity recognition (NER) and the GLUE benchmark including QNLI, MNLI, QQP, and SST-2.
Report | Poster | Presentation | Milestone | Proposal
Causality knowledge is vital to building robust AI systems. Deep learning models often perform poorly on tasks that require causal reasoning, which is often derived using some form of commonsense knowledge not immediately available in the input but implicitly inferred by humans. Prior work has unraveled spurious observational biases that models fall prey to in the absence of causality. While language representation models preserve contextual knowledge within learned embeddings, they do not factor in causal relationships during training. By blending causal relationships with the input features of an existing model that performs visual cognition tasks (such as scene understanding, video captioning, and video question-answering), better performance can be achieved owing to the insight causal relationships bring. Recently, several models have been proposed that tackle the task of mining causal data from either the visual or the textual modality. However, little prior research mines causal relationships by juxtaposing the visual and language modalities. While images offer a rich and easy-to-process resource to mine causality knowledge from, videos are denser and consist of naturally time-ordered events. Moreover, textual information offers details that may be implicit in videos. As such, we propose iReason, a framework that infers visual-semantic commonsense knowledge using both videos and natural language captions. Furthermore, iReason’s architecture integrates a causal rationalization module to aid interpretability, error analysis, and bias detection. We demonstrate the effectiveness of iReason using a two-pronged comparative analysis with language representation learning models (BERT, GPT-2) as well as current state-of-the-art multimodal causality models.
Finally, we present case-studies attesting to the universal applicability of iReason by incorporating the “causal signal” in a range of downstream cognition tasks such as dense video captioning, video question-answering and scene understanding and show that iReason outperforms the state-of-the-art.
Readme | Report | Github | Milestone | Proposal
Most previous works in visual understanding rely solely on understanding the "what" (e.g., object recognition) and "where" (e.g., event localization), which in some cases, fails to describe correct contextual relationships between events or leads to incorrect underlying visual attention. Part of what defines us as human and fundamentally different from machines is our instinct to seek causality behind any association, say an event Y that happened as a direct result of event X. To this end, we propose iPerceive, a framework capable of understanding the "why" between events in a video by building a common-sense knowledge base using contextual cues. We demonstrate the effectiveness of our technique to the dense video captioning (DVC) and video question answering (VideoQA) tasks. Furthermore, while most prior art in DVC and VideoQA relies solely on visual information, other modalities such as audio and speech are vital for a human observer's perception of an environment. We formulate DVC and VideoQA tasks as machine translation problems that utilize multiple modalities. Another common drawback of current methods is that they train the event proposal and captioning model either separately or in alternation, which prevents direct influence of the proposal based on the caption. To address this, we adopt an end-to-end Transformer architecture. By evaluating the performance of iPerceive DVC and iPerceive VideoQA on the ActivityNet Captions and TVQA datasets respectively, we show that our approach furthers the state-of-the-art.
Readme | Report | Presentation | Github | Milestone | Proposal
Recently, learning-based models have enhanced the performance of Single-Image Super-Resolution (SISR). However, applying SISR successively to each video frame leads to a lack of temporal coherency. On the other hand, Video Super-Resolution (VSR) models based on Convolutional Neural Networks (CNNs) outperform traditional approaches in terms of image quality metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural SIMilarity (SSIM). However, Generative Adversarial Networks (GANs) offer a competitive advantage in mitigating the lack of finer texture detail usually seen with CNNs when super-resolving at large upscaling factors. We present iSeeBetter, a novel spatio-temporal approach to VSR. iSeeBetter seeks to render temporally consistent Super-Resolution (SR) videos by extracting spatial and temporal information from the current and neighboring frames using the concept of Recurrent Back-Projection Networks (RBPN) as its generator. Further, to improve the "naturality" of the super-resolved image while eliminating artifacts seen with traditional algorithms, we utilize the discriminator from the Super-Resolution Generative Adversarial Network (SRGAN). Mean Squared Error (MSE) as a primary loss-minimization objective improves PSNR and SSIM, but these metrics may not capture fine details in the image, leading to a misrepresentation of perceptual quality. To address this, we use a four-fold (adversarial, perceptual, MSE, and Total-Variation (TV)) loss function. Our results demonstrate that iSeeBetter offers superior VSR fidelity and surpasses state-of-the-art performance.
Readme | Poster | Report | Movie | Github | Milestone | Proposal
Git can be a difficult beast to master. The cheatsheet does a quick walk-through of initializing a repo, checking-in code, reverting back commits, pushing code, switching branches, removing files etc.
A cheatsheet to get a head-start on Regular Expressions in Python. Includes the following sections: special characters, character classes/sets, quantifiers, special sequences, character escape sequences, module-level functions, functions for RegEx objects (returned from re.compile()), MatchObjects (returned from match() and search()), flags for re.compile(), miscellaneous, extensions, and examples.
A cheatsheet to get a head-start on Vim, the popular text editor. Includes the following sections: movement, deletion, yank & put, command mode, insert mode, window management, visual mode, miscellaneous, and regular use.
Features: GUI with flat-icon buttons, add/remove songs from a playlist, forward/rewind tracks, mute/unmute functionality, and display of time remaining on the current track.
Source | Screenshots | Git
We document efforts to assemble and modify a set of benchmark circuits for testing a new class of circuit partitioning algorithms designed for heterogeneous FPGAs. Often, computations can be implemented using different types of resources within these devices; the new partitioning algorithms incorporate the circuit mapping step, where computations are mapped to specific resource types, into the partitioning process itself. We elaborate on the details of this new form of partitioning, called “multi-personality” partitioning. While remapping provides a great deal of flexibility to the partitioner to modify the implementation of circuit nodes in order to meet the desired partitioning criteria, testing this new partitioner requires large, heterogeneous netlists. In this work we develop a list of requirements for the needed benchmarks, investigate existing benchmarks to determine their suitability, and document how we adapt the chosen benchmarks for use in testing multi-personality partitioning algorithms. Finally, we also discuss initial efforts in assembling and developing a set of benchmarks for testing a new form of partitioning called content-aware partitioning.
LAMBERT Academic Publishing (Germany), ISBN-13: 978-3-659-69487-5, ISBN-10: 3659694878, EAN: 9783659694875, Pages: 72, Published on: 2015-04-15
Buy off Amazon ($40) | Buy off MoreBooks (€36) | Book Cover | Book Table of Contents | Report | Amber Source Files and Workloads | UW-Madison Theses Repository
Ocean is a simulation of large-scale sea conditions from the SPLASH benchmark suite. It is a scientific workload used for performance evaluation of parallel machines. Our version of Ocean simulates water temperatures using a large grid of integer values over a fixed number of time steps. At each time step, the value of a given grid location is averaged with the values of its immediate north, south, east, and west neighbors to determine the value of that grid location in the next time step (a total of five grid locations are averaged to produce the next value for a given location). This assignment consisted of writing explicit parallel programs in three different styles: loop-level parallelism, shared memory, and message passing. Loop-level parallelism was exploited by starting from the serial implementation and using an OpenMP compiler directive to parallelize loops. The Pthreads API was used for shared-memory programming and the MPI API was used for message passing.
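The per-step update described above is a five-point averaging stencil. As a minimal serial sketch (in Python for illustration; the assignment's actual implementations were C with OpenMP, Pthreads, and MPI):

```python
import numpy as np

def ocean_step(grid):
    """One time step of the Ocean kernel: each interior cell becomes the
    average of itself and its four neighbors (five grid locations total).
    Boundary cells are held fixed here for simplicity."""
    new = grid.copy()
    new[1:-1, 1:-1] = (grid[1:-1, 1:-1]        # center
                       + grid[:-2, 1:-1]       # north
                       + grid[2:, 1:-1]        # south
                       + grid[1:-1, :-2]       # west
                       + grid[1:-1, 2:]) // 5  # east; integer grid per the write-up

    return new

grid = np.zeros((6, 6), dtype=int)
grid[3, 3] = 100                # a hot spot diffuses outward over time steps
for _ in range(3):
    grid = ocean_step(grid)
```

In the parallel versions, each worker owns a block of rows and only the boundary rows need to be exchanged (or synchronized) between steps, which is what makes the kernel a good fit for all three programming styles.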
Serial and OpenMP Implementations | Pthreads and MPI Implementations | Download Serial and OpenMP Code | Download Pthread and MPI Code
This assignment consisted of setting up gem5, performing Syscall Emulation (SE) and Full System (FS) simulation, and simulating programs to observe speedup trends using SE and FS modes.
Project Report | Download Code
With the increase in power consumption of computing systems, focus has shifted towards techniques that can improve effective resource utilization. Cache compression is one such technique: it increases the effective cache capacity and thereby improves the performance of the system. In this paper, we evaluate cache compression for GPUs. We analyze the suitability of compression to GPU workloads by implementing a Decoupled Compressed Cache as the last-level cache. We simulated the design in GPGPU-Sim and evaluated it on a set of integer and floating-point benchmarks. We observed a 1-3% improvement in the cache miss rate for most of the benchmarks.
Project PPT | Project Report | Project Script(s)
We perform a detailed analysis and classification of patches applied to the Linux kernel, focusing particularly on patches related to the memory module. After analyzing and classifying patches related to the memory module of Linux kernel versions 2 and 3, we observe the patterns of mistakes that necessitate kernel patches. We then identify the distribution of multiple patches attempting to fix the same bug, which we call fix-fixes. Through our observations, we put together a short guideline of programming practices to avoid the most frequent mistakes we encountered.
Project PPT | Project Report | Patch Analysis Data | Project Script(s)
The infokernel, proposed by Arpaci-Dusseau et al., allows operating systems to be used in a more flexible and appropriate manner by higher-level services. It exposes key pieces of information about its algorithms and internal state; thus, its default policies become mechanisms, which can be controlled from user level. In this project, we develop an Informed Virtual Memory system that exposes which physical pages various virtual pages are mapped to. We also develop an Informed File System which takes an offset in a given (previously opened) file and, if successful, returns the disk block in which that offset resides. We develop a test-case suite that validates the functionality of both the Informed Virtual Memory system and the Informed File System. We then propose a system call that exposes the limit of virtual memory pages available to the current process. Such a system call can be useful when a program has to perform operations on a certain data set. For instance, if the program seeks to sort the data set, it may dynamically choose a particular sorting algorithm based on its space complexity and the currently available virtual memory page limit.
Video Tutorial | Source Files | Project Readme
An implementation of a set of programs under which NFS outperforms AFS, and vice versa. We also studied the properties of each system that were exploited in this comparison.
Programs | Project Readme
In this warm-up OS project, we build the Linux kernel, get it to run in a virtualized environment, and modify it by adding a counter into the Linux ext2 file system code. We then add a system call to return the value of that counter.
Video Tutorial | Source Files | Project Readme
An implementation of Eraser, a tool for dynamically detecting data races in lock-based multi-threaded programs.
Source and Trace files
An implementation of the Adaptive Replacement Cache, a page replacement algorithm which offers better performance than the Least Recently Used replacement algorithm by keeping track of both frequently-used and recently-used pages coupled with a recent eviction history for both.
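The bookkeeping behind ARC can be sketched compactly: two resident lists (recency and frequency), two ghost lists recording recent evictions from each, and an adaptive target size p. The Python sketch below follows the published algorithm (Megiddo and Modha); the project's own trace-driven implementation may differ in detail.

```python
from collections import OrderedDict

class ARC:
    """Sketch of the Adaptive Replacement Cache. T1/T2 hold resident
    recently-used and frequently-used pages; B1/B2 are ghost lists recording
    recent evictions from each. The target size p of T1 adapts on ghost hits."""

    def __init__(self, c):
        self.c, self.p = c, 0
        self.T1, self.T2 = OrderedDict(), OrderedDict()
        self.B1, self.B2 = OrderedDict(), OrderedDict()

    def _replace(self, in_B2):
        # Evict from T1 or T2 into the matching ghost list, per the target p.
        if self.T1 and (len(self.T1) > self.p or (in_B2 and len(self.T1) == self.p)):
            k, _ = self.T1.popitem(last=False)   # demote LRU of T1 to ghost B1
            self.B1[k] = None
        else:
            k, _ = self.T2.popitem(last=False)   # demote LRU of T2 to ghost B2
            self.B2[k] = None

    def access(self, x):
        """Touch page x; return True on a cache hit."""
        if x in self.T1 or x in self.T2:          # hit: promote to MRU of T2
            self.T1.pop(x, None); self.T2.pop(x, None)
            self.T2[x] = None
            return True
        if x in self.B1:                          # ghost hit: favor recency
            self.p = min(self.c, self.p + max(1, len(self.B2) // max(1, len(self.B1))))
            self._replace(False); del self.B1[x]
            self.T2[x] = None
            return False
        if x in self.B2:                          # ghost hit: favor frequency
            self.p = max(0, self.p - max(1, len(self.B1) // max(1, len(self.B2))))
            self._replace(True); del self.B2[x]
            self.T2[x] = None
            return False
        # complete miss: make room, then insert at MRU of T1
        total = len(self.T1) + len(self.T2) + len(self.B1) + len(self.B2)
        if len(self.T1) + len(self.B1) == self.c:
            if len(self.T1) < self.c:
                self.B1.popitem(last=False); self._replace(False)
            else:
                self.T1.popitem(last=False)
        elif total >= self.c:
            if total == 2 * self.c:
                self.B2.popitem(last=False)
            self._replace(False)
        self.T1[x] = None
        return False
```

The ghost lists are what let ARC beat plain LRU: a hit in B1 or B2 is evidence that the corresponding resident list was sized too small, so p is nudged accordingly.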
Source and Trace files
Designed a system using gem5 which adapts to changes in workload characteristics using a Machine Learning algorithm and accordingly performs Cache Resizing and Dynamic Frequency Scaling (DFS) to improve the power efficiency of the system.
Project Presentation | Project Report
Designed a parser for the Verilog 2001 standard in C that could parse a Verilog HDL file into an intermediate representation format containing module names and their connections.
Features include:
• Support for connect-by-name and connect-by-reference types of connections.
• Concatenation of consecutive lines until a complete module instantiation is formed. This feature was particularly intended for netlists obtained from the Synopsys Design Compiler.
• Support for disambiguating a bus referenced by its full name (without an [X:Y]) from a bus referenced with an [X:Y] range in its name.
• Ability to split up a bus with an [X:Y] range in its name so that each signal of the bus can be identified individually.
• Support for parsing of Escaped identifiers: \XYZ/abc or \abc.
• Ability to output a catalogue at the end of parsing with a list of unique modules and the number of instantiations of each module.
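The bus-splitting feature above amounts to expanding an [X:Y] range into individual signal names. A small illustrative sketch (in Python with a hypothetical helper; the actual parser is written in C):

```python
import re

def split_bus(name):
    """Expand a bus reference with an explicit [X:Y] range into its
    individual signals, e.g. 'data[3:1]' -> ['data[3]', 'data[2]', 'data[1]'].
    Names without a range are returned unchanged as a single-element list."""
    m = re.fullmatch(r"(.+)\[(\d+):(\d+)\]", name)
    if not m:
        return [name]                      # plain net or full-bus reference
    base, hi, lo = m.group(1), int(m.group(2)), int(m.group(3))
    step = -1 if hi >= lo else 1           # support both [3:0] and [0:3] ranges
    return [f"{base}[{i}]" for i in range(hi, lo + step, step)]
```

A name with no [X:Y] falls through untouched, which is also how the disambiguation case (a full-bus reference without a range) is kept separate from an explicit ranged reference.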
A complete functional design of a 16-Bit Microprocessor was implemented in this project. Designed components include an ALU, Instruction Decoder, Data Memory, Instruction Memory and a Register File (with a Data Segment register for Load/Store instructions and a Stack Pointer register for Call/Return instructions).
Features include:
• ISA can be found here
• Data Forwarding Unit (to forward data from the EX and MEM stages)
• Hazard Detection Unit (to handle Load-Use Hazards)
• Branch Control Unit (to handle 8 different branch conditions)
• Stall and Flush functionality
• A Flag Register with three flags: Zero (Z), Overflow (V), and Sign bit (N)
Verification of the entire system was carried out by considering corner cases. The project was initially developed as a single-cycle datapath implementation and later extended to a 5-stage pipelined design.
A high-speed, high-precision angle resolver was designed. It had a 2-channel 12-bit A2D converter to sample a sine and a cosine signal from an angle sensor. The sine channel generated an output of the form A*sin(a)+B, and the cosine channel generated C*cos(a)+D. The B and D terms represent undesired offsets that had to be cancelled through calibration; A and C represent unknown scaling terms that also needed normalization through calibration. Calibration coefficients were stored in the EEPROM. After each channel was digitally corrected for offset and gain, the angle was calculated by the digital core and made available on the SPI interface.
Possible applications of this module include being a part of a power steering angular sensor or a Variable-frequency drive system for an AC motor.
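The correction-and-resolve step amounts to removing each channel's offset and gain and then taking a two-argument arctangent. A sketch in Python with hypothetical calibration coefficients (the real device does this in fixed-point hardware, with coefficients from EEPROM):

```python
import math

def resolve_angle(raw_sin, raw_cos, A, B, C, D):
    """Recover the angle a from the two sampled channels A*sin(a)+B and
    C*cos(a)+D, given the calibration coefficients A, B, C, D."""
    s = (raw_sin - B) / A          # offset- and gain-corrected sine
    c = (raw_cos - D) / C          # offset- and gain-corrected cosine
    return math.atan2(s, c)        # angle in radians, in (-pi, pi]

# Hypothetical calibration: A=1.9, B=0.1, C=2.1, D=-0.05
a = math.radians(30)
raw_s = 1.9 * math.sin(a) + 0.1
raw_c = 2.1 * math.cos(a) - 0.05
recovered = resolve_angle(raw_s, raw_c, 1.9, 0.1, 2.1, -0.05)
```

Using atan2 rather than a plain arctangent of s/c is what keeps the result unambiguous over the full rotation, which matters for applications like the steering-angle sensor mentioned above.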
LAMBERT Academic Publishing (Germany), ISBN-13: 978-3-659-17615-9, ISBN-10: 365917615X, EAN: 9783659176159, Pages: 76, Published on: 2012-07-04
Book Cover | Book Table of Contents | Buy off Amazon ($62) | Buy off Barnes & Noble ($62) | Buy off MoreBooks (€49) | Download (Initial Report) | Download (PPT #1) | Download (Final Report) | Download (PPT #2)
With the increasing need for stringent security measures in recent times, biometric systems have assumed greater importance for information security systems. For biometric systems to offer reliable security, they themselves have to satisfy high security requirements to ensure invulnerability. Two different approaches for database protection of biometric authentication systems, based on RSA and Elliptic Curve Cryptography, were investigated in this project. Current database protection schemes and image encryption schemes were explored. Implementations of public key algorithms were realized for experimental purposes and the results thus obtained were critically analyzed.
An integrated system that aims at providing patient identification, monitoring of abnormal body conditions, tracking, rescue and response to deal with life-threatening emergencies. The project involves processing of body signals and monitoring of body weight, blood pressure and pulse oximetry to identify critical body conditions and generate alert triggers that could be transmitted to a monitoring station in case of an abnormality. A Bluetooth interface would enable the system to communicate with an application end-point in a mobile handset, which can forward the data through GPRS or a GSM channel. As part of the research team, my role in this project was to accomplish the processing of body signals such as the ECG, to aid the detection and diagnosis of cardiac abnormalities. In this project, we utilized the MIT-BIH ECG Database. MATLAB simulations of QRS detection in the ECG were carried out and the algorithm was ported to the ARM7-based system. The project is sponsored by the Innovation and Entrepreneurship Development Centre (IEDC) under the Government of India, with a grant of Rs. 60,000.
Download (QRS Detection MATLAB Codes) | QRS Detection Initial Plan of Action | QRS Detection Final Outcome and Results | QRS Detection and Extraction Papers (Copyright with original authors and IEEE) | Project Aim and Proposal
The project included a Java-based interface for guiding various industrial logistics functions such as Cycle Count, Inventory Tracking, Palletization, and Unloading, and supported 7 barcode symbologies, namely UPC-E, EAN-8, EAN-13, Code 128, Interleaved 2 of 5, Code 39, and QR Code. The project was sponsored by Arshiya International, a supply chain and logistics infrastructure company.
Download (Executable) | Screenshots | Download (Source) | Barcode Samples | Extended Description
This project aimed at developing a signature-based authentication system using a novel cascaded algorithmic approach. The system delivered exceptional accuracy rates and excellent response times.
The primary goal was to perform image processing algorithms on an embedded board comprising a microcontroller chip, eliminating the need for any computer interface.
The main aim of this project was to develop a system for recognizing sentence-level continuous American Sign Language (ASL) using a desk mounted camera by tracking the user's hand movements.
A multi-layered neural network based algorithm was employed, and a six-element feature vector was used, which was found to be reliable in identifying all characters on a standard QWERTY keyboard. A MATLAB GUI enabled the user to either train or test the network on a ‘one character at a time’ basis.
An interface between a CMOS camera and a computer using the AVR microcontroller was designed. The interface allowed the user to fetch images from the camera as well as to change some of the properties of the camera such as brightness, luminance, etc.
Involved a segmentation-based method of object tracking using image warping and Kalman filtering. Head and hand tracking were performed using this method to demonstrate its performance.
Image-in-image, image-in-video, and video-in-video watermarking was implemented using the Discrete Wavelet Transform and the Discrete Cosine Transform with a focus on invisibility and recoverability.
In this project, filters of different specifications were developed using FIR, IIR, and adaptive algorithms such as Least Mean Squares (LMS). The C code was implemented on the TMS320C6713 Floating-Point Digital Signal Processor from Texas Instruments and satisfactory results were achieved.
Fundamental frequency estimation, i.e., pitch detection, was performed using Mel Frequency Cepstral Coefficients (MFCCs) and vector quantization was used to minimize the amount of data to be handled.
Conceptualized and designed a temperature sensing and displaying unit using the LM35D precision centigrade temperature sensor and AVR microcontroller.
Developed a system to detect the number appearing on the die using image processing techniques with the help of an overhead camera.
Developed a robotic system consisting of an AVR microcontroller, digital IR sensors, and motors which, when placed in a grid, could detect a fire, navigate towards it, and extinguish it.
As part of my undergrad engineering curriculum, we developed a Luggage Security Alarm using IC UM3561, a MOD-11 Synchronous Binary Counter using IC 74163, and a Simple Dice using IC 555 and IC 4017.
Download Report (Counter) | Download Report (Luggage Alarm) | Download Report (Dice) | Download Schematics and PCB Designs | Download Report (Smart Continuity Tester - an extra project I did for fun)
Designed websites for CSI TSEC, EVID solutions, AMANzPlanet, and PirateSolutions.
Download (CSI TSEC) | Download (EVID) | Download (AMANzPlanet) | Download (PirateSolutions)
Presentation on Maglev Trains: the what, why, when and how of Maglev - all covered.
Download Report (PDF) | Download Report (DOC) | Download Presentation (PDF) | Download Presentation (PPT)
Why do countries trade with each other? Covers the theory of comparative advantage and international trade, exports and imports, balance of payments, exchange rates, exchange rate systems, exchange rate movements, and devaluation.
Download Report (PDF) | Download Report (DOC) | Download Presentation (PDF) | Download Presentation (PPT)
Gave a PPT on IPTV: the what, why, when and how of IPTV - all covered.
Download (PDF) | Download (PPT)
The presentation talked about using the latest technology to minimize the effects of congestion and overpopulation in metropolitan cities like Mumbai. Credits to Divya Jyoti for the awesome PPT.
Download (PDF) | Download (PPT)
A PPT on the comparative advantages and disadvantages of Linux. A one-on-one with Windows.
Download (PPT) | Download (PPT on Linux Commands - was covered during the PPT)
As part of my undergrad engineering curriculum, my team developed a proposal for the renovation of the gymkhana and the entertainment room (we seriously needed it!).
Download (PDF) | Download (PPT)
I gave this presentation with Divya Jyoti, my colleague, on Web Designing (basics) for CSI-TSEC, covering HTML and CSS. We intended to cover elementary PHP too, but time left us no choice but to drop the idea!
Download (PDF) | Download (PPT)
At St. Xavier's, as part of my duties as a teaching assistant, I conducted this workshop on the capabilities and possibilities of MATLAB, together with Divya Jyoti.
Download Lecture 1 (PDF) | Download Lecture 1 (PPT) | Download Lecture 2 (PDF) | Download Lecture 2 (PPT) | Download Lecture 3 (PDF) | Download Lecture 3 (PPT) | Download MATLAB Sample Programs | Download Proposal for the workshop (PDF)
A presentation I made for a friend's aunt for her MBA in Travel and Tourism.
Download (PDF) | Download (PPT)
Introduction to optical fiber communication, classification of FOC based on modes (single mode, multi mode), and linearly polarized (LP) modes.
Download (PDF) | Download (PPT)
A presentation we made for "Enquest 2012", an inter-collegiate techno-commercial competition. The competition called for a commercial-product idea that involved a good degree of practical feasibility.
Download (PDF) | Download (PPT)
Flyers I designed for some workshops I took at several colleges throughout the city.
Download PC Assembling Workshop Flyer (PDF) | Download Linux Workshop Flyer (PDF) | Download Windows 7 Workshop Flyer (PDF)
PIC and 8086 Programs. The 8086 programs were written by me, so if you need help feel free to get in touch. Credits to Divya Jyoti for the Wireless Networks programs.
Download (PIC Programs) | Download (8086 Programs) | Download (Wireless Networks Programs)