Atharva Bhagwat | Separating Perception and Reasoning via Relation Networks

Separating Perception and Reasoning via Relation Networks
Atharva Bhagwat, Harini Appansrinivasan, and Abdulqadir Zakir
December 11, 2022

Visual Question Answering (VQA) is a multi-modal task that relates text and images, typically through captions or question-answer pairs. For example, given a picture of a busy highway, the question might be “How many red cars are there?” or “Are there more motorbikes than cars?”. The task is challenging because it requires a high-level understanding of the text, the image, and the relationships between them.

In this project, we study and implement Relation Networks, one of a family of AI approaches that combine neural and symbolic representations, and apply them to the VQA task.

Relation Networks (RNs), introduced by DeepMind, are a simple and representationally flexible general-purpose solution to relational reasoning in neural networks. We tackle the problem with two types of input data: pixel-based images and state descriptions.
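To make the idea concrete, here is a minimal NumPy sketch of the forward pass of a Relation Network as formulated in the original DeepMind paper: a shared function g is applied to every ordered pair of objects (together with the question embedding), the results are summed, and a second function f maps the sum to answer logits. This is an illustration, not the code from our repository. The helper names (`mlp`, `init_mlp`, `relation_network`), the layer sizes, and the random inputs are all assumptions chosen for the example. With state descriptions, each object would be one row of the state table; with pixels, each object would typically be one cell of a CNN feature map.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights for a small MLP (hypothetical helper, sizes are arbitrary)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, weights):
    """Apply the layers in order, with ReLU between layers (none after the last)."""
    for i, (W, b) in enumerate(weights):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def relation_network(objects, question, g_weights, f_weights):
    """objects: (n, d) array, question: (q,) array; returns answer logits."""
    n = objects.shape[0]
    left = np.repeat(objects, n, axis=0)   # o_i, each row repeated n times
    right = np.tile(objects, (n, 1))       # o_j, the whole set cycled n times
    q = np.tile(question, (n * n, 1))      # question attached to every pair
    pairs = np.concatenate([left, right, q], axis=1)  # (n*n, 2d + q)
    relations = mlp(pairs, g_weights)      # g applied to every ordered pair
    pooled = relations.sum(axis=0)         # order-invariant sum over pairs
    return mlp(pooled, f_weights)          # f maps the pooled vector to logits

# Toy inputs: 6 objects with 7 features each, an 11-dim question, 10 answers.
rng = np.random.default_rng(0)
objects = rng.normal(size=(6, 7))
question = rng.normal(size=11)
g_weights = init_mlp(rng, [2 * 7 + 11, 32, 32])
f_weights = init_mlp(rng, [32, 32, 10])

logits = relation_network(objects, question, g_weights, f_weights)
shuffled = relation_network(objects[rng.permutation(6)], question,
                            g_weights, f_weights)
```

Because the pairwise outputs are summed, the result does not depend on the order in which the objects are listed, so `logits` and `shuffled` agree up to floating-point error. The perception stage only has to produce a set of objects, and the reasoning stage stays the same for both kinds of input.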

Here is the link to the repository.