In this project, I developed my initial interest and research experience in Artificial Intelligence and Machine Learning, which informs my current aim of more effective human-AI collaboration with diverse users. We approached the task of few-shot scene graph prediction from images. You can find the paper on arXiv.
Motivation: Collaboration in the visual world requires a firm understanding of the objects and relationships in a scene, and accommodating varied user behavior further requires perceiving relationships rarely seen in training data. Current scene graph models struggle on this long tail of visual relationship datasets, which yields deceptively brittle collaborative agents that can falter dramatically, at great risk to their users, in unusual circumstances. Instead, object representations must capture broad relationship affordances beyond the limited set of relationships they co-occur with in training data.
Our solution: We addressed this problem by learning visual relationships as functions between object representations, which allows our model to learn an object embedding space that clusters objects affording similar relationships. Our novel architecture, based on Graph Convolutional Networks, defines objects as nodes and relationships as edges, so that the output of the network is a set of object embeddings encoding our desired relationship affordances. We accomplish few-shot learning of new relationships with a classifier operating on these relationship-oriented object embeddings.
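To make the idea concrete, here is a minimal toy sketch of the two ingredients described above: relationships modeled as learned functions (here, simple linear maps) between object representations, one message-passing step over a small scene graph, and a nearest-prototype classifier on the resulting pair embeddings. This is an illustration under my own simplifying assumptions, not the actual architecture from the paper; all names, dimensions, and the prototype rule are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy scene: 4 objects with 8-d features, 2 known relationships.
D = 8
node_feats = rng.normal(size=(4, D))

# Each relationship is modeled as a function between object representations;
# a linear map stands in for a learned edge function here.
rel_weights = {r: 0.1 * rng.normal(size=(D, D)) for r in ("on", "holding")}

# Scene graph edges: (subject_idx, relationship, object_idx).
edges = [(0, "on", 1), (2, "holding", 3), (1, "on", 3)]

def gcn_layer(feats, edges, rel_weights):
    """One message-passing step: each edge sends a relationship-transformed
    message from subject to object (and back), then nodes mean-aggregate."""
    msgs = np.zeros_like(feats)
    deg = np.ones(len(feats))          # self-loop counts as one neighbor
    for s, r, o in edges:
        W = rel_weights[r]
        msgs[o] += feats[s] @ W        # subject -> object message
        msgs[s] += feats[o] @ W.T      # object -> subject message
        deg[s] += 1
        deg[o] += 1
    return np.tanh((feats + msgs) / deg[:, None])

obj_emb = gcn_layer(node_feats, edges, rel_weights)

# Few-shot relationship classification: represent a (subject, object) pair by
# concatenated embeddings and classify by nearest class prototype, where each
# prototype is the mean of a handful of support examples.
def pair_repr(emb, s, o):
    return np.concatenate([emb[s], emb[o]])

support = {                             # one "shot" per relationship
    "on": [pair_repr(obj_emb, 0, 1)],
    "holding": [pair_repr(obj_emb, 2, 3)],
}
prototypes = {r: np.mean(v, axis=0) for r, v in support.items()}

def classify(emb, s, o):
    q = pair_repr(emb, s, o)
    return min(prototypes, key=lambda r: np.linalg.norm(q - prototypes[r]))
```

Because the edge functions are shared across object pairs, objects that participate in similar relationships are pushed toward similar embeddings, which is what lets a lightweight prototype classifier generalize to a new relationship from only a few support pairs.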
My contribution: I surveyed a broad sweep of the literature, conducted extensive research and experimentation to iterate on this complex neural network architecture, and co-authored our conference submission.
Future work: I am interested in extending similar few-shot learning capabilities to directly interactive tasks, which would enable more robust and individualized collaborative ML models.