In this project, I deepened my experience and interest in combining Human-Computer Interaction and Social Computing methods to study and improve interactions between people and AI/ML models, motivating my current aim of more effective human-AI collaboration with diverse users. We tackled the task of evaluating human judgement of the outputs of image generation models; the resulting paper appeared at NeurIPS 2019.
Motivation: Human-AI collaboration depends on measuring and understanding people's perception of the outputs, performance, and behavior of ML models. For many ML tasks, such as image generation, producing a desired human perception (e.g., strikingly realistic generated faces) is the ultimate goal. However, current evaluation metrics are either automated, and thus fail to accurately capture human judgement, or rely on human evaluation that is unreliable and ad hoc.
Our solution: Toward this end, I worked on HYPE, a benchmark that utilizes novel crowdsourcing techniques to directly and reliably evaluate the human-perceived realism of GAN-generated images. Compared to existing human evaluation methods, HYPE is more consistent and benefits from theoretical grounding in psychophysics.
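As an illustration of the kind of metric involved (a simplified sketch, not the exact HYPE procedure from the paper), one can score a model by the rate at which human raters mis-label its images as real versus fake, so that a higher score means more convincingly realistic outputs. The function name and data format below are hypothetical:

```python
def human_error_rate(judgements):
    """Fraction of images a human rater labels incorrectly.

    judgements: list of (judged_real, is_real) boolean pairs, one per image.
    A fake image judged real, or a real image judged fake, both count as
    errors, so a higher score indicates more human-deceiving generations.
    (Illustrative sketch only; HYPE's actual protocol adds psychophysics-
    based controls such as calibrated timing and quality checks.)
    """
    if not judgements:
        raise ValueError("need at least one judgement")
    errors = sum(judged != actual for judged, actual in judgements)
    return errors / len(judgements)
```

For example, a rater who is fooled by one of two fake images would score 0.5 under this sketch.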
My contribution: My work involved training a wide assortment of GAN models, digging into the crowdsourcing literature, iterating on task designs to best capture differences in human perception across state-of-the-art models, and co-authoring our accepted oral conference paper.
Future work: I aim to expand on this work by applying Social Computing methods to analyze user interactions with ML models, which will better inform my future ML research.