Parsing and Summarizing Infographics with Synthetically Trained Icon Detection

Madan S, Bylinskii Z, Nobre C, Tancik M, Recasens A, Zhong K, Alsheikh S, Oliva A, Durand F, and Pfister H.

2021 IEEE 14th Pacific Visualization Symposium (PacificVis), 2021.

Widely used in news, business, and educational media, infographics are handcrafted to effectively communicate messages about complex and often abstract topics, including "ways to conserve the environment" and "coronavirus prevention". The computational understanding of infographics required for future applications like automatic captioning, summarization, search, and question answering will depend on being able to parse the visual and textual elements they contain. However, because they are composed of stylistically and semantically diverse visual and textual elements, infographics pose challenges for current A.I. systems. While automatic text extraction works reasonably well on infographics, standard object detection algorithms fail to identify the stand-alone visual elements in infographics that we refer to as "icons". In this paper, we propose a novel approach to train an object detector using synthetically generated data, and show that it generalizes to detecting icons within in-the-wild infographics. We further pair our icon detection approach with an icon classifier and a state-of-the-art text detector to demonstrate three applications: topic prediction, multi-modal summarization, and multi-modal search. Parsing the visual and textual elements within infographics provides us with the first steps towards automatic infographic understanding.
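The abstract does not say how the synthetic training data is produced. One common way to synthesize detection data is cut-and-paste compositing: place known icon images on a background at random positions and record each pasted region as a ground-truth bounding box. The sketch below illustrates only that generic idea and is an assumption, not the authors' pipeline. The function names `overlaps` and `compose_synthetic_sample` are hypothetical, and grayscale images are represented as nested Python lists to keep the example self-contained. A real pipeline would use an image library and feed the resulting boxes to a detector training framework.

```python
import random
from typing import List, Tuple

# Hypothetical box convention: (x0, y0, x1, y1) with exclusive right/bottom edges.
Box = Tuple[int, int, int, int]
Image = List[List[int]]  # grayscale image as rows of pixel values


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def compose_synthetic_sample(
    background: Image,
    icons: List[Image],
    rng: random.Random,
    max_tries: int = 50,
) -> Tuple[Image, List[Box]]:
    """Paste icons at random, non-overlapping positions on a copy of the
    background. Returns the composite image and one bounding box per placed
    icon (in placement order). Icons that do not fit, or that cannot be placed
    without overlap within max_tries attempts, are skipped."""
    canvas = [row[:] for row in background]  # leave the input untouched
    h, w = len(canvas), len(canvas[0])
    boxes: List[Box] = []
    for icon in icons:
        ih, iw = len(icon), len(icon[0])
        if ih > h or iw > w:
            continue  # icon larger than the canvas: cannot be placed
        for _ in range(max_tries):
            # randint is inclusive, so the icon always stays inside the canvas.
            x0 = rng.randint(0, w - iw)
            y0 = rng.randint(0, h - ih)
            box = (x0, y0, x0 + iw, y0 + ih)
            if any(overlaps(box, other) for other in boxes):
                continue  # rejection sampling: try another position
            for dy in range(ih):
                for dx in range(iw):
                    canvas[y0 + dy][x0 + dx] = icon[dy][dx]
            boxes.append(box)  # this box becomes an "icon" training label
            break
    return canvas, boxes
```

For example, with `rng = random.Random(0)`, a 64×48 all-zero background, and two solid icons of sizes 8×8 and 12×6, the call returns the composite image plus two disjoint boxes whose sizes match the icons. Rejection sampling keeps boxes disjoint so that every pasted icon stays fully visible and its annotation is exact.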