Open Conference Systems, STATISTICS AND DATA SCIENCE: NEW CHALLENGES, NEW GENERATIONS

Using Twitter data for Population Estimates
Dilek Yildiz, Jo Munson, Agnese Vitali, Ramine Tinati, Jennifer Holland

Last modified: 2017-05-20

Abstract


Twitter is increasingly used as a source of data in the social sciences. However, two challenges remain for social scientists: deriving the demographic characteristics of users, and dealing with the non-random, non-representative populations from which those users are drawn. This paper has two objectives. First, it compares methods for estimating demographic information from Twitter data, based on the crowd-sourcing platform CrowdFlower and the image-recognition software Face++. Second, it proposes a method for calibrating the non-representative sample of Twitter users against auxiliary information from official statistics, thereby allowing findings based on Twitter to be generalized to the general population.
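
The abstract does not say which calibration method the paper uses. As a minimal sketch of the general idea, the Python example below applies simple post-stratification, one common way to calibrate a non-representative sample: each demographic cell is weighted by the ratio of its population share to its sample share. The function name, the cell labels, and all counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def poststratification_weights(sample_counts, population_counts):
    """Return one weight per demographic cell.

    weight = (population share of the cell) / (sample share of the cell)

    Both arguments map a cell label to a count. This is an illustrative
    assumption about the calibration step, not the authors' method.
    """
    n_sample = sum(sample_counts.values())
    n_population = sum(population_counts.values())
    weights = {}
    for cell, n_cell in sample_counts.items():
        if n_cell == 0:
            # An empty sample cell cannot be reweighted.
            raise ValueError(f"no sampled users in cell {cell!r}")
        population_share = population_counts[cell] / n_population
        sample_share = n_cell / n_sample
        weights[cell] = population_share / sample_share
    return weights


# Hypothetical counts: Twitter users with inferred sex and age group,
# and the matching cells from official statistics.
sample_counts = {
    ("female", "18-34"): 300,
    ("male", "18-34"): 500,
    ("female", "35+"): 100,
    ("male", "35+"): 100,
}
population_counts = {
    ("female", "18-34"): 250,
    ("male", "18-34"): 250,
    ("female", "35+"): 250,
    ("male", "35+"): 250,
}

weights = poststratification_weights(sample_counts, population_counts)

# Weighted cell sizes now follow the population distribution.
weighted_sizes = {cell: weights[cell] * n for cell, n in sample_counts.items()}
```

In this sketch, over-represented cells receive weights below 1 and under-represented cells receive weights above 1, so the weighted sample matches the official cell shares. The cell assignments themselves would come from the demographic estimates (for example, the CrowdFlower or Face++ labels), so errors in those labels carry through to the weights.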