You'll notice that I have quotes in part of the title for this blog entry, and that's because this study about using network information to infer unknown demographics has been given a rather unfortunate and irresponsible name.
The link I have provided is a Boston Globe article about some research coming (mostly) out of MIT on using Facebook data to infer demographic information that people have not openly acknowledged---in this case, sexual orientation. Now, of course it is true that such data can be used for that, and indeed many of the tools to do so come from network science. And it is essential for people to be far more aware of that than they seem to be. Of course the media is going to butcher the actual research, so as a scientist it is crucial when talking to them to make sure approximations are reasonable and---when it comes to sensitive studies like this one---non-damaging. ("First, do no harm.") I'll come back to some of this in a minute, but first I want to tell a story that will make clear why, beyond my research interests, I am blogging about this incident.
I saw a link to this study this morning when I was going through the daily digest from the SOCnet listserv. Among the included e-mails was one from sociologist Barry Wellman that included a link to this newspaper article and some of his commentary. More commentary from others came in subsequent e-mails. Obviously, I have a great deal of interest in this topic based on my research, so instead of saving the article for later, as I often might, I opened it up immediately. Scanning through the text, I noticed a very familiar name:
Fuck! One of the students on this project is one of my former research students from Georgia Tech! He worked with me on an abstract problem in dynamical systems---nothing to do with networks---but I am one of his educators, and my memories of him are of his being just a fundamentally nice kid (not a generic nice kid, but somehow nicer in some deeper sense). Also, educators have nightmares about seeing their students' names in situations like this. When he applies for jobs in a few years, people will google him, and this thing is what they're going to see, and he's probably going to have to explain himself.
For this kind of research, there are obviously very serious issues---especially when such personal information is involved---and any study of this nature necessarily has a lot of hurdles to pass to even be allowed to happen, and that includes ensuring that the data is anonymized. The article in the Boston Globe indicates one of the ethical hurdles but not one that would be sufficient. (Based on an e-mail exchange with my former student, there were several beyond this, though I can't say if they are sufficient.) Also, the "validation" that was mentioned in the article is just frightening. My former student assures me that there is more in there, but things are exacerbated by the fact that the paper is not available publicly in any form. It is currently under review and isn't even around as a preprint. So the experts reading the Boston Globe article can't even look it up to see what the researchers really did. Also, it seems like huge mistakes were made in the discussions with the media. I wonder if the people on the project actually used the term 'gaydar' in those discussions; if they did, that was just stupid. How could one think that something like that wouldn't blow up in one's face? (And it's not a particularly nice term in the first place.)
Anyway, conveying science to the media is a very difficult endeavor, but it needs to be done in a way that doesn't result in articles like this. Obviously, the researchers didn't want this kind of article, and something like this could have been prevented---even with a sensitive subject (which is when you want to be especially careful). Work with your interviewers, prepare for your interviews by establishing both talking points and things not to say, try not to say asinine things, and insist very stubbornly on looking at drafts of what might get published. Some of the science will necessarily get approximated; grin and bear it if the approximation is reasonable and kindly offer corrections if it isn't. And, especially, as it is with doctors: "First, do no harm."
If you want to see an excellent capsule of the scientist-journalist relationship, take a look at the short sequence of PHD Comics that starts with this one. "Robots", indeed.