Friday, February 11, 2011

Facebook100 Data Set

I am proud to finally be able to announce the public debut of the Facebook100 data set, which includes the complete set of people and friendships in the Facebook networks of 100 different colleges and universities, taken from a single snapshot in September 2005. (It also has limited demographic data---in anonymized form, obviously.)

My collaborators and I released the Facebook5 data set (containing the complete networks for 5 institutions) a couple of years ago when we submitted our first paper on Facebook networks, and we were hoping to quickly finish the sequel paper and release the data for the other 95 institutions, but unfortunately the research and associated paper have taken about 2 years longer than we had anticipated. That's par for the course in science.

Anyway, the paper, called Social Structure of Facebook Networks, is finally on the arXiv preprint server. And that means we have finally released the data as well. Enjoy!

Update (3:36 pm): I have taken the data down temporarily to fix a bug. I hope to repost it very soon. As usual, how long it takes depends on how much time various teaching and admin duties take, etc.

Update (10:21 pm): I have fixed the bug, so the data is back up! What a way to spend a Friday night, but (as most of you know) I have a very severe case of OCD.

Update (2/12/11): I don't know for sure, but I wonder if the data set we released is the second-largest social network data set to be released publicly (after the Netflix data)? Anyway, with 100 parallel data sets that arose from ostensibly the same mechanism, this data will be great for testing new methods, etc., so I hope that its public availability will advance science fruitfully. And, ideally, it will also help us to learn more about the social structure of universities. (Our paper only scratches the surface of what one can do with this data.)

Update (2/16/11): I had a very good discussion with the Facebook Data Team last night. Per their request, I have taken down the data, and I will be working with them to eventually post a version of the data set with which both they and I are happy. I can't say exactly when this will occur, but I am very excited about working with them (both to have a good resolution to recent adventures and also on research projects themselves).

18 comments:

Sebastien Heymann said...

Thank you very much for the effort!

Do you also have these files available in graph file formats like .GML, .GraphML or .GEXF? Common graph visualization software can't handle .MAT files.

Mason said...

You're very welcome!

My coauthors and I are mathematicians, so Matlab is our weapon of choice. You can also load the files using Octave (which is free). From there, you can save the data as comma-separated values or otherwise convert it to the formats used by graph visualization packages.

We wrote our own implementations of the usual visualization algorithms (and made some of our own tweaks as well) for our own visualizations. You can find that here.

At some point, we might prepare versions of the data in other formats, but I can't say when or if that will happen. (We are spread rather thin.) We do realize that the .mat format is much more common in our neck of the scientific world than in others.

Mason said...

Oh, right. You made one of the visualizations, so why on earth would you want to use ours. :)

Anyway, if you use either Octave or Matlab, then you can convert the data to your desired format.
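As a rough illustration of that conversion route, here is a minimal sketch in Python. It assumes the adjacency matrix has already been exported from Octave as a two-column CSV edge list (for example with something like `[i, j] = find(A); dlmwrite('edges.csv', [i j]);`, where the variable name `A` and the file name are hypothetical, not taken from the data set). The function name `edges_csv_to_gml` is also made up for this example. It writes a plain undirected GML graph and nothing else.

```python
import csv
import io


def edges_csv_to_gml(csv_text: str) -> str:
    """Convert 'source,target' CSV rows into an undirected GML graph string.

    Hypothetical helper: assumes each row holds two integer node ids.
    """
    nodes = set()
    edges = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        # int(float(...)) tolerates ids written as "3" or "3.0"
        u, v = int(float(row[0])), int(float(row[1]))
        if u == v:
            continue  # ignore self-loops
        nodes.update((u, v))
        # store each undirected edge once, smaller id first
        edges.add((min(u, v), max(u, v)))

    lines = ["graph [", "  directed 0"]
    for n in sorted(nodes):
        lines.append(f"  node [ id {n} ]")
    for u, v in sorted(edges):
        lines.append(f"  edge [ source {u} target {v} ]")
    lines.append("]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A symmetric adjacency matrix lists every friendship twice;
    # the duplicates collapse into a single undirected edge.
    print(edges_csv_to_gml("1,2\n2,1\n2,3\n3,2\n"))
```

Note that this sketch only sees nodes that appear in at least one edge, so isolated nodes would be dropped, and it does not carry over any of the demographic attributes.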

Sebastien Heymann said...

Thanks for the quick reply!

I don't have Matlab, so I'll try to find an alternative. I'll get back to you with converted files to build a bridge between our two worlds ;)

Mason said...

Quick replies are my specialty. :)

Just make sure you convert the corrected version of the data!

Though I should work on revising a paper right now before the birthday party starts... Sometimes I spend too much time responding quickly. :)

David said...

Is your dataset still available? Links to ~porterm/data/facebook100.zip fail to return anything. I'm doing research on novel user-centric analysis methods for social networks, and your dataset will be very useful!

Mason said...

David: As I indicated in the update of 16 February, the head of the Facebook Data Team requested that I take down the data, so I did so. They and I will be working together to eventually put the data back up. (I also plan to collaborate with people at the Oxford Internet Institute on this, as using this data set to study how to make data sets secure is an important thing to do.) I agree the data will be very useful for many studies, so the plan is definitely to repost it once Facebook and I agree on a version that we're both happy to post. For now, I have returned to working on more mundane things---such as teaching, other projects, and interviewing prospective doctoral students---but I hope to touch base with Facebook again very soon (though I still can't say at all when the actual reposting will occur).

Karan Ghai said...

Any updates on when you will be able to repost the datasets?

Afshin said...

Hi Mason,
I have access to that dataset, I want to use it for academic research on social networks, I will refer to your paper as provider of the data but I doubt if Facebook has violated any privacy regulation by that. If it is not available publicly, does it mean people who have downloaded it before can use it for their research?

Mason said...

Afshin: I'm sorry, but I am unable to parse what you wrote. I'll be happy to try to answer your question, but you'll first need to rephrase it or expand on it so that I understand it.

Afshin said...

:) to put it in one sentence:

Can I publish a paper related to social network structure, in which I have used facebook100.zip?

Mason said...

Yes, though you will not be able to do it in a venue that requires you to post the data online (because the data is not yours to post online and is now only available online via torrents and the like). The data is just as good for making scientific progress as it's always been, but posting it publicly via such venues unfortunately isn't in the cards.

As you indicated, just cite the published version of the paper, which appeared in Physica A in 2012, and also make the appropriate acknowledgements for the data.

Afshin said...

Thanks for the fast response.

Tom LaGatta said...

Hi Mason, any news on when the data will be available? This is a great resource, and I look forward to it being publicly available eventually.

Mason said...

No news. I haven't been in touch with Facebook for a while. I assume that it is possible to find it if one searches thoroughly enough.

Paul Sheridan said...

Hi Mason, any developments on the availability of the facebook100 dataset?

Mason said...

Facebook and I haven't discussed a way to officially post this data set for several years now, and the person with whom I discussed this no longer works there.

Anyway, the data set has been in the wild for several years.

farooq faisal said...

I need a dataset from Facebook for my research. Could someone help me by providing the dataset?