Philly’s Own FSRDC Coming in April

Last week’s announcement that Philadelphia will host a Federal Statistical Research Data Center (FSRDC) in April of 2017 was exciting news. The Center will be located in the Federal Reserve Bank of Philadelphia, a secured facility at Ten Independence Mall, where qualified researchers will be granted access to confidential data. The Center is a partnership of the University of Pennsylvania (led by associate professor of economics Iourii Manovskii), Penn State University, Drexel University, and the Federal Reserve Bank of Philadelphia. There are 24 of these centers already dotted around the country; six more are on the way, including Philadelphia.

I went to the information session at the Wharton School on Friday, December 2, to learn more about the contents of the archive as well as the access procedures, which are formal and take between four and twelve months to complete.

This network of data centers provides researchers with access to restricted data from the Census Bureau, the Agency for Healthcare Research and Quality (AHRQ), the National Center for Health Statistics (NCHS), and the Bureau of Labor Statistics (BLS). Microdata from these four sources on individuals or businesses include detailed geographic identifiers to allow merging of city, county, or state information. Details on personal and institutional characteristics–place of birth, date of birth, occupation, income, firm or plant size–are also available. What’s more, most Census datasets can be cross-linked with other datasets, including external ones. Most of the microdata that will be available locally as of next April has been heretofore suppressed by the Census Bureau. Manovskii believes this is a “big deal for us. Until now, such detail and high-quality US data was impossible to get” (PennCurrent, December 1).

To access data, researchers must submit a proposal after having contacted an RDC administrator. It is important to get a clear idea of what is available and how it can meet expectations. It’s also good to establish that the sought-after data is not publicly available somewhere else. After submitting the proposal, there is a security clearance and an “SSS” (special sworn status) to obtain–all these steps take time, so it is good to get the process started as soon as possible. The maximum project time once approved is five years.

For a complete list of available datasets at each of the four centers click here.

 

YouTube-8M

Everyone’s looking for large datasets these days, and Google is here to help with its recent release of YouTube-8M, which comprises 8 million videos tagged with over 4800 visual labels (I haven’t looked, but surely there are tags for that perennial genre of viral video involving inter-species animal friendships). Let the video analysis begin, as this trove hosts over 500,000 viewing hours! According to Google, all videos selected are public and have over 1000 views.

There are large-scale image datasets out there (such as ImageNet), but YouTube-8M is the first of its kind for video. The precursor to this newly minted dataset is Sports-1M, containing over a million video URLs tagged with 487 labels. (Sports-1M is actually included in YouTube-8M.) You can learn more about this new open access resource from the recent Google Research Blog announcement, or just dive right into the dataset itself here.

Speaking of YouTube research, check out these titles:

The Impact of YouTube on U.S. Politics, by LaChrystal D. Ricke (Lexington Books, 2014)

Unruly Media: YouTube, Music Video, and the New Digital Cinema, by Carol Vernallis (Oxford, 2013)

Out Online: Trans Self-Representation and Community Building on YouTube, by Tobias Raun (Routledge, 2016)

The YouTube Reader, edited by Pelle Snickars and Patrick Vonderau (National Library of Sweden, 2009)