Saturday, June 29, 2013

Full Google Reader Backup

"Reader is dead," says Mihai Parparita, one of the former Google Reader engineers. You still have 2-3 days to use Google Reader, but the best thing you can do is to export your data.

Google Takeout lets you export some of your Reader data, but not everything: your subscriptions, your notes, starred items, shared items, liked items, the list of followers and the people you were following, the items shared by the people you were following. Mihai Parparita wrote some Python scripts that download everything from your Google Reader accounts, including the entire content of the posts from your subscriptions. You need Python 2.7, some basic command-line skills and a lot of free storage: my backup has more than 5 GB for about 250 feeds (vs 125 MB for the uncompressed Takeout backup).


Here are some tips for running the script in Windows 7/8:

- install Python from here (Python 2.7.5 Windows Installer)
- download Mihai's zip file and extract the files
- open the folder in Windows Explorer and you should see a list of subfolders like "base", "bin", "feed_archive".


- Shift + right-click below the folders and select "open command-line window here".
- copy this code, paste it in the command-line window and press Enter (I assumed that Python's folder is c:\python27):

set PYTHONPATH=%cd%
c:\python27\python reader_archive\reader_archive.py --output=download


- a web page will open in your favorite browser and you'll need to click "Accept", copy the authorization code and paste it in the command-line window.
- wait until the script downloads all the files.

Mihai also started to write a script that lets you browse your archive. It's a work in progress, probably because the script for downloading your data is more important right now.

There's also a script for downloading a feed's archive. "Google Reader has (for the most part) a copy of all blog posts and other feed items published since its launch in late 2005 (assuming that at least one Reader user subscribed to the feed). This makes it an invaluable resource for sites that disappear, can serve as a backup mechanism and enables tools to be created." My post from 2007 provides another way to download the history of a feed. You can also upload your OPML file to this site, which preserves hitorical feed data.

"I don't fault Google for providing only partial data via Takeout. Exporting all 612,599 read items in my account (and a few hundred thousand more from subscriptions, recommendations, etc.) results in almost 4 GB of data. Even if I'm in the 99th percentile for Reader users (I've got the badge to prove it), providing hundreds of megabytes of data per user would not be feasible. I'm actually happy that Takeout support happened at all, since my understanding is that it was all during 20% time," says Mihai Parparita, who spent 5 years working on Google Reader.

If you're curious to know which Reader alternative gets a thumb up from Mihai, his answer is "a toss-up between NewsBlur and Digg Reader."

No comments:

Post a Comment