July 15, 2011

PowerShell to get Google Reader starred items

2012-04-21: An updated version of this post is available to support the Google Reader JSON feed.

I use Google Reader to manage my RSS feeds. The mobile version works well over 3G in the U.K. and helps me make use of my regular 50-minute train journey. In the office, I sometimes use the integration with Read It Later to cache articles on the iPhone for when 3G coverage is poor.

As a hoarder of information, my biggest problem is an ever-growing list of Reader “starred” items. The list has grown so long that I need help to organise and clean it up. Enter PowerShell…

Pre-reqs

1) Make your starred items feed “public”
Go to “Reader settings” – currently via a “cog” icon at the right-hand side of the top banner on any Reader page
Click on “Folders and Tags”
Click the RSS feed icon next to “Your starred items” to toggle the status to “public”


2) Get the shared feed URL for your starred items
In Reader Settings > Folders and Tags, click on the “view public page” link (next to “Your starred items”)
A new browser window will open showing the feed.
Copy the URL to the clipboard for use as a parameter to the PowerShell script
Your URL should be similar to the following example:
http://www.google.com/reader/shared/user%2F13017751506166689204%2Fstate%2Fcom.google%2Fstarred
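
The script only needs the numeric user ID embedded in this URL. As a quick check that the copied URL has the expected shape, the following sketch pulls the ID out with a regular expression. The function name Get-ReaderUserId is hypothetical and is not part of the script below; the pattern assumes the percent-encoded "shared" URL form shown above.

```powershell
# Hypothetical helper (not part of Get-Starred.ps1): extract the numeric
# user ID from a shared starred-items URL of the form shown above.
function Get-ReaderUserId {
    param([Parameter(Mandatory=$true)][string]$Url)

    # %2F is the percent-encoded "/" that separates "user" from the ID
    if ($Url -match '/reader/shared/user%2F([0-9]+)%2F') {
        return $Matches[1]
    }
    # No match: return nothing so the caller can test the result
    return $null
}

$SampleUrl = 'http://www.google.com/reader/shared/user%2F13017751506166689204%2Fstate%2Fcom.google%2Fstarred'
$UserId = Get-ReaderUserId -Url $SampleUrl
```

If $UserId comes back empty, the URL was probably copied from a different page than the “view public page” link.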

Script command line


PS C:\> .\Get-Starred.ps1 -URL "http://www.google.com/reader/shared/user%2F13017751506166689204%2Fstate%2Fcom.google%2Fstarred"

Output


The script will output objects with the following properties:
Title: The title of the posting
FeedSource: The name of the blog or feed
Href: The link to the posting
DatePosted: The date and time the article was posted or updated
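
Because the output is a stream of objects, the usual pipeline cmdlets can help with the clean-up. The sketch below counts starred items per feed and casts DatePosted to a [datetime] to find the oldest item. The sample objects are made up for illustration and stand in for real script output; only the four property names come from the script.

```powershell
# Made-up sample objects standing in for the output of Get-Starred.ps1
$Starred = @(
    New-Object PSObject -Property @{ Title='Post A'; FeedSource='Blog One'; Href='http://example.com/a'; DatePosted='2011-07-01T08:30:00Z' }
    New-Object PSObject -Property @{ Title='Post B'; FeedSource='Blog Two'; Href='http://example.com/b'; DatePosted='2011-06-15T12:00:00Z' }
    New-Object PSObject -Property @{ Title='Post C'; FeedSource='Blog One'; Href='http://example.com/c'; DatePosted='2010-12-24T18:45:00Z' }
)

# Count starred items per feed, busiest feed first
$PerFeed = $Starred | Group-Object -Property FeedSource |
    Sort-Object -Property Count -Descending |
    Select-Object -Property Name, Count

# DatePosted is a string; cast to [datetime] to sort chronologically
$Oldest = $Starred | Sort-Object -Property { [datetime]$_.DatePosted } |
    Select-Object -First 1
```

With real data, replace the $Starred assignment with a call to the script and keep the rest of the pipeline as it is.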


Notes

Google Reader limits the download to the most recent 2000 items. It should be possible to retrieve older items by removing the star status on the newer ones – something for the next version.

The script has the summary attribute included but commented out. My preference is to access the original source article via the “href” link.

Script


<#
.Synopsis
    Get items from a Google reader "starred items" feed.
    
.Description
    Lists the Title, Source, Link and Date of each item in a public Google Reader feed.
    Tested using a "starred items" feed.
    Note that Reader limits the feed to the most recent 2000 items.
    
.Parameter URL
    The full URL of the shared feed.
    Obtain the URL using the "view public page" link for the feed in Reader Settings > Folders and Tags.
    
.Example
    PS C:\> .\Get-Starred.ps1 -URL "http://www.google.com/reader/shared/user%2F13017751506166689204%2Fstate%2Fcom.google%2Fstarred"

    This example will return up to 2000 entries from the feed, listing the Title, FeedSource, Href and DatePosted of each.

.Example
    PS C:\> .\Get-Starred.ps1 -URL "http://www.google.com/reader/shared/user%2F13017751506166689204%2Fstate%2Fcom.google%2Fstarred" |
                Export-Csv -Path .\StarredItems.csv -NoTypeInformation

    This example will return up to 2000 entries from the feed, exporting the Title, FeedSource, Href and DatePosted to a comma-delimited file.
     
.Notes
     NAME:              Get-Starred.ps1
     AUTHOR:            TheInfraGuy
     VERSION:           1.0
     DEPENDENCIES:      A publicly shared Google Reader starred items feed
#>
[CmdletBinding()]
    param(
        [Parameter(Position=0,Mandatory=$true,ValueFromPipeline=$False,
        HelpMessage="Google reader starred item feed URL")]
        [string]$URL
    )
    BEGIN{}
    
    PROCESS{
        $webclient = New-Object system.net.webclient
        
        # Extract the userID from the URL.
        # Testing the -match result directly avoids reading a stale $matches
        # value when the URL contains the expected path but not the full pattern.
        if($URL -match "^http://www\.google\.com/reader/shared/user%2F([0-9]+)%2F"){
            [string]$UserID = $matches[1]
            
        }elseif($URL -match "^http://www\.google\.com/reader/public/javascript/user/([0-9]+)/state"){
            [string]$UserID = $matches[1]
            
        }else{
            Write-Error -Message "Invalid URL format" -Category InvalidArgument
            return
        }
        
        # Build the feed URL using the userID and a page size of 500
        [string]$RSS = "http://www.google.com/reader/public/atom/user/$UserID/state/com.google/starred?n=500"
        
        [string]$NextPage = $RSS
        [int]$PageCnt = 0
        [int]$TotalItems = 0
        
        # Fetch first and test the continuation afterwards, so that the last
        # page (which carries no continuation) is still processed
        do{
            $PageCnt++
            Write-Verbose -Message "Page = $PageCnt"
            Write-Verbose -Message "Page URL = $NextPage"
            
            [xml]$Feed = $webclient.DownloadString($NextPage)
            
            # Force an array so that a page with zero or one entry is counted correctly
            $Entries = @($Feed.feed.entry | Where-Object{$_})
            
            $Entries | Foreach-Object{
            
                $Title = ($_.title)."#text"
                $HREF = $_.Link | Where-Object{$_.type -eq "text/html"} | Foreach-Object{$_.href}
                $FeedSource = ($_.source.title)."#text"
                $DatePosted = $_.updated
                #$Summary = ($_.summary)."#Text"
                
                New-Object -TypeName PSObject -Property @{
                    Title = $Title
                    Href = $HREF
                    FeedSource = $FeedSource
                    DatePosted = $DatePosted
                    #Summary = $Summary            
                }
            }
            
            $TotalItems += $Entries.Count
            
            # XML element supplied by Google to control paging of items;
            # the continuation id is passed as a URL parameter to get the next page
            $ContinuationID = $Feed.feed.continuation
            $NextPage = "$($RSS)&c=$($ContinuationID)"
            
        }while($ContinuationID)
        Write-Verbose -Message "Total items = $TotalItems"
    }
    
    END{}
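
Continuation-based paging can be tried without a network connection. In the sketch below, the $Pages hashtable and the Get-FakePage function are hypothetical stand-ins for the Reader feed: each page carries some items and, except for the last, a continuation token naming the next page. One way to structure such a loop is to fetch first and test the token afterwards, so that the final page (which has no token) is still processed.

```powershell
# Hypothetical in-memory "feed": each page has items and an optional
# continuation token pointing at the next page ($null on the last page).
$Pages = @{
    'start' = @{ Items = 1, 2, 3; Continuation = 'tokA' }
    'tokA'  = @{ Items = 4, 5, 6; Continuation = 'tokB' }
    'tokB'  = @{ Items = 7, 8;    Continuation = $null }
}

# Hypothetical stand-in for downloading one page of the feed
function Get-FakePage {
    param([string]$Token)
    return $Pages[$Token]
}

$AllItems  = @()
$PageCount = 0
$Token     = 'start'
do {
    $Page = Get-FakePage -Token $Token
    $PageCount++

    # Process the page before checking for a continuation,
    # so the last page is not skipped
    $AllItems += $Page.Items

    $Token = $Page.Continuation
} while ($Token)
```

A loop that tests the token before processing would stop one page early, which matters most for feeds small enough to fit in a single page.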
