April 21, 2012

PowerShell v3 Google Reader Starred Items

A previous post provided a PowerShell script to read the RSS feed of a public Google Reader starred-items list. Google have since updated Reader, which now provides the list as a JSON stream.

The script below is an updated version. It requires PowerShell 3 (currently in beta) and can save items to disk on an ongoing basis.

The script takes the user's numeric Reader user ID as a parameter, reads the JSON feed and outputs one object per starred item, with properties such as Title, Author, Summary, ItemURL and WebSiteName (the blog name).
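To show how the JSON becomes the objects listed below, here is a minimal offline sketch. The JSON string is a hypothetical sample shaped after the property names the script reads (title, origin.streamId, alternate.href, published and so on); it is not a captured Reader response. It also converts the epoch seconds with a UTC-kind DateTime rather than the script's [timezone] call, which is a swapped-in approach.

```powershell
# Hypothetical sample modelled on the property names Get-Starred.ps1 reads;
# not a captured Google Reader response.
$json = @'
{
  "continuation": "CAbc123",
  "items": [
    {
      "title": "Example post",
      "summary": "Example summary ...",
      "origin": { "title": "Example Blog", "htmlUrl": "http://example.com/", "streamId": "feed/http://example.com/rss" },
      "alternate": [ { "href": "http://example.com/post-1" } ],
      "published": 0,
      "updated": 86400,
      "categories": [ "PowerShell", "JSON" ],
      "author": "Jane Doe"
    }
  ]
}
'@

# ConvertFrom-Json (PowerShell 3+) turns the text into nested PSCustomObjects
$stream = $json | ConvertFrom-Json

# published/updated are treated as seconds since 1970-01-01 UTC
$epoch = New-Object -TypeName DateTime -ArgumentList 1970, 1, 1, 0, 0, 0, ([DateTimeKind]::Utc)

# Flatten each item into the same kind of object the script emits
$items = foreach ($i in $stream.items) {
    [pscustomobject]@{
        Title       = $i.title
        WebSiteName = $i.origin.title
        FeedURL     = $i.origin.streamId -replace '^feed/', ''
        ItemURL     = $i.alternate[0].href
        Published   = $epoch.AddSeconds($i.published).ToLocalTime()
        Updated     = $epoch.AddSeconds($i.updated).ToLocalTime()
        Categories  = $i.categories -join ';'
        Author      = $i.author
    }
}

$items | Format-List
```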

The following example shows the objects output to the pipeline when not saving to a file.

PS C:\> .\Get-Starred.ps1 -user "123456789123" | Select -First 2

Title       : vSphere 5.0 Hardening Guide public draft available
Summary     :  
WebSiteName : Yellow Bricks
WebSiteURL  : http://www.yellow-bricks.com
FeedURL     : http://feeds.feedburner.com/YellowBricks
ItemURL     : http://feedproxy.google.com/~r/YellowBricks/~3/e29m6-2UApQ/
Published   : 18/04/2012 20:26:53
Updated     : 18/04/2012 20:26:53
Categories  : Server;security;vSphere;whitepaper
Author      : Duncan Epping


Title       : Welcome to System Center 2012 Configuration Manager
Summary     : Configuration Manager 2012 is finally here! We are
              having a great time at MMS ...
WebSiteName : Microsoft Server and Cloud Platform Blog
WebSiteURL  : http://blogs.technet.com/b/server-cloud/
FeedURL     : http://blogs.technet.com/b/server-cloud/rss.aspx
ItemURL     : http://blogs.technet.com/b/server
              -cloud/archive/2012/04/18/welcome-to-system-center-2012
              -configuration-manager.aspx
Published   : 18/04/2012 22:36:00
Updated     : 18/04/2012 22:36:00
Categories  : System Center;System Center 2012;Configuration Manager
              2012
Author      : Microsoft Server and Cloud Platform Team


Saving starred items to a file

The script can serialise items to an XML file. Use the -Path parameter to build up a full list of items over time: only unique items are saved, so the script can be run repeatedly against the same file and only new items will be added.
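The merge step can be sketched as a standalone function. Merge-StarredItems is a hypothetical name, and keying uniqueness on ItemURL is an assumption made for the sketch; the script's own Export-Feed function compares every property with Select-Object * -Unique and also keeps rotating backups, which this sketch leaves out.

```powershell
# Hypothetical helper (not part of Get-Starred.ps1): merges new items into a
# CLIXML file, keeping one object per ItemURL, and returns the stored count.
function Merge-StarredItems {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [Parameter(Mandatory = $true)][object[]]$NewItems
    )

    # Load previously saved items, if the file exists
    $existing = @()
    if (Test-Path -LiteralPath $Path) {
        $existing = @(Import-Clixml -LiteralPath $Path)
    }

    # Existing items come first, so Group[0] keeps the stored copy of a duplicate
    $merged = @(
        ($existing + $NewItems) |
            Group-Object -Property ItemURL |
            ForEach-Object { $_.Group[0] } |
            Sort-Object -Property Published -Descending
    )

    # Rewrite the whole file with the merged, newest-first list
    $merged | Export-Clixml -LiteralPath $Path -Force
    $merged.Count
}
```

Running it twice with overlapping input stores each ItemURL once, which is the behaviour that makes repeated exports to the same file safe.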

Note that Reader has an export limit of 2000 starred items (even with pagination), so each run returns at most the 2000 newest. Items already saved to the XML file are kept, so the file itself can grow beyond that limit over time.
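Paging works by passing each response's continuation string back in the c= query parameter until Reader stops returning one. The sketch below replays canned pages through a hypothetical Get-FakePage function in place of a real web request; the page contents and tokens are made up. The point is the do/while shape, which collects the items of the last page (the one without a continuation) before the loop ends.

```powershell
# Hypothetical stand-in for Get-ReaderJSON: canned pages keyed by continuation
# token. A real call would request ...&c=<token> over HTTP instead.
$fakePages = @{
    ''   = [pscustomobject]@{ items = @('item1', 'item2'); continuation = 'c2' }
    'c2' = [pscustomobject]@{ items = @('item3', 'item4'); continuation = 'c3' }
    'c3' = [pscustomobject]@{ items = @('item5');          continuation = $null }
}
function Get-FakePage([string]$Token) { $fakePages[$Token] }

$token = ''
$all = New-Object System.Collections.Generic.List[object]

do {
    $response = Get-FakePage -Token $token
    if (-not $response) { break }          # request failed: stop paging

    foreach ($item in $response.items) { $all.Add($item) }

    # The last page has no continuation, so the loop ends only after
    # that page's items have been collected.
    $token = $response.continuation
} while ($token)

$all.Count
```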


PS C:\> .\Get-Starred.ps1 -path c:\temp\AllItems.xml

Completed export to file 'c:\temp\AllItems.xml'

Items stored in the XML file can be rehydrated into objects with PowerShell's built-in “Import-Clixml” cmdlet.
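Because CLIXML preserves property types such as DateTime, the rehydrated items can be filtered much like live objects. A small sketch with made-up sample items (the titles, dates and temporary file name are illustrative only):

```powershell
# Two sample items shaped like Get-Starred.ps1 output (illustrative data)
$sample = @(
    [pscustomobject]@{ Title = 'Older post'; Categories = 'vSphere;security'; Published = [datetime]'2012-04-01' }
    [pscustomobject]@{ Title = 'Newer post'; Categories = 'System Center';    Published = [datetime]'2012-04-18' }
)
$file = Join-Path ([IO.Path]::GetTempPath()) 'StarredSample.xml'
$sample | Export-Clixml -Path $file

# Deserialised objects keep DateTime values as DateTime
$items = Import-Clixml -Path $file

# Items published on or after 10 April 2012
$recent = @($items | Where-Object { $_.Published -ge [datetime]'2012-04-10' })

# Items tagged 'vSphere' (Categories is a ';'-separated string)
$vsphere = @($items | Where-Object { ($_.Categories -split ';') -contains 'vSphere' })

$recent | Select-Object Title, Published
Remove-Item -Path $file
```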

PS C:\> Import-Clixml -path c:\temp\AllItems.xml | Select -First 1

Title       : How Microsoft IT Developed a Private Cloud
              Infrastructure
Summary     : Microsoft moved to the private cloud by planning,
              architecting, developing, and managing the first
              private cloud solution within the Microsoft data 
              center. The combination of shared resources, flexible
              capacity, charge back, and an overall self-service ...
WebSiteName : 
WebSiteURL  : http://www.microsoft.com/events/podcasts/default.aspx
              ?podcast=rss&audience=Audience-b046181f-3333-4c19-977e
              -c230ed48d9c0&pageId=x40
FeedURL     : http://www.microsoft.com/events/podcasts/default.aspx
              ?podcast=rss&audience=Audience-b046181f-3333-4c19-977e
              -c230ed48d9c0&pageId=x40
ItemURL     : http://dlbmodigital.microsoft.com/audio/23209.wma
Published   : 19/04/2012 18:12:59
Updated     : 19/04/2012 18:12:59
Categories  : 
Author      : 

Script Get-Starred.ps1

#requires -version 3

<#
 .SYNOPSIS
  Gets the items from a public Google Reader starred-items list.

 .DESCRIPTION
  Uses Invoke-WebRequest and ConvertFrom-Json to download and parse the feed items.

 .PARAMETER  User
  A unique number representing an individual Google Reader account

 .PARAMETER  Path
  An optional path to an output XML file.
  If the path is specified, output is sent to the file only, 
  unless the -Passthru switch is also used.
  
  If the file already exists, only new items are added to it.
  
 .PARAMETER  Passthru
  Used in conjunction with the Path option, this causes the
  PSObjects to be sent to the pipeline as well as the xml file.
  
 .EXAMPLE
  PS C:\> .\Get-Starred.ps1 -user "12114651402167687217"
  
  
  Title       : Microsoft Volume Licensing Reference Guide
  Summary     : A comprehensive review of the Microsoft Volume Licensing programs,
                including Microsoft product licensing models and Microsoft Software
                Assurance.
  WebSiteName : Microsoft Download Center
  WebSiteURL  : http://www.microsoft.com/downloads/
  FeedURL     : feed/http://www.thundermain.com/rss/
  ItemURL     : http://feedproxy.google.com/~r/MicrosoftDownloadCenter/~3/nKKswfBdmuk/

  Title       : Language Support for Asynchronous Programming
  Summary     : Asynchronous programming is what the doctor usually orders for unresponsive
                client apps and for services with thread-scaling issues. This usually means
                a bleak departure from the imperative programming constructs we know and
                love into a spaghetti ...
  WebSiteName : Channel 9
  WebSiteURL  : http://channel9.msdn.com/
  FeedURL     : feed/http://channel9.msdn.com/Feeds/RSS/
  ItemURL     : http://channel9.msdn.com/Events/Lang-NEXT/Lang-NEXT-2012/Language-Support-for-Asynchronous-Programming
  
  (...)
 .NOTES
  Version 2.0
 .LINK
  http://www.google.com/reader
  
#>

[CmdletBinding()]
param(
  [Parameter(Position=0)]
  [ValidateNotNullOrEmpty()]
  [String]$user="<numeric user id>"
  ,
  [parameter(position=1)]
  [ValidateNotNullOrEmpty()]
  [string]$path
  ,
  [parameter(position=2)]
  [switch]$Passthru
 )
BEGIN{

 #Region -----Support Functions-----
 
 
 Function Get-ReaderJSON{
  <# 
   .SYNOPSIS
    Function to get the JSON feed from a public Google Reader starred item list
   .PARAMETER  userID
    A unique number representing an individual Google Reader account
   .PARAMETER  continuationID
    A string returned from a request to indicate additional pages of items are available
   .OUTPUTS
    PSObject
  #>
  param(
   [parameter(position=0)]
   [string]$userID
   ,
   [parameter(position=1)]
   [string]$continuationID
  )
  $URL = "http://www.google.com/reader/public/javascript/user/{0}/state/com.google/starred?n=1000&c={1}" `
   -f $userID,$continuationID
   
  Write-Verbose "Getting URL:: $URL"
  
  try{
   $wr = Invoke-WebRequest -Method Get -Uri $URL -verbose:$False
   $wr.Content | ConvertFrom-Json
   
  }catch [InvalidOperationException] {
   Write-Warning "Request for URL failed. Check the user ID and retry `n URL :: $URL"
  
  }catch{
   Write-Warning "Unexpected error requesting URL `n $URL `n Error $_"
  
  }
 }
 
 Function Export-Feed{
  <# 
   .SYNOPSIS
    Function to export unique feed items to an XML file
   .DESCRIPTION
    Serialises PSObject feed items from Get-Starred to an XML file
    Only unique items are saved, duplicates are filtered out.
   .PARAMETER  feeditem
    A PSObject representing an individual RSS feed item
   .PARAMETER  path
    The path to the XML output file.
    If the file does not exist, it is created.
    
    If it already exists, only new items are added.
   .OUTPUTS
    XML file
  #>
  [CmdletBinding()]
  param(
   [parameter(position=0,valuefrompipeline=$true)]
   [ValidateNotNullOrEmpty()]
   [psobject[]]$feeditem
   ,
   [parameter(position=1)]
   [ValidateNotNullOrEmpty()]
   [string]$path="$Env:TEMP\ReaderStarred.xml"
   ,
   [parameter(position=2)]
   [ValidateNotNullOrEmpty()]
   [string]$BackupExt="bkup"
   ,
   [parameter(position=3)]
   [ValidateNotNullOrEmpty()]
   [int]$HistoryCount=7   
  )
  BEGIN{
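   try{
    if(Test-Path $path){
     # Back up the existing file before reading it, so a failed import cannot lose data
     Write-Verbose "Backing-up file '$path'"
     
     # HH = 24-hour clock, so backup names sort and stay unique across the day
     $dt = Get-Date -Format "yyyyMMddHHmmss"
     Copy-Item -Path $path -Destination "$Path-$dt.$BackupExt" -Force -ErrorAction Stop
     
     [Object[]]$ExistingItems=@(Import-Clixml -Path $path -ErrorAction Stop)
    }
    
   }catch{
    Write-Warning "Unable to read existing file '$path'. A new file will be created. `n Error : $_"

   }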
            
            if(-not $ExistingItems){
                [Object[]]$ExistingItems=@()
            }

   
   Write-Verbose "Cleaning-up backup files"
   
   $Folder = Split-Path $path -Parent
   if(Test-Path $Folder -PathType Container){
    
    $BackupFiles = Get-ChildItem $Folder -Force -Filter "*.$BackupExt"
    if($BackupFiles -and $BackupFiles.Count -gt $HistoryCount){
    
     Write-Verbose " > Number of existing .$BackupExt Files $($BackupFiles.Count)"
     
     Try{
       $FilesToDelete = $BackupFiles | Sort-Object CreationTime -Descending |
       Select-Object -Last $($BackupFiles.Count - $HistoryCount)
                        
                        Write-Verbose "Deleting `n $FilesToDelete"      
                        $FilesToDelete | Remove-Item -Force

     }catch{
      Write-Verbose "Error removing backup file `n $_"
     }
    
    }else{
     Write-Verbose " > No .$BackupExt files deleted."
    }
    
   
   }else{
    Write-Warning "Unable to continue. Error exporting feeds. `n Folder '$Folder' does not exist."
    break
   }
  
  }
  
  PROCESS{
   
   Foreach($Item in $feeditem){
   
    $ExistingItems+=$Item
   }
  }
  
  END{
  
   $ExistingItems = $ExistingItems | Select-Object * -Unique | Sort Published -Desc
   Try{
    Write-Verbose "Exporting items to '$path'"
    
    $ExistingItems | Export-Clixml -Path $path -Force -ErrorAction Stop
    
    Write-Host "Completed export to file '$path'"
    
   }Catch{
    Write-Warning "Error: Failed to save updated file '$path'. `n Error : $_"
   }
  }
 
 }
 
 #Endregion
 
 # used in verbose output
 $ItemCount = 0
 
 # used if path specified
 $AllItems = @()

}

PROCESS{

 $page = 1
 Write-Verbose "-------- Page $Page ---------"
 
 $JSONStream = Get-ReaderJSON -userid $user
 if($JSONStream){
 
  # Process the current page, then request the next one while Reader
  # returns a continuation string. do/while (rather than while) ensures
  # the final page, which has no continuation, is processed too.
  do{
   
   $JSONStream | Select-Object -ExpandProperty Items | ForEach-Object{
    
    $Item = [pscustomobject]@{
     Title       = $_.Title
     Summary     = $_.summary
     WebSiteName = $_.origin.title
     WebSiteURL  = $_.origin.htmlurl
     FeedURL     = $_.origin.Streamid -replace "^feed\/",""
     ItemURL     = $_.alternate.href
     # published/updated are seconds since the Unix epoch
     Published   = [timezone]::CurrentTimeZone.ToLocalTime(([datetime]'01/01/1970 00:00:00').AddSeconds($_.published))
     Updated     = [timezone]::CurrentTimeZone.ToLocalTime(([datetime]'01/01/1970 00:00:00').AddSeconds($_.updated))
     Categories  = $_.categories -join ";"
     Author      = $_.author
    }
    if($path){
    
     $AllItems += $Item
     
     if($Passthru){
      $Item
     }
     
    }else{
     $Item
    }
    
    $ItemCount++
   }
   
   # Reader uses the continuation string to page items
   $continuationID = $JSONStream.continuation
   Write-Verbose "Continuation :: $continuationID"
   
   if($continuationID){
    # Get the next page
    $page++
    Write-Verbose "-------- Page $Page ---------"
    
    $JSONStream = Get-ReaderJSON -userid $user -continuationID $continuationID
    
    if(-not $JSONStream){
     # Request failed; Get-ReaderJSON has already written a warning
     break
    }
   }
  }while($continuationID)
 }else{
  Write-Verbose "No results."
 
 }
}
END{
 if($path -and $AllItems){
 
  $AllItems | Export-Feed -path $path
 
 }

 Write-Verbose "'$Page' pages retrieved, containing '$ItemCount' items."
}



