Most websites don't offer the functionality to save their data onto your computer. Typically the only option is Right Click > Save As, which can become a very tedious task very quickly. Being able to scrape a site of its content most certainly has its uses: perhaps you want to download Wikipedia (which I heard is only 14 GB with no pictures), or maybe you're really into something like PowerShell and want to search Google for all images with "powershell" in the name and then download them to your computer. [Next Upcoming Post]
In the function below I scrape my website's homepage for all of its images, meaning my computer will search my homepage [http://www.matthewkerfoot.com] for all image files and then download them to my local machine.
$Url = "http://www.TheOvernightAdmin.com"
$iwr = Invoke-WebRequest -Uri $Url
$images = ($iwr).Images | select src
$images
Output:
PS C:\> $images

src
---
http://img1.blogblog.com/img/icon18_wrench_allbkg.png
http://img1.blogblog.com/img/icon18_wrench_allbkg.png
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2-MeZf9LB70psOUNXQR5bZdkDZCCgfohOI25DusGLDSQqVDFubnV52a-mdwqgCSa9FkvHOUh9J_UYk52AUGP9G7ciSrV-4YxSH8DVeZ2RRSn5bu8CRljguqkfBXmrG18c6bLcJKlgFJI/s1600/3spaces.JPG
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2-MeZf9LB70psOUNXQR5bZdkDZCCgfohOI25DusGLDSQqVDFubnV52a-mdwqgCSa9FkvHOUh9J_UYk52AUGP9G7ciSrV-4YxSH8DVeZ2RRSn5bu8CRljguqkfBXmrG18c6bLcJKlgFJI/s1600/3spaces.JPG
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEih5-DOrMYIyvCdHe5St3MIAPgjoguh4YYoPK8VqFiL3N1wRAH7VBv_KzS2suq7HV8sAtnLnvEsBUeqNVEKAMKRCUzYVvRj9SPjyiAuhRVQ_bUNweRhyphenhyphenh5DuIjoA35A2_zKLfuMkvpy8BQ/s1600/computerlists.JPG
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgAEjZ1mUkKD2oFHcE1zOlFIwvbx4JQn8FZYJqgemAKJeB1oU09vFRle1VrMg19wCLdVeWgx_nUxMrnWkEc8Ued4EdhxXh9erASTtXzgmJhzp-mMMLDW8QWBCWZHnmGzA9fC30jtWGXwQg/s1600/finalproduct.JPG
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhsjHq0MYuYo_iR-ztwv9_k9Jrz7uDj9TJ-SOmOkKEsynQvBYoGNJpC_BZtVTdorBG2SJdi3SfnB5Z-1lVVxtcS6GFTBXsKNMr3uUzdePb1OENcUkpngnx2f1KP2PuhWeeE7Pt4mA3rxmc/s1600/sysadminbeer.png
Continued...
PS C:\>
The Invoke-WebRequest cmdlet used above returns every image element on the page, and select src narrows that down to just the source paths, which gives us the list of image URLs we will download later.
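If you only want certain file types, that same pipeline can be narrowed before anything is downloaded. Here is a minimal sketch; the -ExpandProperty switch and the extension filter are my own additions, not part of the original function:

$Url = "http://www.TheOvernightAdmin.com"
# Expand src to plain strings and keep only .png / .jpg / .jpeg sources
(Invoke-WebRequest -Uri $Url).Images |
    Select-Object -ExpandProperty src |
    Where-Object { $_ -match '\.(png|jpe?g)$' }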
The only difference between web scraping and web browsing is that scraping is usually automated and you are saving the data, not just viewing it.
Here is the full function:
function Get-WebPageImages
{
    <#
    .CREATED BY: Matt Kerfoot
    .CREATED ON: 08/11/2014
    .Synopsis
       Downloads all available images from the specified $Url (defaults to http://www.TheOvernightAdmin.com)
    .DESCRIPTION
       This function will download all images from a specific web page and save them to your desktop by default. Requires PSv3+
    .EXAMPLE
       PS C:\> Get-WebPageImages -Url http://www.matthewkerfoot.com -OutputPath C:\
    #>
    [CmdletBinding()]
    Param
    (
        [Parameter(Mandatory=$false,
                   ValueFromPipelineByPropertyName=$true,
                   Position=0)]
        $Url = "http://www.TheOvernightAdmin.com",

        $OutputPath = "$env:USERPROFILE\Desktop\"
    )

    Begin
    {
        # Grab the page and keep only the src property of each image element
        $iwr = Invoke-WebRequest -Uri $Url
        $images = ($iwr).Images | select src
    }
    Process
    {
        # Download each image to $OutputPath, keeping its original file name
        $wc = New-Object System.Net.WebClient
        $images | foreach { $wc.DownloadFile( $_.src, ("$OutputPath\" + [io.path]::GetFileName($_.src)) ) }
    }
    End
    {
        Write-Host "Downloading all images from $Url to $OutputPath"
    }
}

Get-WebPageImages