Monday, August 11, 2014

Download all images from a web page with PowerShell

Web scraping, also known as screen scraping, web data extraction, or web harvesting, is a technique used to extract large amounts of data from one or more websites.

Most websites don't offer a way to save their data to your computer. Typically the only option is Right Click > Save As, which becomes very tedious very quickly. Being able to scrape a site of its content certainly has its uses. Perhaps you want to download Wikipedia (which I've heard is only 14 GB without pictures), or you're really into something like PowerShell and want to search Google for all images with powershell in the name and then download them to your computer (the subject of my next upcoming post).

In the function below I scrape my website's homepage for all of its images. This means that my computer searches my homepage [http://www.matthewkerfoot.com] for all image files and then downloads them to my local machine.

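$Url = "http://www.TheOvernightAdmin.com"
# Download the page; the response object lists every <img> element in .Images
$iwr = Invoke-WebRequest -Uri $Url
# Keep only each image's src attribute (the image URL)
$images = ($iwr).Images | select src
$images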

Output:

PS C:\> $images
src                              
---
http://img1.blogblog.com/img/icon18_wrench_allbkg.png
http://img1.blogblog.com/img/icon18_wrench_allbkg.png
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2-MeZf9LB70psOUNXQR5bZdkDZCCgfohOI25DusGLDSQqVDFubnV52a-mdwqgCSa9FkvHOUh9J_UYk52AUGP9G7ciSrV-4YxSH8DVeZ2RRSn5bu8CRljguqkfBXmrG18c6bLcJKlgFJI/s1600/3spaces.JPG                                
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2-MeZf9LB70psOUNXQR5bZdkDZCCgfohOI25DusGLDSQqVDFubnV52a-mdwqgCSa9FkvHOUh9J_UYk52AUGP9G7ciSrV-4YxSH8DVeZ2RRSn5bu8CRljguqkfBXmrG18c6bLcJKlgFJI/s1600/3spaces.JPG                                
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEih5-DOrMYIyvCdHe5St3MIAPgjoguh4YYoPK8VqFiL3N1wRAH7VBv_KzS2suq7HV8sAtnLnvEsBUeqNVEKAMKRCUzYVvRj9SPjyiAuhRVQ_bUNweRhyphenhyphenh5DuIjoA35A2_zKLfuMkvpy8BQ/s1600/computerlists.JPG                           
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgAEjZ1mUkKD2oFHcE1zOlFIwvbx4JQn8FZYJqgemAKJeB1oU09vFRle1VrMg19wCLdVeWgx_nUxMrnWkEc8Ued4EdhxXh9erASTtXzgmJhzp-mMMLDW8QWBCWZHnmGzA9fC30jtWGXwQg/s1600/finalproduct.JPG                            
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhsjHq0MYuYo_iR-ztwv9_k9Jrz7uDj9TJ-SOmOkKEsynQvBYoGNJpC_BZtVTdorBG2SJdi3SfnB5Z-1lVVxtcS6GFTBXsKNMr3uUzdePb1OENcUkpngnx2f1KP2PuhWeeE7Pt4mA3rxmc/s1600/sysadminbeer.png                                                                                                             
Continued...                                                                       
PS C:\> 

The Invoke-WebRequest cmdlet used above downloads the page, and the Images property of its result holds every <img> element it found. Piping that to select src keeps only each image's src attribute, which gives us the list of image URLs that we will be downloading later.
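The output above contains duplicate URLs, and on some pages an image's src can be relative (for example /img/logo.png) rather than a full URL. Here is a minimal sketch, assuming you want each image only once and as an absolute URL; Resolve-ImageUrl is a hypothetical helper name, not part of the original script.

```powershell
# Hypothetical helper (not from the original post): turns the src values
# collected from a page into a de-duplicated list of absolute URLs.
function Resolve-ImageUrl {
    param (
        [Parameter(Mandatory = $true)]
        [string]$PageUrl,          # the page the src values came from

        [string[]]$Src             # raw src attribute values
    )

    $base = [System.Uri]$PageUrl
    # HashSet.Add returns $false for a value it already holds (case-sensitive)
    $seen = New-Object 'System.Collections.Generic.HashSet[string]'

    foreach ($s in $Src) {
        if ([string]::IsNullOrWhiteSpace($s)) { continue }   # skip <img> tags without a src

        # Relative values are resolved against the page URL;
        # absolute values pass through unchanged
        $absolute = (New-Object System.Uri -ArgumentList $base, $s).AbsoluteUri

        if ($seen.Add($absolute)) {
            $absolute              # emit each URL once, in first-seen order
        }
    }
}
```

With the variables from the snippet above, usage would look like Resolve-ImageUrl -PageUrl $Url -Src $images.src.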

The only difference between web scraping and web browsing is that scraping is usually automated, and you are also saving the data, not just viewing it.


Here is the full function

function Get-WebPageImages
{
<#
.Synopsis
   Downloads all available images from the specified $Url.
.DESCRIPTION
   This function will download all images from a specific web page and save them to your desktop by default.
   Requires PSv3+
.EXAMPLE
   PS C:\> Get-WebPageImages -Url http://www.matthewkerfoot.com -OutputPath c:\
.NOTES
   Created by: Matt Kerfoot
   Created on: 08/11/2014
#>

    [CmdletBinding()]
    Param ( [Parameter(Mandatory=$false,
                       ValueFromPipelineByPropertyName=$true,
                       Position=0)]
            $Url = "http://www.TheOvernightAdmin.com",
            $OutputPath = "$env:USERPROFILE\Desktop\"
    )

    Begin {
        # One WebClient is reused for every download
        $wc = New-Object System.Net.WebClient
    }

    Process {
        # The page is fetched here rather than in Begin, because a $Url piped in
        # by property name is not bound yet when the Begin block runs
        Write-Host "Downloading all images from $Url to $OutputPath"
        $iwr = Invoke-WebRequest -Uri $Url

        # -Unique skips images that appear on the page more than once
        $images = $iwr.Images | Select-Object -ExpandProperty src -Unique

        foreach ($src in $images) {
            # Save each image under its original file name in $OutputPath
            $wc.DownloadFile($src, (Join-Path $OutputPath ([System.IO.Path]::GetFileName($src))))
        }
    }

    End {
        $wc.Dispose()
    }
}

Get-WebPageImages
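The function names each saved file by calling GetFileName on the image URL. One thing to watch for, assuming some of the pages you scrape serve images with a query string (for example photo.jpg?w=640): GetFileName keeps the ?w=640 part, and ? is not a legal character in a Windows file name. A minimal sketch of a hypothetical helper, Get-ImageFileName, that takes the name from the URL's path only:

```powershell
# Hypothetical helper (not part of the original function): derives a local file
# name from an image URL, ignoring any query string or fragment.
function Get-ImageFileName {
    param (
        [Parameter(Mandatory = $true)]
        [string]$ImageUrl
    )

    $uri = [System.Uri]$ImageUrl
    # AbsolutePath is the path portion only, e.g. "/img/photo.jpg"
    $name = [System.IO.Path]::GetFileName($uri.AbsolutePath)

    # Percent-encoded characters such as %20 are decoded back to spaces
    [System.Uri]::UnescapeDataString($name)
}
```

Inside the download loop it could replace the GetFileName call, e.g. Join-Path $OutputPath (Get-ImageFileName $src).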






Saturday, August 9, 2014

Create a list of Computer Names with no spaces with PowerShell

A few months back a co-worker of mine shot an email my way asking for the best way to create a list of all of the servers in the domain, so that he could run a script against them.

If you haven't done this before, it can be a little trickier than you might think. The best way I've found to do it is with a function. A function lets you name a block of code; once it's defined, you can call that block anywhere in a script or just at the console. Functions are one of my favorite things about PowerShell, or at least they're up there with PS-Remoting, Workflows, and Desired State Configuration. Functions have six main parts: the function name, a help block, a Param block, and the Begin, Process, and End blocks [shown below].

function Get-ComputerList {              # The Function's Name

<#...#>                                  # Help File

                      [CmdletBinding()]
              Param (                    # Param Block
              )

          Begin {                        # Begin Block
          }

     Process {                           # Process Block
     }

  End {                                  # End Block
  }
}
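To see how the three blocks behave when objects are piped into a function, here is a small sketch using a hypothetical Show-BlockOrder function: Begin runs once before the first pipeline object, Process runs once per object, and End runs once after the last one.

```powershell
# Hypothetical demo function: reports when each block runs
function Show-BlockOrder {
    [CmdletBinding()]
    Param (
        [Parameter(ValueFromPipeline = $true)]
        [string]$Name
    )

    Begin   { "Begin" }              # once, before any input arrives
    Process { "Process: $Name" }     # once per piped object
    End     { "End" }                # once, after all input is processed
}

'SRV01', 'SRV02' | Show-BlockOrder
```

Piping two names in produces Begin, Process: SRV01, Process: SRV02, and End, in that order.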

Now let's take a look at what I've chosen to place into the Param, Begin, Process, and End blocks.


Param

First off, I always like to define the location of the text file in the Param block, but this comes down to personal preference.

Param ( [Parameter(Mandatory=$false,
                   ValueFromPipelineByPropertyName=$true,
                   Position=0)]
                   $OutputPath = "$env:USERPROFILE\Desktop\ComputerList.txt"
)

I set $OutputPath above to "$env:USERPROFILE\Desktop\ComputerList.txt", which is where the newly created computer list will be written with Out-File.
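Because $OutputPath has a default value and Mandatory=$false, the caller can omit it, pass it by position (Position=0), or, thanks to ValueFromPipelineByPropertyName, supply it from any piped object that has an OutputPath property. A minimal sketch with a hypothetical Show-OutputPath function that reuses the same Param block and only echoes the value it received:

```powershell
# Hypothetical demo: the same Param block as above, echoing the bound value
function Show-OutputPath {
    [CmdletBinding()]
    Param ( [Parameter(Mandatory=$false,
                       ValueFromPipelineByPropertyName=$true,
                       Position=0)]
                       $OutputPath = "$env:USERPROFILE\Desktop\ComputerList.txt"
    )

    Process { $OutputPath }
}

Show-OutputPath                                                         # default value
Show-OutputPath 'C:\Temp\Servers.txt'                                   # positional
[pscustomobject]@{ OutputPath = 'D:\Lists\All.txt' } | Show-OutputPath  # bound by property name
```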


Begin


The Begin block: this is where I've chosen to gather the computer names and write them to the $OutputPath defined in the Param block above. If we ran only the Param and Begin blocks, we would end up with a text file on our Desktop called ComputerList.txt.

Begin {
       # requires PSv3.0+ and the ActiveDirectory module (RSAT)
       # retrieves all computer and server OS names
       Get-ADComputer -Filter * -Properties * | Select-Object -ExpandProperty name | Out-File $OutputPath
}

The -Filter parameter can also narrow the results: in the version below I've used it to retrieve only computers whose OperatingSystem attribute contains "Server". From there the list is piped to Select-Object -ExpandProperty name. A key thing to note: -ExpandProperty returns just the name strings, which gives us a list of computers ready to run commands against. Without it, Select-Object returns objects, and the file written by Out-File would contain header information (a name column heading and a row of dashes) above the computer names. There is a way to make it work without -ExpandProperty, which I'll cover in the next section. After the list is generated and filtered, everything is written with Out-File to $OutputPath, which defaults to $env:USERPROFILE\Desktop\ComputerList.txt.

Begin {
       # requires PSv3.0+ and the ActiveDirectory module (RSAT)
       # retrieves all server OS names
       Get-ADComputer -Filter 'OperatingSystem -like "*Server*"' | Select-Object -ExpandProperty name | Out-File $OutputPath
}
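The difference -ExpandProperty makes is easy to see without Active Directory. In this sketch, a couple of hypothetical [pscustomobject]s stand in for Get-ADComputer results (that substitution is an assumption for illustration only). Select-Object name keeps objects, which Out-File renders as a table with a header, while -ExpandProperty name returns plain strings.

```powershell
# Stand-ins for Get-ADComputer output (hypothetical data, for illustration only)
$computers = @(
    [pscustomobject]@{ name = 'SRV01'; OperatingSystem = 'Windows Server 2012 R2 Standard' }
    [pscustomobject]@{ name = 'WKS01'; OperatingSystem = 'Windows 8.1 Pro' }
)

# Without -ExpandProperty: objects with a single "name" property.
# Rendered as text (as Out-File does) they include a header and a dashed line.
$asObjects = $computers | Select-Object name
$asObjects | Out-String

# With -ExpandProperty: just the strings, one computer name per line
$asStrings = $computers | Select-Object -ExpandProperty name
$asStrings

# The same -like match that the -Filter above asks Active Directory to perform,
# applied here client-side with Where-Object
$computers | Where-Object { $_.OperatingSystem -like '*Server*' } |
    Select-Object -ExpandProperty name
```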


Process

There are several ways to remove whitespace from a file's contents.

My favorite is:

(Get-Content $OutputPath).replace(" ","")


Here is another way to remove the whitespace from a file.

(Get-Content $OutputPath) -replace "\s+", ""
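The two approaches are not quite equivalent. .replace(" ","") removes only space characters, while -replace "\s+", "" is a regular expression that also removes tabs and other whitespace. On PowerShell 3.0 and later, calling .replace() on the array that Get-Content returns applies it to every line (member enumeration). A quick comparison on an in-memory array standing in for the file contents (the sample data is hypothetical):

```powershell
# Sample lines standing in for (Get-Content $OutputPath); hypothetical data
$lines = @('SRV01  ', "`tSRV02", ' SRV 03 ')

# String method: removes space characters only, so the tab before SRV02 survives
$spacesOnly = $lines.Replace(' ', '')

# Regex operator: \s+ matches spaces, tabs and other whitespace
$allWhitespace = $lines -replace '\s+', ''

$spacesOnly
$allWhitespace
```

If a list might contain tabs (for example after being pasted from a spreadsheet), the -replace version is the safer choice.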


Okay, so let's say we didn't use -ExpandProperty in the Begin block. In that case, we could get around the header information in the output with the line below.


(Get-Content $OutputPath  | Select-Object -Skip 3).replace(" ","")



-Skip 3 skips the first three lines of the file, which in a table written by Out-File are typically a blank line, the name header, and its dashed underline. Those lines must be removed before we can run a script or function against the list.
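One caveat worth checking in your own output: a table written by Out-File typically also ends with blank lines, and .replace(" ","") turns those into empty strings rather than removing them. A sketch, using sample lines that mimic such a file (hypothetical data), that skips the header and also drops the empty entries:

```powershell
# Sample lines mimicking an Out-File'd table: a blank line, the header,
# the dashed underline, the values, then trailing blank lines (hypothetical data)
$lines = @('', 'name', '----', 'SRV01', 'SRV02  ', '', '')

# Skip the three header lines, strip spaces, then keep only non-empty names
$ComputerList = ($lines | Select-Object -Skip 3).replace(' ', '') |
    Where-Object { $_ -ne '' }

$ComputerList
```

In the real function, $lines would be Get-Content $OutputPath.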


The Process block in its entirety:

Process {
         # removes empty spaces from txt document
         # Alternate Way --> (Get-Content $OutputPath) -replace '\s+', ''
         $ComputerList = (Get-Content $OutputPath).replace(" ","")
}



End

The End block is where you wrap up the function; this is where any Out-File calls write their final products. It is also where any formatting would go if you must include some in your script. However, it is best practice not to format the output at all, because the next person who runs the script might want the data displayed differently, and leaving it unformatted lets him or her format it as needed.

  End {
       $ComputerList | Out-File $OutputPath
       write-verbose "The Computer list has been saved to $OutputPath"
       write-verbose "Opening file $OutputPath at this time"
       Invoke-Item $OutputPath
  }
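Note that the Write-Verbose messages in this End block only appear when the function is run with -Verbose, which is available because of [CmdletBinding()]. A minimal sketch with a hypothetical Show-VerboseDemo function; the 4>&1 redirection merges the verbose stream into the normal output so it can be captured in a variable:

```powershell
# Hypothetical demo: Write-Verbose output is silent unless -Verbose is passed
function Show-VerboseDemo {
    [CmdletBinding()]
    Param ()

    End {
        Write-Verbose "The Computer list has been saved"
        "done"
    }
}

Show-VerboseDemo              # prints only "done"
Show-VerboseDemo -Verbose     # also prints "VERBOSE: The Computer list has been saved"

# 4>&1 merges the verbose stream (stream 4) into the success output
$captured = Show-VerboseDemo -Verbose 4>&1
```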

The full function, Get-ComputerList, can be downloaded from the Microsoft Script Repository.


Wednesday, August 6, 2014

Which Hyper-V Host is hosting this VM

Last night I was working on resolving a Veeam backup failure and needed to know which Hyper-V server was hosting the VM I was logged onto. As I do with every server-related alert, I thought to myself: hmm, how can I retrieve this with PowerShell? The answer is simple; you just need to know where to look, and in this case that was the registry. I guarantee that running the code below in an administrative PowerShell prompt is much more efficient than opening regedit and navigating to "HKLM:\SOFTWARE\Microsoft\Virtual Machine\Guest\Parameters". But to each their own; that is how some prefer to gather their information...

Oh, and one more thing: if the host name returned is a mix of letters and numbers about 10 characters long, it's hosted in Azure.

# Retrieves $env:COMPUTERNAME's Hyper-V host server name.
# The registry key below is written inside the guest by the Hyper-V integration services.

Function Get-VMHostname
{
     (Get-Item "HKLM:\SOFTWARE\Microsoft\Virtual Machine\Guest\Parameters").GetValue("HostName")
}

Get-VMHostname
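Get-Item throws an error when the Parameters key doesn't exist, for example on a physical machine. A sketch of a hypothetical Get-VMHostInfo function that returns $null in that case and also reads a couple of neighboring values. It assumes the integration services populate PhysicalHostNameFullyQualified and VirtualMachineName alongside HostName, so check those value names on your own VMs before relying on them.

```powershell
# Hypothetical helper: reads the Hyper-V guest parameters, or returns $null
# when the key doesn't exist (physical machine, non-Hyper-V VM, or the
# integration services are not running).
function Get-VMHostInfo {
    $key = 'HKLM:\SOFTWARE\Microsoft\Virtual Machine\Guest\Parameters'

    if (-not (Test-Path $key)) { return $null }

    $p = Get-ItemProperty -Path $key
    # Value names other than HostName are assumptions; verify them in regedit
    [pscustomobject]@{
        HostName                       = $p.HostName
        PhysicalHostNameFullyQualified = $p.PhysicalHostNameFullyQualified
        VirtualMachineName             = $p.VirtualMachineName
    }
}

Get-VMHostInfo
```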