Most websites don't offer the functionality to save their data onto your computer. Typically the only option is Right Click > Save As, which can become a very tedious task very quickly. Being able to scrape a site of its content most certainly has its uses: perhaps you want to download Wikipedia (which I heard is only 14 GB with no pictures), or maybe you're really into something like PowerShell and want to search Google for all images with "powershell" in the name and then download them to your computer. [Next Upcoming Post]
In the function below I scrape my website's homepage for all of its images, meaning my computer will search my homepage [http://www.matthewkerfoot.com] for all image files and then download them to my local machine.
$Url = "http://www.TheOvernightAdmin.com"
$iwr = Invoke-WebRequest -Uri $Url
$images = ($iwr).Images | select src
$images
Output:
PS C:\> $images

src
---
http://img1.blogblog.com/img/icon18_wrench_allbkg.png
http://img1.blogblog.com/img/icon18_wrench_allbkg.png
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2-MeZf9LB70psOUNXQR5bZdkDZCCgfohOI25DusGLDSQqVDFubnV52a-mdwqgCSa9FkvHOUh9J_UYk52AUGP9G7ciSrV-4YxSH8DVeZ2RRSn5bu8CRljguqkfBXmrG18c6bLcJKlgFJI/s1600/3spaces.JPG
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2-MeZf9LB70psOUNXQR5bZdkDZCCgfohOI25DusGLDSQqVDFubnV52a-mdwqgCSa9FkvHOUh9J_UYk52AUGP9G7ciSrV-4YxSH8DVeZ2RRSn5bu8CRljguqkfBXmrG18c6bLcJKlgFJI/s1600/3spaces.JPG
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEih5-DOrMYIyvCdHe5St3MIAPgjoguh4YYoPK8VqFiL3N1wRAH7VBv_KzS2suq7HV8sAtnLnvEsBUeqNVEKAMKRCUzYVvRj9SPjyiAuhRVQ_bUNweRhyphenhyphenh5DuIjoA35A2_zKLfuMkvpy8BQ/s1600/computerlists.JPG
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgAEjZ1mUkKD2oFHcE1zOlFIwvbx4JQn8FZYJqgemAKJeB1oU09vFRle1VrMg19wCLdVeWgx_nUxMrnWkEc8Ued4EdhxXh9erASTtXzgmJhzp-mMMLDW8QWBCWZHnmGzA9fC30jtWGXwQg/s1600/finalproduct.JPG
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhsjHq0MYuYo_iR-ztwv9_k9Jrz7uDj9TJ-SOmOkKEsynQvBYoGNJpC_BZtVTdorBG2SJdi3SfnB5Z-1lVVxtcS6GFTBXsKNMr3uUzdePb1OENcUkpngnx2f1KP2PuhWeeE7Pt4mA3rxmc/s1600/sysadminbeer.png
Continued...
PS C:\>
The Invoke-WebRequest cmdlet used above returns every image element on the page, and select src narrows that down to just the source paths, which gives us the list of image URLs we will download later.
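If you only want certain file types, that same pipeline can be narrowed before anything is downloaded. Here is a minimal sketch; the -ExpandProperty switch and the extension filter are my own additions, not part of the original function:

$Url = "http://www.TheOvernightAdmin.com"
# Expand src to plain strings and keep only .png / .jpg / .jpeg sources
(Invoke-WebRequest -Uri $Url).Images |
    Select-Object -ExpandProperty src |
    Where-Object { $_ -match '\.(png|jpe?g)$' }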
The only difference between web scraping and web browsing is that scraping is usually automated and you are saving the data, not just viewing it.
Here is the full function:
function Get-WebPageImages
{
    <#
    .CREATED BY: Matt Kerfoot
    .CREATED ON: 08/11/2014
    .Synopsis
       Downloads all available images from the specified $Url (defaults to http://www.TheOvernightAdmin.com)
    .DESCRIPTION
       This function will download all images from a specific web page and save them to your desktop by default. Requires PSv3+
    .EXAMPLE
       PS C:\> Get-WebPageImages -Url http://www.matthewkerfoot.com -OutputPath C:\
    #>
    [CmdletBinding()]
    Param
    (
        [Parameter(Mandatory=$false,
                   ValueFromPipelineByPropertyName=$true,
                   Position=0)]
        $Url = "http://www.TheOvernightAdmin.com",

        $OutputPath = "$env:USERPROFILE\Desktop\"
    )

    Begin
    {
        # Grab the page and keep only the src property of each image element
        $iwr = Invoke-WebRequest -Uri $Url
        $images = ($iwr).Images | select src
    }
    Process
    {
        # Download each image to $OutputPath, keeping its original file name
        $wc = New-Object System.Net.WebClient
        $images | foreach { $wc.DownloadFile( $_.src, ("$OutputPath\" + [io.path]::GetFileName($_.src)) ) }
    }
    End
    {
        Write-Host "Downloading all images from $Url to $OutputPath"
    }
}

Get-WebPageImages