Converting Word document format with PowerShell

Do you have a file server full of documents in the old Word document format? This post explains how to use PowerShell to bulk convert files from .DOC to .DOCX. The script can be run against a folder full of documents, automatically creating a new version of each in .DOCX format. The same script can easily be modified to convert Word documents of any format to PDF.

The PC running the script must have PowerShell and Microsoft Word installed.

The example script processes all .DOC files in the C:\Olddocuments folder:


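# Folder containing the .doc files to convert
$path = "c:\olddocuments\"
$word_app = New-Object -ComObject Word.Application

# Save format for .docx (wdFormatXMLDocument, numeric value 12)
$Format = [Microsoft.Office.Interop.Word.WdSaveFormat]::wdFormatXMLDocument

# In Windows PowerShell, -Filter *.doc can also match .docx files, so check the extension explicitly
Get-ChildItem -Path $path -Filter *.doc | Where-Object { $_.Extension -eq '.doc' } | ForEach-Object {
    $document = $word_app.Documents.Open($_.FullName)
    $docx_filename = "$($_.DirectoryName)\$($_.BaseName).docx"
    $document.SaveAs([ref] $docx_filename, [ref]$Format)
    $document.Close()
}
$word_app.Quit()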
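
If a document fails to open or save part-way through a run, the script above can leave a hidden Word process behind. The following is a rough sketch of a more defensive variant, not a definitive implementation. It assumes Word is installed, uses the numeric value 12 in place of the wdFormatXMLDocument enum, and wraps the work in try/finally so Word is always closed. The function name Convert-DocToDocx is made up for this example.

```powershell
# Sketch only: Convert-DocToDocx is a hypothetical helper name.
function Convert-DocToDocx {
    param([Parameter(Mandatory)][string] $Path)

    $format = 12   # numeric value of wdFormatXMLDocument (.docx)
    $word = New-Object -ComObject Word.Application
    try {
        # Collect the files first so newly created .docx files are never picked up
        $docs = Get-ChildItem -Path $Path -Filter *.doc |
            Where-Object { $_.Extension -eq '.doc' }

        foreach ($file in $docs) {
            $target = Join-Path $file.DirectoryName ($file.BaseName + '.docx')
            if (Test-Path -LiteralPath $target) { continue }   # already converted

            $doc = $word.Documents.Open($file.FullName)
            try {
                $doc.SaveAs([ref] $target, [ref] $format)
            }
            finally {
                $doc.Close()
            }
        }
    }
    finally {
        # Always shut Word down, even if a conversion threw an error
        $word.Quit()
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($word)
    }
}
```

It would be called as Convert-DocToDocx -Path 'C:\olddocuments'. Files that already have a .docx next to them are skipped, so the function can be re-run safely.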

If you need to convert the documents to PDF, change the file extension and the “SaveAs” line in the script as follows. The value 17 corresponds to the PDF file format (wdFormatPDF) when doing a Save As in Microsoft Word.


$pdf_filename = "$($_.DirectoryName)\$($_.BaseName).pdf"
$document.SaveAs([ref] $pdf_filename, [ref]17)

One of the big benefits of converting files is the reduction in size. In a test across several thousand documents I saw a 40% disk space saving. In addition, DOCX files are less likely to get corrupted, and they support newer Microsoft Word features.
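
To check the saving on your own files, something like the sketch below could be used. It only counts .doc files that have a matching .docx in the same folder, and reports the percentage saved. The function name Get-ConversionSaving and its output property names are invented for this example.

```powershell
# Sketch only: Get-ConversionSaving is a hypothetical helper name.
function Get-ConversionSaving {
    param([Parameter(Mandatory)][string] $Path)

    [long] $oldBytes = 0
    [long] $newBytes = 0

    $docs = Get-ChildItem -Path $Path -File |
        Where-Object { $_.Extension -eq '.doc' }

    foreach ($file in $docs) {
        $converted = Join-Path $file.DirectoryName ($file.BaseName + '.docx')
        # Only compare pairs where the converted file actually exists
        if (Test-Path -LiteralPath $converted) {
            $oldBytes += $file.Length
            $newBytes += (Get-Item -LiteralPath $converted).Length
        }
    }

    if ($oldBytes -eq 0) { return $null }   # nothing to compare

    [pscustomobject]@{
        OldBytes      = $oldBytes
        NewBytes      = $newBytes
        SavingPercent = [math]::Round([double](100 * ($oldBytes - $newBytes) / $oldBytes), 1)
    }
}
```

For example, Get-ConversionSaving -Path 'C:\olddocuments' returns one object with the byte totals and the percentage saved, or nothing if no converted pairs were found.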

6 thoughts on “Converting Word document format with PowerShell”

  1. Hey! Just what I was looking for. Is there a way to get it to move the converted files to a subfolder (named something obvious like “old files – why are you still using this format!?”)? And can I get this to do subfolders recursively?

    1. Hi Mark,

      You can specify a subfolder for the new files. Just modify this line and replace the xxxxxxxx with your subfolder name. The subfolder must already exist, because Word will not create it for you.

      $docx_filename = "$($_.DirectoryName)\xxxxxxxx\$($_.BaseName).docx"

      It is possible to iterate through subfolders. First you need to get the subfolders. Here’s the basic code structure:

      $FolderPath = Get-ChildItem -Directory -Path "C:\temp" -Recurse -Force
      ForEach ($Folder in $FolderPath)
      {
          # …convert file code goes here…
      }
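
An alternative to looping over folders is to let Get-ChildItem recurse over the files directly. The sketch below takes that approach and also creates the output subfolder when it is missing. It assumes Word is installed. The function name Convert-DocTree and the default subfolder name "converted" are made up for this example.

```powershell
# Sketch only: Convert-DocTree and the 'converted' subfolder name are hypothetical.
function Convert-DocTree {
    param(
        [Parameter(Mandatory)][string] $Root,
        [string] $SubfolderName = 'converted'
    )

    $format = 12   # numeric value of wdFormatXMLDocument (.docx)
    $word = New-Object -ComObject Word.Application
    try {
        # -Recurse walks $Root and every folder below it
        $docs = Get-ChildItem -Path $Root -Filter *.doc -Recurse -File |
            Where-Object { $_.Extension -eq '.doc' }

        foreach ($file in $docs) {
            # Put each converted file in a subfolder next to its source
            $targetDir = Join-Path $file.DirectoryName $SubfolderName
            if (-not (Test-Path -LiteralPath $targetDir)) {
                New-Item -ItemType Directory -Path $targetDir | Out-Null
            }
            $target = Join-Path $targetDir ($file.BaseName + '.docx')

            $doc = $word.Documents.Open($file.FullName)
            try {
                $doc.SaveAs([ref] $target, [ref] $format)
            }
            finally {
                $doc.Close()
            }
        }
    }
    finally {
        $word.Quit()
    }
}
```

It would be called as Convert-DocTree -Root 'C:\temp'. Unlike a loop over Get-ChildItem -Directory, this also covers files in the root folder itself.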

  2. This article is great. I just converted a huge number of files using this. Quick question, how would it look if I wanted to convert XLS to XLSX? I have tried the below but am prompted every time that a file already exists and asked if I want to overwrite it.

    $path = "C:\Users\TEST"
    $excel = New-Object -ComObject excel.Application
    $Format = [Microsoft.Office.Interop.Excel.XlFileFormat]::xlWorkbookDefault
    Get-ChildItem -Path $path -Filter *.xls | ForEach-Object {
        $workbook = $excel.workbooks.Open($_.FullName)
        $xlsx_filename = "$($_.DirectoryName)\$($_.BaseName).xlsx"
        $workbook.SaveAs([ref] $xlsx_filename, [ref]$Format)
        $workbook.Close()
    }
    $excel.Quit()

    1. Hi Alex, glad this helped you. Try changing this line to save the converted file into a subfolder.

      $xlsx_filename = "$($_.DirectoryName)\subfolder\$($_.BaseName).xlsx"
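
One possible cause of the overwrite prompt is that, in Windows PowerShell, -Filter *.xls can also match .xlsx files that already exist in the folder. The sketch below filters on the exact extension and turns off Excel's alerts, so an existing target is overwritten without a prompt. It assumes Excel is installed and uses the numeric value 51 (xlOpenXMLWorkbook) for the .xlsx format. The function name Convert-XlsToXlsx is made up for this example.

```powershell
# Sketch only: Convert-XlsToXlsx is a hypothetical helper name.
function Convert-XlsToXlsx {
    param([Parameter(Mandatory)][string] $Path)

    $format = 51   # numeric value of xlOpenXMLWorkbook (.xlsx)
    $excel = New-Object -ComObject Excel.Application
    $excel.DisplayAlerts = $false   # suppress "file already exists" prompts
    try {
        # Match .xls only, not .xlsx
        $books = Get-ChildItem -Path $Path -Filter *.xls -File |
            Where-Object { $_.Extension -eq '.xls' }

        foreach ($file in $books) {
            $target = Join-Path $file.DirectoryName ($file.BaseName + '.xlsx')
            $workbook = $excel.Workbooks.Open($file.FullName)
            try {
                $workbook.SaveAs($target, $format)
            }
            finally {
                $workbook.Close($false)   # close without saving further changes
            }
        }
    }
    finally {
        $excel.Quit()
    }
}
```

It would be called as Convert-XlsToXlsx -Path 'C:\Users\TEST'. Because alerts are disabled, any existing .xlsx with the same name is replaced silently, so test it on a copy of the folder first.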

    1. This could be done with SharePoint. The file conversion itself just needs to know where the source documents are. A simple solution would be to add a OneDrive shortcut to the Library and then run the PowerShell script in that shortcut (e.g. via Windows Explorer).
