Converting Word document format with PowerShell

Do you have a file server full of documents in the old Word document format? This post explains how to use PowerShell to bulk convert files from .DOC to .DOCX. The script can be run against a folder full of documents, automatically creating a new version of each in .DOCX format. The same script can easily be modified to convert Word documents of any format to PDF.

The PC running the script must have PowerShell and Microsoft Word installed.

The example script processes all .DOC files in the C:\Olddocuments folder:


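# Folder containing the old .DOC files
$path = "c:\olddocuments\"

# Start Word through COM automation
$word_app = New-Object -ComObject Word.Application

# Target format: Word XML document (.docx)
$Format = [Microsoft.Office.Interop.Word.WdSaveFormat]::wdFormatXMLDocument

# -Filter *.doc can also match .docx files on some PowerShell versions,
# so check the extension explicitly
Get-ChildItem -Path $path -Filter *.doc | Where-Object { $_.Extension -eq '.doc' } | ForEach-Object {
    $document = $word_app.Documents.Open($_.FullName)
    $docx_filename = "$($_.DirectoryName)\$($_.BaseName).docx"
    $document.SaveAs([ref] $docx_filename, [ref]$Format)
    $document.Close()
}

# Close Word so that no WINWORD.EXE process is left running
$word_app.Quit()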
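
Before running a bulk conversion against a file server, it can help to preview which files the script would touch. The following is a minimal sketch of such a dry run, not part of the original script: Get-DocConversionPlan is a hypothetical helper name, and it only lists files without starting Word or writing anything.

```powershell
# Hypothetical helper (an assumption, not from the original script):
# lists the planned source -> target pairs without opening Word.
function Get-DocConversionPlan {
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )

    # -Filter *.doc can also return .docx files on some PowerShell versions,
    # so the extension is compared explicitly afterwards.
    Get-ChildItem -Path $Path -Filter *.doc -File |
        Where-Object { $_.Extension -eq '.doc' } |
        ForEach-Object {
            $target = Join-Path $_.DirectoryName ($_.BaseName + '.docx')
            [pscustomobject]@{
                Source       = $_.FullName
                Target       = $target
                # True when a .docx with the same base name already exists
                TargetExists = Test-Path -LiteralPath $target
            }
        }
}
```

Calling Get-DocConversionPlan -Path 'C:\olddocuments' and piping the result to Format-Table shows each planned conversion and whether the target file is already there.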

If you need to convert the documents to PDF, change the file extension and the “SaveAs” line in the script as shown here. 17 corresponds to the PDF file format (wdFormatPDF) when doing a Save As in Microsoft Word.


$pdf_filename = "$($_.DirectoryName)\$($_.BaseName).pdf"
$document.SaveAs([ref] $pdf_filename, [ref]17)
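
The file extension and the format code have to change together when the target format changes. One way to keep them paired is a small lookup, sketched here under assumptions: $SaveFormats and Get-SaveTarget are hypothetical names, and only the two WdSaveFormat values used in this article (12 for wdFormatXMLDocument, 17 for wdFormatPDF) are included.

```powershell
# WdSaveFormat values used in this article:
# 12 = wdFormatXMLDocument (.docx), 17 = wdFormatPDF (.pdf)
$SaveFormats = @{
    docx = 12
    pdf  = 17
}

# Hypothetical helper: returns the output path and the matching
# format code for a given source file and target type.
function Get-SaveTarget {
    param(
        [Parameter(Mandatory)] [string]$SourcePath,
        [Parameter(Mandatory)] [ValidateSet('docx', 'pdf')] [string]$Type
    )

    [pscustomobject]@{
        # Swap the extension of the source path for the target type
        Path   = [System.IO.Path]::ChangeExtension($SourcePath, $Type)
        Format = $SaveFormats[$Type]
    }
}
```

Inside the conversion loop, the result of Get-SaveTarget -SourcePath $_.FullName -Type pdf would supply both values passed to SaveAs, so the extension and the format cannot drift apart.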

One of the big benefits of converting files is the reduction in size. In a test across several thousand documents I noticed a 40% disk space saving. In addition, DOCX files are less likely to get corrupted, and they support newer Microsoft Word features.
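
To check the space saving on your own files, the total size per extension can be measured before and after the conversion. This is a rough sketch only; Get-ExtensionSize is a hypothetical helper name and it simply sums file lengths in bytes.

```powershell
# Hypothetical helper: total size in bytes of all files with the given
# extension (for example '.doc') under a folder, including subfolders.
function Get-ExtensionSize {
    param(
        [Parameter(Mandatory)] [string]$Path,
        [Parameter(Mandatory)] [string]$Extension
    )

    $files = @(Get-ChildItem -Path $Path -File -Recurse |
        Where-Object { $_.Extension -eq $Extension })

    # Return 0 when nothing matches instead of an empty result
    if ($files.Count -eq 0) { return 0 }

    ($files | Measure-Object -Property Length -Sum).Sum
}
```

Calling it with '.doc' before the conversion and with '.docx' afterwards gives the two totals to compare.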

24 comments

  1. Hey! Just what I was looking for. Is there a way to get it to move the converted files to a subfolder (named something obvious like “old files – why are you still using this format!?”)? And can I get this to do subfolders recursively?

    • Hi Mark,

      You can specify a subfolder for the new files. Just modify this line and replace the xxxxxxxx with your subfolder name.

      $docx_filename = "$($_.DirectoryName)\xxxxxxxx\$($_.BaseName).docx"

      It is possible to iterate through subfolders. First you need to get the subfolders. Here’s the basic code structure:

      $FolderPath = Get-ChildItem -Directory -Path "C:\temp" -Recurse -Force
      ForEach ($Folder in $FolderPath)
      {
          # …convert file code goes here…
      }

  2. This article is great. I just converted a huge number of files using this. Quick question: how would it look if I wanted to convert XLS to XLSX? I have tried the code below, but I am prompted every time that a file already exists and asked whether I want to overwrite it.

    $path = "C:\Users\TEST"
    $excel = New-Object -ComObject excel.Application
    $Format = [Microsoft.Office.Interop.Excel.XlFileFormat]::xlWorkbookDefault
    Get-ChildItem -Path $path -Filter *.xls | ForEach-Object {
        $workbook = $excel.workbooks.Open($_.FullName)
        $xlsx_filename = "$($_.DirectoryName)\$($_.BaseName).xlsx"
        $workbook.SaveAs([ref] $xlsx_filename, [ref]$Format)
        $workbook.Close()
    }
    $excel.Quit()

    • Hi Alex, glad this helped you. Try changing this line to save the converted file into a subfolder.

      $xlsx_filename = "$($_.DirectoryName)\subfolder\$($_.BaseName).xlsx"

    • This could be done with SharePoint; the file conversion itself just needs to know where the source documents are. A simple solution would be to add a OneDrive shortcut to the Library and then run the PowerShell script in that shortcut (e.g. via Windows Explorer).

  3. I appreciate the post. I ran the PowerShell script, and when I opened the new .docx file it opened in compatibility mode. When I went to save the file, it asked if I wanted to convert the file to a new format. It doesn’t look like it converted the files.

  4. Based on this article plus the comments, I’ve built an all-in-one script for Word/Excel/PowerPoint. Note that, as a precaution, I move the original files to an “old” folder, created where the new files are. The only problem is the PowerPoint part, where the program has to open to convert. I can’t find a silent way.

    $source = "C:\ConvertTest"
    $appX = New-Object -ComObject Excel.Application
    $appW = New-Object -ComObject Word.Application
    $appP = New-Object -ComObject PowerPoint.Application
    $FormatX = [Microsoft.Office.Interop.Excel.XlFileFormat]::xlWorkbookDefault
    $FormatW = [Microsoft.Office.Interop.Word.WdSaveFormat]::wdFormatXMLDocument
    $FormatP = [Microsoft.Office.Interop.PowerPoint.PpSaveAsFileType]::ppSaveAsOpenXMLPresentation
    $searchX = Get-ChildItem -Path $source -Recurse -Include *.xls -Exclude *.xlsx
    $searchW = Get-ChildItem -Path $source -Recurse -Include *.doc -Exclude *.docx
    $searchP = Get-ChildItem -Path $source -Recurse -Include *.ppt -Exclude *.pptx

    $searchX | ForEach-Object {
        $document = $appX.Workbooks.Open($_.FullName)
        $filename = "$($_.DirectoryName)\$($_.BaseName).xlsx"
        $document.SaveAs([ref] $filename, [ref]$FormatX)
        $document.Close()
        $path = "$($_.DirectoryName)\$($_.Name)"
        Mkdir -Force "$($_.DirectoryName)\old"
        $destination = "$($_.DirectoryName)\old\$($_.Name)"
        Move-Item -Path $path -Destination $destination -Force
    }
    $appX.Quit()

    $searchW | ForEach-Object {
        $document = $appW.Documents.Open($_.FullName)
        $filename = "$($_.DirectoryName)\$($_.BaseName).docx"
        $document.SaveAs([ref] $filename, [ref]$FormatW)
        $document.Convert()
        $document.Close()
        $path = "$($_.DirectoryName)\$($_.Name)"
        Mkdir -Force "$($_.DirectoryName)\old"
        $destination = "$($_.DirectoryName)\old\$($_.Name)"
        Move-Item -Path $path -Destination $destination -Force
    }
    $appW.Quit()

    $searchP | ForEach-Object {
        $document = $appP.Presentations.Open($_.FullName)
        $filename = "$($_.DirectoryName)\$($_.BaseName).pptx"
        $document.SaveAs([ref] $filename, [ref]$FormatP)
        $document.Close()
        $path = "$($_.DirectoryName)\$($_.Name)"
        Mkdir -Force "$($_.DirectoryName)\old"
        $destination = "$($_.DirectoryName)\old\$($_.Name)"
        Move-Item -Path $path -Destination $destination -Force
    }
    $appP.Quit()

  5. Documents converted to DOCX using this script will likely still be in “Compatibility Mode”, with some newer features of Word unavailable to them. To upgrade them to the latest DOCX version supported by your version of Word, you will need to call the Convert() method on the open document,
    $document.Convert(), prior to calling the SaveAs method.

    Note that tables have a tendency to have their layout changed when a document is upgraded to the latest DOCX version, so I also included a check to only upgrade documents where $document.Tables.Count -eq 0.

  6. I had the issue with duplicate files and subfolders.
    The following script also checks whether a file with the same name and the extension .docx already exists.
    You don’t need the line $document.Convert(). I added the convert because I had some compatibility issues.

    Have fun:)

    # Specify the root path to search for DOC files
    $path = "C:\temp\TEST"

    # Create a new Word Application object
    $word_app = New-Object -ComObject Word.Application

    # Set the save format to XML Document
    $Format = [Microsoft.Office.Interop.Word.WdSaveFormat]::wdFormatXMLDocument

    # Retrieve the list of DOC files in the specified path and its subfolders
    $files = Get-ChildItem -Path $path -Filter *.doc -File -Recurse

    # Process each DOC file
    foreach ($file in $files) {
        # Open the document in Word
        $document = $word_app.Documents.Open($file.FullName)

        # Check if the file is a DOC
        if ($file.Extension -eq ".doc") {
            $docx_filename = "$($file.DirectoryName)\$($file.BaseName).docx"
            $docx_duplicate_filename = "$($file.DirectoryName)\$($file.BaseName)_duplicate.docx"

            # Check if DOCX file already exists
            if (-not (Test-Path $docx_filename)) {
                $document.SaveAs([ref]$docx_filename, [ref]$Format)
                $document.Convert()
                $document.Close()
                Remove-Item -Path $file.FullName
            } else {
                $duplicateCounter = 1
                $duplicateFilename = $docx_duplicate_filename

                # Find a unique filename for the duplicate DOCX file
                while (Test-Path $duplicateFilename) {
                    $duplicateCounter++
                    $duplicateFilename = "$($file.DirectoryName)\$($file.BaseName)_duplicate_$duplicateCounter.docx"
                }

                $document.SaveAs([ref]$duplicateFilename, [ref]$Format)
                $document.Convert()
                $document.Close()
                Remove-Item -Path $file.FullName
            }
        } else {
            $document.Close()
            Write-Host "Skipping conversion. File is already a DOCX: $($file.Name)"
        }
    }

    # Close the Word application
    $word_app.Quit()

  8. Are you still available to answer this? The part I’m missing is the Format. I have

    $FormatV = [Microsoft.Office.Interop.Visio.

    But I don’t know the rest of the formatting. Any help would be greatly appreciated.

  9. Is there a way to exclude the created old directories when running the script on a folder where it has already been run and was interrupted?
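
One possible way to handle re-runs, sketched here under assumptions rather than taken from any of the scripts above, is to filter out files that sit directly inside a backup folder before converting. Get-PendingDocFiles is a hypothetical helper name, and the default folder name 'old' matches the backup folder used in comment 4.

```powershell
# Hypothetical helper: returns .doc files under $Path, skipping any file
# whose immediate parent folder has the backup folder name.
# Limitation: files in subfolders below a backup folder are not skipped.
function Get-PendingDocFiles {
    param(
        [Parameter(Mandatory)] [string]$Path,
        [string]$BackupFolderName = 'old'
    )

    Get-ChildItem -Path $Path -Recurse -File |
        Where-Object {
            $_.Extension -eq '.doc' -and
            $_.Directory.Name -ne $BackupFolderName
        }
}
```

The result can replace the Get-ChildItem call that feeds the conversion loop, so originals that were already moved into an 'old' folder are left alone on the next run.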
