Do you have a file server full of documents in the old Word document format? This post explains how to use PowerShell to bulk convert files from .DOC to .DOCX. The script can be run against a folder full of documents, automatically creating a new version of each in .DOCX format. The same script can easily be modified to convert Word documents of any format to PDF.
The PC running the script must have PowerShell and Microsoft Word installed.
The example script processes all .DOC files in the C:\Olddocuments folder:
$path = "c:\olddocuments\"
$word_app = New-Object -ComObject Word.Application
$Format = [Microsoft.Office.Interop.Word.WdSaveFormat]::wdFormatXMLDocument
Get-ChildItem -Path $path -Filter *.doc | ForEach-Object {
    $document = $word_app.Documents.Open($_.FullName)
    $docx_filename = "$($_.DirectoryName)\$($_.BaseName).docx"
    $document.SaveAs([ref] $docx_filename, [ref]$Format)
    $document.Close()
}
$word_app.Quit()
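One caveat with the script above: on Windows, -Filter *.doc can also match files whose extension merely starts with .doc, such as .docx, so a second run may pick up the files that were already converted. Here is a minimal sketch of a stricter selection that compares the extension exactly. The function name Get-LegacyFile is hypothetical and not part of the original script.

```powershell
# Hypothetical helper (not part of the original script): return only files
# whose extension is exactly the one requested, e.g. ".doc" but not ".docx".
# PowerShell's -eq is case-insensitive for strings, so ".DOC" also matches.
function Get-LegacyFile {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Extension   # include the dot, e.g. ".doc"
    )
    Get-ChildItem -Path $Path -File |
        Where-Object { $_.Extension -eq $Extension }
}
```

It could replace the Get-ChildItem call in the loop, for example: Get-LegacyFile -Path $path -Extension ".doc" | ForEach-Object { ... }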
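If you need to convert the documents to PDF, make the following change to the “SaveAs” line in the script. The value 17 corresponds to the PDF file format (wdFormatPDF) when doing a Save As in Microsoft Word. The file name line should also be changed to end in .pdf, otherwise the PDF output is saved with a .docx extension.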
$document.SaveAs([ref] $docx_filename, [ref]17)
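Put together, the PDF variant might look like the sketch below. It assumes the same folder layout as the main script. The function names Get-TargetPath and Convert-DocFolderToPdf are hypothetical, and the try/finally block is an addition so that Word is closed even if a file fails to open.

```powershell
# Hypothetical helper: swap a file's extension to build the output path.
function Get-TargetPath {
    param(
        [Parameter(Mandatory)] [string] $SourcePath,
        [Parameter(Mandatory)] [string] $NewExtension   # e.g. ".pdf"
    )
    [System.IO.Path]::ChangeExtension($SourcePath, $NewExtension)
}

# Sketch of the PDF variant of the loop; requires Microsoft Word to be installed.
function Convert-DocFolderToPdf {
    param([Parameter(Mandatory)] [string] $Path)

    $wdFormatPDF = 17   # value given in the article for the PDF format
    $word_app = New-Object -ComObject Word.Application
    try {
        Get-ChildItem -Path $Path -Filter *.doc | ForEach-Object {
            $document = $word_app.Documents.Open($_.FullName)
            $pdf_filename = Get-TargetPath -SourcePath $_.FullName -NewExtension ".pdf"
            $document.SaveAs([ref] $pdf_filename, [ref] $wdFormatPDF)
            $document.Close()
        }
    }
    finally {
        # Always shut Word down, even if a document could not be processed.
        $word_app.Quit()
    }
}
```

Nothing runs until the function is called, for example: Convert-DocFolderToPdf -Path "c:\olddocuments\"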
One of the big benefits of converting files is the reduction in size. In a test across several thousand documents, I saw a 40% saving in disk space. In addition, DOCX files are less likely to get corrupted and support newer Microsoft Word features.
Hey! Just what I was looking for. Is there a way to get it to move the converted files to a subfolder (named something obvious like “old files – why are you still using this format!?”)? And can I get this to do subfolders recursively?
Hi Mark,
You can specify a subfolder for the new files. Just modify this line and replace the xxxxxxxx with your subfolder name.
$docx_filename = "$($_.DirectoryName)\xxxxxxxx\$($_.BaseName).docx"
It is possible to iterate through subfolders. First you need to get the subfolders. Here’s the basic code structure:
$FolderPath = Get-ChildItem -Directory -Path "C:\temp" -Recurse -Force
ForEach ($Folder in $FolderPath)
{
    # …convert file code goes here…
}
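As a rough sketch of how the two answers could be combined, the helper below walks a folder tree and pairs each source file with an output path inside a subfolder, creating that subfolder first because SaveAs fails if the target folder does not exist. The function name Get-ConversionPlan, its parameters, and the default subfolder name "converted" are all hypothetical.

```powershell
# Hypothetical helper: list every file with the given extension under $Root
# (recursively) and pair it with an output path inside a subfolder.
function Get-ConversionPlan {
    param(
        [Parameter(Mandatory)] [string] $Root,
        [string] $SourceExtension = ".doc",
        [string] $TargetExtension = ".docx",
        [string] $Subfolder = "converted"   # placeholder name, like xxxxxxxx in the reply
    )
    Get-ChildItem -Path $Root -File -Recurse |
        Where-Object { $_.Extension -eq $SourceExtension } |
        ForEach-Object {
            $targetDir = Join-Path $_.DirectoryName $Subfolder
            # The output folder must exist before saving; -Force makes this a no-op if it already does.
            New-Item -ItemType Directory -Path $targetDir -Force | Out-Null
            [pscustomobject]@{
                Source = $_.FullName
                Target = Join-Path $targetDir ($_.BaseName + $TargetExtension)
            }
        }
}
```

Each returned object has a Source and a Target property, so the conversion loop can open $_.Source and pass $_.Target to SaveAs.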
This article is great. I just converted a huge number of files using this. Quick question: how would it look if I wanted to convert XLS to XLSX? I have tried the below, but I am prompted every time that a file already exists and asked if I want to overwrite it.
$path = "C:\Users\TEST"
$excel = New-Object -ComObject excel.Application
$Format = [Microsoft.Office.Interop.Excel.XlFileFormat]::xlWorkbookDefault
Get-ChildItem -Path $path -Filter *.xls | ForEach-Object {
    $workbook = $excel.workbooks.Open($_.FullName)
    $xlsx_filename = "$($_.DirectoryName)\$($_.BaseName).xlsx"
    $workbook.SaveAs([ref] $xlsx_filename, [ref]$Format)
    $workbook.Close()
}
$excel.Quit()
Hi Alex, glad this helped you. Try changing this line to save the converted file into a subfolder.
$xlsx_filename = "$($_.DirectoryName)\subfolder\$($_.BaseName).xlsx"
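Another option for the overwrite prompt, offered here as a sketch rather than a tested fix, is to turn off Excel's alerts and to select only files whose extension is exactly .xls. The likely cause is an assumption on my part: either the .xlsx files already exist from an earlier run, or -Filter *.xls is also matching them. The function name Convert-XlsFolderToXlsx is hypothetical.

```powershell
# Sketch of an XLS to XLSX loop; requires Microsoft Excel to be installed.
function Convert-XlsFolderToXlsx {
    param([Parameter(Mandatory)] [string] $Path)

    $xlOpenXMLWorkbook = 51   # XlFileFormat value for .xlsx
    $excel = New-Object -ComObject Excel.Application
    # Suppress prompts such as "file already exists"; existing .xlsx files get overwritten.
    $excel.DisplayAlerts = $false
    try {
        Get-ChildItem -Path $Path -File |
            Where-Object { $_.Extension -eq ".xls" } |   # exact match, skips .xlsx
            ForEach-Object {
                $workbook = $excel.Workbooks.Open($_.FullName)
                $xlsx_filename = Join-Path $_.DirectoryName ($_.BaseName + ".xlsx")
                $workbook.SaveAs($xlsx_filename, $xlOpenXMLWorkbook)
                $workbook.Close($false)   # already saved, so close without saving again
            }
    }
    finally {
        # Always shut Excel down, even if a workbook could not be processed.
        $excel.Quit()
    }
}
```

Because alerts are off, any existing .xlsx with the same name is replaced silently, so it is worth trying this on a copy of the folder first.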
Hi Steve
Looks great. I have lots of DOC files in SharePoint that I would like to convert. Do you think we could use something similar on SharePoint?
This could be done with SharePoint; the file conversion itself just needs to know where the source documents are. A simple solution would be to add a OneDrive shortcut to the library and then run the PowerShell script in that shortcut (e.g. via Windows Explorer).
I appreciate the post. I ran the PowerShell script, and when I opened the new .docx file it opened in compatibility mode. When I went to save the file, it asked if I wanted to convert the file to a new format. It doesn’t look like it converted the files.
I had to add the line $document.Convert() and I no longer get the compatibility mode warning.
Based on this article plus the comments, I’ve built an all-in-one script for Word/Excel/PowerPoint. Note that, as a precaution, I move the original files to an “old” folder, created where the new files are. The only problem is the PowerPoint part, where the program has to open to convert. I can’t find a silent way.
$source = "C:\ConvertTest"
$appX = New-Object -ComObject Excel.Application
$appW = New-Object -ComObject Word.Application
$appP = New-Object -ComObject PowerPoint.Application
$FormatX = [Microsoft.Office.Interop.Excel.XlFileFormat]::xlWorkbookDefault
$FormatW = [Microsoft.Office.Interop.Word.WdSaveFormat]::wdFormatXMLDocument
$FormatP = [Microsoft.Office.Interop.PowerPoint.PpSaveAsFileType]::ppSaveAsOpenXMLPresentation
$searchX = Get-ChildItem -Path $source -Recurse -Include *.xls -Exclude *.xlsx
$searchW = Get-ChildItem -Path $source -Recurse -Include *.doc -Exclude *.docx
$searchP = Get-ChildItem -Path $source -Recurse -Include *.ppt -Exclude *.pptx
$searchX | ForEach-Object {
    $document = $appX.Workbooks.Open($_.FullName)
    $filename = "$($_.DirectoryName)\$($_.BaseName).xlsx"
    $document.SaveAs([ref] $filename, [ref]$FormatX)
    $document.Close()
    $path = "$($_.DirectoryName)\$($_.Name)"
    Mkdir -Force "$($_.DirectoryName)\old"
    $destination = "$($_.DirectoryName)\old\$($_.Name)"
    Move-Item -Path $path -Destination $destination -Force
}
$appX.Quit()
$searchW | ForEach-Object {
    $document = $appW.Documents.Open($_.FullName)
    $filename = "$($_.DirectoryName)\$($_.BaseName).docx"
    $document.SaveAs([ref] $filename, [ref]$FormatW)
    $document.Convert()
    $document.Close()
    $path = "$($_.DirectoryName)\$($_.Name)"
    Mkdir -Force "$($_.DirectoryName)\old"
    $destination = "$($_.DirectoryName)\old\$($_.Name)"
    Move-Item -Path $path -Destination $destination -Force
}
$appW.Quit()
$searchP | ForEach-Object {
    $document = $appP.Presentations.Open($_.FullName)
    $filename = "$($_.DirectoryName)\$($_.BaseName).pptx"
    $document.SaveAs([ref] $filename, [ref]$FormatP)
    $document.Close()
    $path = "$($_.DirectoryName)\$($_.Name)"
    Mkdir -Force "$($_.DirectoryName)\old"
    $destination = "$($_.DirectoryName)\old\$($_.Name)"
    Move-Item -Path $path -Destination $destination -Force
}
$appP.Quit()
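The three loops above repeat the same "move the original into an old folder" steps, so one possible tidy-up is to pull them into a small function. This is only a sketch: the name Move-ToOldFolder and its parameters are hypothetical, and it uses New-Item and Join-Path in place of Mkdir and string concatenation.

```powershell
# Hypothetical helper: move a file into an "old" subfolder next to it,
# creating that subfolder if needed, and return the new location.
function Move-ToOldFolder {
    param(
        [Parameter(Mandatory)] [System.IO.FileInfo] $File,
        [string] $FolderName = "old"
    )
    $oldDir = Join-Path $File.DirectoryName $FolderName
    # -Force makes this a no-op when the folder already exists.
    New-Item -ItemType Directory -Path $oldDir -Force | Out-Null
    $destination = Join-Path $oldDir $File.Name
    # -Force overwrites a file of the same name already in the old folder.
    Move-Item -LiteralPath $File.FullName -Destination $destination -Force
    $destination
}
```

Inside each ForEach-Object block, the four lines from $path = ... to Move-Item could then become a single call: Move-ToOldFolder -File $_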
Thank you John, that is awesome!
What version of Office are you running when running this script?
Office 365 v2110 currently, but it will work with older versions. I think Office 2016 or later should work, but I haven’t tested enough to be 100% sure.
I just wanted to say this worked really well for me. For some reason the powers that be decided that all old Office files were no longer “trusted”, and I’m scrambling to get clients’ old data accessible again.
Hi Richard, I am glad it helped!
Documents converted to DOCX using this script will likely still be in “Compatibility Mode”, and some newer features of Word will be unavailable to these documents. To upscale them to the latest version of DOCX supported by the version of Word you have, you will need to call the Convert() method on the document:
$document.Convert() (prior to calling the SaveAs method).
Note that tables have a tendency to have their layout changed when being upscaled to the highest DOCX version, so I also included a check to only upscale documents where $document.Tables.Count -eq 0.
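The table check described in this comment could be sketched roughly as follows. The commenter's actual code was not posted, so this is an illustration only, and the function name Invoke-DocUpgrade is hypothetical. It expects the document object returned by Documents.Open().

```powershell
# Hypothetical helper: call Convert() only when the document has no tables.
# $Document is expected to be a Word document object from Documents.Open().
# Returns $true if the document was upgraded, $false if it was left alone.
function Invoke-DocUpgrade {
    param([Parameter(Mandatory)] $Document)

    if ($Document.Tables.Count -eq 0) {
        # Upgrade to the newest format so the file leaves Compatibility Mode.
        $Document.Convert() | Out-Null
        return $true
    }
    # Tables present: skip the upgrade to avoid layout changes.
    return $false
}
```

In the conversion loop it would be called between Documents.Open() and SaveAs, for example: Invoke-DocUpgrade -Document $document | Out-Null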
Thank you, that’s very helpful to know!
I had the issue with duplicate files and subfolders.
The following script also checks whether a file with the same name and the .docx extension already exists.
You don’t need the line $document.Convert(). I added it because I had some compatibility issues.
Have fun:)
# Specify the root path to search for DOC files
$path = "C:\temp\TEST"
# Create a new Word Application object
$word_app = New-Object -ComObject Word.Application
# Set the save format to XML Document
$Format = [Microsoft.Office.Interop.Word.WdSaveFormat]::wdFormatXMLDocument
# Retrieve the list of DOC files in the specified path and its subfolders
$files = Get-ChildItem -Path $path -Filter *.doc -File -Recurse
# Process each DOC file
foreach ($file in $files) {
    # Open the document in Word
    $document = $word_app.Documents.Open($file.FullName)
    # Check if the file is a DOC
    if ($file.Extension -eq ".doc") {
        $docx_filename = "$($file.DirectoryName)\$($file.BaseName).docx"
        $docx_duplicate_filename = "$($file.DirectoryName)\$($file.BaseName)_duplicate.docx"
        # Check if DOCX file already exists
        if (-not (Test-Path $docx_filename)) {
            $document.SaveAs([ref]$docx_filename, [ref]$Format)
            $document.Convert()
            $document.Close()
            Remove-Item -Path $file.FullName
        } else {
            $duplicateCounter = 1
            $duplicateFilename = $docx_duplicate_filename
            # Find a unique filename for the duplicate DOCX file
            while (Test-Path $duplicateFilename) {
                $duplicateCounter++
                $duplicateFilename = "$($file.DirectoryName)\$($file.BaseName)_duplicate_$duplicateCounter.docx"
            }
            $document.SaveAs([ref]$duplicateFilename, [ref]$Format)
            $document.Convert()
            $document.Close()
            Remove-Item -Path $file.FullName
        }
    } else {
        $document.Close()
        Write-Host "Skipping conversion. File is already a DOCX: $($file.Name)"
    }
}
# Close the Word application
$word_app.Quit()
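The duplicate-name search in the script above could also be factored into a small function, which would let both branches of the if share one SaveAs call. This is a sketch that follows the same naming scheme (_duplicate, then _duplicate_2, _duplicate_3, and so on). The name Get-UniquePath is hypothetical, and it assumes a full path is passed in.

```powershell
# Hypothetical helper: return $Path if it is free, otherwise append
# _duplicate, _duplicate_2, _duplicate_3, ... until an unused name is found.
# Assumes $Path is a full path (it must include a directory part).
function Get-UniquePath {
    param([Parameter(Mandatory)] [string] $Path)

    if (-not (Test-Path -LiteralPath $Path)) { return $Path }

    $dir  = [System.IO.Path]::GetDirectoryName($Path)
    $base = [System.IO.Path]::GetFileNameWithoutExtension($Path)
    $ext  = [System.IO.Path]::GetExtension($Path)

    $candidate = Join-Path $dir "${base}_duplicate$ext"
    $counter = 1
    while (Test-Path -LiteralPath $candidate) {
        $counter++
        $candidate = Join-Path $dir "${base}_duplicate_$counter$ext"
    }
    $candidate
}
```

With it, the target could be computed once per file, for example: $docx_filename = Get-UniquePath -Path "$($file.DirectoryName)\$($file.BaseName).docx"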
Thank you for sharing!