My project has a lot of static web content that is first drafted as Word documents from various sources. These documents eventually need to be published to the web application as either HTML or PDF. Understandably, our BA Julio got tired of converting them one by one; with hundreds of documents, doing it by hand would be a nightmare. So he came to me for help. It turned out to be a simple task for a short PowerShell script. Salute to the power of programming!
Let’s get started. Copy the code below and save it as, for example, converter.ps1.
[CmdletBinding()]
Param(
    [Parameter(Mandatory=$True)]
    [string]$documents_path
)
$targetPdfPath = "${documents_path}\pdf"
if (!(Test-Path $targetPdfPath -PathType Container)) {
    New-Item -ItemType Directory -Force -Path $targetPdfPath
}
$word_app = New-Object -ComObject Word.Application
$count = 0
# This filter will find .doc as well as .docx documents
Get-ChildItem -Path $documents_path -Filter *.doc? | ForEach-Object {
    $count++
    $document = $word_app.Documents.Open($_.FullName)
    $pdf_filename = "${targetPdfPath}\$($_.BaseName).pdf"
    Write-Host "Converting ${pdf_filename} ..."
    # 17 is the numeric value of wdFormatPDF
    $document.SaveAs([ref] $pdf_filename, [ref] 17)
    $document.Close()
}
Write-Host "Finished converting ${count} Word documents to PDF under ${targetPdfPath}" -BackgroundColor "Green" -ForegroundColor "Black"
$targetHtmlPath = "${documents_path}\html"
if (!(Test-Path $targetHtmlPath -PathType Container)) {
    New-Item -ItemType Directory -Force -Path $targetHtmlPath
}
# Reuse the Word instance created above; a second instance would keep running after the final Quit()
$count1 = 0
# This filter will find .doc as well as .docx documents
Get-ChildItem -Path $documents_path -Filter *.doc? | ForEach-Object {
    $count1++
    $document = $word_app.Documents.Open($_.FullName)
    $html_filename = "${targetHtmlPath}\$($_.BaseName).html"
    Write-Host "Converting ${html_filename} ..."
    # 10 is the numeric value of wdFormatFilteredHTML; using the number avoids
    # depending on the Office interop assembly being loaded
    $saveFormat = 10
    $document.SaveAs([ref] $html_filename, [ref] $saveFormat)
    $document.Close()
}
Write-Host "Finished converting ${count1} Word documents to HTML under ${targetHtmlPath}" -BackgroundColor "Green" -ForegroundColor "Black"
$word_app.Quit()
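If a document fails to open or save, the script above stops before it reaches Quit(), which can leave a hidden Word process running. Here is a minimal sketch of one way to guard against that, not a drop-in replacement. The function name Convert-WordFolder and its parameters are my own invention for illustration. It assumes the target folder already exists and that both paths are absolute. Setting DisplayAlerts to 0 (wdAlertsNone) asks Word to suppress its prompts, though I would not count on it silencing every dialog.

```powershell
# Hypothetical helper (not part of the original script): converts every Word
# document in one folder to one format and always shuts Word down afterwards.
function Convert-WordFolder {
    param(
        [Parameter(Mandatory = $true)][string]$SourcePath,
        [Parameter(Mandatory = $true)][string]$TargetPath,   # assumed to exist already
        [Parameter(Mandatory = $true)][int]$SaveFormat,      # 17 = wdFormatPDF, 10 = wdFormatFilteredHTML
        [Parameter(Mandatory = $true)][string]$Extension     # e.g. "pdf" or "html"
    )

    $word = New-Object -ComObject Word.Application
    $word.Visible = $false
    $word.DisplayAlerts = 0   # wdAlertsNone: ask Word not to show prompts
    $converted = 0
    try {
        Get-ChildItem -Path $SourcePath -Filter *.doc? | ForEach-Object {
            $target = Join-Path $TargetPath "$($_.BaseName).$Extension"
            $document = $word.Documents.Open($_.FullName)
            try {
                $document.SaveAs([ref] $target, [ref] $SaveFormat)
                $converted++
            }
            finally {
                # Close the document even if SaveAs failed
                $document.Close()
            }
        }
    }
    finally {
        # Runs on success and on failure, so Word is not left running
        $word.Quit()
    }
    return $converted
}
```

With this in place, a call such as Convert-WordFolder -SourcePath 'C:\temp\msdocs' -TargetPath 'C:\temp\msdocs\pdf' -SaveFormat 17 -Extension 'pdf' would return the number of documents it converted.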
Now copy the documents you would like to convert to a temporary folder, for example:
C:\temp\msdocs
Follow the steps below to convert the documents to HTML and PDF. They assume you are on Windows with Microsoft Word installed.
Step 1: Click the Windows icon in the bottom-left corner and search for PowerShell.
Step 2: Choose Run as administrator to open PowerShell.
Step 3: At the prompt, run the following command:
Set-ExecutionPolicy Unrestricted
If the above command doesn’t work, try this command:
Set-ExecutionPolicy RemoteSigned
Step 4: Enter “Y” when prompted.
Step 5: Use cd to navigate to the script folder, that is, the folder where you put converter.ps1.
Step 6: Run the following command, passing the path of your documents folder in quotes:
./converter.ps1 "C:\temp\msdocs"
The path is C:\temp\msdocs if you followed the example above and copied your documents to that folder.
Step 7: The script creates a pdf folder and an html folder under your documents folder. Watch the console output in case the script appears to hang. That can mean Word is showing a dialog that is waiting for input. Once you respond to the dialog, the script continues until it finishes.
Step 8: Go to the msdocs folder and check that no files are missing.
Note that the console prints how many Word documents were converted to PDF and how many to HTML. Compare these two counts with the number of source documents to catch any missed conversions.
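Comparing counts by eye gets tedious with hundreds of files, so here is a small sketch that does the check for you. The function name Get-MissingConversion is hypothetical. It assumes the pdf and html folders sit directly under the documents folder, as the script creates them, and it only looks at .doc and .docx files.

```powershell
# Hypothetical helper: lists source documents that have no matching output file.
function Get-MissingConversion {
    param(
        [Parameter(Mandatory = $true)][string]$DocumentsPath,
        [string[]]$Formats = @('pdf', 'html')
    )

    # Only files directly in the documents folder, not in the pdf/html subfolders
    $sources = Get-ChildItem -Path $DocumentsPath -File |
        Where-Object { $_.Extension -in '.doc', '.docx' }

    foreach ($source in $sources) {
        foreach ($format in $Formats) {
            $outputFolder = Join-Path $DocumentsPath $format
            $expected = Join-Path $outputFolder "$($source.BaseName).$format"
            if (-not (Test-Path -LiteralPath $expected -PathType Leaf)) {
                # Emit one row per missing output file
                [pscustomobject]@{ Document = $source.Name; Missing = $format }
            }
        }
    }
}
```

Running Get-MissingConversion -DocumentsPath 'C:\temp\msdocs' after the conversion prints one row per missing file and nothing at all when every document has both outputs.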
Thanks!
Working on PS 7.1!
Fantastic! Thanks a lot
Thanks so much for this! It worked great. Had about a 100 files that needed conversion. You saved me a lot of time.