Extracting text/HTML out of Word (.docx) files

November 03, 2018 04:44:50 PM GMT
<p>Repositories: https://github.com/jmohler1970/WordExtractor and https://github.com/jmohler1970/WordExtractor_demo Introduction: We are going to extract HTML from a Word (.docx) file. A .docx file is an Office Open XML (OOXML) document: a ZIP archive of XML files, similar in structure to the Open Document Format for Office Applications (ODF). By unzipping the file and locating the appropriate XML file, we can process the data and generate HTML. Resources: https://en.wikipedia.org/wiki/OpenDocument https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-tags/tags-u-z/cfzip.html https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-functions/functions-t-z/xmlparse.html</p>
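The unzip, locate, and process steps described in the introduction can be sketched in a few lines. This is a minimal Python illustration of the idea, not the code from the WordExtractor repositories, which use ColdFusion's cfzip and xmlParse. The function name `docx_to_html` is hypothetical, and the sketch assumes the main body is stored at `word/document.xml`, which is where Word conventionally writes it. It keeps only paragraph text and drops all formatting, tables, and images.

```python
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_html(source):
    """Return simple HTML: one <p> per non-empty Word paragraph, text only.

    `source` may be a file path or a binary file-like object.
    (Hypothetical helper for illustration.)
    """
    # Step 1: a .docx is a ZIP archive, so open it as one.
    with zipfile.ZipFile(source) as archive:
        # Step 2: locate the XML part holding the document body
        # (assumed to be at the conventional path).
        xml_bytes = archive.read("word/document.xml")

    # Step 3: parse the XML and walk the paragraphs.
    root = ET.fromstring(xml_bytes)
    html_parts = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across w:t elements inside its runs.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        if text:
            # Escape &, <, > so the text is safe to embed in HTML.
            html_parts.append(f"<p>{escape(text)}</p>")
    return "\n".join(html_parts)
```

Calling `docx_to_html("report.docx")` (the file name is only an example) would return the paragraphs as a string of `<p>` elements. Mapping Word styles to headings, lists, or bold and italic runs would require inspecting the `w:pPr` and `w:rPr` property elements, which this sketch ignores.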
Well done, James! I really enjoyed this walk through. Been a loooooOOOOooooong time since I've seen any great CF tutorials. And I think I can use this. I've been wanting to convert my word documents into markdown files. Thanks for the head start! BTW: I had no idea docx files were really just zip files. (mind-blown)
Comment by chrisg57685480
1362 | November 05, 2018 06:38:01 PM GMT
Glad you liked it!
Comment by James Mohler
1365 | November 06, 2018 04:52:25 AM GMT
Excellent content James.  This was really well done.
Comment by David Byers
1369 | November 07, 2018 09:13:37 PM GMT
Glad you liked it!
Comment by James Mohler
1370 | November 07, 2018 09:58:03 PM GMT