Extracting text/html out Word (.docx) files

November 03, 2018 04:44:50 PM GMT
<p>Repositories: https://github.com/jmohler1970/WordExtractor https://github.com/jmohler1970/WordExtractor_demo</p>
<p>Introduction: We are going to extract HTML from a Word (.docx) file. A .docx file uses the Office Open XML format, which, like the Open Document Format for Office Applications (ODF), packages XML documents in a ZIP archive. By unzipping the file and locating the appropriate XML file, we can process the data and generate HTML.</p>
<p>Resources: https://en.wikipedia.org/wiki/OpenDocument https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-tags/tags-u-z/cfzip.html https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-functions/functions-t-z/xmlparse.html</p>
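The unzip, locate, and parse steps described above can be sketched in a few lines. This is a minimal illustration in Python, not the CFML code from the linked repositories. It assumes the main document body is stored under the conventional entry name `word/document.xml`, and the function name `docx_to_html` is hypothetical. It keeps only paragraph text and drops all formatting.

```python
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_html(source):
    """Return one <p> per non-empty Word paragraph.

    `source` is a path or a binary file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(source) as archive:
        # Assumption: the body lives at the conventional entry name
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join the text runs (w:t) of the paragraph, ignoring formatting
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text:
            paragraphs.append("<p>" + escape(text) + "</p>")
    return "\n".join(paragraphs)
```

Calling `docx_to_html("report.docx")` (a hypothetical file name) would return the paragraphs as HTML. A fuller converter would also need to handle headings, lists, tables, and inline styles, which this sketch skips.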
<p>The post <a rel="nofollow" href="https://coldfusion.adobe.com/2018/11/extracting-text-html-out-word-docx-files/">Extracting text/html out Word (.docx) files</a> appeared first on <a rel="nofollow" href="https://coldfusion.adobe.com">ColdFusion</a>.</p>
Labels: Blog, Learning, cfscript, cfzip, programming


Well done, James! I really enjoyed this walkthrough. Been a loooooOOOOooooong time since I've seen any great CF tutorials. And I think I can use this. I've been wanting to convert my Word documents into Markdown files. Thanks for the head start! BTW: I had no idea .docx files were really just ZIP files. (mind-blown)
Comment by chrisg57685480
1362 | November 05, 2018 06:38:01 PM GMT
Glad you liked it!
Comment by James Mohler
1365 | November 06, 2018 04:52:25 AM GMT
Excellent content, James. This was really well done.
Comment by David Byers
1369 | November 07, 2018 09:13:37 PM GMT
Glad you liked it!
Comment by James Mohler
1370 | November 07, 2018 09:58:03 PM GMT