tracker issue : CF-4198499

Unable to index older versions of MS Word using SOLR and ColdFusion

Status/Resolution/Reason: Closed/Withdrawn/CannotReproduce

Reporter/Name(from Bugbase): Calvert Acklin / Calvert Acklin ()

Created: 04/10/2017

Components: Text Search, Solr

Versions: 11.0

Failure Type: Incorrectly functioning

Found In Build/Fixed In Build: Version: 11,0,11,301867, Adobe: 5.1.3 (Build 000094) /

Priority/Frequency: Normal /

Locale/System: English / Win 2012 Server x64

Vote Count: 0

Problem Description: Unable to index older versions of MS Word.

Steps to Reproduce:

Use the cfsearch tag to index an MS Office Word document with a .doc extension.

Actual Result:

WARNING: Could not index {file name}.doc in SOLR. Check the exception for more details: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF) 

Expected Result:

Older MS Word documents are indexed without issue.

Any Workarounds: N/A
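The POI message above says the data looks like Office 2007+ XML even though the file name ends in .doc. One possible explanation, not confirmed anywhere in this report, is that the affected files are OOXML (ZIP) packages saved or renamed with a .doc extension. A minimal diagnostic sketch follows, assuming that hypothesis: it classifies each file by its leading bytes instead of its extension. The helper name sniffWordContainer and the directory path are illustrative assumptions, not part of ColdFusion or of the reporter's setup. The two signatures are the standard ones: D0 CF 11 E0 A1 B1 1A E1 for OLE2 compound files and 50 4B 03 04 for ZIP archives.

```cfml
<cfscript>
// Hypothetical helper (not a ColdFusion built-in, not from the original report):
// classifies a file by its leading bytes rather than by its extension.
function sniffWordContainer(required string filePath) {
    // Reads the whole file and hex-encodes it; acceptable for a one-off diagnostic.
    var hex  = uCase(binaryEncode(fileReadBinary(arguments.filePath), "hex"));
    // First 8 bytes of the file = first 16 hex characters.
    var head = left(hex, 16);

    // OLE2 compound file signature: legacy .doc, .xls, .ppt
    if (head EQ "D0CF11E0A1B11AE1") {
        return "OLE2";
    }
    // ZIP local file header ("PK" 03 04): .docx and other Office 2007+ packages
    if (left(head, 8) EQ "504B0304") {
        return "OOXML/ZIP";
    }
    return "UNKNOWN";
}

// Assumed directory; substitute the folder that is actually being indexed.
docDir   = "C:\inetpub\cf2016\misc\solr\docs";
docFiles = directoryList(docDir, true, "path", "*.doc");

for (docFile in docFiles) {
    writeOutput(encodeForHTML(docFile) & " : " & sniffWordContainer(docFile) & "<br>");
}
</cfscript>
```

Any .doc file reported as OOXML/ZIP would be a candidate for the warning quoted in Actual Result. A file reported as OLE2 that still fails to index would point to a different cause.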



Calvert, I am unable to observe the issue with 11,0,12,302575 on Win 7 x64. I do not get any errors when indexing .doc or .docx files.

Just to be clear, the cfsearch tag is used to search for a text pattern. The cfindex tag is used to index a collection of docs.

Can you share the following:
- The update level of your CF11 server.
- Where do you see the error you've reported (log file, CF output console)? If it is a log file, the name and the location of the log file.
- The complete stack trace of the reported error.
- Are you unable to index any .doc file, or only certain specific .doc files?
- The relevant part of the CFML which is resulting in the error.

For the record, here is the test code I used to try and reproduce the issue:

<cfscript>
    fl_separator = IIF( FindNoCase("windows",, DE("\"), DE("/"));
    coln_name = "winDocs";
    cfcollection( action="list", name="lst_col", engine="solr");
    col_lst = ValueList(lst_col.NAME, ",");
    if( ListContainsNoCase(col_lst, "#coln_name#") EQ 0) {
        cfcollection( action="create", collection="#coln_name#", engine="solr", path="#expandpath(".")##fl_separator#Coln_vld_1");
        writeOutput("creating collection " & coln_name & "...<br>");
    } else {
        writeOutput("collection " & coln_name & " pre-exists...<br>");
    }
    //fodder_path = "#expandpath("./")#pdfs";
    fodder_path = "C:\inetpub\cf2016\misc\solr\docs";
    cfdirectory (action="list", directory="#fodder_path#", name="listfls", recurse=true);
    WriteOutput("indexing dir...: " & fodder_path & "<br>");
</cfscript>
<cfquery name="qry_fls" dbtype="query">
    select * from listfls where type='File'
</cfquery>
<!--- <cfdump var=#qry_fls# format="html"> --->
<hr>
<cfscript>
    indx_stat = {};
    try {
        for(n_fl=1; n_fl LTE listfls.recordcount; n_fl++) {
            if (listfls["Type"][n_fl] EQ "File" && ((Right(listfls["Name"][n_fl], 4) EQ ".doc") || (Right(listfls["Name"][n_fl], 5) EQ ".docx"))) {
                fl_uri = listfls["Directory"][n_fl] & fl_separator & listfls["Name"][n_fl];
                writeOutput("Indexing file ..." & fl_uri & "<br>");
                cfindex( action='refresh', collection=coln_name, type='file', key=fl_uri, status="indx_stat"); //, throwonError=false
                writeOutput("File " & IIF( indx_stat.inserted EQ 1, DE(""), DE(" <b>NOT</b> ")) & "indexed.<br>");
                //writeOutput("Is the error struct empty: " & "<b>" & StructIsEmpty(indx_stat.errors) & "</b><br>");
            }
        }
    } catch(any exp) {
        writeOutput("Exception caught: <br>" & "exception msg: " & exp.message & "<br>exception detail: " & exp.detail);
    }
    writeDump(indx_stat);
    sleep(2000);
    lookup_term = "coldfusion";
    writeOutput("searching for #lookup_term#..<br>");
    /*cfsearch(name="testlookUp", collection=coln_name, criteria="#lookup_term#", status="srcStatus");*/
</cfscript>
<cfsearch name="lookup" collection="#coln_name#" criteria="#lookup_term#" status = "srch_stats">
matches found : <cfoutput><b>#srch_stats.FOUND#</b></cfoutput><br>
<cfdump var=#lookup# label="search result">
Comment by Piyush K.
949 | May 17, 2017 09:58:15 AM GMT
Calvert, can you please share the information requested in my previous note?
Comment by Piyush K.
950 | August 07, 2017 08:03:19 AM GMT
Closing this as the issue is not reproducible. We can revisit it if the details requested earlier are made available.
Comment by Piyush K.
951 | August 10, 2017 01:10:10 PM GMT