Unable to index older versions of MS Word using SOLR and ColdFusion

Status/Resolution/Reason: Closed/Withdrawn/CannotReproduce

Reporter/Name(from Bugbase): Calvert Acklin / Calvert Acklin ()

Created: 04/10/2017

Components: Text Search, Solr

Versions: 11.0

Failure Type: Incorrectly functioning

Found In Build/Fixed In Build: Version: 11,0,11,301867, Adobe: 5.1.3 (Build 000094) /

Priority/Frequency: Normal /

Locale/System: English / Win 2012 Server x64

Vote Count: 0

Problem Description: Unable to index older versions of MS Word.

Steps to Reproduce:

Use the cfsearch tag to index an MS Office Word document with a .doc extension.

Actual Result:

WARNING: Could not index {file name}.doc in SOLR. Check the exception for more details: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF) 

Expected Result:

Older MS Word documents are indexed without issue.

Any Workarounds: N/A
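
The POI message quoted under Actual Result says the file's content looks like Office 2007+ XML. That would be the case if a .docx file had been saved with a .doc extension, although nothing in this report confirms it. A minimal sketch, assuming that possibility, for checking which container format a file really uses by reading its leading bytes. The helper name sniffWordContainer and the file path are hypothetical; the two signatures are the standard OLE2 and ZIP magic numbers.

```cfml
<cfscript>
// Hypothetical diagnostic helper, not part of ColdFusion or of the original report.
// Classifies a file by its leading bytes rather than by its extension.
function sniffWordContainer(required string path) {
    // Reads the whole file into memory; acceptable for a one-off check.
    var hex = binaryEncode(fileReadBinary(arguments.path), "hex");
    // D0 CF 11 E0 A1 B1 1A E1: OLE2 compound file (legacy .doc)
    if (left(hex, 16) EQ "D0CF11E0A1B11AE1") {
        return "ole2";
    }
    // 50 4B 03 04: ZIP local file header (container used by .docx)
    if (left(hex, 8) EQ "504B0304") {
        return "zip";
    }
    return "unknown";
}

// Placeholder path; substitute the file that fails to index.
writeOutput(sniffWordContainer("C:\path\to\file.doc"));
</cfscript>
```

A result of "zip" for a file named .doc would be consistent with the POI message; "ole2" would point to a different cause.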



Calvert,

I am unable to observe the issue with 11,0,12,302575 / Win 7 x64. I do not get any errors when indexing .doc or .docx files. Just to be clear, the cfsearch tag is used to search for a text pattern; the cfindex tag is used to index a collection of documents.

Can you share the following:
- The update level of your CF11 server.
- Where you see the error you've reported (log file, CF output console). If it is a log file, the name and location of the log file.
- The complete stack trace of the reported error.
- Whether you are unable to index any .doc file, or only certain specific .doc files.
- The relevant part of the CFML that results in the error.

For the record, here is the test code I used to try to reproduce the issue:

<cfscript>
    // NOTE: the second argument to FindNoCase was lost in the original post; server.os.name is assumed.
    fl_separator = IIF( FindNoCase("windows", server.os.name), DE("\"), DE("/"));
    coln_name = "winDocs";
    cfcollection( action="list", name="lst_col", engine="solr");
    col_lst = ValueList(lst_col.NAME, ",");
    // Create the collection only if it does not already exist.
    if( ListContainsNoCase(col_lst, "#coln_name#") EQ 0) {
        cfcollection( action="create", collection="#coln_name#", engine="solr", path="#expandpath(".")##fl_separator#Coln_vld_1");
        writeOutput("creating collection " & coln_name & "...<br>");
    } else {
        writeOutput("collection " & coln_name & " pre-exists...<br>");
    }
    //fodder_path = "#expandpath("./")#pdfs";
    fodder_path = "C:\inetpub\cf2016\misc\solr\docs";
    cfdirectory (action="list", directory="#fodder_path#", name="listfls", recurse=true);
    WriteOutput("indexing dir...: " & fodder_path & "<br>");
</cfscript>
<cfquery name="qry_fls" dbtype="query">
    select * from listfls where type='File'
</cfquery>
<!--- <cfdump var=#qry_fls# format="html"> --->
<hr>
<cfscript>
    indx_stat = {};
    try {
        // Index every .doc and .docx file found under fodder_path.
        for(n_fl=1; n_fl LTE listfls.recordcount; n_fl++) {
            if (listfls["Type"][n_fl] EQ "File" && ((Right(listfls["Name"][n_fl], 4) EQ ".doc") || (Right(listfls["Name"][n_fl], 5) EQ ".docx"))) {
                fl_uri = listfls["Directory"][n_fl] & fl_separator & listfls["Name"][n_fl];
                writeOutput("Indexing file ..." & fl_uri & "<br>");
                cfindex( action='refresh', collection=coln_name, type='file', key=fl_uri, status="indx_stat"); //, throwonError=false
                writeOutput("File " & IIF( indx_stat.inserted EQ 1, DE(""), DE(" <b>NOT</b> ")) & "indexed.<br>");
                //writeOutput("Is the error struct empty: " & "<b>" & StructIsEmpty(indx_stat.errors) & "</b><br>");
            }
        }
    }
    catch(any exp) {
        writeOutput("Exception caught: <br>" & "exception msg: " & exp.message & "<br>exception detail: " & exp.detail);
    }
    writeDump(indx_stat);
    sleep(2000);
    lookup_term = "coldfusion";
    writeOutput("searching for #lookup_term#..<br>");
    /*cfsearch(name="testlookUp", collection=coln_name, criteria="#lookup_term#", status="srcStatus");*/
</cfscript>
<cfsearch name="lookup" collection="#coln_name#" criteria="#lookup_term#" status = "srch_stats">
matches found : <cfoutput><b>#srch_stats.FOUND#</b></cfoutput><br>
<cfdump var=#lookup# label="search result">
Calvert, can you please share the information requested in my previous note?
Closing this as the issue is not reproducible. We can revisit the issue if the details requested earlier are made available.
