kodeagent.tools.extract_as_markdown
- kodeagent.tools.extract_as_markdown(url_or_path: str, max_length: int = 20000) → str
Extract content from documents (PDF, DOCX, XLSX, PPTX) as Markdown text. Works with both URLs and local file paths.
Supported formats:
- PDF files (.pdf)
- Word documents (.docx)
- Excel spreadsheets (.xlsx)
- PowerPoint presentations (.pptx)
For reading HTML web pages, use ‘read_webpage’ instead (faster and cleaner).
Examples
Extract from PDF: “https://example.com/paper.pdf”
Extract from local file: “/tmp/document.docx”
Extract from Excel: “https://example.com/data.xlsx”
- Parameters:
url_or_path – URL or file path to a PDF, DOCX, XLSX, or PPTX file.
max_length – Maximum output length in characters (default: 20000). Output from very long documents is truncated to this limit, so information past it may be lost.
- Returns:
Document content as Markdown text, or an error message if extraction fails.
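The following is a minimal usage sketch, not part of the library. It assumes that `extract_as_markdown` can be imported from `kodeagent.tools` and called directly as a plain function, which the signature above suggests but does not guarantee. The helpers `is_supported_document` and `extract_document` are hypothetical names introduced here for illustration; they check the file extension against the supported formats before delegating to the tool.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions listed under "Supported formats" above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".pptx"}


def is_supported_document(url_or_path: str) -> bool:
    """Hypothetical helper: True if the extension is a supported format."""
    parsed = urlparse(url_or_path)
    # For http(s) URLs, inspect only the path so query strings are ignored.
    path = parsed.path if parsed.scheme in ("http", "https") else url_or_path
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS


def extract_document(url_or_path: str, max_length: int = 20000) -> str:
    """Hypothetical wrapper around kodeagent.tools.extract_as_markdown."""
    if not is_supported_document(url_or_path):
        raise ValueError(
            f"Unsupported format: {url_or_path!r}; "
            "for HTML pages use read_webpage instead"
        )
    # Imported lazily so the extension check works without kodeagent installed.
    # Assumption: the tool is directly callable with these arguments.
    from kodeagent.tools import extract_as_markdown

    # Per the docs, failures come back as an error message string rather than
    # an exception, so callers still need to inspect the returned text.
    return extract_as_markdown(url_or_path, max_length=max_length)
```

With kodeagent installed, `extract_document("/tmp/document.docx", max_length=5000)` would return at most roughly 5000 characters of Markdown. Note that the extension check is only a heuristic: a URL without a file extension may still serve a PDF, in which case calling `extract_as_markdown` directly is the simpler option.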