markitdown

Author	SHA1	Message	Date
lesyk	251dddcf0c	[MS] Update PDF table extraction to support aligned Markdown (#1499) * Added PDF table extraction feature with aligned Markdown (#1419) * Add PDF test files and enhance extraction tests - Added a medical report scan PDF for testing scanned PDF handling. - Included a retail purchase receipt PDF to validate receipt extraction functionality. - Introduced a multipage invoice PDF to test extraction of complex invoice structures. - Added a borderless table PDF for testing inventory reconciliation report extraction. - Implemented comprehensive tests for PDF table extraction, ensuring proper structure and data integrity. - Enhanced existing tests to validate the order and presence of extracted content across various PDF types. * fix: update dependencies for PDF processing and improve table extraction logic * Bumped version of pdfminer.six --------- Authored-by: Ashok <ashh010101@gmail.com>	2026-01-07 16:38:45 -08:00
一I	38261fd31c	Update Python version requirement and add .cursorrules to .gitignore (#1249) * update markdown * Update and install Python version suggestions * Update README with prerequisites. --------- Co-authored-by: Lucas Liu <lucas@LucasdeMacBook-Pro.local> Co-authored-by: afourney <adamfo@microsoft.com>	2025-05-21 10:47:29 -07:00
Sugato Ray	6f3c762526	Merge branch 'main' into update_commandline_help	2024-12-18 17:50:07 -05:00
Sugato Ray	1384e80725	update .gitignore to exclude .vscode folder	2024-12-18 21:46:06 +00:00
Joel Esler	6e4caac70d	Safeguard against path traversal for ZipConverter fix: prevent path traversal vulnerabilities in ZipConverter Added a secure check for path traversal vulnerabilities in the ZipConverter class. Now validates extracted file paths using `os.path.commonprefix` to ensure all files remain within the intended extraction directory. Raises a `ValueError` if a path traversal attempt is detected. - Normalized file paths using `os.path.normpath`. - Added specific exception handling for `zipfile.BadZipFile` and traversal errors. - Ensured cleanup of extracted files after processing when `cleanup_extracted` is enabled.	2024-12-18 13:12:55 -05:00
microsoft-github-operations[bot]	f454a6d3c8	Initial commit	2024-11-13 19:56:40 +00:00

6 Commits
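
Note on commit 6e4caac70d: the message describes normalizing each extracted path with `os.path.normpath` and raising `ValueError` when a path leaves the extraction directory. The sketch below illustrates that kind of check; it is not the code from the commit. The names `resolve_member` and `safe_extract` are hypothetical, and the sketch deliberately swaps in `os.path.commonpath` where the commit message names `os.path.commonprefix`, because `commonprefix` compares character by character and would treat a sibling directory such as `out_evil` as being inside `out`.

```python
import os
import zipfile


def resolve_member(dest_dir: str, member_name: str) -> str:
    """Return the normalized target path for a zip member (hypothetical helper).

    Raises ValueError if the member would land outside dest_dir.
    """
    root = os.path.abspath(dest_dir)
    # normpath collapses ".." segments so the containment check sees the real target.
    target = os.path.normpath(os.path.join(root, member_name))
    # commonpath compares whole path components, unlike commonprefix.
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"Path traversal detected: {member_name!r}")
    return target


def safe_extract(zip_path: str, dest_dir: str) -> list[str]:
    """Validate every member name first, then extract (hypothetical helper)."""
    # ZipFile raises zipfile.BadZipFile if the archive is corrupt.
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        for name in names:
            resolve_member(dest_dir, name)
        archive.extractall(dest_dir)
    return names
```

A caller would wrap `safe_extract` in `try`/`except (zipfile.BadZipFile, ValueError)` and remove the extraction directory afterwards if cleanup is wanted, in the spirit of the `cleanup_extracted` option the commit message mentions.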