[MS] Update PDF table extraction to support aligned Markdown (#1499)

* Added PDF table extraction feature with aligned Markdown (#1419)

* Add PDF test files and enhance extraction tests

- Added a medical report scan PDF for testing scanned PDF handling.
- Included a retail purchase receipt PDF to validate receipt extraction functionality.
- Introduced a multipage invoice PDF to test extraction of complex invoice structures.
- Added a borderless table PDF for testing inventory reconciliation report extraction.
- Implemented comprehensive tests for PDF table extraction, ensuring proper structure and data integrity.
- Enhanced existing tests to validate the order and presence of extracted content across various PDF types.
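
For illustration only: the commit's actual implementation is not shown in this excerpt. The sketch below shows one way "aligned Markdown" table output could be produced, assuming the table has already been extracted as a list of rows of strings with the header first. The function name `to_aligned_markdown` is hypothetical and is not taken from the changed files.

```python
def to_aligned_markdown(rows):
    """Render rows (first row is the header) as a Markdown table with padded columns."""
    if not rows:
        return ""
    ncols = max(len(r) for r in rows)
    # Pad short rows so every row has the same number of cells.
    norm = [[str(c).strip() for c in r] + [""] * (ncols - len(r)) for r in rows]
    # Column width is the widest cell, with a minimum of 3 for the "---" separator.
    widths = [max(3, *(len(r[i]) for r in norm)) for i in range(ncols)]

    def fmt(cells):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    header, *body = norm
    sep = "| " + " | ".join("-" * w for w in widths) + " |"
    return "\n".join([fmt(header), sep, *map(fmt, body)])


print(to_aligned_markdown([["Item", "Qty"], ["Widget", "2"]]))
```

Padding each cell to its column width keeps the raw Markdown readable as plain text, which is presumably the point of "aligned" output; rendered Markdown looks the same either way.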

* fix: update dependencies for PDF processing and improve table extraction logic

* Bumped version of pdfminer.six
---------

Authored-by: Ashok <ashh010101@gmail.com>
lesyk
2026-01-08 01:38:45 +01:00
committed by GitHub
parent dde250a456
commit 251dddcf0c
8 changed files with 1501 additions and 21 deletions
+1
@@ -52,6 +52,7 @@ coverage.xml
.hypothesis/
.pytest_cache/
cover/
+.test-logs/
# Translations
*.mo