[MS] Update PDF table extraction to support aligned Markdown (#1499)

* Added PDF table extraction feature with aligned Markdown (#1419)

* Add PDF test files and enhance extraction tests

- Added a medical report scan PDF for testing scanned PDF handling.
- Included a retail purchase receipt PDF to validate receipt extraction functionality.
- Introduced a multipage invoice PDF to test extraction of complex invoice structures.
- Added a borderless table PDF for testing inventory reconciliation report extraction.
- Implemented comprehensive tests for PDF table extraction, ensuring proper structure and data integrity.
- Enhanced existing tests to validate the order and presence of extracted content across various PDF types.

* fix: update dependencies for PDF processing and improve table extraction logic

* Bumped version of pdfminer.six
---------

Authored-by: Ashok <ashh010101@gmail.com>
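In this commit, "aligned Markdown" most likely means Markdown tables whose columns are padded to a common width, so they line up when read as plain text as well as when rendered. The sketch below shows that idea under a few assumptions. It assumes extracted table cells arrive as a list of rows of strings, with the first row used as the header. The helper name `to_aligned_markdown` is hypothetical and is not taken from this commit's code. Escaping `|` inside cells is an illustrative choice so that cell text cannot break the table.

```python
from typing import List


def to_aligned_markdown(rows: List[List[str]]) -> str:
    """Render rows (first row = header) as a Markdown table with padded columns.

    Hypothetical helper for illustration only; not the implementation in this commit.
    """
    if not rows:
        return ""
    # Normalize ragged rows (common in PDF extraction) to the same column count.
    n_cols = max(len(r) for r in rows)
    grid = [[(c or "").strip() for c in r] + [""] * (n_cols - len(r)) for r in rows]
    # Escape pipes so cell text cannot split a cell into two.
    grid = [[c.replace("|", "\\|") for c in r] for r in grid]
    # Each column is as wide as its longest cell, minimum 3 for the "---" separator.
    widths = [max(3, max(len(r[i]) for r in grid)) for i in range(n_cols)]

    def fmt(row: List[str]) -> str:
        # Left-justify every cell to its column width.
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"

    header, body = grid[0], grid[1:]
    separator = "| " + " | ".join("-" * w for w in widths) + " |"
    return "\n".join([fmt(header), separator] + [fmt(r) for r in body])


if __name__ == "__main__":
    print(to_aligned_markdown([["Item", "Qty"], ["Widget", "12"], ["Bolt", "3"]]))
```

Running the example prints a table in which every line has the same length, for instance `| Widget | 12  |` under `| Item   | Qty |`.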

lesyk, 2026-01-08 01:38:45 +01:00, committed by GitHub

parent dde250a456
commit 251dddcf0c

8 changed files with 1501 additions and 21 deletions

.gitignore

@@ -52,6 +52,7 @@ coverage.xml
 .hypothesis/
 .pytest_cache/
 cover/
+.test-logs/
 # Translations
 *.mo