[MS] Add OCR layer service for embedded images and PDF scans (#1541)
* Add OCR test data and implement tests for various document formats
  - Created HTML file with multiple images for testing OCR extraction.
  - Added several PDF files with different layouts and image placements to validate OCR functionality.
  - Introduced PPTX files with complex layouts and images at various positions for comprehensive testing.
  - Included XLSX files with multiple images and complex layouts to ensure accurate OCR extraction.
  - Implemented a new test suite in `test_ocr.py` to validate OCR functionality across all document types, ensuring context preservation and accuracy.
* Enhance OCR functionality and validation in document converters
  - Refactor image extraction and processing in the PDF, PPTX, and XLSX converters for improved readability and consistency.
  - Implement detailed validation for OCR text positioning relative to surrounding text in test cases.
  - Introduce comprehensive tests for expected OCR results across various document types, ensuring no base64 images are present.
  - Improve error handling and logging for better debugging during OCR extraction.
* Add support for scanned PDFs with full-page OCR fallback and implement tests
* Bump version to 0.1.6b1 in __about__.py
* Refactor OCR services to support LLM Vision; update README and tests accordingly
* Add OCR-enabled converters and ensure a consistent OCR format across document types
* Refactor converters to improve import organization and enhance OCR functionality across the DOCX, PDF, PPTX, and XLSX converters
* Refactor exception imports for consistency across converters and tests
* Fix OCR tests to match MockOCRService output and fix cross-platform file URI handling
* Skip DOCX/XLSX/PPTX OCR tests when optional dependencies are missing
* Add comprehensive OCR test suite for various document formats
  - Introduced multiple test documents for PDF, DOCX, XLSX, and PPTX formats, covering scenarios with images at the start, middle, and end.
  - Implemented tests for complex layouts, multi-page documents, and documents with multiple images.
  - Created a new test script `test_ocr.py` to validate OCR functionality, ensuring context preservation and accurate text extraction.
  - Added expected OCR results for validation against ground truth.
  - Included tests for scanned documents to verify OCR fallback mechanisms.
* Remove obsolete HTML test files and refactor test cases for file URIs and OCR format consistency
  - Deleted `html_image_start.html` and `html_multiple_images.html` as they are no longer needed.
  - Updated `test_file_uris` in `test_module_misc.py` to simplify assertions by removing unnecessary `url2pathname` usage.
  - Removed `test_ocr_format_consistency.py` as it is no longer relevant to the current testing framework.
* Refactor OCR processing in PdfConverterWithOCR and enhance unit tests for multipage PDFs
* Revert
* Revert
* Update READMEs
* Refactor import statements for consistency and improve formatting in converter and test files
@@ -9,7 +9,7 @@

> [!IMPORTANT]
> Breaking changes between 0.0.1 and 0.1.0:
> * Dependencies are now organized into optional feature-groups (further details below). Use `pip install 'markitdown[all]'` to have backward-compatible behavior.
> * convert_stream() now requires a binary file-like object (e.g., a file opened in binary mode, or an io.BytesIO object). This is a breaking change from the previous version, which also accepted text file-like objects, like io.StringIO.
> * The DocumentConverter class interface has changed to read from file-like streams rather than file paths. *No temporary files are created anymore*. If you are the maintainer of a plugin or custom DocumentConverter, you likely need to update your code. Otherwise, if you are only using the MarkItDown class or CLI (as in these examples), you should not need to change anything.
@@ -132,6 +132,38 @@ markitdown --use-plugins path-to-file.pdf

To find available plugins, search GitHub for the hashtag `#markitdown-plugin`. To develop a plugin, see `packages/markitdown-sample-plugin`.

#### markitdown-ocr Plugin

The `markitdown-ocr` plugin adds OCR support to the PDF, DOCX, PPTX, and XLSX converters, extracting text from embedded images using LLM Vision — the same `llm_client` / `llm_model` pattern that MarkItDown already uses for image descriptions. No new ML libraries or binary dependencies are required.

**Installation:**

```bash
pip install markitdown-ocr
pip install openai  # or any OpenAI-compatible client
```

**Usage:**

Pass the same `llm_client` and `llm_model` you would use for image descriptions:

```python
from markitdown import MarkItDown
from openai import OpenAI

md = MarkItDown(
    enable_plugins=True,
    llm_client=OpenAI(),
    llm_model="gpt-4o",
)
result = md.convert("document_with_images.pdf")
print(result.text_content)
```

If no `llm_client` is provided, the plugin still loads, but OCR is silently skipped and the standard built-in converter is used instead.

See [`packages/markitdown-ocr/README.md`](packages/markitdown-ocr/README.md) for detailed documentation.

### Azure Document Intelligence

To use Microsoft Document Intelligence for conversion:
@@ -0,0 +1,21 @@
MIT License

Copyright (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,200 @@
# MarkItDown OCR Plugin

LLM Vision plugin for MarkItDown that extracts text from images embedded in PDF, DOCX, PPTX, and XLSX files.

Uses the same `llm_client` / `llm_model` pattern that MarkItDown already supports for image descriptions — no new ML libraries or binary dependencies required.

## Features

- **Enhanced PDF Converter**: Extracts text from images within PDFs, with full-page OCR fallback for scanned documents
- **Enhanced DOCX Converter**: OCR for images in Word documents
- **Enhanced PPTX Converter**: OCR for images in PowerPoint presentations
- **Enhanced XLSX Converter**: OCR for images in Excel spreadsheets
- **Context Preservation**: Maintains document structure and flow when inserting extracted text

## Installation

```bash
pip install markitdown-ocr
```

The plugin uses whatever OpenAI-compatible client you already have. Install one if you don't have it yet:

```bash
pip install openai
```

## Usage

### Command Line

```bash
markitdown document.pdf --use-plugins --llm-client openai --llm-model gpt-4o
```

### Python API

Pass `llm_client` and `llm_model` to `MarkItDown()` exactly as you would for image descriptions:

```python
from markitdown import MarkItDown
from openai import OpenAI

md = MarkItDown(
    enable_plugins=True,
    llm_client=OpenAI(),
    llm_model="gpt-4o",
)

result = md.convert("document_with_images.pdf")
print(result.text_content)
```

If no `llm_client` is provided, the plugin still loads, but OCR is silently skipped — falling back to the standard built-in converter.

### Custom Prompt

Override the default extraction prompt for specialized documents:

```python
md = MarkItDown(
    enable_plugins=True,
    llm_client=OpenAI(),
    llm_model="gpt-4o",
    llm_prompt="Extract all text from this image, preserving table structure.",
)
```

### Any OpenAI-Compatible Client

Works with any client that follows the OpenAI API:

```python
from openai import AzureOpenAI

md = MarkItDown(
    enable_plugins=True,
    llm_client=AzureOpenAI(
        api_key="...",
        azure_endpoint="https://your-resource.openai.azure.com/",
        api_version="2024-02-01",
    ),
    llm_model="gpt-4o",
)
```

## How It Works

When `MarkItDown(enable_plugins=True, llm_client=..., llm_model=...)` is called:

1. MarkItDown discovers the plugin via the `markitdown.plugin` entry point group
2. It calls `register_converters()`, forwarding all kwargs including `llm_client` and `llm_model`
3. The plugin creates an `LLMVisionOCRService` from those kwargs
4. Four OCR-enhanced converters are registered at **priority -1.0** — before the built-in converters at priority 0.0
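The four registration steps above can be sketched end to end. The registry below is a stand-in for MarkItDown's internal converter list, modeling only the lower-priority-wins behavior; the real `register_converters` hook receives the MarkItDown instance and actual converter objects, so treat names and signatures here as illustrative:

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class _Registration:
    priority: float
    name: str = field(compare=False)


class ConverterRegistry:
    """Stand-in for MarkItDown's converter list: lower priority is tried first."""

    def __init__(self) -> None:
        self._registrations: list[_Registration] = []

    def register_converter(self, name: str, priority: float = 0.0) -> None:
        self._registrations.append(_Registration(priority, name))
        # Stable sort keeps registration order among equal priorities.
        self._registrations.sort()

    def order(self) -> list[str]:
        return [r.name for r in self._registrations]


def register_converters(registry: ConverterRegistry, **kwargs) -> None:
    """What the plugin's entry-point hook does in steps 2-4 above (sketched)."""
    if kwargs.get("llm_client") and kwargs.get("llm_model"):
        for name in ("pdf-ocr", "docx-ocr", "pptx-ocr", "xlsx-ocr"):
            registry.register_converter(name, priority=-1.0)


registry = ConverterRegistry()
registry.register_converter("pdf-builtin", priority=0.0)
register_converters(registry, llm_client=object(), llm_model="gpt-4o")
print(registry.order()[0])  # an OCR converter now outranks the built-in
```

Because the OCR converters sit at priority -1.0, they get first refusal on every file; the built-in converter at 0.0 is only reached if they decline.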
When a file is converted:

1. The OCR converter accepts the file
2. It extracts embedded images from the document
3. Each image is sent to the LLM with an extraction prompt
4. The returned text is inserted inline, preserving document structure
5. If the LLM call fails, conversion continues without that image's text

## Supported File Formats

### PDF

- Embedded images are extracted by position (via `page.images` / page XObjects) and OCR'd inline, interleaved with the surrounding text in vertical reading order.
- **Scanned PDFs** (pages with no extractable text) are detected automatically: each page is rendered at 300 DPI and sent to the LLM as a full-page image.
- **Malformed PDFs** that pdfplumber/pdfminer cannot open (e.g. a truncated EOF) are retried with PyMuPDF page rendering, so content is still recovered.
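A side note on the 300 DPI figure above: PDF user space is defined at 72 points per inch, so rendering at a given DPI means scaling page coordinates by `dpi / 72`. A small sketch of that arithmetic (pure Python, no PDF library; the comment about how PyMuPDF would consume the zoom is an assumption):

```python
def render_scale(dpi: int, base_dpi: int = 72) -> float:
    """PDF user space is 72 units per inch, so rendering at `dpi` means
    scaling by dpi / 72 (roughly the zoom one would pass to something
    like fitz.Matrix(zoom, zoom) in PyMuPDF)."""
    return dpi / base_dpi


def rendered_size(width_pts: float, height_pts: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page of the given size in points."""
    scale = render_scale(dpi)
    return round(width_pts * scale), round(height_pts * scale)


# A US-Letter page (612 x 792 points) rendered at 300 DPI:
print(rendered_size(612, 792, 300))  # (2550, 3300)
```

At 300 DPI a full letter page is roughly 2550 x 3300 pixels, which is why full-page OCR images are noticeably larger payloads than individual embedded images.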
### DOCX

- Images are extracted via document part relationships (`doc.part.rels`).
- OCR runs before the DOCX → HTML → Markdown pipeline executes: placeholder tokens are injected into the HTML so that the markdown converter does not escape the OCR markers, and the placeholders are replaced with the formatted `*[Image OCR]...[End OCR]*` blocks after conversion.
- Document flow (headings, paragraphs, tables) is fully preserved around the OCR blocks.

### PPTX

- Picture shapes, placeholder shapes with images, and images inside groups are all supported.
- Shapes are processed per slide in reading order (top to bottom, left to right).
- If an `llm_client` is configured, the LLM is asked for a description first; OCR is used as the fallback when no description is returned.

### XLSX

- Images embedded in worksheets (`sheet._images`) are extracted per sheet.
- Cell position is calculated from the image anchor coordinates (column/row → Excel letter notation).
- Images are listed under a `### Images in this sheet:` section after the sheet's data table — they are not interleaved into the table rows.
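The column/row → letter conversion mentioned above is the usual bijective base-26 scheme. A short sketch (zero-based indices are an assumption about the anchor coordinates openpyxl exposes):

```python
def excel_column_letter(col_index: int) -> str:
    """Convert a zero-based column index to Excel letter notation:
    0 -> 'A', 25 -> 'Z', 26 -> 'AA' (bijective base-26)."""
    letters = ""
    col = col_index + 1  # work 1-based
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def anchor_to_cell(col_index: int, row_index: int) -> str:
    """Zero-based (col, row) anchor -> 'A1'-style cell reference."""
    return f"{excel_column_letter(col_index)}{row_index + 1}"


print(anchor_to_cell(0, 0), anchor_to_cell(26, 9))  # A1 AA10
```

The `col - 1` inside the loop is what makes the scheme bijective: there is no zero digit, so 'Z' is followed by 'AA' rather than 'BA'.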
### Output format

Every extracted OCR block is wrapped as:

```text
*[Image OCR]
<extracted text>
[End OCR]*
```
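Because the wrapper is a fixed pattern, downstream consumers can pull the OCR text back out with a small regex. This sketch is illustrative only, with made-up markdown:

```python
import re

# Matches the *[Image OCR] ... [End OCR]* wrapper shown above and
# captures the extracted text between the markers.
OCR_BLOCK = re.compile(r"\*\[Image OCR\]\n(.*?)\n\[End OCR\]\*", re.DOTALL)

markdown = (
    "# Report\n\n"
    "*[Image OCR]\nTotal: 42 units\n[End OCR]*\n\n"
    "Closing paragraph.\n"
)

blocks = OCR_BLOCK.findall(markdown)
print(blocks)  # ['Total: 42 units']
```

The non-greedy `(.*?)` with `re.DOTALL` keeps multi-block documents well-behaved: each block is captured separately rather than one match swallowing everything between the first opener and the last closer.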
## Troubleshooting

### OCR text missing from output

The most likely cause is a missing `llm_client` or `llm_model`. Verify:

```python
from openai import OpenAI
from markitdown import MarkItDown

md = MarkItDown(
    enable_plugins=True,
    llm_client=OpenAI(),  # required
    llm_model="gpt-4o",   # required
)
```

### Plugin not loading

Confirm the plugin is installed and discovered:

```bash
markitdown --list-plugins  # should show: ocr
```

### API errors

The plugin reports LLM API errors as warnings and continues conversion. Check your API key, quota, and that the chosen model supports vision inputs.

## Development

### Running Tests

```bash
cd packages/markitdown-ocr
pytest tests/ -v
```

### Building from Source

```bash
git clone https://github.com/microsoft/markitdown.git
cd markitdown/packages/markitdown-ocr
pip install -e .
```

## Contributing

Contributions are welcome! See the [MarkItDown repository](https://github.com/microsoft/markitdown) for guidelines.

## License

MIT — see [LICENSE](LICENSE).

## Changelog

### 0.1.0 (Initial Release)

- LLM Vision OCR for PDF, DOCX, PPTX, XLSX
- Full-page OCR fallback for scanned PDFs
- Context-aware inline text insertion
- Priority-based converter replacement (no code changes required)
@@ -0,0 +1,57 @@
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "markitdown-ocr"
dynamic = ["version"]
description = 'OCR plugin for MarkItDown - Extracts text from images in PDF, DOCX, PPTX, and XLSX via LLM Vision'
readme = "README.md"
requires-python = ">=3.10"
license = "MIT"
keywords = ["markitdown", "ocr", "pdf", "docx", "xlsx", "pptx", "llm", "vision"]
authors = [
    { name = "Contributors", email = "noreply@github.com" },
]
classifiers = [
    "Development Status :: 4 - Beta",
    "Programming Language :: Python",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: 3.13",
    "Programming Language :: Python :: Implementation :: CPython",
]

# Core dependencies — matches the file-format libraries markitdown already uses
dependencies = [
    "markitdown>=0.1.0",
    "pdfminer.six>=20251230",
    "pdfplumber>=0.11.9",
    "PyMuPDF>=1.24.0",
    "mammoth~=1.11.0",
    "python-docx",
    "python-pptx",
    "pandas",
    "openpyxl",
    "Pillow>=9.0.0",
]

# llm_client is passed in by the user (same as for markitdown image descriptions);
# install openai or any OpenAI-compatible SDK separately.
[project.optional-dependencies]
llm = [
    "openai>=1.0.0",
]

[project.urls]
Documentation = "https://github.com/microsoft/markitdown#readme"
Issues = "https://github.com/microsoft/markitdown/issues"
Source = "https://github.com/microsoft/markitdown"

[tool.hatch.version]
path = "src/markitdown_ocr/__about__.py"

# Plugin entry point — MarkItDown discovers this plugin through this entry point
[project.entry-points."markitdown.plugin"]
ocr = "markitdown_ocr"
@@ -0,0 +1,4 @@
# SPDX-FileCopyrightText: 2025-present Contributors
# SPDX-License-Identifier: MIT

__version__ = "0.1.0"
@@ -0,0 +1,31 @@
# SPDX-FileCopyrightText: 2025-present Contributors
# SPDX-License-Identifier: MIT

"""
markitdown-ocr: OCR plugin for MarkItDown

Adds LLM Vision-based text extraction from images embedded in PDF, DOCX, PPTX, and XLSX files.
"""

from ._plugin import __plugin_interface_version__, register_converters
from .__about__ import __version__
from ._ocr_service import (
    OCRResult,
    LLMVisionOCRService,
)
from ._pdf_converter_with_ocr import PdfConverterWithOCR
from ._docx_converter_with_ocr import DocxConverterWithOCR
from ._pptx_converter_with_ocr import PptxConverterWithOCR
from ._xlsx_converter_with_ocr import XlsxConverterWithOCR

__all__ = [
    "__version__",
    "__plugin_interface_version__",
    "register_converters",
    "OCRResult",
    "LLMVisionOCRService",
    "PdfConverterWithOCR",
    "DocxConverterWithOCR",
    "PptxConverterWithOCR",
    "XlsxConverterWithOCR",
]
@@ -0,0 +1,189 @@
"""
Enhanced DOCX Converter with OCR support for embedded images.
Extracts images from Word documents and performs OCR while maintaining context.
"""

import io
import re
import sys
from typing import Any, BinaryIO, Optional

from markitdown.converters import HtmlConverter
from markitdown.converter_utils.docx.pre_process import pre_process_docx
from markitdown import DocumentConverterResult, StreamInfo
from markitdown._exceptions import (
    MissingDependencyException,
    MISSING_DEPENDENCY_MESSAGE,
)
from ._ocr_service import LLMVisionOCRService

# Try loading dependencies
_dependency_exc_info = None
try:
    import mammoth
    from docx import Document
except ImportError:
    _dependency_exc_info = sys.exc_info()


# Placeholder injected into the HTML so the markdown converter never sees
# the OCR markers. Must be a single token with no special markdown characters.
_PLACEHOLDER = "MARKITDOWNOCRBLOCK{}"


class DocxConverterWithOCR(HtmlConverter):
    """
    Enhanced DOCX Converter with OCR support for embedded images.
    Maintains document flow while extracting text from images inline.
    """

    def __init__(self, ocr_service: Optional[LLMVisionOCRService] = None):
        super().__init__()
        self._html_converter = HtmlConverter()
        self.ocr_service = ocr_service

    def accepts(
        self,
        file_stream: BinaryIO,
        stream_info: StreamInfo,
        **kwargs: Any,
    ) -> bool:
        mimetype = (stream_info.mimetype or "").lower()
        extension = (stream_info.extension or "").lower()

        if extension == ".docx":
            return True

        if mimetype.startswith(
            "application/vnd.openxmlformats-officedocument.wordprocessingml"
        ):
            return True

        return False

    def convert(
        self,
        file_stream: BinaryIO,
        stream_info: StreamInfo,
        **kwargs: Any,
    ) -> DocumentConverterResult:
        if _dependency_exc_info is not None:
            raise MissingDependencyException(
                MISSING_DEPENDENCY_MESSAGE.format(
                    converter=type(self).__name__,
                    extension=".docx",
                    feature="docx",
                )
            ) from _dependency_exc_info[1].with_traceback(
                _dependency_exc_info[2]
            )  # type: ignore[union-attr]

        # Get OCR service if available (from kwargs or instance)
        ocr_service: Optional[LLMVisionOCRService] = (
            kwargs.get("ocr_service") or self.ocr_service
        )

        if ocr_service:
            # 1. Extract and OCR images — returns raw text per image
            file_stream.seek(0)
            image_ocr_map = self._extract_and_ocr_images(file_stream, ocr_service)

            # 2. Convert DOCX → HTML via mammoth
            file_stream.seek(0)
            pre_process_stream = pre_process_docx(file_stream)
            html_result = mammoth.convert_to_html(
                pre_process_stream, style_map=kwargs.get("style_map")
            ).value

            # 3. Replace <img> tags with plain placeholder tokens so that
            #    the HTML→markdown step never escapes our OCR markers.
            html_with_placeholders, ocr_texts = self._inject_placeholders(
                html_result, image_ocr_map
            )

            # 4. Convert HTML → markdown
            md_result = self._html_converter.convert_string(
                html_with_placeholders, **kwargs
            )
            md = md_result.markdown

            # 5. Swap placeholders for the actual OCR blocks (post-conversion
            #    so * and _ are never escaped by the markdown converter).
            for i, raw_text in enumerate(ocr_texts):
                placeholder = _PLACEHOLDER.format(i)
                ocr_block = f"*[Image OCR]\n{raw_text}\n[End OCR]*"
                md = md.replace(placeholder, ocr_block)

            return DocumentConverterResult(markdown=md)
        else:
            # Standard conversion without OCR
            style_map = kwargs.get("style_map", None)
            pre_process_stream = pre_process_docx(file_stream)
            return self._html_converter.convert_string(
                mammoth.convert_to_html(pre_process_stream, style_map=style_map).value,
                **kwargs,
            )

    def _extract_and_ocr_images(
        self, file_stream: BinaryIO, ocr_service: LLMVisionOCRService
    ) -> dict[str, str]:
        """
        Extract images from DOCX and OCR them.

        Returns:
            Dict mapping image relationship IDs to raw OCR text (no markers).
        """
        ocr_map = {}

        try:
            file_stream.seek(0)
            doc = Document(file_stream)

            for rel in doc.part.rels.values():
                if "image" in rel.target_ref.lower():
                    try:
                        image_bytes = rel.target_part.blob
                        image_stream = io.BytesIO(image_bytes)
                        ocr_result = ocr_service.extract_text(image_stream)

                        if ocr_result.text.strip():
                            # Store raw text only — markers added later
                            ocr_map[rel.rId] = ocr_result.text.strip()

                    except Exception:
                        continue

        except Exception:
            pass

        return ocr_map

    def _inject_placeholders(
        self, html: str, ocr_map: dict[str, str]
    ) -> tuple[str, list[str]]:
        """
        Replace <img> tags with numbered placeholder tokens.

        Returns:
            (html_with_placeholders, ordered list of raw OCR texts)
        """
        if not ocr_map:
            return html, []

        ocr_texts = list(ocr_map.values())
        used: list[int] = []

        def replace_img(match: re.Match) -> str:  # type: ignore[type-arg]
            for i in range(len(ocr_texts)):
                if i not in used:
                    used.append(i)
                    return f"<p>{_PLACEHOLDER.format(i)}</p>"
            return ""  # remove image if all OCR texts already used

        result = re.sub(r"<img[^>]*>", replace_img, html)

        # Any OCR texts that had no matching <img> tag go at the end
        for i in range(len(ocr_texts)):
            if i not in used:
                result += f"<p>{_PLACEHOLDER.format(i)}</p>"

        return result, ocr_texts
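As a sanity check, the placeholder round trip used by `DocxConverterWithOCR` can be exercised in isolation. The HTML and OCR text below are made-up fixtures, and the HTML→markdown conversion is simulated by leaving the token untouched:

```python
import re

# Standalone re-creation of the placeholder round trip: <img> tags become
# plain tokens before the HTML→markdown step, and the tokens become
# formatted OCR blocks afterwards.
_PLACEHOLDER = "MARKITDOWNOCRBLOCK{}"

html = '<p>Intro</p><img src="x.png"/><p>Outro</p>'
ocr_texts = ["INVOICE #123"]
used: list[int] = []


def replace_img(_match: re.Match) -> str:
    # Assign the next unused OCR text to this <img> tag, in document order.
    for i in range(len(ocr_texts)):
        if i not in used:
            used.append(i)
            return f"<p>{_PLACEHOLDER.format(i)}</p>"
    return ""


with_placeholders = re.sub(r"<img[^>]*>", replace_img, html)

# Pretend the markdown conversion left the token intact, then swap it.
markdown = with_placeholders
for i, text in enumerate(ocr_texts):
    markdown = markdown.replace(
        _PLACEHOLDER.format(i), f"*[Image OCR]\n{text}\n[End OCR]*"
    )

print("MARKITDOWNOCRBLOCK" not in markdown)  # True
```

The token survives the conversion untouched because it is a plain alphanumeric word, which is exactly why the markers with `*` and `[` are only added after conversion.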
@@ -0,0 +1,110 @@
"""
OCR Service Layer for MarkItDown
Provides LLM Vision-based image text extraction.
"""

import base64
from typing import Any, BinaryIO
from dataclasses import dataclass

from markitdown import StreamInfo


@dataclass
class OCRResult:
    """Result from OCR extraction."""

    text: str
    confidence: float | None = None
    backend_used: str | None = None
    error: str | None = None


class LLMVisionOCRService:
    """OCR service using LLM vision models (OpenAI-compatible)."""

    def __init__(
        self,
        client: Any,
        model: str,
        default_prompt: str | None = None,
    ) -> None:
        """
        Initialize the LLM Vision OCR service.

        Args:
            client: OpenAI-compatible client
            model: Model name (e.g., 'gpt-4o', 'gemini-2.0-flash')
            default_prompt: Default prompt for OCR extraction
        """
        self.client = client
        self.model = model
        self.default_prompt = default_prompt or (
            "Extract all text from this image. "
            "Return ONLY the extracted text, maintaining the original "
            "layout and order. Do not add any commentary or description."
        )

    def extract_text(
        self,
        image_stream: BinaryIO,
        prompt: str | None = None,
        stream_info: StreamInfo | None = None,
        **kwargs: Any,
    ) -> OCRResult:
        """Extract text using LLM vision."""
        if self.client is None:
            return OCRResult(
                text="",
                backend_used="llm_vision",
                error="LLM client not configured",
            )

        try:
            image_stream.seek(0)

            content_type: str | None = None
            if stream_info:
                content_type = stream_info.mimetype

            if not content_type:
                try:
                    from PIL import Image

                    image_stream.seek(0)
                    img = Image.open(image_stream)
                    fmt = img.format.lower() if img.format else "png"
                    content_type = f"image/{fmt}"
                except Exception:
                    content_type = "image/png"

            image_stream.seek(0)
            base64_image = base64.b64encode(image_stream.read()).decode("utf-8")
            data_uri = f"data:{content_type};base64,{base64_image}"

            actual_prompt = prompt or self.default_prompt
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {
                        "role": "user",
                        "content": [
                            {"type": "text", "text": actual_prompt},
                            {
                                "type": "image_url",
                                "image_url": {"url": data_uri},
                            },
                        ],
                    }
                ],
            )

            text = response.choices[0].message.content
            return OCRResult(
                text=text.strip() if text else "",
                backend_used="llm_vision",
            )
        except Exception as e:
            return OCRResult(text="", backend_used="llm_vision", error=str(e))
        finally:
            image_stream.seek(0)
@@ -0,0 +1,422 @@
|
||||
"""
|
||||
Enhanced PDF Converter with OCR support for embedded images.
|
||||
Extracts images from PDFs and performs OCR while maintaining document context.
|
||||
"""
|
||||
|
||||
import io
|
||||
import sys
|
||||
from typing import Any, BinaryIO, Optional
|
||||
|
||||
from markitdown import DocumentConverter, DocumentConverterResult, StreamInfo
|
||||
from markitdown._exceptions import (
|
||||
MissingDependencyException,
|
||||
MISSING_DEPENDENCY_MESSAGE,
|
||||
)
|
||||
from ._ocr_service import LLMVisionOCRService
|
||||
|
||||
# Import dependencies
|
||||
_dependency_exc_info = None
|
||||
try:
|
||||
import pdfminer
|
||||
import pdfminer.high_level
|
||||
import pdfplumber
|
||||
from PIL import Image
|
||||
except ImportError:
|
||||
_dependency_exc_info = sys.exc_info()
|
||||
|
||||
|
||||
def _extract_images_from_page(page: Any) -> list[dict]:
|
||||
"""
|
||||
Extract images from a PDF page by rendering page regions.
|
||||
|
||||
Returns:
|
||||
List of dicts with 'stream', 'bbox', 'name', 'y_pos' keys
|
||||
"""
|
||||
images_info = []
|
||||
|
||||
try:
|
||||
# Try multiple methods to detect images
|
||||
images = []
|
||||
|
||||
# Method 1: Use page.images (standard approach)
|
||||
if hasattr(page, "images") and page.images:
|
||||
images = page.images
|
||||
|
||||
# Method 2: If no images found, try underlying PDF objects
|
||||
if not images and hasattr(page, "objects") and "image" in page.objects:
|
||||
images = page.objects.get("image", [])
|
||||
|
||||
# Method 3: Try filtering all objects for image types
|
||||
if not images and hasattr(page, "objects"):
|
||||
all_objs = page.objects
|
||||
for obj_type in all_objs.keys():
|
||||
if "image" in obj_type.lower() or "xobject" in obj_type.lower():
|
||||
potential_imgs = all_objs.get(obj_type, [])
|
||||
if potential_imgs:
|
||||
images = potential_imgs
|
||||
break
|
||||
|
||||
for i, img_dict in enumerate(images):
|
||||
try:
|
||||
# Try to get the actual image stream from the PDF
|
||||
img_stream = None
|
||||
y_pos = 0
|
||||
|
||||
# Method A: If img_dict has 'stream' key, use it directly
|
||||
if "stream" in img_dict and hasattr(img_dict["stream"], "get_data"):
|
||||
try:
|
||||
img_bytes = img_dict["stream"].get_data()
|
||||
|
||||
# Try to open as PIL Image to validate/decode
|
||||
pil_img = Image.open(io.BytesIO(img_bytes))
|
||||
|
||||
# Convert to RGB if needed (handle CMYK, etc.)
|
||||
if pil_img.mode not in ("RGB", "L"):
|
||||
pil_img = pil_img.convert("RGB")
|
||||
|
||||
# Save to stream as PNG
|
||||
img_stream = io.BytesIO()
|
||||
pil_img.save(img_stream, format="PNG")
|
||||
img_stream.seek(0)
|
||||
|
||||
y_pos = img_dict.get("top", 0)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Method B: Fallback to rendering page region
|
||||
if img_stream is None:
|
||||
x0 = img_dict.get("x0", 0)
|
||||
y0 = img_dict.get("top", 0)
|
||||
x1 = img_dict.get("x1", 0)
|
||||
y1 = img_dict.get("bottom", 0)
|
||||
y_pos = y0
|
||||
|
||||
# Check if dimensions are valid
|
||||
if x1 <= x0 or y1 <= y0:
|
||||
continue
|
||||
|
||||
# Use pdfplumber's within_bbox to crop, then render
|
||||
# This preserves coordinate system correctly
|
||||
bbox = (x0, y0, x1, y1)
|
||||
cropped_page = page.within_bbox(bbox)
|
||||
|
||||
# Render at 150 DPI (balance between quality and size)
|
||||
page_img = cropped_page.to_image(resolution=150)
|
||||
|
||||
# Save to stream
|
||||
img_stream = io.BytesIO()
|
||||
page_img.original.save(img_stream, format="PNG")
|
||||
img_stream.seek(0)
|
||||
|
||||
if img_stream:
|
||||
images_info.append(
|
||||
{
|
||||
"stream": img_stream,
|
||||
"name": f"page_{page.page_number}_img_{i}",
|
||||
"y_pos": y_pos,
|
||||
}
|
||||
)
|
||||
|
||||
except Exception:
|
||||
continue
|
||||
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
return images_info
|
||||
|
||||
|
||||
class PdfConverterWithOCR(DocumentConverter):
    """
    Enhanced PDF converter with OCR support for embedded images.
    Maintains document structure while extracting text from images inline.
    """

    def __init__(self, ocr_service: Optional[LLMVisionOCRService] = None):
        super().__init__()
        self.ocr_service = ocr_service

    def accepts(
        self,
        file_stream: BinaryIO,
        stream_info: StreamInfo,
        **kwargs: Any,
    ) -> bool:
        mimetype = (stream_info.mimetype or "").lower()
        extension = (stream_info.extension or "").lower()

        if extension == ".pdf":
            return True

        if mimetype.startswith("application/pdf") or mimetype.startswith(
            "application/x-pdf"
        ):
            return True

        return False

    def convert(
        self,
        file_stream: BinaryIO,
        stream_info: StreamInfo,
        **kwargs: Any,
    ) -> DocumentConverterResult:
        if _dependency_exc_info is not None:
            raise MissingDependencyException(
                MISSING_DEPENDENCY_MESSAGE.format(
                    converter=type(self).__name__,
                    extension=".pdf",
                    feature="pdf",
                )
            ) from _dependency_exc_info[1].with_traceback(
                _dependency_exc_info[2]
            )  # type: ignore[union-attr]

        # Get the OCR service if available (from kwargs or instance)
        ocr_service: Optional[LLMVisionOCRService] = (
            kwargs.get("ocr_service") or self.ocr_service
        )

        # Read the PDF into BytesIO
        file_stream.seek(0)
        pdf_bytes = io.BytesIO(file_stream.read())

        markdown_content = []

        try:
            with pdfplumber.open(pdf_bytes) as pdf:
                for page_num, page in enumerate(pdf.pages, 1):
                    markdown_content.append(f"\n## Page {page_num}\n")

                    # If OCR is enabled, interleave text and images by position
                    if ocr_service:
                        images_on_page = self._extract_page_images(pdf_bytes, page_num)

                        if images_on_page:
                            # Extract text lines with Y positions
                            chars = page.chars
                            if chars:
                                # Group chars into lines based on Y position
                                lines_with_y = []
                                current_line = []
                                current_y = None

                                for char in sorted(
                                    chars, key=lambda c: (c["top"], c["x0"])
                                ):
                                    y = char["top"]
                                    if current_y is None:
                                        current_y = y
                                    elif abs(y - current_y) > 2:  # New-line threshold
                                        if current_line:
                                            text = "".join(
                                                [c["text"] for c in current_line]
                                            )
                                            lines_with_y.append(
                                                {"y": current_y, "text": text.strip()}
                                            )
                                        current_line = []
                                        current_y = y
                                    current_line.append(char)

                                # Add the last line
                                if current_line:
                                    text = "".join([c["text"] for c in current_line])
                                    lines_with_y.append(
                                        {"y": current_y, "text": text.strip()}
                                    )
                            else:
                                # Fallback: use simple text extraction
                                text_content = page.extract_text() or ""
                                lines_with_y = [
                                    {"y": i * 10, "text": line}
                                    for i, line in enumerate(text_content.split("\n"))
                                ]

                            # OCR all images
                            image_data = []
                            for img_info in images_on_page:
                                ocr_result = ocr_service.extract_text(
                                    img_info["stream"]
                                )
                                if ocr_result.text.strip():
                                    image_data.append(
                                        {
                                            "y_pos": img_info["y_pos"],
                                            "name": img_info["name"],
                                            "ocr_text": ocr_result.text,
                                            "backend": ocr_result.backend_used,
                                            "type": "image",
                                        }
                                    )

                            # Add text items
                            content_items = [
                                {
                                    "y_pos": item["y"],
                                    "text": item["text"],
                                    "type": "text",
                                }
                                for item in lines_with_y
                                if item["text"]
                            ]
                            content_items.extend(image_data)

                            # Sort all items by Y position (top to bottom)
                            content_items.sort(key=lambda x: x["y_pos"])

                            # Build markdown by interleaving text and images
                            for item in content_items:
                                if item["type"] == "text":
                                    markdown_content.append(item["text"])
                                else:  # image
                                    ocr_text = item["ocr_text"]
                                    img_marker = (
                                        f"\n\n*[Image OCR]\n{ocr_text}\n[End OCR]*\n"
                                    )
                                    markdown_content.append(img_marker)
                        else:
                            # No images detected - just extract regular text
                            text_content = page.extract_text() or ""
                            if text_content.strip():
                                markdown_content.append(text_content.strip())
                    else:
                        # No OCR, just extract text
                        text_content = page.extract_text() or ""
                        if text_content.strip():
                            markdown_content.append(text_content.strip())

            # Build the final markdown
            markdown = "\n\n".join(markdown_content).strip()

            # Fall back to pdfminer if empty
            if not markdown:
                pdf_bytes.seek(0)
                markdown = pdfminer.high_level.extract_text(pdf_bytes)

        except Exception:
            # Fall back to pdfminer
            try:
                pdf_bytes.seek(0)
                markdown = pdfminer.high_level.extract_text(pdf_bytes)
            except Exception:
                markdown = ""

        # Final fallback: if the result is still empty/whitespace and OCR is
        # available, treat this as a scanned PDF and OCR full pages
        if ocr_service and (not markdown or not markdown.strip()):
            pdf_bytes.seek(0)
            markdown = self._ocr_full_pages(pdf_bytes, ocr_service)

        return DocumentConverterResult(markdown=markdown)

    def _extract_page_images(self, pdf_bytes: io.BytesIO, page_num: int) -> list[dict]:
        """
        Extract images from a PDF page using pdfplumber.

        Args:
            pdf_bytes: PDF file as BytesIO
            page_num: Page number (1-indexed)

        Returns:
            List of image info dicts with 'stream', 'bbox', 'name', 'y_pos'
        """
        images = []

        try:
            pdf_bytes.seek(0)
            with pdfplumber.open(pdf_bytes) as pdf:
                if page_num <= len(pdf.pages):
                    page = pdf.pages[page_num - 1]  # 0-indexed
                    images = _extract_images_from_page(page)
        except Exception:
            pass

        # Sort by vertical position (top to bottom)
        images.sort(key=lambda x: x["y_pos"])

        return images

    def _ocr_full_pages(
        self, pdf_bytes: io.BytesIO, ocr_service: LLMVisionOCRService
    ) -> str:
        """
        Fallback for scanned PDFs: convert entire pages to images and OCR them.
        Used when text extraction returns empty/whitespace results.

        Args:
            pdf_bytes: PDF file as BytesIO
            ocr_service: OCR service to use

        Returns:
            Markdown text extracted from OCR of full pages
        """
        markdown_parts = []

        try:
            pdf_bytes.seek(0)
            with pdfplumber.open(pdf_bytes) as pdf:
                for page_num, page in enumerate(pdf.pages, 1):
                    try:
                        markdown_parts.append(f"\n## Page {page_num}\n")

                        # Render the page to an image
                        page_img = page.to_image(resolution=300)
                        img_stream = io.BytesIO()
                        page_img.original.save(img_stream, format="PNG")
                        img_stream.seek(0)

                        # Run OCR
                        ocr_result = ocr_service.extract_text(img_stream)

                        if ocr_result.text.strip():
                            text = ocr_result.text.strip()
                            markdown_parts.append(f"*[Image OCR]\n{text}\n[End OCR]*")
                        else:
                            markdown_parts.append(
                                "*[No text could be extracted from this page]*"
                            )

                    except Exception as e:
                        markdown_parts.append(
                            f"*[Error processing page {page_num}: {str(e)}]*"
                        )
                        continue

        except Exception:
            # pdfplumber failed (e.g. malformed EOF) - try PyMuPDF for rendering
            markdown_parts = []
            try:
                import fitz  # PyMuPDF

                pdf_bytes.seek(0)
                doc = fitz.open(stream=pdf_bytes.read(), filetype="pdf")
                for page_num in range(1, doc.page_count + 1):
                    try:
                        markdown_parts.append(f"\n## Page {page_num}\n")
                        page = doc[page_num - 1]
                        mat = fitz.Matrix(300 / 72, 300 / 72)  # 300 DPI
                        pix = page.get_pixmap(matrix=mat)
                        img_stream = io.BytesIO(pix.tobytes("png"))
                        img_stream.seek(0)

                        ocr_result = ocr_service.extract_text(img_stream)

                        if ocr_result.text.strip():
                            text = ocr_result.text.strip()
                            markdown_parts.append(f"*[Image OCR]\n{text}\n[End OCR]*")
                        else:
                            markdown_parts.append(
                                "*[No text could be extracted from this page]*"
                            )

                    except Exception as e:
                        markdown_parts.append(
                            f"*[Error processing page {page_num}: {str(e)}]*"
                        )
                        continue
                doc.close()
            except Exception:
                return "*[Error: Could not process scanned PDF]*"

        return "\n\n".join(markdown_parts).strip()
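The Y-threshold line grouping inside `convert` can be exercised on its own. This sketch (the helper name is hypothetical; the input mimics pdfplumber char dicts with `"text"`, `"top"`, and `"x0"` keys) mirrors that logic:

```python
def group_chars_into_lines(chars: list[dict], threshold: float = 2) -> list[dict]:
    """Group pdfplumber-style char dicts into text lines by vertical position."""
    lines = []
    current_line = []
    current_y = None
    # Sort top-to-bottom, then left-to-right, as the converter does
    for char in sorted(chars, key=lambda c: (c["top"], c["x0"])):
        y = char["top"]
        if current_y is None:
            current_y = y
        elif abs(y - current_y) > threshold:  # a new line starts
            if current_line:
                text = "".join(c["text"] for c in current_line)
                lines.append({"y": current_y, "text": text.strip()})
            current_line = []
            current_y = y
        current_line.append(char)
    # Flush the last line
    if current_line:
        text = "".join(c["text"] for c in current_line)
        lines.append({"y": current_y, "text": text.strip()})
    return lines
```

Each resulting line keeps the Y coordinate of its first character, which is what allows OCR'd images to be merged into the text stream by position.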
@@ -0,0 +1,68 @@
"""
Plugin registration for markitdown-ocr.
Registers OCR-enhanced converters with a priority-based replacement strategy.
"""

from typing import Any

from markitdown import MarkItDown

from ._ocr_service import LLMVisionOCRService
from ._pdf_converter_with_ocr import PdfConverterWithOCR
from ._docx_converter_with_ocr import DocxConverterWithOCR
from ._pptx_converter_with_ocr import PptxConverterWithOCR
from ._xlsx_converter_with_ocr import XlsxConverterWithOCR


__plugin_interface_version__ = 1


def register_converters(markitdown: MarkItDown, **kwargs: Any) -> None:
    """
    Register OCR-enhanced converters with MarkItDown.

    This plugin provides OCR support for PDF, DOCX, PPTX, and XLSX files.
    The converters are registered with priority -1.0 to run BEFORE built-in
    converters (which have priority 0.0), effectively replacing them when
    the plugin is enabled.

    Args:
        markitdown: MarkItDown instance to register converters with
        **kwargs: Additional keyword arguments that may include:
            - llm_client: OpenAI-compatible client for LLM-based OCR
              (required for OCR to work)
            - llm_model: Model name (e.g., 'gpt-4o')
            - llm_prompt: Custom prompt for text extraction
    """
    # Create the OCR service - reads the same llm_client/llm_model kwargs
    # that MarkItDown itself already accepts for image descriptions
    llm_client = kwargs.get("llm_client")
    llm_model = kwargs.get("llm_model")
    llm_prompt = kwargs.get("llm_prompt")

    ocr_service: LLMVisionOCRService | None = None
    if llm_client and llm_model:
        ocr_service = LLMVisionOCRService(
            client=llm_client,
            model=llm_model,
            default_prompt=llm_prompt,
        )

    # Register converters with priority -1.0 (before built-ins at 0.0).
    # This effectively "replaces" the built-in converters when the plugin is
    # installed. Pass the OCR service to each converter's constructor.
    PRIORITY_OCR_ENHANCED = -1.0

    markitdown.register_converter(
        PdfConverterWithOCR(ocr_service=ocr_service), priority=PRIORITY_OCR_ENHANCED
    )

    markitdown.register_converter(
        DocxConverterWithOCR(ocr_service=ocr_service), priority=PRIORITY_OCR_ENHANCED
    )

    markitdown.register_converter(
        PptxConverterWithOCR(ocr_service=ocr_service), priority=PRIORITY_OCR_ENHANCED
    )

    markitdown.register_converter(
        XlsxConverterWithOCR(ocr_service=ocr_service), priority=PRIORITY_OCR_ENHANCED
    )
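A minimal sketch of why -1.0 wins, under the assumption stated in the docstring that MarkItDown tries converters in ascending priority order (names below are illustrative, not MarkItDown internals):

```python
# Illustrative (converter name, priority) registrations.
registrations = [
    ("PdfConverter", 0.0),          # built-in default
    ("PdfConverterWithOCR", -1.0),  # this plugin
]

# Ascending sort by priority puts the plugin converter first,
# so it gets the first chance to accept a .pdf stream.
trial_order = [name for name, _ in sorted(registrations, key=lambda r: r[1])]
```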
@@ -0,0 +1,249 @@
"""
Enhanced PPTX converter with improved OCR support.
Builds on the existing LLM-based image description support and adds an OCR fallback.
"""

import io
import sys
from typing import Any, BinaryIO, Optional

from markitdown.converters import HtmlConverter
from markitdown import DocumentConverter, DocumentConverterResult, StreamInfo
from markitdown._exceptions import (
    MissingDependencyException,
    MISSING_DEPENDENCY_MESSAGE,
)
from ._ocr_service import LLMVisionOCRService

_dependency_exc_info = None
try:
    import pptx
except ImportError:
    _dependency_exc_info = sys.exc_info()


class PptxConverterWithOCR(DocumentConverter):
    """Enhanced PPTX converter with OCR fallback."""

    def __init__(self, ocr_service: Optional[LLMVisionOCRService] = None):
        super().__init__()
        self._html_converter = HtmlConverter()
        self.ocr_service = ocr_service

    def accepts(
        self,
        file_stream: BinaryIO,
        stream_info: StreamInfo,
        **kwargs: Any,
    ) -> bool:
        mimetype = (stream_info.mimetype or "").lower()
        extension = (stream_info.extension or "").lower()

        if extension == ".pptx":
            return True

        if mimetype.startswith(
            "application/vnd.openxmlformats-officedocument.presentationml"
        ):
            return True

        return False

    def convert(
        self,
        file_stream: BinaryIO,
        stream_info: StreamInfo,
        **kwargs: Any,
    ) -> DocumentConverterResult:
        if _dependency_exc_info is not None:
            raise MissingDependencyException(
                MISSING_DEPENDENCY_MESSAGE.format(
                    converter=type(self).__name__,
                    extension=".pptx",
                    feature="pptx",
                )
            ) from _dependency_exc_info[1].with_traceback(
                _dependency_exc_info[2]
            )  # type: ignore[union-attr]

        # Get the OCR service (from kwargs or instance)
        ocr_service: Optional[LLMVisionOCRService] = (
            kwargs.get("ocr_service") or self.ocr_service
        )
        llm_client = kwargs.get("llm_client")

        presentation = pptx.Presentation(file_stream)
        md_content = ""
        slide_num = 0

        for slide in presentation.slides:
            slide_num += 1
            md_content += f"\n\n<!-- Slide number: {slide_num} -->\n"

            title = slide.shapes.title

            def get_shape_content(shape, **kwargs):
                nonlocal md_content

                # Pictures
                if self._is_picture(shape):
                    # Get the image data
                    image_stream = io.BytesIO(shape.image.blob)

                    # Try an LLM description first if available
                    llm_description = ""
                    if llm_client and kwargs.get("llm_model"):
                        try:
                            from ._llm_caption import llm_caption

                            image_filename = shape.image.filename
                            image_extension = None
                            if image_filename:
                                import os

                                image_extension = os.path.splitext(image_filename)[1]

                            image_stream_info = StreamInfo(
                                mimetype=shape.image.content_type,
                                extension=image_extension,
                                filename=image_filename,
                            )

                            llm_description = llm_caption(
                                image_stream,
                                image_stream_info,
                                client=llm_client,
                                model=kwargs.get("llm_model"),
                                prompt=kwargs.get("llm_prompt"),
                            )
                        except Exception:
                            pass

                    # Try OCR if the LLM failed or is not available
                    ocr_text = ""
                    if not llm_description and ocr_service:
                        try:
                            image_stream.seek(0)
                            ocr_result = ocr_service.extract_text(image_stream)
                            if ocr_result.text.strip():
                                ocr_text = ocr_result.text.strip()
                        except Exception:
                            pass

                    # Format extracted content using the unified OCR block format
                    content = (llm_description or ocr_text or "").strip()
                    if content:
                        md_content += f"\n*[Image OCR]\n{content}\n[End OCR]*\n"

                # Tables
                if self._is_table(shape):
                    md_content += self._convert_table_to_markdown(shape.table, **kwargs)

                # Charts
                if shape.has_chart:
                    md_content += self._convert_chart_to_markdown(shape.chart)

                # Text areas
                elif shape.has_text_frame:
                    if shape == title:
                        md_content += "# " + shape.text.lstrip() + "\n"
                    else:
                        md_content += shape.text + "\n"

                # Group shapes
                if shape.shape_type == pptx.enum.shapes.MSO_SHAPE_TYPE.GROUP:
                    sorted_shapes = sorted(
                        shape.shapes,
                        key=lambda x: (
                            float("-inf") if not x.top else x.top,
                            float("-inf") if not x.left else x.left,
                        ),
                    )
                    for subshape in sorted_shapes:
                        get_shape_content(subshape, **kwargs)

            sorted_shapes = sorted(
                slide.shapes,
                key=lambda x: (
                    float("-inf") if not x.top else x.top,
                    float("-inf") if not x.left else x.left,
                ),
            )
            for shape in sorted_shapes:
                get_shape_content(shape, **kwargs)

            md_content = md_content.strip()

            if slide.has_notes_slide:
                md_content += "\n\n### Notes:\n"
                notes_frame = slide.notes_slide.notes_text_frame
                if notes_frame is not None:
                    md_content += notes_frame.text
                md_content = md_content.strip()

        return DocumentConverterResult(markdown=md_content.strip())

    def _is_picture(self, shape):
        if shape.shape_type == pptx.enum.shapes.MSO_SHAPE_TYPE.PICTURE:
            return True
        if shape.shape_type == pptx.enum.shapes.MSO_SHAPE_TYPE.PLACEHOLDER:
            if hasattr(shape, "image"):
                return True
        return False

    def _is_table(self, shape):
        if shape.shape_type == pptx.enum.shapes.MSO_SHAPE_TYPE.TABLE:
            return True
        return False

    def _convert_table_to_markdown(self, table, **kwargs):
        import html

        html_table = "<html><body><table>"
        first_row = True
        for row in table.rows:
            html_table += "<tr>"
            for cell in row.cells:
                if first_row:
                    html_table += "<th>" + html.escape(cell.text) + "</th>"
                else:
                    html_table += "<td>" + html.escape(cell.text) + "</td>"
            html_table += "</tr>"
            first_row = False
        html_table += "</table></body></html>"

        return (
            self._html_converter.convert_string(html_table, **kwargs).markdown.strip()
            + "\n"
        )

    def _convert_chart_to_markdown(self, chart):
        try:
            md = "\n\n### Chart"
            if chart.has_title:
                md += f": {chart.chart_title.text_frame.text}"
            md += "\n\n"
            data = []
            category_names = [c.label for c in chart.plots[0].categories]
            series_names = [s.name for s in chart.series]
            data.append(["Category"] + series_names)

            for idx, category in enumerate(category_names):
                row = [category]
                for series in chart.series:
                    row.append(series.values[idx])
                data.append(row)

            markdown_table = []
            for row in data:
                markdown_table.append("| " + " | ".join(map(str, row)) + " |")
            header = markdown_table[0]
            separator = "|" + "|".join(["---"] * len(data[0])) + "|"
            return md + "\n".join([header, separator] + markdown_table[1:])
        except Exception:
            # Covers ValueError("unsupported plot type") and any other failure;
            # always return a string so callers can append it safely
            return "\n\n[unsupported chart]\n\n"
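All the OCR-enabled converters emit the same `*[Image OCR] … [End OCR]*` block. A small round-trip sketch (the helper names here are hypothetical, not part of the plugin) shows how downstream code could locate those payloads in converted markdown:

```python
import re


def format_ocr_block(text: str) -> str:
    # Same unified marker format the converters append to their markdown
    return f"*[Image OCR]\n{text}\n[End OCR]*"


def extract_ocr_blocks(markdown: str) -> list[str]:
    # Recover every OCR payload from converted markdown output;
    # DOTALL lets a payload span multiple lines, non-greedy stops per block
    return re.findall(r"\*\[Image OCR\]\n(.*?)\n\[End OCR\]\*", markdown, re.DOTALL)
```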
@@ -0,0 +1,225 @@
"""
Enhanced XLSX converter with OCR support for embedded images.
Extracts images from Excel spreadsheets and performs OCR while maintaining cell context.
"""

import io
import sys
from typing import Any, BinaryIO, Optional

from markitdown.converters import HtmlConverter
from markitdown import DocumentConverter, DocumentConverterResult, StreamInfo
from markitdown._exceptions import (
    MissingDependencyException,
    MISSING_DEPENDENCY_MESSAGE,
)
from ._ocr_service import LLMVisionOCRService

# Try loading dependencies
_xlsx_dependency_exc_info = None
try:
    import pandas as pd
    from openpyxl import load_workbook
except ImportError:
    _xlsx_dependency_exc_info = sys.exc_info()


class XlsxConverterWithOCR(DocumentConverter):
    """
    Enhanced XLSX converter with OCR support for embedded images.
    Extracts images with their cell positions and performs OCR.
    """

    def __init__(self, ocr_service: Optional[LLMVisionOCRService] = None):
        super().__init__()
        self._html_converter = HtmlConverter()
        self.ocr_service = ocr_service

    def accepts(
        self,
        file_stream: BinaryIO,
        stream_info: StreamInfo,
        **kwargs: Any,
    ) -> bool:
        mimetype = (stream_info.mimetype or "").lower()
        extension = (stream_info.extension or "").lower()

        if extension == ".xlsx":
            return True

        if mimetype.startswith(
            "application/vnd.openxmlformats-officedocument.spreadsheetml"
        ):
            return True

        return False

    def convert(
        self,
        file_stream: BinaryIO,
        stream_info: StreamInfo,
        **kwargs: Any,
    ) -> DocumentConverterResult:
        if _xlsx_dependency_exc_info is not None:
            raise MissingDependencyException(
                MISSING_DEPENDENCY_MESSAGE.format(
                    converter=type(self).__name__,
                    extension=".xlsx",
                    feature="xlsx",
                )
            ) from _xlsx_dependency_exc_info[1].with_traceback(
                _xlsx_dependency_exc_info[2]
            )  # type: ignore[union-attr]

        # Get the OCR service if available (from kwargs or instance)
        ocr_service: Optional[LLMVisionOCRService] = (
            kwargs.get("ocr_service") or self.ocr_service
        )

        if ocr_service:
            # Remove ocr_service from kwargs to avoid a duplicate-argument error
            kwargs_without_ocr = {k: v for k, v in kwargs.items() if k != "ocr_service"}
            return self._convert_with_ocr(
                file_stream, ocr_service, **kwargs_without_ocr
            )
        else:
            return self._convert_standard(file_stream, **kwargs)

    def _convert_standard(
        self, file_stream: BinaryIO, **kwargs: Any
    ) -> DocumentConverterResult:
        """Standard conversion without OCR."""
        file_stream.seek(0)
        sheets = pd.read_excel(file_stream, sheet_name=None, engine="openpyxl")
        md_content = ""

        for sheet_name in sheets:
            md_content += f"## {sheet_name}\n"
            html_content = sheets[sheet_name].to_html(index=False)
            md_content += (
                self._html_converter.convert_string(
                    html_content, **kwargs
                ).markdown.strip()
                + "\n\n"
            )

        return DocumentConverterResult(markdown=md_content.strip())

    def _convert_with_ocr(
        self, file_stream: BinaryIO, ocr_service: LLMVisionOCRService, **kwargs: Any
    ) -> DocumentConverterResult:
        """Convert XLSX with image OCR."""
        file_stream.seek(0)
        wb = load_workbook(file_stream)

        md_content = ""

        for sheet_name in wb.sheetnames:
            sheet = wb[sheet_name]
            md_content += f"## {sheet_name}\n\n"

            # Convert the sheet data to a markdown table
            file_stream.seek(0)
            try:
                df = pd.read_excel(
                    file_stream, sheet_name=sheet_name, engine="openpyxl"
                )
                html_content = df.to_html(index=False)
                md_content += (
                    self._html_converter.convert_string(
                        html_content, **kwargs
                    ).markdown.strip()
                    + "\n\n"
                )
            except Exception:
                # If pandas fails, just skip the table
                pass

            # Extract and OCR images in this sheet
            images_with_ocr = self._extract_and_ocr_sheet_images(sheet, ocr_service)

            if images_with_ocr:
                md_content += "### Images in this sheet:\n\n"
                for img_info in images_with_ocr:
                    ocr_text = img_info["ocr_text"]
                    md_content += f"*[Image OCR]\n{ocr_text}\n[End OCR]*\n\n"

        return DocumentConverterResult(markdown=md_content.strip())

    def _extract_and_ocr_sheet_images(
        self, sheet: Any, ocr_service: LLMVisionOCRService
    ) -> list[dict]:
        """
        Extract and OCR images from an Excel sheet.

        Args:
            sheet: openpyxl worksheet
            ocr_service: OCR service

        Returns:
            List of dicts with 'cell_ref' and 'ocr_text'
        """
        results = []

        try:
            # Check whether the sheet has images
            if hasattr(sheet, "_images"):
                for img in sheet._images:
                    try:
                        # Get the image data
                        if hasattr(img, "_data"):
                            image_data = img._data()
                        elif hasattr(img, "image"):
                            # Some versions store it differently
                            image_data = img.image
                        else:
                            continue

                        # Create an image stream
                        image_stream = io.BytesIO(image_data)

                        # Get the cell reference
                        cell_ref = "unknown"
                        if hasattr(img, "anchor"):
                            anchor = img.anchor
                            if hasattr(anchor, "_from"):
                                from_cell = anchor._from
                                if hasattr(from_cell, "col") and hasattr(
                                    from_cell, "row"
                                ):
                                    # Convert the column number to a letter
                                    col_letter = self._column_number_to_letter(
                                        from_cell.col
                                    )
                                    cell_ref = f"{col_letter}{from_cell.row + 1}"

                        # Perform OCR
                        ocr_result = ocr_service.extract_text(image_stream)

                        if ocr_result.text.strip():
                            results.append(
                                {
                                    "cell_ref": cell_ref,
                                    "ocr_text": ocr_result.text.strip(),
                                    "backend": ocr_result.backend_used,
                                }
                            )

                    except Exception:
                        continue

        except Exception:
            pass

        return results

    @staticmethod
    def _column_number_to_letter(n: int) -> str:
        """Convert a 0-indexed column number to an Excel column letter."""
        result = ""
        n = n + 1  # Make 1-indexed
        while n > 0:
            n -= 1
            result = chr(65 + (n % 26)) + result
            n //= 26
        return result
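`_column_number_to_letter` implements standard bijective base-26; a standalone copy makes the rollover cases (Z to AA, ZZ to AAA) easy to check:

```python
def column_number_to_letter(n: int) -> str:
    """Convert a 0-indexed column number to an Excel column letter."""
    result = ""
    n = n + 1  # make 1-indexed
    while n > 0:
        # Shift to 0-based within the current 26-letter "digit"
        n -= 1
        result = chr(65 + (n % 26)) + result
        n //= 26
    return result
```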
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,79 @@
%PDF-1.3
%“Œ‹ž ReportLab Generated PDF document http://www.reportlab.com
1 0 obj
<<
/F1 2 0 R
>>
endobj
2 0 obj
<<
/BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font
>>
endobj
3 0 obj
<<
/BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter [ /ASCII85Decode /FlateDecode ] /Height 80 /Length 4282 /Subtype /Image
/Type /XObject /Width 400
>>
stream
Gb"/k$+*^]+31jd1_Sc48j,Pi+@:`R01h=9+]FPXQDmE0%*Lb4@[Wi36jU!;cssJbQ5,g%R?K'+$#.h<qu?Z`Dn#2Gqj`$$\bE9$XS)%of4Vd>cT_6mF8#7^^Y_6P]N!%L#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j4Z3bU/Gm9s<T86G'Ht,"?C)(`3j[rI\Y+2%=-fXJAVq>eJc&9=)%mQH&Yh<W#b:QHSf&hc5bR6(7?1RO+2U+,2j=JTJHV/n1*m;JAGbDu[IX.pg30l0S'.0*($<'u;[b/GRDJ[J=c-W0HaX>?jIi$uR^]%u?lJ6Z*VV,Z28T=.3[G"!N]2!6iqW[_CVOQZ9Um#Qd)&t%d!r@Y0=g5[M*c.,qcc"UaVkc?<W;kud8W>KHVDN2.L1\=-s4#kKB4PQPI/e#*[DZR^Y$^Xi6K`(0^so>.#p<K8pN:ein%PZ#Y7*R@MH861DpLo<$rl6(H_8?WBC8!u2*l:OF52PthFZhN<gX0$3^m^nt74tEo^W)lR[oP32CBgdV=rlCn/?MttP@YH\'hL=3DKjK>7hgY.jk*u=UQX(s.K*Fd*WL[eV0^25,*fS$V)Q-Z/Ii`SFMMDe;iM2HMUsg37cCLf8L+b)KY6rDWN#jR5YQ.1R1gba'M2,4kCR4578^=b\Bn#r]R"h8?'u=7,fh/#GD*$m:^@g'YaL,g&OD*(\9V3qM@J4qIp#mRhYee0oG3^KQSpS`k(a)%\\KrFo]NtW+D/curY.W3&31Xd58H(q0_cISK<s@\A@tq[3b*;)prprplo?NNWP"U[H"mT)":p2cCY.e)nGK>lIaB)]QNP9.mWE2l3Yi$/1lIFGq?b"J/%A'=e_4!7DM'qA.H+Eb6+$Wn7]h3.+R^:;FLJDKh6Z0V`KM>R1?7!q\hg`Vs6PqO%XsU&A'e9Y:\qjB8:9X.p&5omkN]TihU"VSM0Gdu%IekLW7Z.T+=gZ8?G+)N`8D1/:))EVV'V%>@o^?^e2`FI#RXkRcVk5<aX)Gb<Anp($Eo(tDRA['co9=J!r?\(k+3obVpgT(rh)[&!*=m"?fb<WboEW]&a*9n`H`'s&IkQkBo"rK"ncnu1$k!hAk*UR=.2DO6/^buFp=jJKa-\QsgioA9CY[S7mOaK!FH$moNAmYB\)0A"En%&3d-/qKIR!Etjt(g9!ZEncRkEWkI'W[)AEhgn9/$+_o'-r&1h\!<m^U0S31\a,_&Rd"r.'Np%gT9f8?\3>1O$"8"QQu<lL!5$/k#`#:KEulTq4-7/YE)\g=Sb$OCcUO)BHRhQu09oJWIZkkosFc_B+&7'c_j!")cW%]@9&7fl>'od45V^,W"[b:mfhGls_c]o[:8?WXO3sS%ABFnD/;VaJp_j5H4#BX4qPO#9Rd2UQp3!11,Hp.<#+W;VjHWUp(CD>tmalGRY\Uq<)U7[bb1;!ICRCfbRQSW`R7B!G^;uZXfo5`5U7D;,E%89G,#:1)%4QDA%S)!5IL>R;C=R4rV;kBDYiJMaSpYGjR@-K1X!l^X,hul.*@fk/SRgZtX.(?1#F1Z,3(>l3p>i<Pf+sbdtG`]h4ZQR\8ke79"3MknReA"?c^RDe@Fjk1.cu5MjEDpdNhJ+7mGf2KIY:S*Y\2Cs77pae4B]4nt\0_9hN"cX*)6Y95CMGu-b.h<l/f)o087<L+.ZYQN\"^nJd(2$BIfm6^Z>hKjO-5DXhZKVbK,R$;#1+h5rhZ?WW+cIfDMR5M:U;WJk^U1M2V=1pp;4^.2,/RU"b7N@$b8R4LOO?H"DR.Lf`L[*m,BTYmDZ_t`L-M$_)#8#p!)[O706GPi_l#Yq>cO^MHRc(Jp:hO`,H*Y]jp")!6$Iu21q$\8nLN&Ju<?TEli:_c^Fu;7mar?jW@Fi5=&+@XX3Du$Vp!Z
:kp'-MBe4(Gq5273Z*<l$oQj,ndL:>:,=6H/*LPHo45Js7W8j$_!Qm0FH1P&^"`>@W4%?`Nma<X,sJlXF*,/9?-'cJp]Gl[CD(*jN88AiD,rcf:jl=)$?G1A+QH`L1Y,qGh381N!)?4VfakRqR\de*W_P5=i_rQ88,Nf"08ju'!L3:gtBn9tR`<1O'UehuL-ao(I9mcdD[iu:\EjK;,iTiXhVd0(hgkW_rte\s*ID1Wu(.MjQ`-_-:KRW+1tA<S?3r8>E^)_qfq#N;4tr+%k&Ep8k#92@_4?NnV=N@"8F,!hg:if"abZSI*B&dFMB&j8pk=5i_MJAeY/_a-bBH!b7VKr\Kt#C"Ke<_A>`"`=AC>VJ=jpNj/XAJ.8N&11/:hfIr$D^^R2#qRLKK:(9GU8"CB@_;$5Fq-q:K0TBPN]^2`GM'aEs1Y+T=D'>N2JXWoc8.%IYO^gsm'1RJSeGm+YDRQhLku5aKi&&h'k:Ae':8oK<la[fL\k0;fH3(LIfJts]t<l4*,ri:knWWe!M_E[M,&V9JH2`"=)ml_1[8!OOU7V,rHd]X#^@U_hK>1_Fu*NH]a>^r>**\J#14;Ei@8Dd[B!VZ.j64i(icM@UQ_>]1i+QL[q8@sXNl,qq<0pH2r<c]E5`R>K@bgt+3u4X[=5N,XXpe$Pa+h/i2Ns+!9@kBH_P,uQG__S.W7M^frRPr4EZHW;p0Je?#:'3`%IWs^jMgsS>TFs]-96.iKS'H_`---RRk+q]Jr]FS(In4Pq-F!6Cm%,U[%@0.OI2<<)q%YS\L]"SQrA8jisi-Yc]j.NcUR5eZO4@bV<6:Q<7Y8Tbc.:)0RB[f;uae0#hXi-F,V+Y7!Mj#7a2'<d>UX7up@?R.l5hdJ`J2qIRW9l3nLb6mCBmOi<W\odW.t='rA%`7sRbXB5/RD_LA/<@gLr;i'i3jlV::Z3F&:]ir"sAd&Y6P"h>gnWA-O?D51eitk>F^2j&Iq6CcN2Ju0jXH_V;7Z"7$/f.cVY>Mu"+'&]*\$$EFH_au5?=QCNV/dCcC.k5.]`boT#$n8q"$7k7cbB=_S?6!sI(ERNS%rY/q#(V&?"M=dPp_pD^a<mS>iJ84-qUUOnpsEBD(@=c8&j(fD<_iW8Y:1]3'Z*lk814$BMEn>20Z3q9`%2[odf^kVG8_KfBHJTq!iP-bZf!WUjfi-Z(mjNh$1Mk%I4bUXT_KbmDgHtQ7Z[%/U=`ol;d(+7MLe^9J.%pG(>?Z7,R7p2_!_Qbsrj-nZ^jp1P_<pXaK9!2C;b6ck"=Cj_ThjdJFfoo]T[$FN[^)H53%_>QETt1O4#d3il&h>-]FM?E7.3BltXHM-bbl_r^C;;uMGdf.Kh%L(0?a^%V$SMIKn-g/OBA,Ng_8qOt.G4*;07b-d&^'[LU$f5ngd%r-XNimO'c=1SVor0:Eg?<1-k=*lR5.^@!L"%EH/XBn&hq=*'_o;%t#(A>I6JN':Wh5=&pRCU'1C"15l6HQiH<#l)E>c9A33g31NEH\$h]o'o:W53E#msr(FBMb0g*jP1nCIbQ^<-?M19Kr3mq8.j:>;q*:p4Rb"@"DU#`i.DU&`=Vn-ANGOK'T46_'jF^$R0`j>ib(E*\_<8o*cItM:B3D-9Z>Of29HcT0]Z'G'co.PNW`2:qpYXp0-36TIRP-&3V+PPe>^kkuHt*7[/f`Z?74q^`DXV.TS7@]I@7J7#?[&(&hPL%\629`r50o^;oKq?P9#!l9@Fff9p3njK2nUHBg!&A`c[uXD61%4M,a"/_P#gZUo)#L[uI,Q>:BQkk3P?Scmo]DXk])TLK"NX2u"><@[CElgT\uF2.fcn<iiPL)@TrV2\AYDo>2%@`(OZ<M6L#'7K_ZStJZ)]&Fp39s]tR`?'J?rE-I11YEH*I?3FE.#D8]B:lU#l-Q&"X6RDb@GL2>K[lYeY=buQU?HWK8[#]q-;`G(]:<Ao2d9eEHWd<E81SK4JM$5dbM`T<KnN,YjlTi4>kV>d%&i?1&P=
:i,4>V2MnI*kV+_s8='X"H,gcL;Uo:%-"-M]-mmX/gFJ;bSiNq;:Y3_r5g<a"7!Y]Bk,;T:p3c2CBn/b6lYENkm?LZ[fW1tg10cT`#9kR&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j46GWU$GHRA>~>endstream
endobj
4 0 obj
<<
/Contents 8 0 R /MediaBox [ 0 0 612 792 ] /Parent 7 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject <<
/FormXob.0315aed9f6006a101b3226a3b7404028 3 0 R
>>
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
5 0 obj
<<
/PageMode /UseNone /Pages 7 0 R /Type /Catalog
>>
endobj
6 0 obj
<<
/Author (anonymous) /CreationDate (D:20260126172022+01'00') /Creator (ReportLab PDF Library - www.reportlab.com) /Keywords () /ModDate (D:20260126172022+01'00') /Producer (ReportLab PDF Library - www.reportlab.com)
/Subject (unspecified) /Title (untitled) /Trapped /False
>>
endobj
7 0 obj
<<
/Count 1 /Kids [ 4 0 R ] /Type /Pages
>>
endobj
8 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 260
>>
stream
Gas3/9kseb&-h'>I`6Z84fgHmCc;"L7g6_e&889#h,kA$Zt,m0Hdcho6>O[sLZ+YF+:QDRLY`5CAhdUI=MeslW_fp84Bms2r(UspMdQW.jtWA9rW?q[M1*5b[XIYc1kOQ$55sEf7La^q2$a/'T.)S#<V#*e,['$SVK^(f9:,Nq;AW\a?Zt7p:RM+pHF)-4F;E;l5ui'$5;T>HA_.,@?H2a/)Ol=NY+4r->>:n6'/ubPg6GC78<Gb)GJls9>QKuE<U0~>endstream
endobj
xref
0 9
0000000000 65535 f
0000000073 00000 n
0000000104 00000 n
0000000211 00000 n
0000004683 00000 n
0000004939 00000 n
0000005007 00000 n
0000005303 00000 n
0000005362 00000 n
trailer
<<
/ID
[<5d5eceaa0d906ef66e559ebcd616f18d><5d5eceaa0d906ef66e559ebcd616f18d>]
% ReportLab generated PDF document -- digest (http://www.reportlab.com)

/Info 6 0 R
/Root 5 0 R
/Size 9
>>
startxref
5712
%%EOF
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@@ -0,0 +1,79 @@
%PDF-1.3
%“Œ‹ž ReportLab Generated PDF document http://www.reportlab.com
1 0 obj
<<
/F1 2 0 R
>>
endobj
2 0 obj
<<
/BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font
>>
endobj
3 0 obj
<<
/BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter [ /ASCII85Decode /FlateDecode ] /Height 100 /Length 4720 /Subtype /Image
/Type /XObject /Width 500
>>
stream
Gb"0VH#OJ:qoA4A'Hn[Z$4u82K`j4ZR%PRX#F(qaK&V?K8<b&(;#dar'M8Oni!CfV<*>q$(4m``.P7J2`EVk#7#VtC6:(*s$)76DC7esMC3FfEP21e=J%f9>Up;d>4l&7WcZJp*DIk*ozzzzzzzzzzzzzzzzzzzzzzz!!#9dqY/lsQRl8pCtPt8mFiS/o[0Y;WL90Bj2R)5Y[N/Gfn0f!(^fES:Hsio97D?(Xn?5jq!"]KU?6X;&P"ZofW]f$p(q"VdAg3IP#\)UWg`=MO$9TD0>@3js(id(lnKeb'nVdJ7e;beRl:iu3cs8nIEn"+F\0s:]mE81*hAmoIah4b[;OfHn`%MZXAic<H:=+SS^nBEXOMIOI2D3Y<1k'jGjr"Mb7m#8djq#sB[J"W0CSVhD_EOgna,8=^]'+;Q/-Q29o7K`^M3`IrI+P7M:JgD-;8@g?QuaS(L!S'NU#&pVmQm&;+ooep7=EoT&nD(?bbsoCn2i[<[UJujmhj_M<u#m'ah(clITBmFV`P'C`bB@K_BHIO[n\%kN(^*?b\d]&LhJ.U*@&CM]/WZY.jbti9?:_k*Y>(J)7jH@XF(Q5('jYY[u"DM\[nu[VacM!sePdg%40X+-%@'f%P<0baG`E4oP$%Q/JgWmY\Um466eV$?Y)Q9)nh\c]Mq3f`',ShaFWth-oBcOd=q3cTY"+4B:C5D=,9ZUiS/hg5kVA4*K+[1fUQ5bbN_se_nC2++F"$]j*.j.s0:>;7?2OB:nigYB[?_a,Ulb<cmfe?*!8BYQ*mgYI_'G:#:9hc!%'nIu9\#K.OFt>Aq1i\Q1k(X_QRsW=DbSc%F'bgtZ/2>e+n:Kbn'oJ*sr;^;r/1Z+Vo?T!W4\7L>qdS!IH-WeB#r>ch5>%TRL&'caJA=^o?na4Y*tXgMf5H"N01jlPU;HhZ+Fp?gW#Ip5H[Y;u'ao8X`t6%]C0Gr*LD?+V"4C6Y<]^1GKRW1+$NV&M@2Zk6GD=d]J*qS.+7cN!h6O!e)cfIf(11iD*Y7*8G.!Fu"f5Q5o`Fk=$<gK"E?j,ZG(Q<S6H:\dHiQ8`a=>Zb+\X]m_AEjKB&FH%hV\FAFmKDpQr8j8Fc:%F6j4nD>GFYUI\FW`_flD4$L7>hqmCTh$Uf"^_*P&AuHA>O*GCsJP2O$EX=EQ9*O\8c#4&JZa4ARj7@85!.BB?cmQFmIWVr;9Tt>%neNS8ud$:Htthg-nk9OnPg[;$TI`>0jlL=-.a+_hJV!J^kl(gQd$+PUS\<me#jih7@a_WGZZ(.4K"nV+[.iR&jScFk0\%2K,Zhh0o%R)Nq/WF?l+\*Dd3krNhA#g[-\oC]W"'hnHd4_hLeQjHEBn&n644.4m-ZIetY!]Maa6"3/b.DR^j3BPL<$4Z,)s+'5XPm7A'P[OZn]2^N_0O[lEQSs)o18I6(3r!Qc+D!gf8aN/&O]Qq2:ob7jW;;79<m7q;t-1bA`T7?icP9s!jiRtHa:-1$`1XkWli8@t0Uu_-o6OtUq>"[U''if?a#-Wq5c7%ag\H8!"ALF'oU?-RQD7B?-GJ]">g6^'>CFf@5s8D[^<m$#K27ioe<`YMIH[l(oGML?\W`P:J[(9Uijd"PR!k=->Y$F-4CB"/,6\Z#s5Dlg2HM"F7VJkA+mbDo1>N-;k32'EW?6)(KY`B.a^\mY\PAK*gH+(7uU^F-po`(RMK06EPHHdD0;Bn\le,c[Y^V3lFa4(IC]mFtt!K@dP<o88m]iqJ+@)2E/k&hb!@XF*Gief89[^t6$$47L+p[?u]I+odK?/mT/=%`2+)fO@AogV9q*r!U1m1g?N]&Ti<e"o\RXmOcGj);^2<k\(R>\obnltl>ED4q/%Sh>o`U)Q4>YWf)aUl].\e=Ep??@2&sT>Dj(+.kQ]\905L.Bs_jQMg@#5Ad*SY8q<&o<LoLF#$U&]1eeYfg_3L=pCsBn2Zo8@8IZ+Xg>-8CKR8V]"=q/;I!IC.2@XQ-aimNpYWG+0>@4U4t=ghn%S,KY6
Ikm?-DX7<>iL_CI@3\nVAbo+bpOJCAW-_HQp]RX&=4gH(-^/Z@s5UCp8LSn\c))=o$*Q*<M^D^#4JMJtt=95Q%_uUo1-F7q-h)d`H"f/Jt%*3B9)\WKo2Erql0!qemE![bE'dH=+t)O0/c#?9IC9XEPfCe/A/_qsP1I:Gi93mAc2E?t738#p#IUW/S7Mm!5=<`aIfEM<Ys2>e&.Y0ZhJX-aq'tbOsIoYE.k=J%d;VB:aB<bI_rblEfBj^p1R=K*Hi'nV;J/+I,YT[]<3iSq0j"_fD5.GHQ9sRi?Hm3c3S-p!79pR,Q`;q!mC0XK\qU5$G/?oAYN4XKK9![O9M9;(JK$g@P<;orgiE)Wd/_e6&iR=?W_Xldsm><h,SP+L,3L,ntesb+_2WpLI=+=Q*1F"S(>qmjXiPmbHLOYe%@qT_T#OK#H+:rVJ+)kHmJLjHDqC"3ei1+I0_7b&fC4SN?G-:Hn<P`F4_m)X;X7>C^ac96r3OHOcmBeK9_BW&gqhjl7$/j4:&TqtBm]_@&#AP,Y']*Eo)'ONPAD?/7inL-[;Y?u405b]Ka3.k@s]eE(j,^\#rI[BQU.aF>#4B$F4/opmSM*F5.or8NVf4NVD[_hmc;1iLl9739e\*dBrn2.Z2*I#rOpSJRG>K>mOaX&`@A]5&)7CQehdOsNbC>B_D5cTCSXB*di92jW_WX!=EhTKNR%fVt.%Q<$j[iSM!0d6/k`CY(3)+XeI\r:.i,Kg1O$IGhnlT&gm[L0VDFcUFbqACJhEm'4S\ChfI]q:mQ"ZL[OBmJ_6*\'6h<$$-W"[^F9VSm/"@Ys!-Y/kBOeN9p]O$ui,lR;]Y'fWi?-gh&>)baIM5;iRQYFQq5Me#,uCc8P?h`0K;LQ;G)X')<<ZlIDq&LLPU^bo=&gcErEUfq:W`I+ft!GF'4,DQL*1IX_:-FmGd!pPJ9q(G?8P`t=nl4**0bl/9C1$Pk;?WMl,4lD^\UVMQ6bn$qBfs80*^[?ET$I!^-aH!XgKQFCSW7VR7m>c6G0oT/C6@E@rs_sQ]md5bQ1:uFPQQE5J6oF@\//u>D@rl6i0Db`9"CffLREne*h9ea#&lL)T6cQ+8d[]`uKp7-3LZ,`(u?(+fr>(mI*G/\k-YJ:uXP.0=t4*2mZ-eQ't.M\@&WZ\RXWp+0AS>cW33cqTe`:fXp?:r\D9j`52V-"$nNukEh^\mZGUTX8#HKq+^Sc;g;39(Dp^!D$\>%6A^eKD`,3r`Ehh.Y<QB$Hd6Dn\4V,K$O2eQ#]H'IHuY<'.PWh7M8sFIp`W<?e^(pf-sk`:cWX(0Rr5S=GEL-Ru1K?[mL]^3qom@^04`'ab3DaOa@KMi0rX@XE^O>K:6c7M&12fk$N'7q-hiZ)75?UbZH"N)8kd3WGbMX"P/.K^RR%.rqaT@h#u_AEs1)jILMOBks:._,[m*eQ/HMh/2T8\^k5p-Gbk4:UO]EUnspPj)`O0k>MR,M8scu:M"<+[OYCfCtV].VbWfJfu8q0hWVoO!s]=g_kY<KG3'BXc*o(K]QH6CDr&"TSejAI&r>p4`r`?FMo`BP7H7X"Km<=Xfhj^&%sjRIEf&?W*]uFIg@FfT]->7T*G\=GA%PJ<HdnOVSmLf.+KHmSZ$jZQ*BecC<(3<9grri,I2*)b1:SDnGU+d]Xg6MY8$T(:.4?Uk`u[BiGe0Oe2f<H[Ue1=Kh4riO*S$)8!@o+s?;W"![]<f%`Y5^Zc;'ok\LTFOfW\3I*DUf6[:**:Q92N&d_']\[d1Hqldmd(IV"Q.V:G^=4P&F/.]W6G(>cB1O#e,SV5<H8f\>>$gU<+7PBoDYDti\Up=RQ$_FGu!./Yu`]tGE[]1Xflr3@X<>TH,QPDG[k\(f5EN#4:dBj9Dtm'hE@kFId$O4XJRs#<fi\gX(P(\HEsYBB9>>%h8.(c#WXs*h$)D\#t'aEg:?XOs\SA`#9^5CU9:p<JsU>L#D+>hdhTPki]s+6c*j=3>f=V>+D"=D5gHfUbY*f!
X/5kZq(aU0s.TSSbiDpX_$Rm57L'='`/+(t;Rbo#i[rD<hl-MMd9XiU7<R]U8H\\)5nGm!GTqIWJoT^k&3K2m'gnqWfVra4/mgQ`:tY5PX.=H\ipm,paod-T=!9ie-6^rsOM%b,=gWO8LhMekGg^s513Ue4#ZT>@pYrm#1Im](Qt<AVg4pp1hYAJ<c+q=&_b]DnkJ,HRr<'>$A+9]t/CS)@C[llo,Jtei=]'$rSMO^!toPHeWb/:-J8LrU&LWJ)\^W(Lh_BC(Sq=]a:sW(9CiU>+L>jbY2:SMO\P;Zl(Q*_#4$"rP+Wa'D/kYlP9isqjdB"Z4^4CMs\(rfP=ldGLb]=a4-[40"Pb'F3QT9K-?3m2,`JlHL%\1Ij*8m=o$^)IJ``GG>8o)=:i+tB%*VOA&aJ4rTY`4<mp,AAS#lUS&fu(ONL&D/#q[E-aS3rEpZinTItFX7`LZA;mpPt<aK*M4`L.JdGj.pL%3[B<0^9=Js@ifcC6aG'JXr;^#lG4Z!G16L%!Kgc\r_t,%&S@[KH;Sbh`b-,X:&f!<?*P^juT/EcNB$W&AtP0_oZ%$360Lace*-_EY$)IJ\1l;I3ZnIJS%;Cu2i#mbPJcDt*f-eZa,X:4"Ls6%]AE=]p#qGtjbd%>DQu\n93U_d,;'5W.rd^OPtDft"Z(lDUVVUibkLA^[AGcHdpAzzzzzzzzzzzzzzzzzz!.Z!\J#>u+ci~>endstream
endobj
4 0 obj
<<
/Contents 8 0 R /MediaBox [ 0 0 612 792 ] /Parent 7 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject <<
/FormXob.41b05a9cf8679f0fe6e7c30c9462b767 3 0 R
>>
>> /Rotate 0 /Trans <<
>>
/Type /Page
>>
endobj
5 0 obj
<<
/PageMode /UseNone /Pages 7 0 R /Type /Catalog
>>
endobj
6 0 obj
<<
/Author (anonymous) /CreationDate (D:20260126172022+01'00') /Creator (ReportLab PDF Library - www.reportlab.com) /Keywords () /ModDate (D:20260126172022+01'00') /Producer (ReportLab PDF Library - www.reportlab.com)
/Subject (unspecified) /Title (untitled) /Trapped /False
>>
endobj
7 0 obj
<<
/Count 1 /Kids [ 4 0 R ] /Type /Pages
>>
endobj
8 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 250
>>
stream
Gas2BZ&Z[T&4Ckp`KUTrY_02PMb#<CFN=Wfj',kM@19sp55uUe"pptDD)Los"F*-#r%7t"K39EA8f/'^$OO.*D:jQe'n<f:3Cq8'p9Rm8qll,u+[sQj[W6hrFQL%\7G?"sX/%4LXYeUkIBuT`A)Y3?=ouE3GIShId3E("2qqVte.E2,r_bJ%q1G(F,@9C<XiC-L`O1W5it(MP9X]^nj..r=,_#ecrj!ceT&ATWd4)p.7/d!C@/gP%;p#~>endstream
endobj
xref
0 9
0000000000 65535 f
0000000073 00000 n
0000000104 00000 n
0000000211 00000 n
0000005122 00000 n
0000005378 00000 n
0000005446 00000 n
0000005742 00000 n
0000005801 00000 n
trailer
<<
/ID
[<38bd217c814ddf937f148e537dce51f8><38bd217c814ddf937f148e537dce51f8>]
% ReportLab generated PDF document -- digest (http://www.reportlab.com)
/Info 6 0 R
/Root 5 0 R
/Size 9
>>
startxref
6141
%%EOF
@@ -0,0 +1,139 @@
%PDF-1.3
%“Œ‹ž ReportLab Generated PDF document http://www.reportlab.com
1 0 obj
<<
/F1 2 0 R
>>
endobj
2 0 obj
<<
/BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font
>>
endobj
3 0 obj
<<
/BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter [ /ASCII85Decode /FlateDecode ] /Height 80 /Length 3374 /Subtype /Image
/Type /XObject /Width 400
>>
stream
Gb"/kH&U9Q'ZR>43%?O'&jT85=3qe+N1f0n5S02hEQ*VDL]ZS)-n8)hB\WWtW#6!n.\\/*',>!9/rH$h#V%%<D1W$*'jLaXkBH(D['+neI4_Y@WVLF\J+T;ghR7Y(mQ%b#VGJrM63n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&/S]&BhHiT>FDK@e$Z6UXYgOsk+@,I0-;pc?Ln*mjmTTlQ+?K]e$#D.eB)O5NG7.u<*,RJ_p,8ck3p(&R=G"JnR/3XR:fF1m&!LSDdTjBc:RULFd1\@RVR9Y?IMP'Ajfu#dnfOdRMIRU5?(oCkRY2lY[PsI[bY!Lo7-qefWg9>U_8tF1IHqd?[hMs%mDBpEmdN`p:`8WC.(r[:;%"2F3\c4Mk792G,A7iab,>+^X:Q1B)Ct4JU?`lHM9=Q*\(H$'0?&(T*=du53O+iUIN?O=m=In-UqEE?ghWKl;bQuY*`RG[FO81(L.M1]2c_&%<`*Rq.JU[hs+4A7O4D[l.+Bm#OHs>Bg2RPN#ZQX55r(P>4!naFjcC+c`URpH'c],PFQUH`f2c]IHB5B)pF^[%Q[/+G3L2gM0I)]FA`$/.TW]/!sdN]/(^i(\!C)"CY$!K4SDm&fpLsK;:U@[oi*BN6O,DaRecA5ZZ2bqqSNigQne<3\(m/F9af2d;H'i,rO4%j7$89^HKB/%EH:d:UE-7IC?3kP08QNWFSkIn&tXU0hVtf\g_heWn%H2@\EYR$R+7l,it!qkZIsA%.8%:<b4Q+4o7Re<@/=uc<D?1CkG#t*Sj+#k=3?C8<eoR]gMmuDKs]#U,%B6^\'Q>l.UaQb$n85YGMOQX1#>:k7>o*u_[[jr:Hg5Je^Y:RKsDj_5eqt.H?@qPba^,b_QC<DQE1;H6P%jjc90Rgp0)'S2.M@\$lIp4C<@5NV3A+Sq=@L,V9GDV`O9L@%f5,%,!P;IfdpJOYbPun*G`61QOD3'0Y6GmF^2XsRCVfROl*/ge#f)W1s!?*VEG*0."1`MfU_Qj-_HcIc]o.$p?:mMIQ<MK^AHukRr/l=E7?<+:@V_=m:@ob4Q**VPHYWj[Zo=CRr=V!*BT#C(LJ_:?.qW/l#KCp@t8d\[H2o7Bt=PmC=",eKi0L-.*!_c/%pOEK43<A[3H$o[#=noHdqYr^oBKc5fjO>LsH:W;>mQ8`hXnJRbG389AnU"B(jou524Uh#I;&<U<No:XESS[)bnZpSA"f;hd8gI+P]H$/\(H<O[nan=8V^RLa-H@:Xf+/INI%?Zd4q"#jj>q*.4u=YF[nrO\>7O#of";9>"U0op9fI&cNL!f<:OL7XG#UCV^-W3n&X!D@elVYI"hK-(KVsoa8aWL<4F`IGb_t#tRF<8C]_m^L^GO6G3G5S09nd$$CBb>0uZ(bhmeXG1t'(r;5s6L_nY]J6S``aIm.70L=toNMorm#gcR5B07$ZWs&l!IuphJhN(6@caN->pY971W`M`H*G1Tioii3W&eZ2TZB^Na&P9Djoa9RTg9i]Nf@JZ^+U;<o?jn>70)4?,Z*\6\f"L%[`BJ6Kj9)%X!(RqAJ!s1(OmA61d_0J*HI^@ba8Pi</l@aVqWUPab%]CO5^)(hH3sGpX-^oTZd5(_lb]&C\k)B3IsfnOdI_nbfq2>Pl-9KMgjahN*A`OY%3@&b90[&bRjRh=*NU*X?K'!LVo@=gF%20@?O=gnO^q+Jr=$XI<l#nIRPZI$e8J>159G:[l)_5.-%,a+qd/.CDY\U4-ibD!])@[SqAD!$G2,qi7HmfIX"M?gq4^a:eU]L7T>7]\Ni+_8Hg7U"jUaRtKJ]bt'P<e&53@l:okMAKlUSY_?MI_!>3`l9op5MT]g<,Eau4CBfS9qgs8'hVO^q,7ISl%l9Re9VphQHBQpHf7;ouN+#41(!]Bpq.
a"s<1RM\\qcK'V\`Q!N1@Y46f"8:opL%80Oi]-T\^Ju'R*bNtSnP$N;[5T*[5i*NZi1b"eWK@pn&8g.BWL$qSS5iR0*5>0K<j*&mC2tplBiAcJVg9(]ip]PZ1oSVKVeSV_/S3Pg9=ab"pNZ5V'2SCk-Vb@KNuhlT6mp3R?8XX`:Q.L9H,UNhF3YacG3W(VX0*uZ8>4e?>K_G;Rn$CUR?oeU93At-Arf=,=bsA0p$p(CN!F<.%bX@`pKfj\]c&XOS1!:`do;;tZ6cVZD;+'sUf$Bt,Q7PD^1rYmq+$QG]u!IsY+87o.os_h#/VHUYNfDXl;b!f:?gYC^>D*J>\&T\cBJ-sI_$N&=^rPk>LSVt;>P.46N:frEgm*GGj4=<qT,Y(1Zc+Z]h7%8,[8^^eLRgoB$=^7=$"Xl3[>=b41/JMmG;,"j2T%QRo@!%UIU4G6f=MZjM:Z2<iT63Xu]!qai:DQ:RWOH]Hn]!\#0inkU)(>L9M]+g80^DM^g"!X,S'0pD;8jH/WctXls@'Zr*MP]h7%8,[8^^e]B13k1X#5g%F[m.h'jn0tqYq>l-H\%mM%:Cju%L]qAkq0pc,X(bDM0Q0YGKF?F$>3]@tZ>A#n)hKGBreCD[e6OjG&;>-_QZ2iafV]GHA*H^f/E.J;%Om]_HNhJJ%MZ3l1UbMX*]CbunmMSLel=3ef=.@Tn,XYJoeZ)Vm1L<aCJD&:+Ijur)RA66H:%K_'"UbD-=0FNC4t9b>F/i:Y!Za@[WkasO461hJ%d@!MX3Q&p.'mXeE"=VfNAkB83#nEb-?5c%',NbTmp@Ic`7),9GbJp)qSmsX%1!Y73W6=/j,Gsc`f7.5:+Yeeq`Ai+!fL?NOM]56n(<,D$J18;P:'SXB'[uZ2^6A$<-j66f6Jk.&&c3W_A9(cRi[QK19NKCB\1b$_[lLN+.MEm1P"`E)t++5a[#+RZP^-(?@hY,mDDVfkSlm8e\\@>q0l2NF1QP^9\%[*ac\o<7!Y^`e%EF9dL)U==0$Z?$N0G;m+M)j+f_J:*uda;lh'4kU0Eu=[1gfDqKnase*WU=mVh!(:]"OUWP//]CqZjE&P4=Fd]4EP,j2"jQH?"R,$k%R$h6Q3^%g%3\k2'Q2t#0eqVHl3N]f0YdOSV61-pC:UfT.\lB:IuJp7hZ'6FlP!LEm!e%`Z.s*j]KJu),bq<-MI+2hQBC-aRH=2j-B+b#)kL'"'pn0^RRn5.!9II/5DmNHR"$lgsSKY-\*\*MgP]XVFu.o`0C'fI6ZKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&<`?/!KNiek5~>endstream
endobj
4 0 obj
<<
/Contents 12 0 R /MediaBox [ 0 0 612 792 ] /Parent 11 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject <<
/FormXob.b6d21e33426b982eedc18c3d4e93428b 3 0 R
>>
>> /Rotate 0 /Trans <<
>>
/Type /Page
>>
endobj
5 0 obj
<<
/BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter [ /ASCII85Decode /FlateDecode ] /Height 80 /Length 3746 /Subtype /Image
/Type /XObject /Width 400
>>
stream
Gb"/kGE<NX(WT`Jj\uYAo)R):e01]`";Q!n&d*n+.):?M$;N-L92SQib[Ni],#Y)4BR+![6n$6QZkR#<71h=R.UYRO+N",lDr8pJ4*g7;J)9%2]0:`2oQ9iO]^,RBHegiOad<J[KFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgJmN6a^Irj:'BVG/#9rV!+sehf4Np$4uGT6?Z%jn76cYI/bg\b0"PYFheo17N)h[bT;Qm:q@c230t>rr(GQpkKpmDTn_bcA]O+Jd#cU@+2Zm>]f;6bs;T&C"&dtb&Y8uEeu<LE-(N92GKeb>4LdJj]\*:qP]'A(VpnpR/2-8pYO1I<E>P5N\HYDRFS?rdSVTS(Rm^Cb\t8pp[.ji7rj(S`LM@bl-r9u\(u5id79+1Zkc$%??stuVZtblhmB,pZt^mu0`%Ls1i]8CHul4%*Hj.6mVDOM9@Q>X0"[KH505<^W'N@NK#fUn2VXT_I7/`G*I"#V]/HO(ZsFbo9PDDePMK]!H;tTT$fdk/bb^Xe<o%RJ;c=p@KgDW9>;u04LW/P]CXtHuJmX$+n(Vc<?4@i#F^)<je+N!;?@B4`(2I>\^&%Tk]_oPE2P5G57Z;<354V60?+`jnL(BV81&C9p'qn^>huY?a*aVo\^AQF(LMnkjY1\5I3=GilDM^Mf#@3^HP(^=%G)"Pl2nSAIgiK?0>KOPWqO$PVHF<:_o#LfLahWdh*9(3^m/EKhl,$Q3cLgQYNON\9m^^C9op:#?rd<2$Vk!&)d<th.i@gbDG"YOud]6@72asongem@R4YF`QZs^cCb&Z^>EY^Bos&AIDEpB'*`7%OW#]/JaVk$ICr,?V+Pq1smN.e[Kbq-r/47c-[@E9!u4pDp'F`gCN0YN(T*.>3lQo[*tf$^DS35K&9pYYmC(C!8KWI=YO-Yh<iYq"0n-PcX/R22T"lAiUQ?;[;g"V[q<\)6WG:J^rp+%ZAH>K@cWZ,bpLf<4]8ob:WF?31lf84UU8ba9QV_YEY=:-f*?M'pH:5PUm1J)N_lEV.S5l4J>"ICf>9o#Q>b\(rC/e:S+@s,o'A+RfKf[#p[C^,r^GPUR4gNZ=J]6kHiV:DYI45Dd,\<-li[]NSt[l+63)OsT7L1W3eYFAoO;cK:jZ>hAN$F,_R#nIAbhgQ+SUFQjta@9[u&(LAMenD)m[`R55a*Y3l"))/k=oMU)6$+i]jVa*618S=TZ_GFf=XBs_%K:K'Fo]DciT&f4e(<2n?Vtb+e^g%Nu,1VYfeOc,e:Kg5KNLk9Rcmq(6W8>+.5UZlp$,%&K@JAXf9t/kp;B>ou]%HuU9<jA3FHc!7?g+9s6HK-!P:8IcUIVq-FJPOLMNgE:%ADX":FE$rF][b6ElT3'f8;d%Jo:3eXU\iUJe7VBlWsY`p!Z_)<M"V>Q7>RO1)%6mY+XGj^Cfi\llJ`iigb(cAMjb)$@^lQ9+"%O3RN/\G-)Fg+T0@+7sBYYOlju6E^l%O_dec#[W(oiP)i[<g)DQB?>1+.GAD<*#ee)n]ZlQc:X6"mf=0EeR3-\Rc-pb@okO8@.<a?P.;n:M[q+PD?$2E+e8+3J=j@AkSTd+T3ms/eoJ)7>\^lM5KERoAErtC0R56,oZN&WoDC3T.FfcQ):>]e:aVd1keQ_tITI3"f#+HG?-@C\E:ct,l:h<Cp?GYt&rHEP%f@DuqiK3-OdO5;eandB'^*u(E>'Y7/kYTAclDW&K5RRRs"3=c::U8&qE3Bk7N4522.XFX-a_8A&BTV-MqW1^SOa5rC:q\>me%lnUEUg#`JHMa+Iu3uP#:.&O#=ggsUgq2ho1`OG(%W"^SARTTWNR+&lC)M$Q`-sKP).<Rn<-H*Y^_.@YouJ.NulS]BgMA@p*ha_HBliRAPSEe%/q,5$rS?H&(.ebb:^u]
K\M$!o#]`(^ABPX>'>!(C!`PX,)AUtl)+6md<^LC%`&<0kH\Z:!VH;uD<4`a?Bq\X!.ko\0k6B4?aIal%rsfT"YPIS4"n?"LH<iq4aDoZS1+2c#!%I<oORlE.6243F,5XrA7$Vp<=5I%YtpMPkuDakPrW:M755EjC1M+ASW1"l%6ssh".hAcrM+K$.)r`aK*P&Hs.u$/ckS5US2A?-\M(ZV;>Fn=!Y?$@qsJM8hgJQ9F[&m!?Bq\X'Z@O/_1jZPJh0WIlPaMENKSCZH_qf=)`E/ta?<^&&/;eKN[POQ>c>>*@<Qcq<skZL!t1i)YtpMPkuDakPrZ,@m=)4N13gIaPe1(Bgc3F?fe]L"FH<.+3eU/;OU:9l)jAj1eZ7B0jUb:P*^U'nk0B7LJU1=RVTVD0$LIB(o)AMa*Y*&F=l%,@f3RrO7lp:h;as9gS_OV%PBlb1<mC&X1YMQ7Z;Q.M?N'DLPOF@X4;:2e@\4k)e#VPa.Wa&'e[fnk(@eW9s8Hp3(LD&9UR,/A3p9VHP#n-p/dS?.Tbjb2E/1mX<eeWghoetsFg_!8\hVa==/BRk&$!s#r7l.`*`fG.c&hEW(G1egFu0Agn(Uo=c'TZhS"`tF_h6I>QRnu,W>Ap+0*lZ6>P/?C=#Kk>pfEI%X5o!bF40@(o?U'<]Eiua1#T-.-6qJtTfI(.G1oN.lKY+4/`*/<nC_FrBr@t'B!tXgNRb)RL+TCgW3<iX5O9#P?a!)IFFI8oG32\=#jga2H_&2^^0Kf,.OsLu_#eO08A/mmJDDtTcmr4t6O1`Dr,Q`30k4J%!o:K3@7,[V(bXXTZYYa2Cd1qoBXD(l2cQ3/<j,7X5ml5p#+n>6f$6'lUmhZt>sFRC2D);h+q6STAmNibWJLoa!_K+fl3/2UYVS=VJ+Mu+BpgT4#ns,rG4")XqHRFhp?e]lW)6<M/i!6i]r"JcHp"^;CaL.d(qc=(QVd0f"!@6_597/@.grp/FPoFQ-+&Hk[Kh<ZWObTpod[MGb+)FWKf>msB7&pCcn]iARI''Q3&WZn4`cfmXRYX$=L#_*oT6]L\$1K[YD^6ieQ7a0Sj]d/M^g5G<=fIGJCn-7*ka$`dtNA96I:UC5Sul1`\4p@]Ls&$eZG>,T=j\`^pYF(5psRDI[ZIJUoPB8(M_b:GbMQoKNEpNmMNdcINqeCi3'cEM:qF.X06_nLu%gkBg5VlBSp+B1fTm,9!@mC$E:N.o^gaK:4o/&_:c/cq+m2[#j^;N_DJlWf4=p-!+I-6h@omP!WV_Y"`IN?:CP*<`.u@#A9nDKO*4\G2pT\?kZ'Dt?,HQ7f!)_^L]dn6@h4u8d7pi9\@8;-o?'k"lKf\AD;4odYK88Q^$"H$>rOH)/`G1DGh5_0eMt&PpsLK>qs%#:d;7WAZfKR;rBm5'b+%b!VCn!a[@b*Y1e"S\)QM"QV-!NdrV>Ws'[ojRr<li1<loMopq>5.dVU]'X/_ssN#`kAmA/umL!VA/'h#6Ik/uc^1EO5Ek,(eJ=3@$n19'-D]1fkFPi7&q%$9QZs+c](m.qMB*W<LO;^VmoE[;onAu'qSYrV;=B=>iUUHSFKA3ttm5!=55Q\.qs8;OBp(ii!qKaVIY^A]oF]ER6$H=l(;gJ?HbR]':Z$q1FFKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`fq"Z!rjOlg~>endstream
endobj
6 0 obj
<<
/Contents 13 0 R /MediaBox [ 0 0 612 792 ] /Parent 11 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject <<
/FormXob.67f2b803142796cfcc78829acfff7782 5 0 R
>>
>> /Rotate 0 /Trans <<
>>
/Type /Page
>>
endobj
7 0 obj
<<
/BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter [ /ASCII85Decode /FlateDecode ] /Height 80 /Length 3671 /Subtype /Image
/Type /XObject /Width 400
>>
stream
Gb"/kH&NHV(WW/Z*isSg&d2e95Yqb:,+b_Ylbt"c&J%p\W#GlYi-cjh1$F4VW);.c'TXPH7"cs*'CeNo0aOLC"P*Zn'GO!upO2q-m[9Y1pX_cEoNV.hZ!I=.X/ihg[pN.eCb@(q63n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f<&gLu-H0\W/S)K\UiU/d1e=((AE1\ViZgoP7G`;;^96"eA^VhA0L.[BPc_E\[V_jF2]4eaSpjj$D"&Bno1d#['rMp*ip0m[:kfFCp?cGGD5F+!`f-%F.hd'N;+G=82r*?QTUB\d3]4;&O$@@]/UdP:"a?Mun%X(hIdfY%c!Gar_>X+@1HlIml`F@TKbp&ZMDsE$keh8Gd3hgr.uuh/jX)6'!qj^&c9!\hln?+ERl7S6Q>-NjMlpa9'WJ6Y0"C)9EqnU6a<@k<:/9+6qo^@Z'\I'[FS"YZTL.@L2sJd]G1f=ahJl&2l`Ks-M:S_+:'iN)f]\_,l;^8p>qsj0Lbs?q^q550MlTodIT_d4`uhTtM2WG=S:0CRJ?g#U89K(O`^f6E_&QM!\8c8?YcYWG^A,Rga7Gib>7NblcZ\TL@>T?RFh4gP,RLuW%NY0c='mP/s6[2ZP"RWEO$.%PqO$8NHF;:(g19^%:Vd53pXdDSgjf-D?!)SSY:tXbbWl,ljiaKo6&,Tk^%ZEq<rWC2djpdFNmk>T*#!;VcpRKUM_Ah@JTSpQ_,km?"fI5ldt/$XrDiRJ>7IaG`llTEkqIRJqXuLseKJ0E@^B[c'G&YC.*Td\lQ;0M&l<>n.Lhop@hJHBr`p<Goua16/n;ob.>5F)Zdo(A@eK#h]C[X$;ni/uM_oq\m:G*7H0QjW]3@5ik9%GVKE;f&Uf!n]DI`Nb%2C3boPu^,\d9$\T7(8@A:HcW%-c/0@u<e?e^USp7r<*.WB9Qj*&d0_/#)>2Bsh8UF<RaQg/[=u(e[]i3HG8U$[j'W<'t#[TmqC\O=RK\k4gLjcBXSgP'657?9`D%/7'<t=0\%EfUaok*dBq#D;/+Kc<ku7gob>(_H9.A]-D0Nm^Ygqir3#\O:*\f:`5QV2)9W.6-PkGX:u*s.q82:[bLF*!YSl>fWgl`93^mI>>?W-=Q[qRY)bl6lGcHZFNr),2$%FM_ALH%]g?+ZMc<[[9]ZgI@@Xj1?$uZ`fQ@E=T_=^Y)Iq>Z]n3T+,ESq+[<(gAZ#oB@"U39sQJhH7qWUVC,rjBb5Bpf01/"POnP#E1HLYF]r-FX(;KH&I":;elcp?*VMmI:t9XJ+lKojSD4*?HTLsA.bG26.GQ$>[hlK'QDo^(hmQ-cTH%506+oa4EtR1#lVq>$DdUYh/>J)/5O7A3XUoj?\?S"2`7HXiK0'jc'n-K65=5Xbo07YG+,DnJ#k)B0'A/6f!>6\]:+"l=`SG$XA)CADn0K'_K4f/f=5]6+1l9?W_X6Ot?rDnk[F$>,)cO;]%-S-9:BXpp/7pgGNTNfPU?P,hXj.lFe))E5rE0u.QQXBgC'[=5fD-+Dd7D_BiCOsStaKInr&6L*#iQ7hibR@-51:cbp\1q]mqe13@abo2#F%iXN!ou3+QMh*ChlU('RV`>Se52@/A>k8Q@LYVs5!&,>nq4C`OI,m?KN"dj,lR]4^*`J4t3RN0'e>.R)(f4&I7-<086hRGl]?\CGX4ZLuPqD?mGbY492PJuG5NhO5Rl'E?jVGU6ID)(X#%WK)7j1.B[t1a9Ju#GK#qImB754Z:n*t7Ur]Z,U]e<VdfHLtQ-nqGDhosqd,=eUe.n.A!MBr':nf?;3P9TgoKmg#m"o/ECONol,Ita.<KBmQk4*.;]m50gE0nc>k*9ufGe;5djX]Ln4+n3J_qX-GkTR1n9@0\q1VH9&8FY9fC.nhd]SpA>*.D5WHRi4b(7#_j-
Wf>90+Nlk8XG?8Zml*$eFnI4mV<54R5pU/km!].$fd-Iknlq@5,Xe$Tq95^0d<lAV<+_t?GZbWe?PK(;2'+J=hKZqF"CV9=i:=Sc3pXBNm0iUoS9^uD((5S3VosJdC=XqK(Y2"k`E5Uq'nDYoh05K4psDTXB`"b1or?HP0("#t;7\'`7fWI:CgYu@0,G<mOUj\+!#:VG'\hb3LlHB]mfcAP9ZI;M,2&bnaX]6X/Y7e('"4F+\QPhZp1:%8f78Z'U.)Ue6KD?ha_fc%12pV.ZP#..XGC/#0BU7nKDib`:Hn"\?[&('o]Qm.c(BLBHqjocLXHf@LM`\>@eFKu9Kg=%YequpeAKtGp$Y/ZWq<GeX&l?&`O&_;(00M@OlMMsp8pnZ(i0jKoBoC=e\>Nh>cB:jR9h2CeD)rjW#<;*rq4m'&E21"B#_8-[n2C1%.R]OKZKFC,\A?;GZg/0Y7QnAmMs\'7ippJ^\Xso)'/EhB"`5cHhZ>EZWOn-3.t*;K+PqCj$o$J&L5tQQ=@Q((N`qd]u'*NP.O+%`e+d_QG%XgkgA[eF6Dhl!:9<aH2$USNm2LWq28%<VYR)jaXbV4YCJ4u\jDXW7Cc+bW^I:L/(3_5/$Gmk<L%t/D89:Y9U<>n!ZO%2OGm.G+*H6L3F-L($4-&[X?+S`,)XG+'c8f#F"j,#4M.6<MY4!T\h9(FHlkcV,>FdOF]l)Y>rspUrd+UDb:`D!)fGlVc8C*chooP,GPs!_V7E]#$=\U/qWTG4Pfm%09%<@9,->21E?O4?&pP2@H$`4$?gM>J.^gGA7I8>OOjhu9o]#%SpYD:!d-.[JU5C>G.uT">4k:L+m`uN(or>=//s',t<F).:*dk3lKC#=$qW5D':K;=$Vh$e>G-.OlmLurLLn0%0^[#WL$M5f>V7B:m$9_l<#pr?m_rNDl<j,-Cn?O7'?6LH#Ilm/to:\)&a/]7'o(b@./-_E+cQ^)a4*],55ON>nbM;@Ec#YLh)mC]GG;L?q"@ifc)?NL)=1E(%%]V"'[:(Jp]+fX=<H2:\81X=InR?-T/csEXCV7l>pXRL(KCof)C>7a(8CDp=I/^[HE.V#Z]?\'*R<,cj#3Qd'QlFa+LO?d-;J@`k]u&gL%D;=rZV;%XY.6PuMmCm6;Dc%f8>TCO-`\Q1J;CGJ(.F>s_rU",SE[+39$;uD4Q!l$]cFl9np^it-Z]/KiBJ2.rd<i1"5=SRESH6hk"I1b&.Z[f2i1jldA*7J@TMK#qXgf3]<7t,gRFY%(IWDRW[lrsp-a6$p3oYdYf(Bd^OH#r>$B]'[tKFhF0C=aRdt[f,Y&iJ]18[YX+V2TDbj8FlguYXiND`1&gV9j[X(r2L6iXSoZBYBjd4#Tfh\E%5A]:QeC^]5U+TaDlRI4g@n3(]?$0/`*ag3C]s<bYFJo\=W[`.=4S7639@Ps.ou\;sq=0D>YKFND8ubs#ku->#@_ZT/BZ#nhJEtc$fKAnu*.>2c7>0%$]DfZY`<r1%=hp-6L:goFqV(ALl`"4(F?bVarqV"-(gH8)Y?-O\1!d^"N#`kAb+5=sRHml8%4?f?63n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUEtn)Jn'&1>[~>endstream
endobj
8 0 obj
<<
/Contents 14 0 R /MediaBox [ 0 0 612 792 ] /Parent 11 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject <<
/FormXob.7ce3f428fed09445afad362830e52447 7 0 R
>>
>> /Rotate 0 /Trans <<
>>
/Type /Page
>>
endobj
9 0 obj
<<
/PageMode /UseNone /Pages 11 0 R /Type /Catalog
>>
endobj
10 0 obj
<<
/Author (anonymous) /CreationDate (D:20260126185515+01'00') /Creator (ReportLab PDF Library - www.reportlab.com) /Keywords () /ModDate (D:20260126185515+01'00') /Producer (ReportLab PDF Library - www.reportlab.com)
/Subject (unspecified) /Title (untitled) /Trapped /False
>>
endobj
11 0 obj
<<
/Count 3 /Kids [ 4 0 R 6 0 R 8 0 R ] /Type /Pages
>>
endobj
12 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 302
>>
stream
Gas2D0i,\@'SL]1MAmuWBq2]41U`B=+JA;@)ETUJW/a5S$%>'U)FQlACq\foqfM9dic+ZLN<Q`tbE@K*:^spj"Oo"1`W"9\N8S]7/.WgR/fq$*$ITZl?0A3Yd+#RVYd`S"!VHM:q3ue\ZE&.5ico>/#%%PKVtVn!b+n6KWeM,?U:f@u6(=k$)>9=A;GQ#t3m&eV#g&$:bL-jnalu?/Fi#S%7?Zn?-:G9#d\O:D4D7XQ`j*RVq8@Qm.FMjt9rX$+<uAFWrR=.*pU4ORU>6iZ0lp3O3um&1LmEd6.tN*K;n6j'~>endstream
endobj
13 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 281
>>
stream
Gas2DYti1j&;GBn`BOD:C$`F9ZVnjI&kF'G*Fh#tN?'!3]KRM,ciKk&h=0aE:V*="Q_Ne&,OhA1lR406EhL)):sXAaA0[ug"g)PsmBSG*k#J$))")C&+kr+KmIL<Brl.":L6#Q;:T1n?*25E!Zk"i,4uuBV3G4oRN56iFD+G.*U'<hlkt*7N8pVC@\#B7T'\f?qTfO:fq24F=Moh9cYOO9_Ug3_JW1$`&3Et?9G$Rf%HgIe&37c9!:H9)*A"58?9%Ib;S.e4E4@\m25^i]720%7~>endstream
endobj
14 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 249
>>
stream
Gas2B8INBh&;BTK(%8)Q2+a"?*aMq\4K.?![FW8"e$p+akF8n$UkfH'`dI4a"80L!>ZbC60Zk4LLGE6]&5Z#qMYu/6Ns)3ldF]OCoN(cR,K(-$>Bb@Hb$Fm@B;e+Uh$?f>L6HTg25p.\@EBp=GIr"0+>.bL"Ab!5e$0H>2u,XrGS3n+\I^LXNi]kl12d&'Y,la0?'!jr\BDiS++DQrec,bZT6(6I/"hnM&*R'u?RM762ns?o2j@QC[f~>endstream
endobj
xref
0 15
0000000000 65535 f
0000000073 00000 n
0000000104 00000 n
0000000211 00000 n
0000003775 00000 n
0000004033 00000 n
0000007969 00000 n
0000008227 00000 n
0000012088 00000 n
0000012346 00000 n
0000012415 00000 n
0000012712 00000 n
0000012784 00000 n
0000013177 00000 n
0000013549 00000 n
trailer
<<
/ID
[<8efaabb9b9953607755769fba673a5bf><8efaabb9b9953607755769fba673a5bf>]
% ReportLab generated PDF document -- digest (http://www.reportlab.com)
/Info 10 0 R
/Root 9 0 R
/Size 15
>>
startxref
13889
%%EOF
@@ -0,0 +1,88 @@
%PDF-1.3
%東京 ReportLab Generated PDF document http://www.reportlab.com
1 0 obj
<<
/F1 2 0 R
>>
endobj
2 0 obj
<<
/BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font
>>
endobj
3 0 obj
<<
/BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter [ /ASCII85Decode /FlateDecode ] /Height 80 /Length 4030 /Subtype /Image
/Type /XObject /Width 400
>>
stream
Gb"/jGApR4(<3O%]`e\8(`3O5&gh'9UL4Y/2^+lN@RPf5J1sk+NQ;)EK[=iEKil-Qc;6BQ-:LGDM9+$J',f.nAHF$>+;)0QS%B_Sc9&LGT7Z-dn+T=!BA[iT=_mDCmsXW7DDr_l&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j4;C$M\fk4SCf9@^_*csp?0E:tA:NU]#jiWj*eoUTRh)0!!`5]fOL5)!H>oFG9DVR2p+lUH`IrqKY)BXD"&`V6FB2;HMc(7';GJ3Ri.fjIH*^/fZ0Or*2Il>Q?21r^]?[Q:^FZY!?_$=YY:S0iN>)NU7,O;iAF)l:J:7R.&b*4>RZXupbn&1%rQH_7Vb5!5MER63=/j;H?_mC;n]Y$@HQVU)1)MO]*WmC]s?Bm'Eo$F't5&H1H?C?\PZOT*=k"OUBF^Z7('_KU*cW%)S*WMHX>B\o<I2:$`SBCZ%`^?q2(GB+]eubDeR*F:Vn)#4P":#10VP[s<B,;6r>eYSG,9p^:L_3O7EcSHah6r%e]`O=YOgf8dp9lDfH=\S3c8r1HgU==+3,mg#Rl>?_pYUI1'87(cTWVY:DUqL#0^"?4&$]I&kNC0`5JLC0C)C?hEoh,WmelnPO$)t=.f&emDnSAh"(gjeWnZaC>tjK_q=<Y;2^p2tgSS*;Q.a5>kWnKp@"uqSN>jhKjj-*a*6R/cmlaT]DPqQi#i_sfOu^V];l<C(qWb+)+X,K]55F9'T7-DN5.u%#%NNom8J@WaU$Ys4EcZ<p'k6@=H1U1Nf\"4dJ%Sa[;MZc\a,_<liPGacdte(N4#Ld&J7G4#qWW.gemV!7e*Ykse!lmlI6&MpTjGEYW""=AdA+bUmFtWqp@60F94#=0o#op<o8V#IIK09?Vu_>9H0E/'"u"N,<U8_fPGUCUB[J#'4)`ugV+[/l@hgImB]$Q&++O4II3)@-X=:iG5aNlrihrDt1>!G@S%i^qIJ7$3^\6AsEbQnB1qh&Ubj<rb+7oR9`51Wc:HuhG`mCl"YPN?sfuR?9TKbo,*Xsp8_GH7iUV'<j2Q"^R:?R!:`1LALo[6Bt.TJfM[:mr31c/1%Y[kk=hS"9r+/DM.<0XIVdF$A<$KL.*`4/*c#08_`F9A=5:/6g][Wq=O\\1b/3Y3fE$[VL2A^Dsg+d,P8$hWM:-_?FRhKnLi6AL;36.CAZjVR[)*:k&[VG3Q>UVt*h^pT^rHWE2?D;2KcRo]!jbUdu=J*Y^iO7NMpK&+/M?E#p8P[:&KcCI&WT>lj0hn!r'Ddu;@XC[FI)\EY_V0_d]7q%$Vah7c\%+%Y5*O#<]LtTjQE1fFkLQENDq6=GM6pd`rMIpbfSH&W.T3_Ol$<b!Gm_&PqlR5%C@JK*Ol!h2Imp7Ot.+d%:%3%3]MgkH[#Hbj+HhPP+H1'Iu;KCj>&_r2rYl%'!QJ.^n(hm(#X5AF,*LSQi,0+l+X_QCd-sXH3[JDTElP16rE1kDGP]cKR_8u#V]Y"6aMOg*%"icN@-gSBDCX=S#a-tPZm-JQ<M>nqsR%UpnUK?#%8%U]B4C%d;CZj!6'e<<Q+hZci=8cPWZ5+GD%id30L5h:gr7\PodKJi:2tP_K@\Qq+EV)a?Ul]h5_1DjI[kCso9J35<SVlOcqg71^,=fU%5!E:*Yls*-m+ARh)lulg3U$-Nba:,pm+SkJTsi93ruC,0)`CY;VB`@`"Ms*Z\d)j&;boQ1M6h3^7dju]MOg*%KdG1:BYpEDMN0Qp=1H1W:%UjRoYunt=j%eq(Yulu6#ZJULEF)i>=uH5kuE5#MQ?sdq?&m6=[kl8Tc=t$9jhO62tP_K@\QrVip+eZoCIBVVI.)e.%EMOIc+6PiK4ajcW;'k,f-)WZWXVHl1GC1G[.CW]@LAEm#oop\)2X5*2\@n4)j*X1-$m:bbYF=S8$HL
kmmrTSX5auF,e"0nU1VTA)3IC$D8-]"Dr>:d49"#,PSWbhql]_\g6^db0"bZp1dtL,AY,Hrd`/-m-ruOhB-14LQ;od4K*.0TNLGYVbWfTAdFC7kkt8JqJt7[c^Ql>:b?jjnDUs$l_[IMNbHS?"ibH+Q6-&jp=Nm3cV`0>dPSYcp:A>VG$`7Io@6oL.1Yru`9tj;1MbRC8OuD!T$h`FdRA1Y^%4"c]QM>g?3P;LgT"R'VH'Wq6-8?<USYnh?<PGk\JN:FmgdODqo^Y-[-q!`!^tV."8sD%$a=W62-nVR5dA`f_`,bBeeu3ao@Bu0gUBEIr:BIZ;o%Z0&e^p5.nho"?_Kdin'6=Th05;oSNV>N<>b"^YjQ;nmbG@*@ga"'Fml$&HKSjO@C'C@dp'!_Ffa>t?@e@l=1ULa\_XlA]C"jJ[EOb[EU<Ad0TKc?#SM"3X(Lm^XP:Ai-*%LAh7FIFeZ)VBqd3-*?CmNmaQd@AW)n[b?#",SQm'MSet=LGn*8H(EmU"ap$8frb$7:h(cmkPT'j1f=')P0OfDf-J!fq>LM]CT:sc(6S,=-4)`A+!]_LKEDDRhbe129S\o$XGLlIB_@GSM;0aiBoeY#3\$rnT",rr#-L8WThrVH3Wd>f5/m!I81VBY=an%\pL-#\@;=L#`iU,5`YFD7-fMIo%:V-`E0oiVNt"U>:mo'NpD2RN%p)fKE=Wh?#X9URXg?^k0Ij3g**Ork^G>.)NP0^Zo`HhZs,@E=NRrX<D_QiVi,Ql*A5m(A3^.6?$s:Tscqo1s(3kg6"-]oontmG$5h'tiM,?M3U6bKrXpDQZ+8OY)5\YPQ1RA1]df+(N?OL"VhJ@gqIkI.E-;o/3Qt1BZ.-^fcFk]\"'SkiU-Zp$1)VkDu]4'.6Q)Rpf>e7Rl\9C@L/t\;Z<&1+XN&%j.rRWDZ,P`5RWN'o-KfFuX0F4HJ=HdaGcm]mfpk]J#M4P%(H_.XIrZ=?Cg4eurF6+-eE^<^5E+044/<]H!b,^3/b-4Pk'SYL'H2BJX;H*1(;>.@2s+l4^Ld[GX<"m,,S8jnVW1p25U-)leT"(Rd*85eRMpFgl;HQ5B?dNZ>%sGi@`*P8u]+OF+6o8`@u[s,9A[)598-5To.NR<lP-If-_B`3<WT]66n!$kEk=@IN'deV@j'G*jECo*=n-CGGPSQS2^c0JU>Hgri>q[;4C6)9l.D<V/o>Yr;9tI4q7F2Vk[EZD9m8%k8qS#MUgZFAT/+X&c>ZakEt-!tIC@gq7p=,@:&"fuR?9TKhl$^"^,@C[&CBoS#=R9q$`uOH:#JSJ9<W:p15N\sY?eMAc-TfVUQCf[/aU7Q<+W&cX!Gg5QIU/.fucFm:(bne,@%k0<G*A&jUu)&=dF=S^2O(/T;7N!1"hV*7RChA=/aUheSb/jG#CKc/_f;X(jRo.*8Mg=Ih\Oh@5uQu:VmJml*(fb1#VW`5t>P:&Gm=.MEs`m+]EUC$a"&A7V[4&1(O+/U5tc%5l8bfgduMA7WcZLS/+L8KGT=PZX]or?B?!uj1:N/iq<5oqK2W)9>[j2YeFBE.YV?a;6JT9Q4NVaHp2:S]C*Y[u"D`JYPMk(OUXco6M[L(>@Y58HVo85]"55<n&T0HJOk!N-,QmPlFjWD]R'acb;QGO![lac[s)?XtR.?N'ae(!#%[/$O0^<q#:-:!!5#^Yc+q1Gi1@C=S]=(`^A8mFp['?G60sRthIolIN'VAu5FhcfZf/QG"1R`Q25(?i\KC4#^p(-nO-j%L*Xa(C,)gChDno6`*pRMWEi/GSmFTTK>IG+o_]R.!G(9muq-BFa8ETq]Ipg#U)AK2f>//o4/S?>UdM"B);/a.)]9Kd\TSI[X3Z=U2f//"o0>`-h5:!aHeD^"pG1@4>4BtrUnbQ\p&f=niu3sje\cKZtRhg)ldr?b(YV+-RL1_Dd!GjKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f
&4-XGKFgHU+bUCn#U/qprr=Mu.$a~>endstream
endobj
4 0 obj
<<
/BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter [ /ASCII85Decode /FlateDecode ] /Height 80 /Length 4649 /Subtype /Image
/Type /XObject /Width 400
>>
stream
Gb"/jGB=Qg)obtD(hlkN"f0(5:+-aT<2CgqGS',YK-J92</AC'U_i9g+@S\\U&raT$(ug!$4%UW-.5X!OX),AnUAfpn!Ei.ZhN>CmL:6LTAL_NG:0`'1[i!JbWE/;*Y0EI&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j4Eeer2aX7P8Gl-m;mrV!9$er./Dr&!ITgFGgA]g5nB?j\gC<`5,n"5+/EDb62SNDgA(+`SG0AE<rQgcQJG2U/e+R8qjL(>AddU%7a,tE0=*^(Ec[;Xqd5T\EOW]FtK0RnAjQS4C.$PtF;o^lOY2JjA(q!>?5rnj<G\:)&,dhUp&iW]f.:o[KoXBDknr:%Vf^;G^:].Qgi,t!Cq7>hqh;qnpMFKAOW-;:r<^AH9g>e+lD6ps06kbI-F(^N'<gi-Gcfoqm`D<`e/XBDl/[X1QKi:_Nlme*"2?`R7BfZQ0Yn][CW7>_eACt5OcbEjk(IV6iiD9J4s/kRp`OHAta@uhcDp$1sUc^m;JVdmc-<LpXGp$)c(Hk;,Z7Z;:iTQi5(W(^OWj5YQ"X&GpV4SR^\.hEaCrqG<">P%b#odXg*ftJs\9L;@@2JoU%\UmIb7]_:X"BnAg8PVo7"#l=q;QoL`\om<c?*C)#T0=:[KaS]?>+g+\Q7Q1-1hhO`F6CiVjuIe^m/?\904&j@l.#kH4Fk1D;,Pn,s$FCkgKq>QM?oqqM4!SS5Q<QQX'N=qdO.h^m&2`sM\[n]^V[k)d\U9,\%pa.@q.TDm$L"eIRL!nb*BjG)?7Bqo-V?%d\TU3:EfTg^\mXEJ,E_-&<#c6bEkHfgiJN=njqoeR5#87\R3+#(G8t>*fUYEeuW#*!X5lBcX+"oeKmkS@.(h*0mT1nrUS,bYIsE5U03`ScpJ<e8mf>^IB)[\rqY`>H2da;>5Fp[LW%H??G5X>9I<PAY[@K\1i1gkRchBYhS[+bJ,aqhT)A3+5BlPN)'4Kg>8jeZbo1@@Rl5+ugph@\]QnSZaRRJ1c_/-=odWtJ379>AFMnp+G4!`KBR8dD=K*Pa.$qbpOj^:tQl,H?<^#prXH0"q[r1$MSsu`#-;Bq^d`.=is1np^^aDK96L*.(Hg9*0T*>V9Q`\n^`L]5>i\sQ)AOEM\?DBt!8#7Y0SiiDsB23^-W`?+JZ!P-=ienX,kdX6M.]EUBN#=ETZtP"4E0#kk.nX&MZXupQJK6dnON]"CP^nh:H5s_#io8rs2G=?r4"L]CJn3h!L8UnMn3&`J&s32P.9WsPPW!4%+F!3AJ[djYeuWV-Z>A4"8En[*=5Y$_-RU/bi:`*I1]ICNmogd^&>KHo<_nI&aUW0Z4F*r.YHH(N0/])H@AVt,Qj$t>c%"]+(Grh2@hqR\Kp>)Z"qC's<2ica_Zur<m^u*Y/Q8MTPR<N]nt9$(DsPuVc&uYY%\d#Y-N4c2<Xd/X\=C<^SfiC5&Xk4B%BU5u_1MuTN[G%fFL=0<a#bps%YP9MDr1EZ\)4&m]`M"L$&W*k$l5Y3iR$k4ldeYQkic^(a@KBro!2iME/?\=G3i$/_H*tt(cQ?&U`e+$N@567Stob?D:u4k4BLb^V?V:WLl&4sV)/U+,U1BRd9b&3,)e"jWF"OBU.>-Q2/AL<_pP5LNT<5uc'*A?hCXJ.kFHh"?b\4MT7?jNW:Z<';^CI[++A`7OE8^;3KaGt`tU2'.D<$$(8lJ$qXeKD*.IYNhqsqO(qjtQ7I&aVcqsCYCu`NpZ:0D]`fV90Y<_!ZI:XrsMu4G<QsOrnk)*8Si:<@U^<s54-76lfcp@@u'Ae03>pR/S`Z+]D^@_i>hXTXH<?g3fN.t`mHFrIk+[^u,Nj?A=n'Rm8Z?>Qg<A$#ni\E8Ed[UQk3^PU.?M3aB)jcO&2:>*\=T.d1+*ZF@"?A5D+<XGJ21+oBV+^@U
l)1.3B,E[O-k`eb[<f-3'8ScY12"q)NH?`MY$JYo98TG?o]]l2K>E8\PZb2+R`274i=_&+Z-TjqgJjbPoZE^@ah=VW647kCR58IohRMC(*CR(BT4n*g4pe*Q*FX(Ze.='Up?^1I6=]+CR-!_%#,)!P&(,gr)C9gt'd@U<[_Mh<bGWalRRZ:i#nmA)7@_#-gU80l>7[.BXW9QFj@HU`+`>?^il-h`Cm[n-7Qu"^R/N<pp-\T5XtaG+fY:'fp1.8-b7N1^\)2X5)8a8-&-1WqRO?!W,`XY#S-!.OM%I+5h25bRQ9Y.]eP[9P9!<'"`J%WLB$Hd$-E,<4N*a'd,.Y1#h7D;b:aKfuJfOZ2&A>q)/q=rVY'\h65$\b8K9Z?3pKR74:;W&Vrb0'RUnf8X,*n!Vo@(0TeZW?;S5&\%q=Edol##.]0ta!D>-S?B@9uVpR#`p(A3D=o2%[df2_8e$4'f1)NRB<l#/epT=HLX<p$1'cRu'Req!dUQ^M`WY%C7F7G4"#B&i77,rq+Z8\EqkQUW6mqn`1?2:<99X*Ib9nLECu$5p=q2dRJff._W-+(3b(Yrlpuq[pdum$V%>TH'-m?jdWZp&qR5i[E?3(7'GO%!UQIuh95N^kDD#a!lS/(+69W4/mZ%*VQC%uHIj\7D6iFIm5:M9YL]ma?a!d!FZRM2L8hJQ&\WdViAY@$CLtG[U/u!RSi'E`c5,!URlB=%P(1u[;9l7Q6M'9A^A>u+KnAe0>frXsk/mMom5)EOm'BeFBCOfX;l@XR`#.>7PVnNg$Ar0C2iBc2!hXl2M:?(bVG3Z?oZE^@ah:&r%'`jC6A5c$G<'9m%\d#1i<%XtiOYApFZ#ngUWnJtG/]%:$X.K=GW,cd(,X]VC#UfU)`BPA]i2*s(/L,')$%Fg"HB/6o9V+;QA(poeWD(H%'PK'i*'^M$-GEjj5ZsajEKE@7)e`B&V9IbT7,k5*08(&Z-#ERJlX!nJ6-,qOtU0+)>0FG+$Y3Z0!U5:(0dnE2>ldh:HuO;nY0Pm-Nb)*J,HR6XB5,?i``NJa]m"YMA3m7nYoRq4LCiWU8$(:YI&.rTieR/pt(60)sl<&(qm64bGjc,Wid`T\j#uS,;!3t#ZWX@Qb]H6m+0<mq"pT?/gh,$$MM]52_Qf@HL!0M.BgGYRaS8&f<<*5L9LAQhWCm'9YO$4=E`:QN3t-8WYjUQ_:E*b:=22WPC.B]jq;$$D\t?-7XMIQbD+4-gUCs0@Q;HpGa*a;+>:1;qsHNtRn/+a^TqR>@.Xea;s?o];:@&[UK4L#Bgp,F,RsE=He%6J^-n8]*f2^ig*%<Ho%<Bl^t<YGaN-n_khWk[Q9Js,*5hZBeUD3LcIOkYKViFO>WUU9]mE:;]ttcMZNm[=\K]o`.A)aeHbb.4k%p-knF1D'??PP_$'uAW<n&=urVQ?PaH<5kR52PVqQ%ASi>.YfGi'1*4F,A`4oAd^jGb*;,,KJMg7lSZNi\i-]QnS9fD@keR"@(a"19:*>e>/RJdS>U2U)kn??q^kNZ$^Bf?AOeYG1M#F69N)YKHC81t4$<='G]b*BVjAL@1)g&>WXcmq%"$FN(@d[u(<&)jh7^9q42j;/'(@*qZ9#1_hV7kg;bg1;[eiWMc>NH^cj+,)L?e-tC8Ul?"$BW,("fP"k2kTgOS\'#P]UR$afb6UO5'fV1fm!:>pa-mFu0fN>Vk8V,EUiAp`*k=7lnd6j#G?@gXj^]4:[lhRS7^\h!<=Ohc'UIUA;HcM*b-SH\Up<+TaZX2<A99=J]8a]hLl'35R1/-l(io8r/DFn:Ul4p7&\[%C"A]pCUpdeZ(I(:I`"K>JrHeBK!>nJAaY?kLL/gnU,#h^q""dFhRG&6H/a5UeX7Z<FFkZgN[Qi[]b[qUYmnY9pRZKapTW57tpDaub.C],WPGf!!siWZ[$Gc?(sK1T!bnhJ6m\oc&$UX6PlA0M$=gp2k0U'g^N_FXLaSPN%91<W='3St=UOZ_!6o;:`G7>p6g
FeM-UOA/J@Q7cIs89qr*N`gtc.gV7W@Pgu35H&1[il-gG6ps9s12"o1k*p:dX^3kuci>4)9#`+:[3-;KGd.!54*Cm-Y<5R+fnq"UN/<B'K;7<YI,ljoR\iis3%@XTHKF\iIdi7K^8P2@-It.qj05blIf9,65(+?-)OUrW];+]CXb/IHrbtO&cAE>ejI8lFT$382/`"$_QOh#2/DLjqoXT1J5gUR^:Q73Z/%*Q8G$9Be]Pl[k/.(FEb(9d)@N&6Z]ZhSo`8H7)_IDWLQ('dTVL5SII6X+!=b>6UY]Ahta^E[MKH*pf9IX>_4DG/tD:u3@Q=+'LrO%cbH8T["^qG*h2K%:eK3(85o6G^0<BC>e=,qT0_iZI$F6Ci^qWb,K`6fP]$4[.MF'Y6B(&,:Gh,TCP2$t[aqq^Lo&44Hf_5)pDh0RQRF/\'r*qi?.M@`+%+QqN0=0@MG=&Q81PV9AI^:8FXigm1m+bV6r>dtn(*3_sE%hF_WLlbEq6:+#iY$HCPCI\XR.7d!#(c,btV+R!a:h@tE4Z#"&=0Gs$*@i:d&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+lr@e*ukCHmf~>endstream
endobj
5 0 obj
<<
/Contents 9 0 R /MediaBox [ 0 0 612 792 ] /Parent 8 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject <<
/FormXob.41b05a9cf8679f0fe6e7c30c9462b767 3 0 R /FormXob.94284ebb61fac7951963d5746d1b193a 4 0 R
>>
>> /Rotate 0 /Trans <<
>>
/Type /Page
>>
endobj
6 0 obj
<<
/PageMode /UseNone /Pages 8 0 R /Type /Catalog
>>
endobj
7 0 obj
<<
/Author (anonymous) /CreationDate (D:20260126172022+01'00') /Creator (ReportLab PDF Library - www.reportlab.com) /Keywords () /ModDate (D:20260126172022+01'00') /Producer (ReportLab PDF Library - www.reportlab.com)
/Subject (unspecified) /Title (untitled) /Trapped /False
>>
endobj
8 0 obj
<<
/Count 1 /Kids [ 5 0 R ] /Type /Pages
>>
endobj
9 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 300
>>
stream
Gas3-b=]`-&-h'@TAj4XMk6`hV:j"r+Qu/5_JPH2[*jl@3?0-u[(9'GR:-/*/ft>A_=nj;i7d2;EpsDoJr<OBhVlHiq4E/El7+06*H?(h_eGnqiS:>Dgn0>N^CGqOd65m'$2XdN[8"CN<R^<p;O;.QTL>"4'o-s=`lHc!JpSi8$*d@]6l&@V%Q+V`W6/nPEL_rB?OF1iZbk.;Ju<];RLo@-9lO$dQ,9&`I`%EM@\dBr0Lf$$+R^&+/ncK?;0=7o:`];ceF"uKA7ETdrT"0YNT=QC"`>/@%I83@M@]K&@Nk~>endstream
|
||||
endobj
|
||||
xref
|
||||
0 10
|
||||
0000000000 65535 f
|
||||
0000000073 00000 n
|
||||
0000000104 00000 n
|
||||
0000000211 00000 n
|
||||
0000004431 00000 n
|
||||
0000009270 00000 n
|
||||
0000009574 00000 n
|
||||
0000009642 00000 n
|
||||
0000009938 00000 n
|
||||
0000009997 00000 n
|
||||
trailer
|
||||
<<
|
||||
/ID
|
||||
[<60f7c7338a7d1cfd54f86e6a06e41602><60f7c7338a7d1cfd54f86e6a06e41602>]
|
||||
% ReportLab generated PDF document -- digest (http://www.reportlab.com)
|
||||
|
||||
/Info 7 0 R
|
||||
/Root 6 0 R
|
||||
/Size 10
|
||||
>>
|
||||
startxref
|
||||
10387
|
||||
%%EOF
|
||||
File diff suppressed because one or more lines are too long
Binary file not shown.
@@ -0,0 +1,223 @@
"""
Unit tests for DocxConverterWithOCR.

For each DOCX test file: convert with a mock OCR service then compare the
full output string against the expected snapshot.

OCR block format used by the converter:
*[Image OCR]
MOCK_OCR_TEXT_12345
[End OCR]*
"""

import sys
from pathlib import Path
from typing import Any

import pytest

sys.path.insert(0, str(Path(__file__).parent.parent / "src"))

from markitdown_ocr._ocr_service import OCRResult  # noqa: E402
from markitdown_ocr._docx_converter_with_ocr import (  # noqa: E402
    DocxConverterWithOCR,
)
from markitdown import StreamInfo  # noqa: E402

TEST_DATA_DIR = Path(__file__).parent / "ocr_test_data"

_MOCK_TEXT = "MOCK_OCR_TEXT_12345"


class MockOCRService:
    def extract_text(  # noqa: ANN101
        self, image_stream: Any, **kwargs: Any
    ) -> OCRResult:
        return OCRResult(text=_MOCK_TEXT, backend_used="mock")


@pytest.fixture(scope="module")
def svc() -> MockOCRService:
    return MockOCRService()


def _convert(filename: str, ocr_service: MockOCRService) -> str:
    path = TEST_DATA_DIR / filename
    if not path.exists():
        pytest.skip(f"Test file not found: {path}")
    converter = DocxConverterWithOCR()
    with open(path, "rb") as f:
        return converter.convert(
            f, StreamInfo(extension=".docx"), ocr_service=ocr_service
        ).text_content


# ---------------------------------------------------------------------------
# docx_image_start.docx
# ---------------------------------------------------------------------------


def test_docx_image_start(svc: MockOCRService) -> None:
    expected = (
        "Document with Image at Start\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n"
        "This is the main content after the header image.\n\n"
        "More text content here."
    )
    assert _convert("docx_image_start.docx", svc) == expected


# ---------------------------------------------------------------------------
# docx_image_middle.docx
# ---------------------------------------------------------------------------


def test_docx_image_middle(svc: MockOCRService) -> None:
    expected = (
        "# Introduction\n\n"
        "This is the introduction section.\n\n"
        "We will see an image below.\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n"
        "# Analysis\n\n"
        "This section comes after the image."
    )
    assert _convert("docx_image_middle.docx", svc) == expected


# ---------------------------------------------------------------------------
# docx_image_end.docx
# ---------------------------------------------------------------------------


def test_docx_image_end(svc: MockOCRService) -> None:
    expected = (
        "Report\n\n"
        "Main findings of the report.\n\n"
        "Details and analysis.\n\n"
        "Recommendations.\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*"
    )
    assert _convert("docx_image_end.docx", svc) == expected


# ---------------------------------------------------------------------------
# docx_multiple_images.docx
# ---------------------------------------------------------------------------


def test_docx_multiple_images(svc: MockOCRService) -> None:
    expected = (
        "Multi-Image Document\n\n"
        "First section\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n"
        "Second section with another image\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n"
        "Conclusion"
    )
    assert _convert("docx_multiple_images.docx", svc) == expected


# ---------------------------------------------------------------------------
# docx_multipage.docx
# ---------------------------------------------------------------------------


def test_docx_multipage(svc: MockOCRService) -> None:
    expected = (
        "# Page 1 - Mixed Content\n\n"
        "This is the first paragraph on page 1.\n\n"
        "BEFORE IMAGE: Important content appears here.\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n"
        "AFTER IMAGE: This content follows the image.\n\n"
        "More text on page 1.\n\n"
        "# Page 2 - Image at End\n\n"
        "Content on page 2.\n\n"
        "Multiple paragraphs of text.\n\n"
        "Building up to the image...\n\n"
        "Final paragraph before image.\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n"
        "# Page 3 - Image at Start\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n"
        "Content that follows the header image.\n\n"
        "AFTER IMAGE: This text is after the image."
    )
    assert _convert("docx_multipage.docx", svc) == expected


# ---------------------------------------------------------------------------
# docx_complex_layout.docx
# ---------------------------------------------------------------------------


def test_docx_complex_layout(svc: MockOCRService) -> None:
    expected = (
        "Complex Document\n\n"
        "| | |\n"
        "| --- | --- |\n"
        "| Feature | Status |\n"
        "| Authentication | Active |\n"
        "| Encryption | Enabled |\n\n"
        "Security notice:\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*"
    )
    assert _convert("docx_complex_layout.docx", svc) == expected


# ---------------------------------------------------------------------------
# _inject_placeholders — internal unit tests (no file I/O)
# ---------------------------------------------------------------------------


def test_inject_placeholders_single_image() -> None:
    converter = DocxConverterWithOCR()
    html = "<p>Before</p><img src='x.png'/><p>After</p>"
    result_html, texts = converter._inject_placeholders(html, {"rId1": "TEXT"})
    assert "<img" not in result_html
    assert "MARKITDOWNOCRBLOCK0" in result_html
    assert texts == ["TEXT"]


def test_inject_placeholders_two_images_sequential_tokens() -> None:
    converter = DocxConverterWithOCR()
    html = "<img src='a.png'/><p>Mid</p><img src='b.png'/>"
    result_html, texts = converter._inject_placeholders(
        html, {"rId1": "FIRST", "rId2": "SECOND"}
    )
    assert "MARKITDOWNOCRBLOCK0" in result_html
    assert "MARKITDOWNOCRBLOCK1" in result_html
    assert result_html.index("MARKITDOWNOCRBLOCK0") < result_html.index(
        "MARKITDOWNOCRBLOCK1"
    )
    assert len(texts) == 2


def test_inject_placeholders_no_img_tag_appends_at_end() -> None:
    converter = DocxConverterWithOCR()
    html = "<p>No images</p>"
    result_html, texts = converter._inject_placeholders(html, {"rId1": "ORPHAN"})
    assert "MARKITDOWNOCRBLOCK0" in result_html
    assert texts == ["ORPHAN"]


def test_inject_placeholders_empty_map_leaves_html_unchanged() -> None:
    converter = DocxConverterWithOCR()
    html = "<p>Content</p><img src='pic.jpg'/>"
    result_html, texts = converter._inject_placeholders(html, {})
    assert result_html == html
    assert texts == []


# ---------------------------------------------------------------------------
# No OCR service — no OCR tags emitted
# ---------------------------------------------------------------------------


def test_docx_no_ocr_service_no_tags() -> None:
    path = TEST_DATA_DIR / "docx_image_middle.docx"
    if not path.exists():
        pytest.skip(f"Test file not found: {path}")
    converter = DocxConverterWithOCR()
    with open(path, "rb") as f:
        md = converter.convert(f, StreamInfo(extension=".docx")).text_content
    assert "*[Image OCR]" not in md
    assert "[End OCR]*" not in md
@@ -0,0 +1,234 @@
"""
Unit tests for PdfConverterWithOCR.

For each PDF test file: convert with a mock OCR service then compare the
full output string against the expected snapshot.

OCR block format used by the converter:
*[Image OCR]
MOCK_OCR_TEXT_12345
[End OCR]*
"""

import io
import sys
from pathlib import Path
from typing import Any
from unittest.mock import MagicMock, patch

import pytest

sys.path.insert(0, str(Path(__file__).parent.parent / "src"))

from markitdown_ocr._ocr_service import OCRResult  # noqa: E402
from markitdown_ocr._pdf_converter_with_ocr import (  # noqa: E402
    PdfConverterWithOCR,
)
from markitdown import StreamInfo  # noqa: E402

TEST_DATA_DIR = Path(__file__).parent / "ocr_test_data"

_MOCK_TEXT = "MOCK_OCR_TEXT_12345"
_OCR_BLOCK = f"*[Image OCR]\n{_MOCK_TEXT}\n[End OCR]*"
_PAGE_1_SCANNED = f"## Page 1\n\n\n\n\n{_OCR_BLOCK}"


class MockOCRService:
    def extract_text(
        self,  # noqa: ANN101
        image_stream: Any,
        **kwargs: Any,
    ) -> OCRResult:
        return OCRResult(text=_MOCK_TEXT, backend_used="mock")


@pytest.fixture(scope="module")
def svc() -> MockOCRService:
    return MockOCRService()


def _convert(filename: str, ocr_service: MockOCRService) -> str:
    path = TEST_DATA_DIR / filename
    if not path.exists():
        pytest.skip(f"Test file not found: {path}")
    converter = PdfConverterWithOCR()
    with open(path, "rb") as f:
        return converter.convert(
            f, StreamInfo(extension=".pdf"), ocr_service=ocr_service
        ).text_content


# ---------------------------------------------------------------------------
# pdf_image_start.pdf
# ---------------------------------------------------------------------------


def test_pdf_image_start(svc: MockOCRService) -> None:
    expected = (
        "## Page 1\n\n\n\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n\n"
        "This is text BEFORE the image.\n\n"
        "The image should appear above this text.\n\n"
        "This is more content after the image."
    )
    assert _convert("pdf_image_start.pdf", svc) == expected


# ---------------------------------------------------------------------------
# pdf_image_middle.pdf
# ---------------------------------------------------------------------------


def test_pdf_image_middle(svc: MockOCRService) -> None:
    expected = (
        "## Page 1\n\n\n"
        "Section 1: Introduction\n\n"
        "This document contains an image in the middle.\n\n"
        "Here is some introductory text.\n\n\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n\n"
        "Section 2: Details\n\n"
        "This text appears AFTER the image."
    )
    assert _convert("pdf_image_middle.pdf", svc) == expected


# ---------------------------------------------------------------------------
# pdf_image_end.pdf
# ---------------------------------------------------------------------------


def test_pdf_image_end(svc: MockOCRService) -> None:
    expected = (
        "## Page 1\n\n\n"
        "Main Content\n\n"
        "This is the main text content.\n\n"
        "The image will appear at the end.\n\n"
        "Keep reading...\n\n\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*"
    )
    assert _convert("pdf_image_end.pdf", svc) == expected


# ---------------------------------------------------------------------------
# pdf_multiple_images.pdf
# ---------------------------------------------------------------------------


def test_pdf_multiple_images(svc: MockOCRService) -> None:
    expected = (
        "## Page 1\n\n\n"
        "Document with Multiple Images\n\n\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n\n"
        "Text between first and second image.\n\n\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n\n"
        "Final text after all images."
    )
    assert _convert("pdf_multiple_images.pdf", svc) == expected


# ---------------------------------------------------------------------------
# pdf_complex_layout.pdf
# ---------------------------------------------------------------------------


def test_pdf_complex_layout(svc: MockOCRService) -> None:
    expected = (
        "## Page 1\n\n\n"
        "Complex Layout Document\n\n"
        "Table:\n\n"
        "ItemQuantity\n\n\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n\n"
        "Widget A5"
    )
    assert _convert("pdf_complex_layout.pdf", svc) == expected


# ---------------------------------------------------------------------------
# pdf_multipage.pdf — pdfplumber/pdfminer fail (EOF); PyMuPDF fallback used
# ---------------------------------------------------------------------------


def test_pdf_multipage(svc: MockOCRService) -> None:
    # pdfplumber cannot open this file (Unexpected EOF), so _ocr_full_pages
    # falls back to PyMuPDF for page rendering. Each page becomes one OCR block.
    expected = (
        f"## Page 1\n\n\n{_OCR_BLOCK}\n\n\n"
        f"## Page 2\n\n\n{_OCR_BLOCK}\n\n\n"
        f"## Page 3\n\n\n{_OCR_BLOCK}"
    )
    assert _convert("pdf_multipage.pdf", svc) == expected


# ---------------------------------------------------------------------------
# pdf_scanned_*.pdf — raster-only pages → full-page OCR
# ---------------------------------------------------------------------------


def test_pdf_scanned_invoice(svc: MockOCRService) -> None:
    assert _convert("pdf_scanned_invoice.pdf", svc) == _PAGE_1_SCANNED


def test_pdf_scanned_meeting_minutes(svc: MockOCRService) -> None:
    assert _convert("pdf_scanned_meeting_minutes.pdf", svc) == _PAGE_1_SCANNED


def test_pdf_scanned_minimal(svc: MockOCRService) -> None:
    assert _convert("pdf_scanned_minimal.pdf", svc) == _PAGE_1_SCANNED


def test_pdf_scanned_sales_report(svc: MockOCRService) -> None:
    assert _convert("pdf_scanned_sales_report.pdf", svc) == _PAGE_1_SCANNED


def test_pdf_scanned_report(svc: MockOCRService) -> None:
    expected = (
        f"{_PAGE_1_SCANNED}\n\n\n\n"
        f"## Page 2\n\n\n\n\n{_OCR_BLOCK}\n\n\n\n"
        f"## Page 3\n\n\n\n\n{_OCR_BLOCK}"
    )
    assert _convert("pdf_scanned_report.pdf", svc) == expected


# ---------------------------------------------------------------------------
# Scanned PDF fallback path (pdfplumber finds no text → full-page OCR)
# ---------------------------------------------------------------------------


def test_pdf_scanned_fallback_format(svc: MockOCRService) -> None:
    """_ocr_full_pages emits *[Image OCR]...[End OCR]* for each page."""
    path = TEST_DATA_DIR / "pdf_image_start.pdf"
    if not path.exists():
        pytest.skip(f"Test file not found: {path}")

    converter = PdfConverterWithOCR()
    with patch("pdfplumber.open") as mock_plumber:
        mock_pdf = MagicMock()
        mock_page = MagicMock()
        mock_page.page_number = 1
        mock_pdf.pages = [mock_page]
        mock_pdf.__enter__.return_value = mock_pdf
        mock_plumber.return_value = mock_pdf

        with open(path, "rb") as f:
            md = converter._ocr_full_pages(io.BytesIO(f.read()), svc)

    expected = "## Page 1\n\n\n" "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*"
    assert (
        md == expected
    ), f"_ocr_full_pages must produce:\n{expected!r}\nActual:\n{md!r}"


# ---------------------------------------------------------------------------
# No OCR service — no OCR tags emitted
# ---------------------------------------------------------------------------


def test_pdf_no_ocr_service_no_tags() -> None:
    path = TEST_DATA_DIR / "pdf_image_middle.pdf"
    if not path.exists():
        pytest.skip(f"Test file not found: {path}")
    converter = PdfConverterWithOCR()
    with open(path, "rb") as f:
        md = converter.convert(f, StreamInfo(extension=".pdf")).text_content
    assert "*[Image OCR]" not in md
    assert "[End OCR]*" not in md
@@ -0,0 +1,148 @@
"""
Unit tests for PptxConverterWithOCR.

For each PPTX test file: convert with a mock OCR service then compare the
full output string against the expected snapshot.

OCR block format used by the converter:
*[Image OCR]
MOCK_OCR_TEXT_12345
[End OCR]*

Note: PPTX slide text uses literal backslash-n (\\n) sequences from the
underlying PPTX converter template; OCR blocks use real newlines.
"""

import sys
from pathlib import Path
from typing import Any

import pytest

sys.path.insert(0, str(Path(__file__).parent.parent / "src"))

from markitdown_ocr._ocr_service import OCRResult  # noqa: E402
from markitdown_ocr._pptx_converter_with_ocr import (  # noqa: E402
    PptxConverterWithOCR,
)
from markitdown import StreamInfo  # noqa: E402

TEST_DATA_DIR = Path(__file__).parent / "ocr_test_data"

_MOCK_TEXT = "MOCK_OCR_TEXT_12345"
_OCR_BLOCK = f"*[Image OCR]\n{_MOCK_TEXT}\n[End OCR]*"


class MockOCRService:
    def extract_text(
        self,  # noqa: ANN101
        image_stream: Any,
        **kwargs: Any,
    ) -> OCRResult:
        return OCRResult(text=_MOCK_TEXT, backend_used="mock")


@pytest.fixture(scope="module")
def svc() -> MockOCRService:
    return MockOCRService()


def _convert(filename: str, ocr_service: MockOCRService) -> str:
    path = TEST_DATA_DIR / filename
    if not path.exists():
        pytest.skip(f"Test file not found: {path}")
    converter = PptxConverterWithOCR()
    with open(path, "rb") as f:
        return converter.convert(
            f, StreamInfo(extension=".pptx"), ocr_service=ocr_service
        ).text_content


# ---------------------------------------------------------------------------
# pptx_image_start.pptx
# ---------------------------------------------------------------------------


def test_pptx_image_start(svc: MockOCRService) -> None:
    # Slide 1: title "Welcome" followed by an image
    expected = (
        "\\n\\n<!-- Slide number: 1 -->\\n# Welcome\\n\\n"
        "\n*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*"
    )
    assert _convert("pptx_image_start.pptx", svc) == expected


# ---------------------------------------------------------------------------
# pptx_image_middle.pptx
# ---------------------------------------------------------------------------


def test_pptx_image_middle(svc: MockOCRService) -> None:
    # Slide 1: Introduction | Slide 2: Architecture + image | Slide 3: Conclusion  # noqa: E501
    expected = (
        "\\n\\n<!-- Slide number: 1 -->\\n# Introduction"
        "\\n\\n\\n\\n<!-- Slide number: 2 -->\\n# Architecture\\n\\n"
        "\n*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*"
        "\\n\\n<!-- Slide number: 3 -->\\n# Conclusion\\n\\n"
    )
    assert _convert("pptx_image_middle.pptx", svc) == expected


# ---------------------------------------------------------------------------
# pptx_image_end.pptx
# ---------------------------------------------------------------------------


def test_pptx_image_end(svc: MockOCRService) -> None:
    # Slide 1: Presentation | Slide 2: Thank You + image
    expected = (
        "\\n\\n<!-- Slide number: 1 -->\\n# Presentation"
        "\\n\\n\\n\\n<!-- Slide number: 2 -->\\n# Thank You\\n\\n"
        "\n*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*"
    )
    assert _convert("pptx_image_end.pptx", svc) == expected


# ---------------------------------------------------------------------------
# pptx_multiple_images.pptx
# ---------------------------------------------------------------------------


def test_pptx_multiple_images(svc: MockOCRService) -> None:
    # Slide 1: two images, no title text
    expected = (
        "\\n\\n<!-- Slide number: 1 -->\\n# \\n"
        "\n*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*"
        "\n\n*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*"
    )
    assert _convert("pptx_multiple_images.pptx", svc) == expected


# ---------------------------------------------------------------------------
# pptx_complex_layout.pptx
# ---------------------------------------------------------------------------


def test_pptx_complex_layout(svc: MockOCRService) -> None:
    expected = (
        "\\n\\n<!-- Slide number: 1 -->\\n# Product Comparison"
        "\\n\\nOur products lead the market\\n"
        "\n*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*"
    )
    assert _convert("pptx_complex_layout.pptx", svc) == expected


# ---------------------------------------------------------------------------
# No OCR service — no OCR tags emitted
# ---------------------------------------------------------------------------


def test_pptx_no_ocr_service_no_tags() -> None:
    path = TEST_DATA_DIR / "pptx_image_middle.pptx"
    if not path.exists():
        pytest.skip(f"Test file not found: {path}")
    converter = PptxConverterWithOCR()
    with open(path, "rb") as f:
        md = converter.convert(f, StreamInfo(extension=".pptx")).text_content
    assert "*[Image OCR]" not in md
    assert "[End OCR]*" not in md
@@ -0,0 +1,249 @@
"""
Unit tests for XlsxConverterWithOCR.

For each XLSX test file: convert with a mock OCR service then compare the
full output string against the expected snapshot.

OCR block format used by the converter:
*[Image OCR]
MOCK_OCR_TEXT_12345
[End OCR]*

Images are grouped at the end of each sheet under:
### Images in this sheet:
"""

import sys
from pathlib import Path
from typing import Any

import pytest

sys.path.insert(0, str(Path(__file__).parent.parent / "src"))

from markitdown_ocr._ocr_service import OCRResult  # noqa: E402
from markitdown_ocr._xlsx_converter_with_ocr import (  # noqa: E402
    XlsxConverterWithOCR,
)
from markitdown import StreamInfo  # noqa: E402

TEST_DATA_DIR = Path(__file__).parent / "ocr_test_data"

_MOCK_TEXT = "MOCK_OCR_TEXT_12345"
_OCR_BLOCK = f"*[Image OCR]\n{_MOCK_TEXT}\n[End OCR]*"
_IMG_SECTION = "### Images in this sheet:"


class MockOCRService:
    def extract_text(
        self,  # noqa: ANN101
        image_stream: Any,
        **kwargs: Any,
    ) -> OCRResult:
        return OCRResult(text=_MOCK_TEXT, backend_used="mock")


@pytest.fixture(scope="module")
def svc() -> MockOCRService:
    return MockOCRService()


def _convert(filename: str, ocr_service: MockOCRService) -> str:
    path = TEST_DATA_DIR / filename
    if not path.exists():
        pytest.skip(f"Test file not found: {path}")
    converter = XlsxConverterWithOCR()
    with open(path, "rb") as f:
        return converter.convert(
            f, StreamInfo(extension=".xlsx"), ocr_service=ocr_service
        ).text_content


# ---------------------------------------------------------------------------
# xlsx_image_start.xlsx
# ---------------------------------------------------------------------------


def test_xlsx_image_start(svc: MockOCRService) -> None:
    expected = (
        "## Sales Q1\n\n"
        "| Product | Sales |\n"
        "| --- | --- |\n"
        "| Widget A | 100 |\n"
        "| Widget B | 150 |\n\n"
        "### Images in this sheet:\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n"
        "## Forecast Q2\n\n"
        "| Projected Sales | Unnamed: 1 |\n"
        "| --- | --- |\n"
        "| Widget A | 120 |\n"
        "| Widget B | 180 |\n\n"
        "### Images in this sheet:\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*"
    )
    assert _convert("xlsx_image_start.xlsx", svc) == expected


# ---------------------------------------------------------------------------
# xlsx_image_middle.xlsx
# ---------------------------------------------------------------------------


def test_xlsx_image_middle(svc: MockOCRService) -> None:
    expected = (
        "## Revenue\n\n"
        "| Q1 Report | Unnamed: 1 |\n"
        "| --- | --- |\n"
        "| NaN | NaN |\n"
        "| Revenue | $50,000 |\n"
        "| NaN | NaN |\n"
        "| NaN | NaN |\n"
        "| NaN | NaN |\n"
        "| NaN | NaN |\n"
        "| Profit Margin | 40% |\n\n"
        "### Images in this sheet:\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n"
        "## Expenses\n\n"
        "| Expense Breakdown | Unnamed: 1 |\n"
        "| --- | --- |\n"
        "| NaN | NaN |\n"
        "| Expenses | $30,000 |\n"
        "| NaN | NaN |\n"
        "| NaN | NaN |\n"
        "| NaN | NaN |\n"
        "| NaN | NaN |\n"
        "| Savings | $5,000 |\n\n"
        "### Images in this sheet:\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*"
    )
    assert _convert("xlsx_image_middle.xlsx", svc) == expected


# ---------------------------------------------------------------------------
# xlsx_image_end.xlsx
# ---------------------------------------------------------------------------


def test_xlsx_image_end(svc: MockOCRService) -> None:
    expected = (
        "## Sheet\n\n"
        "| Financial Summary | Unnamed: 1 |\n"
        "| --- | --- |\n"
        "| Total Revenue | $500,000 |\n"
        "| Total Expenses | $300,000 |\n"
        "| Net Profit | $200,000 |\n"
        "| NaN | NaN |\n"
        "| NaN | NaN |\n"
        "| NaN | NaN |\n"
        "| NaN | NaN |\n"
        "| NaN | NaN |\n"
        "| Signature: | NaN |\n\n"
        "### Images in this sheet:\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n"
        "## Budget\n\n"
        "| Budget Allocation | Unnamed: 1 |\n"
        "| --- | --- |\n"
        "| Marketing | $100,000 |\n"
        "| R&D | $150,000 |\n"
        "| Operations | $50,000 |\n"
        "| NaN | NaN |\n"
        "| NaN | NaN |\n"
        "| NaN | NaN |\n"
        "| NaN | NaN |\n"
        "| NaN | NaN |\n"
        "| Approved: | NaN |\n\n"
        "### Images in this sheet:\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*"
    )
    assert _convert("xlsx_image_end.xlsx", svc) == expected


# ---------------------------------------------------------------------------
# xlsx_multiple_images.xlsx
# ---------------------------------------------------------------------------


def test_xlsx_multiple_images(svc: MockOCRService) -> None:
    expected = (
        "## Overview\n\n"
        "| Dashboard |\n"
        "| --- |\n"
        "| Status: Active |\n"
        "| NaN |\n"
        "| NaN |\n"
        "| NaN |\n"
        "| NaN |\n"
        "| Performance Summary |\n\n"
        "### Images in this sheet:\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n"
        "## Details\n\n"
        "| Detailed Metrics |\n"
        "| --- |\n"
        "| System Health |\n\n"
        "### Images in this sheet:\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n"
        "## Summary\n\n"
        "| Quarter Summary |\n"
        "| --- |\n"
        "| Overall Performance |\n\n"
        "### Images in this sheet:\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*"
    )
    assert _convert("xlsx_multiple_images.xlsx", svc) == expected


# ---------------------------------------------------------------------------
# xlsx_complex_layout.xlsx
# ---------------------------------------------------------------------------


def test_xlsx_complex_layout(svc: MockOCRService) -> None:
    expected = (
        "## Complex Report\n\n"
        "| Annual Report 2024 | Unnamed: 1 |\n"
        "| --- | --- |\n"
        "| NaN | NaN |\n"
        "| Month | Sales |\n"
        "| Jan | 1000 |\n"
        "| Feb | 1200 |\n"
        "| NaN | NaN |\n"
        "| Total | 2200 |\n\n"
        "### Images in this sheet:\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n"
        "## Customers\n\n"
        "| Customer Metrics | Unnamed: 1 |\n"
        "| --- | --- |\n"
        "| NaN | NaN |\n"
        "| New Customers | 250 |\n"
        "| Retention Rate | 92% |\n\n"
        "### Images in this sheet:\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n"
        "## Regions\n\n"
        "| Regional Breakdown | Unnamed: 1 |\n"
        "| --- | --- |\n"
        "| NaN | NaN |\n"
        "| Region | Revenue |\n"
        "| North | $800K |\n"
        "| South | $600K |\n\n"
        "### Images in this sheet:\n\n"
        "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*"
    )
    assert _convert("xlsx_complex_layout.xlsx", svc) == expected


# ---------------------------------------------------------------------------
# No OCR service — no OCR tags emitted
# ---------------------------------------------------------------------------


def test_xlsx_no_ocr_service_no_tags() -> None:
    path = TEST_DATA_DIR / "xlsx_image_middle.xlsx"
    if not path.exists():
        pytest.skip(f"Test file not found: {path}")
    converter = XlsxConverterWithOCR()
    with open(path, "rb") as f:
        md = converter.convert(f, StreamInfo(extension=".xlsx")).text_content
    assert "*[Image OCR]" not in md
    assert "[End OCR]*" not in md
@@ -1,4 +1,4 @@
# SPDX-FileCopyrightText: 2024-present Adam Fourney <adamfo@microsoft.com>
#
# SPDX-License-Identifier: MIT
__version__ = "0.1.5"
__version__ = "0.1.6b1"
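All four test suites in this diff assert the same OCR block format, `*[Image OCR]\n<text>\n[End OCR]*`. As a minimal sketch of that shared contract (the helper name `format_ocr_block` is hypothetical, not part of `markitdown_ocr`; the converters build these blocks internally):

```python
# Hypothetical helper illustrating the OCR block format that the
# DOCX/PDF/PPTX/XLSX converter snapshots above all assert against.
# Not the converters' actual implementation.

def format_ocr_block(text: str) -> str:
    """Wrap OCR-extracted text in the shared block format."""
    return f"*[Image OCR]\n{text}\n[End OCR]*"


if __name__ == "__main__":
    # Matches the _OCR_BLOCK constant used throughout the test suites.
    print(format_ocr_block("MOCK_OCR_TEXT_12345"))
```

Keeping the format identical across document types is what lets the scanned-PDF fallback path reuse the same snapshot assertions as the embedded-image path.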