diff --git a/README.md b/README.md index 652afc0..6da3ee1 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@ > [!IMPORTANT] > Breaking changes between 0.0.1 to 0.1.0: -> * Dependencies are now organized into optional feature-groups (further details below). Use `pip install 'markitdown[all]'` to have backward-compatible behavior. +> * Dependencies are now organized into optional feature-groups (further details below). Use `pip install 'markitdown[all]'` to have backward-compatible behavior. > * convert\_stream() now requires a binary file-like object (e.g., a file opened in binary mode, or an io.BytesIO object). This is a breaking change from the previous version, where it previously also accepted text file-like objects, like io.StringIO. > * The DocumentConverter class interface has changed to read from file-like streams rather than file paths. *No temporary files are created anymore*. If you are the maintainer of a plugin, or custom DocumentConverter, you likely need to update your code. Otherwise, if only using the MarkItDown class or CLI (as in these examples), you should not need to change anything. @@ -132,6 +132,38 @@ markitdown --use-plugins path-to-file.pdf To find available plugins, search GitHub for the hashtag `#markitdown-plugin`. To develop a plugin, see `packages/markitdown-sample-plugin`. +#### markitdown-ocr Plugin + +The `markitdown-ocr` plugin adds OCR support to PDF, DOCX, PPTX, and XLSX converters, extracting text from embedded images using LLM Vision — the same `llm_client` / `llm_model` pattern that MarkItDown already uses for image descriptions. No new ML libraries or binary dependencies required. 
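For reference, each embedded image becomes one OpenAI-style chat-completions request. The image bytes are base64-encoded into a `data:` URI and sent alongside the extraction prompt, which is what `LLMVisionOCRService.extract_text` does in this PR. The sketch below only builds that message list. `build_ocr_messages` is a hypothetical helper, not part of the plugin's API, and it assumes the image's MIME type is already known.

```python
import base64


def build_ocr_messages(image_bytes: bytes, mime_type: str, prompt: str) -> list[dict]:
    """Build a chat-completions `messages` list for one image (hypothetical helper).

    Mirrors the payload shape used by LLMVisionOCRService.extract_text:
    a single user message containing a text part and an image_url part.
    """
    # Encode the raw image bytes as a base64 data URI
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    data_uri = f"data:{mime_type};base64,{b64}"
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }
    ]


messages = build_ocr_messages(b"abc", "image/png", "Extract all text from this image.")
print(messages[0]["content"][1]["image_url"]["url"])  # data:image/png;base64,YWJj
```

With a real client, the plugin passes a list of this shape to `client.chat.completions.create(model=..., messages=...)` and reads the text from `response.choices[0].message.content`.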
+ +**Installation:** + +```bash +pip install markitdown-ocr +pip install openai # or any OpenAI-compatible client +``` + +**Usage:** + +Pass the same `llm_client` and `llm_model` you would use for image descriptions: + +```python +from markitdown import MarkItDown +from openai import OpenAI + +md = MarkItDown( + enable_plugins=True, + llm_client=OpenAI(), + llm_model="gpt-4o", +) +result = md.convert("document_with_images.pdf") +print(result.text_content) +``` + +If no `llm_client` or `llm_model` is provided, the plugin still loads and its converters still handle these formats, but OCR is silently skipped, so text inside images is not extracted. + +See [`packages/markitdown-ocr/README.md`](packages/markitdown-ocr/README.md) for detailed documentation. + ### Azure Document Intelligence To use Microsoft Document Intelligence for conversion: diff --git a/packages/markitdown-ocr/LICENSE b/packages/markitdown-ocr/LICENSE new file mode 100644 index 0000000..9e841e7 --- /dev/null +++ b/packages/markitdown-ocr/LICENSE @@ -0,0 +1,21 @@ + MIT License + + Copyright (c) Microsoft Corporation. + + Permission is hereby granted, free of charge, to any person obtaining a copy + of this software and associated documentation files (the "Software"), to deal + in the Software without restriction, including without limitation the rights + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + copies of the Software, and to permit persons to whom the Software is + furnished to do so, subject to the following conditions: + + The above copyright notice and this permission notice shall be included in all + copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + SOFTWARE diff --git a/packages/markitdown-ocr/README.md b/packages/markitdown-ocr/README.md new file mode 100644 index 0000000..d0883db --- /dev/null +++ b/packages/markitdown-ocr/README.md @@ -0,0 +1,200 @@ +# MarkItDown OCR Plugin + +LLM Vision plugin for MarkItDown that extracts text from images embedded in PDF, DOCX, PPTX, and XLSX files. + +Uses the same `llm_client` / `llm_model` pattern that MarkItDown already supports for image descriptions — no new ML libraries or binary dependencies required. + +## Features + +- **Enhanced PDF Converter**: Extracts text from images within PDFs, with full-page OCR fallback for scanned documents +- **Enhanced DOCX Converter**: OCR for images in Word documents +- **Enhanced PPTX Converter**: OCR for images in PowerPoint presentations +- **Enhanced XLSX Converter**: OCR for images in Excel spreadsheets +- **Context Preservation**: Maintains document structure and flow when inserting extracted text + +## Installation + +```bash +pip install markitdown-ocr +``` + +The plugin uses whatever OpenAI-compatible client you already have. 
Install one if you don't have it yet: + +```bash +pip install openai +``` + +## Usage + +### Command Line + +```bash +markitdown document.pdf --use-plugins --llm-client openai --llm-model gpt-4o +``` + +### Python API + +Pass `llm_client` and `llm_model` to `MarkItDown()` exactly as you would for image descriptions: + +```python +from markitdown import MarkItDown +from openai import OpenAI + +md = MarkItDown( + enable_plugins=True, + llm_client=OpenAI(), + llm_model="gpt-4o", +) + +result = md.convert("document_with_images.pdf") +print(result.text_content) +``` + +If no `llm_client` or `llm_model` is provided, the plugin still loads and its converters still handle these formats, but OCR is silently skipped, so text inside images is not extracted. + +### Custom Prompt + +Override the default extraction prompt for specialized documents: + +```python +md = MarkItDown( + enable_plugins=True, + llm_client=OpenAI(), + llm_model="gpt-4o", + llm_prompt="Extract all text from this image, preserving table structure.", +) +``` + +### Any OpenAI-Compatible Client + +Works with any client that follows the OpenAI API: + +```python +from openai import AzureOpenAI + +md = MarkItDown( + enable_plugins=True, + llm_client=AzureOpenAI( + api_key="...", + azure_endpoint="https://your-resource.openai.azure.com/", + api_version="2024-02-01", + ), + llm_model="gpt-4o", +) +``` + +## How It Works + +When `MarkItDown(enable_plugins=True, llm_client=..., llm_model=...)` is called: + +1. MarkItDown discovers the plugin via the `markitdown.plugin` entry point group +2. It calls `register_converters()`, forwarding all kwargs including `llm_client` and `llm_model` +3. The plugin creates an `LLMVisionOCRService` from those kwargs +4. Four OCR-enhanced converters are registered at **priority -1.0** — before the built-in converters at priority 0.0 + +When a file is converted: + +1. The OCR converter accepts the file +2. It extracts embedded images from the document +3. Each image is sent to the LLM with an extraction prompt +4.
The returned text is inserted inline, preserving document structure +5. If the LLM call fails, conversion continues without that image's text + +## Supported File Formats + +### PDF + +- Embedded images are extracted by position (via `page.images` / page XObjects) and OCR'd inline, interleaved with the surrounding text in vertical reading order. +- **Scanned PDFs** (pages with no extractable text) are detected automatically: each page is rendered at 300 DPI and sent to the LLM as a full-page image. +- **Malformed PDFs** that pdfplumber/pdfminer cannot open (e.g. truncated EOF) are retried with PyMuPDF page rendering, so content is still recovered. + +### DOCX + +- Images are extracted via document part relationships (`doc.part.rels`). +- OCR is run before the DOCX→HTML→Markdown pipeline executes: placeholder tokens are injected into the HTML so that the markdown converter does not escape the OCR markers, and the final placeholders are replaced with the formatted `*[Image OCR]...[End OCR]*` blocks after conversion. +- Document flow (headings, paragraphs, tables) is fully preserved around the OCR blocks. + +### PPTX + +- Picture shapes, placeholder shapes with images, and images inside groups are all supported. +- Shapes are processed in top-to-bottom, left-to-right reading order per slide. +- If an `llm_client` is configured, the LLM is asked for a description first; OCR is used as the fallback when no description is returned. + +### XLSX + +- Images embedded in worksheets (`sheet._images`) are extracted per sheet. +- Cell position is calculated from the image anchor coordinates (column/row → Excel letter notation). +- Images are listed under a `### Images in this sheet:` section after the sheet's data table — they are not interleaved into the table rows.
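The anchor-to-cell step for XLSX can be sketched as follows. This is a minimal illustration, not the plugin's code (the XLSX converter is not shown in this diff). It assumes the anchor reports zero-based column and row indices, and the helper names `column_letter` and `anchor_to_cell` are hypothetical.

```python
def column_letter(col: int) -> str:
    """Convert a 1-based column number to Excel letters (1 -> "A", 27 -> "AA")."""
    if col < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while col > 0:
        # Excel columns are bijective base-26: there is no zero digit,
        # so shift by one before each divmod.
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def anchor_to_cell(col0: int, row0: int) -> str:
    """Map zero-based anchor indices (assumed) to an A1-style cell reference."""
    return f"{column_letter(col0 + 1)}{row0 + 1}"


print(anchor_to_cell(0, 0))   # A1
print(anchor_to_cell(26, 9))  # AA10
```

openpyxl already ships `openpyxl.utils.get_column_letter` for the column part, so a real converter would more likely reuse it than hand-roll the base-26 loop.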
+ +### Output format + +Every extracted OCR block is wrapped as: + +```text +*[Image OCR] +<extracted text> +[End OCR]* +``` + +## Troubleshooting + +### OCR text missing from output + +The most likely cause is a missing `llm_client` or `llm_model`. Verify: + +```python +from openai import OpenAI +from markitdown import MarkItDown + +md = MarkItDown( + enable_plugins=True, + llm_client=OpenAI(), # required + llm_model="gpt-4o", # required +) +``` + +### Plugin not loading + +Confirm the plugin is installed and discovered: + +```bash +markitdown --list-plugins # should show: ocr +``` + +### API errors + +The OCR service catches LLM API errors and conversion continues without that image's text; the error is recorded on the returned `OCRResult.error` rather than raised. Check your API key, quota, and that the chosen model supports vision inputs. + +## Development + +### Running Tests + +```bash +cd packages/markitdown-ocr +pytest tests/ -v +``` + +### Building from Source + +```bash +git clone https://github.com/microsoft/markitdown.git +cd markitdown/packages/markitdown-ocr +pip install -e . +``` + +## Contributing + +Contributions are welcome! See the [MarkItDown repository](https://github.com/microsoft/markitdown) for guidelines. + +## License + +MIT — see [LICENSE](LICENSE).
+ +## Changelog + +### 0.1.0 (Initial Release) + +- LLM Vision OCR for PDF, DOCX, PPTX, XLSX +- Full-page OCR fallback for scanned PDFs +- Context-aware inline text insertion +- Priority-based converter replacement (no code changes required) diff --git a/packages/markitdown-ocr/pyproject.toml b/packages/markitdown-ocr/pyproject.toml new file mode 100644 index 0000000..eda3cdd --- /dev/null +++ b/packages/markitdown-ocr/pyproject.toml @@ -0,0 +1,57 @@ +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[project] +name = "markitdown-ocr" +dynamic = ["version"] +description = 'OCR plugin for MarkItDown - Extracts text from images in PDF, DOCX, PPTX, and XLSX via LLM Vision' +readme = "README.md" +requires-python = ">=3.10" +license = "MIT" +keywords = ["markitdown", "ocr", "pdf", "docx", "xlsx", "pptx", "llm", "vision"] +authors = [ + { name = "Contributors", email = "noreply@github.com" }, +] +classifiers = [ + "Development Status :: 4 - Beta", + "Programming Language :: Python", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3.11", + "Programming Language :: Python :: 3.12", + "Programming Language :: Python :: 3.13", + "Programming Language :: Python :: Implementation :: CPython", +] + +# Core dependencies — matches the file-format libraries markitdown already uses +dependencies = [ + "markitdown>=0.1.0", + "pdfminer.six>=20251230", + "pdfplumber>=0.11.9", + "PyMuPDF>=1.24.0", + "mammoth~=1.11.0", + "python-docx", + "python-pptx", + "pandas", + "openpyxl", + "Pillow>=9.0.0", +] + +# llm_client is passed in by the user (same as for markitdown image descriptions); +# install openai or any OpenAI-compatible SDK separately. 
+[project.optional-dependencies] +llm = [ + "openai>=1.0.0", +] + +[project.urls] +Documentation = "https://github.com/microsoft/markitdown#readme" +Issues = "https://github.com/microsoft/markitdown/issues" +Source = "https://github.com/microsoft/markitdown" + +[tool.hatch.version] +path = "src/markitdown_ocr/__about__.py" + +# CRITICAL: Plugin entry point - MarkItDown will discover this plugin through this entry point +[project.entry-points."markitdown.plugin"] +ocr = "markitdown_ocr" diff --git a/packages/markitdown-ocr/src/markitdown_ocr/__about__.py b/packages/markitdown-ocr/src/markitdown_ocr/__about__.py new file mode 100644 index 0000000..1c700dc --- /dev/null +++ b/packages/markitdown-ocr/src/markitdown_ocr/__about__.py @@ -0,0 +1,4 @@ +# SPDX-FileCopyrightText: 2025-present Contributors +# SPDX-License-Identifier: MIT + +__version__ = "0.1.0" diff --git a/packages/markitdown-ocr/src/markitdown_ocr/__init__.py b/packages/markitdown-ocr/src/markitdown_ocr/__init__.py new file mode 100644 index 0000000..f608e96 --- /dev/null +++ b/packages/markitdown-ocr/src/markitdown_ocr/__init__.py @@ -0,0 +1,31 @@ +# SPDX-FileCopyrightText: 2025-present Contributors +# SPDX-License-Identifier: MIT + +""" +markitdown-ocr: OCR plugin for MarkItDown + +Adds LLM Vision-based text extraction from images embedded in PDF, DOCX, PPTX, and XLSX files. 
+""" + +from ._plugin import __plugin_interface_version__, register_converters +from .__about__ import __version__ +from ._ocr_service import ( + OCRResult, + LLMVisionOCRService, +) +from ._pdf_converter_with_ocr import PdfConverterWithOCR +from ._docx_converter_with_ocr import DocxConverterWithOCR +from ._pptx_converter_with_ocr import PptxConverterWithOCR +from ._xlsx_converter_with_ocr import XlsxConverterWithOCR + +__all__ = [ + "__version__", + "__plugin_interface_version__", + "register_converters", + "OCRResult", + "LLMVisionOCRService", + "PdfConverterWithOCR", + "DocxConverterWithOCR", + "PptxConverterWithOCR", + "XlsxConverterWithOCR", +] diff --git a/packages/markitdown-ocr/src/markitdown_ocr/_docx_converter_with_ocr.py b/packages/markitdown-ocr/src/markitdown_ocr/_docx_converter_with_ocr.py new file mode 100644 index 0000000..f2463de --- /dev/null +++ b/packages/markitdown-ocr/src/markitdown_ocr/_docx_converter_with_ocr.py @@ -0,0 +1,189 @@ +""" +Enhanced DOCX Converter with OCR support for embedded images. +Extracts images from Word documents and performs OCR while maintaining context. +""" + +import io +import re +import sys +from typing import Any, BinaryIO, Optional + +from markitdown.converters import HtmlConverter +from markitdown.converter_utils.docx.pre_process import pre_process_docx +from markitdown import DocumentConverterResult, StreamInfo +from markitdown._exceptions import ( + MissingDependencyException, + MISSING_DEPENDENCY_MESSAGE, +) +from ._ocr_service import LLMVisionOCRService + +# Try loading dependencies +_dependency_exc_info = None +try: + import mammoth + from docx import Document +except ImportError: + _dependency_exc_info = sys.exc_info() + +# Placeholder injected into HTML so that mammoth never sees the OCR markers. +# Must be a single token with no special markdown characters. 
+_PLACEHOLDER = "MARKITDOWNOCRBLOCK{}" + + +class DocxConverterWithOCR(HtmlConverter): + """ + Enhanced DOCX Converter with OCR support for embedded images. + Maintains document flow while extracting text from images inline. + """ + + def __init__(self, ocr_service: Optional[LLMVisionOCRService] = None): + super().__init__() + self._html_converter = HtmlConverter() + self.ocr_service = ocr_service + + def accepts( + self, + file_stream: BinaryIO, + stream_info: StreamInfo, + **kwargs: Any, + ) -> bool: + mimetype = (stream_info.mimetype or "").lower() + extension = (stream_info.extension or "").lower() + + if extension == ".docx": + return True + + if mimetype.startswith( + "application/vnd.openxmlformats-officedocument.wordprocessingml" + ): + return True + + return False + + def convert( + self, + file_stream: BinaryIO, + stream_info: StreamInfo, + **kwargs: Any, + ) -> DocumentConverterResult: + if _dependency_exc_info is not None: + raise MissingDependencyException( + MISSING_DEPENDENCY_MESSAGE.format( + converter=type(self).__name__, + extension=".docx", + feature="docx", + ) + ) from _dependency_exc_info[1].with_traceback( + _dependency_exc_info[2] + ) # type: ignore[union-attr] + + # Get OCR service if available (from kwargs or instance) + ocr_service: Optional[LLMVisionOCRService] = ( + kwargs.get("ocr_service") or self.ocr_service + ) + + if ocr_service: + # 1. Extract and OCR images — returns raw text per image + file_stream.seek(0) + image_ocr_map = self._extract_and_ocr_images(file_stream, ocr_service) + + # 2. Convert DOCX → HTML via mammoth + file_stream.seek(0) + pre_process_stream = pre_process_docx(file_stream) + html_result = mammoth.convert_to_html( + pre_process_stream, style_map=kwargs.get("style_map") + ).value + + # 3. Replace <img> tags with plain placeholder tokens so that + # the HTML→markdown step never escapes our OCR markers. + html_with_placeholders, ocr_texts = self._inject_placeholders( + html_result, image_ocr_map + ) + + # 4.
Convert HTML → markdown + md_result = self._html_converter.convert_string( + html_with_placeholders, **kwargs + ) + md = md_result.markdown + + # 5. Swap placeholders for the actual OCR blocks (post-conversion + # so * and _ are never escaped by the markdown converter). + for i, raw_text in enumerate(ocr_texts): + placeholder = _PLACEHOLDER.format(i) + ocr_block = f"*[Image OCR]\n{raw_text}\n[End OCR]*" + md = md.replace(placeholder, ocr_block) + + return DocumentConverterResult(markdown=md) + else: + # Standard conversion without OCR + style_map = kwargs.get("style_map", None) + pre_process_stream = pre_process_docx(file_stream) + return self._html_converter.convert_string( + mammoth.convert_to_html(pre_process_stream, style_map=style_map).value, + **kwargs, + ) + + def _extract_and_ocr_images( + self, file_stream: BinaryIO, ocr_service: LLMVisionOCRService + ) -> dict[str, str]: + """ + Extract images from DOCX and OCR them. + + Returns: + Dict mapping image relationship IDs to raw OCR text (no markers). + """ + ocr_map = {} + + try: + file_stream.seek(0) + doc = Document(file_stream) + + for rel in doc.part.rels.values(): + if "image" in rel.target_ref.lower(): + try: + image_bytes = rel.target_part.blob + image_stream = io.BytesIO(image_bytes) + ocr_result = ocr_service.extract_text(image_stream) + + if ocr_result.text.strip(): + # Store raw text only — markers added later + ocr_map[rel.rId] = ocr_result.text.strip() + + except Exception: + continue + + except Exception: + pass + + return ocr_map + + def _inject_placeholders( + self, html: str, ocr_map: dict[str, str] + ) -> tuple[str, list[str]]: + """ + Replace <img> tags with numbered placeholder tokens.
+ + Returns: + (html_with_placeholders, ordered list of raw OCR texts) + """ + if not ocr_map: + return html, [] + + ocr_texts = list(ocr_map.values()) + used: list[int] = [] + + def replace_img(match: re.Match) -> str: # type: ignore[type-arg] + for i in range(len(ocr_texts)): + if i not in used: + used.append(i) + return f"<p>{_PLACEHOLDER.format(i)}</p>" + return "" # remove image if all OCR texts already used + + result = re.sub(r"<img[^>]*>", replace_img, html) + + # Any OCR texts that had no matching <img> tag go at the end + for i in range(len(ocr_texts)): + if i not in used: + result += f"<p>{_PLACEHOLDER.format(i)}</p>" + + return result, ocr_texts diff --git a/packages/markitdown-ocr/src/markitdown_ocr/_ocr_service.py b/packages/markitdown-ocr/src/markitdown_ocr/_ocr_service.py new file mode 100644 index 0000000..2885e1f --- /dev/null +++ b/packages/markitdown-ocr/src/markitdown_ocr/_ocr_service.py @@ -0,0 +1,110 @@ +""" +OCR Service Layer for MarkItDown +Provides LLM Vision-based image text extraction. +""" + +import base64 +from typing import Any, BinaryIO +from dataclasses import dataclass + +from markitdown import StreamInfo + + +@dataclass +class OCRResult: + """Result from OCR extraction.""" + + text: str + confidence: float | None = None + backend_used: str | None = None + error: str | None = None + + +class LLMVisionOCRService: + """OCR service using LLM vision models (OpenAI-compatible).""" + + def __init__( + self, + client: Any, + model: str, + default_prompt: str | None = None, + ) -> None: + """ + Initialize LLM Vision OCR service. + + Args: + client: OpenAI-compatible client + model: Model name (e.g., 'gpt-4o', 'gemini-2.0-flash') + default_prompt: Default prompt for OCR extraction + """ + self.client = client + self.model = model + self.default_prompt = default_prompt or ( + "Extract all text from this image. " + "Return ONLY the extracted text, maintaining the original " + "layout and order. Do not add any commentary or description."
+ ) + + def extract_text( + self, + image_stream: BinaryIO, + prompt: str | None = None, + stream_info: StreamInfo | None = None, + **kwargs: Any, + ) -> OCRResult: + """Extract text using LLM vision.""" + if self.client is None: + return OCRResult( + text="", + backend_used="llm_vision", + error="LLM client not configured", + ) + + try: + image_stream.seek(0) + + content_type: str | None = None + if stream_info: + content_type = stream_info.mimetype + + if not content_type: + try: + from PIL import Image + + image_stream.seek(0) + img = Image.open(image_stream) + fmt = img.format.lower() if img.format else "png" + content_type = f"image/{fmt}" + except Exception: + content_type = "image/png" + + image_stream.seek(0) + base64_image = base64.b64encode(image_stream.read()).decode("utf-8") + data_uri = f"data:{content_type};base64,{base64_image}" + + actual_prompt = prompt or self.default_prompt + response = self.client.chat.completions.create( + model=self.model, + messages=[ + { + "role": "user", + "content": [ + {"type": "text", "text": actual_prompt}, + { + "type": "image_url", + "image_url": {"url": data_uri}, + }, + ], + } + ], + ) + + text = response.choices[0].message.content + return OCRResult( + text=text.strip() if text else "", + backend_used="llm_vision", + ) + except Exception as e: + return OCRResult(text="", backend_used="llm_vision", error=str(e)) + finally: + image_stream.seek(0) diff --git a/packages/markitdown-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py b/packages/markitdown-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py new file mode 100644 index 0000000..c1dc0f6 --- /dev/null +++ b/packages/markitdown-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py @@ -0,0 +1,422 @@ +""" +Enhanced PDF Converter with OCR support for embedded images. +Extracts images from PDFs and performs OCR while maintaining document context. 
+""" + +import io +import sys +from typing import Any, BinaryIO, Optional + +from markitdown import DocumentConverter, DocumentConverterResult, StreamInfo +from markitdown._exceptions import ( + MissingDependencyException, + MISSING_DEPENDENCY_MESSAGE, +) +from ._ocr_service import LLMVisionOCRService + +# Import dependencies +_dependency_exc_info = None +try: + import pdfminer + import pdfminer.high_level + import pdfplumber + from PIL import Image +except ImportError: + _dependency_exc_info = sys.exc_info() + + +def _extract_images_from_page(page: Any) -> list[dict]: + """ + Extract images from a PDF page by rendering page regions. + + Returns: + List of dicts with 'stream', 'bbox', 'name', 'y_pos' keys + """ + images_info = [] + + try: + # Try multiple methods to detect images + images = [] + + # Method 1: Use page.images (standard approach) + if hasattr(page, "images") and page.images: + images = page.images + + # Method 2: If no images found, try underlying PDF objects + if not images and hasattr(page, "objects") and "image" in page.objects: + images = page.objects.get("image", []) + + # Method 3: Try filtering all objects for image types + if not images and hasattr(page, "objects"): + all_objs = page.objects + for obj_type in all_objs.keys(): + if "image" in obj_type.lower() or "xobject" in obj_type.lower(): + potential_imgs = all_objs.get(obj_type, []) + if potential_imgs: + images = potential_imgs + break + + for i, img_dict in enumerate(images): + try: + # Try to get the actual image stream from the PDF + img_stream = None + y_pos = 0 + + # Method A: If img_dict has 'stream' key, use it directly + if "stream" in img_dict and hasattr(img_dict["stream"], "get_data"): + try: + img_bytes = img_dict["stream"].get_data() + + # Try to open as PIL Image to validate/decode + pil_img = Image.open(io.BytesIO(img_bytes)) + + # Convert to RGB if needed (handle CMYK, etc.) 
+ if pil_img.mode not in ("RGB", "L"): + pil_img = pil_img.convert("RGB") + + # Save to stream as PNG + img_stream = io.BytesIO() + pil_img.save(img_stream, format="PNG") + img_stream.seek(0) + + y_pos = img_dict.get("top", 0) + except Exception: + pass + + # Method B: Fallback to rendering page region + if img_stream is None: + x0 = img_dict.get("x0", 0) + y0 = img_dict.get("top", 0) + x1 = img_dict.get("x1", 0) + y1 = img_dict.get("bottom", 0) + y_pos = y0 + + # Check if dimensions are valid + if x1 <= x0 or y1 <= y0: + continue + + # Use pdfplumber's within_bbox to crop, then render + # This preserves coordinate system correctly + bbox = (x0, y0, x1, y1) + cropped_page = page.within_bbox(bbox) + + # Render at 150 DPI (balance between quality and size) + page_img = cropped_page.to_image(resolution=150) + + # Save to stream + img_stream = io.BytesIO() + page_img.original.save(img_stream, format="PNG") + img_stream.seek(0) + + if img_stream: + images_info.append( + { + "stream": img_stream, + "name": f"page_{page.page_number}_img_{i}", + "y_pos": y_pos, + } + ) + + except Exception: + continue + + except Exception: + pass + + return images_info + + +class PdfConverterWithOCR(DocumentConverter): + """ + Enhanced PDF Converter with OCR support for embedded images. + Maintains document structure while extracting text from images inline. 
+ """ + + def __init__(self, ocr_service: Optional[LLMVisionOCRService] = None): + super().__init__() + self.ocr_service = ocr_service + + def accepts( + self, + file_stream: BinaryIO, + stream_info: StreamInfo, + **kwargs: Any, + ) -> bool: + mimetype = (stream_info.mimetype or "").lower() + extension = (stream_info.extension or "").lower() + + if extension == ".pdf": + return True + + if mimetype.startswith("application/pdf") or mimetype.startswith( + "application/x-pdf" + ): + return True + + return False + + def convert( + self, + file_stream: BinaryIO, + stream_info: StreamInfo, + **kwargs: Any, + ) -> DocumentConverterResult: + if _dependency_exc_info is not None: + raise MissingDependencyException( + MISSING_DEPENDENCY_MESSAGE.format( + converter=type(self).__name__, + extension=".pdf", + feature="pdf", + ) + ) from _dependency_exc_info[1].with_traceback( + _dependency_exc_info[2] + ) # type: ignore[union-attr] + + # Get OCR service if available (from kwargs or instance) + ocr_service: LLMVisionOCRService | None = ( + kwargs.get("ocr_service") or self.ocr_service + ) + + # Read PDF into BytesIO + file_stream.seek(0) + pdf_bytes = io.BytesIO(file_stream.read()) + + markdown_content = [] + + try: + with pdfplumber.open(pdf_bytes) as pdf: + for page_num, page in enumerate(pdf.pages, 1): + markdown_content.append(f"\n## Page {page_num}\n") + + # If OCR is enabled, interleave text and images by position + if ocr_service: + images_on_page = self._extract_page_images(pdf_bytes, page_num) + + if images_on_page: + # Extract text lines with Y positions + chars = page.chars + if chars: + # Group chars into lines based on Y position + lines_with_y = [] + current_line = [] + current_y = None + + for char in sorted( + chars, key=lambda c: (c["top"], c["x0"]) + ): + y = char["top"] + if current_y is None: + current_y = y + elif abs(y - current_y) > 2: # New line threshold + if current_line: + text = "".join( + [c["text"] for c in current_line] + ) + lines_with_y.append( + 
{"y": current_y, "text": text.strip()} + ) + current_line = [] + current_y = y + current_line.append(char) + + # Add last line + if current_line: + text = "".join([c["text"] for c in current_line]) + lines_with_y.append( + {"y": current_y, "text": text.strip()} + ) + else: + # Fallback: use simple text extraction + text_content = page.extract_text() or "" + lines_with_y = [ + {"y": i * 10, "text": line} + for i, line in enumerate(text_content.split("\n")) + ] + + # OCR all images + image_data = [] + for img_info in images_on_page: + ocr_result = ocr_service.extract_text( + img_info["stream"] + ) + if ocr_result.text.strip(): + image_data.append( + { + "y_pos": img_info["y_pos"], + "name": img_info["name"], + "ocr_text": ocr_result.text, + "backend": ocr_result.backend_used, + "type": "image", + } + ) + + # Add text items + content_items = [ + { + "y_pos": item["y"], + "text": item["text"], + "type": "text", + } + for item in lines_with_y + if item["text"] + ] + content_items.extend(image_data) + + # Sort all items by Y position (top to bottom) + content_items.sort(key=lambda x: x["y_pos"]) + + # Build markdown by interleaving text and images + for item in content_items: + if item["type"] == "text": + markdown_content.append(item["text"]) + else: # image + ocr_text = item["ocr_text"] + img_marker = ( + f"\n\n*[Image OCR]\n{ocr_text}\n[End OCR]*\n" + ) + markdown_content.append(img_marker) + else: + # No images detected - just extract regular text + text_content = page.extract_text() or "" + if text_content.strip(): + markdown_content.append(text_content.strip()) + else: + # No OCR, just extract text + text_content = page.extract_text() or "" + if text_content.strip(): + markdown_content.append(text_content.strip()) + + # Build final markdown + markdown = "\n\n".join(markdown_content).strip() + + # Fallback to pdfminer if empty + if not markdown: + pdf_bytes.seek(0) + markdown = pdfminer.high_level.extract_text(pdf_bytes) + + except Exception: + # Fallback to 
pdfminer + try: + pdf_bytes.seek(0) + markdown = pdfminer.high_level.extract_text(pdf_bytes) + except Exception: + markdown = "" + + # Final fallback: If still empty/whitespace and OCR is available, + # treat as scanned PDF and OCR full pages + if ocr_service and (not markdown or not markdown.strip()): + pdf_bytes.seek(0) + markdown = self._ocr_full_pages(pdf_bytes, ocr_service) + + return DocumentConverterResult(markdown=markdown) + + def _extract_page_images(self, pdf_bytes: io.BytesIO, page_num: int) -> list[dict]: + """ + Extract images from a PDF page using pdfplumber. + + Args: + pdf_bytes: PDF file as BytesIO + page_num: Page number (1-indexed) + + Returns: + List of image info dicts with 'stream', 'bbox', 'name', 'y_pos' + """ + images = [] + + try: + pdf_bytes.seek(0) + with pdfplumber.open(pdf_bytes) as pdf: + if page_num <= len(pdf.pages): + page = pdf.pages[page_num - 1] # 0-indexed + images = _extract_images_from_page(page) + except Exception: + pass + + # Sort by vertical position (top to bottom) + images.sort(key=lambda x: x["y_pos"]) + + return images + + def _ocr_full_pages( + self, pdf_bytes: io.BytesIO, ocr_service: LLMVisionOCRService + ) -> str: + """ + Fallback for scanned PDFs: Convert entire pages to images and OCR them. + Used when text extraction returns empty/whitespace results. 
+ + Args: + pdf_bytes: PDF file as BytesIO + ocr_service: OCR service to use + + Returns: + Markdown text extracted from OCR of full pages + """ + markdown_parts = [] + + try: + pdf_bytes.seek(0) + with pdfplumber.open(pdf_bytes) as pdf: + for page_num, page in enumerate(pdf.pages, 1): + try: + markdown_parts.append(f"\n## Page {page_num}\n") + + # Render page to image + page_img = page.to_image(resolution=300) + img_stream = io.BytesIO() + page_img.original.save(img_stream, format="PNG") + img_stream.seek(0) + + # Run OCR + ocr_result = ocr_service.extract_text(img_stream) + + if ocr_result.text.strip(): + text = ocr_result.text.strip() + markdown_parts.append(f"*[Image OCR]\n{text}\n[End OCR]*") + else: + markdown_parts.append( + "*[No text could be extracted from this page]*" + ) + + except Exception as e: + markdown_parts.append( + f"*[Error processing page {page_num}: {str(e)}]*" + ) + continue + + except Exception: + # pdfplumber failed (e.g. malformed EOF) — try PyMuPDF for rendering + markdown_parts = [] + try: + import fitz # PyMuPDF + + pdf_bytes.seek(0) + doc = fitz.open(stream=pdf_bytes.read(), filetype="pdf") + for page_num in range(1, doc.page_count + 1): + try: + markdown_parts.append(f"\n## Page {page_num}\n") + page = doc[page_num - 1] + mat = fitz.Matrix(300 / 72, 300 / 72) # 300 DPI + pix = page.get_pixmap(matrix=mat) + img_stream = io.BytesIO(pix.tobytes("png")) + img_stream.seek(0) + + ocr_result = ocr_service.extract_text(img_stream) + + if ocr_result.text.strip(): + text = ocr_result.text.strip() + markdown_parts.append(f"*[Image OCR]\n{text}\n[End OCR]*") + else: + markdown_parts.append( + "*[No text could be extracted from this page]*" + ) + + except Exception as e: + markdown_parts.append( + f"*[Error processing page {page_num}: {str(e)}]*" + ) + continue + doc.close() + except Exception: + return "*[Error: Could not process scanned PDF]*" + + return "\n\n".join(markdown_parts).strip() diff --git 
a/packages/markitdown-ocr/src/markitdown_ocr/_plugin.py b/packages/markitdown-ocr/src/markitdown_ocr/_plugin.py new file mode 100644 index 0000000..f4d7bcf --- /dev/null +++ b/packages/markitdown-ocr/src/markitdown_ocr/_plugin.py @@ -0,0 +1,68 @@ +""" +Plugin registration for markitdown-ocr. +Registers OCR-enhanced converters with priority-based replacement strategy. +""" + +from typing import Any +from markitdown import MarkItDown + +from ._ocr_service import LLMVisionOCRService +from ._pdf_converter_with_ocr import PdfConverterWithOCR +from ._docx_converter_with_ocr import DocxConverterWithOCR +from ._pptx_converter_with_ocr import PptxConverterWithOCR +from ._xlsx_converter_with_ocr import XlsxConverterWithOCR + + +__plugin_interface_version__ = 1 + + +def register_converters(markitdown: MarkItDown, **kwargs: Any) -> None: + """ + Register OCR-enhanced converters with MarkItDown. + + This plugin provides OCR support for PDF, DOCX, PPTX, and XLSX files. + The converters are registered with priority -1.0 to run BEFORE built-in + converters (which have priority 0.0), effectively replacing them when + the plugin is enabled. 
+ + Args: + markitdown: MarkItDown instance to register converters with + **kwargs: Additional keyword arguments that may include: + - llm_client: OpenAI-compatible client for LLM-based OCR (required for OCR to work) + - llm_model: Model name (e.g., 'gpt-4o') + - llm_prompt: Custom prompt for text extraction + """ + # Create OCR service — reads the same llm_client/llm_model kwargs + # that MarkItDown itself already accepts for image descriptions + llm_client = kwargs.get("llm_client") + llm_model = kwargs.get("llm_model") + llm_prompt = kwargs.get("llm_prompt") + + ocr_service: LLMVisionOCRService | None = None + if llm_client and llm_model: + ocr_service = LLMVisionOCRService( + client=llm_client, + model=llm_model, + default_prompt=llm_prompt, + ) + + # Register converters with priority -1.0 (before built-ins at 0.0) + # This effectively "replaces" the built-in converters when the plugin is enabled + # Pass the OCR service to each converter's constructor + PRIORITY_OCR_ENHANCED = -1.0 + + markitdown.register_converter( + PdfConverterWithOCR(ocr_service=ocr_service), priority=PRIORITY_OCR_ENHANCED + ) + + markitdown.register_converter( + DocxConverterWithOCR(ocr_service=ocr_service), priority=PRIORITY_OCR_ENHANCED + ) + + markitdown.register_converter( + PptxConverterWithOCR(ocr_service=ocr_service), priority=PRIORITY_OCR_ENHANCED + ) + + markitdown.register_converter( + XlsxConverterWithOCR(ocr_service=ocr_service), priority=PRIORITY_OCR_ENHANCED + ) diff --git a/packages/markitdown-ocr/src/markitdown_ocr/_pptx_converter_with_ocr.py b/packages/markitdown-ocr/src/markitdown_ocr/_pptx_converter_with_ocr.py new file mode 100644 index 0000000..7e91ed6 --- /dev/null +++ b/packages/markitdown-ocr/src/markitdown_ocr/_pptx_converter_with_ocr.py @@ -0,0 +1,249 @@ +""" +Enhanced PPTX Converter with OCR support for embedded images. +Uses an LLM image description when one is available and falls back to LLM Vision OCR otherwise.
+""" + + +import io +import sys +from typing import Any, BinaryIO, Optional + +from typing import BinaryIO, Any, Optional + +from markitdown.converters import HtmlConverter +from markitdown import DocumentConverter, DocumentConverterResult, StreamInfo +from markitdown._exceptions import ( + MissingDependencyException, + MISSING_DEPENDENCY_MESSAGE, +) +from ._ocr_service import LLMVisionOCRService + +_dependency_exc_info = None +try: + import pptx +except ImportError: + _dependency_exc_info = sys.exc_info() + + +class PptxConverterWithOCR(DocumentConverter): + """Enhanced PPTX Converter with OCR fallback.""" + + def __init__(self, ocr_service: Optional[LLMVisionOCRService] = None): + super().__init__() + self._html_converter = HtmlConverter() + self.ocr_service = ocr_service + + def accepts( + self, + file_stream: BinaryIO, + stream_info: StreamInfo, + **kwargs: Any, + ) -> bool: + mimetype = (stream_info.mimetype or "").lower() + extension = (stream_info.extension or "").lower() + + if extension == ".pptx": + return True + + if mimetype.startswith( + "application/vnd.openxmlformats-officedocument.presentationml" + ): + return True + + return False + + def convert( + self, + file_stream: BinaryIO, + stream_info: StreamInfo, + **kwargs: Any, + ) -> DocumentConverterResult: + if _dependency_exc_info is not None: + raise MissingDependencyException( + MISSING_DEPENDENCY_MESSAGE.format( + converter=type(self).__name__, + extension=".pptx", + feature="pptx", + ) + ) from _dependency_exc_info[1].with_traceback( + _dependency_exc_info[2] + ) # type: ignore[union-attr] + + # Get OCR service (from kwargs or instance) + ocr_service: Optional[LLMVisionOCRService] = ( + kwargs.get("ocr_service") or self.ocr_service + ) + llm_client = kwargs.get("llm_client") + + presentation = pptx.Presentation(file_stream) + md_content = "" + slide_num = 0 + + for slide in presentation.slides: + slide_num += 1 + md_content += f"\n\n<!-- Slide number: {slide_num} -->\n" + + title = slide.shapes.title + + def
get_shape_content(shape, **kwargs): + nonlocal md_content + + # Pictures + if self._is_picture(shape): + # Get image data + image_stream = io.BytesIO(shape.image.blob) + + # Try LLM description first if available + llm_description = "" + if llm_client and kwargs.get("llm_model"): + try: + from ._llm_caption import llm_caption + + image_filename = shape.image.filename + image_extension = None + if image_filename: + import os + + image_extension = os.path.splitext(image_filename)[1] + + image_stream_info = StreamInfo( + mimetype=shape.image.content_type, + extension=image_extension, + filename=image_filename, + ) + + llm_description = llm_caption( + image_stream, + image_stream_info, + client=llm_client, + model=kwargs.get("llm_model"), + prompt=kwargs.get("llm_prompt"), + ) + except Exception: + pass + + # Try OCR if LLM failed or not available + ocr_text = "" + if not llm_description and ocr_service: + try: + image_stream.seek(0) + ocr_result = ocr_service.extract_text(image_stream) + if ocr_result.text.strip(): + ocr_text = ocr_result.text.strip() + except Exception: + pass + + # Format extracted content using unified OCR block format + content = (llm_description or ocr_text or "").strip() + if content: + md_content += f"\n*[Image OCR]\n{content}\n[End OCR]*\n" + + # Tables + if self._is_table(shape): + md_content += self._convert_table_to_markdown(shape.table, **kwargs) + + # Charts + if shape.has_chart: + md_content += self._convert_chart_to_markdown(shape.chart) + + # Text areas + elif shape.has_text_frame: + if shape == title: + md_content += "# " + shape.text.lstrip() + "\n" + else: + md_content += shape.text + "\n" + + # Group Shapes + if shape.shape_type == pptx.enum.shapes.MSO_SHAPE_TYPE.GROUP: + sorted_shapes = sorted( + shape.shapes, + key=lambda x: ( + float("-inf") if not x.top else x.top, + float("-inf") if not x.left else x.left, + ), + ) + for subshape in sorted_shapes: + get_shape_content(subshape, **kwargs) + + sorted_shapes = sorted(
slide.shapes, + key=lambda x: ( + float("-inf") if not x.top else x.top, + float("-inf") if not x.left else x.left, + ), + ) + for shape in sorted_shapes: + get_shape_content(shape, **kwargs) + + md_content = md_content.strip() + + if slide.has_notes_slide: + md_content += "\n\n### Notes:\n" + notes_frame = slide.notes_slide.notes_text_frame + if notes_frame is not None: + md_content += notes_frame.text + md_content = md_content.strip() + + return DocumentConverterResult(markdown=md_content.strip()) + + def _is_picture(self, shape): + if shape.shape_type == pptx.enum.shapes.MSO_SHAPE_TYPE.PICTURE: + return True + if shape.shape_type == pptx.enum.shapes.MSO_SHAPE_TYPE.PLACEHOLDER: + if hasattr(shape, "image"): + return True + return False + + def _is_table(self, shape): + if shape.shape_type == pptx.enum.shapes.MSO_SHAPE_TYPE.TABLE: + return True + return False + + def _convert_table_to_markdown(self, table, **kwargs): + import html + + html_table = "<html><body><table>" + first_row = True + for row in table.rows: + html_table += "<tr>" + for cell in row.cells: + if first_row: + html_table += "<th>" + html.escape(cell.text) + "</th>" + else: + html_table += "<td>" + html.escape(cell.text) + "</td>" + html_table += "</tr>" + first_row = False + html_table += "</table></body></html>" + + return ( + self._html_converter.convert_string(html_table, **kwargs).markdown.strip() + + "\n" + ) + + def _convert_chart_to_markdown(self, chart): + try: + md = "\n\n### Chart" + if chart.has_title: + md += f": {chart.chart_title.text_frame.text}" + md += "\n\n" + data = [] + category_names = [c.label for c in chart.plots[0].categories] + series_names = [s.name for s in chart.series] + data.append(["Category"] + series_names) + + for idx, category in enumerate(category_names): + row = [category] + for series in chart.series: + row.append(series.values[idx]) + data.append(row) + + markdown_table = [] + for row in data: + markdown_table.append("| " + " | ".join(map(str, row)) + " |") + header = markdown_table[0] + separator = "|" + "|".join(["---"] * len(data[0])) + "|" + return md + "\n".join([header, separator] + markdown_table[1:]) + except ValueError as e: + if "unsupported plot type" in str(e): + return "\n\n[unsupported chart]\n\n" + except Exception: + return "\n\n[unsupported chart]\n\n" diff --git a/packages/markitdown-ocr/src/markitdown_ocr/_xlsx_converter_with_ocr.py b/packages/markitdown-ocr/src/markitdown_ocr/_xlsx_converter_with_ocr.py new file mode 100644 index 0000000..481e071 --- /dev/null +++ b/packages/markitdown-ocr/src/markitdown_ocr/_xlsx_converter_with_ocr.py @@ -0,0 +1,225 @@ +""" +Enhanced XLSX Converter with OCR support for embedded images. +Extracts images from Excel spreadsheets and performs OCR while maintaining cell context.
+""" + +import io +import sys +from typing import Any, BinaryIO, Optional + +from markitdown.converters import HtmlConverter +from markitdown import DocumentConverter, DocumentConverterResult, StreamInfo +from markitdown._exceptions import ( + MissingDependencyException, + MISSING_DEPENDENCY_MESSAGE, +) +from ._ocr_service import LLMVisionOCRService + +# Try loading dependencies +_xlsx_dependency_exc_info = None +try: + import pandas as pd + from openpyxl import load_workbook +except ImportError: + _xlsx_dependency_exc_info = sys.exc_info() + + +class XlsxConverterWithOCR(DocumentConverter): + """ + Enhanced XLSX Converter with OCR support for embedded images. + Extracts images with their cell positions and performs OCR. + """ + + def __init__(self, ocr_service: Optional[LLMVisionOCRService] = None): + super().__init__() + self._html_converter = HtmlConverter() + self.ocr_service = ocr_service + + def accepts( + self, + file_stream: BinaryIO, + stream_info: StreamInfo, + **kwargs: Any, + ) -> bool: + mimetype = (stream_info.mimetype or "").lower() + extension = (stream_info.extension or "").lower() + + if extension == ".xlsx": + return True + + if mimetype.startswith( + "application/vnd.openxmlformats-officedocument.spreadsheetml" + ): + return True + + return False + + def convert( + self, + file_stream: BinaryIO, + stream_info: StreamInfo, + **kwargs: Any, + ) -> DocumentConverterResult: + if _xlsx_dependency_exc_info is not None: + raise MissingDependencyException( + MISSING_DEPENDENCY_MESSAGE.format( + converter=type(self).__name__, + extension=".xlsx", + feature="xlsx", + ) + ) from _xlsx_dependency_exc_info[1].with_traceback( + _xlsx_dependency_exc_info[2] + ) # type: ignore[union-attr] + + # Get OCR service if available (from kwargs or instance) + ocr_service: Optional[LLMVisionOCRService] = ( + kwargs.get("ocr_service") or self.ocr_service + ) + + if ocr_service: + # Remove ocr_service from kwargs to avoid duplicate argument error + kwargs_without_ocr = {k: 
v for k, v in kwargs.items() if k != "ocr_service"} + return self._convert_with_ocr( + file_stream, ocr_service, **kwargs_without_ocr + ) + else: + return self._convert_standard(file_stream, **kwargs) + + def _convert_standard( + self, file_stream: BinaryIO, **kwargs: Any + ) -> DocumentConverterResult: + """Standard conversion without OCR.""" + file_stream.seek(0) + sheets = pd.read_excel(file_stream, sheet_name=None, engine="openpyxl") + md_content = "" + + for sheet_name in sheets: + md_content += f"## {sheet_name}\n" + html_content = sheets[sheet_name].to_html(index=False) + md_content += ( + self._html_converter.convert_string( + html_content, **kwargs + ).markdown.strip() + + "\n\n" + ) + + return DocumentConverterResult(markdown=md_content.strip()) + + def _convert_with_ocr( + self, file_stream: BinaryIO, ocr_service: LLMVisionOCRService, **kwargs: Any + ) -> DocumentConverterResult: + """Convert XLSX with image OCR.""" + file_stream.seek(0) + wb = load_workbook(file_stream) + + md_content = "" + + for sheet_name in wb.sheetnames: + sheet = wb[sheet_name] + md_content += f"## {sheet_name}\n\n" + + # Convert sheet data to markdown table + file_stream.seek(0) + try: + df = pd.read_excel( + file_stream, sheet_name=sheet_name, engine="openpyxl" + ) + html_content = df.to_html(index=False) + md_content += ( + self._html_converter.convert_string( + html_content, **kwargs + ).markdown.strip() + + "\n\n" + ) + except Exception: + # If pandas fails, just skip the table + pass + + # Extract and OCR images in this sheet + images_with_ocr = self._extract_and_ocr_sheet_images(sheet, ocr_service) + + if images_with_ocr: + md_content += "### Images in this sheet:\n\n" + for img_info in images_with_ocr: + ocr_text = img_info["ocr_text"] + md_content += f"*[Image OCR]\n{ocr_text}\n[End OCR]*\n\n" + + return DocumentConverterResult(markdown=md_content.strip()) + + def _extract_and_ocr_sheet_images( + self, sheet: Any, ocr_service: LLMVisionOCRService + ) -> list[dict]: + """ 
+ Extract and OCR images from an Excel sheet. + + Args: + sheet: openpyxl worksheet + ocr_service: OCR service + + Returns: + List of dicts with 'cell_ref' and 'ocr_text' + """ + results = [] + + try: + # Check if sheet has images + if hasattr(sheet, "_images"): + for img in sheet._images: + try: + # Get image data + if hasattr(img, "_data"): + image_data = img._data() + elif hasattr(img, "image"): + # Some versions store it differently + image_data = img.image + else: + continue + + # Create image stream + image_stream = io.BytesIO(image_data) + + # Get cell reference + cell_ref = "unknown" + if hasattr(img, "anchor"): + anchor = img.anchor + if hasattr(anchor, "_from"): + from_cell = anchor._from + if hasattr(from_cell, "col") and hasattr( + from_cell, "row" + ): + # Convert column number to letter + col_letter = self._column_number_to_letter( + from_cell.col + ) + cell_ref = f"{col_letter}{from_cell.row + 1}" + + # Perform OCR + ocr_result = ocr_service.extract_text(image_stream) + + if ocr_result.text.strip(): + results.append( + { + "cell_ref": cell_ref, + "ocr_text": ocr_result.text.strip(), + "backend": ocr_result.backend_used, + } + ) + + except Exception: + continue + + except Exception: + pass + + return results + + @staticmethod + def _column_number_to_letter(n: int) -> str: + """Convert column number to Excel column letter (0-indexed).""" + result = "" + n = n + 1 # Make 1-indexed + while n > 0: + n -= 1 + result = chr(65 + (n % 26)) + result + n //= 26 + return result diff --git a/packages/markitdown-ocr/tests/__init__.py b/packages/markitdown-ocr/tests/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/packages/markitdown-ocr/tests/ocr_test_data/docx_complex_layout.docx b/packages/markitdown-ocr/tests/ocr_test_data/docx_complex_layout.docx new file mode 100644 index 0000000..4ddd697 Binary files /dev/null and b/packages/markitdown-ocr/tests/ocr_test_data/docx_complex_layout.docx differ diff --git 
a/packages/markitdown-ocr/tests/ocr_test_data/docx_image_end.docx b/packages/markitdown-ocr/tests/ocr_test_data/docx_image_end.docx new file mode 100644 index 0000000..f2a9a86 Binary files /dev/null and b/packages/markitdown-ocr/tests/ocr_test_data/docx_image_end.docx differ diff --git a/packages/markitdown-ocr/tests/ocr_test_data/docx_image_middle.docx b/packages/markitdown-ocr/tests/ocr_test_data/docx_image_middle.docx new file mode 100644 index 0000000..200f3c6 Binary files /dev/null and b/packages/markitdown-ocr/tests/ocr_test_data/docx_image_middle.docx differ diff --git a/packages/markitdown-ocr/tests/ocr_test_data/docx_image_start.docx b/packages/markitdown-ocr/tests/ocr_test_data/docx_image_start.docx new file mode 100644 index 0000000..7855bd1 Binary files /dev/null and b/packages/markitdown-ocr/tests/ocr_test_data/docx_image_start.docx differ diff --git a/packages/markitdown-ocr/tests/ocr_test_data/docx_multipage.docx b/packages/markitdown-ocr/tests/ocr_test_data/docx_multipage.docx new file mode 100644 index 0000000..c698b0f Binary files /dev/null and b/packages/markitdown-ocr/tests/ocr_test_data/docx_multipage.docx differ diff --git a/packages/markitdown-ocr/tests/ocr_test_data/docx_multiple_images.docx b/packages/markitdown-ocr/tests/ocr_test_data/docx_multiple_images.docx new file mode 100644 index 0000000..790ce0b Binary files /dev/null and b/packages/markitdown-ocr/tests/ocr_test_data/docx_multiple_images.docx differ diff --git a/packages/markitdown-ocr/tests/ocr_test_data/pdf_complex_layout.pdf b/packages/markitdown-ocr/tests/ocr_test_data/pdf_complex_layout.pdf new file mode 100644 index 0000000..f843ab8 --- /dev/null +++ b/packages/markitdown-ocr/tests/ocr_test_data/pdf_complex_layout.pdf @@ -0,0 +1,79 @@ +%PDF-1.3 +% ReportLab Generated PDF document http://www.reportlab.com +1 0 obj +<< +/F1 2 0 R +>> +endobj +2 0 obj +<< +/BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font +>> +endobj +3 0 obj +<< 
+/BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter [ /ASCII85Decode /FlateDecode ] /Height 80 /Length 4282 /Subtype /Image + /Type /XObject /Width 400 +>> +stream +Gb"/k$+*^]+31jd1_Sc48j,Pi+@:`R01h=9+]FPXQDmE0%*Lb4@[Wi36jU!;cssJbQ5,g%R?K'+$#.hcT_6mF8#7^^Y_6P]N!%L#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j4Z3bU/Gm9seJc&9=)%mQH&Yh?jIi$uR^]%u?lJ6Z*VV,Z28T=.3[G"!N]2!6iqW[_CVOQZ9Um#Qd)&t%d!r@Y0=g5[M*c.,qcc"UaVkc?KHVDN2.L1\=-s4#kKB4PQPI/e#*[DZR^Y$^Xi6K`(0^so>.#p7hgY.jk*u=UQX(s.K*Fd*WL[eV0^25,*fS$V)Q-Z/Ii`SFMMDe;iM2HMUsg37cCLf8L+b)KY6rDWN#jR5YQ.1R1gba'M2,4kCR4578^=b\Bn#r]R"h8?'u=7,fh/#GD*$m:^@g'YaL,g&OD*(\9V3qM@J4qIp#mRhYee0oG3^KQSpS`k(a)%\\KrFo]NtW+D/curY.W3&31Xd58H(q0_cISKlIaB)]QNP9.mWE2l3Yi$/1lIFGq?b"J/%A'=e_4!7DM'qA.H+Eb6+$Wn7]h3.+R^:;FLJDKh6Z0V`KM>R1?7!q\hg`Vs6PqO%XsU&A'e9Y:\qjB8:9X.p&5omkN]TihU"VSM0Gdu%IekLW7Z.T+=gZ8?G+)N`8D1/:))EVV'V%>@o^?^e2`FI#RXkRcVk51O$"8"QQu'od45V^,W"[b:mfhGls_c]o[:8?WXO3sS%ABFnD/;VaJp_j5H4#BX4qPO#9Rd2UQp3!11,Hp.<#+W;VjHWUp(CD>tmalGRY\Uq<)U7[bb1;!ICRCfbRQSW`R7B!G^;uZXfo5`5U7D;,E%89G,#:1)%4QDA%S)!5IL>R;C=R4rV;kBDYiJMaSpYGjR@-K1X!l^X,hul.*@fk/SRgZtX.(?1#F1Z,3(>l3p>ihKjO-5DXhZKVbK,R$;#1+h5rhZ?WW+cIfDMR5M:U;WJk^U1M2V=1pp;4^.2,/RU"b7N@$b8R4LOO?H"DR.Lf`L[*m,BTYmDZ_t`L-M$_)#8#p!)[O706GPi_l#Yq>cO^MHRc(Jp:hO`,H*Y]jp")!6$Iu21q$\8nLN&Ju:,=6H/*LPHo45Js7W8j$_!Qm0FH1P&^"`>@W4%?`NmaE^)_qfq#N;4tr+%k&Ep8k#92@_4?NnV=N@"8F,!hg:if"abZSI*B&dFMB&j8pk=5i_MJAeY/_a-bBH!b7VKr\Kt#C"Ke<_A>`"`=AC>VJ=jpNj/XAJ.8N&11/:hfIr$D^^R2#qRLKK:(9GU8"CB@_;$5Fq-q:K0TBPN]^2`GM'aEs1Y+T=D'>N2JXWoc8.%IYO^gsm'1RJSeGm+YDRQhLku5aKi&&h'k:Ae':8oK1_Fu*NH]a>^r>**\J#14;Ei@8Dd[B!VZ.j64i(icM@UQ_>]1i+QL[q8@sXNl,qq<0pH2rK@bgt+3u4X[=5N,XXpe$Pa+h/i2Ns+!9@kBH_P,uQG__S.W7M^frRPr4EZHW;p0Je?#:'3`%IWs^jMgsS>TFs]-96.iKS'H_`---RRk+q]Jr]FS(In4Pq-F!6Cm%,U[%@0.OI2<<)q%YS\L]"SQrA8jisi-Yc]j.NcUR5eZO4@bV<6:Q<7Y8Tbc.:)0RB[f;uae0#hXi-F,V+Y7!Mj#7a2'UX7up@?R.l5hdJ`J2qIRW9l3nLb6mCBmOignWA-O?D51eitk>F^2j&Iq6CcN2Ju0jXH_V;7Z"7$/f.cVY>Mu"+'
&]*\$$EFH_au5?=QCNV/dCcC.k5.]`boT#$n8q"$7k7cbB=_S?6!sI(ERNS%rY/q#(V&?"M=dPp_pD^aiJ84-qUUOnpsEBD(@=c8&j(fD<_iW8Y:1]3'Z*lk814$BMEn>20Z3q9`%2[odf^kVG8_KfBHJTq!iP-bZf!WUjfi-Z(mjNh$1Mk%I4bUXT_KbmDgHtQ7Z[%/U=`ol;d(+7MLe^9J.%pG(>?Z7,R7p2_!_Qbsrj-nZ^jp1P_QETt1O4#d3il&h>-]FM?E7.3BltXHM-bbl_r^C;;uMGdf.Kh%L(0?a^%V$SMIKn-g/OBA,Ng_8qOt.G4*;07b-d&^'[LU$f5ngd%r-XNimO'c=1SVor0:Eg?<1-k=*lR5.^@!L"%EH/XBn&hq=*'_o;%t#(A>I6JN':Wh5=&pRCU'1C"15l6HQiH<#l)E>c9A33g31NEH\$h]o'o:W53E#msr(FBMb0g*jP1nCIbQ^<-?M19Kr3mq8.j:>;q*:p4Rb"@"DU#`i.DU&`=Vn-ANGOK'T46_'jF^$R0`j>ib(E*\_<8o*cItM:B3D-9Z>Of29HcT0]Z'G'co.PNW`2:qpYXp0-36TIRP-&3V+PPe>^kkuHt*7[/f`Z?74q^`DXV.TS7@]I@7J7#?[&(&hPL%\629`r50o^;oKq?P9#!l9@Fff9p3njK2nUHBg!&A`c[uXD61%4M,a"/_P#gZUo)#L[uI,Q>:BQkk3P?Scmo]DXk])TLK"NX2u"><@[CElgT\uF2.fcn2%@`(OZK[lYeY=buQU?HWK8[#]q-;`G![Hb_HQO`H8MPPH7g](3ooEl5/4f<4*3K2RX<59b07B^anj+,ZW=XiC0F_c7bFM+n7)(]:kV>d%&i?1&P=:i,4>V2MnI*kV+_s8='X"H,gcL;Uo:%-"-M]-mmX/gFJ;bSiNq;:Y3_r5g~>endstream +endobj +4 0 obj +<< +/Contents 8 0 R /MediaBox [ 0 0 612 792 ] /Parent 7 0 R /Resources << +/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject << +/FormXob.0315aed9f6006a101b3226a3b7404028 3 0 R +>> +>> /Rotate 0 /Trans << + +>> + /Type /Page +>> +endobj +5 0 obj +<< +/PageMode /UseNone /Pages 7 0 R /Type /Catalog +>> +endobj +6 0 obj +<< +/Author (anonymous) /CreationDate (D:20260126172022+01'00') /Creator (ReportLab PDF Library - www.reportlab.com) /Keywords () /ModDate (D:20260126172022+01'00') /Producer (ReportLab PDF Library - www.reportlab.com) + /Subject (unspecified) /Title (untitled) /Trapped /False +>> +endobj +7 0 obj +<< +/Count 1 /Kids [ 4 0 R ] /Type /Pages +>> +endobj +8 0 obj +<< +/Filter [ /ASCII85Decode /FlateDecode ] /Length 260 +>> +stream +Gas3/9kseb&-h'>I`6Z84fgHmCc;"L7g6_e&889#h,kA$Zt,m0Hdcho6>O[sLZ+YF+:QDRLY`5CAhdUI=MeslW_fp84Bms2r(UspMdQW.jtWA9rW?q[M1*5b[XIYc1kOQ$55sEf7La^q2$a/'T.)S#HA_.,@?H2a/)Ol=NY+4r->>:n6'/ubPg6GC78QKuEendstream +endobj +xref +0 9 +0000000000 65535 f +0000000073 00000 n 
+0000000104 00000 n +0000000211 00000 n +0000004683 00000 n +0000004939 00000 n +0000005007 00000 n +0000005303 00000 n +0000005362 00000 n +trailer +<< +/ID +[<5d5eceaa0d906ef66e559ebcd616f18d><5d5eceaa0d906ef66e559ebcd616f18d>] +% ReportLab generated PDF document -- digest (http://www.reportlab.com) + +/Info 6 0 R +/Root 5 0 R +/Size 9 +>> +startxref +5712 +%%EOF diff --git a/packages/markitdown-ocr/tests/ocr_test_data/pdf_image_end.pdf b/packages/markitdown-ocr/tests/ocr_test_data/pdf_image_end.pdf new file mode 100644 index 0000000..8b020ed --- /dev/null +++ b/packages/markitdown-ocr/tests/ocr_test_data/pdf_image_end.pdf @@ -0,0 +1,79 @@ +%PDF-1.3 +% ReportLab Generated PDF document http://www.reportlab.com +1 0 obj +<< +/F1 2 0 R +>> +endobj +2 0 obj +<< +/BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font +>> +endobj +3 0 obj +<< +/BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter [ /ASCII85Decode /FlateDecode ] /Height 100 /Length 5779 /Subtype /Image + /Type /XObject /Width 500 +>> +stream 
+Gb"/lGu.L@*62ts-nSVsLep]WJ0p#g,,H"eV3Mo:'q-'naCLc5M\VG_=;mW/W+e\(C$!d1Bi^8$=u%\`U(31kgKY_H-]dB(1s893ML;ZEe)fS68opf-L/N7R]a7G'8%'9hg`3+/J,1jj9\gX&c>@2/6*:H?u&=D*gTWRl>7,*?C-"0lc3kVWrCl_,F,mKlM>.br6a5JrjJqhqAS1r9h2c4Q`W7:=7U01<'Y\G=[ZZ-\NZGn7rqetOGG)[a*j3;fr;#NJ_3f_6U>qPNbEd;=gfaJ==BJ[';Gq+=-GkU_mFk-5#mgnU5(*.NSNDmDac@7fpAZ((gEG93p.F(*@U^O9R(H(&gA^EA^FC2$EE-#h47Bi2V+VVPFRmO$^\?T3r`'!(=e@m+O@cKRX^682KnBU@WfL2a?7hok1#$W]rT7JdcM[Y8hbq(]UqK8[ruf;6/a*gkhgIFI4lqo,L("qt<`\q_^].IH[VabWH1L%KfN?drHg\G5SNj'H[f>7S>MBa+j59Cea1L*iBY1"AY3r$B/s7C?hr;`k:HX3__o'AT#3S'klp/iMaN;Qo,BL%bgQ)`fDAb0<"?o.:>eba$6RE'@`f(c;918?cs8Mmm/*mQC[Jg,[hqJ)-o?U(B]lrdHRuL4t%PXuQ2fB1YQX5![fCJCa[7fpVGoF#W1M9Hf>[2`,5/K=FL^*tCj,e[_mI=GkqWb,J04/V19.K`3j,\-7\hu,o0.JG$1c4j;o?J::G3ke`Hc0OGW):"r-+-Kjq!__98d5\@&02dN)t>>p$tAkmd;ZUtGSLr.aJWOgNi36+e$!,nO$EWrC]=/QJ!;&4*YoN7p=X(cElfU\^A+E[NbrU^U[C[.Q`L[Bo[-0?Tc7.BB[NPENuiB*c:);]Ie.<9@]aH7]W@Ud]3;RVi@='A7WFPQZ`dna3:RQ#/V@4htg840:&Lss]^Rl>7,3co%'6j+F/>ISLR]^cCn7:R&K(ZR,Y)]9NMSnrhLBfb$Z.m)A1)2EZVoodTKo<7/l)8ETXT7?jNief8U7m]NDJgd[ZYM'J,F)tc?ri]3P\DRFaT^t.0#=I?%=&XO)q+F@RA[SsHX&bTFZu=i.8pJAsG:4EpFVkLj_'9B=7$j*[g+H[B8(/0InfrNE+,Q[Z?>KaBk_:[L#cig[85L?B[@m##Q0[JPb_?d0e1^p;q"FH'h$g8)>oBtCLU`s>VpTgeAg9A?!Rnh,-/D(pSpoQ/R+O/RG"lWbI29B,4h7l)Mg!5`KuHJn(t`tq:/PRWEUKbdOqGgG$8p,Ei2['jQdk7n3LXWjHEA3hh*OQb-l0Ad#LYe11c=ue*.1u1lMJm(3BuBGg8>4L`%76rVQ?0+&Ln`h7H;jm+&/'b$WWVm]EQha2W)d\,,J28B$EAJS7+H>)X9qNJXdf`XcpY5(1"N:5r82VKPN:LU!gWGbMd_**R3HTh[sBK-W'QZj8f7M("4V8?W+E.4L`ED3Dgj^\gtFUG*LG'T\.S$PtE`$CnG7bo2kWHM)$`GOjh:P-TN5FqNU.m^l7XRh/PkAFdns^GM7k^43I%gVLSthCDH:D/HD->OK&b*[E@@Z:$=-o<#LYMA5h!cNR#"n42*5?15Q.IVZ](Z!mkhn%A6e\_OK%MmNlIHf2[?eU5^l,cZ9B7$XaNWCZ:.\$*u6ca0p_E/Eb6$CE\tJoVdJ?fH@l+=.O)$,Q#t^]2#MQBd_lhpd$Ve)@`9euYW:FS`,CUWedV'5)F4a]s>oouc-kmC:*a8Qt*%:Hqi;IVAW<`JY1HkKA;-mL3P_gj%!o\*2]n_#U2Y>Pd^>ZOJ_)Q0tXfSST\80S`*nL_1lmd\Fkt()A)DF$D!T:f%/8]Y$A;'SYf#ViT*hVOMI\O,oX68HH!'GNdDiE*LFMi.P2]6)E!'&2_NU7,DFcbagg3l$,IlPRSQi+?(R[q<"/&B$A%4#bDXA.g+%f+ARPC<`W6NRuJ@G8fb%?!._$U&lnO;3o&-b5A;"E@*qf#M34%NFC*ke`cT
lgG*IZ)qtqtMX85i#]&/Gf@N[V@#^oa6l-JqI.GB,Xs:W_>ISK.Z`-m$,!A"L>:t?P?5[+;50Kdm&UNcF[iB8l:tmc]1(84u"!O'>T0/E3;qodLdbIJDqY&B65A?SN[t8;*;I+'UmVZ1bACZj!mmWY$LSFd\TSoc1G4+ZlQKcE'kTZ?4ZR_pi*AQ]Zim#ZFXfjp7%*[mAc+oA1go=B\e*KgE?UCS*b4[GU(Q'Gpo"[bth`=TAQj"A7)?aehW[9]l9lt[VYGjVuSN\^Va:7MVqgKd=6))9R>i*'4K8%I=#>EJ(ItM)HclVT&/#Tg&J:^+1hJ/!melCl^s>8pqsCjQ3B:MV@B(StL(5.\9H>AJ8_Y>4j:V-i85E%%8+YlfdkoNl"jqMZbYdln(-LT=,#9g.+FdX^Q^?Ip?*I];dVleg7[:8lcQ9ICjd0?B1)H,`DqLANMi*B;h0*C"nCeO\NH@')EOKMb7rA"S0%O(sUQ,\PIN.2ne#YNKVFe-8$lC\60ZCT2[akt+7sGGp)fE1%fCH]Rm^n4!jNRIa,o\71LCP=/%j&Ate#(:18h#]hHPm30]^^_,0Q!.F7?EauK]roN_GdY8VFF<]aG<AMFQBW>VV__EBmY#H[W,h\Do7;=LA;OirFuY[R1)TD!KJpMH]ufUg:h(;Nn1=:)s4T+YM-^9o4D.bdp/r\;HHpIbBGaSb\53i%QktnfXZ'96UQJoY[;eJL!?^5^0ZhBIJ2?,XoPgL@Nqt"kb3cJpWD'Y)UWTjm-V:>[[H*TUS=L&NZIF?s1k$XAVrH8)A.T*QNII&[$f#hJ8EplB0r[>3,h-KIAD:6bk,,U-Z)@NOb0b"3YFL5)m-"j?"Sk+f<1ZQfm/7t_;"`_\Il2_dmf1T1MCPaT:)=DJ(nhL)HCF'lKYY"OCV/52BDQama(rF!E0O5SRtRZeG(@#eO/,o9[2j+4l(s^bk(^U"^Ip.2/iLp0ZlXi2bT+O<_dP6>j@Dk)@tF&0piGTA1!CYc)4R)KNq,j1e:HnbW[a&^`E8%TM6SOqr%tC&s8KI1OMa;'5gDNBI8H*Ve5Su,3ohV1i\J`F$>[.L\@i.&5P<#Ct&+)cqs$g&ET#Ah7NPpVV)<9BtHmEa@&DfADb+LU277o6K?(F1X(]Ie/b-C3MQ`EJfLj\;/J&fqXs.d8AJ*Qe^]g,CPhO7*\f("]YXr2b5[nn+`M;ml>W6$ldsN)SUMO!.&440#u.AcXs;4.S@'Bc!S'Wk+@/257NJu3'pRe=M_@!4j(IR9C)SM8#E4\'2U#Yn._nEek'llEac[sU]k'9Si)SN_BS\]QX9pdVdONf6eZ5#n4VUmsA#-0@+`#]49h@p=r2lI`GkT7&+XWn%QV'C?r:@%ZL9Vj+;&1`e,Ur?JcPN#Jn])_W^@[)o[G@n$D;)-q7uum+gH\4G[Co/YHk3#6bNc1rcij%87A&E=krul2r,%IE@4MRp%.cu4M2@)Rl07ah%)\S^>>Mk^QX>2E2VR4.F)hOk`h471qV%O2S1J]3$&D!+nqnC(Hq,&H&f2?3eWbRh1r1b88P-uqL-1pHHH7Zk02Dop)]K0q6NL"53Q>maG"f+#;^dLP$VZpA6"Qm,VV):dg@"BprVQ>!c57WF0C(;r[r)q1[VXV0q;Yh0Oi6sJ\b.V*+G+`#*Jt=U\dcFIkg9Kn8?mp\jc'r;s&&i$&!9up5*>r@fb4G=G-;C"h_Fk/7mWk&>>N`4[sjAnC3u/k[+aq'IuGo_cJ";O$b`9Ic^mQo7II?]EVZR#!b.G2K*QVce-Q3sSLQBMZ4\n*MXIj_ar"scGnWZ1F^49gYn6t^]#/M!RHSknF4/?qqoN[mjmOUm[M*[4C$]9kT@m)9]"/gM3<[*uXA&XHL_>p]qW%M1a:jctlYo0]YS.`)%M5g%CXrnkn[*#Zg^c[(3.1KT-CB#tKS5#kqWb-%VM%[eEHaAc,cD"bD0ME=0)Ii;Z*B":Kko8@EF5'e^4#nFfs?m*OplEZ?Lend
stream +endobj +4 0 obj +<< +/Contents 8 0 R /MediaBox [ 0 0 612 792 ] /Parent 7 0 R /Resources << +/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject << +/FormXob.3b0bdb063edb008ff53c9619abf15dd2 3 0 R +>> +>> /Rotate 0 /Trans << + +>> + /Type /Page +>> +endobj +5 0 obj +<< +/PageMode /UseNone /Pages 7 0 R /Type /Catalog +>> +endobj +6 0 obj +<< +/Author (anonymous) /CreationDate (D:20260126172022+01'00') /Creator (ReportLab PDF Library - www.reportlab.com) /Keywords () /ModDate (D:20260126172022+01'00') /Producer (ReportLab PDF Library - www.reportlab.com) + /Subject (unspecified) /Title (untitled) /Trapped /False +>> +endobj +7 0 obj +<< +/Count 1 /Kids [ 4 0 R ] /Type /Pages +>> +endobj +8 0 obj +<< +/Filter [ /ASCII85Decode /FlateDecode ] /Length 261 +>> +stream +Gas2D5B"Cm&B/jCMDrrn`_N$8f)B),$qMGNg@5>O@4+c$[3$LN1/t8=pN4=;rLWXV6KF8?pAoNe=DfBp"[,+#Y+W7KeL]V>e[hcd"E_FgB7.Yh(_am\!;ursptskMWgNJ,Ij$<_$/kq4qFTn#Bmq"&h%_=4?k.j.IlbD>':B^bPrsrl:;Wi%:3\"P0U$K?*]X2cs($7&$;em=&Lr9_/N;]0hFGb)/$-(OXXS29%7=(glW10J*G"28KpRqh);#fn!`]X~>endstream +endobj +xref +0 9 +0000000000 65535 f +0000000073 00000 n +0000000104 00000 n +0000000211 00000 n +0000006181 00000 n +0000006437 00000 n +0000006505 00000 n +0000006801 00000 n +0000006860 00000 n +trailer +<< +/ID +[<8f9a328183a2bbe032e3e49c333ce784><8f9a328183a2bbe032e3e49c333ce784>] +% ReportLab generated PDF document -- digest (http://www.reportlab.com) + +/Info 6 0 R +/Root 5 0 R +/Size 9 +>> +startxref +7211 +%%EOF diff --git a/packages/markitdown-ocr/tests/ocr_test_data/pdf_image_middle.pdf b/packages/markitdown-ocr/tests/ocr_test_data/pdf_image_middle.pdf new file mode 100644 index 0000000..d90bc9d --- /dev/null +++ b/packages/markitdown-ocr/tests/ocr_test_data/pdf_image_middle.pdf @@ -0,0 +1,79 @@ +%PDF-1.3 +% ReportLab Generated PDF document http://www.reportlab.com +1 0 obj +<< +/F1 2 0 R +>> +endobj +2 0 obj +<< +/BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font +>> 
+endobj +3 0 obj +<< +/BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter [ /ASCII85Decode /FlateDecode ] /Height 100 /Length 5401 /Subtype /Image + /Type /XObject /Width 500 +>> +stream +Gb"/kIrs;k+338sVO=`WX@]kC,17I-,f\doZ./]H/O5Y<6,S7r*WVdm^V:P55F:-h7LRFqtBES_uiF'ieoJ.m%8e,E@K_I^]%tBfub7P]]#X.SF[^6'@W[EMg"LAp$^eQW2Spps8MVZXC!#oj:U2AQ^>FroB+:ZGjo*s)/I`">8]VP\3+,?(0^mk)KfM!@o3Q#R@/sK/TPP3iRV+:?!X'dI82op7,+bXf<-,NBM`_i12[5a9.mbciPL<"]PZB0C%1&sVjqc7qI"&QlI`@OJ,/1Bp\Q/r@u`S3N&/a5L%!m3gSrZXIf&L5$:qh)C\uX`?gVA^J,Xh,(-pVsioB'c/mYJ)*[")(e92MCL(3aDXBulAjUN:l,bt2(^]3`@^8@dihL"_1k[m6'jg;BOkgcZ1SgmafZ"(gm^\p'l]H'R\a3YS<(Y>`k1Y`QdDeij`m5iqRfsS5qVHBPD(khkMRr:c$>A]umF/I_C+X$k='0Hi:d=;6"rr)RuA[Z@(.Z?t*(XK+(ldkjG\Va>I09I$u$Q5B5R\5Vk6p2RdlCY4'&FbdA9JP\_VW$]luG5r+`j8k>8D-1I-DtCtWjX9fL2E,tjPWHX@GGVfF2A'Z#^96WO!Vm0hs<:-*+Wgpqq7rO9AFHM-R0cTW;7i!98Zo?X,km7:tR>\MnC5pN^,]]C$oD6p6FH#S6EdYb4*P"ReQ%bAX0us!YX/-SP>=BAPT1^DA]pAFS3DA4KV3FS3$;b[Ue,$M&rdqq)Bl#LFrH*^gKWQS_[UWnX0tfl1'QXGh0,:.;NMG%D+CAgf<6:q.4d$Wg/M*R_Jd7!0!Mc7d,6RgkTX7prUeRhX><,+-EHNkb:^C8=&E\jhUb./d<:=uG^?g$k,r"%nEaPES!t?hhN"D??Z9gu155S#++l)jB^A6n1VlG9R-VpY5TYSZ]_1Di)Bka>3NInpF!lHLqJ-Q)c1F4=n2E"T6qs_K6Satg'daD#<\g.R['G9ZXTn?YS_7_]Y>e&.:Y-Fpe^bt7b.`%Z6FS-BJM3k0jB%F%dC79lIS<8qbniBbS7OsiL5JFT+67b!EN(\0K\>$N#THr*:pYDaFk4.C[`f1pMeF/klhP"G6Sbq^/)Xk?1:(^eI'!0*C=L)HD&hb[__>e33[bC-Pl4*INpb'1lb-Tg/Bp0TE>?d"TJf4OPeZ5MN9\TLEpYR2,nihHrE'09X.!UQ%KnW%TQiU(4/78B,qNo*4aT'qg9ti;3(ak9G1;f$jnn`:-R#5O@)@R'd]bHnipYK+IkBuk%mBP]9hgTa@h(#QCQr!h2K%<+:'=3IF+Gut8X?1:=5,5U/)^RTAE(Oa8m_#BA"meo[OpKO9dW(h+(gjHI_!(Y::R-TpF+"s;`esVJ-82X0PV_,hhCg$mALmo!D,k=X?9j2br$h@gX^W5clpWVSp96$pe)1Q/R,Z^9*&eO,C[%+(7HorZkr$+PMk?cX22#V5KNJSSd#MW-RpP(D37+QmgZ=SgrMdh$0X,/;3lMp.:GpiFh&^I8+=!H$PrBGEu/phR+A1]kuYN+l#u=K!r'HZHS]%QKC>?3jK"n;,U0OqXs.dK&O`!:u0I>FQh)fjREGQ0ou+XG"eI&3#HD4=4%ja&`AbRZ=hgTN@%b4.LRZ&bHHFkNg^$1SkV#qu8e;Gq)16[s*OmXK\H!u^EMlR&WX#,Bm^\'`WMdQgO@Vu.,4;q[]]3;P0nMc_@>pV.X[,]W:9S!;`7'J?A$2N&G^o_gVejguhRluXCpi(*fs-NtO#Qkp,[>G.#=F7V`R!_>IpRSiPa!$Xa*L,&p/bRG`J>RNo^92XU,8FS<1T
DD(^uFD'N&@T?W%GP2biJED;42N$E;b`$Pm-K'f38UTTFRY^Y1<`Ve'FN/d:mMm/mP$0V$I<>7%dgQmde/`m"o"E`?7-4mN\:76F:XRctOG1XA0n6r,2q#6LIA5ZUlb5simYlRZ:58P)cWk-r@Cj&+Roc'p)UhECVgdpSW``Pll?IH8J3&gBhne^@U=VmaK?Pc^745Q9;n4UHj7gY[;>*W/='OIpdiq"_`2+%/HGp;\5$$jQ+R[gsITl"I\%SomHU9lgG2eW&cZPAM!o5DG#slq'KXX>Z.M^Y00OOdoY5p=8]9mBY]Lg.[-P>O'AR9'[U8_XuiW(ERtn62*4pCa`fff!o51%f!!4MK(pKr778&]\q775IFZQW9q"#[1al\O;-'>4D$j(u>!!N(^@gqW`"6r?c-D$_(tZ3t.%UA+F1d]Dqm-NI#HFX/hH9=MjX``?.>YZ7Uc,?!WGMO,Rr3\i&8@6Z'aM.D+cl4/a@rg85OsHhUS7%CSaZIo480rRP(Qg5_WGb/uK6#@j&O>NWUccj&R.?[_U49KLsk'=4'FTm,-A]Z-^3WIN0dTV+PRPWDbTQF?hgj,Yr;<"#K&U)Y3-?;jc9(DhjGlO2McAE?W08r>o8%W&4LZc6N&U91e.:(1HUmBn8sR7+)5)*&mC<)dB`_a@3p:;6Vn]-Dd3rSYMEZ:M-anuOtJgiGs7=,17??>?tl7W^V/c_l!DJ,ar=mEV7aX]K(pe^2SE2V8p:&(Pak>@ZN2m\9(c/EE1E;Gts>p$6Ii_])*WUA1U)V6Pp$XX`/7RL5PjRgKNrifV;oUWf/m@gg5aI:P`=ns"-;oIg."(`Zoq=Ud908(aHA<)LR_-oM6h`dOL_Jk3B?o^V+VW\$rnQZW/<42-=ph&Rr0_=2eL:$&W'Q#'[Z)#'KS$_c8;7CeST[!p?gVrXi\tc7lNo:j[eA+kFZZRb>DGH$40.a)dm]CL'mbOVulIGliru1KbPI@5ujPcu4V7E\Ham@[69'=Ed@mj+mWroCLCtE3-Zt\*m)+O+[PSEdq%S,:$k0]p]=lFoMd=jZ1)^Rb6UINumagI`4Id=,FA;+'t+]r&W/fUGp??oB[o4p?1W2lH$N*o%\,*no`C1'-jrZ(m`hKNm`IZ8+/PP]Qm(r@7<\1XEEK_VB^7/-SOmS89me0qp$YjZFc=\AgYjV![_"*?Hi1^b4$r[8m9S)(:UhZmFpgU$WXuZ.4"[AQeuDl`Rj1lPU4QeuW$$(5-`R[BZuO3uBbcN'(QCIT<%JD0H&@gp@t2+A.9ac6+#;cej<'O%+ot?E+;ZD$PcKKlS)+hU"XqOd'dtoiBo-=h8^SBMNSg1t[)UU_*LkjQ;,1msaoMoC-YgmpP8D^:f5n3#p7]#BH"$Lu&;#,)qZ0AW^3d:!sq1N?H639b)?b++Zm]mZu_ST@.=X\n.Gp=TbCRMH'GQom/%n&M)*%&oistgHb.$;:*-#YOd'QiU=s:fD`f5%B=S>a)%:ZC*CY,_+YRBZ%)F8?I%NL;`*r/+>^];lWO6ps_B&1%@G^N$ebm[B.0/%7]L(+0@7L1/j&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bYrSci3tN"Yd"~>endstream +endobj +4 0 obj +<< +/Contents 8 0 R /MediaBox [ 0 0 612 792 ] /Parent 7 0 R /Resources << +/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject << +/FormXob.94284ebb61fac7951963d5746d1b193a 3 0 R +>> +>> /Rotate 0 /Trans << + +>> + /Type 
/Page +>> +endobj +5 0 obj +<< +/PageMode /UseNone /Pages 7 0 R /Type /Catalog +>> +endobj +6 0 obj +<< +/Author (anonymous) /CreationDate (D:20260126172022+01'00') /Creator (ReportLab PDF Library - www.reportlab.com) /Keywords () /ModDate (D:20260126172022+01'00') /Producer (ReportLab PDF Library - www.reportlab.com) + /Subject (unspecified) /Title (untitled) /Trapped /False +>> +endobj +7 0 obj +<< +/Count 1 /Kids [ 4 0 R ] /Type /Pages +>> +endobj +8 0 obj +<< +/Filter [ /ASCII85Decode /FlateDecode ] /Length 314 +>> +stream +Garo<_,%rk(rbtB/)F,dFJYh@h+b45RWq[iGbr1qB##rKdJmNBQ8g`0?Uu%0NPi#(l,WOYXYU_/hbu,524oZ3GAPH[n#fl`6VeHqD4`/5Dk1qfc[1X>L=.s=H%^;YC7r+>5(ul'-eG>]:gMQCK0%cVT;/[7UY>9V`tl[otSYJQWWJdBGW:Qe#*-/iQ);0heOqOgl?d\SAXM:cIl'B")R[Nq#~>endstream +endobj +xref +0 9 +0000000000 65535 f +0000000073 00000 n +0000000104 00000 n +0000000211 00000 n +0000005803 00000 n +0000006059 00000 n +0000006127 00000 n +0000006423 00000 n +0000006482 00000 n +trailer +<< +/ID +[<49c620c4d98161aabde15f35beb609d0><49c620c4d98161aabde15f35beb609d0>] +% ReportLab generated PDF document -- digest (http://www.reportlab.com) + +/Info 6 0 R +/Root 5 0 R +/Size 9 +>> +startxref +6886 +%%EOF diff --git a/packages/markitdown-ocr/tests/ocr_test_data/pdf_image_start.pdf b/packages/markitdown-ocr/tests/ocr_test_data/pdf_image_start.pdf new file mode 100644 index 0000000..0b57b7f --- /dev/null +++ b/packages/markitdown-ocr/tests/ocr_test_data/pdf_image_start.pdf @@ -0,0 +1,79 @@ +%PDF-1.3 +% ReportLab Generated PDF document http://www.reportlab.com +1 0 obj +<< +/F1 2 0 R +>> +endobj +2 0 obj +<< +/BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font +>> +endobj +3 0 obj +<< +/BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter [ /ASCII85Decode /FlateDecode ] /Height 100 /Length 4720 /Subtype /Image + /Type /XObject /Width 500 +>> +stream 
+Gb"0VH#OJ:qoA4A'Hn[Z$4u82K`j4ZR%PRX#F(qaK&V?K8q$(4m``.P7J2`EVk#7#VtC6:(*s$)76DC7esMC3FfEP21e=J%f9>Up;d>4l&7WcZJp*DIk*ozzzzzzzzzzzzzzzzzzzzzzz!!#9dqY/lsQRl8pCtPt8mFiS/o[0Y;WL90Bj2R)5Y[N/Gfn0f!(^fES:Hsio97D?(Xn?5jq!"]KU?6X;&P"ZofW]f$p(q"VdAg3IP#\)UWg`=MO$9TD0>@3js(id(lnKeb'nVdJ7e;beRl:iu3cs8nIEn"+F\0s:]mE81*hAmoIah4b[;OfHn`%MZXAic(J)7jH@XF(Q5('jYY[u"DM\[nu[VacM!sePdg%40X+-%@'f%P<0baG`E4oP$%Q/JgWmY\Um466eV$?Y)Q9)nh\c]Mq3f`',ShaFWth-oBcOd=q3cTY"+4B:C5D=,9ZUiS/hg5kVA4*K+[1fUQ5bbN_se_nC2++F"$]j*.j.s0:>;7?2OB:nigYB[?_a,UlbAq1i\Q1k(X_QRsW=DbSc%F'bgtZ/2>e+n:Kbn'oJ*sr;^;r/1Z+Vo?T!W4\7L>qdS!IH-WeB#r>ch5>%TRL&'caJA=^o?na4Y*tXgMf5H"N01jlPU;HhZ+Fp?gW#Ip5H[Y;u'ao8X`t6%]C0Gr*LD?+V"4C6Y<]^1GKRW1+$NV&M@2Zk6GD=d]J*qS.+7cN!h6O!e)cfIf(11iD*Y7*8G.!Fu"f5Q5o`Fk=$Zb+\X]m_AEjKB&FH%hV\FAFmKDpQr8j8Fc:%F6j4nD>GFYUI\FW`_flD4$L7>hqmCTh$Uf"^_*P&AuHA>O*GCsJP2O$EX=EQ9*O\8c#4&JZa4ARj7@85!.BB?cmQFmIWVr;9Tt>%neNS8ud$:Htthg-nk9OnPg[;$TI`>0jlL=-.a+_hJV!J^kl(gQd$+PUS\"[U''if?a#-Wq5c7%ag\H8!"ALF'oU?-RQD7B?-GJ]">g6^'>CFf@5s8D[^Y$F-4CB"/,6\Z#s5Dlg2HM"F7VJkA+mbDo1>N-;k32'EW?6)(KY`B.a^\mY\PAK*gH+(7uU^F-po`(RMK06EPHHdD0;Bn\le,c[Y^V3lFa4(IC]mFtt!K@dP\obnltl>ED4q/%Sh>o`U)Q4>YWf)aUl].\e=Ep??@2&sT>Dj(+.kQ]\905L.Bs_jQMg@#5Ad*SY8q<&o-8CKR8V]"=q/;I!IC.2@XQ-aimNpYWG+0>@4U4t=ghn%S,KY6Ikm?-DX7<>iL_CI@3\nVAbo+bpOJCAW-_HQp]RX&=4gH(-^/Z@s5UCp8LSn\c))=o$*Q*e&.Y0ZhJX-aq'tbOsIoYE.k=J%d;VB:aBqmjXiPmbHLOYe%@qT_T#OK#H+:rVJ+)kHmJLjHDqC"3ei1+I0_7b&fC4SN?G-:HnC^ac96r3OHOcmBeK9_BW&gqhjl7$/j4:&TqtBm]_@&#AP,Y']*Eo)'ONPAD?/7inL-[;Y?u405b]Ka3.k@s]eE(j,^\#rI[BQU.aF>#4B$F4/opmSM*F5.or8NVf4NVD[_hmc;1iLl9739e\*dBrn2.Z2*I#rOpSJRG>K>mOaX&`@A]5&)7CQehdOsNbC>B_D5cTCSXB*di92jW_WX!=EhTKNR%fVt.%Q<$j[iSM!0d6/k`CY(3)+XeI\r:.i,Kg1O$IGhnlT&gm[L0VDFcUFbqACJhEm'4S\ChfI]q:mQ"ZL[OBmJ_6*\'6h<$$-W"[^F9VSm/"@Ys!-Y/kBOeN9p]O$ui,lR;]Y'fWi?-gh&>)baIM5;iRQYFQq5Me#,uCc8P?h`0K;LQ;G)X')<c6G0oT/C6@E@rs_sQ]md5bQ1:uFPQQE5J6oF@\//u>D@rl6i0Db`9"CffLREne*h9ea#&lL)T6cQ+8d[]`uKp7-3LZ,`(u?(+fr>(mI*G/\k-YJ:uXP.0=t4*2mZ-eQ't.M\@&WZ\RXWp+0AS>cW33cqTe`:fXp?:r\D9j`52V-"$nNukEh^\
mZGUTX8#HKq+^Sc;g;39(Dp^!D$\>%6A^eKD`,3r`Ehh.YK:6c7M&12fk$N'7q-hiZ)75?UbZH"N)8kd3WGbMX"P/.K^RR%.rqaT@h#u_AEs1)jILMOBks:._,[m*eQ/HMh/2T8\^k5p-Gbk4:UO]EUnspPj)`O0k>MR,M8scu:M"<+[OYCfCtV].VbWfJfu8q0hWVoO!s]=g_kYp4`r`?FMo`BP7H7X"Km<=Xfhj^&%sjRIEf&?W*]uFIg@FfT]->7T*G\=GA%PJcB1O#e,SV5>$gU<+7PBoDYDti\Up=RQ$_FGu!./Yu`]tGE[]1Xflr3@X<>TH,QPDG[k\(f5EN#4:dBj9Dtm'hE@kFId$O4XJRs#>%h8.(c#WXs*h$)D\#t'aEg:?XOs\SA`#9^5CU9:pL#D+>hdhTPki]s+6c*j=3>f=V>+D"=D5gHfUbY*f!X/5kZq(aU0s.TSSbiDpX_$Rm57L'='`/+(t;Rbo#i[rD@pYrm#1Im](Qt$A+9]t/CS)@C[llo,Jtei=]'$rSMO^!toPHeWb/:-J8LrU&LWJ)\^W(Lh_BC(Sq=]a:sW(9CiU>+L>jbY2:SMO\P;Zl(Q*_#4$"rP+Wa'D/kYlP9isqjdB"Z4^4CMs\(rfP=ldGLb]=a4-[40"Pb'F3QT9K-?3m2,`JlHL%\1Ij*8m=o$^)IJ``GG>8o)=:i+tB%*VOA&aJ4rTY`4DQu\n93U_d,;'5W.rd^OPtDft"Z(lDUVVUibkLA^[AGcHdpAzzzzzzzzzzzzzzzzzz!.Z!\J#>u+ci~>endstream +endobj +4 0 obj +<< +/Contents 8 0 R /MediaBox [ 0 0 612 792 ] /Parent 7 0 R /Resources << +/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject << +/FormXob.41b05a9cf8679f0fe6e7c30c9462b767 3 0 R +>> +>> /Rotate 0 /Trans << + +>> + /Type /Page +>> +endobj +5 0 obj +<< +/PageMode /UseNone /Pages 7 0 R /Type /Catalog +>> +endobj +6 0 obj +<< +/Author (anonymous) /CreationDate (D:20260126172022+01'00') /Creator (ReportLab PDF Library - www.reportlab.com) /Keywords () /ModDate (D:20260126172022+01'00') /Producer (ReportLab PDF Library - www.reportlab.com) + /Subject (unspecified) /Title (untitled) /Trapped /False +>> +endobj +7 0 obj +<< +/Count 1 /Kids [ 4 0 R ] /Type /Pages +>> +endobj +8 0 obj +<< +/Filter [ /ASCII85Decode /FlateDecode ] /Length 250 +>> +stream +Gas2BZ&Z[T&4Ckp`KUTrY_02PMb#endstream +endobj +xref +0 9 +0000000000 65535 f +0000000073 00000 n +0000000104 00000 n +0000000211 00000 n +0000005122 00000 n +0000005378 00000 n +0000005446 00000 n +0000005742 00000 n +0000005801 00000 n +trailer +<< +/ID +[<38bd217c814ddf937f148e537dce51f8><38bd217c814ddf937f148e537dce51f8>] +% ReportLab generated PDF document -- digest (http://www.reportlab.com) + 
+/Info 6 0 R +/Root 5 0 R +/Size 9 +>> +startxref +6141 +%%EOF diff --git a/packages/markitdown-ocr/tests/ocr_test_data/pdf_multipage.pdf b/packages/markitdown-ocr/tests/ocr_test_data/pdf_multipage.pdf new file mode 100644 index 0000000..71ffe8d --- /dev/null +++ b/packages/markitdown-ocr/tests/ocr_test_data/pdf_multipage.pdf @@ -0,0 +1,139 @@ +%PDF-1.3 +% ReportLab Generated PDF document http://www.reportlab.com +1 0 obj +<< +/F1 2 0 R +>> +endobj +2 0 obj +<< +/BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font +>> +endobj +3 0 obj +<< +/BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter [ /ASCII85Decode /FlateDecode ] /Height 80 /Length 3374 /Subtype /Image + /Type /XObject /Width 400 +>> +stream +Gb"/kH&U9Q'ZR>43%?O'&jT85=3qe+N1f0n5S02hEQ*VDL]ZS)-n8)hB\WWtW#6!n.\\/*',>!9/rH$h#V%%FDK@e$Z6UXYgOsk+@,I0-;pc?Ln*mjmTTlQ+?K]e$#D.eB)O5NG7.u<*,RJ_p,8ck3p(&R=G"JnR/3XR:fF1m&!LSDdTjBc:RULFd1\@RVR9Y?IMP'Ajfu#dnfOdRMIRU5?(oCkRY2lY[PsI[bY!Lo7-qefWg9>U_8tF1IHqd?[hMs%mDBpEmdN`p:`8WC.(r[:;%"2F3\c4Mk792G,A7iab,>+^X:Q1B)Ct4JU?`lHM9=Q*\(H$'0?&(T*=du53O+iUIN?O=m=In-UqEE?ghWKl;bQuY*`RG[FO81(L.M1]2c_&%<`*Rq.JU[hs+4A7O4D[l.+Bm#OHs>Bg2RPN#ZQX55r(P>4!naFjcC+c`URpH'c],PFQUH`f2c]IHB5B)pF^[%Q[/+G3L2gM0I)]FA`$/.TW]/!sdN]/(^i(\!C)"CY$!K4SDm&fpLsK;:U@[oi*BN6O,DaRecA5ZZ2bqqSNigQne<3\(m/F9af2d;H'i,rO4%j7$89^HKB/%EH:d:UE-7IC?3kP08QNWFSkIn&tXU0hVtf\g_heWn%H2@\EYR$R+7l,it!qkZIsA%.8%:l.UaQb$n85YGMOQX1#>:k7>o*u_[[jr:Hg5Je^Y:RKsDj_5eqt.H?@qPba^,b_QCLsH:W;>mQ8`hXnJRbG389AnU"B(jou524Uh#I;&q*.4u=YF[nrO\>7O#of";9>"U0op9fI&cNL!f<:OL7XG#UCV^-W3n&X!D@elVYI"hK-(KVsoa8aWL<4F`IGb_t#tRF<8C]_m^L^GO6G3G5S09nd$$CBb>0uZ(bhmeXG1t'(r;5s6L_nY]J6S``aIm.70L=toNMorm#gcR5B07$ZWs&l!IuphJhN(6@caN->pY971W`M`H*G1Tioii3W&eZ2TZB^Na&P9Djoa9RTg9i]Nf@JZ^+U;70)4?,Z*\6\f"L%[`BJ6Kj9)%X!(RqAJ!s1(OmA61d_0J*HI^@ba8PiPl-9KMgjahN*A`OY%3@&b90[&bRjRh=*NU*X?K'!LVo@=gF%20@?O=gnO^q+Jr=$XI159G:[l)_5.-%,a+qd/.CDY\U4-ibD!])@[SqAD!$G2,qi7HmfIX"M?gq4^a:eU]L7T>7]\Ni+_8Hg7U"jUaRtKJ]bt'P3`l9op5MT]g<,Eau4CBfS9qgs8'hVO^q,7
ISl%l9Re9VphQHBQpHf7;ouN+#41(!]Bpq.a"s<1RM\\qcK'V\`Q!N1@Y46f"8:opL%80Oi]-T\^Ju'R*bNtSnP$N;[5T*[5i*NZi1b"eWK@pn&8g.BWL$qSS5iR0*5>0K4e?>K_G;Rn$CUR?oeU93At-Arf=,=bsA0p$p(CN!F<.%bX@`pKfj\]c&XOS1!:`do;;tZ6cVZD;+'sUf$Bt,Q7PD^1rYmq+$QG]u!IsY+87o.os_h#/VHUYNfDXl;b!f:?gYC^>D*J>\&T\cBJ-sI_$N&=^rPk>LSVt;>P.46N:frEgm*GGj4==b41/JMmG;,"j2T%QRo@!%UIU4G6f=MZjM:Z2L9M]+g80^DM^g"!X,S'0pD;8jH/WctXls@'Zr*MP]h7%8,[8^^e]B13k1X#5g%F[m.h'jn0tqYq>l-H\%mM%:Cju%L]qAkq0pc,X(bDM0Q0YGKF?F$>3]@tZ>A#n)hKGBreCD[e6OjG&;>-_QZ2iafV]GHA*H^f/E.J;%Om]_HNhJJ%MZ3l1UbMX*]CbunmMSLel=3ef=.@Tn,XYJoeZ)Vm1LF/i:Y!Za@[WkasO461hJ%d@!MX3Q&p.'mXeE"=VfNAkB83#nEb-?5c%',NbTmp@Ic`7),9GbJp)qSmsX%1!Y73W6=/j,Gsc`f7.5:+Yeeq`Ai+!fL?NOM]56n(<,D$J18;P:'SXB'[uZ2^6A$<-j66f6Jk.&&c3W_A9(cRi[QK19NKCB\1b$_[lLN+.MEm1P"`E)t++5a[#+RZP^-(?@hY,mDDVfkSlm8e\\@>q0l2NF1QP^9\%[*ac\o<7!Y^`e%EF9dL)U==0$Z?$N0G;m+M)j+f_J:*uda;lh'4kU0Eu=[1gfDqKnase*WU=mVh!(:]"OUWP//]CqZjE&P4=Fd]4EP,j2"jQH?"R,$k%R$h6Q3^%g%3\k2'Q2t#0eqVHl3N]f0YdOSV61-pC:UfT.\lB:IuJp7hZ'6FlP!LEm!e%`Z.s*j]KJu),bq<-MI+2hQBC-aRH=2j-B+b#)kL'"'pn0^RRn5.!9II/5DmNHR"$lgsSKY-\*\*MgP]XVFu.o`0C'fI6ZKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&<`?/!KNiek5~>endstream +endobj +4 0 obj +<< +/Contents 12 0 R /MediaBox [ 0 0 612 792 ] /Parent 11 0 R /Resources << +/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject << +/FormXob.b6d21e33426b982eedc18c3d4e93428b 3 0 R +>> +>> /Rotate 0 /Trans << + +>> + /Type /Page +>> +endobj +5 0 obj +<< +/BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter [ /ASCII85Decode /FlateDecode ] /Height 80 /Length 3746 /Subtype /Image + /Type /XObject /Width 400 +>> +stream 
+Gb"/kGErr(GQpkKpmDTn_bcA]O+Jd#cU@+2Zm>]f;6bs;T&C"&dtb&Y8uEeu4LdJj]\*:qP]'A(VpnpR/2-8pYO1IP5N\HYDRFS?rdSVTS(Rm^Cb\t8pp[.ji7rj(S`LM@bl-r9u\(u5id79+1Zkc$%??stuVZtblhmB,pZt^mu0`%Ls1i]8CHul4%*Hj.6mVDOM9@Q>X0"[KH505<^W'N@NK#fUn2VXT_I7/`G*I"#V]/HO(ZsFbo9PDDePMK]!H;tTT$fdk/bb^Xe;u04LW/P]CXtHuJmX$+n(Vc\^&%Tk]_oPE2P5G57Z;<354V60?+`jnL(BV81&C9p'qn^>huY?a*aVo\^AQF(LMnkjY1\5I3=GilDM^Mf#@3^HP(^=%G)"Pl2nSAIgiK?0>KOPWqO$PVHF<:_o#LfLahWdh*9(3^m/EKhl,$Q3cLgQYNON\9m^^C9op:#?rd<2$Vk!&)dEY^Bos&AIDEpB'*`7%OW#]/JaVk$ICr,?V+Pq1smN.e[Kbq-r/47c-[@E9!u4pDp'F`gCN0YN(T*.>3lQo[*tf$^DS35K&9pYYmC(C!8KWI=YO-YhK@cWZ,bpLf<4]8ob:WF?31lf84UU8ba9QV_YEY=:-f*?M'pH:5PUm1J)N_lEV.S5l4J>"ICf>9o#Q>b\(rC/e:S+@s,o'A+RfKf[#p[C^,r^GPUR4gNZ=J]6kHiV:DYI45Dd,\<-li[]NSt[l+63)OsT7L1W3eYFAoO;cK:jZ>hAN$F,_R#nIAbhgQ+SUFQjta@9[u&(LAMenD)m[`R55a*Y3l"))/k=oMU)6$+i]jVa*618S=TZ_GFf=XBs_%K:K'Fo]DciT&f4e(<2n?Vtb+e^g%Nu,1VYfeOc,e:Kg5KNLk9Rcmq(6W8>+.5UZlp$,%&K@JAXf9t/kp;B>ou]%HuU9Q7>RO1)%6mY+XGj^Cfi\llJ`iigb(cAMjb)$@^lQ9+"%O3RN/\G-)Fg+T0@+7sBYYOlju6E^l%O_dec#[W(oiP)i[1+.GAD<*#ee)n]ZlQc:X6"mf=0EeR3-\Rc-pb@okO8@.\^lM5KERoAErtC0R56,oZN&WoDC3T.FfcQ):>]e:aVd1keQ_tITI3"f#+HG?-@C\E:ct,l:h'Y7/kYTAclDW&K5RRRs"3=c::U8&qE3Bk7N4522.XFX-a_8A&BTV-MqW1^SOa5rC:q\>me%lnUEUg#`JHMa+Iu3uP#:.&O#=ggsUgq2ho1`OG(%W"^SARTTWNR+&lC)M$Q`-sKP).'>!(C!`PX,)AUtl)+6md<^LC%`&<0kH\Z:!VH;uD<4`a?Bq\X!.ko\0k6B4?aIal%rsfT"YPIS4"n?"LHFn=!Y?$@qsJM8hgJQ9F[&m!?Bq\X'Z@O/_1jZPJh0WIlPaMENKSCZH_qf=)`E/ta?<^&&/;eKN[POQ>c>>*@QRnu,W>Ap+0*lZ6>P/?C=#Kk>pfEI%X5o!bF40@(o?U'<]Eiua1#T-.-6qJtTfI(.G1oN.lKY+4/`*/6f$6'lUmhZt>sFRC2D);h+q6STAmNibWJLoa!_K+fl3/2UYVS=VJ+Mu+BpgT4#ns,rG4")XqHRFhp?e]lW)6msB7&pCcn]iARI''Q3&WZn4`cfmXRYX$=L#_*oT6]L\$1K[YD^6ieQ7a0Sj]d/M^g5G<=fIGJCn-7*ka$`dtNA96I:UC5Sul1`\4p@]Ls&$eZG>,T=j\`^pYF(5psRDI[ZIJUoPB8(M_b:GbMQoKNEpNmMNdcINqeCi3'cEM:qF.X06_nLu%gkBg5VlBSp+B1fTm,9!@mC$E:N.o^gaK:4o/&_:c/cq+m2[#j^;N_DJlWf4=p-!+I-6h@omP!WV_Y"`IN?:CP*<`.u@#A9nDKO*4\G2pT\?kZ'Dt?,HQ7f!)_^L]dn6@h4u8d7pi9\@8;-o?'k"lKf\AD;4odYK88Q^$"H$>rOH)/`G1DGh5_0eMt&PpsLK>qs%#:d;7WAZfKR;rBm5'b+%b!VCn
!a[@b*Y1e"S\)QM"QV-!NdrV>Ws'[ojRr5.dVU]'X/_ssN#`kAmA/umL!VA/'h#6Ik/uc^1EO5Ek,(eJ=3@$n19'-D]1fkFPi7&q%$9QZs+c](m.qMB*WiUUHSFKA3ttm5!=55Q\.qs8;OBp(ii!qKaVIY^A]oF]ER6$H=l(;gJ?HbR]':Z$q1FFKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`fq"Z!rjOlg~>endstream +endobj +6 0 obj +<< +/Contents 13 0 R /MediaBox [ 0 0 612 792 ] /Parent 11 0 R /Resources << +/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject << +/FormXob.67f2b803142796cfcc78829acfff7782 5 0 R +>> +>> /Rotate 0 /Trans << + +>> + /Type /Page +>> +endobj +7 0 obj +<< +/BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter [ /ASCII85Decode /FlateDecode ] /Height 80 /Length 3671 /Subtype /Image + /Type /XObject /Width 400 +>> +stream +Gb"/kH&NHV(WW/Z*isSg&d2e95Yqb:,+b_Ylbt"c&J%p\W#GlYi-cjh1$F4VW);.c'TXPH7"cs*'CeNo0aOLC"P*Zn'GO!upO2q-m[9Y1pX_cEoNV.hZ!I=.X/ihg[pN.eCb@(q63n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f<&gLu-H0\W/S)K\UiU/d1e=((AE1\ViZgoP7G`;;^96"eA^VhA0L.[BPc_E\[V_jF2]4eaSpjj$D"&Bno1d#['rMp*ip0m[:kfFCp?cGGD5F+!`f-%F.hd'N;+G=82r*?QTUB\d3]4;&O$@@]/UdP:"a?Mun%X(hIdfY%c!Gar_>X+@1HlIml`F@TKbp&ZMDsE$keh8Gd3hgr.uuh/jX)6'!qj^&c9!\hln?+ERl7S6Q>-NjMlpa9'WJ6Y0"C)9EqnU6a<@k<:/9+6qo^@Z'\I'[FS"YZTL.@L2sJd]G1f=ahJl&2l`Ks-M:S_+:'iN)f]\_,l;^8p>qsj0Lbs?q^q550MlTodIT_d4`uhTtM2WG=S:0CRJ?g#U89K(O`^f6E_&QM!\8c8?YcYWG^A,Rga7Gib>7NblcZ\TL@>T?RFh4gP,RLuW%NY0c='mP/s6[2ZP"RWEO$.%PqO$8NHF;:(g19^%:Vd53pXdDSgjf-D?!)SSY:tXbbWl,ljiaKo6&,Tk^%ZEqT*#!;VcpRKUM_Ah@JTSpQ_,km?"fI5ldt/$XrDiRJ>7IaG`llTEkqIRJqXuLseKJ0E@^B[c'G&YC.*Td\lQ;0M&l<>n.Lhop@hJHBr`p5F)Zdo(A@eK#h]C[X$;ni/uM_oq\m:G*7H0QjW]3@5ik9%GVKE;f&Uf!n]DI`Nb%2C3boPu^,\d9$\T7(8@A:HcW%-c/0@u2Bsh8UF(_H9.A]-D0Nm^Ygqir3#\O:*\f:`5QV2)9W.6-PkGX:u*s.q82:[bLF*!YSl>fWgl`93^mI>>?W-=Q[qRY)bl6lGcHZFNr),2$%FM_ALH%]g?+ZMc<[[9]ZgI@@Xj1?$uZ`fQ@E=T_=^Y)Iq>Z]n3T+,ESq+[<(gAZ#oB@"U39sQJhH7qWUVC,rjBb5Bpf01/"POnP#E1HLYF]r-FX(;KH&I":;elcp?*VMmI:t9XJ+lKojSD4*?HTLsA.bG26.
GQ$>[hlK'QDo^(hmQ-cTH%506+oa4EtR1#lVq>$DdUYh/>J)/5O7A3XUoj?\?S"2`7HXiK0'jc'n-K65=5Xbo07YG+,DnJ#k)B0'A/6f!>6\]:+"l=`SG$XA)CADn0K'_K4f/f=5]6+1l9?W_X6Ot?rDnk[F$>,)cO;]%-S-9:BXpp/7pgGNTNfPU?P,hXj.lFe))E5rE0u.QQXBgC'[=5fD-+Dd7D_BiCOsStaKInr&6L*#iQ7hibR@-51:cbp\1q]mqe13@abo2#F%iXN!ou3+QMh*ChlU('RV`>Se52@/A>k8Q@LYVs5!&,>nq4C`OI,m?KN"dj,lR]4^*`J4t3RN0'e>.R)(f4&I7-<086hRGl]?\CGX4ZLuPqD?mGbY492PJuG5NhO5Rl'E?jVGU6ID)(X#%WK)7j1.B[t1a9Ju#GK#qImB754Z:n*t7Ur]Z,U]ek*9ufGe;5djX]Ln4+n3J_qX-GkTR1n9@0\q1VH9&8FY9fC.nhd]SpA>*.D5WHRi4b(7#_j-Wf>90+Nlk8XG?8Zml*$eFnI4mV<54R5pU/km!].$fd-Iknlq@5,Xe$Tq95^0d@eFKu9Kg=%YequpeAKtGp$Y/ZWqNh>cB:jR9h2CeD)rjW#<;*rq4m'&E21"B#_8-[n2C1%.R]OKZKFC,\A?;GZg/0Y7QnAmMs\'7ippJ^\Xso)'/EhB"`5cHhZ>EZWOn-3.t*;K+PqCj$o$J&L5tQQ=@Q((N`qd]u'*NP.O+%`e+d_QG%XgkgA[eF6Dhl!:9n!ZO%2OGm.G+*H6L3F-L($4-&[X?+S`,)XG+'c8f#F"j,#4M.6FdOF]l)Y>rspUrd+UDb:`D!)fGlVc8C*chooP,GPs!_V7E]#$=\U/qWTG4Pfm%09%<@9,->21E?O4?&pP2@H$`4$?gM>J.^gGA7I8>OOjhu9o]#%SpYD:!d-.[JU5C>G.uT">4k:L+m`uN(or>=//s',tG-.OlmLurLLn0%0^[#WL$M5f>V7B:m$9_l<#pr?m_rNDlnbM;@Ec#YLh)mC]GG;L?q"@ifc)?NL)=1E(%%]V"'[:(Jp]+fX=pXRL(KCof)C>7a(8CDp=I/^[HE.V#Z]?\'*R<,cj#3Qd'QlFa+LO?d-;J@`k]u&gL%D;=rZV;%XY.6PuMmCm6;Dc%f8>TCO-`\Q1J;CGJ(.F>s_rU",SE[+39$;uD4Q!l$]cFl9np^it-Z]/KiBJ2.rd$B]'[tKFhF0C=aRdt[f,Y&iJ]18[YX+V2TDbj8FlguYXiND`1&gV9j[X(r2L6iXSoZBYBjd4#Tfh\E%5A]:QeC^]5U+TaDlRI4g@n3(]?$0/`*ag3C]sYKFND8ubs#ku->#@_ZT/BZ#nhJEtc$fKAnu*.>2c7>0%$]DfZY`[~>endstream +endobj +8 0 obj +<< +/Contents 14 0 R /MediaBox [ 0 0 612 792 ] /Parent 11 0 R /Resources << +/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject << +/FormXob.7ce3f428fed09445afad362830e52447 7 0 R +>> +>> /Rotate 0 /Trans << + +>> + /Type /Page +>> +endobj +9 0 obj +<< +/PageMode /UseNone /Pages 11 0 R /Type /Catalog +>> +endobj +10 0 obj +<< +/Author (anonymous) /CreationDate (D:20260126185515+01'00') /Creator (ReportLab PDF Library - www.reportlab.com) /Keywords () /ModDate (D:20260126185515+01'00') /Producer (ReportLab PDF Library - www.reportlab.com) + /Subject 
(unspecified) /Title (untitled) /Trapped /False +>> +endobj +11 0 obj +<< +/Count 3 /Kids [ 4 0 R 6 0 R 8 0 R ] /Type /Pages +>> +endobj +12 0 obj +<< +/Filter [ /ASCII85Decode /FlateDecode ] /Length 302 +>> +stream +Gas2D0i,\@'SL]1MAmuWBq2]41U`B=+JA;@)ETUJW/a5S$%>'U)FQlACq\foqfM9dic+ZLN/#%%PKVtVn!b+n6KWeM,?U:f@u6(=k$)>9=A;GQ#t3m&eV#g&$:bL-jnalu?/Fi#S%7?Zn?-:G9#d\O:D4D7XQ`j*RVq8@Qm.FMjt9rX$+6iZ0lp3O3um&1LmEd6.tN*K;n6j'~>endstream +endobj +13 0 obj +<< +/Filter [ /ASCII85Decode /FlateDecode ] /Length 281 +>> +stream +Gas2DYti1j&;GBn`BOD:C$`F9ZVnjI&kF'G*Fh#tN?'!3]KRM,ciKk&h=0aE:V*="Q_Ne&,OhA1lR406EhL)):sXAaA0[ug"g)PsmBSG*k#J$))")C&+kr+KmILendstream +endobj +14 0 obj +<< +/Filter [ /ASCII85Decode /FlateDecode ] /Length 249 +>> +stream +Gas2B8INBh&;BTK(%8)Q2+a"?*aMq\4K.?![FW8"e$p+akF8n$UkfH'`dI4a"80L!>ZbC60Zk4LLGE6]&5Z#qMYu/6Ns)3ldF]OCoN(cR,K(-$>Bb@Hb$Fm@B;e+Uh$?f>L6HTg25p.\@EBp=GIr"0+>.bL"Ab!5e$0H>2u,XrGS3n+\I^LXNi]kl12d&'Y,la0?'!jr\BDiS++DQrec,bZT6(6I/"hnM&*R'u?RM762ns?o2j@QC[f~>endstream +endobj +xref +0 15 +0000000000 65535 f +0000000073 00000 n +0000000104 00000 n +0000000211 00000 n +0000003775 00000 n +0000004033 00000 n +0000007969 00000 n +0000008227 00000 n +0000012088 00000 n +0000012346 00000 n +0000012415 00000 n +0000012712 00000 n +0000012784 00000 n +0000013177 00000 n +0000013549 00000 n +trailer +<< +/ID +[<8efaabb9b9953607755769fba673a5bf><8efaabb9b9953607755769fba673a5bf>] +% ReportLab generated PDF document -- digest (http://www.reportlab.com) + +/Info 10 0 R +/Root 9 0 R +/Size 15 +>> +startxref +13889 +%%EOF diff --git a/packages/markitdown-ocr/tests/ocr_test_data/pdf_multiple_images.pdf b/packages/markitdown-ocr/tests/ocr_test_data/pdf_multiple_images.pdf new file mode 100644 index 0000000..8a5e474 --- /dev/null +++ b/packages/markitdown-ocr/tests/ocr_test_data/pdf_multiple_images.pdf @@ -0,0 +1,88 @@ +%PDF-1.3 +% ReportLab Generated PDF document http://www.reportlab.com +1 0 obj +<< +/F1 2 0 R +>> +endobj +2 0 obj +<< +/BaseFont /Helvetica 
/Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font +>> +endobj +3 0 obj +<< +/BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter [ /ASCII85Decode /FlateDecode ] /Height 80 /Length 4030 /Subtype /Image + /Type /XObject /Width 400 +>> +stream +Gb"/jGApR4(<3O%]`e\8(`3O5&gh'9UL4Y/2^+lN@RPf5J1sk+NQ;)EK[=iEKil-Qc;6BQ-:LGDM9+$J',f.nAHF$>+;)0QS%B_Sc9&LGT7Z-dn+T=!BA[iT=_mDCmsXW7DDr_l&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j4;C$M\fk4SCf9@^_*csp?0E:tA:NU]#jiWj*eoUTRh)0!!`5]fOL5)!H>oFG9DVR2p+lUH`IrqKY)BXD"&`V6FB2;HMc(7';GJ3Ri.fjIH*^/fZ0Or*2Il>Q?21r^]?[Q:^FZY!?_$=YY:S0iN>)NU7,O;iAF)l:J:7R.&b*4>RZXupbn&1%rQH_7Vb5!5MER63=/j;H?_mC;n]Y$@HQVU)1)MO]*WmC]s?Bm'Eo$F't5&H1H?C?\PZOT*=k"OUBF^Z7('_KU*cW%)S*WMHX>B\oeYSG,9p^:L_3O7EcSHah6r%e]`O=YOgf8dp9lDfH=\S3c8r1HgU==+3,mg#Rl>?_pYUI1'87(cTWVY:DUqL#0^"?4&$]I&kNC0`5JLC0C)C?hEoh,WmelnPO$)t=.f&emDnSAh"(gjeWnZaC>tjK_q=kWnKp@"uqSN>jhKjj-*a*6R/cmlaT]DPqQi#i_sfOu^V];l9H0E/'"u"N,!G@S%i^qIJ7$3^\6AsEbQnB1qh&UbjUVt*h^pT^rHWE2?D;2KcRo]!jbUdu=J*Y^iO7NMpK&+/M?E#p8P[:&KcCI&WT>lj0hn!r'Ddu;@XC[FI)\EY_V0_d]7q%$Vah7c\%+%Y5*O#<]LtTjQE1fFkLQENDq6=GM6pd`rMIpbfSH&W.T3_Ol$&_r2rYl%'!QJ.^n(hm(#X5AF,*LSQi,0+l+X_QCd-sXH3[JDTElP16rE1kDGP]cKR_8u#V]Y"6aMOg*%"icN@-gSBDCX=S#a-tPZm-JQnqsR%UpnUK?#%8%U]B4C%d;CZj!6'e<=uH5kuE5#MQ?sdq?&m6=[kl8Tc=t$9jhO62tP_K@\QrVip+eZoCIBVVI.)e.%EMOIc+6PiK4ajcW;'k,f-)WZWXVHl1GC1G[.CW]@LAEm#oop\)2X5*2\@n4)j*X1-$m:bbYF=S8$HLkmmrTSX5auF,e"0nU1VTA)3IC$D8-]"Dr>:d49"#,PSWbhql]_\g6^db0"bZp1dtL,AY,Hrd`/-m-ruOhB-14LQ;od4K*.0TNLGYVbWfTAdFC7kkt8JqJt7[c^Ql>:b?jjnDUs$l_[IMNbHS?"ibH+Q6-&jp=Nm3cV`0>dPSYcp:A>VG$`7Io@6oL.1Yru`9tj;1MbRC8OuD!T$h`FdRA1Y^%4"c]QM>g?3P;LgT"R'VH'Wq6-8?N<>b"^YjQ;nmbG@*@ga"'Fml$&HKSjO@C'C@dp'!_Ffa>t?@e@l=1ULa\_XlA]C"jJ[EOb[EULM]CT:sc(6S,=-4)`A+!]_LKEDDRhbe129S\o$XGLlIB_@GSM;0aiBoeY#3\$rnT",rr#-L8WThrVH3Wd>f5/m!I81VBY=an%\pL-#\@;=L#`iU,5`YFD7-fMIo%:V-`E0oiVNt"U>:mo'NpD2RN%p)fKE=Wh?#X9URXg?^k0Ij3g**Ork^G>.)NP0^Zo`HhZs,@E=NRrXe7Rl\9C@L/t\;Z<&1+XN&%j.rRWD
Z,P`5RWN'o-KfFuX0F4HJ=HdaGcm]mfpk]J#M4P%(H_.XIrZ=?Cg4eurF6+-eE^<^5E+044/<]H!b,^3/b-4Pk'SYL'H2BJX;H*1(;>.@2s+l4^Ld[GX<"m,,S8jnVW1p25U-)leT"(Rd*85eRMpFgl;HQ5B?dNZ>%sGi@`*P8u]+OF+6o8`@u[s,9A[)598-5To.NRHgri>q[;4C6)9l.DYr;9tI4q7F2Vk[EZD9m8%k8qS#MUgZFAT/+X&c>ZakEt-!tIC@gq7p=,@:&"fuR?9TKhl$^"^,@C[&CBoS#=R9q$`uOH:#JSJ9P:&Gm=.MEs`m+]EUC$a"&A7V[4&1(O+/U5tc%5l8bfgduMA7WcZLS/+L8KGT=PZX]or?B?!uj1:N/iq<5oqK2W)9>[j2YeFBE.YV?a;6JT9Q4NVaHp2:S]C*Y[u"D`JYPMk(OUXco6M[L(>@Y58HVo85]"55IG+o_]R.!G(9muq-BFa8ETq]Ipg#U)AK2f>//o4/S?>UdM"B);/a.)]9Kd\TSI[X3Z=U2f//"o0>`-h5:!aHeD^"pG1@4>4BtrUnbQ\p&f=niu3sje\cKZtRhg)ldr?b(YV+-RL1_Dd!GjKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U/qprr=Mu.$a~>endstream +endobj +4 0 obj +<< +/BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter [ /ASCII85Decode /FlateDecode ] /Height 80 /Length 4649 /Subtype /Image + /Type /XObject /Width 400 +>> +stream +Gb"/jGB=Qg)obtD(hlkN"f0(5:+-aT<2CgqGS',YK-J92CmL:6LTAL_NG:0`'1[i!JbWE/;*Y0EI&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j4Eeer2aX7P8Gl-m;mrV!9$er./Dr&!ITgFGgA]g5nB?j\gC<`5,n"5+/EDb62SNDgA(+`SG0AEAddU%7a,tE0=*^(Ec[;Xqd5T\EOW]FtK0RnAjQS4C.$PtF;o^lOY2JjA(q!>?5rnjhqh;qnpMFKAOW-;:r<^AH9g>e+lD6ps06kbI-F(^N'_eACt5OcbEjk(IV6iiD9J4s/kRp`OHAta@uhcDp$1sUc^m;JVdmc-P%b#odXg*ftJs\9L;@@2JoU%\UmIb7]_:X"BnAg8PVo7"#l=q;QoL`\om+g+\Q7Q1-1hhO`F6CiVjuIe^m/?\904&j@l.#kH4Fk1D;,Pn,s$FCkgKq>QM?oqqM4!SS5Q*fUYEeuW#*!X5lBcX+"oeKmkS@.(h*0mT1nrUS,bYIsE5U03`ScpJ^IB)[\rqY`>H2da;>5Fp[LW%H??G5X>9I8jeZbo1@@Rl5+ugph@\]QnSZaRRJ1c_/-=odWtJ379>AFMnp+G4!`KBR8dD=K*Pa.$qbpOj^:tQl,H?<^#prXH0"q[r1$MSsu`#-;Bq^d`.=is1np^^aDK96L*.(Hg9*0T*>V9Q`\n^`L]5>i\sQ)AOEM\?DBt!8#7Y0SiiDsB23^-W`?+JZ!P-=ienX,kdX6M.]EUBN#=ETZtP"4E0#kk.nX&MZXupQJK6dnON]"CP^nh:H5s_#io8rs2G=?r4"L]CJn3h!L8UnMn3&`J&s32P.9WsPPW!4%+F!3AJ[djYeuWV-Z>A4"8En[*=5Y$_-RU/bi:`*I1]ICNmogd^&>KHo<_nI&aUW0Z4F*r.YHH(N0/])H@AVt,Qj$t>c%"]+(Grh2@hqR\Kp>)Z"qC's<2ica_Zur-Q2/AL<_pP5LNT<5uc'*A?hCXJ.kFHh"?b\4M
T7?jNW:Z<';^CI[++A`7OE8^;3KaGt`tU2'.D<$$(8lJ$qXeKD*.IYNhqsqO(qjtQ7I&aVcqsCYCu`NpZ:0D]`fV90Y<_!ZI:XrsMu4GpR/S`Z+]D^@_i>hXTXHQg*\=T.d1+*ZF@"?A5D+E8\PZb2+R`274i=_&+Z-TjqgJjbPoZE^@ah=VW647kCR58IohRMC(*CR(BT4n*g4pe*Q*FX(Ze.='Up?^1I6=]+CR-!_%#,)!P&(,gr)C9gt'd@U<[_Mh7[.BXW9QFj@HU`+`>?^il-h`Cm[n-7Qu"^R/Nq)/q=rVY'\h65$\b8K9Z?3pKR74:;W&Vrb0'RUnf8X,*n!Vo@(0TeZW?;S5&\%q=Edol##.]0ta!D>-S?B@9uVpR#`p(A3D=o2%[df2_8e$4'f1)NRBTH'-m?jdWZp&qR5i[E?3(7'GO%!UQIuh95N^kDD#a!lS/(+69W4/mZ%*VQC%uHIj\7D6iFIm5:M9YL]ma?a!d!FZRM2L8hJQ&\WdViAY@$CLtG[U/u!RSi'E`c5,!URlB=%P(1u[;9l7Q6M'9A^A>u+KnAe0>frXsk/mMom5)EOm'BeFBCOfX;l@XR`#.>7PVnNg$Ar0C2iBc2!hXl2M:?(bVG3Z?oZE^@ah:&r%'`jC6A5c$G<'9m%\d#1i<%XtiOYApFZ#ngUWnJtG/]%:$X.K=GW,cd(,X]VC#UfU)`BPA]i2*s(/L,')$%Fg"HB/6o9V+;QA(poeWD(H%'PK'i*'^M$-GEjj5ZsajEKE@7)e`B&V9IbT7,k5*08(&Z-#ERJlX!nJ6-,qOtU0+)>0FG+$Y3Z0!U5:(0dnE2>ldh:HuO;nY0Pm-Nb)*J,HR6XB5,?i``NJa]m"YMA3m7nYoRq4LCiWU8$(:YI&.rTieR/pt(60)sl<&(qm64bGjc,Wid`T\j#uS,;!3t#ZWX@Qb]H6m+0:1;qsHNtRn/+a^TqR>@.Xea;s?o];:@&[UK4L#Bgp,F,RsE=He%6J^-n8]*f2^ig*%WUU9]mE:;]ttcMZNm[=\K]o`.A)aeHbb.4k%p-knF1D'??PP_$'uAW.YfGi'1*4F,A`4oAd^jGb*;,,KJMg7lSZNi\i-]QnS9fD@keR"@(a"19:*>e>/RJdS>U2U)kn??q^kNZ$^Bf?AOeYG1M#F69N)YKHC81t4$<='G]b*BVjAL@1)g&>WXcmq%"$FN(@d[u(<&)jh7^9q42j;/'(@*qZ9#1_hV7kg;bg1;[eiWMc>NH^cj+,)L?e-tC8Ul?"$BW,("fP"k2kTgOS\'#P]UR$afb6UO5'fV1fm!:>pa-mFu0fN>Vk8V,EUiAp`*k=7lnd6j#G?@gXj^]4:[lhRS7^\h!<=Ohc'UIUA;HcM*b-SH\Up<+TaZX2JrHeBK!>nJAaY?kLL/gnU,#h^q""dFhRG&6H/a5UeX7Zp6gFeM-UOA/J@Q7cIs89qr*N`gtc.gV7W@Pgu35H&1[il-gG6ps9s12"o1k*p:dX^3kuci>4)9#`+:[3-;KGd.!54*Cm-Y<5R+fnq"UN/ejI8lFT$382/`"$_QOh#2/DLjqoXT1J5gUR^:Q73Z/%*Q8G$9Be]Pl[k/.(FEb(9d)@N&6Z]ZhSo`8H7)_IDWLQ('dTVL5SII6X+!=b>6UY]Ahta^E[MKH*pf9IX>_4DG/tD:u3@Q=+'LrO%cbH8T["^qG*h2K%:eK3(85o6G^0e=,qT0_iZI$F6Ci^qWb,K`6fP]$4[.MF'Y6B(&,:Gh,TCP2$t[aqq^Lo&44Hf_5)pDh0RQRF/\'r*qi?.M@`+%+QqN0=0@MG=&Q81PV9AI^:8FXigm1m+bV6r>dtn(*3_sE%hF_WLlbEq6:+#iY$HCPCI\XR.7d!#(c,btV+R!a:h@tE4Z#"&=0Gs$*@i:d&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+bUCn#U+j463n`f&4-XGKFgHU+lr@e*ukCHmf
~>endstream +endobj +5 0 obj +<< +/Contents 9 0 R /MediaBox [ 0 0 612 792 ] /Parent 8 0 R /Resources << +/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject << +/FormXob.41b05a9cf8679f0fe6e7c30c9462b767 3 0 R /FormXob.94284ebb61fac7951963d5746d1b193a 4 0 R +>> +>> /Rotate 0 /Trans << + +>> + /Type /Page +>> +endobj +6 0 obj +<< +/PageMode /UseNone /Pages 8 0 R /Type /Catalog +>> +endobj +7 0 obj +<< +/Author (anonymous) /CreationDate (D:20260126172022+01'00') /Creator (ReportLab PDF Library - www.reportlab.com) /Keywords () /ModDate (D:20260126172022+01'00') /Producer (ReportLab PDF Library - www.reportlab.com) + /Subject (unspecified) /Title (untitled) /Trapped /False +>> +endobj +8 0 obj +<< +/Count 1 /Kids [ 5 0 R ] /Type /Pages +>> +endobj +9 0 obj +<< +/Filter [ /ASCII85Decode /FlateDecode ] /Length 300 +>> +stream +Gas3-b=]`-&-h'@TAj4XMk6`hV:j"r+Qu/5_JPH2[*jl@3?0-u[(9'GR:-/*/ft>A_=nj;i7d2;EpsDoJrDgn0>N^CGqOd65m'$2XdN[8"CN"4'o-s=`lHc!JpSi8$*d@]6l&@V%Q+V`W6/nPEL_rB?OF1iZbk.;Ju<];RLo@-9lO$dQ,9&`I`%EM@\dBr0Lf$$+R^&+/ncK?;0=7o:`];ceF"uKA7ETdrT"0YNT=QC"`>/@%I83@M@]K&@Nk~>endstream +endobj +xref +0 10 +0000000000 65535 f +0000000073 00000 n +0000000104 00000 n +0000000211 00000 n +0000004431 00000 n +0000009270 00000 n +0000009574 00000 n +0000009642 00000 n +0000009938 00000 n +0000009997 00000 n +trailer +<< +/ID +[<60f7c7338a7d1cfd54f86e6a06e41602><60f7c7338a7d1cfd54f86e6a06e41602>] +% ReportLab generated PDF document -- digest (http://www.reportlab.com) + +/Info 7 0 R +/Root 6 0 R +/Size 10 +>> +startxref +10387 +%%EOF diff --git a/packages/markitdown-ocr/tests/ocr_test_data/pdf_scanned_invoice.pdf b/packages/markitdown-ocr/tests/ocr_test_data/pdf_scanned_invoice.pdf new file mode 100644 index 0000000..5e1caac --- /dev/null +++ b/packages/markitdown-ocr/tests/ocr_test_data/pdf_scanned_invoice.pdf @@ -0,0 +1,354 @@ +%PDF-1.7 +%µ¶ + +1 0 obj +<> +endobj + +2 0 obj +<> +endobj + +3 0 obj +<>>> +endobj + +4 0 obj +<> +endobj + +5 0 obj +<>/Width 
800/Height 1100/BitsPerComponent 8/ColorSpace 7 0 R/Length 2640000>> +stream +''' {{{ ccc[[[PPP㛛TTT/// JJJ{{{܋PPP$$$%%%TTT+++]]]BBB{{{ ccc000555UUU888{{{FFFfff+++~~~xxx{{{ iiiccc!!!VVV{{{aaa+++HHH\\\{{{ ccc|||666 + + +VVV<<>>333^^^ {{{ttt+++ gggCCC{{{ OOOrrr444cccZZZ+++gggfff{{{\\\ccc@@@+++ gggCCC{{{ OOO%%%ccc  + + + +++{{{ +++ wwwccc{{{ OOOcccddd&&&kkkhhh'''{{{bbb888+++ eeeuuu{{{ OOOmmm111cccOOOTTT {{{ + + ++++ ...???{{{ OOO!!!ccc)))zzzDDD{{{ddd111///+++  {{{ OOOccc...XXXWWWyyy{{{"""  + + ++++ QQQ;;;{{{ OOOhhhccc999}}}VVV???QQQ !!!{{{___ttt'''+++qqq{{{ OOOcccNNNNNN{{{000+++wwwTTTggg{{{ OOOcccCCCVVV999{{{ccc+++www222{{{ OOOcccccc擓VVV''' KKK{{{lll=== &&&VVV+++www苋DDD + + +###XXXsss苋DDD + + +###XXXsssjjjyyy222sssyyy222sssxxx???;;;:::YYY鿿222;;;:::YYYsssKKK???JJJ{{{&&&CCCJJJ{{{&&&sssKKK???iiiyyyCCCiiiyyysssKKK???,,, OOOXXX888aaafff??? 111UUU + + +***sssTTT SSSggg LLLnnnCCCaaa!!!666bbb&&& 999hhh!!!JJJ,,, OOOXXX888 777UUU + + +***sssggg LLL>>> ;;;sssoooOOO///qqq 777aaa!!!666OOO///qqq^^^^^^ bbbrrr 666ggg ~~~ ___CCC}}}www777fff + + +^^^^^^fff  ~~~FFFwwwsssoooeeefff}}}eee```QQQ蔔oooccc,,, 򤤤 GGGߜ^^^hhh(((CCCJJJ죣HHH{{{RRRﲲ```QQQ蔔VVV 򤤤 TTTttt===sssoooqqq쯯KKK???VVVJJJ죣qqq쯯wwwdddNNN&&&___ ___}}} + + +CCC\\\ gggRRRyyywwwdddNNN  ___ + + +999sssoooYYY===444KKK??? \\\ gggYYY===444hhh___ ttt[[[```WWWppp ...hhh___eeeVVVCCC~~~xxx 444]]]ccchhh___ tttnnnWWWppp ___eeehhhsssoooPPP'''KKK???nnn~~~xxx PPP'''kkkTTTvvv{{{ddd)))]]]xxx + + +XXX???CCC>>>999KKK---444VVVkkkTTT)))]]]&&&SSSsssooo'''KKK???>>>999'''|||www{{{şnnn///ccc///[[[ + + +CCC///RRR777SSS|||///[[[ + + +vvv===sssooo777]]]KKK???777]]]{{{www{{{CCC ccc///[[[555MMMCCC \\\777SSS{{{///[[[KKKsssooo KKK??? 
TTT[[[CCC|||www{{{777+++SSSccc///[[[IIIVVVCCCaaa'''RRR777SSSTTT[[[CCC|||///[[[ uuuCCCsssoooЊHHH + + +KKK???aaaЊHHH + + +>>> ,,, SSSwww{{{mmmiii ccc///[[[CCC444333DDD000CCC'''777SSS>>> ,,, SSS///[[[)))NNN!!!sssooo888KKK???444333888gggsss]]] qqqwww{{{tttiii:::aaa///[[[OOOXXXCCCrrr777SSSgggsss]]] qqqtttiii///[[[qqq sssoooJJJ!!!PPPLLL???rrrJJJ!!!PPPGGG|||,,,cccCCCwww{{{ ttt;;;KKK[[[///[[[]]]mmmCCCAAAjjj+++^^^xxx777SSSGGG|||,,,cccCCC ttt///[[[jjjLLLsssSSSJJJooo)))***+++BBBQQQ:::AAAjjj+++)))***+++BBB)))444LLL^^^QQQ蕕www{{{LLLrrr稨222OOO///[[[CCCttt888XXXOOOyyyVVVkkkwww777SSS)))444LLL^^^QQQ蕕///[[[rrr^^^LLLsssMMMrrrDDD ooo<<>>```QQQ蔔HHH{{{RRRJJJ죣###SSS'''___ooooooccc,,, 񤤤 JJJ죣VVV###SSS'''___PPP}}} MMMFFF222 MMM<<<iii {{{FFF222FFF222 + + +dddNNNRRRyyy\\\ ggg###SSS~~~sss___ooo&&&___```\\\ ggg ###SSS~~~sss___lllUUU222BBBUUU%%% {{{222BBB222BBB___eee ~~~ ___ ttt444~~~xxx ###SSS///___ooo[[[```bbbkkk ~~~xxx nnn###SSS///___666FFF|||ccc***KKK|||ccc {{{***KKK***KKK)))]]]```%%%777XXXTTTKKK--->>>999###SSS%%%___ooovvv{{{>>>999###SSS%%%___222===}}}:::---LLL}}}:::'''}}} {{{---LLL---LLL///[[[|||///RRR###SSSppp___ooowww{{{###SSSppp___xxxfff999999CCCfff999hhh333 {{{999CCC999CCC///[[[uuu{{{\\\ ###SSS222___ooowww{{{ ###SSS222___ sssEEEJJJHHH444EEEJJJ {{{HHH444HHH444///[[[{{{+++qqq|||'''RRRaaa###SSS$$$___ooowww{{{aaa###SSS$$$___]]]***+++gggnnnxxx+++ggg[[[nnnxxxnnnxxx///[[[ SSSDDD000CCC'''444333###SSSnnn___ooowww{{{444333###SSSnnn___[[[<<<GGG555[[[GGG555GGG555///[[[777{{{}}}---]]] qqqrrr###SSS333(((ooowww{{{xxx```rrr###SSS333(((%%%|||   {{{  ///[[[...---cccCCC^^^xxxAAAjjj+++###SSSSSSJJJooowww{{{uuuAAAjjj+++###SSS***000FFFnnn000 {{{FFFnnnFFFnnn///[[[^^^QQQ蕕yyyVVVkkkwwwttt888XXXOOO###SSSMMMrrrDDD ooowww{{{ ttt888XXXOOO###SSS555iiiUUU+++iii|||UUU {{{+++iii|||+++iii|||///[[[QQQGGGYYY iii\\\mmmHHH###SSS555mmm[[[ooowww{{{"""mmmHHH###SSS555NNN\\\ {{{\\\\\\///[[[UUU<<<VVVEEE___!!!***[[[###SSSvvv'''FFF[[[ooowww{{{JJJ ,,,___!!!***[[[###SSS111bbb??? 
QQQMMM +endstream +endobj + +6 0 obj +<> +stream + +mntrRGB XYZ acspAPPL- +desc|cprtx(wtptbkptrXYZgXYZbXYZrTRC gTRC bTRC desc"Artifex Software sRGB ICC Profile"Artifex Software sRGB ICC ProfiletextCopyright Artifex Software 2011XYZ QXYZ XYZ o8XYZ bXYZ $curv +#(-27;@EJOTY^chmrw| %+28>ELRY`gnu| &/8AKT]gqz !-8COZfr~ -;HUcq~ +:IXgw'7HYj{+=Oat 2FZn  % : O d y + +' += +T +j + + + + + + " 9 Q i  * C \ u & @ Z t .Id %A^z &Ca~1Om&Ed#Cc'Ij4Vx&IlAe@e Ek*Qw;c*R{Gp@j>i  A l !!H!u!!!"'"U"""# +#8#f###$$M$|$$% %8%h%%%&'&W&&&''I'z''( (?(q(())8)k))**5*h**++6+i++,,9,n,,- -A-v--..L.../$/Z///050l0011J1112*2c223 3F3334+4e4455M555676r667$7`7788P8899B999:6:t::;-;k;;<' >`>>?!?a??@#@d@@A)AjAAB0BrBBC:C}CDDGDDEEUEEF"FgFFG5G{GHHKHHIIcIIJ7J}JK KSKKL*LrLMMJMMN%NnNOOIOOP'PqPQQPQQR1R|RSS_SSTBTTU(UuUVV\VVWDWWX/X}XYYiYZZVZZ[E[[\5\\]']x]^^l^__a_``W``aOaabIbbcCccd@dde=eef=ffg=ggh?hhiCiijHjjkOkklWlmm`mnnknooxop+ppq:qqrKrss]sttptu(uuv>vvwVwxxnxy*yyzFz{{c{|!||}A}~~b~#G +k͂0WGrׇ;iΉ3dʋ0cʍ1fΏ6n֑?zM _ɖ4 +uL$h՛BdҞ@iءG&vVǥ8nRĩ7u\ЭD-u`ֲK³8%yhYѹJº;.! 
+zpg_XQKFAǿ=ȼ:ɹ8ʷ6˶5̵5͵6ζ7ϸ9к<Ѿ?DINU\dlvۀ܊ݖޢ)߯6DScs 2F[p(@Xr4Pm8Ww)Km +endstream +endobj + +7 0 obj +[/ICCBased 6 0 R] +endobj + +8 0 obj +<> +stream + +q +595 0 0 818.125 0 11.9375 cm +/fzImg0 Do +Q + +endstream +endobj + +xref +0 9 +0000000000 65535 f +0000000016 00000 n +0000000062 00000 n +0000000114 00000 n +0000000160 00000 n +0000000267 00000 n +0002640422 00000 n +0002643073 00000 n +0002643107 00000 n + +trailer +<<01F042D027A1BB71E85D84806881CC02>]>> +startxref +2643201 +%%EOF diff --git a/packages/markitdown-ocr/tests/ocr_test_data/pdf_scanned_meeting_minutes.pdf b/packages/markitdown-ocr/tests/ocr_test_data/pdf_scanned_meeting_minutes.pdf new file mode 100644 index 0000000..33c717b --- /dev/null +++ b/packages/markitdown-ocr/tests/ocr_test_data/pdf_scanned_meeting_minutes.pdf @@ -0,0 +1,363 @@ +%PDF-1.7 +%µ¶ + +1 0 obj +<> +endobj + +2 0 obj +<> +endobj + +3 0 obj +<>>> +endobj + +4 0 obj +<> +endobj + +5 0 obj +<>/Width 800/Height 1100/BitsPerComponent 8/ColorSpace 7 0 R/Length 2640000>> +stream +''' ***++++++[[[{{{ cccnnnAAA ***FFF***{{{ ccc###SSS[[[+++sss888!!! 
111aaa!!!666ggg LLL++++++^^^fff + + + ~~~KKK ~~~666ggg }}}fff}}}GGG```WWWoooooofff}}} bbbrrr}}} ~~~eeeFFFwwwsss^^^www777ooo bbbrrr}}} ~~~++++++```QQQ蔔ﲲKKK GGGߜSSSJJJ죣VVVJJJ죣WWWjjj444 ooooooVVVJJJ죣oooccc,,,JJJ죣KKK???qqq쯯TTTttt===sss```QQQ蔔HHH{{{RRRooooooccc,,,JJJ죣KKK???++++++dddNNN + + +KKK + + +}}}}}}\\\ ggg \\\ gggttt000oooooo \\\ ggg&&&___\\\ ggg + + +KKK???YYY===444999sssdddNNNRRRyyyooo&&&___\\\ ggg + + +KKK???++++++___ ttt]]]ccc___eeeKKK___eee...hhh\\\~~~xxx nnn~~~xxx kkk oooooonnn~~~xxx [[[```~~~xxx ___eeeKKK???PPP'''hhhsss___ ttt444ooo[[[```~~~xxx ___eeeKKK???++++++TTT444VVV)))]]]KKK)))]]]ddd + + +444>>>999>>>999)))JJJoooooo>>>999vvv{{{>>>999)))]]]KKK???'''&&&SSSsssTTTKKK---ooovvv{{{>>>999)))]]]KKK???++++++|||777SSS///[[[KKK///[[[şnnn///cccnnnBBBoooooowww{{{///[[[KKK???777]]] + + +vvv===sss|||///RRRooowww{{{///[[[KKK???++++++{{{777SSS///[[[KKK ///[[[CCC ccc888 xxxSSSoooooo www{{{ ///[[[KKK??? KKKsss{{{\\\ooowww{{{ ///[[[KKK???+++---|||777SSS///[[[KKK ///[[[777+++SSSccc^^^aaaaaammmKKKooooooaaawww{{{aaa///[[[KKK???ЊHHH + + + uuuCCCsss|||'''RRRooowww{{{aaa///[[[KKK???<<< 777 SSS777SSS///[[[KKK ///[[[mmmiii ccciii444333444333444EEE)))oooooo444333www{{{444333///[[[KKK???888)))NNN!!!sss SSSDDD000CCC'''ooowww{{{444333///[[[KKK???RRR]]] qqq777SSS///[[[KKK ///[[[:::aaa>>>rrrrrruuuoooooorrrwww{{{rrr///[[[LLL???JJJ!!!PPPqqq sss]]] qqqooowww{{{rrr///[[[LLL???;;;cccCCC777SSS///[[[KKK ///[[[;;;KKK[[[ AAAjjj+++AAAjjj+++___SSSoooSSSJJJoooAAAjjj+++www{{{AAAjjj+++///[[[QQQ:::)))***+++BBBjjjLLLssscccCCC^^^xxxSSSJJJooowww{{{AAAjjj+++///[[[QQQ:::nnn>>>vvv^^^QQQ蕕777SSS///[[[KKK ///[[[LLLrrr稨222OOO"""ttt888XXXOOOttt888XXXOOOWWWRRRoooMMMrrrDDD ooottt888XXXOOOwww{{{ttt888XXXOOO///[[[eee<<> +stream + +mntrRGB XYZ acspAPPL- +desc|cprtx(wtptbkptrXYZgXYZbXYZrTRC gTRC bTRC desc"Artifex Software sRGB ICC Profile"Artifex Software sRGB ICC ProfiletextCopyright Artifex Software 2011XYZ QXYZ XYZ o8XYZ bXYZ $curv +#(-27;@EJOTY^chmrw| %+28>ELRY`gnu| &/8AKT]gqz !-8COZfr~ -;HUcq~ 
+:IXgw'7HYj{+=Oat 2FZn  % : O d y + +' += +T +j + + + + + + " 9 Q i  * C \ u & @ Z t .Id %A^z &Ca~1Om&Ed#Cc'Ij4Vx&IlAe@e Ek*Qw;c*R{Gp@j>i  A l !!H!u!!!"'"U"""# +#8#f###$$M$|$$% %8%h%%%&'&W&&&''I'z''( (?(q(())8)k))**5*h**++6+i++,,9,n,,- -A-v--..L.../$/Z///050l0011J1112*2c223 3F3334+4e4455M555676r667$7`7788P8899B999:6:t::;-;k;;<' >`>>?!?a??@#@d@@A)AjAAB0BrBBC:C}CDDGDDEEUEEF"FgFFG5G{GHHKHHIIcIIJ7J}JK KSKKL*LrLMMJMMN%NnNOOIOOP'PqPQQPQQR1R|RSS_SSTBTTU(UuUVV\VVWDWWX/X}XYYiYZZVZZ[E[[\5\\]']x]^^l^__a_``W``aOaabIbbcCccd@dde=eef=ffg=ggh?hhiCiijHjjkOkklWlmm`mnnknooxop+ppq:qqrKrss]sttptu(uuv>vvwVwxxnxy*yyzFz{{c{|!||}A}~~b~#G +k͂0WGrׇ;iΉ3dʋ0cʍ1fΏ6n֑?zM _ɖ4 +uL$h՛BdҞ@iءG&vVǥ8nRĩ7u\ЭD-u`ֲK³8%yhYѹJº;.! +zpg_XQKFAǿ=ȼ:ɹ8ʷ6˶5̵5͵6ζ7ϸ9к<Ѿ?DINU\dlvۀ܊ݖޢ)߯6DScs 2F[p(@Xr4Pm8Ww)Km +endstream +endobj + +7 0 obj +[/ICCBased 6 0 R] +endobj + +8 0 obj +<> +stream + +q +595 0 0 818.125 0 11.9375 cm +/fzImg0 Do +Q + +endstream +endobj + +xref +0 9 +0000000000 65535 f +0000000016 00000 n +0000000062 00000 n +0000000114 00000 n +0000000160 00000 n +0000000267 00000 n +0002640422 00000 n +0002643073 00000 n +0002643107 00000 n + +trailer +<<8B76016B2ADBF53913D48221C50EC1CA>]>> +startxref +2643201 +%%EOF diff --git a/packages/markitdown-ocr/tests/ocr_test_data/pdf_scanned_minimal.pdf b/packages/markitdown-ocr/tests/ocr_test_data/pdf_scanned_minimal.pdf new file mode 100644 index 0000000..9410339 --- /dev/null +++ b/packages/markitdown-ocr/tests/ocr_test_data/pdf_scanned_minimal.pdf @@ -0,0 +1,180 @@ +%PDF-1.7 +%µ¶ + +1 0 obj +<> +endobj + +2 0 obj +<> +endobj + +3 0 obj +<>>> +endobj + +4 0 obj +<> +endobj + +5 0 obj +<>/Width 800/Height 1100/BitsPerComponent 8/ColorSpace 7 0 R/Length 2640000>> +stream +''' ccc㛛TTT/// JJJ[[[{{{܋PPP$$$%%%TTT+++]]]BBB cccUUU888[[[{{{FFFfff+++~~~xxx iiicccVVV[[[{{{aaa+++HHH\\\ cccVVV<<>> ;;;sssXXX888bbb&&& 999oooaaafff??? 
111aaa!!!666ggg LLLCCCfff + + +eeeeee666ggg bbbrrr ~~~ bbbrrr666ggg}}}eeeFFFwwwsss^^^www777ooo bbbrrr}}} ~~~CCCﲲqqq쯯qqq쯯GGGߜoooccc,,,oooccc,,,GGGߜKKK???JJJ죣qqq쯯KKK???TTTttt===sss```QQQ蔔HHH{{{RRRooooooccc,,,JJJ죣KKK???CCCYYY===444YYY===444}}}&&&___ + + +&&&___}}}KKK???\\\ gggYYY===444KKK???999sssdddNNNRRRyyyooo&&&___\\\ ggg + + +KKK???CCC]]]cccPPP'''PPP'''...hhh[[[```___eee[[[```...hhhKKK???~~~xxx PPP'''KKK???hhhsss___ ttt444ooo[[[```~~~xxx ___eeeKKK???CCC444VVV''''''dddvvv{{{)))]]]vvv{{{dddKKK???>>>999'''KKK???&&&SSSsssTTTKKK---ooovvv{{{>>>999)))]]]KKK???CCC777SSS777]]]777]]]şnnn///cccwww{{{///[[[www{{{şnnn///cccKKK???777]]]KKK??? + + +vvv===sss|||///RRRooowww{{{///[[[KKK???CCC777SSS  CCC cccwww{{{///[[[www{{{CCC cccKKK??? KKK???KKKsss{{{\\\ooowww{{{ ///[[[KKK???CCC777SSSЊHHH + + +ЊHHH + + +777+++SSScccwww{{{///[[[www{{{777+++SSScccKKK???aaaЊHHH + + +KKK??? uuuCCCsss|||'''RRRooowww{{{aaa///[[[KKK???CCC777SSS888888mmmiii cccwww{{{///[[[www{{{mmmiii cccKKK???444333888KKK???)))NNN!!!sss SSSDDD000CCC'''ooowww{{{444333///[[[KKK???CCC777SSSJJJ!!!PPPJJJ!!!PPP:::aaawww{{{///[[[www{{{:::aaaLLL???rrrJJJ!!!PPPLLL???qqq sss]]] qqqooowww{{{rrr///[[[LLL???CCC777SSS)))***+++BBB)))***+++BBB;;;KKK[[[www{{{///[[[www{{{;;;KKK[[[QQQ:::AAAjjj+++)))***+++BBBQQQ:::jjjLLLssscccCCC^^^xxxSSSJJJooowww{{{AAAjjj+++///[[[QQQ:::CCC777SSS<<>>}}}www'''hhh!!!JJJoooOOO///qqqTTT SSSCCCGGGaaa!!!666888>>>}}}www'''ggg LLLaaa!!!666OOO///qqqXXX888CCCGGGaaa!!!666(((xxx>>>///tttfff + + +oooeee666gggCCCGGG}}}xxx>>>///ttt ~~~}}}eee^^^CCCGGG}}}```ooo666KKK???ﲲoooqqq쯯KKK???GGGߜoooJJJ죣666JJJ죣qqq쯯```QQQ蔔oooKKK???JJJ죣OOO^^^KKK???uuu KKK???oooYYY===444KKK???}}}ooo\\\ ggguuu  + + +\\\ gggYYY===444dddNNNoooKKK???\\\ ggg###"""KKK???hhh222<<<<<<XXX\\\KKK???]]]cccoooPPP'''KKK???...hhhooo~~~xxx hhh222<<<<<<XXX\\\___eee~~~xxx PPP'''___ tttoooKKK???~~~xxx cccKKK???ppp}}}pppDDDKKK???444VVVooo'''KKK???dddooo>>>999ppp}}}pppDDD)))]]]>>>999'''TTToooKKK???>>>999ZZZKKK??? + + +... 
+ + +KKK???777SSSooo777]]]KKK???şnnn///cccooo + + +... + + +///[[[777]]]|||oooKKK???999KKK???LLLiiiDDDKKK???777SSSooo KKK???CCC cccooo LLLiiiDDD///[[[ {{{oooKKK??? IIIOOOKKK???&&&AAA...999KKK???777SSSoooЊHHH + + +KKK???777+++SSScccoooaaa&&&AAA...999///[[[aaaЊHHH + + +|||oooKKK???aaaKKK???iiirrrFFFKKK???777SSSooo888KKK???mmmiii cccooo444333iiirrrFFF///[[[444333888 SSSoooKKK???444333EEEDDDKKK???000''' ,,,LLL???777SSSoooJJJ!!!PPPLLL???:::aaaooorrr000''' ,,,///[[[rrrJJJ!!!PPP]]] qqqoooLLL???rrrFFF EEELLL???|||555 + + +yyyQQQ:::777SSSSSSJJJooo)))***+++BBBQQQ:::;;;KKK[[[oooAAAjjj+++|||555 + + +yyy///[[[AAAjjj+++)))***+++BBBcccCCCoooQQQ:::AAAjjj+++QQQ:::EEE===eee777SSSMMMrrrDDD ooo<<>> ;;;sssOOO///qqqVVV%%%pppaaa!!!666UUU + + +***sss 777XXX888bbb&&& 999aaa!!!666OOO///qqqOOO///qqqaaa!!!666>>> ;;;sssbbb&&& 999XXX888 777 777aaa!!!666bbb&&& 999nnneeefff + + +^^^oooFFFwwwssseeehhh }}} fff^^^www777}}}eeeeee}}}FFFwwwssswww777^^^ffffff}}}www777 ___KKK???qqq쯯ﲲ```QQQ蔔oooTTTttt===sssqqq쯯KKK??? 񤤤 JJJ죣 򤤤 VVV```QQQ蔔HHH{{{RRRJJJ죣qqq쯯qqq쯯JJJ죣TTTttt===sssHHH{{{RRR```QQQ蔔VVVVVVJJJ죣HHH{{{RRRKKK???^^^hhh(((KKK???YYY===444dddNNNooo999sssYYY===444KKK???```\\\ ggg ___ dddNNNRRRyyy\\\ gggYYY===444YYY===444\\\ ggg999sssRRRyyydddNNN \\\ gggRRRyyyKKK???KKK???PPP''']]]ccc___ tttooohhhsssPPP'''KKK???bbbkkk ~~~xxx WWWppp nnn___ ttt444~~~xxx PPP'''PPP'''~~~xxx hhhsss444___ tttnnnnnn~~~xxx 444KKK???VVVKKK???'''444VVVTTTooo&&&SSSsss'''KKK???>>>999TTTKKK--->>>999''''''>>>999&&&SSSsssKKK---TTT>>>999KKK---KKK???xxx + + +XXX???KKK???777]]]777SSS|||ooo + + +vvv===sss777]]]KKK???|||///RRR777]]]777]]] + + +vvv===sss///RRR|||///RRRKKK??? + + +KKK??? 777SSS{{{oooKKKsss KKK??? 
{{{\\\   KKKsss\\\{{{ \\\KKK???555MMMKKK???ЊHHH + + +777SSS|||ooo uuuCCCsssЊHHH + + +KKK???aaa|||'''RRRaaaЊHHH + + +ЊHHH + + +aaa uuuCCCsss'''RRR|||aaa'''RRRKKK???IIIVVVKKK???888777SSS SSSooo)))NNN!!!sss888KKK???444333 SSSDDD000CCC'''444333888888444333)))NNN!!!sssDDD000CCC''' SSS444333DDD000CCC'''KKK???LLL???JJJ!!!PPP777SSS]]] qqqoooqqq sssJJJ!!!PPPLLL???xxx```rrrtttiii]]] qqqrrrJJJ!!!PPPJJJ!!!PPPrrrqqq sss]]] qqqrrrLLL???OOOXXXQQQ:::)))***+++BBB777SSScccCCCSSSJJJooojjjLLLsss)))***+++BBBQQQ:::uuuAAAjjj+++ tttcccCCC^^^xxxAAAjjj+++)))***+++BBB)))***+++BBBAAAjjj+++jjjLLLsss^^^xxxcccCCCAAAjjj+++^^^xxxQQQ:::]]]mmmeee<<> +stream + +mntrRGB XYZ acspAPPL- +desc|cprtx(wtptbkptrXYZgXYZbXYZrTRC gTRC bTRC desc"Artifex Software sRGB ICC Profile"Artifex Software sRGB ICC ProfiletextCopyright Artifex Software 2011XYZ QXYZ XYZ o8XYZ bXYZ $curv +#(-27;@EJOTY^chmrw| %+28>ELRY`gnu| &/8AKT]gqz !-8COZfr~ -;HUcq~ +:IXgw'7HYj{+=Oat 2FZn  % : O d y + +' += +T +j + + + + + + " 9 Q i  * C \ u & @ Z t .Id %A^z &Ca~1Om&Ed#Cc'Ij4Vx&IlAe@e Ek*Qw;c*R{Gp@j>i  A l !!H!u!!!"'"U"""# +#8#f###$$M$|$$% %8%h%%%&'&W&&&''I'z''( (?(q(())8)k))**5*h**++6+i++,,9,n,,- -A-v--..L.../$/Z///050l0011J1112*2c223 3F3334+4e4455M555676r667$7`7788P8899B999:6:t::;-;k;;<' >`>>?!?a??@#@d@@A)AjAAB0BrBBC:C}CDDGDDEEUEEF"FgFFG5G{GHHKHHIIcIIJ7J}JK KSKKL*LrLMMJMMN%NnNOOIOOP'PqPQQPQQR1R|RSS_SSTBTTU(UuUVV\VVWDWWX/X}XYYiYZZVZZ[E[[\5\\]']x]^^l^__a_``W``aOaabIbbcCccd@dde=eef=ffg=ggh?hhiCiijHjjkOkklWlmm`mnnknooxop+ppq:qqrKrss]sttptu(uuv>vvwVwxxnxy*yyzFz{{c{|!||}A}~~b~#G +k͂0WGrׇ;iΉ3dʋ0cʍ1fΏ6n֑?zM _ɖ4 +uL$h՛BdҞ@iءG&vVǥ8nRĩ7u\ЭD-u`ֲK³8%yhYѹJº;.! 
+zpg_XQKFAǿ=ȼ:ɹ8ʷ6˶5̵5͵6ζ7ϸ9к<Ѿ?DINU\dlvۀ܊ݖޢ)߯6DScs 2F[p(@Xr4Pm8Ww)Km +endstream +endobj + +7 0 obj +[/ICCBased 6 0 R] +endobj + +8 0 obj +<> +stream + +q +595 0 0 818.125 0 11.9375 cm +/fzImg0 Do +Q + +endstream +endobj + +xref +0 9 +0000000000 65535 f +0000000016 00000 n +0000000062 00000 n +0000000114 00000 n +0000000160 00000 n +0000000267 00000 n +0002640422 00000 n +0002643073 00000 n +0002643107 00000 n + +trailer +<<528BDF63938888106EC0B3A3BC326265>]>> +startxref +2643201 +%%EOF diff --git a/packages/markitdown-ocr/tests/ocr_test_data/pdf_scanned_report.pdf b/packages/markitdown-ocr/tests/ocr_test_data/pdf_scanned_report.pdf new file mode 100644 index 0000000..4c2112f --- /dev/null +++ b/packages/markitdown-ocr/tests/ocr_test_data/pdf_scanned_report.pdf @@ -0,0 +1,1136 @@ +%PDF-1.7 +%µ¶ + +1 0 obj +<> +endobj + +2 0 obj +<> +endobj + +3 0 obj +<>>> +endobj + +4 0 obj +<> +endobj + +5 0 obj +<>/Width 800/Height 1100/BitsPerComponent 8/ColorSpace 7 0 R/Length 2640000>> +stream +''' [[[+++܋PPP$$$%%%TTT333SSS ccc{{{܋PPP$$$%%%TTTGGG### !!!AAAttt+++---ZZZ㛛TTT/// JJJ### !!!AAAttt[[[]]]BBB[[[+++FFFfff333SSS ccc{{{FFFfff]]]aaaGGG###;;;+++ UUU888###;;;[[[~~~xxx[[[+++aaa333SSS iiiccc{{{aaa + + + + + +GGG###+++TTTVVV###[[[HHH\\\GGG+++222^^^<<<333SSS ccc{{{222^^^<<<444GGG### uuu;;;+++VVV<<>>WWWGGG###***]]]+++mmmUUU###***]]]___uuu______+++ !!!333SSS OOOxxx000ccc{{{!!!QQQGGG### 222:::+++ 666www CCC### 222:::___fffvvv___+++ EEE333SSS OOO(((ccc{{{EEELLLGGG###+++ ---###___gggCCC___+++ ttt333SSS OOOccc{{{tttGGG###+++ ^^^ ###___gggCCC___+++ \\\ccc@@@333SSS OOOrrr444ccc{{{\\\ccc@@@GGG###iii+++ gggfff###iii___gggCCC___+++ 333SSS OOO%%%ccc{{{ ))) ***GGG###+++%%%+++  +++###+++%%%___wwwccc___+++ bbb888333SSS OOOccc{{{bbb888VVVLLLGGG###eee+++ kkkhhh'''###eee___eeeuuu___+++  + + +333SSS OOOmmm111ccc{{{ + + +fffGGG###zzz+++  ###zzz___...???___+++ ddd111///333SSS OOO!!!ccc{{{ddd111///QQQ}}}GGG###''''''+++ DDD###''''''___ ___+++ """  + + +333SSS OOOccc{{{"""  + + +jjjGGG###}}}+++ 
XXXWWWyyy###}}}___QQQ;;;___+++___ttt'''333SSS OOOhhhccc{{{___ttt'''CCC555###ccc+++VVV???QQQ !!!###ccc___qqq___+++www000333SSS OOOccc{{{000###vvvFFF+++###999+++wwwNNN###999___TTTggg___+++wwwccc333SSS OOOccc{{{ccc~~~+++###+++wwwVVV999###___222___+++wwwlll=== &&&VVV333SSS OOOcccccc{{{lll=== &&&VVV!!!:::www+++###JJJ +++www擓VVV''' KKK###JJJ ___''' ---ZZZ```]]]BBB ~~~xxxTTT$$$HHH\\\BBB%%%``` + + +HHH555555...kkkRRRGGG)))nnn)))KKKcccgggCCCڋJJJ$$$###SSSkkk--- """[[[SSSkkkYYY$$$EEE[[[HHHcccgggCCC'''\\\^^^XXXSSSkkk [[[cccgggCCCLLL###333<<>>AAAڙ%%%QQQ☘kkk uuuܖ"""bbbcccooo}}} YYY555***;;;kkk ###QQQccc:::LLLЗ444[[[ ??? + + +kkkqqq%%%===ccc HHHyyyDDD!!!___jjjkkk^^^ccc@@@)))mmmkkkcccuuu___666wwwܩllluuu!!!kkkcccfffvvvқtttSSS/// YYY ;;;kkktttcccgggCCC%%%OOOGGGkkk|||kkkcccgggCCCFFF+++\\\ZZZ<<<kkkcccgggCCC]]]TTTooo$$$kkkcccwwwccc"""kkkuuuccceeeuuunnnkkkccc...???zzzBBB )))kkk[[[}}},,,FFFccc ]]]444333TTTkkkcccQQQ;;;KKK㺺fff rrrYYY祥***lllJJJ㱱PPPcccqqqsssVVV---UUU777PPPqqqjjjvvvcccTTTgggmmm%%%fffZZZgggxxxccc222kkk///+++dddlll***!!!aaaXXX%%%)))YYYccc;;;cccQQQGGG XXX``` {{{趶RRRqqq + + + 555쐐KKK + + +111hhh777KKKfffrrrlll777KKK苋DDD + + +###XXX333###OOOsssPPP777KKK[[[$$$ EEE333###OOOvvvvvvuuuHHH333 &&&XXXeeeaaa777KKK'''222777KKKyyy222333###OOOsssvvv000###777KKK///333###OOOkkk kkk 333kkkppp777222777;;;:::YYY333###OOO鿿222777QQQfff 333###OOO%%%vvv333###ᙙ+++999777TTTiii777JJJ{{{&&&333###OOOCCC{{{rrr777ttt333###OOO,,,nnn,,,nnnNNNMMMuuu333###%%%TTT444777BBBiii777iiiyyy333###OOOCCCcccPPP777(((MMM333###OOO[[['''jjjnnn[[['''jjjnnn333###XXX + + +777 000777,,, OOO333###OOOCCCAAA777YYY 333###OOO[[[~~~ [[[~~~ &&&;;;333###sssKKK999777???;;; 777^^^333###OOOCCCTTT777777333###OOO[[[eeexxx[[[eeexxx'''===333###IIIDDDUUU777ccc>>>777333###OOOCCCPPP}}}777ttt333###OOO[[[---jjjѧ[[[---jjjѧiii333###555 + + +$$$777&&&ggg777www333###OOOCCClll777lllHHH333###OOO[[[***[[[***aaa=== 333###҅ )))999777 777 hhh333###OOOCCC666FFF777 +++```333###OOO[[[\\\[[[\\\ VVV333555777 666RRR777 kkk333###OOOCCC222===777 ҇FFF 
333###OOO[[[222iii [[[222iii ...333\\\777qqq&&&777555###MMM CCCxxx777Ɇ///555###MMM [[[666[[[666999XXX333###aaaWWW|||999777RRR + + +777===BBBCCC sss777!!!===BBB[[[SSS[[[SSS\\\333###:::777111777TTT[[[CCCQQQ+++&&&CCC]]]***777sss)))mmmQQQ+++&&&[[[888hhh [[[888hhh ttt333###VVVYYY777;;;VVVrrr777>>> ,,,wwwKKKCCC[[[<<<777vvvwwwKKK[[[BBB[[[BBBbbb333###777wwwrrr777gggsssxxxCCC%%%||| 777eeeeeexxx[[[KKK[[[KKK~~~"""333###777111777GGG|||,,,CCC***777666sssfff[[[>>>]]][[[>>>]]]LLLEEE333###SSS[[[777 + + +sssLLL777)))444LLLGGG^^^CCC555iii777lll999$$$GGG^^^[[[MMM[[[MMMBBBOOO333###777@@@///777RRR<<< CCCNNN777>>>BBB [[[[[[333###!!!777~~~SSS777ooo444 + + +$$$]]]싋DDD;;;CCC111777mmm000(((ccc싋DDD;;;[[[CCC + + +[[[CCC + + +%%%|||333###999uuussssssjjjjjjjjjPPPrrrsssPPPrrrssssssxxx???xxx???xxx??? sss 鿿222sssKKK???KKK???KKK???sssCCCsssKKK???KKK???KKK???sssssssssCCCsssKKK???KKK???KKK???ooosssoooCCChhh!!!JJJOOO///qqq>>> ;;;sssXXX888bbb&&& 999oooaaafff??? 111aaa!!!666ggg LLLUUU + + +***sss 777aaa!!!666OOO///qqqaaa!!!666ggg LLLOOO///qqqhhh!!!JJJaaa!!!666CCCGGGggg LLL>>> ;;;sssggg LLL@@@ ---oooKKKOOO///qqqXXX888CCCGGGXXX888ooo 777CCCfff + + +eeeFFFwwwsss^^^www777ooo bbbrrr}}} ~~~ fff}}}eee}}} ~~~eeefff + + +}}}CCCGGG ~~~FFFwwwsss ~~~KKK111\\\KKKeee^^^CCCGGG^^^ooofffCCCﲲqqq쯯TTTttt===sss```QQQ蔔HHH{{{RRRooooooccc,,,JJJ죣KKK??? 򤤤 VVVJJJ죣qqq쯯JJJ죣KKK???qqq쯯KKK???ﲲJJJ죣oooTTTttt===sss^^^lllCCCKKKqqq쯯```QQQ蔔ooo```QQQ蔔oooVVVCCCYYY===444999sssdddNNNRRRyyyooo&&&___\\\ ggg + + +KKK??? 
___ \\\ gggYYY===444\\\ ggg + + +KKK???YYY===444KKK???\\\ gggooo + + +999sss + + +}}}GGGKKKYYY===444dddNNNooodddNNNooo CCC]]]cccPPP'''hhhsss___ ttt444ooo[[[```~~~xxx ___eeeKKK???WWWppp nnn~~~xxx PPP'''~~~xxx ___eeeKKK???PPP'''KKK???]]]ccc~~~xxx ooo___eeehhhsss___eeejjjKKKPPP'''___ tttooo___ tttooonnnCCC444VVV'''&&&SSSsssTTTKKK---ooovvv{{{>>>999)))]]]KKK???>>>999'''>>>999)))]]]KKK???'''KKK???444VVV>>>999ooo)))]]]&&&SSSsss)))]]]"""]]]000KKK'''TTToooTTToooCCC777SSS777]]] + + +vvv===sss|||///RRRooowww{{{///[[[KKK???777]]]///[[[KKK???777]]]KKK???777SSSooo///[[[ + + +vvv===sss///[[[XXXKKK777]]]|||ooo|||oooCCC777SSS KKKsss{{{\\\ooowww{{{ ///[[[KKK???  ///[[[KKK??? KKK???777SSS ooo///[[[KKKsss///[[[cccKKK {{{ooo{{{oooCCC777SSSЊHHH + + + uuuCCCsss|||'''RRRooowww{{{aaa///[[[KKK???aaaЊHHH + + +aaa///[[[KKK???ЊHHH + + +KKK???777SSSaaaooo///[[[ uuuCCCsss///[[[XXXKKKЊHHH + + +|||ooo|||oooCCC777SSS888)))NNN!!!sss SSSDDD000CCC'''ooowww{{{444333///[[[KKK???444333888444333///[[[KKK???888KKK???777SSS444333ooo///[[[)))NNN!!!sss///[[[[[[333KKK888 SSSooo SSSoooCCC777SSSJJJ!!!PPPqqq sss]]] qqqooowww{{{rrr///[[[LLL???tttiiirrrJJJ!!!PPPrrr///[[[LLL???JJJ!!!PPPLLL???777SSSrrrooo///[[[qqq sss///[[[]]]KKKJJJ!!!PPP]]] qqqooo]]] qqqoooCCC777SSS)))***+++BBBjjjLLLssscccCCC^^^xxxSSSJJJooowww{{{AAAjjj+++///[[[QQQ::: tttAAAjjj+++)))***+++BBBAAAjjj+++///[[[QQQ:::)))***+++BBBQQQ:::777SSSAAAjjj+++ooo///[[[jjjLLLsss///[[[XXXKKK)))***+++BBBcccCCCooocccCCCSSSJJJoooCCC777SSS<<>><<<&&&PPPhhhYYY oooYYY mmm[[[oooCCC777SSSTTT ###cccIII%%%vvvCCCsssUUU<<>> ;;;sssooobbb&&& 999aaa!!!666>>> ;;;sssggg LLL""" !!!&&&III333!!!&&& {{{}}}www777fff + + + ~~~www777666ggg666ggg ~~~666ggg ___eeeeeewww777^^^ ~~~FFFwwwsssooowww777}}}FFFwwwsss ~~~$$$777``` gggxxx```  {{{KKK???JJJ죣HHH{{{RRRﲲHHH{{{RRRGGGߜGGGߜGGGߜ^^^hhh(((qqq쯯qqq쯯HHH{{{RRR```QQQ蔔TTTttt===sssoooHHH{{{RRRKKK???JJJ죣TTTttt===sssxxxYYY MMMFFF222 MMM<<<iii {{{KKK???\\\ gggRRRyyy + + +RRRyyy}}}}}} + + +}}}YYY===444YYY===444RRRyyydddNNN + + +999sssoooRRRyyyKKK???\\\ 
ggg999sss + + +MMMUUU222BBBUUU%%% {{{KKK???~~~xxx 444]]]ccc___eee444...hhh...hhh___eee...hhhVVVPPP'''PPP'''444___ ttt___eeehhhsssooo444KKK???~~~xxx hhhsss___eee:::|||ccc***KKK|||ccc {{{KKK???>>>999KKK---444VVV)))]]]KKK---dddddd)))]]]dddxxx + + +XXX???''''''KKK---TTT)))]]]&&&SSSsssoooKKK---KKK???>>>999&&&SSSsss)))]]]999}}}:::---LLL}}}:::'''}}} {{{KKK???///RRR777SSS///[[[///RRRşnnn///cccşnnn///ccc///[[[şnnn///ccc + + +777]]]777]]]///RRR|||///[[[ + + +vvv===sssooo///RRRKKK??? + + +vvv===sss///[[[MMMfff999999CCCfff999hhh333 {{{KKK??? \\\777SSS///[[[\\\CCC cccCCC ccc///[[[CCC ccc555MMM  \\\{{{///[[[KKKsssooo\\\KKK??? KKKsss///[[[wwwZZZEEEJJJHHH444EEEJJJ {{{KKK???aaa'''RRR777SSS///[[['''RRR777+++SSSccc777+++SSSccc///[[[777+++SSScccIIIVVVЊHHH + + +ЊHHH + + +'''RRR|||///[[[ uuuCCCsssooo'''RRRKKK???aaa uuuCCCsss///[[[(((+++gggnnnxxx+++ggg[[[KKK???444333DDD000CCC'''777SSS///[[[DDD000CCC'''mmmiii cccmmmiii ccc///[[[mmmiii ccc888888DDD000CCC''' SSS///[[[)))NNN!!!sssoooDDD000CCC'''KKK???444333)))NNN!!!sss///[[[hhhtttGGG555[[[LLL???rrr777SSS///[[[:::aaa:::aaa///[[[:::aaaOOOXXXJJJ!!!PPPJJJ!!!PPP]]] qqq///[[[qqq sssoooLLL???rrrqqq sss///[[[rrr   {{{QQQ:::AAAjjj+++^^^xxx777SSS///[[[^^^xxx;;;KKK[[[;;;KKK[[[///[[[;;;KKK[[[]]]mmm)))***+++BBB)))***+++BBB^^^xxxcccCCC///[[[jjjLLLsssSSSJJJooo^^^xxxQQQ:::AAAjjj+++jjjLLLsss///[[[EEE{{{999000FFFnnn000 {{{eeettt888XXXOOOyyyVVVkkkwww777SSS///[[[yyyVVVkkkwwwLLLrrr稨222OOOLLLrrr稨222OOO///[[[LLLrrr稨222OOO<<>> ;;;sssaaa!!!666 KKK+++}}} ___fff + + +KKK111\\\KKKfff + + +KKK111\\\KKKfff + + +eee ~~~www777oooFFFwwwsss}}} KKK***JJJ죣^^^hhh(((ﲲ^^^lllCCCKKKﲲ^^^lllCCCKKKﲲKKK???qqq쯯HHH{{{RRRoooTTTttt===sssJJJ죣 KKK(((\\\ ggg}}}GGGKKK}}}GGGKKKKKK???YYY===444 + + +RRRyyyooo999sss\\\ ggg 444'''&&&~~~xxx VVV]]]cccjjjKKK]]]cccjjjKKK]]]cccKKK???PPP'''___eee444ooohhhsss~~~xxx qqq]]]>>>999xxx + + +XXX???444VVV"""]]]000KKK444VVV"""]]]000KKK444VVVKKK???''')))]]]KKK---ooo&&&SSSsss>>>999 --- + + +777SSSXXXKKK777SSSXXXKKK777SSSKKK???777]]]///[[[///RRRooo + + +vvv===sss + 
+ + 555MMM777SSScccKKK777SSScccKKK777SSSKKK??? ///[[[\\\oooKKKsss KKK000aaaIIIVVV777SSSXXXKKK777SSSXXXKKK777SSSKKK???ЊHHH + + +///[[['''RRRooo uuuCCCsssaaa KKKCCCkkk444333777SSS[[[333KKK777SSS[[[333KKK777SSSKKK???888///[[[DDD000CCC'''ooo)))NNN!!!sss444333 KKKrrrOOOXXX777SSS]]]KKK777SSS]]]KKK777SSSLLL???JJJ!!!PPP///[[[oooqqq sssrrr KKKAAAjjj+++]]]mmm777SSSXXXKKK777SSSXXXKKK777SSSQQQ:::)))***+++BBB///[[[^^^xxxSSSJJJooojjjLLLsssAAAjjj+++ KKK]]];;;ttt888XXXOOO777SSSXXXqqqSSSLLL777SSSXXXqqqSSSLLL777SSSeee<<>><<<&&&PPP777SSS>>><<<&&&PPP777SSShhh///[[[iii\\\mmm[[[oooccc%%%...sssmmmHHH KKK___!!!***[[[777SSS999...///ZZZ777SSS999...///ZZZ777SSS%%%TTT ###ccc///[[[VVVEEEvvv'''FFF[[[oooIII%%%vvvCCCsss___!!!***[[["""ooo"""ooosssAAAttt000pppttt000pppssssssɰ///%%% + + +%%%hhhppp + + +%%%hhhppp^^^^^^555nnn))) + + +###XXXnnn))) + + +###XXX[[[$$$ EEEjjjPPPrrrjjj///xxx??? xxx???QQQfff KKK???KKK???tttKKK???sssKKK???(((MMMKKK???oooKKK???YYY nnnOOO///qqqaaa!!!666aaafff??? 111UUU + + +***sssaaa!!!666 777...GGGXXX888 777aaafff??? 111TTT SSSggg LLLbbb&&& 999aaa!!!666aaafff??? 111UUU + + +***sss 777XXX888aaa!!!666aaafff??? 111aaa!!!666ggg LLLOOO///qqq777 ___eee}}} bbbrrr }}}fff---GGG^^^fff bbbrrr666ggg ~~~www777}}} bbbrrr fff^^^}}} bbbrrr}}} ~~~eeettt^^^hhh(((qqq쯯KKK???JJJ죣oooccc,,, 򤤤 JJJ죣VVVooo```QQQ蔔VVVoooccc,,,GGGߜHHH{{{RRRJJJ죣oooccc,,, 򤤤 VVV```QQQ蔔FFF///CCC>>>JJJ죣oooccc,,,JJJ죣KKK???qqq쯯lllHHHYYY===444KKK???\\\ ggg&&&___ ___\\\ ggg ooodddNNN &&&___}}} + + +RRRyyy\\\ ggg&&&___ ___ dddNNN\\\ ggg&&&___\\\ ggg + + +KKK???YYY===444+++```VVVPPP'''KKK???~~~xxx [[[```WWWppp ~~~xxx nnnooo___ tttnnn[[[```...hhh___eee444~~~xxx [[[```WWWppp nnn___ ttt ~~~ ~~~xxx [[[```~~~xxx ___eeeKKK???PPP'''҇FFF xxx + + +XXX???'''KKK???>>>999vvv{{{>>>999oooTTTvvv{{{ddd)))]]]KKK--->>>999vvv{{{TTT```%%%777XXX>>>999vvv{{{>>>999)))]]]KKK???'''Ɇ/// + + +777]]]KKK???www{{{ooo|||www{{{şnnn///ccc///[[[///RRRwww{{{|||www{{{///[[[KKK???777]]]!!!555MMM KKK??? 
www{{{ ooo{{{www{{{CCC ccc///[[[\\\ www{{{{{{uuu www{{{ ///[[[KKK??? sss)))mmmIIIVVVЊHHH + + +KKK???aaawww{{{aaaooo|||www{{{777+++SSSccc///[[['''RRRaaawww{{{|||{{{+++qqqaaawww{{{aaa///[[[KKK???ЊHHH + + +vvv888KKK???444333www{{{444333ooo SSSwww{{{mmmiii ccc///[[[DDD000CCC'''444333www{{{ SSS444333www{{{444333///[[[KKK???888eeeeeeOOOXXXJJJ!!!PPPLLL???rrrwww{{{tttiiirrrooo]]] qqqwww{{{:::aaa///[[[rrrwww{{{tttiii]]] qqq777{{{}}}---rrrwww{{{rrr///[[[LLL???JJJ!!!PPP666sssfff]]]mmm)))***+++BBBQQQ:::AAAjjj+++www{{{ tttAAAjjj+++ooocccCCCwww{{{;;;KKK[[[///[[[^^^xxxAAAjjj+++www{{{ tttcccCCC...---AAAjjj+++www{{{AAAjjj+++///[[[QQQ:::)))***+++BBBlll999$$$<<>>BBBiii'''hhhmmmHHHwww{{{RRR'''mmmHHHoooYYY www{{{```XXX***///[[[iii\\\mmmHHHwww{{{RRR'''YYY QQQGGGmmmHHHwww{{{mmmHHH///[[[hhhmmm000(((cccTTT ###ccc%%%___!!!***[[[www{{{KKK444___!!!***[[[oooUUU<<<www{{{jjj### &&&fff///[[[VVVEEE___!!!***[[[www{{{KKK444UUU<<<___!!!***[[[www{{{___!!!***[[[///[[[%%%TTT ###cccsssAAAɰ///%%%555[[[$$$ EEEjjjjjj///xxx???xxx???QQQfff KKK???KKK???tttKKK???KKK???(((MMMKKK???KKK???YYY aaa!!!666bbb&&& 999ooo 777Ǣnnnaaa!!!666ggg LLLhhh!!!JJJTTT SSSggg LLLbbb&&& 999aaa!!!666aaafff??? 111aaa!!!666ggg LLLOOO///qqq777}}}www777ooofff ___}}} ~~~fff + + +666ggg ~~~www777}}} bbbrrr}}} ~~~eeetttJJJ죣HHH{{{RRRoooVVVKKK???^^^hhh(((JJJ죣ﲲGGGߜHHH{{{RRRJJJ죣oooccc,,,JJJ죣KKK???qqq쯯lllHHH\\\ gggRRRyyyooo KKK???\\\ ggg + + +}}} + + +RRRyyy\\\ ggg&&&___\\\ ggg + + +KKK???YYY===444+++```~~~xxx 444ooonnnKKK???VVV~~~xxx ___eee]]]ccc...hhh___eee444~~~xxx [[[```~~~xxx ___eeeKKK???PPP'''҇FFF >>>999KKK---oooKKK???xxx + + +XXX???>>>999)))]]]444VVVddd)))]]]KKK--->>>999vvv{{{>>>999)))]]]KKK???'''Ɇ//////RRRoooKKK??? + + +///[[[777SSSşnnn///ccc///[[[///RRRwww{{{///[[[KKK???777]]]!!! \\\oooKKK???555MMM ///[[[777SSSCCC ccc///[[[\\\ www{{{ ///[[[KKK??? 
sss)))mmmaaa'''RRRoooKKK???IIIVVVaaa///[[[777SSS777+++SSSccc///[[['''RRRaaawww{{{aaa///[[[KKK???ЊHHH + + +vvv444333DDD000CCC'''oooKKK???444333///[[[777SSSmmmiii ccc///[[[DDD000CCC'''444333www{{{444333///[[[KKK???888eeeeeerrroooLLL???OOOXXXrrr///[[[777SSS:::aaa///[[[rrrwww{{{rrr///[[[LLL???JJJ!!!PPP666sssfffAAAjjj+++^^^xxxSSSJJJoooQQQ:::]]]mmmAAAjjj+++///[[[777SSS;;;KKK[[[///[[[^^^xxxAAAjjj+++www{{{AAAjjj+++///[[[QQQ:::)))***+++BBBlll999$$$ttt888XXXOOOyyyVVVkkkwwwMMMrrrDDD oooeeettt888XXXOOO///[[[777SSSLLLrrr稨222OOO///[[[yyyVVVkkkwwwttt888XXXOOOwww{{{ttt888XXXOOO///[[[eee<<>>BBBmmmHHHiii\\\mmm[[[oooiii'''mmmHHH///[[[777SSS```XXX***///[[[iii\\\mmmHHHwww{{{mmmHHH///[[[hhhmmm000(((ccc___!!!***[[[VVVEEEvvv'''FFF[[[ooo%%%___!!!***[[[///[[[777SSSjjj### &&&fff///[[[VVVEEE___!!!***[[[www{{{___!!!***[[[///[[[%%%TTT ###cccsssAAAɰ///%%%555333###OOOsssjjj333###OOOsssxxx???333###OOOsssKKK???333###OOOsssKKK???333###OOOsssKKK???333###OOOOOO///qqqaaa!!!666 777aaa!!!666(((UUU + + +***sssaaa!!!666 777aaa!!!666ggg LLLbbb&&& 999aaa!!!666oooUUU + + +***sss>>> ;;;sssTTT SSSaaa!!!666OOO///qqq333###OOOeee}}}fff}}}```ooo }}}fff}}} ~~~www777}}}ooo FFFwwwsss666ggg}}}eee333###OOOqqq쯯JJJ죣VVVJJJ죣OOO^^^ 򤤤 JJJ죣VVVJJJ죣HHH{{{RRRJJJ죣ooo 򤤤 TTTttt===sssGGGߜKKK???JJJ죣qqq쯯333###OOOYYY===444\\\ ggg \\\ ggg###""" ___\\\ ggg \\\ ggg + + +RRRyyy\\\ gggooo ___999sss}}}KKK???\\\ gggYYY===444333###OOOPPP'''~~~xxx nnn~~~xxx cccWWWppp ~~~xxx nnn~~~xxx ___eee444~~~xxx oooWWWppp hhhsss...hhhKKK???~~~xxx PPP'''333###OOO'''>>>999>>>999ZZZ>>>999>>>999)))]]]KKK--->>>999ooo&&&SSSsssdddKKK???>>>999'''555###MMM 777]]]999///[[[///RRRooo + + +vvv===sssşnnn///cccKKK???777]]]===BBB  IIIOOO ///[[[\\\ oooKKKsssCCC cccKKK??? 
QQQ+++&&&ЊHHH + + +aaaaaaaaaaaa///[[['''RRRaaaooo uuuCCCsss777+++SSScccKKK???aaaЊHHH + + +wwwKKK888444333444333EEEDDD444333444333///[[[DDD000CCC'''444333ooo)))NNN!!!sssmmmiii cccKKK???444333888xxxJJJ!!!PPPrrrrrrFFF EEEtttiiirrrrrr///[[[rrroootttiiiqqq sss:::aaaLLL???rrrJJJ!!!PPP)))***+++BBBAAAjjj+++AAAjjj+++ tttAAAjjj+++AAAjjj+++///[[[^^^xxxAAAjjj+++SSSJJJooo tttjjjLLLsss;;;KKK[[[QQQ:::AAAjjj+++)))***+++BBBGGG^^^<<>>}}}www'''ggg LLL@@@ ---oooKKKUUU + + +***sssTTT SSS@@@ ---oooKKKaaa!!!666OOO///qqq>>> ;;;sssaaa!!!666TTT SSSXXX888ooo 777aaafff??? 111aaa!!!666hhh!!!JJJXXX888>>> ;;;sssXXX888XXX888@@@ ---oooKKKnnnCCCfff + + +}}}CCCGGG^^^^^^xxx>>>///ttt ~~~KKK111\\\KKK 666gggKKK111\\\KKK}}}eeeFFFwwwsss}}}666ggg^^^ooofff bbbrrr}}}fff + + +^^^FFFwwwsss^^^^^^KKK111\\\KKK ___CCCﲲJJJ죣ooo```QQQ蔔```QQQ蔔666^^^lllCCCKKK 򤤤 GGGߜ^^^lllCCCKKKJJJ죣qqq쯯TTTttt===sssJJJ죣KKK???GGGߜ```QQQ蔔oooVVVoooccc,,,JJJ죣KKK???ﲲ```QQQ蔔TTTttt===sss```QQQ蔔```QQQ蔔^^^lllCCCKKK^^^hhh(((CCC\\\ gggooodddNNNdddNNNuuu  + + +}}}GGGKKK ___}}}}}}GGGKKK\\\ gggYYY===444999sss\\\ gggKKK???}}}dddNNNooo &&&___\\\ gggKKK???dddNNN999sssdddNNNdddNNN}}}GGGKKKCCC]]]ccc~~~xxx ooo___ ttt___ ttthhh222<<<<<<XXX\\\___eeejjjKKKWWWppp ...hhhjjjKKK~~~xxx PPP'''hhhsss~~~xxx KKK???...hhh___ tttooonnn[[[```~~~xxx KKK???]]]ccc___ ttthhhsss___ ttt___ tttjjjKKKVVVCCC444VVV>>>999oooTTTTTTppp}}}pppDDD)))]]]"""]]]000KKKddd"""]]]000KKK>>>999'''&&&SSSsss>>>999KKK???dddTTTooovvv{{{>>>999KKK???444VVVTTT&&&SSSsssTTTTTT"""]]]000KKKxxx + + +XXX???CCC777SSSooo|||||| + + +... 
+ + +///[[[XXXKKKşnnn///cccXXXKKK777]]] + + +vvv===sssKKK???şnnn///ccc|||ooowww{{{KKK???777SSS||| + + +vvv===sss||||||XXXKKK + + +CCC777SSS ooo{{{{{{LLLiiiDDD///[[[cccKKKCCC ccccccKKK KKKsss KKK???CCC ccc{{{ooowww{{{ KKK???777SSS{{{KKKsss{{{{{{cccKKK555MMMCCC777SSSaaaooo||||||&&&AAA...999///[[[XXXKKK777+++SSScccXXXKKKaaaЊHHH + + + uuuCCCsssaaaKKK???777+++SSSccc|||ooowww{{{aaaKKK???777SSS||| uuuCCCsss||||||XXXKKKIIIVVVCCC777SSS444333ooo SSS SSSiiirrrFFF///[[[[[[333KKKmmmiii ccc[[[333KKK444333888)))NNN!!!sss444333KKK???mmmiii ccc SSSooowww{{{444333KKK???777SSS SSS)))NNN!!!sss SSS SSS[[[333KKKCCC777SSSrrrooo]]] qqq]]] qqq000''' ,,,///[[[]]]KKKtttiii:::aaa]]]KKKrrrJJJ!!!PPPqqq sssrrrLLL???:::aaa]]] qqqooowww{{{rrrLLL???777SSS]]] qqqqqq sss]]] qqq]]] qqq]]]KKKOOOXXXCCC777SSSAAAjjj+++ooocccCCCcccCCC|||555 + + +yyy///[[[XXXKKK ttt;;;KKK[[[XXXKKKAAAjjj+++)))***+++BBBjjjLLLsssAAAjjj+++QQQ:::;;;KKK[[[cccCCCSSSJJJooowww{{{AAAjjj+++QQQ:::777SSScccCCCjjjLLLssscccCCCcccCCCXXXKKK]]]mmmCCC777SSSttt888XXXOOOooo^^^QQQ蕕^^^QQQ蕕EEE===///[[[XXXqqqSSSLLLLLLrrr稨222OOOXXXqqqSSSLLLttt888XXXOOO<<>><<<&&&PPPRRR'''```XXX***>>><<<&&&PPPmmmHHHhhhccc%%%...sssmmmHHH```XXX***YYY mmm[[[ooowww{{{mmmHHH777SSSYYY ccc%%%...sssYYY YYY >>><<<&&&PPPiii'''CCC777SSS___!!!***[[[oooUUU<<<UUU<<>> ;;;sss 777aaa!!!666bbb&&& 999XXX888aaafff??? 111aaafff??? 
111aaa!!!666ggg LLL>>> ;;;sssTTT SSSXXX888ggg LLLOOO///qqq666ggg ~~~FFFwwwsssfff}}}www777^^^ bbbrrr bbbrrr}}} ~~~FFFwwwsss666ggg^^^ ~~~eeeGGGߜTTTttt===sssVVVJJJ죣HHH{{{RRR```QQQ蔔oooccc,,,oooccc,,,JJJ죣TTTttt===sssGGGߜKKK???```QQQ蔔qqq쯯}}} + + +999sss \\\ gggRRRyyydddNNN&&&___&&&___\\\ ggg + + +999sss}}}KKK???dddNNN + + +YYY===444...hhh___eeehhhsssnnn~~~xxx 444___ ttt[[[```[[[```~~~xxx ___eeehhhsss...hhhKKK???___ ttt___eeePPP'''ddd)))]]]&&&SSSsss>>>999KKK---TTTvvv{{{vvv{{{>>>999)))]]]&&&SSSsssdddKKK???TTT)))]]]'''şnnn///ccc///[[[ + + +vvv===sss///RRR|||www{{{www{{{///[[[ + + +vvv===sssşnnn///cccKKK???|||///[[[777]]]CCC ccc///[[[KKKsss \\\{{{www{{{www{{{ ///[[[KKKsssCCC cccKKK???{{{///[[[ 777+++SSSccc///[[[ uuuCCCsssaaa'''RRR|||www{{{www{{{aaa///[[[ uuuCCCsss777+++SSScccKKK???|||///[[[ЊHHH + + +mmmiii ccc///[[[)))NNN!!!sss444333DDD000CCC''' SSSwww{{{www{{{444333///[[[)))NNN!!!sssmmmiii cccKKK??? SSS///[[[888:::aaa///[[[qqq sssrrr]]] qqqwww{{{www{{{rrr///[[[qqq sss:::aaaLLL???]]] qqq///[[[JJJ!!!PPP;;;KKK[[[///[[[jjjLLLsssAAAjjj+++^^^xxxcccCCCwww{{{www{{{AAAjjj+++///[[[jjjLLLsss;;;KKK[[[QQQ:::cccCCC///[[[)))***+++BBBLLLrrr稨222OOO///[[[rrr^^^LLLsssttt888XXXOOOyyyVVVkkkwww^^^QQQ蕕www{{{www{{{ttt888XXXOOO///[[[rrr^^^LLLsssLLLrrr稨222OOOeee^^^QQQ蕕///[[[<<> +stream + +mntrRGB XYZ acspAPPL- +desc|cprtx(wtptbkptrXYZgXYZbXYZrTRC gTRC bTRC desc"Artifex Software sRGB ICC Profile"Artifex Software sRGB ICC ProfiletextCopyright Artifex Software 2011XYZ QXYZ XYZ o8XYZ bXYZ $curv +#(-27;@EJOTY^chmrw| %+28>ELRY`gnu| &/8AKT]gqz !-8COZfr~ -;HUcq~ +:IXgw'7HYj{+=Oat 2FZn  % : O d y + +' += +T +j + + + + + + " 9 Q i  * C \ u & @ Z t .Id %A^z &Ca~1Om&Ed#Cc'Ij4Vx&IlAe@e Ek*Qw;c*R{Gp@j>i  A l !!H!u!!!"'"U"""# +#8#f###$$M$|$$% %8%h%%%&'&W&&&''I'z''( (?(q(())8)k))**5*h**++6+i++,,9,n,,- -A-v--..L.../$/Z///050l0011J1112*2c223 3F3334+4e4455M555676r667$7`7788P8899B999:6:t::;-;k;;<' >`>>?!?a??@#@d@@A)AjAAB0BrBBC:C}CDDGDDEEUEEF"FgFFG5G{GHHKHHIIcIIJ7J}JK 
KSKKL*LrLMMJMMN%NnNOOIOOP'PqPQQPQQR1R|RSS_SSTBTTU(UuUVV\VVWDWWX/X}XYYiYZZVZZ[E[[\5\\]']x]^^l^__a_``W``aOaabIbbcCccd@dde=eef=ffg=ggh?hhiCiijHjjkOkklWlmm`mnnknooxop+ppq:qqrKrss]sttptu(uuv>vvwVwxxnxy*yyzFz{{c{|!||}A}~~b~#G +k͂0WGrׇ;iΉ3dʋ0cʍ1fΏ6n֑?zM _ɖ4 +uL$h՛BdҞ@iءG&vVǥ8nRĩ7u\ЭD-u`ֲK³8%yhYѹJº;.! +zpg_XQKFAǿ=ȼ:ɹ8ʷ6˶5̵5͵6ζ7ϸ9к<Ѿ?DINU\dlvۀ܊ݖޢ)߯6DScs 2F[p(@Xr4Pm8Ww)Km +endstream +endobj + +7 0 obj +[/ICCBased 6 0 R] +endobj + +8 0 obj +<> +stream + +q +595 0 0 818.125 0 11.9375 cm +/fzImg0 Do +Q + +endstream +endobj + +9 0 obj +<>>> +endobj + +10 0 obj +<> +endobj + +11 0 obj +<>/Width 800/Height 1100/BitsPerComponent 8/ColorSpace 7 0 R/Length 2640000>> +stream +''' [[[+++܋PPP$$$%%%TTT333SSS ccc{{{܋PPP$$$%%%TTTGGG### !!!AAAttt+++---ZZZ㛛TTT/// JJJ### !!!AAAttt[[[]]]BBB[[[+++FFFfff333SSS ccc{{{FFFfff]]]aaaGGG###;;;+++ UUU888###;;;[[[~~~xxx[[[+++aaa333SSS iiiccc{{{aaa + + + + + +GGG###+++TTTVVV###[[[HHH\\\GGG+++222^^^<<<333SSS ccc{{{222^^^<<<444GGG### uuu;;;+++VVV<<>>WWWGGG###***]]]+++mmmUUU###***]]]___uuu______+++ !!!333SSS OOOxxx000ccc{{{!!!QQQGGG### 222:::+++ 666www CCC### 222:::___fffvvv___+++ EEE333SSS OOO(((ccc{{{EEELLLGGG###+++ ---###___gggCCC___+++ ttt333SSS OOOccc{{{tttGGG###+++ ^^^ ###___gggCCC___+++ \\\ccc@@@333SSS OOOrrr444ccc{{{\\\ccc@@@GGG###iii+++ gggfff###iii___gggCCC___+++ 333SSS OOO%%%ccc{{{ ))) ***GGG###+++%%%+++  +++###+++%%%___wwwccc___+++ bbb888333SSS OOOccc{{{bbb888VVVLLLGGG###eee+++ kkkhhh'''###eee___eeeuuu___+++  + + +333SSS OOOmmm111ccc{{{ + + +fffGGG###zzz+++  ###zzz___...???___+++ ddd111///333SSS OOO!!!ccc{{{ddd111///QQQ}}}GGG###''''''+++ DDD###''''''___ ___+++ """  + + +333SSS OOOccc{{{"""  + + +jjjGGG###}}}+++ XXXWWWyyy###}}}___QQQ;;;___+++___ttt'''333SSS OOOhhhccc{{{___ttt'''CCC555###ccc+++VVV???QQQ !!!###ccc___qqq___+++www000333SSS OOOccc{{{000###vvvFFF+++###999+++wwwNNN###999___TTTggg___+++wwwccc333SSS OOOccc{{{ccc~~~+++###+++wwwVVV999###___222___+++wwwlll=== &&&VVV333SSS OOOcccccc{{{lll=== &&&VVV!!!:::www+++###JJJ +++www擓VVV''' KKK###JJJ 
___''' ---ZZZ򕕕QQQ >>>~~~]]]BBB lll~~~xxxTTTkkk:::HHH\\\nnnՃVVV%%%``` + + + 555555UUUkkkRRRGGG>>><<<nnn))) [[[ooogggCCCڋJJJ$$$###SSSkkk--- """[[[SSSkkkYYY$$$EEEQQQ666}}}ooogggCCC'''\\\^^^XXXSSSkkk ???gggCCCLLL###333<<>>AAAڙ%%%QQQ☘kkk uuuܖ"""bbb\\\ooo}}} YYY555***;;;kkk ###QQQ))):::LLLЗ444[[[ ??? + + +kkkqqq%%%===  HHHyyyDDD!!!___jjjkkk^^^@@@)))mmmkkk]]]uuu___666wwwܩllluuu!!!kkk YYYfffvvvқtttSSS/// YYY ;;;kkktttcccgggCCC%%%OOOGGGkkk|||kkkzzzxxxgggCCCFFF+++\\\ZZZ<<<kkkSSSgggCCC]]]TTTooo$$$kkk<<< wwwccc"""kkkuuu333eeeuuunnnkkk:::&&&...???zzzBBB )))kkk[[[}}},,,FFFhhh,,, ]]]444333TTTkkkQQQ;;;KKK㺺fff rrrYYY祥***lllJJJ㱱PPP(((|||qqqsssVVV---UUU777PPPqqqjjjvvvTTTgggmmm%%%fffZZZgggxxx___222kkk///+++dddlll***!!!aaaXXX%%%)))YYY???;;;cccQQQGGG XXX``` {{{趶RRRqqq + + + 555쐐KKK + + +111hhhvvv777KKKsss???OOOKKK PPP''' + + +DDDKKK PPP KKKKKK PPP\\\+++ +++dddeeeaaakkk 777KKKsss???OOO''' KKK'''ppp777鿿222???OOOUUUvvvooo'''###HHHUUUvvvooo KKKUUUvvvoooppp sss뼼fff+++999,,,nnn777CCC???OOOjjj___888'''/// + + +jjj___888 KKKjjj___888;;;TTT444[[['''jjjnnn777CCC???OOOVVV)))'''///eeeVVV))) KKKVVV)))+++ + + +[[[~~~ 777CCC???OOO>>> + + +___'''///888>>> + + +___ KKK>>> + + +___iii444PPPKKK999[[[eeexxx777CCC???OOOaaa,,, + + +'''///aaa,,, + + + KKKaaa,,, + + +,,,DDDUUU[[[---jjjѧ777CCC???OOO|||'''///www||| KKK|||xxx + + +$$$[[[***777CCC???{{{'''///```{{{ KKK{{{)))999[[[\\\777 CCC???iii'''///LLLiii KKKiii555[[[222iii 777 CCC???;;;```'''///MMM``` KKK```[[[666777CCC???OOOqqq'''///]]]qqq KKKqqq|||999[[[SSS777CCC???OOO'''///~~~ KKK⿿SSS[[[888hhh 777CCC???OOOddd<<< '''///ddd<<< KKKddd<<< + + +///ooo[[[BBB777CCC???OOO??? + + +\\\'''///''' ??? + + +\\\ KKK??? 
+ + +\\\___ooo[[[KKK777CCC???OOOMMM///'''///jjjMMM/// KKKMMM///%%%ooo[[[>>>]]]777CCC???OOOkkkZZZ@@@'''/// + + +kkkZZZ@@@ KKKkkkZZZ@@@333[[[MMM777CCC???OOOYYYЃjjj'''###===YYYЃjjj 888YYYЃjjj ^^^|||[[[777CCC???OOO''' kkk+++ ooo[[[CCC + + +777CCC???OOOIII RRR''' '''VVVIII RRR kkkIII RRRlll444'''XXXKKK PPPsssjjjsssxxx???UUUvvvooosssKKK???jjj___888sssKKK???VVV)))sssKKK???>>> + + +___ooo 777TTT SSSggg LLLTTT SSSnnnOOO///qqqOOO///qqqggg LLLXXX888aaa!!!666>>> ;;;ssshhh!!!JJJ 777aaa!!!666aaa!!!666UUU + + +***ssshhh!!!JJJTTT SSSOOO///qqqaaa!!!666OOO///qqqaaa,,, + + +ooofff666ggg ~~~666ggg ___eeeeee ~~~^^^}}}FFFwwwsssfff + + +fff}}}}}} fff + + +666gggeee}}}eee|||oooVVVGGGߜGGGߜ^^^hhh(((qqq쯯qqq쯯FFF///CCC>>>```QQQ蔔FFF///CCC>>>JJJ죣TTTttt===sssKKK???ﲲVVVJJJ죣JJJ죣 򤤤 ﲲGGGߜqqq쯯JJJ죣qqq쯯{{{ooo }}} + + +}}}YYY===444YYY===444 + + +dddNNN\\\ ggg999sssKKK??? \\\ ggg\\\ ggg ___}}}YYY===444\\\ gggYYY===444iiiooonnn...hhh___eee...hhhVVVPPP'''PPP'''___eee ~~~ ___ ttt ~~~ ~~~xxx hhhsssKKK???]]]cccnnn~~~xxx ~~~xxx WWWppp ]]]ccc...hhhPPP'''~~~xxx PPP'''```oooddd)))]]]dddxxx + + +XXX???'''''')))]]]```%%%777XXXTTT```%%%777XXX>>>999&&&SSSsssKKK???444VVV>>>999>>>999444VVVddd'''>>>999'''qqqoooşnnn///ccc///[[[şnnn///ccc + + +777]]]777]]]///[[[||| + + +vvv===sssKKK???777SSS777SSSşnnn///ccc777]]]777]]]oooCCC ccc///[[[CCC ccc555MMM  ///[[[uuu{{{uuu KKKsssKKK???777SSS 777SSSCCC ccc  ddd<<< ooo777+++SSSccc///[[[777+++SSScccIIIVVVЊHHH + + +ЊHHH + + +///[[[{{{+++qqq|||{{{+++qqqaaa uuuCCCsssKKK???777SSSaaaaaa777SSS777+++SSScccЊHHH + + +aaaЊHHH + + +??? 
+[binary image stream data omitted] +stream + +q +595 0 0 818.125 0 11.9375 cm +/fzImg0 Do +Q + +endstream +endobj + +13 0 obj +<>>> +endobj + +14 0 obj +<> +endobj + +15 0 obj +<>/Width 800/Height 1100/BitsPerComponent 8/ColorSpace 7 0 R/Length 2640000>> +stream +[binary image stream data omitted] +endstream +endobj + +16 0 obj +<> +stream + +q +595 0 0 818.125 0 11.9375 cm +/fzImg0 Do +Q + +endstream +endobj + +xref +0 17 +0000000000 65535 f +0000000016 00000 n +0000000062 00000 n +0000000128 00000 n +0000000174 00000 n +0000000281 00000 n +0002640436 00000 n +0002643087 00000 n +0002643121 00000 n +0002643215 00000 n +0002643262 00000 n +0002643371 00000 n +0005283527 00000 n +0005283622 00000 n +0005283670 00000 n +0005283780 00000 n +0007923936 00000 n + +trailer +<<589368D47622B715717C47F6DD67FB8E>]>> +startxref +7924031 +%%EOF diff --git a/packages/markitdown-ocr/tests/ocr_test_data/pdf_scanned_sales_report.pdf b/packages/markitdown-ocr/tests/ocr_test_data/pdf_scanned_sales_report.pdf new file mode 100644 index 0000000..178c638 --- /dev/null +++ b/packages/markitdown-ocr/tests/ocr_test_data/pdf_scanned_sales_report.pdf @@ -0,0 +1,342 @@ +%PDF-1.7 +%µ¶ + +1 0 obj +<> +endobj + +2 0 obj +<> +endobj + +3 0 obj +<>>> +endobj + +4 0 obj +<> +endobj + +5 0 obj +<>/Width 800/Height 1100/BitsPerComponent 8/ColorSpace 7 0 R/Length 2640000>> +stream +[binary image stream data omitted]
777SSSwww{{{mmmHHHiii\\\```XXX***]]]999[[[]]]999:::777&&&!!!###SSSUUU<<<%%%777SSS%%%|||www{{{___!!!***[[[VVVEEEjjj### &&&fff󓓓EEE"""dddPPPCCC[[[CCC + + +PPPCCC333>>>```777KKKCCC777}}}VVV:::vvvRRR ...wwwRRR ...www::: \\\fff777KKK333000kkk RRRRRR111### 777FFFlll  666www???eee]]]졡 eee]]]졡 333YYY```}}}777www///""",,,nnnhhh YYYhhh YYY 777!!!111444<<<uuu[[['''jjjnnncccccc***kkk777ooo 777XXX888UUU + + +***sssaaa!!!666rrr ppp[[[~~~ !!!&&&!!!&&&555 %%%777ooofff^^^ }}}777===[[[eeexxx``` ``` ---RRR777oooVVV```QQQ蔔 򤤤 JJJ죣jjj 666\\\[[[---jjjѧ MMM MMM + + + 999777ooo dddNNN ___\\\ gggaaaRRRGGG[[[***UUUUUU000\\\ccc:::777 ooonnn___ tttWWWppp ~~~xxx nnn888"""bbb[[[\\\333|||ccc|||ccc555(((PPP888^^^777 oooTTT>>>999mmm!!!DDDccc___蚚 [[[222iii 333}}}:::}}}:::AAA \\\&&&222!!!777ooo|||RRRFFFyyyMMM[[[666㿿}}}fff999fff999ggg+++SSSfff777ooo{{{ 555XXX[[[SSSEEEJJJEEEJJJ 777ooo|||aaa333III[[[888hhh +++ggg+++ggg{{{666777ooo SSS444333fff///mmm000CCC[[[BBB + + +AAA777ooo]]] qqqtttiiirrr222!!!YYY[[[KKKbbb999777SSSJJJooocccCCC tttAAAjjj+++sss777[[[>>>]]]000000'''777MMMrrrDDD ooo^^^QQQ蕕ttt888XXXOOO}}}'''OOOWWW[[[MMMUUUUUUJJJ)))XXXkkk777mmm[[[oooYYY RRR'''mmmHHHOOO[[[:::777&&&!!!777vvv'''FFF[[[oooUUU<<<KKK444___!!!***[[[󓓓EEE"""dddQQQ222}}}[[[CCC + + +333>>>```uuuHHH'''---gggPPPrrrCCC777}}}PPPUUURRR ...wwwvvvPPPUUUqqq{{{::: \\\fff'''kkk 333xxx]]]RRRkkk xxx]]]==={{{111### %%%vvv'''###~~~FFFlll FFF```xxxeee]]]졡 FFF```xxx mmm333YYY```}}}NNNMMMuuu'''/// + + +ssswww666WWW + + +hhh YYY,,,nnn666WWW + + + '''///kkkooo!!!111ccc[[['''jjjnnn***kkk&&&;;;OOO///qqqTTT SSS'''///TTT SSSbbb&&& 999CCCGGGbbb&&& 999rrr>>>!!!&&&[[[~~~ >>>ooo333555 %%%'''===eee666ggg'''///666gggwww777CCCGGGwww777III``` [[[eeexxxIII<<<fff>>>&&&qqq---RRRiiiqqq쯯GGGߜ'''///~~~GGGߜHHH{{{RRRoooHHH{{{RRRjjj欬999 MMM[[[---jjjѧ欬999  + + + 999aaa=== YYY===444}}}'''///}}}RRRyyyoooRRRyyyaaaRRR;;;UUU[[[***;;;rrr⋋ 000\\\ccc::: VVVPPP'''...hhh'''/// ...hhh444ooo444nnn888444|||ccc[[[\\\333444---@@@555(((PPP888^^^ 
...'''ddd'''OOOdddKKK---oooKKK---mmm!!!DDD랞}}}:::[[[222iii 333랞eeeAAA \\\&&&222!!!999XXX777]]]şnnn///ccc'''777şnnn///ccc///RRRooo///RRRRRRFFF 222fff999[[[666㿿}}} 222ggg+++SSSfff\\\ CCC ccc'''###CCC ccc\\\ooo\\\555rrrEEEJJJ[[[SSSrrr tttЊHHH + + +777+++SSSccc'''///777+++SSSccc'''RRRooo'''RRR+++ggg[[[888hhh {{{666bbb888mmmiii ccc'''///mmmiii cccDDD000CCC'''oooDDD000CCC'''fff///mmmppp&&&CCC[[[BBBppp&&&CCC<<< + + +AAA~~~"""JJJ!!!PPP:::aaa'''///:::aaaooo222!!!gggXXX[[[KKKgggXXXsssGGGbbb999LLLEEE)))***+++BBB;;;KKK[[['''///;;;KKK[[[^^^xxxooo^^^xxx___```000[[[>>>]]]___```DDD'''BBBOOO<<>>``` KKKjjjuuuHHHCCC777}}}aaaEEENNNTTTvvvRRR ...www::: \\\fff KKKxxx???333XXX]]]qqqkkk RRR111### KKKKKK???%%%vvvFFFlll  333hhh|||ggg666懇mmm eee]]]졡 333YYY```}}} KKKKKK???NNNMMMuuuwwwUUUnnn===,,,nnnhhh YYY  KKKKKK???!!!111 rrr<<<QQQ[[['''jjjnnn444<<<ccc***kkk KKKTTT SSSggg LLL&&&;;;aaafff??? 111aaa!!!666 777bbb&&& 999TTT SSSrrrIII333>>>+++ [[[~~~  !!!&&&555 %%% KKK666ggg ~~~'''=== bbbrrr}}}fffwww777666ggggggxxx,,,JJJ[[[eeexxx777``` ---RRR KKKGGGߜKKK???iiioooccc,,,JJJ죣VVVHHH{{{RRRGGGߜjjjFFF222999>>>[[[---jjjѧ MMM + + + 999 KKK}}}KKK??? + + +aaa=== &&&___\\\ ggg RRRyyy}}}aaaRRR222BBBfff + + +EEE[[[***UUU000\\\ccc::: KKK...hhhKKK???___eee VVV[[[```~~~xxx nnn444...hhhnnn888***KKKhhh[[[\\\333|||ccc555(((PPP888^^^ KKKdddKKK???)))]]] ...vvv{{{>>>999KKK---dddmmm!!!DDD---LLLKKKVVV萐 + + +[[[222iii 333}}}:::AAA \\\&&&222!!! KKKşnnn///cccKKK???///[[[999XXXwww{{{///RRRşnnn///cccRRRFFF999CCC???)))[[[666㿿}}}fff999ggg+++SSSfff KKKCCC cccKKK???///[[[\\\www{{{ \\\CCC ccc555HHH444::: 222[[[SSSEEEJJJ  KKK777+++SSScccKKK???///[[[tttwww{{{aaa'''RRR777+++SSScccnnnxxx[[[888hhh +++ggg{{{666 KKKmmmiii cccKKK???///[[[bbbwww{{{444333DDD000CCC'''mmmiii cccfff///mmmGGG555RRR777[[[BBB + + +AAA KKK:::aaaLLL???///[[[~~~"""www{{{rrr:::aaa222!!!  
[[[KKKbbb999 KKK;;;KKK[[[QQQ:::///[[[LLLEEEwww{{{AAAjjj+++^^^xxx;;;KKK[[[FFFnnnUUU[[[>>>]]]000''' 888LLLrrr稨222OOOeee///[[[BBBOOOwww{{{ttt888XXXOOOyyyVVVkkkwwwLLLrrr稨222OOO}}}'''+++iii|||LLLFFF[[[MMMUUUJJJ)))XXXkkk kkk```XXX***///[[[www{{{mmmHHHiii\\\```XXX***\\\,,,[[[:::777&&&!!! kkkjjj### &&&fff%%%///[[[%%%|||www{{{___!!!***[[[VVVEEEjjj### &&&fff󓓓EEE"""dddbbb???333 !!!eee[[[CCC + + +333>>>```sssKKK PPPsssuuuHHH KKKCCC777}}}VVV:::---{{{vvvRRR ...wwwPPPUUU::: \\\fffssssss KKK333000ppp{{{kkk RRRxxx]]]111### 鿿222UUUvvvooo鿿222%%%vvv KKKFFFlll 666www???{{{eee]]]졡 FFF```xxx333YYY```}}}CCCjjj___888CCCNNNMMMuuu KKKwww///"""{{{,,,nnnhhh YYY666WWW + + + CCCVVV)))CCC KKK!!!111uuuUUUTTT {{{[[['''jjjnnnccc***kkkCCC>>> + + +___CCC&&&;;; KKKrrrppp {{{[[[~~~ !!!&&&>>>555 %%%CCCaaa,,, + + +CCC'''=== KKK=== {{{[[[eeexxx``` III---RRRCCC|||CCCiii KKKjjj 666\\\<<<iii {{{[[[---jjjѧ MMM欬999 + + + 999CCC{{{CCCaaa=== KKKaaaRRRGGG%%% {{{[[[***UUU;;;000\\\ccc:::CCCiiiCCC VVV KKKnnn888"""bbb {{{[[[\\\333|||ccc444555(((PPP888^^^CCC```CCC ... KKKmmm!!!DDDccc___蚚 '''}}} {{{[[[222iii 333}}}:::랞AAA \\\&&&222!!!CCCqqqCCC999XXX KKKRRRFFFyyyMMMhhh333 {{{[[[666㿿}}}fff999 222ggg+++SSSfffCCCCCC\\\ KKK555XXX {{{[[[SSSEEEJJJrrr CCCddd<<< CCCttt KKK333III[[[[[[888hhh +++ggg{{{666CCC??? + + +\\\CCCbbb KKKfff///mmm000CCC[[[[[[BBBppp&&&CCC + + +AAACCCMMM///CCC~~~""" KKK222!!!YYY {{{[[[KKKgggXXXbbb999CCCkkkZZZ@@@CCCLLLEEE KKKsss777 {{{[[[>>>]]]000___```'''CCCYYYЃjjjCCCBBBOOO 888}}}'''OOOWWW {{{[[[MMMUUU\\\RRRᄄJJJ)))XXXkkkCCCCCC kkkOOO {{{[[[TTT,,,:::777&&&!!!CCCIII RRRCCC%%%||| kkk󓓓EEE"""dddQQQ222}}} {{{[[[CCC + + +LLLBBB333>>>```sssPPPrrrsssjjjsss sssxxx???鿿222sssKKK???CCCssssssKKK???CCCooosssKKK???CCCXXX888UUU + + +***sssUUU + + +***sssaaa!!!666 777...GGGXXX888 777aaafff??? 
111ggg LLL@@@ ---oooKKKUUU + + +***sss 777XXX888>>> ;;;sssooobbb&&& 999OOO///qqqCCC^^^  }}}fff---GGG^^^fff bbbrrr ~~~KKK111\\\KKK fff^^^FFFwwwsssooowww777eeeCCC```QQQ蔔 򤤤  򤤤 JJJ죣VVVooo```QQQ蔔VVVoooccc,,,^^^lllCCCKKK 򤤤 VVV```QQQ蔔TTTttt===sssoooHHH{{{RRRKKK???qqq쯯CCCdddNNN ___ ___\\\ ggg ooodddNNN &&&___ + + +}}}GGGKKK ___ dddNNN999sssoooRRRyyyKKK???YYY===444CCC___ tttWWWppp WWWppp ~~~xxx nnnooo___ tttnnn[[[```___eeejjjKKKWWWppp nnn___ ttthhhsssooo444KKK???PPP'''CCCTTT>>>999oooTTTvvv{{{)))]]]"""]]]000KKKTTT&&&SSSsssoooKKK---KKK???'''CCC|||ooo|||www{{{///[[[XXXKKK||| + + +vvv===sssooo///RRRKKK???777]]]CCC{{{ ooo{{{www{{{///[[[cccKKK{{{KKKsssooo\\\KKK??? CCC|||aaaooo|||www{{{///[[[XXXKKK||| uuuCCCsssooo'''RRRKKK???ЊHHH + + +CCC SSS444333ooo SSSwww{{{///[[[[[[333KKK SSS)))NNN!!!sssoooDDD000CCC'''KKK???888CCC]]] qqqtttiiitttiiirrrooo]]] qqqwww{{{///[[[]]]KKKtttiii]]] qqqqqq sssoooLLL???JJJ!!!PPPCCCcccCCC ttt tttAAAjjj+++ooocccCCCwww{{{///[[[XXXKKK tttcccCCCjjjLLLsssSSSJJJooo^^^xxxQQQ:::)))***+++BBBCCC^^^QQQ蕕ttt888XXXOOOooo^^^QQQ蕕www{{{///[[[XXXqqqSSSLLL^^^QQQ蕕rrr^^^LLLsssMMMrrrDDD oooyyyVVVkkkwwweee<<>><<<&&&PPPRRR'''YYY ccc%%%...sssmmm[[[oooiii\\\hhhCCCUUU<<<KKK444KKK444___!!!***[[[oooUUU<<<www{{{///[[[999...///ZZZKKK444UUU<<>> ;;;sssooobbb&&& 999&&&;;;rrr>>> [[[~~~ '''///fff^^^FFFwwwsssooowww777'''===III777[[[eeexxx'''///~~~VVV```QQQ蔔TTTttt===sssoooHHH{{{RRRKKK???iiijjj欬999[[[---jjjѧ'''/// dddNNN999sssoooRRRyyyKKK???aaa=== aaaRRR;;;[[[***'''/// nnn___ ttthhhsssooo444KKK??? VVVnnn888444[[[\\\'''OOOTTT&&&SSSsssoooKKK---KKK??? 
...mmm!!!DDD랞[[[222iii '''777||| + + +vvv===sssooo///RRRKKK???999XXXRRRFFF 222[[[666'''###{{{KKKsssooo\\\KKK???\\\555rrr[[[SSS'''///||| uuuCCCsssooo'''RRRKKK???ttt[[[888hhh '''/// SSS)))NNN!!!sssoooDDD000CCC'''KKK???bbbfff///mmmppp&&&CCC[[[BBB'''///]]] qqqqqq sssoooLLL???~~~"""222!!!gggXXX[[[KKK'''///cccCCCjjjLLLsssSSSJJJooo^^^xxxQQQ:::LLLEEE___```[[[>>>]]]'''///^^^QQQ蕕rrr^^^LLLsssMMMrrrDDD oooyyyVVVkkkwwweeeBBBOOO}}}'''\\\RRRᄄ[[[MMM'''///YYY ccc%%%...sssmmm[[[oooiii\\\TTT,,,[[['''///UUU<<>> ;;;sssooobbb&&& 999 KKKrrr!!!&&&aaa[[[~~~ '''///fff^^^FFFwwwsssooowww777 KKKCCC,,,``` @@@[[[eeexxx'''///~~~VVV```QQQ蔔TTTttt===sssoooHHH{{{RRRKKK??? KKKRRRjjj MMM[[[---jjjѧ'''/// dddNNN999sssoooRRRyyyKKK??? '''aaaRRRUUU222[[[***'''/// nnn___ ttthhhsssooo444KKK??? iiinnn888|||ccc[[[\\\'''OOOTTT&&&SSSsssoooKKK---KKK??? 888<<<///mmm!!!DDD}}}:::KKK [[[222iii '''777||| + + +vvv===sssooo///RRRKKK??? KKKqqqRRRFFFfff999aaa[[[666'''###{{{KKKsssooo\\\KKK??? KKK### 555EEEJJJ[[[SSS'''///||| uuuCCCsssooo'''RRRKKK??? KKK___+++gggGGG[[[888hhh '''/// SSS)))NNN!!!sssoooDDD000CCC'''KKK??? KKKbbbfff///mmmfff[[[BBB'''///]]] qqqqqq sssoooLLL??? 
KKK''' 222!!![[[KKK'''///cccCCCjjjLLLsssSSSJJJooo^^^xxxQQQ::: KKK[[[000[[[>>>]]]'''///^^^QQQ蕕rrr^^^LLLsssMMMrrrDDD oooyyyVVVkkkwwweee 888QQQ }}}'''UUU___ [[[MMM'''///YYY ccc%%%...sssmmm[[[oooiii\\\ DDD+++[[['''///UUU<<> +stream + +mntrRGB XYZ acspAPPL- +desc|cprtx(wtptbkptrXYZgXYZbXYZrTRC gTRC bTRC desc"Artifex Software sRGB ICC Profile"Artifex Software sRGB ICC ProfiletextCopyright Artifex Software 2011XYZ QXYZ XYZ o8XYZ bXYZ $curv +#(-27;@EJOTY^chmrw| %+28>ELRY`gnu| &/8AKT]gqz !-8COZfr~ -;HUcq~ +:IXgw'7HYj{+=Oat 2FZn  % : O d y + +' += +T +j + + + + + + " 9 Q i  * C \ u & @ Z t .Id %A^z &Ca~1Om&Ed#Cc'Ij4Vx&IlAe@e Ek*Qw;c*R{Gp@j>i  A l !!H!u!!!"'"U"""# +#8#f###$$M$|$$% %8%h%%%&'&W&&&''I'z''( (?(q(())8)k))**5*h**++6+i++,,9,n,,- -A-v--..L.../$/Z///050l0011J1112*2c223 3F3334+4e4455M555676r667$7`7788P8899B999:6:t::;-;k;;<' >`>>?!?a??@#@d@@A)AjAAB0BrBBC:C}CDDGDDEEUEEF"FgFFG5G{GHHKHHIIcIIJ7J}JK KSKKL*LrLMMJMMN%NnNOOIOOP'PqPQQPQQR1R|RSS_SSTBTTU(UuUVV\VVWDWWX/X}XYYiYZZVZZ[E[[\5\\]']x]^^l^__a_``W``aOaabIbbcCccd@dde=eef=ffg=ggh?hhiCiijHjjkOkklWlmm`mnnknooxop+ppq:qqrKrss]sttptu(uuv>vvwVwxxnxy*yyzFz{{c{|!||}A}~~b~#G +k͂0WGrׇ;iΉ3dʋ0cʍ1fΏ6n֑?zM _ɖ4 +uL$h՛BdҞ@iءG&vVǥ8nRĩ7u\ЭD-u`ֲK³8%yhYѹJº;.! 
+zpg_XQKFAǿ=ȼ:ɹ8ʷ6˶5̵5͵6ζ7ϸ9к<Ѿ?DINU\dlvۀ܊ݖޢ)߯6DScs 2F[p(@Xr4Pm8Ww)Km +endstream +endobj + +7 0 obj +[/ICCBased 6 0 R] +endobj + +8 0 obj +<> +stream + +q +595 0 0 818.125 0 11.9375 cm +/fzImg0 Do +Q + +endstream +endobj + +xref +0 9 +0000000000 65535 f +0000000016 00000 n +0000000062 00000 n +0000000114 00000 n +0000000160 00000 n +0000000267 00000 n +0002640422 00000 n +0002643073 00000 n +0002643107 00000 n + +trailer +<<4D3CE2DA38B28A61008A8AC493DBB2BB>]>> +startxref +2643201 +%%EOF diff --git a/packages/markitdown-ocr/tests/ocr_test_data/pptx_complex_layout.pptx b/packages/markitdown-ocr/tests/ocr_test_data/pptx_complex_layout.pptx new file mode 100644 index 0000000..10467ea Binary files /dev/null and b/packages/markitdown-ocr/tests/ocr_test_data/pptx_complex_layout.pptx differ diff --git a/packages/markitdown-ocr/tests/ocr_test_data/pptx_image_end.pptx b/packages/markitdown-ocr/tests/ocr_test_data/pptx_image_end.pptx new file mode 100644 index 0000000..1ed9804 Binary files /dev/null and b/packages/markitdown-ocr/tests/ocr_test_data/pptx_image_end.pptx differ diff --git a/packages/markitdown-ocr/tests/ocr_test_data/pptx_image_middle.pptx b/packages/markitdown-ocr/tests/ocr_test_data/pptx_image_middle.pptx new file mode 100644 index 0000000..315586a Binary files /dev/null and b/packages/markitdown-ocr/tests/ocr_test_data/pptx_image_middle.pptx differ diff --git a/packages/markitdown-ocr/tests/ocr_test_data/pptx_image_start.pptx b/packages/markitdown-ocr/tests/ocr_test_data/pptx_image_start.pptx new file mode 100644 index 0000000..32a50aa Binary files /dev/null and b/packages/markitdown-ocr/tests/ocr_test_data/pptx_image_start.pptx differ diff --git a/packages/markitdown-ocr/tests/ocr_test_data/pptx_multiple_images.pptx b/packages/markitdown-ocr/tests/ocr_test_data/pptx_multiple_images.pptx new file mode 100644 index 0000000..a8eaa4d Binary files /dev/null and b/packages/markitdown-ocr/tests/ocr_test_data/pptx_multiple_images.pptx differ diff --git 
a/packages/markitdown-ocr/tests/ocr_test_data/xlsx_complex_layout.xlsx b/packages/markitdown-ocr/tests/ocr_test_data/xlsx_complex_layout.xlsx new file mode 100644 index 0000000..6052c1e Binary files /dev/null and b/packages/markitdown-ocr/tests/ocr_test_data/xlsx_complex_layout.xlsx differ diff --git a/packages/markitdown-ocr/tests/ocr_test_data/xlsx_image_end.xlsx b/packages/markitdown-ocr/tests/ocr_test_data/xlsx_image_end.xlsx new file mode 100644 index 0000000..3e26b33 Binary files /dev/null and b/packages/markitdown-ocr/tests/ocr_test_data/xlsx_image_end.xlsx differ diff --git a/packages/markitdown-ocr/tests/ocr_test_data/xlsx_image_middle.xlsx b/packages/markitdown-ocr/tests/ocr_test_data/xlsx_image_middle.xlsx new file mode 100644 index 0000000..2a6c91b Binary files /dev/null and b/packages/markitdown-ocr/tests/ocr_test_data/xlsx_image_middle.xlsx differ diff --git a/packages/markitdown-ocr/tests/ocr_test_data/xlsx_image_start.xlsx b/packages/markitdown-ocr/tests/ocr_test_data/xlsx_image_start.xlsx new file mode 100644 index 0000000..9e46182 Binary files /dev/null and b/packages/markitdown-ocr/tests/ocr_test_data/xlsx_image_start.xlsx differ diff --git a/packages/markitdown-ocr/tests/ocr_test_data/xlsx_multiple_images.xlsx b/packages/markitdown-ocr/tests/ocr_test_data/xlsx_multiple_images.xlsx new file mode 100644 index 0000000..eb8d0cf Binary files /dev/null and b/packages/markitdown-ocr/tests/ocr_test_data/xlsx_multiple_images.xlsx differ diff --git a/packages/markitdown-ocr/tests/test_docx_converter.py b/packages/markitdown-ocr/tests/test_docx_converter.py new file mode 100644 index 0000000..0fb6665 --- /dev/null +++ b/packages/markitdown-ocr/tests/test_docx_converter.py @@ -0,0 +1,223 @@ +""" +Unit tests for DocxConverterWithOCR. + +For each DOCX test file: convert with a mock OCR service then compare the +full output string against the expected snapshot. 
+ +OCR block format used by the converter: + *[Image OCR] + MOCK_OCR_TEXT_12345 + [End OCR]* +""" + +import sys +from pathlib import Path +from typing import Any + +import pytest + +sys.path.insert(0, str(Path(__file__).parent.parent / "src")) + +from markitdown_ocr._ocr_service import OCRResult # noqa: E402 +from markitdown_ocr._docx_converter_with_ocr import ( # noqa: E402 + DocxConverterWithOCR, +) +from markitdown import StreamInfo # noqa: E402 + +TEST_DATA_DIR = Path(__file__).parent / "ocr_test_data" + +_MOCK_TEXT = "MOCK_OCR_TEXT_12345" + + +class MockOCRService: + def extract_text( # noqa: ANN101 + self, image_stream: Any, **kwargs: Any + ) -> OCRResult: + return OCRResult(text=_MOCK_TEXT, backend_used="mock") + + +@pytest.fixture(scope="module") +def svc() -> MockOCRService: + return MockOCRService() + + +def _convert(filename: str, ocr_service: MockOCRService) -> str: + path = TEST_DATA_DIR / filename + if not path.exists(): + pytest.skip(f"Test file not found: {path}") + converter = DocxConverterWithOCR() + with open(path, "rb") as f: + return converter.convert( + f, StreamInfo(extension=".docx"), ocr_service=ocr_service + ).text_content + + +# --------------------------------------------------------------------------- +# docx_image_start.docx +# --------------------------------------------------------------------------- + + +def test_docx_image_start(svc: MockOCRService) -> None: + expected = ( + "Document with Image at Start\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n" + "This is the main content after the header image.\n\n" + "More text content here." 
+ ) + assert _convert("docx_image_start.docx", svc) == expected + + +# --------------------------------------------------------------------------- +# docx_image_middle.docx +# --------------------------------------------------------------------------- + + +def test_docx_image_middle(svc: MockOCRService) -> None: + expected = ( + "# Introduction\n\n" + "This is the introduction section.\n\n" + "We will see an image below.\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n" + "# Analysis\n\n" + "This section comes after the image." + ) + assert _convert("docx_image_middle.docx", svc) == expected + + +# --------------------------------------------------------------------------- +# docx_image_end.docx +# --------------------------------------------------------------------------- + + +def test_docx_image_end(svc: MockOCRService) -> None: + expected = ( + "Report\n\n" + "Main findings of the report.\n\n" + "Details and analysis.\n\n" + "Recommendations.\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*" + ) + assert _convert("docx_image_end.docx", svc) == expected + + +# --------------------------------------------------------------------------- +# docx_multiple_images.docx +# --------------------------------------------------------------------------- + + +def test_docx_multiple_images(svc: MockOCRService) -> None: + expected = ( + "Multi-Image Document\n\n" + "First section\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n" + "Second section with another image\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n" + "Conclusion" + ) + assert _convert("docx_multiple_images.docx", svc) == expected + + +# --------------------------------------------------------------------------- +# docx_multipage.docx +# --------------------------------------------------------------------------- + + +def test_docx_multipage(svc: MockOCRService) -> None: + expected = ( + "# Page 1 - Mixed Content\n\n" + "This is the first paragraph on page 1.\n\n" + "BEFORE IMAGE: 
Important content appears here.\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n" + "AFTER IMAGE: This content follows the image.\n\n" + "More text on page 1.\n\n" + "# Page 2 - Image at End\n\n" + "Content on page 2.\n\n" + "Multiple paragraphs of text.\n\n" + "Building up to the image...\n\n" + "Final paragraph before image.\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n" + "# Page 3 - Image at Start\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n" + "Content that follows the header image.\n\n" + "AFTER IMAGE: This text is after the image." + ) + assert _convert("docx_multipage.docx", svc) == expected + + +# --------------------------------------------------------------------------- +# docx_complex_layout.docx +# --------------------------------------------------------------------------- + + +def test_docx_complex_layout(svc: MockOCRService) -> None: + expected = ( + "Complex Document\n\n" + "| | |\n" + "| --- | --- |\n" + "| Feature | Status |\n" + "| Authentication | Active |\n" + "| Encryption | Enabled |\n\n" + "Security notice:\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*" + ) + assert _convert("docx_complex_layout.docx", svc) == expected + + +# --------------------------------------------------------------------------- +# _inject_placeholders — internal unit tests (no file I/O) +# --------------------------------------------------------------------------- + + +def test_inject_placeholders_single_image() -> None: + converter = DocxConverterWithOCR() + html = "

Before

After

" + result_html, texts = converter._inject_placeholders(html, {"rId1": "TEXT"}) + assert " None: + converter = DocxConverterWithOCR() + html = "

Mid

" + result_html, texts = converter._inject_placeholders( + html, {"rId1": "FIRST", "rId2": "SECOND"} + ) + assert "MARKITDOWNOCRBLOCK0" in result_html + assert "MARKITDOWNOCRBLOCK1" in result_html + assert result_html.index("MARKITDOWNOCRBLOCK0") < result_html.index( + "MARKITDOWNOCRBLOCK1" + ) + assert len(texts) == 2 + + +def test_inject_placeholders_no_img_tag_appends_at_end() -> None: + converter = DocxConverterWithOCR() + html = "

No images

" + result_html, texts = converter._inject_placeholders(html, {"rId1": "ORPHAN"}) + assert "MARKITDOWNOCRBLOCK0" in result_html + assert texts == ["ORPHAN"] + + +def test_inject_placeholders_empty_map_leaves_html_unchanged() -> None: + converter = DocxConverterWithOCR() + html = "

Content

" + result_html, texts = converter._inject_placeholders(html, {}) + assert result_html == html + assert texts == [] + + +# --------------------------------------------------------------------------- +# No OCR service — no OCR tags emitted +# --------------------------------------------------------------------------- + + +def test_docx_no_ocr_service_no_tags() -> None: + path = TEST_DATA_DIR / "docx_image_middle.docx" + if not path.exists(): + pytest.skip(f"Test file not found: {path}") + converter = DocxConverterWithOCR() + with open(path, "rb") as f: + md = converter.convert(f, StreamInfo(extension=".docx")).text_content + assert "*[Image OCR]" not in md + assert "[End OCR]*" not in md diff --git a/packages/markitdown-ocr/tests/test_pdf_converter.py b/packages/markitdown-ocr/tests/test_pdf_converter.py new file mode 100644 index 0000000..5d4adcc --- /dev/null +++ b/packages/markitdown-ocr/tests/test_pdf_converter.py @@ -0,0 +1,234 @@ +""" +Unit tests for PdfConverterWithOCR. + +For each PDF test file: convert with a mock OCR service then compare the +full output string against the expected snapshot. 
+ +OCR block format used by the converter: + *[Image OCR] + MOCK_OCR_TEXT_12345 + [End OCR]* +""" + +import io +import sys +from pathlib import Path +from typing import Any +from unittest.mock import MagicMock, patch + +import pytest + +sys.path.insert(0, str(Path(__file__).parent.parent / "src")) + +from markitdown_ocr._ocr_service import OCRResult # noqa: E402 +from markitdown_ocr._pdf_converter_with_ocr import ( # noqa: E402 + PdfConverterWithOCR, +) +from markitdown import StreamInfo # noqa: E402 + +TEST_DATA_DIR = Path(__file__).parent / "ocr_test_data" + +_MOCK_TEXT = "MOCK_OCR_TEXT_12345" +_OCR_BLOCK = f"*[Image OCR]\n{_MOCK_TEXT}\n[End OCR]*" +_PAGE_1_SCANNED = f"## Page 1\n\n\n\n\n{_OCR_BLOCK}" + + +class MockOCRService: + def extract_text( + self, # noqa: ANN101 + image_stream: Any, + **kwargs: Any, + ) -> OCRResult: + return OCRResult(text=_MOCK_TEXT, backend_used="mock") + + +@pytest.fixture(scope="module") +def svc() -> MockOCRService: + return MockOCRService() + + +def _convert(filename: str, ocr_service: MockOCRService) -> str: + path = TEST_DATA_DIR / filename + if not path.exists(): + pytest.skip(f"Test file not found: {path}") + converter = PdfConverterWithOCR() + with open(path, "rb") as f: + return converter.convert( + f, StreamInfo(extension=".pdf"), ocr_service=ocr_service + ).text_content + + +# --------------------------------------------------------------------------- +# pdf_image_start.pdf +# --------------------------------------------------------------------------- + + +def test_pdf_image_start(svc: MockOCRService) -> None: + expected = ( + "## Page 1\n\n\n\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n\n" + "This is text BEFORE the image.\n\n" + "The image should appear above this text.\n\n" + "This is more content after the image." 
+ ) + assert _convert("pdf_image_start.pdf", svc) == expected + + +# --------------------------------------------------------------------------- +# pdf_image_middle.pdf +# --------------------------------------------------------------------------- + + +def test_pdf_image_middle(svc: MockOCRService) -> None: + expected = ( + "## Page 1\n\n\n" + "Section 1: Introduction\n\n" + "This document contains an image in the middle.\n\n" + "Here is some introductory text.\n\n\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n\n" + "Section 2: Details\n\n" + "This text appears AFTER the image." + ) + assert _convert("pdf_image_middle.pdf", svc) == expected + + +# --------------------------------------------------------------------------- +# pdf_image_end.pdf +# --------------------------------------------------------------------------- + + +def test_pdf_image_end(svc: MockOCRService) -> None: + expected = ( + "## Page 1\n\n\n" + "Main Content\n\n" + "This is the main text content.\n\n" + "The image will appear at the end.\n\n" + "Keep reading...\n\n\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*" + ) + assert _convert("pdf_image_end.pdf", svc) == expected + + +# --------------------------------------------------------------------------- +# pdf_multiple_images.pdf +# --------------------------------------------------------------------------- + + +def test_pdf_multiple_images(svc: MockOCRService) -> None: + expected = ( + "## Page 1\n\n\n" + "Document with Multiple Images\n\n\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n\n" + "Text between first and second image.\n\n\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n\n" + "Final text after all images." 
+ ) + assert _convert("pdf_multiple_images.pdf", svc) == expected + + +# --------------------------------------------------------------------------- +# pdf_complex_layout.pdf +# --------------------------------------------------------------------------- + + +def test_pdf_complex_layout(svc: MockOCRService) -> None: + expected = ( + "## Page 1\n\n\n" + "Complex Layout Document\n\n" + "Table:\n\n" + "ItemQuantity\n\n\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n\n" + "Widget A5" + ) + assert _convert("pdf_complex_layout.pdf", svc) == expected + + +# --------------------------------------------------------------------------- +# pdf_multipage.pdf — pdfplumber/pdfminer fail (EOF); PyMuPDF fallback used +# --------------------------------------------------------------------------- + + +def test_pdf_multipage(svc: MockOCRService) -> None: + # pdfplumber cannot open this file (Unexpected EOF), so _ocr_full_pages + # falls back to PyMuPDF for page rendering. Each page becomes one OCR block. 
+ expected = ( + f"## Page 1\n\n\n{_OCR_BLOCK}\n\n\n" + f"## Page 2\n\n\n{_OCR_BLOCK}\n\n\n" + f"## Page 3\n\n\n{_OCR_BLOCK}" + ) + assert _convert("pdf_multipage.pdf", svc) == expected + + +# --------------------------------------------------------------------------- +# pdf_scanned_*.pdf — raster-only pages → full-page OCR +# --------------------------------------------------------------------------- + + +def test_pdf_scanned_invoice(svc: MockOCRService) -> None: + assert _convert("pdf_scanned_invoice.pdf", svc) == _PAGE_1_SCANNED + + +def test_pdf_scanned_meeting_minutes(svc: MockOCRService) -> None: + assert _convert("pdf_scanned_meeting_minutes.pdf", svc) == _PAGE_1_SCANNED + + +def test_pdf_scanned_minimal(svc: MockOCRService) -> None: + assert _convert("pdf_scanned_minimal.pdf", svc) == _PAGE_1_SCANNED + + +def test_pdf_scanned_sales_report(svc: MockOCRService) -> None: + assert _convert("pdf_scanned_sales_report.pdf", svc) == _PAGE_1_SCANNED + + +def test_pdf_scanned_report(svc: MockOCRService) -> None: + expected = ( + f"{_PAGE_1_SCANNED}\n\n\n\n" + f"## Page 2\n\n\n\n\n{_OCR_BLOCK}\n\n\n\n" + f"## Page 3\n\n\n\n\n{_OCR_BLOCK}" + ) + assert _convert("pdf_scanned_report.pdf", svc) == expected + + +# --------------------------------------------------------------------------- +# Scanned PDF fallback path (pdfplumber finds no text → full-page OCR) +# --------------------------------------------------------------------------- + + +def test_pdf_scanned_fallback_format(svc: MockOCRService) -> None: + """_ocr_full_pages emits *[Image OCR]...[End OCR]* for each page.""" + path = TEST_DATA_DIR / "pdf_image_start.pdf" + if not path.exists(): + pytest.skip(f"Test file not found: {path}") + + converter = PdfConverterWithOCR() + with patch("pdfplumber.open") as mock_plumber: + mock_pdf = MagicMock() + mock_page = MagicMock() + mock_page.page_number = 1 + mock_pdf.pages = [mock_page] + mock_pdf.__enter__.return_value = mock_pdf + mock_plumber.return_value = mock_pdf + + 
with open(path, "rb") as f: + md = converter._ocr_full_pages(io.BytesIO(f.read()), svc) + + expected = "## Page 1\n\n\n" "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*" + assert ( + md == expected + ), f"_ocr_full_pages must produce:\n{expected!r}\nActual:\n{md!r}" + + +# --------------------------------------------------------------------------- +# No OCR service — no OCR tags emitted +# --------------------------------------------------------------------------- + + +def test_pdf_no_ocr_service_no_tags() -> None: + path = TEST_DATA_DIR / "pdf_image_middle.pdf" + if not path.exists(): + pytest.skip(f"Test file not found: {path}") + converter = PdfConverterWithOCR() + with open(path, "rb") as f: + md = converter.convert(f, StreamInfo(extension=".pdf")).text_content + assert "*[Image OCR]" not in md + assert "[End OCR]*" not in md diff --git a/packages/markitdown-ocr/tests/test_pptx_converter.py b/packages/markitdown-ocr/tests/test_pptx_converter.py new file mode 100644 index 0000000..724f103 --- /dev/null +++ b/packages/markitdown-ocr/tests/test_pptx_converter.py @@ -0,0 +1,148 @@ +""" +Unit tests for PptxConverterWithOCR. + +For each PPTX test file: convert with a mock OCR service then compare the +full output string against the expected snapshot. + +OCR block format used by the converter: + *[Image OCR] + MOCK_OCR_TEXT_12345 + [End OCR]* + +Note: PPTX slide text uses literal backslash-n (\\n) sequences from the +underlying PPTX converter template; OCR blocks use real newlines. 
+""" + +import sys +from pathlib import Path +from typing import Any + +import pytest + +sys.path.insert(0, str(Path(__file__).parent.parent / "src")) + +from markitdown_ocr._ocr_service import OCRResult # noqa: E402 +from markitdown_ocr._pptx_converter_with_ocr import ( # noqa: E402 + PptxConverterWithOCR, +) +from markitdown import StreamInfo # noqa: E402 + +TEST_DATA_DIR = Path(__file__).parent / "ocr_test_data" + +_MOCK_TEXT = "MOCK_OCR_TEXT_12345" +_OCR_BLOCK = f"*[Image OCR]\n{_MOCK_TEXT}\n[End OCR]*" + + +class MockOCRService: + def extract_text( + self, # noqa: ANN101 + image_stream: Any, + **kwargs: Any, + ) -> OCRResult: + return OCRResult(text=_MOCK_TEXT, backend_used="mock") + + +@pytest.fixture(scope="module") +def svc() -> MockOCRService: + return MockOCRService() + + +def _convert(filename: str, ocr_service: MockOCRService) -> str: + path = TEST_DATA_DIR / filename + if not path.exists(): + pytest.skip(f"Test file not found: {path}") + converter = PptxConverterWithOCR() + with open(path, "rb") as f: + return converter.convert( + f, StreamInfo(extension=".pptx"), ocr_service=ocr_service + ).text_content + + +# --------------------------------------------------------------------------- +# pptx_image_start.pptx +# --------------------------------------------------------------------------- + + +def test_pptx_image_start(svc: MockOCRService) -> None: + # Slide 1: title "Welcome" followed by an image + expected = ( + "\\n\\n\\n# Welcome\\n\\n" + "\n*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*" + ) + assert _convert("pptx_image_start.pptx", svc) == expected + + +# --------------------------------------------------------------------------- +# pptx_image_middle.pptx +# --------------------------------------------------------------------------- + + +def test_pptx_image_middle(svc: MockOCRService) -> None: + # Slide 1: Introduction | Slide 2: Architecture + image | Slide 3: Conclusion # noqa: E501 + expected = ( + "\\n\\n\\n# Introduction" + "\\n\\n\\n\\n\\n# 
Architecture\\n\\n" + "\n*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*" + "\\n\\n\\n# Conclusion\\n\\n" + ) + assert _convert("pptx_image_middle.pptx", svc) == expected + + +# --------------------------------------------------------------------------- +# pptx_image_end.pptx +# --------------------------------------------------------------------------- + + +def test_pptx_image_end(svc: MockOCRService) -> None: + # Slide 1: Presentation | Slide 2: Thank You + image + expected = ( + "\\n\\n\\n# Presentation" + "\\n\\n\\n\\n\\n# Thank You\\n\\n" + "\n*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*" + ) + assert _convert("pptx_image_end.pptx", svc) == expected + + +# --------------------------------------------------------------------------- +# pptx_multiple_images.pptx +# --------------------------------------------------------------------------- + + +def test_pptx_multiple_images(svc: MockOCRService) -> None: + # Slide 1: two images, no title text + expected = ( + "\\n\\n\\n# \\n" + "\n*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*" + "\n\n*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*" + ) + assert _convert("pptx_multiple_images.pptx", svc) == expected + + +# --------------------------------------------------------------------------- +# pptx_complex_layout.pptx +# --------------------------------------------------------------------------- + + +def test_pptx_complex_layout(svc: MockOCRService) -> None: + expected = ( + "\\n\\n\\n# Product Comparison" + "\\n\\nOur products lead the market\\n" + "\n*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*" + ) + assert _convert("pptx_complex_layout.pptx", svc) == expected + + +# --------------------------------------------------------------------------- +# No OCR service — no OCR tags emitted +# --------------------------------------------------------------------------- + + +def test_pptx_no_ocr_service_no_tags() -> None: + path = TEST_DATA_DIR / "pptx_image_middle.pptx" + if not path.exists(): + pytest.skip(f"Test file not found: 
{path}") + converter = PptxConverterWithOCR() + with open(path, "rb") as f: + md = converter.convert(f, StreamInfo(extension=".pptx")).text_content + assert "*[Image OCR]" not in md + assert "[End OCR]*" not in md diff --git a/packages/markitdown-ocr/tests/test_xlsx_converter.py b/packages/markitdown-ocr/tests/test_xlsx_converter.py new file mode 100644 index 0000000..4ab30c6 --- /dev/null +++ b/packages/markitdown-ocr/tests/test_xlsx_converter.py @@ -0,0 +1,249 @@ +""" +Unit tests for XlsxConverterWithOCR. + +For each XLSX test file: convert with a mock OCR service then compare the +full output string against the expected snapshot. + +OCR block format used by the converter: + *[Image OCR] + MOCK_OCR_TEXT_12345 + [End OCR]* + +Images are grouped at the end of each sheet under: + ### Images in this sheet: +""" + +import sys +from pathlib import Path +from typing import Any + +import pytest + +sys.path.insert(0, str(Path(__file__).parent.parent / "src")) + +from markitdown_ocr._ocr_service import OCRResult # noqa: E402 +from markitdown_ocr._xlsx_converter_with_ocr import ( # noqa: E402 + XlsxConverterWithOCR, +) +from markitdown import StreamInfo # noqa: E402 + +TEST_DATA_DIR = Path(__file__).parent / "ocr_test_data" + +_MOCK_TEXT = "MOCK_OCR_TEXT_12345" +_OCR_BLOCK = f"*[Image OCR]\n{_MOCK_TEXT}\n[End OCR]*" +_IMG_SECTION = "### Images in this sheet:" + + +class MockOCRService: + def extract_text( + self, # noqa: ANN101 + image_stream: Any, + **kwargs: Any, + ) -> OCRResult: + return OCRResult(text=_MOCK_TEXT, backend_used="mock") + + +@pytest.fixture(scope="module") +def svc() -> MockOCRService: + return MockOCRService() + + +def _convert(filename: str, ocr_service: MockOCRService) -> str: + path = TEST_DATA_DIR / filename + if not path.exists(): + pytest.skip(f"Test file not found: {path}") + converter = XlsxConverterWithOCR() + with open(path, "rb") as f: + return converter.convert( + f, StreamInfo(extension=".xlsx"), ocr_service=ocr_service + ).text_content + + 
+# --------------------------------------------------------------------------- +# xlsx_image_start.xlsx +# --------------------------------------------------------------------------- + + +def test_xlsx_image_start(svc: MockOCRService) -> None: + expected = ( + "## Sales Q1\n\n" + "| Product | Sales |\n" + "| --- | --- |\n" + "| Widget A | 100 |\n" + "| Widget B | 150 |\n\n" + "### Images in this sheet:\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n" + "## Forecast Q2\n\n" + "| Projected Sales | Unnamed: 1 |\n" + "| --- | --- |\n" + "| Widget A | 120 |\n" + "| Widget B | 180 |\n\n" + "### Images in this sheet:\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*" + ) + assert _convert("xlsx_image_start.xlsx", svc) == expected + + +# --------------------------------------------------------------------------- +# xlsx_image_middle.xlsx +# --------------------------------------------------------------------------- + + +def test_xlsx_image_middle(svc: MockOCRService) -> None: + expected = ( + "## Revenue\n\n" + "| Q1 Report | Unnamed: 1 |\n" + "| --- | --- |\n" + "| NaN | NaN |\n" + "| Revenue | $50,000 |\n" + "| NaN | NaN |\n" + "| NaN | NaN |\n" + "| NaN | NaN |\n" + "| NaN | NaN |\n" + "| Profit Margin | 40% |\n\n" + "### Images in this sheet:\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n" + "## Expenses\n\n" + "| Expense Breakdown | Unnamed: 1 |\n" + "| --- | --- |\n" + "| NaN | NaN |\n" + "| Expenses | $30,000 |\n" + "| NaN | NaN |\n" + "| NaN | NaN |\n" + "| NaN | NaN |\n" + "| NaN | NaN |\n" + "| Savings | $5,000 |\n\n" + "### Images in this sheet:\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*" + ) + assert _convert("xlsx_image_middle.xlsx", svc) == expected + + +# --------------------------------------------------------------------------- +# xlsx_image_end.xlsx +# --------------------------------------------------------------------------- + + +def test_xlsx_image_end(svc: MockOCRService) -> None: + expected = ( + "## Sheet\n\n" + 
"| Financial Summary | Unnamed: 1 |\n" + "| --- | --- |\n" + "| Total Revenue | $500,000 |\n" + "| Total Expenses | $300,000 |\n" + "| Net Profit | $200,000 |\n" + "| NaN | NaN |\n" + "| NaN | NaN |\n" + "| NaN | NaN |\n" + "| NaN | NaN |\n" + "| NaN | NaN |\n" + "| Signature: | NaN |\n\n" + "### Images in this sheet:\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n" + "## Budget\n\n" + "| Budget Allocation | Unnamed: 1 |\n" + "| --- | --- |\n" + "| Marketing | $100,000 |\n" + "| R&D | $150,000 |\n" + "| Operations | $50,000 |\n" + "| NaN | NaN |\n" + "| NaN | NaN |\n" + "| NaN | NaN |\n" + "| NaN | NaN |\n" + "| NaN | NaN |\n" + "| Approved: | NaN |\n\n" + "### Images in this sheet:\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*" + ) + assert _convert("xlsx_image_end.xlsx", svc) == expected + + +# --------------------------------------------------------------------------- +# xlsx_multiple_images.xlsx +# --------------------------------------------------------------------------- + + +def test_xlsx_multiple_images(svc: MockOCRService) -> None: + expected = ( + "## Overview\n\n" + "| Dashboard |\n" + "| --- |\n" + "| Status: Active |\n" + "| NaN |\n" + "| NaN |\n" + "| NaN |\n" + "| NaN |\n" + "| Performance Summary |\n\n" + "### Images in this sheet:\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n" + "## Details\n\n" + "| Detailed Metrics |\n" + "| --- |\n" + "| System Health |\n\n" + "### Images in this sheet:\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n" + "## Summary\n\n" + "| Quarter Summary |\n" + "| --- |\n" + "| Overall Performance |\n\n" + "### Images in this sheet:\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*" + ) + assert _convert("xlsx_multiple_images.xlsx", svc) == expected + + +# --------------------------------------------------------------------------- +# xlsx_complex_layout.xlsx +# 
--------------------------------------------------------------------------- + + +def test_xlsx_complex_layout(svc: MockOCRService) -> None: + expected = ( + "## Complex Report\n\n" + "| Annual Report 2024 | Unnamed: 1 |\n" + "| --- | --- |\n" + "| NaN | NaN |\n" + "| Month | Sales |\n" + "| Jan | 1000 |\n" + "| Feb | 1200 |\n" + "| NaN | NaN |\n" + "| Total | 2200 |\n\n" + "### Images in this sheet:\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n" + "## Customers\n\n" + "| Customer Metrics | Unnamed: 1 |\n" + "| --- | --- |\n" + "| NaN | NaN |\n" + "| New Customers | 250 |\n" + "| Retention Rate | 92% |\n\n" + "### Images in this sheet:\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*\n\n" + "## Regions\n\n" + "| Regional Breakdown | Unnamed: 1 |\n" + "| --- | --- |\n" + "| NaN | NaN |\n" + "| Region | Revenue |\n" + "| North | $800K |\n" + "| South | $600K |\n\n" + "### Images in this sheet:\n\n" + "*[Image OCR]\nMOCK_OCR_TEXT_12345\n[End OCR]*" + ) + assert _convert("xlsx_complex_layout.xlsx", svc) == expected + + +# --------------------------------------------------------------------------- +# No OCR service — no OCR tags emitted +# --------------------------------------------------------------------------- + + +def test_xlsx_no_ocr_service_no_tags() -> None: + path = TEST_DATA_DIR / "xlsx_image_middle.xlsx" + if not path.exists(): + pytest.skip(f"Test file not found: {path}") + converter = XlsxConverterWithOCR() + with open(path, "rb") as f: + md = converter.convert(f, StreamInfo(extension=".xlsx")).text_content + assert "*[Image OCR]" not in md + assert "[End OCR]*" not in md diff --git a/packages/markitdown/src/markitdown/__about__.py b/packages/markitdown/src/markitdown/__about__.py index 3de6ec2..ff02806 100644 --- a/packages/markitdown/src/markitdown/__about__.py +++ b/packages/markitdown/src/markitdown/__about__.py @@ -1,4 +1,4 @@ # SPDX-FileCopyrightText: 2024-present Adam Fourney # # 
SPDX-License-Identifier: MIT -__version__ = "0.1.5" +__version__ = "0.1.6b1"
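The snapshot tests above fix two output conventions. Each image's OCR text is wrapped as `*[Image OCR]` / text / `[End OCR]*`. In XLSX output, a sheet's blocks are grouped under `### Images in this sheet:` and separated by a blank line. They also pass `MockOCRService` as `ocr_service=...` without subclassing anything, which implies a duck-typed `extract_text(image_stream, **kwargs)` interface.

The following is a minimal sketch that reproduces those strings, under stated assumptions. `OCRResult` here is a local stand-in that mirrors only the two fields the tests use. `format_ocr_block`, `format_sheet_images`, and `EchoOCRService` are hypothetical names for illustration and are not part of `markitdown_ocr`.

```python
import io
from dataclasses import dataclass
from typing import Any, BinaryIO, List, Protocol


@dataclass
class OCRResult:
    # Local stand-in (assumption): mirrors only the fields the tests pass to
    # markitdown_ocr's OCRResult, i.e. OCRResult(text=..., backend_used=...).
    text: str
    backend_used: str


class OCRService(Protocol):
    # Structural type matching the tests' MockOCRService: any object with this
    # method fits, no base class required (as the tests themselves rely on).
    def extract_text(self, image_stream: BinaryIO, **kwargs: Any) -> OCRResult: ...


def format_ocr_block(text: str) -> str:
    # Marker format asserted by every snapshot in these tests.
    return f"*[Image OCR]\n{text}\n[End OCR]*"


def format_sheet_images(ocr_texts: List[str]) -> str:
    # XLSX layout seen in the snapshots: one heading, then blocks separated
    # by a blank line (hypothetical helper, not the converter's own code).
    blocks = "\n\n".join(format_ocr_block(t) for t in ocr_texts)
    return f"### Images in this sheet:\n\n{blocks}"


class EchoOCRService:
    # Hypothetical toy backend: "recognizes" an image by decoding its bytes
    # as UTF-8, which makes the output predictable for local experiments.
    def extract_text(self, image_stream: BinaryIO, **kwargs: Any) -> OCRResult:
        return OCRResult(text=image_stream.read().decode("utf-8"), backend_used="echo")


service: OCRService = EchoOCRService()
result = service.extract_text(io.BytesIO(b"Invoice #42"))
print(format_sheet_images([result.text]))
```

The sketch keeps the formatting helpers separate from the service. This mirrors how the tests isolate the converter's layout from the OCR backend: swapping the mock for a real LLM-backed service should change only the text inside the markers, not the surrounding structure.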