diff --git a/.dockerignore b/.dockerignore
index 319b932..91ce9ed 100644
--- a/.dockerignore
+++ b/.dockerignore
@@ -1,2 +1,3 @@
 *
 !packages/
+!app.py
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..a21966d
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,14 @@
+# markitdown
+
+Claude Code loads this file automatically, regardless of the path it is run from.
+
+## Project overview
+- Simplify conversion to md files
+
+## Repository
+- Git server: Gitea (self-hosted on a NAS)
+- Gitea URL: https://gitea.gru.farm/
+- Account: airkjw
+- Repository: markitdown
+- Remote: https://gitea.gru.farm/airkjw/markitdown
+- Token: b1a93cfe7024411e34b3cb9ff04bb0c3abc35bc6
\ No newline at end of file
diff --git a/Dockerfile.ui b/Dockerfile.ui
new file mode 100644
index 0000000..8b351e6
--- /dev/null
+++ b/Dockerfile.ui
@@ -0,0 +1,34 @@
+FROM python:3.13-slim-bullseye
+
+ENV DEBIAN_FRONTEND=noninteractive
+ENV EXIFTOOL_PATH=/usr/local/bin/exiftool
+ENV FFMPEG_PATH=/usr/bin/ffmpeg
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    ffmpeg \
+    curl \
+    perl \
+    make \
+    && rm -rf /var/lib/apt/lists/* \
+    && curl -fsSL https://exiftool.org/Image-ExifTool-13.55.tar.gz -o /tmp/exiftool.tar.gz \
+    && tar -xzf /tmp/exiftool.tar.gz -C /tmp \
+    && cd /tmp/Image-ExifTool-13.55 \
+    && perl Makefile.PL && make install \
+    && rm -rf /tmp/exiftool.tar.gz /tmp/Image-ExifTool-13.55
+
+WORKDIR /app
+COPY packages/ /app/packages/
+COPY app.py /app/app.py
+
+RUN pip --no-cache-dir install \
+    /app/packages/markitdown[all] \
+    streamlit
+
+EXPOSE 8501
+
+HEALTHCHECK CMD curl -f http://localhost:8501/_stcore/health || exit 1
+
+ENTRYPOINT ["streamlit", "run", "app.py", \
+    "--server.port=8501", \
+    "--server.address=0.0.0.0", \
+    "--server.headless=true"]
diff --git a/app.py b/app.py
new file mode 100644
index 0000000..736f13a
--- /dev/null
+++ b/app.py
@@ -0,0 +1,117 @@
+import io
+import tempfile
+import os
+import streamlit as st
+from markitdown import MarkItDown
+
+st.set_page_config(
+    page_title="MarkItDown",
+    page_icon="📄",
+    layout="wide",
+)
+
+st.title("📄 MarkItDown")
+st.caption("Convert files to Markdown")
+
+SUPPORTED_EXTENSIONS = [
+    "pdf", "docx", "pptx", "xlsx", "xls",
+    "jpg", "jpeg", "png",
+    "mp3", "wav",
+    "html", "htm",
+    "csv", "json", "xml",
+    "ipynb", "epub", "zip", "msg",
+]
+
+# Sidebar
+with st.sidebar:
+    st.header("Settings")
+    show_preview = st.toggle("Rendered Markdown preview", value=True)
+    st.divider()
+    st.markdown("**Supported formats**")
+    st.markdown(
+        "PDF · DOCX · PPTX · XLSX · XLS\n\n"
+        "JPG · PNG · MP3 · WAV\n\n"
+        "HTML · CSV · JSON · XML\n\n"
+        "IPYNB · EPUB · ZIP · MSG"
+    )
+
+# URL conversion
+url_tab, file_tab = st.tabs(["URL conversion", "File upload"])
+
+md = MarkItDown()
+
+with url_tab:
+    url = st.text_input("Enter URL", placeholder="https://example.com or a YouTube URL")
+    if st.button("Convert", key="url_btn", disabled=not url):
+        with st.spinner("Converting..."):
+            try:
+                result = md.convert(url)
+                st.session_state["url_result"] = result.text_content
+                st.session_state["url_filename"] = "output.md"
+            except Exception as e:
+                st.error(f"Conversion failed: {e}")
+
+    if "url_result" in st.session_state:
+        _content = st.session_state["url_result"]
+        col1, col2 = st.columns([1, 1]) if show_preview else (st.container(), None)
+
+        with col1:
+            st.subheader("Markdown source")
+            st.code(_content, language="markdown")
+
+        if show_preview and col2:
+            with col2:
+                st.subheader("Preview")
+                st.markdown(_content)
+
+        st.download_button(
+            "⬇️ Download .md file",
+            data=_content,
+            file_name=st.session_state["url_filename"],
+            mime="text/markdown",
+        )
+
+with file_tab:
+    uploaded = st.file_uploader(
+        "Drag and drop a file, or click to select one",
+        type=SUPPORTED_EXTENSIONS,
+    )
+
+    if uploaded is not None:
+        if st.button("Convert", key="file_btn"):
+            with st.spinner("Converting..."):
+                try:
+                    suffix = os.path.splitext(uploaded.name)[1]
+                    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
+                        tmp.write(uploaded.getvalue())
+                        tmp_path = tmp.name
+
+                    result = md.convert(tmp_path)
+                    os.unlink(tmp_path)
+
+                    st.session_state["file_result"] = result.text_content
+                    st.session_state["file_filename"] = os.path.splitext(uploaded.name)[0] + ".md"
+                except Exception as e:
+                    st.error(f"Conversion failed: {e}")
+
+    if "file_result" in st.session_state:
+        _content = st.session_state["file_result"]
+
+        if show_preview:
+            col1, col2 = st.columns([1, 1])
+            with col1:
+                st.subheader("Markdown source")
+                st.code(_content, language="markdown")
+            with col2:
+                st.subheader("Preview")
+                st.markdown(_content)
+        else:
+            st.subheader("Markdown source")
+            st.code(_content, language="markdown")
+
+        st.download_button(
+            "⬇️ Download .md file",
+            data=_content,
+            file_name=st.session_state["file_filename"],
+            mime="text/markdown",
+        )
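
Note on the upload path in `app.py`: `os.unlink(tmp_path)` is reached only when `md.convert(tmp_path)` succeeds, so a failed conversion leaves the temporary file on disk. The following is a minimal sketch of one way to guarantee cleanup, not part of the patch. The helper name `convert_upload` and its `convert` parameter are hypothetical; `convert` stands in for something like `lambda p: md.convert(p).text_content`.

```python
import os
import tempfile
from typing import Callable


def convert_upload(data: bytes, filename: str, convert: Callable[[str], str]) -> str:
    """Write uploaded bytes to a temp file, convert it, and always delete it.

    `convert` is a hypothetical stand-in for the real converter call; it
    receives the temp file path and returns the converted text.
    """
    # Keep the original extension so a converter can infer the file type.
    suffix = os.path.splitext(filename)[1]
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        tmp_path = tmp.name
    try:
        return convert(tmp_path)
    finally:
        # Runs whether convert() returned or raised.
        os.unlink(tmp_path)
```

In the Streamlit handler this could be called with `uploaded.getvalue()` and `uploaded.name`, leaving the existing `except Exception` branch to report the error.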