Commit Graph

82 Commits

Author SHA1 Message Date
gagb 736e7d9a7e Merge branch 'main' into patch-1 2024-12-16 16:53:58 -08:00
gagb 19c111251b Merge pull request #60 from madduci/main
Added Dockerfile
2024-12-16 16:42:26 -08:00
gagb 360c2dd95f Merge branch 'main' into main 2024-12-16 16:35:50 -08:00
gagb 73776b2c0f Merge pull request #50 from narumiruna/youtube-transcript-languages
Support specifying YouTube transcript language
2024-12-16 16:23:20 -08:00
gagb 2d3ffeade1 Merge branch 'main' into youtube-transcript-languages 2024-12-16 16:20:35 -08:00
gagb 51c1453699 Merge pull request #48 from Soulter/main
Fix: pass the kwargs to _convert method when converting an url file
2024-12-16 16:19:09 -08:00
gagb ae4669107c Merge branch 'main' into main 2024-12-16 16:01:59 -08:00
gagb b0115cf971 Merge branch 'main' into youtube-transcript-languages 2024-12-16 15:47:38 -08:00
gagb 5cf8474f37 Merge pull request #44 from Y-Kim-64/main
Exclude test files from language statistics using linguist-vendored
2024-12-16 15:35:19 -08:00
gagb 83dc81170b Merge branch 'main' into main 2024-12-16 15:29:33 -08:00
gagb e7a2e20d93 Merge pull request #39 from SH4DOW4RE/main
Catching pydub's warning of ffmpeg or avconv missing
2024-12-16 15:28:53 -08:00
gagb 980abd3a60 Merge branch 'main' into main 2024-12-16 15:24:58 -08:00
afourney 6587e0f097 Merge branch 'main' into patch-1 2024-12-16 14:27:26 -08:00
afourney 978c8763aa Merge pull request #38 from VillePuuska/support-comments-in-docx
Add passing style_map kwarg to Mammoth when converting docx to allow keeping comments
2024-12-16 14:26:55 -08:00
afourney e7636656d8 Merge branch 'main' into support-comments-in-docx 2024-12-16 14:23:14 -08:00
afourney fa1f496d51 Merge branch 'main' into patch-1 2024-12-16 14:18:20 -08:00
afourney da779dd125 Merge pull request #33 from nyosegawa/feature/add-pptx-chart-support
Add PPTX chart support
2024-12-16 14:11:49 -08:00
afourney 12ce5e95b2 Merge branch 'main' into feature/add-pptx-chart-support 2024-12-16 14:06:14 -08:00
gagb 6dad1cca96 Merge pull request #22 from Josh-XT/main
Add zip handling
2024-12-16 13:56:25 -08:00
gagb 9e6a19987b Merge branch 'main' into main 2024-12-16 13:51:39 -08:00
gagb ed91e8b534 Merge pull request #19 from brc-dd/fix/18
Fix character decoding issues with text-like files
2024-12-16 13:49:48 -08:00
gagb aeff2cb5ae Merge branch 'main' into fix/18 2024-12-16 13:46:17 -08:00
gagb c9c7d98d30 Merge pull request #11 from simonw/patch-2
CLI usage instructions
2024-12-16 13:45:05 -08:00
gagb e7d9b5546a Merge branch 'main' into patch-2 2024-12-16 13:42:28 -08:00
CharlesCNorton 3d9f3f3e5b Fix LLM terms
Updated all instances of mlm_client and mlm_model to llm_client and llm_model in the readme. The previous terms (mlm_client and mlm_model) are incorrect in the context of configuring Large Language Models (LLMs), as "MLM" typically refers to Masked Language Models, which is unrelated to the intended functionality. This change aligns the documentation with standard naming conventions for LLM configuration parameters and improves clarity for users integrating with LLMs like OpenAI's GPT models.
2024-12-16 16:23:03 -05:00
Michele Adduci 5fc03b6415 Added UID as argument 2024-12-16 13:11:13 +01:00
Michele Adduci 013b022427 Added Docker Image for using markitdown in a sandboxed environment 2024-12-16 13:08:15 +01:00
narumi 695100d5d8 Support specifying YouTube transcript language 2024-12-16 13:16:00 +08:00
Soulter d66ef5fcca Update README to introduce the customized mlm_prompt 2024-12-16 12:08:51 +08:00
Soulter c168703d5e Pass the kwargs to _convert method when converting an url file 2024-12-16 11:41:39 +08:00
Yeonjun 3548c96dd3 Create .gitattributes
Mark test files as linguist-vendored
2024-12-16 09:21:07 +09:00
SH4DOW4RE 1559d9d163 pre-commit ran 2024-12-15 22:15:20 +01:00
SH4DOW4RE b7f5662ffd PR: Catching pydub's warning of ffmpeg or avconv missing 2024-12-15 17:29:14 +01:00
Ville Puuska 0a7203b876 add style_map prop to MarkItDown class 2024-12-15 17:23:57 +02:00
Ville Puuska 0704b0b6ff pass 'style_map' kwarg to mammoth when converting docx 2024-12-15 16:59:21 +02:00
sakasegawa 0dd4e95584 Remove _is_chart 2024-12-15 21:14:58 +09:00
sakasegawa 93130b5ba5 Add PPTX chart support 2024-12-15 20:42:55 +09:00
Divyansh Singh 52b723724c Fix character decoding issues with text-like files 2024-12-15 10:37:59 +05:30
Josh XT a55c3d525c Merge branch 'main' into main 2024-12-14 23:09:30 -05:00
gagb 81e3f24acd Merge pull request #29 from microsoft/gagb-patch-1
Update README.md
2024-12-14 19:17:54 -08:00
gagb b84294620a Update README.md 2024-12-14 19:05:51 -08:00
gagb 60c495d609 Merge branch 'main' into patch-2 2024-12-14 18:57:11 -08:00
gagb 71123a4df3 Merge pull request #7 from microsoft/gagb/improve-readme
Improve the readme with contributing guidelines
2024-12-14 18:54:28 -08:00
gagb 5753e553fe Fix conflicts 2024-12-14 18:47:34 -08:00
gagb 752dd897b9 Merge pull request #28 from pawarbi/main
Update README.md
2024-12-14 18:44:52 -08:00
gagb 1aa4abe90f Merge branch 'gagb/improve-readme' into main 2024-12-14 18:44:33 -08:00
gagb ea7c6dcc40 Merge pull request #27 from haesleinhuepf/patch-1
Add installation instructions from haesleinhuepf:patch-1
2024-12-14 18:39:51 -08:00
gagb a31c0a13e7 Merge branch 'main' into gagb/improve-readme 2024-12-14 18:34:27 -08:00
Sandeep Pawar 30ab78fe9e Update README.md
I have updated the readme with three changes:
- Created sections for Installation and Usage to help users
- Added installation instruction
- Added additional example of using LLM. This will be the primary use case and will help users.
2024-12-14 19:15:10 -06:00
gagb 559b1fc62a Merge branch 'main' into patch-2 2024-12-14 15:02:42 -08:00