GitHub Project Home Pages
Video
prs-eth/RollingDepth: Video Depth without Video Models 2024-12-03
Tencent/HunyuanVideo 2024-12-03
hmrishavbandy/FlipSketch: FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations 2024-12-02
KwaiVGI/LivePortrait: Bring portraits to life! 2024-12-02
C0untFloyd/roop-unleashed: Evolved Fork of roop with Web Server and lots of additions 2024-12-02
jdh-algo/JoyVASA 2024-12-02
PKU-YuanGroup/ConsisID: Identity-Preserving Text-to-Video Generation by Frequency Decomposition 2024-12-02
rhymes-ai/Allegro: Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input. 2024-12-02
k4yt3x/video2x: A machine learning-based lossless video super resolution framework. Est. Hack the Valley II, 2018. 2024-11-27
facefusion/facefusion: Industry leading face manipulation platform 2024-11-27
yangchris11/samurai: Official repository of “SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory”
alibaba/Tora: The official repository for paper “Tora: Trajectory-oriented Diffusion Transformer for Video Generation”
aigc-apps/CogVideoX-Fun: 📹 A more flexible CogVideoX that can generate videos at any resolution and creates videos from images.
aigc-apps/EasyAnimate: 📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
HVision-NKU/StoryDiffusion: Create Magic Story!
hpcaitech/Open-Sora: Open-Sora: Democratizing Efficient Video Production for All
Vision-CAIR/MiniGPT4-video
hkchengrex/Cutie: [CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation
Picsart-AI-Research/StreamingT2V: StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Tencent/MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
jianchang512/pyvideotrans: Translate a video from one language to another and add dubbing; also supports API calls.
Hillobar/Rope: GUI-focused roop
sczhou/CodeFormer: [NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
Huanshere/VideoLingo: Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team
jy0205/Pyramid-Flow: Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
Vision-CAIR/LongVU
Doubiiu/ToonCrafter: [SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation
VectorSpaceLab/Video-XL: 🔥🔥First-ever hour scale video understanding models
anliyuan/Ultralight-Digital-Human: An ultra-lightweight digital human model that can run in real time on mobile devices
antgroup/echomimic_v2: EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
Zejun-Yang/AniPortrait: AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
fudan-generative-vision/hallo2: Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
antgroup/echomimic: EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
LordLiang/DrawingSpinUp: (SIGGRAPH Asia 2024) This is the official PyTorch implementation of SIGGRAPH Asia 2024 paper: DrawingSpinUp: 3D Animation from Single Character Drawings
HelloVision/HelloMeme: The official HelloMeme GitHub site
Kmcode1/SG-I2V: This is the official implementation of SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation.
facebookresearch/sapiens: High-resolution models for human tasks.
AlonzoLeeeooo/StableV2V: The official implementation of the paper titled “StableV2V: Stablizing Shape Consistency in Video-to-Video Editing”.
genmoai/mochi: The best OSS video generation models
THUDM/CogVideo: text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
CyberAgentAILab/TANGO: Official implementation of the paper “TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation”
IDEA-Research/MotionCLR: [Arxiv 2024] MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms
Ji4chenLi/t2v-turbo: Code repository for T2V-Turbo and T2V-Turbo-v2
Lightricks/LTX-Video: Official repository for LTX-Video
WebUI
open-webui/open-webui: User-friendly AI Interface (Supports Ollama, OpenAI API, …)
continue-revolution/sd-webui-segment-anything: Segment Anything for Stable Diffusion WebUI
lllyasviel/stable-diffusion-webui-forge
aigc-apps/sd-webui-EasyPhoto: 📷 EasyPhoto | Your Smart AI Photo Generator.
LLM
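hiroi-sora/Umi-OCR: Free, open-source, offline OCR software. Supports screenshots and batch image import, PDF document recognition, excluding watermarks and headers/footers, and scanning/generating QR codes. Built-in multilingual libraries. 2024-12-03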
Significant-Gravitas/AutoGPT: AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters. 2024-12-03
OpenBMB/ChatDev: Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration) 2024-12-03
THUDM/GLM-4-Voice: GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
oobabooga/text-generation-webui: A Gradio web UI for Large Language Models.
janhq/jan: Jan is an open source alternative to ChatGPT that runs 100% offline on your computer. Multiple engine support (llama.cpp, TensorRT-LLM)
ollama/ollama: Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
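binary-husky/gpt_academic: A practical interactive interface for LLMs such as GPT and GLM, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins, project analysis and self-explanation for Python, C++, and other codebases, PDF/LaTeX paper translation and summarization, parallel queries to multiple LLMs, and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFLYTEK Spark, ERNIE Bot, llama2, rwkv, claude2, moss, and others.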
SillyTavern/SillyTavern: LLM Frontend for Power Users.
mendableai/firecrawl: 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
InternLM/InternLM: Official release of InternLM2.5 base and chat models. 1M context support
Training Scripts
hiyouga/LLaMA-Factory: Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024) 2024-12-02
kohya-ss/sd-scripts
cocktailpeanut/fluxgym: Dead simple FLUX LoRA training UI with LOW VRAM support
kijai/ComfyUI-FluxTrainer
bmaltais/kohya_ss: Releases
Akegarasu/lora-scripts: LoRA & Dreambooth training scripts & GUI use kohya-ss’s trainer, for diffusion model.
Nerogar/OneTrainer: OneTrainer is a one-stop solution for all your stable diffusion training needs.
Image Design
chengyou-jia/ChatGen 2024-12-02
erwold/qwen2vl-flux 2024-11-27
Yuanshi9815/OminiControl: A minimal and universal controller for FLUX.1. 2024-11-27
lllyasviel/sd-forge-layerdiffuse: [WIP] Layer Diffusion for WebUI (via Forge) 2024-11-27
ali-vilab/ACE: All-round Creator and Editor
mit-han-lab/hart: HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
ZhengPeng7/BiRefNet: [CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation
YangLing0818/IterComp: IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
xinsir6/ControlNetPlus: ControlNet++: All-in-one ControlNet for image generations and editing!
Kwai-Kolors/Kolors: Kolors Team
Xiaojiu-z/Stable-Hair: Stable-Hair: Real-World Hair Transfer via Diffusion Model
yisol/IDM-VTON: [ECCV2024] IDM-VTON : Improving Diffusion Models for Authentic Virtual Try-on in the Wild
bcmi/libcom: Image composition toolbox: everything you want to know about image composition or object insertion
PixArt-alpha/PixArt-alpha: PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
black-forest-labs/flux: Official inference repo for FLUX.1 models
Stability-AI/sd3.5
lllyasviel/Omost: Your image is almost there!
gligen/GLIGEN: Open-Set Grounded Text-to-Image Generation
Tencent/HunyuanDiT: Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
lllyasviel/IC-Light: More relighting!
tencent-ailab/IP-Adapter: The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with image prompt.
piddnad/DDColor: [ICCV 2023] Official implementation of “DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders”
cumulo-autumn/StreamDiffusion: StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
ToTheBeginning/PuLID: [NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
KDE/krita: Krita is a free and open source cross-platform application that offers an end-to-end solution for creating digital art files from scratch built on the KDE and Qt frameworks.
Acly/krita-ai-diffusion: Streamlined interface for generating images with AI in Krita. Inpaint and outpaint with optional text prompt, no tweaking required.
instantX-research/InstantID: InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
jbilcke-hf/FacePoke: Select a portrait, click to move the head around (please use your own space / GPU!)
catcathh/UltraPixel: Implementation of UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks
Zeyi-Lin/HivisionIDPhotos: ⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool.
VectorSpaceLab/OmniGen: OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
shallowdream204/DreamClear: [NeurIPS 2024🔥] DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
NVlabs/consistory
instantX-research/Regional-Prompting-FLUX: Training-free Regional Prompting for Diffusion Transformers 🔥
ali-vilab/In-Context-LoRA: Official repository of In-Context LoRA for Diffusion Transformers
mit-han-lab/nunchaku: SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
ChenyangSi/FreeU: FreeU: Free Lunch in Diffusion U-Net (CVPR2024 Oral)
magic-quill/MagicQuill: Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
Nutlope/logocreator: A free + OSS logo generator powered by Flux on Together AI
NVlabs/Sana: SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
JackAILab/ConsistentID: Customized ID Consistent for human
DepthAnything/Depth-Anything-V2: [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
tryonlabs/FLUX.1-dev-LoRA-Outfit-Generator: FLUX.1-dev LoRA Outfit Generator can create an outfit by detailing the color, pattern, fit, style, material, and type.
Speech, Music
netease-youdao/EmotiVoice: EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
haidog-yaqub/EzAudio: High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
2noise/ChatTTS: A generative speech model for daily dialogue.
BytedanceSpeech/seed-tts-eval
RVC-Project/Retrieval-based-Voice-Conversion-WebUI: Easily train a good VC model with voice data <= 10 mins!
yxlllc/DDSP-SVC: Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)
voicepaw/so-vits-svc-fork: so-vits-svc fork with realtime support, improved interface and more features.
RVC-Boss/GPT-SoVITS: 1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
SWivid/F5-TTS: Official code for “F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching”
misya11p/amt-apc: AMT-APC: Automatic Piano Cover by Fine-Tuning an Automatic Music Transcription Model
WEIFENG2333/AsrTools: ✨ AsrTools: Intelligent speech-to-text tool | Efficient batch processing | User-friendly interface | No GPU required | Supports SRT/TXT output | Turn your audio into accurate text in an instant!
open-mmlab/Amphion: Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
fishaudio/fish-speech: Brand new TTS solution
3D
VAST-AI-Research/TripoSR 2024-11-27
microsoft/MoGe: MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
HengyiWang/spann3r: 3D Reconstruction with Spatial Memory
Tencent/Hunyuan3D-1
wenqsun/DimensionX: DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
Text Processing
zyddnys/manga-image-translator: Translate manga/images: one-click translation of text in all kinds of images https://cotrans.touhou.ai/
chidiwilliams/buzz: Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI’s Whisper.
AgentEra/Agently-Daily-News-Collector: An open-source LLM based automatically daily news collecting workflow showcase powered by Agently AI application development framework.
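LC044/WeChatMsg: Extract WeChat chat history and export it to HTML, Word, or Excel documents for permanent storage; analyze chat history to generate an annual chat report; train a personal AI chat assistant on your chat data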
gabrielchua/open-notebooklm: Convert any PDF into a podcast episode!
getomni-ai/zerox: Zero shot pdf OCR with gpt-4o-mini
opendatalab/PDF-Extract-Kit: A Comprehensive Toolkit for High-Quality PDF Content Extraction
Nutlope/llama-ocr: Document to Markdown OCR library with Llama 3.2 vision
opendatalab/MinerU: A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source data extraction tool.
Other
showlab/ShowUI: Repository for ShowUI: One Vision-Language-Action Model for GUI Visual Agent 2024-12-02
turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs 2024-12-02
instructor-ai/instructor: structured outputs for llms 2024-12-02
Comprehensive Guide to Prompting Techniques - Instructor 2024-12-02
huggingface/transformers.js: State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server! 2024-12-02
Ucas-HaoranWei/GOT-OCR2.0: Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
deepseek-ai/DeepSeek-VL: DeepSeek-VL: Towards Real-World Vision-Language Understanding
dynobo/normcap: OCR powered screen-capture tool to capture information instead of images
modelscope/DiffSynth-Studio: Enjoy the magic of Diffusion models!
abi/screenshot-to-code: Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
stackblitz/bolt.new: Prompt, run, edit, and deploy full-stack web applications
lean-dojo/LeanCopilot: LLMs as Copilots for Theorem Proving in Lean
geekan/MetaGPT: 🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
princeton-nlp/SWE-agent: [NeurIPS 2024] SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges.
OpenCodeInterpreter/OpenCodeInterpreter: OpenCodeInterpreter is a suite of open-source code generation systems aimed at bridging the gap between large language models and sophisticated proprietary systems like the GPT-4 Code Interpreter. It significantly enhances code generation capabilities by integrating execution and iterative refinement functionalities.
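Ikaros-521/AI-Vtuber: AI Vtuber is a virtual streamer [Live2D/UE/xuniren] driven by [ChatterBot/ChatGPT/claude/langchain/chatglm/text-gen-webui/Wenda/Qianwen/kimi/ollama] that can interact with viewers in real time during live streams on [Bilibili/Douyin/Kuaishou/WeChat Channels/Pinduoduo/Douyu/YouTube/twitch/TikTok], or chat directly on the local machine. It uses TTS [edge-tts/VITS/elevenlabs/bark/bert-vits2/睿声] to generate spoken replies, can optionally apply voice conversion with [so-vits-svc/DDSP-SVC], and can work with Stable Diffusion to draw images on command.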
3b1b/manim: Animation engine for explanatory math videos
ManimCommunity/manim: A community-maintained Python framework for creating mathematical animations.
KindXiaoming/pykan: Kolmogorov Arnold Networks
PeterH0323/Streamer-Sales: Streamer-Sales (销冠, "top seller"): a sales-streamer LLM that presents a product based on its given features and stimulates viewers' desire to buy
FujiwaraChoki/MoneyPrinter: Automate Creation of YouTube Shorts using MoviePy.
princeton-nlp/SWE-agent: SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4. It solves 12.29% of bugs in the SWE-bench evaluation set (comparable to Devin) and takes just 1.5 minutes to run (7x faster than Devin).
harry0703/MoneyPrinterTurbo: Generate high-definition short videos with one click using AI LLMs.
idootop/mi-gpt: 🏠 Connect the XiaoAI speaker to ChatGPT and Doubao, turning it into your own personal voice assistant.
wan-h/awesome-digital-human-live2d: Awesome Digital Human
openai/swarm: Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
meta-llama/llama-recipes: Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Meta Llama for WhatsApp & Messenger.
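HqWu-HITCS/Awesome-Chinese-LLM: A curated collection of open-source Chinese LLMs, focusing on smaller models that can be privately deployed and are cheaper to train; covers base models, domain-specific fine-tuning and applications, datasets, tutorials, and more.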
Hannibal046/Awesome-LLM: Awesome-LLM: a curated list of Large Language Model
excalidraw/excalidraw: Virtual whiteboard for sketching hand-drawn like diagrams
meltylabs/melty: Chat first code editor.
gpt-omni/mini-omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.