GitHub Project Pages

Video

prs-eth/RollingDepth: Video Depth without Video Models 2024-12-03

Tencent/HunyuanVideo 2024-12-03

hmrishavbandy/FlipSketch: FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations 2024-12-02

KwaiVGI/LivePortrait: Bring portraits to life! 2024-12-02

C0untFloyd/roop-unleashed: Evolved Fork of roop with Web Server and lots of additions 2024-12-02

jdh-algo/JoyVASA 2024-12-02

PKU-YuanGroup/ConsisID: Identity-Preserving Text-to-Video Generation by Frequency Decomposition 2024-12-02

rhymes-ai/Allegro: Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input. 2024-12-02

k4yt3x/video2x: A machine learning-based lossless video super resolution framework. Est. Hack the Valley II, 2018. 2024-11-27

facefusion/facefusion: Industry leading face manipulation platform 2024-11-27

yangchris11/samurai: Official repository of “SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory”

alibaba/Tora: The official repository for paper “Tora: Trajectory-oriented Diffusion Transformer for Video Generation”

aigc-apps/CogVideoX-Fun: 📹 A more flexible CogVideoX that can generate videos at any resolution and creates videos from images.

aigc-apps/EasyAnimate: 📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion

HVision-NKU/StoryDiffusion: Create Magic Story!

hpcaitech/Open-Sora: Open-Sora: Democratizing Efficient Video Production for All

Vision-CAIR/MiniGPT4-video

hkchengrex/Cutie: [CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation

Picsart-AI-Research/StreamingT2V: StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

Tencent/MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

jianchang512/pyvideotrans: Translate a video from one language to another and add dubbing. Also supports API calls.

Hillobar/Rope: GUI-focused roop

sczhou/CodeFormer: [NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer

Huanshere/VideoLingo: Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team

jy0205/Pyramid-Flow: Code of Pyramidal Flow Matching for Efficient Video Generative Modeling

Vision-CAIR/LongVU

Doubiiu/ToonCrafter: [SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation

VectorSpaceLab/Video-XL: 🔥🔥First-ever hour scale video understanding models

anliyuan/Ultralight-Digital-Human: An ultra-lightweight digital human model that can run in real time on mobile devices

antgroup/echomimic_v2: EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Zejun-Yang/AniPortrait: AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

fudan-generative-vision/hallo2: Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation

antgroup/echomimic: EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

LordLiang/DrawingSpinUp: (SIGGRAPH Asia 2024) This is the official PyTorch implementation of SIGGRAPH Asia 2024 paper: DrawingSpinUp: 3D Animation from Single Character Drawings

HelloVision/HelloMeme: The official HelloMeme GitHub site

Kmcode1/SG-I2V: This is the official implementation of SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation.

facebookresearch/sapiens: High-resolution models for human tasks.

AlonzoLeeeooo/StableV2V: The official implementation of the paper titled “StableV2V: Stablizing Shape Consistency in Video-to-Video Editing”.

genmoai/mochi: The best OSS video generation models

THUDM/CogVideo: text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

CyberAgentAILab/TANGO: Official implementation of the paper “TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation”

IDEA-Research/MotionCLR: [Arxiv 2024] MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms

Ji4chenLi/t2v-turbo: Code repository for T2V-Turbo and T2V-Turbo-v2

Lightricks/LTX-Video: Official repository for LTX-Video

WebUI

open-webui/open-webui: User-friendly AI Interface (Supports Ollama, OpenAI API, …)

continue-revolution/sd-webui-segment-anything: Segment Anything for Stable Diffusion WebUI

lllyasviel/stable-diffusion-webui-forge

aigc-apps/sd-webui-EasyPhoto: 📷 EasyPhoto | Your Smart AI Photo Generator.

LLM

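hiroi-sora/Umi-OCR: OCR software, free, open-source, and offline. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and QR code scanning/generation. Built-in multi-language libraries. 2024-12-03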

Significant-Gravitas/AutoGPT: AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters. 2024-12-03

OpenBMB/ChatDev: Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration) 2024-12-03

THUDM/GLM-4-Voice: GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

oobabooga/text-generation-webui: A Gradio web UI for Large Language Models.

janhq/jan: Jan is an open source alternative to ChatGPT that runs 100% offline on your computer. Multiple engine support (llama.cpp, TensorRT-LLM)

ollama/ollama: Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.

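binary-husky/gpt_academic: A practical interactive interface for GPT/GLM and other large language models, specially optimized for paper reading, polishing, and writing. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-translation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel queries to multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, moss, and others.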

SillyTavern/SillyTavern: LLM Frontend for Power Users.

mendableai/firecrawl: 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

InternLM/InternLM: Official release of InternLM2.5 base and chat models. 1M context support

Training Scripts

hiyouga/LLaMA-Factory: Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024) 2024-12-02

kohya-ss/sd-scripts

cocktailpeanut/fluxgym: Dead simple FLUX LoRA training UI with LOW VRAM support

kijai/ComfyUI-FluxTrainer

bmaltais/kohya_ss (Releases)

Akegarasu/lora-scripts: LoRA & Dreambooth training scripts & GUI using kohya-ss’s trainer, for diffusion models.

Nerogar/OneTrainer: OneTrainer is a one-stop solution for all your stable diffusion training needs.

Image Design

chengyou-jia/ChatGen 2024-12-02

erwold/qwen2vl-flux 2024-11-27

Yuanshi9815/OminiControl: A minimal and universal controller for FLUX.1. 2024-11-27

lllyasviel/sd-forge-layerdiffuse: [WIP] Layer Diffusion for WebUI (via Forge) 2024-11-27

ali-vilab/ACE: All-round Creator and Editor

mit-han-lab/hart: HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

ZhengPeng7/BiRefNet: [CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation

YangLing0818/IterComp: IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

xinsir6/ControlNetPlus: ControlNet++: All-in-one ControlNet for image generations and editing!

Kwai-Kolors/Kolors: Kolors Team

Xiaojiu-z/Stable-Hair: Stable-Hair: Real-World Hair Transfer via Diffusion Model

yisol/IDM-VTON: [ECCV2024] IDM-VTON : Improving Diffusion Models for Authentic Virtual Try-on in the Wild

bcmi/libcom: Image composition toolbox: everything you want to know about image composition or object insertion

PixArt-alpha/PixArt-alpha: PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

black-forest-labs/flux: Official inference repo for FLUX.1 models

Stability-AI/sd3.5

lllyasviel/Omost: Your image is almost there!

gligen/GLIGEN: Open-Set Grounded Text-to-Image Generation

Tencent/HunyuanDiT: Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

lllyasviel/IC-Light: More relighting!

tencent-ailab/IP-Adapter: The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with image prompt.

piddnad/DDColor: [ICCV 2023] Official implementation of “DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders”

cumulo-autumn/StreamDiffusion: StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation

ToTheBeginning/PuLID: [NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment

KDE/krita: Krita is a free and open source cross-platform application that offers an end-to-end solution for creating digital art files from scratch built on the KDE and Qt frameworks.

Acly/krita-ai-diffusion: Streamlined interface for generating images with AI in Krita. Inpaint and outpaint with optional text prompt, no tweaking required.

instantX-research/InstantID: InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

jbilcke-hf/FacePoke: Select a portrait, click to move the head around (please use your own space / GPU!)

catcathh/UltraPixel: Implementation of UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks

Zeyi-Lin/HivisionIDPhotos: ⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool.

VectorSpaceLab/OmniGen: OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340

shallowdream204/DreamClear: [NeurIPS 2024🔥] DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation

NVlabs/consistory

instantX-research/Regional-Prompting-FLUX: Training-free Regional Prompting for Diffusion Transformers 🔥

ali-vilab/In-Context-LoRA: Official repository of In-Context LoRA for Diffusion Transformers

mit-han-lab/nunchaku: SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

ChenyangSi/FreeU: FreeU: Free Lunch in Diffusion U-Net (CVPR2024 Oral)

magic-quill/MagicQuill: Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System

Nutlope/logocreator: A free + OSS logo generator powered by Flux on Together AI

NVlabs/Sana: SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

JackAILab/ConsistentID: Customized ID Consistent for human

DepthAnything/Depth-Anything-V2: [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

tryonlabs/FLUX.1-dev-LoRA-Outfit-Generator: FLUX.1-dev LoRA Outfit Generator can create an outfit by detailing the color, pattern, fit, style, material, and type.

Speech, Music

netease-youdao/EmotiVoice: EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

haidog-yaqub/EzAudio: High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

2noise/ChatTTS: A generative speech model for daily dialogue.

BytedanceSpeech/seed-tts-eval

RVC-Project/Retrieval-based-Voice-Conversion-WebUI: Easily train a good VC model with voice data <= 10 mins!

yxlllc/DDSP-SVC: Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)

voicepaw/so-vits-svc-fork: so-vits-svc fork with realtime support, improved interface and more features.

RVC-Boss/GPT-SoVITS: 1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

SWivid/F5-TTS: Official code for “F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching”

misya11p/amt-apc: AMT-APC: Automatic Piano Cover by Fine-Tuning an Automatic Music Transcription Model

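WEIFENG2333/AsrTools: ✨ AsrTools: intelligent speech-to-text tool | efficient batch processing | user-friendly interface | no GPU required | SRT/TXT output | turns your audio into accurate text in an instant!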

open-mmlab/Amphion: Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

fishaudio/fish-speech: Brand new TTS solution

3D

VAST-AI-Research/TripoSR 2024-11-27

microsoft/MoGe: MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision

HengyiWang/spann3r: 3D Reconstruction with Spatial Memory

Tencent/Hunyuan3D-1

wenqsun/DimensionX: DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion

Text Processing

zyddnys/manga-image-translator: Translate manga/images: one-click translation of text in all kinds of images. https://cotrans.touhou.ai/

chidiwilliams/buzz: Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI’s Whisper.

AgentEra/Agently-Daily-News-Collector: An open-source LLM based automatically daily news collecting workflow showcase powered by Agently AI application development framework.

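LC044/WeChatMsg: Extracts WeChat chat history and exports it to HTML, Word, or Excel documents for permanent storage; analyzes the chat history to generate an annual chat report; and uses chat data to train a personal AI chat assistant.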

gabrielchua/open-notebooklm: Convert any PDF into a podcast episode!

getomni-ai/zerox: Zero shot pdf OCR with gpt-4o-mini

opendatalab/PDF-Extract-Kit: A Comprehensive Toolkit for High-Quality PDF Content Extraction

Nutlope/llama-ocr: Document to Markdown OCR library with Llama 3.2 vision

opendatalab/MinerU: A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool.

Other

showlab/ShowUI: Repository for ShowUI: One Vision-Language-Action Model for GUI Visual Agent 2024-12-02

turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs 2024-12-02

instructor-ai/instructor: structured outputs for llms 2024-12-02

Comprehensive Guide to Prompting Techniques - Instructor 2024-12-02

huggingface/transformers.js: State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server! 2024-12-02

Ucas-HaoranWei/GOT-OCR2.0: Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

deepseek-ai/DeepSeek-VL: DeepSeek-VL: Towards Real-World Vision-Language Understanding

dynobo/normcap: OCR powered screen-capture tool to capture information instead of images

modelscope/DiffSynth-Studio: Enjoy the magic of Diffusion models!

abi/screenshot-to-code: Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)

stackblitz/bolt.new: Prompt, run, edit, and deploy full-stack web applications

lean-dojo/LeanCopilot: LLMs as Copilots for Theorem Proving in Lean

geekan/MetaGPT: 🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

princeton-nlp/SWE-agent: [NeurIPS 2024] SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges.

OpenCodeInterpreter/OpenCodeInterpreter: OpenCodeInterpreter is a suite of open-source code generation systems aimed at bridging the gap between large language models and sophisticated proprietary systems like the GPT-4 Code Interpreter. It significantly enhances code generation capabilities by integrating execution and iterative refinement functionalities.

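Ikaros-521/AI-Vtuber: AI Vtuber is a virtual streamer [Live2D/UE/xuniren] driven by [ChatterBot/ChatGPT/claude/langchain/chatglm/text-gen-webui/Wenda/Qianwen/kimi/ollama] that can interact with viewers in real time during live streams on [Bilibili/Douyin/Kuaishou/WeChat Channels/Pinduoduo/Douyu/YouTube/twitch/TikTok], or chat directly on the local machine. It uses TTS [edge-tts/VITS/elevenlabs/bark/bert-vits2/Reecho (睿声)] to generate spoken replies, with optional voice conversion via [so-vits-svc/DDSP-SVC], and can drive SD image generation through commands.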

3b1b/manim: Animation engine for explanatory math videos

ManimCommunity/manim: A community-maintained Python framework for creating mathematical animations.

KindXiaoming/pykan: Kolmogorov Arnold Networks

PeterH0323/Streamer-Sales: Streamer-Sales (销冠, "top seller"): a sales-livestreamer LLM that presents a product based on its given features and stimulates viewers' desire to buy

FujiwaraChoki/MoneyPrinter: Automate Creation of YouTube Shorts using MoviePy.

princeton-nlp/SWE-agent: SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4. It solves 12.29% of bugs in the SWE-bench evaluation set (comparable to Devin) and takes just 1.5 minutes to run (7x faster than Devin).

harry0703/MoneyPrinterTurbo: Generate HD short videos with one click using AI LLMs.

idootop/mi-gpt: 🏠 Connect the XiaoAi speaker to ChatGPT and Doubao, turning it into your own personal voice assistant.

wan-h/awesome-digital-human-live2d: Awesome Digital Human

openai/swarm: Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.

meta-llama/llama-recipes: Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment. Demo apps to showcase Meta Llama for WhatsApp & Messenger.

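HqWu-HITCS/Awesome-Chinese-LLM: A collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are inexpensive to train, including base models, domain-specific fine-tunes and applications, datasets, and tutorials.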

Hannibal046/Awesome-LLM: Awesome-LLM: a curated list of Large Language Model

excalidraw/excalidraw: Virtual whiteboard for sketching hand-drawn like diagrams

meltylabs/melty: Chat first code editor.

gpt-omni/mini-omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.