Extractor

An intelligent content aggregation and analysis platform that automatically collects, processes, and generates insights from multiple sources.

English | 简体中文

Features

🔄 Content Aggregation

Multi-source data collection (Twitter, WeChat, Podcasts, Videos)
Automatic content extraction and cleaning
Duplicate content detection
Support for RSS feeds
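The collection features above (extraction, cleaning, duplicate detection, RSS) can be illustrated with a small standard-library sketch. This is not Extractor's actual implementation. The names `Entry`, `clean_text`, `parse_rss`, `fingerprint` and `deduplicate` are hypothetical, and the sketch makes several assumptions: it handles only RSS 2.0 `<item>` elements (not Atom), and it treats two items as duplicates only when their normalized title and text hash to the same SHA-256 value. Near-duplicate detection (for example SimHash or MinHash) would be a different technique.

```python
import hashlib
import html
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

TAG_RE = re.compile(r"<[^>]+>")  # crude HTML tag matcher, fine for feed snippets
WS_RE = re.compile(r"\s+")


@dataclass
class Entry:  # hypothetical record for one feed item
    title: str
    link: str
    text: str


def clean_text(raw: str) -> str:
    """Drop HTML tags, decode HTML entities, and collapse whitespace."""
    without_tags = TAG_RE.sub(" ", raw)
    return WS_RE.sub(" ", html.unescape(without_tags)).strip()


def parse_rss(xml_text: str) -> list[Entry]:
    """Turn every RSS 2.0 <item> into a cleaned Entry."""
    root = ET.fromstring(xml_text)
    return [
        Entry(
            title=clean_text(item.findtext("title", default="")),
            link=item.findtext("link", default="").strip(),
            text=clean_text(item.findtext("description", default="")),
        )
        for item in root.iter("item")
    ]


def fingerprint(entry: Entry) -> str:
    """SHA-256 of the lower-cased title and body; identical text gives identical hashes."""
    normalized = f"{entry.title}\n{entry.text}".lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(entries: list[Entry]) -> list[Entry]:
    """Keep the first entry for each fingerprint, preserving input order."""
    seen: set[str] = set()
    unique: list[Entry] = []
    for entry in entries:
        key = fingerprint(entry)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


if __name__ == "__main__":
    sample = """<rss version="2.0"><channel>
    <item><title>Hello</title><link>https://example.com/a</link>
    <description>&lt;p&gt;First &amp;amp; post&lt;/p&gt;</description></item>
    <item><title>Hello</title><link>https://example.com/b</link>
    <description>&lt;p&gt;First &amp;amp; post&lt;/p&gt;</description></item>
    </channel></rss>"""
    for entry in deduplicate(parse_rss(sample)):
        print(entry.link, entry.text)
```

Hashing a normalized string keeps the duplicate check to one set lookup per item. The trade-off is that any wording change, however small, produces a different fingerprint.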

🤖 AI Processing

Intelligent content tagging and categorization
Content quality scoring
Automated summarization
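LLM-based tagging, scoring and summarization usually ask the model for a structured reply that the application then validates. The sketch below shows that validation step only. The JSON shape (`tags`, `score`, `summary`), the 0 to 10 score range, and the names `Analysis` and `parse_analysis` are all assumptions made for illustration. They are not Extractor's actual prompt or output format.

```python
import json
from dataclasses import dataclass


@dataclass
class Analysis:  # hypothetical result of one AI processing pass
    tags: list[str]
    score: float
    summary: str


def parse_analysis(raw: str, max_score: float = 10.0) -> Analysis:
    """Validate a model reply shaped like {"tags": [...], "score": n, "summary": "..."}."""
    data = json.loads(raw)

    tags = data.get("tags", [])
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise ValueError("'tags' must be a list of strings")
    # Normalize tags: trim, lower-case, drop empties and duplicates, keep order.
    normalized: list[str] = []
    for tag in tags:
        tag = tag.strip().lower()
        if tag and tag not in normalized:
            normalized.append(tag)

    # A missing or non-numeric score raises KeyError/ValueError/TypeError here.
    score = float(data["score"])
    if not 0.0 <= score <= max_score:
        raise ValueError(f"'score' must be between 0 and {max_score}")

    summary = str(data.get("summary", "")).strip()
    return Analysis(tags=normalized, score=score, summary=summary)
```

Rejecting malformed replies early keeps bad tags or out-of-range scores from reaching the generated reports.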

📊 Content Generation

Daily report generation
In-depth research report creation
Podcast script generation
Text-to-Speech conversion
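As an illustration of daily report generation, the sketch below selects one day's items, ranks them by quality score, and renders a plain-text digest. It assumes each item already carries a publication date, a score and a summary from the AI processing step. The `ScoredItem` record, `build_daily_report`, and the report layout are hypothetical and not taken from Extractor.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ScoredItem:  # hypothetical: an item after tagging, scoring and summarization
    title: str
    link: str
    published: date
    score: float
    summary: str


def build_daily_report(items: list[ScoredItem], day: date, top_n: int = 5) -> str:
    """Render the top_n highest-scoring items published on `day` as plain text."""
    todays = sorted(
        (item for item in items if item.published == day),
        key=lambda item: item.score,
        reverse=True,
    )
    lines = [f"Daily Report: {day.isoformat()}", ""]
    if not todays:
        lines.append("No new items.")
    for rank, item in enumerate(todays[:top_n], start=1):
        lines.append(f"{rank}. {item.title} (score {item.score:g})")
        lines.append(f"   {item.summary}")
        lines.append(f"   {item.link}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        ScoredItem("A", "https://example.com/a", date(2024, 1, 2), 6.0, "Summary A"),
        ScoredItem("B", "https://example.com/b", date(2024, 1, 2), 9.0, "Summary B"),
    ]
    print(build_daily_report(demo, date(2024, 1, 2)))
```

The same ranked list could also feed the podcast script and text-to-speech steps.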

Getting Started

Prerequisites

uv installed

Installation

Clone the repository

git clone https://github.com/yuanzhi-code/extractor.git
cd extractor

Create a virtual environment and install the project dependencies (including the dev group)

uv venv && uv sync --group dev

Set up the database by applying the Alembic migrations

uv run alembic upgrade head

Configure your sources

cp data/rss_sources.json.example data/rss_sources.json
# Edit rss_sources.json with your sources

Usage

TODO

Configuration

RSS Sources

Add your RSS sources to data/rss_sources.json:

{
    "sources": [
        {
            "name": "Example Tech Blog",
            "url": "https://example.com/feed",
            "description": "Tech news and updates"
        }
    ]
}
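Before collecting, it can help to check that the sources file is well formed. The sketch below reads the format shown above and rejects entries without a name or without an http(s) URL. The validation rules and the names `RssSource`, `parse_sources` and `load_sources` are assumptions for illustration; Extractor's own loader may accept or require different fields.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from urllib.parse import urlparse


@dataclass(frozen=True)
class RssSource:  # hypothetical mirror of one entry in data/rss_sources.json
    name: str
    url: str
    description: str = ""


def parse_sources(data: dict) -> list[RssSource]:
    """Check each entry under "sources" and convert it to an RssSource."""
    sources: list[RssSource] = []
    for i, raw in enumerate(data.get("sources", [])):
        name = str(raw.get("name", "")).strip()
        url = str(raw.get("url", "")).strip()
        if not name:
            raise ValueError(f"sources[{i}]: 'name' is required")
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"sources[{i}]: 'url' must be an http(s) URL, got {url!r}")
        sources.append(RssSource(name, url, str(raw.get("description", ""))))
    return sources


def load_sources(path: Path = Path("data/rss_sources.json")) -> list[RssSource]:
    """Read the JSON file and validate it."""
    return parse_sources(json.loads(path.read_text(encoding="utf-8")))
```

Run from the repository root after copying the example file, load_sources() should return one RssSource per entry, or raise ValueError pointing at the first bad entry.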

Environment Variables

Create a .env file based on .env.example.

To run the project properly, set MODEL_PROVIDER and the related variables listed in .env.example. For example, to use DeepSeek as the model provider:

MODEL_PROVIDER="deepseek"
DEEPSEEK_API_KEY="sk-xxxxxxx"
DEEPSEEK_MODEL="deepseek-chat"
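A quick way to catch configuration mistakes is to check which required variables are missing for the selected provider. In the sketch below, only the DeepSeek entry in the hypothetical REQUIRED_VARS table comes from this README; other providers would need to be added from .env.example. It reads the process environment only and assumes the .env file has already been loaded by whatever mechanism the project uses. The function name `missing_model_settings` is also hypothetical.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Hypothetical provider -> required variables table. Only the DeepSeek
# entry comes from this README; check .env.example for other providers.
REQUIRED_VARS: dict[str, tuple[str, ...]] = {
    "deepseek": ("DEEPSEEK_API_KEY", "DEEPSEEK_MODEL"),
}


def missing_model_settings(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of required model variables that are unset or empty."""
    env = os.environ if env is None else env
    provider = env.get("MODEL_PROVIDER", "").strip().lower()
    if not provider:
        return ["MODEL_PROVIDER"]
    required = REQUIRED_VARS.get(provider)
    if required is None:
        raise ValueError(f"unknown MODEL_PROVIDER {provider!r}; see .env.example")
    return [name for name in required if not env.get(name)]
```

An empty list means the DeepSeek example above is complete; otherwise the returned names are the variables still to set.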

Contributing

Fork the repository
Create your feature branch
Commit your changes
Push to the branch
Open a Pull Request
