yuanzhi-code/extractor

Extractor

An intelligent content aggregation and analysis platform that automatically collects, processes, and generates insights from multiple sources.

English | 简体中文

Features

🔄 Content Aggregation

  • Multi-source data collection (Twitter, WeChat, Podcasts, Videos)
  • Automatic content extraction and cleaning
  • Duplicate content detection
  • Support for RSS feeds

🤖 AI Processing

  • Intelligent content tagging and categorization
  • Content quality scoring
  • Automated summarization

📊 Content Generation

  • Daily report generation
  • In-depth research report creation
  • Podcast script generation
  • Text-to-Speech conversion

Getting Started

Prerequisites

  • uv installed

Installation

  1. Clone the repository
git clone https://github.com/yuanzhi-code/extractor.git
cd extractor
  2. Create a virtual environment and install the project dependencies
uv venv && uv sync --group dev
  3. Set up the database
uv run alembic upgrade head
  4. Configure your sources
cp data/rss_sources.json.example data/rss_sources.json
# Edit rss_sources.json with your sources

Usage

TODO

Configuration

RSS Sources

Add your RSS sources in data/rss_sources.json:

{
    "sources": [
        {
            "name": "Example Tech Blog",
            "url": "https://example.com/feed",
            "description": "Tech news and updates"
        }
    ]
}
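
Before running the project, it can help to sanity-check the sources file. The following is a minimal sketch, not the project's own loader: the function names `parse_sources` and `load_sources` are hypothetical, and treating `name` and `url` as required (with `description` optional) is an assumption based only on the example above.

```python
import json
from pathlib import Path

# Assumption: "name" and "url" are required, "description" is optional.
REQUIRED_KEYS = ("name", "url")


def parse_sources(text: str) -> list[dict]:
    """Parse the JSON text of a sources file and check each entry."""
    data = json.loads(text)
    sources = data.get("sources", [])
    for index, source in enumerate(sources):
        missing = [key for key in REQUIRED_KEYS if not source.get(key)]
        if missing:
            raise ValueError(f"source #{index} is missing: {', '.join(missing)}")
    return sources


def load_sources(path: str = "data/rss_sources.json") -> list[dict]:
    """Read the sources file from disk and validate it."""
    return parse_sources(Path(path).read_text(encoding="utf-8"))
```

Calling `load_sources()` from the repository root would read `data/rss_sources.json` and raise a `ValueError` that names the first incomplete entry.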

Environment Variables

Refer to .env.example and create a .env file.

For the project to run properly, set MODEL_PROVIDER and the related variables shown in .env.example. For example, if you choose deepseek as the model provider:

MODEL_PROVIDER="deepseek"
DEEPSEEK_API_KEY="sk-xxxxxxx"
DEEPSEEK_MODEL="deepseek-chat"
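
To check that these variables are visible to a Python process, something like the sketch below could be used. It is not the project's actual configuration code: `read_model_config` is a hypothetical helper, and the `<PROVIDER>_API_KEY` / `<PROVIDER>_MODEL` naming pattern is an assumption generalized from the deepseek example; .env.example remains the authoritative list of names. It also assumes the .env values have already been loaded into the environment.

```python
import os


def read_model_config(env=None) -> dict:
    """Collect the model settings for the provider named in MODEL_PROVIDER."""
    env = os.environ if env is None else env
    provider = env.get("MODEL_PROVIDER", "").strip()
    if not provider:
        raise RuntimeError("MODEL_PROVIDER is not set")

    # Assumption: variables are named <PROVIDER>_API_KEY and <PROVIDER>_MODEL.
    prefix = provider.upper()
    api_key = env.get(f"{prefix}_API_KEY", "")
    if not api_key:
        raise RuntimeError(f"{prefix}_API_KEY is not set")

    return {
        "provider": provider,
        "api_key": api_key,
        "model": env.get(f"{prefix}_MODEL"),
    }
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.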

Contributing

  1. Fork the repository
  2. Create your feature branch
  3. Commit your changes
  4. Push to the branch
  5. Open a Pull Request