Advanced Web Crawling & Multi-Source Trending Analysis Platform
Intelligent batch processing • Real-time analytics • Professional-grade backlink analysis • AI-powered content intelligence
From seed URLs to intelligent insights - follow the complete journey of data through our advanced crawling and analysis pipeline
Initialize crawling process • Discover link relationships • Extract intelligent insights • Persist & organize results

1. Load Seed URLs - Initialize the crawl from the seed_urls.json file (a minimal example follows this list)
2. Fetch & Parse - Extract content and discover new links
3. Backlink Analysis - Build a comprehensive link network
4. Turso Database - Store crawled data in distributed SQLite
5. Trending Analysis - Extract and index trending topics
6. Real-time Monitoring - Live dashboard and API access
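The exact schema of seed_urls.json isn't documented on this page; a minimal sketch, assuming a flat JSON array of starting URLs, looks like:

[
  "https://example.com",
  "https://example.org/blog"
]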
Get RatCrawler running in minutes with these simple steps
git clone https://github.com/swadhinbiswas/ratcrowler.git
cd ratcrowler
pip install -r requirements.txt
export DASHBOARD_PASSWORD=swadhin
python main.py
Starts intelligent batch processing with automatic resume (see the checkpoint sketch below)
python run_dashboard.py
Access dashboard at http://localhost:8501
python run_log_api.py
Real-time logs at http://localhost:8502
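Auto-resume implies the crawler checkpoints its progress between runs. The real persistence logic isn't shown on this page; a minimal sketch of the idea, using a hypothetical crawl_progress.json checkpoint file, could look like:

import json
import os

PROGRESS_FILE = "crawl_progress.json"  # hypothetical checkpoint file
BATCH_SIZE = 50

def load_offset():
    # Resume from the last saved offset, or start from the beginning
    if os.path.exists(PROGRESS_FILE):
        with open(PROGRESS_FILE) as f:
            return json.load(f)["offset"]
    return 0

def save_offset(offset):
    with open(PROGRESS_FILE, "w") as f:
        json.dump({"offset": offset}, f)

def process_batch(batch):
    for url in batch:
        print("crawling", url)  # placeholder for the real fetch/parse step

def run(urls):
    for start in range(load_offset(), len(urls), BATCH_SIZE):
        process_batch(urls[start:start + BATCH_SIZE])
        save_offset(start + BATCH_SIZE)  # checkpoint after every batch

run([f"https://example.com/page/{i}" for i in range(120)])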
Smart Batch Processing - Automatically crawl and analyze web content with intelligent batch processing
Multi-source Analytics - Monitor Google Trends and social media for real-time insights
Advanced Algorithms - Professional PageRank calculation and spam detection
🎉 Everything is configured and ready to go!
Your system will automatically resume from where it left off
RatCrawler is a sophisticated multi-source trending analysis platform that combines intelligent web crawling, Google Trends analysis, Twitter/X trends monitoring, and professional-grade backlink analysis.
Automatically processes URLs with optimized batch sizes, progress persistence and auto-resume capability.
Google Trends + Twitter/X + Web crawling integration for comprehensive trend analysis.
PageRank calculation, domain authority assessment, and advanced spam detection algorithms.
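RatCrawler's own PageRank implementation isn't reproduced on this page; as a reference point, the textbook power iteration it refers to can be sketched in plain Python:

def pagerank(graph, damping=0.85, iterations=50):
    # graph maps each page to the list of pages it links to
    nodes = list(graph)
    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for page, out in graph.items():
            targets = out or nodes  # dangling pages spread rank evenly
            for t in targets:
                new[t] += damping * rank[page] / len(targets)
        rank = new
    return rank

print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))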
Modular and scalable system design:
- Crawler engine: handles web page discovery, content extraction, and link following
- Backlink analyzer: analyzes link relationships, calculates PageRank, and detects spam
- Trends aggregator: aggregates data from multiple sources for trending analysis
- Database layer: manages data persistence, indexing, and query optimization
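As a hedged illustration of how these components could be wired together (the class names below are toys, not RatCrawler's real API):

class Crawler:
    def fetch(self, url):
        return {"url": url, "links": []}  # stand-in for real HTTP fetch + parsing

class BacklinkAnalyzer:
    def pagerank(self, pages):
        return {p["url"]: 1 / len(pages) for p in pages}  # stand-in scores

class Storage:
    def save(self, pages, scores):
        print(f"stored {len(pages)} pages with {len(scores)} scores")

crawler, analyzer, storage = Crawler(), BacklinkAnalyzer(), Storage()
pages = [crawler.fetch(u) for u in ["https://example.com", "https://example.org"]]
storage.save(pages, analyzer.pagerank(pages))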
Intelligent URL batch processing from backlinks database with automatic progress tracking and resume capability.
Advanced async HTTP client with comprehensive content extraction, robots.txt respect, and error handling (a minimal sketch follows this list).
Real-time data from Google Trends and Twitter/X with cross-platform analytics and trend correlation.
Multi-database support with automatic schema migration, connection pooling, and cloud backup.
Real-time dashboard and API monitoring with authentication, performance metrics, and health checks.
Advanced spam detection, PageRank calculation, domain authority assessment, and network analysis.
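The async client described above isn't reproduced here; a minimal sketch, assuming aiohttp for fetching and the standard-library robots.txt parser, could look like:

import asyncio
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

import aiohttp

USER_AGENT = "RatCrawler/2.0"

async def fetch(session, url):
    # Respect robots.txt (blocking read here; a real crawler caches per domain)
    parts = urlsplit(url)
    robots = RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    robots.read()
    if not robots.can_fetch(USER_AGENT, url):
        return None
    try:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
            return await resp.text()
    except aiohttp.ClientError:
        return None  # skip unreachable pages instead of crashing the batch

async def main(urls):
    async with aiohttp.ClientSession(headers={"User-Agent": USER_AGENT}) as session:
        return await asyncio.gather(*(fetch(session, u) for u in urls))

for page in asyncio.run(main(["https://example.com"])):
    print(len(page) if page else "blocked or failed")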
Comprehensive web crawling and analysis capabilities
Complete installation guide for RatCrawler multi-source analysis platform
git clone https://github.com/swadhinbiswas/ratcrowler.git
cd ratcrowler
pip install -r requirements.txt
Installs: SQLAlchemy, Streamlit, FastAPI, aiohttp, BeautifulSoup4, and more
pip install -r requirements_turso.txt
For cloud database integration and scaling
export DASHBOARD_PASSWORD=swadhin
export TURSO_DATABASE_URL=your_url # Optional
export TURSO_AUTH_TOKEN=your_token # Optional
python main.py # Start batch crawler
python run_dashboard.py # Launch dashboard
docker build -t ratcrawler .
docker run -p 8501:8501 -p 8502:8502 ratcrawler
cd rust_version
cargo build --release
./target/release/rat-crawler
For maximum performance and memory efficiency
python -m venv venv
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows
pip install -e .
seed_urls.json - Initial URL list
trends.json - Trending data cache
website_crawler.db - Local database
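website_crawler.db is a plain SQLite file, so it can be inspected directly; this snippet just lists whatever tables the crawler has created (no table names are assumed):

import sqlite3

conn = sqlite3.connect("website_crawler.db")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
print([t[0] for t in tables])
conn.close()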
python -c "from rat.database import *; print('✓ Database OK')"
python test_enhanced_crawler.py
curl http://localhost:8501
If installation fails with permission errors, use sudo pip install or, preferably, a virtual environment.
Complete guide to using RatCrawler's powerful features
python main.py
Processes 50 URLs at a time with automatic progress saving
python run_dashboard.py
Real-time dashboard at http://localhost:8501
python run_log_api.py
Detailed logging API at http://localhost:8502
cd engine && python googletrends.py --limit 10
Analyze top trending topics with custom limits
cd engine && python xtrends.py
Social media trending analysis and correlation
python test_backlink_storage.py
PageRank calculation and spam detection
# main.py configuration
BATCH_SIZE = 50 # URLs per batch
DELAY_BETWEEN_REQUESTS = 1 # seconds
MAX_CONCURRENT_REQUESTS = 10
RESPECT_ROBOTS_TXT = True
ENABLE_DETAILED_LOGGING = True
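A sketch of how these settings could interact at runtime (illustrative only; the real main.py logic isn't shown on this page):

import asyncio

MAX_CONCURRENT_REQUESTS = 10
DELAY_BETWEEN_REQUESTS = 1  # seconds

semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)

async def polite_fetch(url):
    async with semaphore:  # never more than 10 requests in flight
        await asyncio.sleep(DELAY_BETWEEN_REQUESTS)  # politeness delay
        print("fetching", url)  # placeholder for the real HTTP call

async def main():
    urls = [f"https://example.com/{i}" for i in range(25)]
    await asyncio.gather(*(polite_fetch(u) for u in urls))

asyncio.run(main())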
# Local SQLite (default)
DATABASE_URL = "sqlite:///website_crawler.db"
# Turso Cloud Database
TURSO_DATABASE_URL = "libsql://your-db.turso.io"
TURSO_AUTH_TOKEN = "your-auth-token"
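Either backend can be selected at runtime; a minimal sketch with SQLAlchemy (which requirements.txt installs), falling back to the local file when no Turso URL is set:

import os
from sqlalchemy import create_engine, text

# libsql:// URLs need the extra Turso dialect (presumably what
# requirements_turso.txt provides); the sqlite default works out of the box.
url = os.environ.get("TURSO_DATABASE_URL", "sqlite:///website_crawler.db")
engine = create_engine(url)
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())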
# Environment variables
export DASHBOARD_PASSWORD=swadhin
export LOG_API_TOKEN=your-secret-token
export USER_AGENT="RatCrawler/2.0"
export RATE_LIMIT=true
# Dashboard: http://localhost:8501
# Log API: http://localhost:8502
# Health Check: /health
# Metrics: /metrics
# Status: /status
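A quick liveness probe against the endpoints above (assuming the log API on port 8502 serves /health):

import urllib.request

with urllib.request.urlopen("http://localhost:8502/health", timeout=5) as resp:
    print(resp.status, resp.read().decode())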
# Analyze competitor backlinks
python test_backlink_storage.py
# Monitor domain authority
python test_enhanced_crawler.py
Comprehensive SEO analysis with PageRank and domain authority metrics
# Track trending topics
cd engine && python googletrends.py
# Social media analysis
python xtrends.py --realtime
Real-time market trends and social media sentiment analysis
# Large-scale crawling
python main.py
# Export results
python dashboard.py --export
Automated data collection with smart queue management and intelligent batch processing
Core classes and methods
comprehensive_crawl() - Full crawling with analysis
crawl_page_content() - Single page crawling
export_results() - Export crawl data
get_all_backlinks() - Retrieve backlinks
crawl_backlinks() - Discover backlinks
calculate_pagerank() - PageRank computation
calculate_domain_authority() - Domain scoring
detect_link_spam() - Spam detection
crawl() - Async web crawling
crawl_single_page() - Single page processing
extract_urls() - URL discovery
can_crawl() - Robots.txt checking
analyze_backlinks() - Backlink analysis
calculate_domain_authority() - Authority scoring
detect_spam_links() - Spam detection
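Putting the reference to work, a hedged usage sketch (the import path and class name are assumptions; only the method names come from the list above):

from rat.crawler import EnhancedWebCrawler  # hypothetical module/class

crawler = EnhancedWebCrawler()
results = crawler.comprehensive_crawl(["https://example.com"])  # full crawl + analysis
crawler.export_results("crawl_results.json")                    # persist the run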