📚 API Documentation

Complete reference for RatCrawler's APIs

Python API Reference

Detailed documentation for Python implementation classes and methods

EnhancedProductionCrawler

Main crawler class that combines web content crawling with backlink analysis.

Methods:

comprehensive_crawl(seed_urls: List[str]) -> Dict

Execute comprehensive crawling and analysis using a priority queue frontier.

Parameters:
  • seed_urls - List of starting URLs to crawl
Returns:
  • Dictionary with complete analysis results including pages crawled, backlinks found, and performance metrics
crawl_page_content(url: str) -> Optional[Dict]

Crawl a single page and extract comprehensive content data.

Parameters:
  • url - URL to crawl
Returns:
  • Dictionary with page content data or None if crawling failed
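
Example: crawling a single page. A minimal sketch of calling crawl_page_content on its own, reusing the config keys from the Basic Crawling example below; the keys of the returned dictionary are implementation-defined, so the sketch only checks for failure and lists what came back.

from crawler import EnhancedProductionCrawler

# Same config keys as the Basic Crawling example
config = {
    'delay': 1.5,
    'max_depth': 3,
    'max_pages': 100,
    'db_path': 'website_crawler.db'
}
crawler = EnhancedProductionCrawler(config)

page = crawler.crawl_page_content('https://example.com')
if page is None:
    print("Crawl failed for this URL")
else:
    print(f"Extracted fields: {sorted(page.keys())}")
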
export_results(results: Dict)

Export crawl results to JSON and CSV files.

Parameters:
  • results - Results dictionary from comprehensive_crawl
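
Example: exporting results. A short sketch chaining comprehensive_crawl into export_results; the exact output file names are decided by the implementation and are not assumed here.

from crawler import EnhancedProductionCrawler

config = {
    'delay': 1.5,
    'max_depth': 3,
    'max_pages': 100,
    'db_path': 'website_crawler.db'
}
crawler = EnhancedProductionCrawler(config)

results = crawler.comprehensive_crawl(['https://example.com'])

# Write the results dictionary to JSON and CSV files
crawler.export_results(results)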

BacklinkProcessor

Handles backlink discovery, analysis, and PageRank calculations.

Methods:

crawl_backlinks(seed_urls: List[str], max_depth: int = 2)

Crawl and discover backlinks starting from seed URLs.

Parameters:
  • seed_urls - Starting URLs for backlink discovery
  • max_depth - Maximum depth for crawling (default: 2)
calculate_pagerank(damping_factor: float = 0.85) -> Dict

Calculate PageRank scores for all discovered pages.

Parameters:
  • damping_factor - PageRank damping factor (default: 0.85)
Returns:
  • Dictionary mapping URLs to PageRank scores
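
Example: ranking pages by PageRank. Because calculate_pagerank returns a plain URL-to-score dictionary, ranking is ordinary dictionary work; this sketch assumes backlinks were already discovered with crawl_backlinks and reuses the constructor arguments from the Backlink Analysis example below.

from backlinkprocessor import BacklinkProcessor

processor = BacklinkProcessor(delay=1.0)
processor.crawl_backlinks(['https://example.com'], max_depth=2)

scores = processor.calculate_pagerank(damping_factor=0.85)

# Print the five highest-ranked URLs
for url, score in sorted(scores.items(), key=lambda item: item[1], reverse=True)[:5]:
    print(f"{score:.4f}  {url}")
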
calculate_domain_authority()

Calculate domain authority scores based on backlink profiles.

Stores:
  • Domain authority scores in the database
  • Scores range from 0 to 100 based on backlink quality and diversity

Rust API Reference

Detailed documentation for Rust implementation structs and methods

WebsiteCrawler

High-performance async web crawler with concurrent processing.

Methods:

async fn crawl(&mut self, seed_urls: Vec<String>, database: &mut WebsiteCrawlerDatabase) -> Result<CrawlResult, CrawlError>

Execute comprehensive web crawling with a priority-based frontier.

Parameters:
  • seed_urls - Vector of starting URLs
  • database - Mutable reference to database instance
Returns:
  • Result containing crawl statistics or error
async fn crawl_single_page(&self, url: &str, depth: usize) -> Result<CrawledPage, CrawlError>

Crawl a single page and extract structured data.

Parameters:
  • url - URL to crawl
  • depth - Current crawl depth
Returns:
  • Result containing crawled page data or error
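
Example: fetching one page. A minimal sketch of crawl_single_page; it assumes CrawlConfig provides usable defaults (as in the Async Crawling example below) and, since the fields of CrawledPage are not listed here, it only reports success or failure.

use ratcrawler::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = CrawlConfig::default();
    let crawler = WebsiteCrawler::new(&config);

    // Depth 0 marks this as a seed-level fetch
    match crawler.crawl_single_page("https://example.com", 0).await {
        Ok(_page) => println!("Page crawled successfully"),
        Err(err) => eprintln!("Crawl failed: {}", err),
    }
    Ok(())
}
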
fn extract_urls(&self, html: &str, base_url: &str) -> Vec<String>

Extract all URLs from HTML content.

Parameters:
  • html - HTML content to parse
  • base_url - Base URL for resolving relative links
Returns:
  • Vector of extracted URLs
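
Example: extracting links from HTML. extract_urls is synchronous and needs no database, so it can be exercised on an HTML fragment directly; how relative links are normalized against base_url is up to the implementation.

use ratcrawler::*;

fn main() {
    let config = CrawlConfig::default();
    let crawler = WebsiteCrawler::new(&config);

    let html = r#"<a href="/about">About</a> <a href="https://other.example/">Other</a>"#;
    let urls = crawler.extract_urls(html, "https://example.com");

    for url in &urls {
        println!("{}", url);
    }
}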

BacklinkProcessor

Async backlink analysis with parallel processing capabilities.

Methods:

async fn analyze_backlinks(&self, url: &str) -> Result<BacklinkAnalysis, BacklinkError>

Analyze backlinks for a given URL with comprehensive metrics.

Parameters:
  • url - Target URL for backlink analysis
Returns:
  • Result containing backlink analysis data

Database Schema

SQLite database structure and table definitions

Core Tables

crawl_sessions

Tracks crawling sessions and their metadata

  • id (PRIMARY KEY)
  • start_time (TIMESTAMP)
  • seed_urls (TEXT)
  • config (TEXT)
  • end_time (TIMESTAMP)
  • status (TEXT)

crawled_pages

Stores crawled page content and metadata

  • id (PRIMARY KEY)
  • session_id (FOREIGN KEY)
  • url (TEXT, NOT NULL)
  • title (TEXT)
  • content_text (TEXT)
  • content_html (TEXT)
  • word_count (INTEGER)
  • crawl_time (TIMESTAMP)

backlinks

Stores discovered backlink relationships

  • id (PRIMARY KEY)
  • session_id (FOREIGN KEY)
  • source_url (TEXT)
  • target_url (TEXT)
  • anchor_text (TEXT)
  • is_nofollow (BOOLEAN)
  • crawl_date (TIMESTAMP)
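
Because the data lives in ordinary SQLite tables, results can also be inspected outside the crawler. A minimal sketch using Python's sqlite3 against the schema above; the database file name comes from the Python example config, and treating is_nofollow as 0/1 is an assumption about how the boolean is stored.

import sqlite3

conn = sqlite3.connect('website_crawler.db')

# Count inbound links per target URL, skipping nofollow links
rows = conn.execute(
    """
    SELECT target_url, COUNT(*) AS inbound
    FROM backlinks
    WHERE is_nofollow = 0
    GROUP BY target_url
    ORDER BY inbound DESC
    LIMIT 10
    """
).fetchall()

for target_url, inbound in rows:
    print(inbound, target_url)

conn.close()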

Database Methods

WebsiteCrawlerDatabase

  • create_crawl_session()
  • store_crawled_page()
  • store_backlinks()
  • get_all_crawled_urls()
  • get_crawl_summary()

BacklinkDatabase

  • store_backlinks()
  • get_backlinks_for_url()
  • store_domain_scores()
  • get_domain_authority()

Code Examples

Practical examples for using RatCrawler APIs

🐍 Python Examples

Basic Crawling

from crawler import EnhancedProductionCrawler

# Initialize crawler
config = {
    'delay': 1.5,
    'max_depth': 3,
    'max_pages': 100,
    'db_path': 'website_crawler.db'
}

crawler = EnhancedProductionCrawler(config)

# Start crawling
seed_urls = ['https://example.com']
results = crawler.comprehensive_crawl(seed_urls)

print(f"Crawled {results['pages_crawled']} pages")
print(f"Found {results['backlinks_found']} backlinks")

Backlink Analysis

from backlinkprocessor import BacklinkProcessor

# Initialize processor
processor = BacklinkProcessor(delay=1.0)

# Discover backlinks
seed_urls = ['https://example.com']
processor.crawl_backlinks(seed_urls, max_depth=2)

# Calculate metrics
pagerank_scores = processor.calculate_pagerank()
processor.calculate_domain_authority()

print(f"Analyzed {len(processor.backlinks)} backlinks")

🦀 Rust Examples

Async Crawling

use ratcrawler::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = CrawlConfig {
        user_agent: "MyBot/1.0".to_string(),
        timeout_secs: 30,
        max_pages: 100,
        ..Default::default()
    };

    let mut crawler = WebsiteCrawler::new(&config);
    let mut database = WebsiteCrawlerDatabase::new("crawl.db")?;

    let seed_urls = vec!["https://example.com".to_string()];
    let result = crawler.crawl(seed_urls, &mut database).await?;

    println!("Crawled {} pages", result.pages_crawled);
    Ok(())
}

Backlink Analysis

use ratcrawler::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let processor = BacklinkProcessor::new(
        "MyBot/1.0".to_string(),
        60, // timeout
        5   // max redirects
    );

    let database = BacklinkDatabase::new("backlinks.db")?;
    let mut analyzer = BacklinkAnalyzer::new(processor, database);

    let analysis = analyzer.analyze_backlinks("https://example.com").await?;
    println!("Found {} backlinks", analysis.total_backlinks);
    Ok(())
}