
Project 3: Build a Web Scraper/Bot | Racket Projects

Md Nasim Sheikh

Python is famous for scraping, but Racket is a strong alternative with good performance and excellent HTML parsing libraries.


Tools

We need:

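  • net/url to fetch pages.
  • html-parsing to parse HTML into S-expressions (HTML is a tree!).
  • sxml/sxpath (from the sxml package) to query the parsed tree.

html-parsing and sxml are separate packages, not part of the base Racket distribution. Install them with raco pkg install html-parsing sxml.

#lang racket
(require net/url)
(require html-parsing)
(require sxml/sxpath)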

Fetching the Page

(define (fetch-page url)
  (call/input-url (string->url url)
                  get-pure-port
                  html->xexp))

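This returns the page as an SXML-style tree, which html-parsing calls an xexp: nested lists where each element is a list headed by a symbol (the tag name) and text appears as strings. Attributes are kept in an (@ ...) sublist, so the format differs slightly from Racket's built-in X-expressions.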
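To make the shape of that tree concrete, here is a minimal sketch that parses an inline HTML string instead of a live page. The sample-html snippet is made up for illustration; html->xexp accepts a string as well as an input port.

```racket
#lang racket
(require html-parsing)

;; A small, made-up HTML snippet standing in for a fetched page.
(define sample-html
  "<html><body><h2 class=\"title\">First</h2><p>Intro</p></body></html>")

;; Parse the string into an SXML-style tree.
(define tree (html->xexp sample-html))

;; Print the tree: elements are lists headed by a symbol,
;; and attributes sit in an (@ ...) sublist.
(pretty-print tree)
```

Printing the tree for a small snippet like this is a quick way to see which paths a query will need before pointing the scraper at a real page.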

Extracting Data (XPath)

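We use sxpath to query the tree with XPath expressions. It plays the role that CSS selectors play in the browser, but it operates on nested lists. The path "//h2/text()" selects the text inside every h2 element, at any depth.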

(define (get-headlines url)
  (define content (fetch-page url))
  (define extract-h2 (sxpath "//h2/text()"))
  
  (extract-h2 content))
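The query can be tried without touching the network. The sketch below runs against a hand-written tree (sample-tree is an assumption, not real page data) and uses the list form of the path, which should select the same nodes as the string "//h2/text()".

```racket
#lang racket
(require sxml/sxpath)

;; A hand-written SXML tree standing in for a parsed page.
(define sample-tree
  '(*TOP*
    (html
     (body
      (h2 "First headline")
      (p "Some body text")
      (h2 (@ (class "lead")) "Second headline")))))

;; List form of the path: any descendant, then h2 children, then their text.
(define extract-h2 (sxpath '(// h2 *text*)))

;; The result is a plain list of strings in document order.
(define headlines (extract-h2 sample-tree))
(for-each displayln headlines)
```

Because the result is an ordinary list, the usual list functions (map, filter, for) apply to it directly.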

Running the Bot

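;; Note: this only returns results for pages that mark up their headlines
;; as <h2> elements; adjust the XPath in get-headlines to match the target site.
(define news (get-headlines "https://news.ycombinator.com"))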

(with-output-to-file "headlines.txt"
  (lambda ()
    (for ([item news])
      (displayln item)))
  #:exists 'replace)
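Scraped text often carries stray whitespace, empty strings, or repeated entries. As a rough sketch, a helper like the one below could tidy the list before it is written out. clean-headlines is a hypothetical name, not part of the original bot, and the sample input is made up.

```racket
#lang racket

;; Hypothetical helper: keep only strings, collapse whitespace,
;; drop empty entries, and remove duplicates (first occurrence wins).
(define (clean-headlines items)
  (remove-duplicates
   (filter non-empty-string?
           (map string-normalize-spaces
                (filter string? items)))))

;; Made-up input that mimics messy scraper output.
(define cleaned
  (clean-headlines
   '("  Racket 8 \n released " "" "Racket 8 released" "Second story")))

(for-each displayln cleaned)
```

In the bot above, this would amount to looping over (clean-headlines news) instead of news inside with-output-to-file.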


Summary

Because HTML is nested (like Lisp code), Racket is naturally good at parsing it. You can build powerful extractors without regular expressions.


Written by Md Nasim Sheikh, Software Developer at softexForge
