Ultimate Guide to Web Scraping
Learn how to avoid the most common pitfalls and find the data you need.
The book is designed to walk you from beginner to expert, honing your skills and helping you become a master craftsman in the art of web scraping.
You’ll learn about:
- The most common complaints about web scraping, and why they probably don’t matter for you.
- How modern websites send information to a browser, and how you can intercept and parse it.
- How to find all the data you need on someone else’s website.
- Common traps and anti-scraping tactics (and how to thwart them).
- How to write a well-behaved scraper and be a good scraping citizen.
The book has several code samples in some of the most common languages. It’ll walk through the process of scraping data from several large sites, step by step.
Table of Contents
The Ultimate Guide to Web Scraping contains the following chapters:
- Introduction to Web Scraping
- Web Scraping as a Legitimate Data Collection Tool
- Understand Web Technologies: A Brief Introduction to HTTP and the DOM
- Finding The Data: Discovering Your “API”
- Extracting the Data: Finding Structure in an HTML Document
- Sample Code to Get You Started
- Avoiding Common Scraping Traps
- Being a Good Web Scraping Citizen
About the Author
Over the past few years, I’ve scraped dozens of websites, from music blogs and fashion retailers to the USPTO and undocumented JSON endpoints. I’ve learned a thing or two along the way, so I wrote an article last winter called “I Don’t Need No Stinking API: Web Scraping For Fun and Profit.”
That article has now been viewed almost 100,000 times. It’s helped me meet people from all over the world who are trying to navigate the wild world of web scraping. After dozens of conversations via email and Twitter, I finally decided I’d write a book.
Try The Free Email Course
You’ll receive several brief emails that contain helpful information from the book.
© 2013 Hartley Brody
Questions? Get in touch…