How to Scrape Hacker News (news.ycombinator.com)
Learn how to scrape Hacker News to extract top tech stories, job listings, and community discussions. Perfect for market research and trend analysis.
Anti-Bot Protection Detected
- Rate Limiting: Limits requests per IP/session over time. Can be bypassed with rotating proxies, request delays, and distributed scraping.
- IP Blocking: Blocks known datacenter IPs and flagged addresses. Requires residential or mobile proxies to circumvent effectively.
- User-Agent Filtering: Flags requests with missing or obviously non-browser User-Agent headers. Sending a realistic, consistent User-Agent avoids most of it.
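If you scrape the HTML directly, a little politeness goes a long way against these protections. Below is a minimal Python sketch (not an official recipe; the User-Agent string, delay values, and retry count are illustrative assumptions) that identifies the client, spaces requests out, and backs off when the server responds with 429 or 403:

import time
import random
import requests

# Illustrative values only -- tune the User-Agent, delays, and retry count for your workload.
HEADERS = {'User-Agent': 'my-hn-research-bot/1.0 (contact@example.com)'}
PAGES = [f'https://news.ycombinator.com/news?p={p}' for p in range(1, 4)]

def polite_get(url, retries=3):
    """Fetch a URL, backing off exponentially when rate limiting is signalled."""
    for attempt in range(retries):
        response = requests.get(url, headers=HEADERS, timeout=10)
        if response.status_code in (429, 403):
            time.sleep((2 ** attempt) * 5)  # wait 5s, 10s, 20s before retrying
            continue
        response.raise_for_status()
        return response
    raise RuntimeError(f'Gave up on {url} after {retries} attempts')

for url in PAGES:
    html = polite_get(url).text
    print(url, len(html), 'bytes')
    time.sleep(random.uniform(3, 7))  # random pause between pages to mimic human browsing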
About Hacker News
Learn what Hacker News offers and what valuable data can be extracted from it.
The Tech Hub
Hacker News is a social news website focusing on computer science and entrepreneurship, operated by the startup incubator Y Combinator. It functions as a community-driven platform where users submit links to technical articles, startup news, and deep-dive discussions.
Data Richness
The platform contains a wealth of real-time data including upvoted tech stories, "Show HN" startup launches, "Ask HN" community questions, and specialized job boards. It is widely considered the pulse of the Silicon Valley ecosystem and the broader global developer community.
Strategic Value
Scraping this data allows businesses and researchers to monitor emerging technologies, track competitor mentions, and identify influential thought leaders. Because the site layout is remarkably stable and lean, it is one of the most reliable sources for automated technical news aggregation.

Why Scrape Hacker News?
Discover the business value and use cases for extracting data from Hacker News.
Identify emerging programming languages and developer tools early
Monitor the startup ecosystem for new launches and funding news
Lead generation for technical recruiting by monitoring 'Who is Hiring' threads
Sentiment analysis on software releases and corporate announcements
Building high-signal technical news aggregators for niche audiences
Academic research on information propagation in technical communities
Scraping Challenges
Technical challenges you may encounter when scraping Hacker News.
Parsing nested HTML table structures used for layouts
Handling relative time strings like '2 hours ago' for database storage (see the conversion sketch after this list)
Managing server-side rate limits that trigger temporary IP bans
Extracting deep comment hierarchies that span multiple pages
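For the relative-timestamp challenge mentioned above, one simple approach (a sketch, assuming the usual 'N minutes/hours/days ago' wording) is to convert the string to an absolute UTC datetime at scrape time so the stored value does not drift:

import re
from datetime import datetime, timedelta, timezone

# Seconds per unit for the relative ages Hacker News displays.
UNITS = {'minute': 60, 'hour': 3600, 'day': 86400}

def parse_relative_age(text, now=None):
    """Turn strings like '2 hours ago' into an absolute UTC datetime."""
    now = now or datetime.now(timezone.utc)
    match = re.match(r'(\d+)\s+(minute|hour|day)s?\s+ago', text.strip())
    if not match:
        return None  # unexpected format, e.g. an absolute date on older items
    amount, unit = int(match.group(1)), match.group(2)
    return now - timedelta(seconds=amount * UNITS[unit])

print(parse_relative_age('2 hours ago'))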
Scrape Hacker News with AI
No coding required. Extract data in minutes with AI-powered automation.
How It Works
Describe What You Need
Tell the AI what data you want to extract from Hacker News. Just type it in plain language — no coding or selectors needed.
AI Extracts the Data
Our artificial intelligence navigates Hacker News, handles dynamic content, and extracts exactly what you asked for.
Get Your Data
Receive clean, structured data ready to export as CSV, JSON, or send directly to your apps and workflows.
Why Use AI for Scraping
AI makes it easy to scrape Hacker News without writing any code. Just describe the data you want in plain language and our AI-powered platform extracts it automatically.
Why use AI for scraping:
- Point-and-click selection of stories without writing complex CSS selectors
- Automatic handling of the 'More' button for seamless pagination
- Built-in cloud execution to prevent your local IP from being rate-limited
- Scheduled scraping runs to capture the front page every hour automatically
- Direct export to Google Sheets or Webhooks for real-time alerts
No-Code Web Scrapers for Hacker News
Point-and-click alternatives to AI-powered scraping
Several no-code tools like Browse.ai, Octoparse, Axiom, and ParseHub can help you scrape Hacker News. These tools use visual interfaces to select elements, but they come with trade-offs compared to AI-powered solutions.
Typical Workflow with No-Code Tools
- Install browser extension or sign up for the platform
- Navigate to the target website and open the tool
- Point-and-click to select data elements you want to extract
- Configure CSS selectors for each data field
- Set up pagination rules to scrape multiple pages
- Handle CAPTCHAs (often requires manual solving)
- Configure scheduling for automated runs
- Export data to CSV, JSON, or connect via API
Common Challenges
- Learning curve: Understanding selectors and extraction logic takes time
- Selectors break: Website changes can break your entire workflow
- Dynamic content issues: JavaScript-heavy sites often require complex workarounds
- CAPTCHA limitations: Most tools require manual intervention for CAPTCHAs
- IP blocking: Aggressive scraping can get your IP banned
How to Scrape Hacker News with Code
Python + Requests
import requests
from bs4 import BeautifulSoup

url = 'https://news.ycombinator.com/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'}

try:
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    # Stories are contained in rows with class 'athing'
    posts = soup.select('.athing')
    for post in posts:
        title_element = post.select_one('.titleline > a')
        title = title_element.text
        link = title_element['href']
        print(f'Title: {title}\nLink: {link}\n---')
except Exception as e:
    print(f'Scraping failed: {e}')
When to Use
Best for static HTML pages where content is loaded server-side. The fastest and simplest approach when JavaScript rendering isn't required.
Advantages
- Fastest execution (no browser overhead)
- Lowest resource consumption
- Easy to parallelize with asyncio
- Great for APIs and static pages
Limitations
- Cannot execute JavaScript
- Fails on SPAs and dynamic content
- May struggle with complex anti-bot systems
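The Requests example above only captures titles and links. On the current front-page markup, the points, author, and age sit in the table row immediately after each '.athing' row (inside a '.subtext' cell), so the same parse can be extended as sketched below; treat the '.subtext', '.score', and '.hnuser' selectors as assumptions to verify against the live HTML:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'}
response = requests.get('https://news.ycombinator.com/', headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

for post in soup.select('.athing'):
    title_element = post.select_one('.titleline > a')
    # The metadata for each story lives in the next table row.
    meta_row = post.find_next_sibling('tr')
    subtext = meta_row.select_one('.subtext') if meta_row else None
    if not title_element or not subtext:
        continue
    score = subtext.select_one('.score')   # absent on job postings
    author = subtext.select_one('.hnuser')
    print({
        'id': post.get('id'),
        'title': title_element.text,
        'link': title_element['href'],
        'points': score.text if score else None,
        'author': author.text if author else None,
    })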
Python + Playwright
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto('https://news.ycombinator.com/')

    # Wait for the table to load
    page.wait_for_selector('.athing')

    # Extract all story titles and links
    items = page.query_selector_all('.athing')
    for item in items:
        title_link = item.query_selector('.titleline > a')
        if title_link:
            print(title_link.inner_text(), title_link.get_attribute('href'))

    browser.close()
Python + Scrapy
import scrapy

class HackerNewsSpider(scrapy.Spider):
    name = 'hn_spider'
    start_urls = ['https://news.ycombinator.com/']

    def parse(self, response):
        for post in response.css('.athing'):
            yield {
                'id': post.attrib.get('id'),
                'title': post.css('.titleline > a::text').get(),
                'link': post.css('.titleline > a::attr(href)').get(),
            }

        # Follow pagination 'More' link
        next_page = response.css('a.morelink::attr(href)').get()
        if next_page:
            yield response.follow(next_page, self.parse)
Node.js + Puppeteer
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://news.ycombinator.com/');

  const results = await page.evaluate(() => {
    const items = Array.from(document.querySelectorAll('.athing'));
    return items.map(item => ({
      title: item.querySelector('.titleline > a').innerText,
      url: item.querySelector('.titleline > a').href
    }));
  });

  console.log(results);
  await browser.close();
})();
What You Can Do With Hacker News Data
Explore practical applications and insights from Hacker News data.
Use Automatio to extract data from Hacker News and build these applications without writing code.
- Startup Trend Discovery: Identify which industries or product types are being launched and discussed most frequently.
  - Scrape the 'Show HN' category on a weekly basis.
  - Clean and categorize startup descriptions using NLP.
  - Rank trends based on community upvotes and comment sentiment.
- Tech Sourcing & Recruitment: Extract job listings and company details from specialized monthly hiring threads.
  - Monitor for the monthly 'Who is hiring' thread ID.
  - Scrape all top-level comments, which contain the job descriptions.
  - Parse text for specific tech stacks like Rust, AI, or React.
- Competitive Intelligence: Track mentions of competitors in comments to understand public perception and complaints.
  - Set up a keyword-based scraper for specific brand names.
  - Extract user comments and timestamps for sentiment analysis.
  - Generate weekly reports on brand health versus competitors.
- Automated Content Curation: Create a high-signal tech newsletter that only includes the most relevant stories (see the sketch after this list).
  - Scrape the front page every 6 hours.
  - Filter for posts that exceed a threshold of 200 points.
  - Automate the delivery of these links to a Telegram bot or email list.
- Venture Capital Lead Gen: Discover early-stage startups that are gaining significant community traction.
  - Track 'Show HN' posts that hit the front page.
  - Monitor the growth rate of upvotes over the first 4 hours.
  - Alert analysts when a post shows viral growth patterns.
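As a concrete illustration of the content-curation idea above, here is a short sketch that keeps only front-page stories at or above a point threshold. It assumes the same '.athing'/'.subtext' layout used in the code examples earlier, and the 200-point cutoff is simply the figure from the steps above:

import re
import requests
from bs4 import BeautifulSoup

THRESHOLD = 200  # minimum points for a story to make the digest
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'}
soup = BeautifulSoup(requests.get('https://news.ycombinator.com/', headers=headers).text, 'html.parser')

digest = []
for post in soup.select('.athing'):
    meta_row = post.find_next_sibling('tr')
    score_span = meta_row.select_one('.score') if meta_row else None
    if not score_span:
        continue  # job ads and some items carry no score
    points = int(re.search(r'\d+', score_span.text).group())
    if points >= THRESHOLD:
        title_link = post.select_one('.titleline > a')
        digest.append((points, title_link.text, title_link['href']))

# Highest-scoring stories first, ready to push to a newsletter or Telegram bot.
for points, title, link in sorted(digest, reverse=True):
    print(f'{points:>4}  {title}  ({link})')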
Supercharge your workflow with AI Automation
Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.
Pro Tips for Scraping Hacker News
Expert advice for successfully extracting data from Hacker News.
Use the official Firebase API for massive historical data collection to avoid HTML parsing complexity (see the sketch after these tips).
Always set a custom User-Agent to identify your bot responsibly and avoid immediate blocking.
Implement a random sleep interval of 3-7 seconds between requests to mimic human browsing behavior.
Target specific subdirectories like /newest for fresh stories or /ask for community discussions.
Store the 'Item ID' as your primary key to avoid duplicate entries when scraping the front page frequently.
Scrape during off-peak hours (UTC night) to experience faster response times and lower rate-limiting risks.
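For the Firebase tip above, the official API (documented at https://github.com/HackerNews/API) returns plain JSON, so there is no HTML to parse at all. A minimal sketch that pulls the current top stories:

import time
import requests

API = 'https://hacker-news.firebaseio.com/v0'

# IDs of the current top stories (the endpoint returns up to 500).
top_ids = requests.get(f'{API}/topstories.json', timeout=10).json()

for story_id in top_ids[:10]:
    item = requests.get(f'{API}/item/{story_id}.json', timeout=10).json()
    print(item.get('score'), item.get('title'), item.get('url'))
    time.sleep(0.5)  # modest pacing to stay polite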
Testimonials
What Our Users Say
Join thousands of satisfied users who have transformed their workflow
Jonathan Kogan
Co-Founder/CEO, rpatools.io
Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.
Mohammed Ibrahim
CEO, qannas.pro
I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!
Ben Bressington
CTO, AiChatSolutions
Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!
Sarah Chen
Head of Growth, ScaleUp Labs
We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.
David Park
Founder, DataDriven.io
The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!
Emily Rodriguez
Marketing Director, GrowthMetrics
Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.
Related Web Scraping
- How to Scrape Healthline: The Ultimate Health & Medical Data Guide
- How to Scrape Daily Paws: A Step-by-Step Web Scraper Guide
- How to Scrape BeChewy: Extract Pet Care Guides & Health Advice
- How to Scrape Web Designer News
- How to Scrape Substack Newsletters and Posts
Frequently Asked Questions About Hacker News
Find answers to common questions about Hacker News