How to Scrape Hacker News (news.ycombinator.com)

Learn how to scrape Hacker News to extract top tech stories, job listings, and community discussions. Perfect for market research and trend analysis.

Coverage: Global

Available Data: 6 fields
Title, Description, Seller Info, Posting Date, Categories, Attributes

All Extractable Fields
Story Title, External URL, Source Domain, Points (Upvotes), Author Username, Timestamp, Comment Count, Item ID, Post Rank, Job Title, Comment Text
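For storage or analysis, these fields map naturally onto a simple record type. A minimal sketch in Python (the `HNStory` class and its field names are illustrative, not an official schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HNStory:
    """One front-page story; mirrors the extractable fields listed above."""
    item_id: int           # unique HN item ID, useful as a primary key
    rank: int              # position on the page
    title: str
    url: Optional[str]     # absent for self posts like Ask HN
    domain: Optional[str]  # source domain shown next to the title
    points: int
    author: str
    timestamp: str         # e.g. '2 hours ago' as scraped; normalize before storage
    comment_count: int

story = HNStory(item_id=1, rank=1, title='Example', url='https://example.com',
                domain='example.com', points=120, author='pg',
                timestamp='2 hours ago', comment_count=45)
print(story.title)
```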
Technical Requirements
Static HTML
No Login
Has Pagination
Official API Available

Anti-Bot Protection Detected

Rate Limiting
Limits requests per IP/session over time. Can be bypassed with rotating proxies, request delays, and distributed scraping.
IP Blocking
Blocks known datacenter IPs and flagged addresses. Requires residential or mobile proxies to circumvent effectively.
User-Agent Filtering
Inspects request headers and blocks clients with missing or suspicious User-Agent strings. Send a realistic browser string or clearly identify your bot.
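The rate limiting described above is easiest to handle by spacing out requests and backing off when the server pushes back. A minimal sketch (the function names, the 3-7 second window, and the bot User-Agent string are illustrative):

```python
import random
import time

import requests

def backoff_delay(attempt):
    """Exponential backoff: 1s, 2s, 4s, ... for successive retries."""
    return 2 ** attempt

def polite_get(url, max_retries=3):
    """GET with a random pre-request delay and backoff on 429/503 responses."""
    for attempt in range(max_retries):
        time.sleep(random.uniform(3, 7))  # mimic human browsing pace
        response = requests.get(url, headers={
            # identify your bot responsibly; name and contact are placeholders
            'User-Agent': 'my-hn-research-bot/1.0 (contact@example.com)'
        })
        if response.status_code in (429, 503):  # rate-limited or overloaded
            time.sleep(backoff_delay(attempt))
            continue
        response.raise_for_status()
        return response
    raise RuntimeError(f'Giving up on {url} after {max_retries} attempts')
```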

About Hacker News

Learn what Hacker News offers and what valuable data can be extracted from it.

The Tech Hub

Hacker News is a social news website focusing on computer science and entrepreneurship, operated by the startup incubator Y Combinator. It functions as a community-driven platform where users submit links to technical articles, startup news, and deep-dive discussions.

Data Richness

The platform contains a wealth of real-time data including upvoted tech stories, "Show HN" startup launches, "Ask HN" community questions, and specialized job boards. It is widely considered the pulse of the Silicon Valley ecosystem and the broader global developer community.

Strategic Value

Scraping this data allows businesses and researchers to monitor emerging technologies, track competitor mentions, and identify influential thought leaders. Because the site layout is remarkably stable and lean, it is one of the most reliable sources for automated technical news aggregation.


Why Scrape Hacker News?

Discover the business value and use cases for extracting data from Hacker News.

Identify emerging programming languages and developer tools early

Monitor the startup ecosystem for new launches and funding news

Lead generation for technical recruiting by monitoring 'Who is Hiring' threads

Sentiment analysis on software releases and corporate announcements

Building high-signal technical news aggregators for niche audiences

Academic research on information propagation in technical communities

Scraping Challenges

Technical challenges you may encounter when scraping Hacker News.

Parsing nested HTML table structures used for layouts

Handling relative time strings like '2 hours ago' for database storage

Managing server-side rate limits that trigger temporary IP bans

Extracting deep comment hierarchies that span multiple pages
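The relative-timestamp challenge above is best handled by converting strings like '2 hours ago' into absolute UTC datetimes at scrape time. A minimal parser sketch (covering only the minute/hour/day formats Hacker News emits; the function name is illustrative):

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {'minute': 'minutes', 'hour': 'hours', 'day': 'days'}

def parse_relative_time(text, now=None):
    """Convert an HN-style '2 hours ago' string to an absolute UTC datetime."""
    now = now or datetime.now(timezone.utc)
    match = re.match(r'(\d+)\s+(minute|hour|day)s?\s+ago', text.strip())
    if not match:
        raise ValueError(f'Unrecognized time string: {text!r}')
    amount, unit = int(match.group(1)), match.group(2)
    return now - timedelta(**{_UNITS[unit]: amount})

ref = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(parse_relative_time('2 hours ago', now=ref))  # 2024-01-01 10:00:00+00:00
```

Passing `now` explicitly keeps the function deterministic and testable; in production you would omit it and record the scrape time alongside each row.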

Scrape Hacker News with AI

No coding required. Extract data in minutes with AI-powered automation.

How It Works

  1. Describe What You Need: Tell the AI what data you want to extract from Hacker News. Just type it in plain language — no coding or selectors needed.
  2. AI Extracts the Data: Our artificial intelligence navigates Hacker News, handles dynamic content, and extracts exactly what you asked for.
  3. Get Your Data: Receive clean, structured data ready to export as CSV, JSON, or send directly to your apps and workflows.

Why Use AI for Scraping

  • Point-and-click selection of stories without writing complex CSS selectors
  • Automatic handling of the 'More' button for seamless pagination
  • Built-in cloud execution to prevent your local IP from being rate-limited
  • Scheduled scraping runs to capture the front page every hour automatically
  • Direct export to Google Sheets or Webhooks for real-time alerts

No credit card required · Free tier available · No setup needed


No-Code Web Scrapers for Hacker News

Point-and-click alternatives to AI-powered scraping

Several no-code tools like Browse.ai, Octoparse, Axiom, and ParseHub can help you scrape Hacker News. These tools use visual interfaces to select elements, but they come with trade-offs compared to AI-powered solutions.

Typical Workflow with No-Code Tools

  1. Install browser extension or sign up for the platform
  2. Navigate to the target website and open the tool
  3. Point-and-click to select data elements you want to extract
  4. Configure CSS selectors for each data field
  5. Set up pagination rules to scrape multiple pages
  6. Handle CAPTCHAs (often requires manual solving)
  7. Configure scheduling for automated runs
  8. Export data to CSV, JSON, or connect via API

Common Challenges

  • Learning curve: Understanding selectors and extraction logic takes time
  • Selectors break: Website changes can break your entire workflow
  • Dynamic content issues: JavaScript-heavy sites often require complex workarounds
  • CAPTCHA limitations: Most tools require manual intervention for CAPTCHAs
  • IP blocking: Aggressive scraping can get your IP banned


Code Examples

import requests
from bs4 import BeautifulSoup

url = 'https://news.ycombinator.com/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'}

try:
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    # Stories are contained in rows with class 'athing'
    posts = soup.select('.athing')
    for post in posts:
        title_element = post.select_one('.titleline > a')
        if title_element is None:
            continue  # skip rows without a title link
        title = title_element.text
        link = title_element['href']
        print(f'Title: {title}\nLink: {link}\n---')
except requests.RequestException as e:
    print(f'Scraping failed: {e}')

When to Use

Best for static HTML pages where content is loaded server-side. The fastest and simplest approach when JavaScript rendering isn't required.

Advantages

  • Fastest execution (no browser overhead)
  • Lowest resource consumption
  • Easy to parallelize with asyncio
  • Great for APIs and static pages

Limitations

  • Cannot execute JavaScript
  • Fails on SPAs and dynamic content
  • May struggle with complex anti-bot systems

How to Scrape Hacker News with Code

Python + Requests
import requests
from bs4 import BeautifulSoup

url = 'https://news.ycombinator.com/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'}

try:
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    # Stories are contained in rows with class 'athing'
    posts = soup.select('.athing')
    for post in posts:
        title_element = post.select_one('.titleline > a')
        if title_element is None:
            continue  # skip rows without a title link
        title = title_element.text
        link = title_element['href']
        print(f'Title: {title}\nLink: {link}\n---')
except requests.RequestException as e:
    print(f'Scraping failed: {e}')
Python + Playwright
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto('https://news.ycombinator.com/')
    
    # Wait for the table to load
    page.wait_for_selector('.athing')
    
    # Extract all story titles and links
    items = page.query_selector_all('.athing')
    for item in items:
        title_link = item.query_selector('.titleline > a')
        if title_link:
            print(title_link.inner_text(), title_link.get_attribute('href'))
            
    browser.close()
Python + Scrapy
import scrapy

class HackerNewsSpider(scrapy.Spider):
    name = 'hn_spider'
    start_urls = ['https://news.ycombinator.com/']

    def parse(self, response):
        for post in response.css('.athing'):
            yield {
                'id': post.attrib.get('id'),
                'title': post.css('.titleline > a::text').get(),
                'link': post.css('.titleline > a::attr(href)').get(),
            }
        
        # Follow pagination 'More' link
        next_page = response.css('a.morelink::attr(href)').get()
        if next_page:
            yield response.follow(next_page, self.parse)
Node.js + Puppeteer
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://news.ycombinator.com/');
  
  const results = await page.evaluate(() => {
    const items = Array.from(document.querySelectorAll('.athing'));
    return items
      .map(item => item.querySelector('.titleline > a'))
      .filter(Boolean) // skip rows without a title link
      .map(a => ({ title: a.innerText, url: a.href }));
  });

  console.log(results);
  await browser.close();
})();

What You Can Do With Hacker News Data

Explore practical applications and insights from Hacker News data.


  • Startup Trend Discovery

    Identify which industries or product types are being launched and discussed most frequently.

    1. Scrape the 'Show HN' category on a weekly basis.
    2. Clean and categorize startup descriptions using NLP.
    3. Rank trends based on community upvotes and comment sentiment.
  • Tech Sourcing & Recruitment

    Extract job listings and company details from specialized monthly hiring threads.

    1. Monitor for the monthly 'Who is hiring' thread ID.
    2. Scrape all top-level comments which contain job descriptions.
    3. Parse text for specific tech stacks like Rust, AI, or React.
  • Competitive Intelligence

    Track mentions of competitors in comments to understand public perception and complaints.

    1. Set up a keyword-based scraper for specific brand names.
    2. Extract user comments and timestamps for sentiment analysis.
    3. Generate weekly reports on brand health versus competitors.
  • Automated Content Curation

    Create a high-signal tech newsletter that only includes the most relevant stories.

    1. Scrape the front page every 6 hours.
    2. Filter for posts that exceed a threshold of 200 points.
    3. Automate the delivery of these links to a Telegram bot or email list.
  • Venture Capital Lead Gen

    Discover early-stage startups that are gaining significant community traction.

    1. Track 'Show HN' posts that hit the front page.
    2. Monitor the growth rate of upvotes over the first 4 hours.
    3. Alert analysts when a post shows viral growth patterns.
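The curation workflow above (scrape on a schedule, filter by a points threshold, deliver) can be sketched against the official Firebase API, which avoids HTML parsing entirely. The endpoints are real; the function names and canned sample data are illustrative:

```python
import requests

API = 'https://hacker-news.firebaseio.com/v0'

def filter_high_signal(items, threshold=200):
    """Keep only stories at or above the points threshold (step 2 above)."""
    return [item for item in items if item.get('score', 0) >= threshold]

def fetch_top_stories(limit=30):
    """Fetch the current top-story items from the official Firebase API."""
    ids = requests.get(f'{API}/topstories.json', timeout=10).json()[:limit]
    return [requests.get(f'{API}/item/{i}.json', timeout=10).json() for i in ids]

# Example with canned data (a live run would use fetch_top_stories()):
sample = [{'id': 1, 'title': 'Big launch', 'score': 412},
          {'id': 2, 'title': 'Small note', 'score': 57}]
print(filter_high_signal(sample))  # only the 412-point story survives
```

Keeping the filter separate from the fetch makes the threshold logic trivially testable and lets you swap in the HTML scraper as a data source later.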

More than just prompts

Supercharge your workflow with AI Automation

Automatio combines the power of AI agents, web automation, and smart integrations to help you accomplish more in less time.

AI Agents
Web Automation
Smart Workflows

Pro Tips for Scraping Hacker News

Expert advice for successfully extracting data from Hacker News.

Use the official Firebase API for massive historical data collection to avoid HTML parsing complexity.

Always set a custom User-Agent to identify your bot responsibly and avoid immediate blocking.

Implement a random sleep interval of 3-7 seconds between requests to mimic human browsing behavior.

Target specific subdirectories like /newest for fresh stories or /ask for community discussions.

Store the 'Item ID' as your primary key to avoid duplicate entries when scraping the front page frequently.

Scrape during off-peak hours (UTC night) to experience faster response times and lower rate-limiting risks.
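Two of the tips above, stable Item IDs and duplicate avoidance, combine naturally: use the ID as a primary key so repeated front-page scrapes never insert the same story twice. A minimal sketch with SQLite (the table layout and function name are illustrative):

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # use a file path for persistent storage
conn.execute('''CREATE TABLE IF NOT EXISTS stories (
    item_id INTEGER PRIMARY KEY,   -- HN Item ID, stable across scrapes
    title   TEXT,
    points  INTEGER
)''')

def upsert_story(item_id, title, points):
    """Insert new stories; on repeat scrapes, just refresh the point count."""
    conn.execute(
        'INSERT INTO stories (item_id, title, points) VALUES (?, ?, ?) '
        'ON CONFLICT(item_id) DO UPDATE SET points = excluded.points',
        (item_id, title, points),
    )

# The same story seen on two hourly runs produces a single row:
upsert_story(1001, 'Show HN: My project', 120)
upsert_story(1001, 'Show HN: My project', 185)
rows = conn.execute('SELECT item_id, points FROM stories').fetchall()
print(rows)  # [(1001, 185)]
```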

Testimonials

What Our Users Say

Join thousands of satisfied users who have transformed their workflow

Jonathan Kogan

Co-Founder/CEO, rpatools.io

Automatio is one of the most used for RPA Tools both internally and externally. It saves us countless hours of work and we realized this could do the same for other startups and so we choose Automatio for most of our automation needs.

Mohammed Ibrahim

CEO, qannas.pro

I have used many tools over the past 5 years, Automatio is the Jack of All trades.. !! it could be your scraping bot in the morning and then it becomes your VA by the noon and in the evening it does your automations.. its amazing!

Ben Bressington

CTO, AiChatSolutions

Automatio is fantastic and simple to use to extract data from any website. This allowed me to replace a developer and do tasks myself as they only take a few minutes to setup and forget about it. Automatio is a game changer!

Sarah Chen

Head of Growth, ScaleUp Labs

We've tried dozens of automation tools, but Automatio stands out for its flexibility and ease of use. Our team productivity increased by 40% within the first month of adoption.

David Park

Founder, DataDriven.io

The AI-powered features in Automatio are incredible. It understands context and adapts to changes in websites automatically. No more broken scrapers!

Emily Rodriguez

Marketing Director, GrowthMetrics

Automatio transformed our lead generation process. What used to take our team days now happens automatically in minutes. The ROI is incredible.


