CNN news dataset

edition.cnn.com · JSON

This dataset contains over 27,000 news articles sourced from CNN.com, including full content, metadata, and media fields. Each article includes its publish date, author information, a short description, and both raw and cleaned content, which makes the collection well suited to media research, sentiment analysis, topic modeling, and natural language processing (NLP) projects.

Last crawled in July 2021, this collection offers a historical snapshot of CNN’s reporting and editorial content.

Use Cases:
  • News content analysis
  • Fake news detection & bias tracking
  • Topic classification and clustering
  • Training AI/NLP models
  • Historical news trend research
  • Media monitoring tools

Update Frequency:

Archived: the dataset is no longer updated, so it is best suited to snapshot-based analysis.

Fields
title, url, published_at, last_modified_at, author, short_description, header_image, raw_content, content, crawled_at, _id, source
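
As a rough illustration of working with these fields, the sketch below loads records and fills any missing field with None. It rests on assumptions: the listing says only that the format is JSON, so the loader accepts either a single JSON array or one JSON object per line, and the names EXPECTED_FIELDS, normalize, and load_articles, as well as the sample record, are hypothetical and not part of the dataset.

```python
import json

# Field names taken from the listing above.
EXPECTED_FIELDS = [
    "title", "url", "published_at", "last_modified_at", "author",
    "short_description", "header_image", "raw_content", "content",
    "crawled_at", "_id", "source",
]


def normalize(record):
    """Return a dict with every listed field; absent fields become None."""
    return {name: record.get(name) for name in EXPECTED_FIELDS}


def load_articles(text):
    """Parse a JSON array or JSON Lines text into normalized records.

    Assumption: the actual file layout is not documented in the listing,
    so both common layouts are handled here.
    """
    text = text.strip()
    if text.startswith("["):
        records = json.loads(text)
    else:
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
    return [normalize(record) for record in records]


if __name__ == "__main__":
    # Made-up sample record, for illustration only.
    sample = json.dumps([
        {
            "title": "Example headline",
            "url": "https://edition.cnn.com/example",
            "content": "Example body text.",
        }
    ])
    for article in load_articles(sample):
        print(article["title"], article["url"])
```

Date fields such as published_at and crawled_at are left as raw values here, since the listing does not state their format.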
Pricing
$135.00

Availability: immediately

Records: 27,000