Ingredients Dataset: 18K+ Product Records with Ingredients Data from Beauty, Pets, Groceries & Health (CSV for AI & NLP)
target.com · CSV
The Ingredients Dataset (18K+ records) provides a high-quality, structured collection of product information with detailed ingredients data. Covering a wide variety of categories including beauty, pet care, groceries, and health products, this dataset is designed to power AI, NLP, and machine learning applications that require domain-specific knowledge of consumer products.
Why This Dataset Matters
In today's data-driven economy, access to structured and clean datasets is critical for building intelligent systems. For industries like healthcare, beauty, food-tech, and retail, the ability to analyze product ingredients enables deeper insights, including:
- Identifying allergens or harmful substances
- Comparing ingredient similarities across brands
- Training LLMs and NLP models for better understanding of consumer products
- Supporting regulatory compliance and labeling standards
- Enhancing recommendation engines for personalized shopping
This dataset bridges the gap between raw, unstructured product data and actionable information by providing well-organized CSV files with fields that are easy to integrate into your workflows.
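As a rough illustration of the first two uses listed above (allergen identification and ingredient comparison), here is a minimal Python sketch. It assumes the ingredients field holds free text separated by commas or semicolons, which this listing does not confirm. The keyword set and the function names are hypothetical and for illustration only, not a regulatory allergen reference.

```python
import re

# Illustrative keywords only (assumption); not a regulatory allergen list.
ALLERGEN_KEYWORDS = {"milk", "egg", "peanut", "soy", "wheat", "almond"}


def tokenize_ingredients(text):
    """Split a free-text ingredients string into normalized ingredient names."""
    parts = re.split(r"[,;]", text.lower())
    return {p.strip(" .") for p in parts if p.strip(" .")}


def flag_allergens(text, keywords=ALLERGEN_KEYWORDS):
    """Return the keywords that occur as whole words in any ingredient."""
    ingredients = tokenize_ingredients(text)
    found = set()
    for kw in keywords:
        # Allow a simple plural "s"; word boundaries avoid matching "eggplant".
        pattern = re.compile(r"\b" + re.escape(kw) + r"s?\b")
        if any(pattern.search(ing) for ing in ingredients):
            found.add(kw)
    return found


def jaccard_similarity(text_a, text_b):
    """Jaccard overlap (0.0 to 1.0) between two ingredient lists."""
    a, b = tokenize_ingredients(text_a), tokenize_ingredients(text_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

The split is naive: nested sub-ingredients in parentheses that contain commas are broken apart, and compound words such as "buttermilk" are not matched, so a real pipeline would need a more careful parser.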
Dataset Coverage
The 18,000+ product records span several consumer categories:
- Beauty & Personal Care: cosmetics, skincare, and haircare products with full ingredient transparency
- Pet Supplies: pet food and wellness products with detailed formulations
- Groceries & Packaged Foods: snacks, beverages, and pantry staples with structured ingredients lists
- Health & Wellness: supplements, vitamins, and healthcare products with nutritional components
By including multiple categories, this dataset allows cross-domain analysis and model training that reflects real-world product diversity.
Key Features
- 18,000+ records with structured ingredient fields
- Covers beauty, pet care, groceries, and health products
- Delivered in CSV format, ready to use for analytics or machine learning
- Includes categories and breadcrumbs for taxonomy and classification
- Useful for AI, NLP, LLM fine-tuning, allergen detection, and product recommendation systems
- AI & NLP Training: fine-tune LLMs on structured ingredients data for food, beauty, and healthcare applications.
- Retail Analytics: analyze consumer product composition across categories to inform pricing, positioning, and product launches.
- Food & Health Research: detect allergens, evaluate ingredient safety, and study nutritional compositions.
- Recommendation Engines: build smarter product recommendation systems for e-commerce platforms.
- Regulatory & Compliance Tools: ensure products meet industry and government standards through ingredient validation.
Unlike generic product feeds, this dataset emphasizes ingredient transparency across multiple categories. With 18K+ records, it strikes a balance between being comprehensive and affordable, making it suitable for startups, researchers, and enterprise teams looking to experiment with product intelligence.
Note: Each record includes a url (main page) and a buy_url (purchase page). Records are keyed on buy_url, so each one represents a unique product.
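If you want to confirm the uniqueness described in the note, or re-deduplicate after merging this data with another source, a sketch along these lines may help. It assumes records are already loaded as dictionaries keyed by column name. The function names are hypothetical, and keeping records with an empty buy_url is a design choice made here, not something the listing specifies.

```python
from collections import Counter


def duplicate_buy_urls(records):
    """Return buy_url values that appear on more than one record."""
    counts = Counter(
        (r.get("buy_url") or "").strip() for r in records
    )
    return {url for url, n in counts.items() if url and n > 1}


def dedupe_by_buy_url(records):
    """Keep the first record for each non-empty buy_url, in original order."""
    seen = set()
    unique = []
    for record in records:
        key = (record.get("buy_url") or "").strip()
        if key and key in seen:
            continue  # a record for this purchase page was already kept
        if key:
            seen.add(key)
        unique.append(record)  # records without a buy_url are kept as-is
    return unique
```

An empty result from duplicate_buy_urls is consistent with the note above; any URLs it returns point to rows worth inspecting before analysis.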
Fields
link, title, category, breadcrumbs, brand, department, avg_rating, reviews_count, reviews, tcin, buy_url, meta_data, current_price, regular_price, savings_in_percent, formatted_current_price_type, currency, upc, primary_image, alternate_images, description, specification, highlights, ingredients, nutrients, serving_size, serving_description, warning, at_a_glance, uniq_id, scraped_at
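A minimal loading sketch using only Python's standard library is shown below. It checks the CSV header against the field list above and then filters to rows that actually carry ingredients text. The file name in the usage comment is a placeholder, the function names are hypothetical, and treating a blank ingredients cell as "no ingredients" is an assumption.

```python
import csv
import io

# Column names copied from the field list above.
EXPECTED_FIELDS = [
    "link", "title", "category", "breadcrumbs", "brand", "department",
    "avg_rating", "reviews_count", "reviews", "tcin", "buy_url", "meta_data",
    "current_price", "regular_price", "savings_in_percent",
    "formatted_current_price_type", "currency", "upc", "primary_image",
    "alternate_images", "description", "specification", "highlights",
    "ingredients", "nutrients", "serving_size", "serving_description",
    "warning", "at_a_glance", "uniq_id", "scraped_at",
]


def read_records(handle):
    """Read all rows from an open CSV handle, validating the header first."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [f for f in EXPECTED_FIELDS if f not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def with_ingredients(records):
    """Keep only rows whose ingredients cell is non-blank (assumption)."""
    return [r for r in records if (r.get("ingredients") or "").strip()]


# Usage (placeholder file name):
# with open("ingredients.csv", newline="", encoding="utf-8") as f:
#     records = with_ingredients(read_records(f))
```

Validating the header up front makes a renamed or missing column fail loudly instead of silently producing empty values later.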