Extracting App Reviews

Mar 31, 2025

Updated: Nov 1, 2025

Building a web scraper to quickly extract App Store, Google Play Store, and Trustpilot reviews for analysis.

Context

Product managers need quick, actionable insights from customer reviews to understand pain points, identify feature requests, and track trends. However, accessing this data can be challenging due to platform limitations, API complexity, and time constraints. This project explores how we might solve this problem by building a simple web scraper, which would provide a quick, no-setup solution to extract and analyze reviews from multiple platforms in minutes. These reviews could also be uploaded directly to an AI model (e.g., Claude, ChatGPT, or Gemini) for analysis to identify issues and opportunities.

You can find the complete code and details on how to use it in this project’s GitHub repository.

Workflow

Tools: Cursor, Claude, ChatGPT

Building The Reviews Extractor

Step 1: Define Requirements

We outline the requirements for the review data we want to collect.

Basic Requirements

Configuration: Option to set review extraction settings and scraping criteria.
Platform Selection: Toggleable platform selection (App Store, Google Play, Trustpilot).
Data Scope: Reviews limited to the past 12 months.
Extracted Data Fields: Review date, rating, reviewer name (anonymized), review text, and source platform.
Output Options: Output CSV files with the option to add sentiment analysis (positive/negative scoring) and to either extract separate files (reviews by platform) or a single combined file (all reviews).
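
To make the data requirements concrete, here is a minimal sketch of how one review record, the 12-month filter, and the name anonymization might look. The `Review` class, the `anonymize` hashing scheme, and the 365-day approximation of "12 months" are assumptions for illustration, not the code in the repository.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Review:
    """One extracted review, mirroring the data fields listed above (hypothetical)."""
    review_date: date
    rating: int
    reviewer: str   # anonymized before it is written out
    text: str
    platform: str   # e.g. "App Store", "Google Play", "Trustpilot"


def anonymize(name: str) -> str:
    """Replace a reviewer name with a short, stable alias (assumed scheme)."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return f"user_{digest[:8]}"


def within_last_12_months(review: Review, today: date) -> bool:
    """Approximate 'past 12 months' as the last 365 days."""
    return today - timedelta(days=365) <= review.review_date <= today


def prepare(reviews, today):
    """Keep only recent reviews and anonymize the reviewer names."""
    return [
        Review(r.review_date, r.rating, anonymize(r.reviewer), r.text, r.platform)
        for r in reviews
        if within_last_12_months(r, today)
    ]
```

Passing `today` in explicitly, rather than calling `date.today()` inside the filter, keeps the cutoff easy to test and reproduce.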

Step 2: Define Initial Prompt

We construct a prompt based on the requirements identified.

Initial Prompt

Step 3: Optimize The Prompt

We use meta prompting to enhance the initial prompt and generate an optimized prompt using Claude. We then review and refine the generated prompt as needed. Refer to my meta prompting resource for more details on how to do this.

Optimized Prompt (using Claude Sonnet 4.5)

Step 4: Generate Code

We use the optimized prompt to generate the code using the Cursor Agent with the model set to Auto (which uses GPT-4.1).

Step 5: Test The Code

To verify that the code works, we just need to enter the App IDs along with the desired settings (which platforms to extract reviews from, output format, etc.). We test if the code performs as expected and make subsequent requests to refine its behavior.

Extracting Platform-Specific App IDs

We can identify the App IDs from the app URLs on each platform. I extracted reviews for the QuickBooks app. Here is what the App IDs look like for each platform.

App Store URL = https://apps.apple.com/us/app/quickbooks-business-accounting/id584606479 → App ID = 584606479
Google Play Store URL = https://play.google.com/store/apps/details?id=com.intuit.quickbooks&hl=en_US → App ID = com.intuit.quickbooks
Trustpilot URL = https://www.trustpilot.com/review/quickbooks.intuit.com → App ID = the full URL (Trustpilot has no separate ID)
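
If you would rather not pick the IDs out by hand, the same extraction can be sketched with the standard library. The helper names `app_store_id` and `google_play_id` are hypothetical and are not part of the generated scraper; they only assume the URL shapes shown above.

```python
import re
from urllib.parse import parse_qs, urlparse


def app_store_id(url: str) -> str:
    """Return the numeric ID that follows '/id' in an App Store URL."""
    match = re.search(r"/id(\d+)", urlparse(url).path)
    if match is None:
        raise ValueError(f"No App Store ID found in: {url}")
    return match.group(1)


def google_play_id(url: str) -> str:
    """Return the package name from the 'id' query parameter of a Google Play URL."""
    ids = parse_qs(urlparse(url).query).get("id")
    if not ids:
        raise ValueError(f"No Google Play ID found in: {url}")
    return ids[0]


print(app_store_id("https://apps.apple.com/us/app/quickbooks-business-accounting/id584606479"))
print(google_play_id("https://play.google.com/store/apps/details?id=com.intuit.quickbooks&hl=en_US"))
```

For the two QuickBooks URLs this prints 584606479 and com.intuit.quickbooks, matching the IDs listed above.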

Enter the App IDs and Settings in the code.

# =======================
# 🔧 CONFIGURATION
# =======================

# App IDs for each platform
APP_STORE_ID = "584606479"  # Insert your app's App Store ID
GOOGLE_PLAY_ID = "com.intuit.quickbooks"  # Insert your app's Google Play Store ID
TRUSTPILOT_URL = "https://www.trustpilot.com/review/quickbooks.intuit.com"  # Insert your app's Trustpilot URL

# Platform selection (set to True/False to enable/disable platforms)
SCRAPE_APP_STORE = True
SCRAPE_GOOGLE_PLAY = True
SCRAPE_TRUSTPILOT = True

# Output options
OUTPUT_REVIEWS_ONLY = False  # Set to True to only output raw reviews CSV
OUTPUT_ANALYSIS_ONLY = True  # Set to True to only output analysis CSV
OUTPUT_BOTH = False  # Set to True to output both raw reviews and analysis
SINGLE_FILE = True  # Set to True to combine all reviews into a single CSV file (yourapp_reviews.csv)
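
Because the three output flags are meant to be mutually exclusive, it can help to check them before scraping starts. The sketch below shows one way to do that and to work out which CSV files a run would produce. The function names and the per-platform file naming are assumptions; only the combined name pattern (yourapp_reviews.csv) comes from the configuration above.

```python
def resolve_output_mode(reviews_only: bool, analysis_only: bool, both: bool) -> str:
    """Return 'reviews', 'analysis', or 'both'; reject ambiguous flag combinations."""
    flags = {"reviews": reviews_only, "analysis": analysis_only, "both": both}
    enabled = [name for name, is_on in flags.items() if is_on]
    if len(enabled) != 1:
        raise ValueError(f"Set exactly one output flag to True, got: {enabled}")
    return enabled[0]


def plan_output_files(app_name: str, platforms: list, single_file: bool) -> list:
    """List the CSV files a run would write (per-platform names are hypothetical)."""
    if single_file:
        return [f"{app_name}_reviews.csv"]
    return [
        f"{app_name}_{platform.lower().replace(' ', '_')}_reviews.csv"
        for platform in platforms
    ]


# Mirrors the settings shown above: analysis only, one combined file.
mode = resolve_output_mode(reviews_only=False, analysis_only=True, both=False)
files = plan_output_files("yourapp", ["App Store", "Google Play", "Trustpilot"], single_file=True)
print(mode, files)
```

Failing early on a bad flag combination avoids spending several minutes scraping only to find that the wrong files were written.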

Step 6: Optimize The Code

Once we have working code, we can make subsequent requests to refine its behavior.

Note: The code generated by Cursor worked, but it ran very slowly. I tried to optimize it directly in Cursor but kept running into issues, so I uploaded the generated Python file (reviews_scraper.py) to Claude and optimized the code there.
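
This write-up does not record which changes made the code faster. As an illustration only, one common speed-up for I/O-bound scrapers is to run the per-platform scrapers concurrently instead of one after another. The sketch below assumes each scraper is a zero-argument function that returns a list of reviews; `run_scrapers` and the stub scrapers are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_scrapers(scrapers: dict) -> dict:
    """Run each platform's scraper in its own thread and return {platform: reviews}."""
    if not scrapers:
        return {}
    with ThreadPoolExecutor(max_workers=len(scrapers)) as pool:
        # Submit everything first so the scrapers overlap while waiting on the network.
        futures = {name: pool.submit(fn) for name, fn in scrapers.items()}
        # result() blocks until that scraper finishes and re-raises its exception, if any.
        return {name: future.result() for name, future in futures.items()}


# Stub scrapers stand in for the real network calls.
results = run_scrapers({
    "App Store": lambda: ["review a"],
    "Google Play": lambda: ["review b", "review c"],
})
print({name: len(reviews) for name, reviews in results.items()})
```

Threads suit this case because the work is mostly waiting on HTTP responses. The total runtime then approaches that of the slowest platform rather than the sum of all three.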

Code Refinement Prompt

# Context
I have a Python file (reviews_scraper.py) that extracts reviews from the App Store, Google Play Store, and Trustpilot. The code currently takes a long time to run. 

# Task
- Analyze the code, identify issues, fix the issues, optimize the code, and describe what has been changed.  
- Generate a README file for the updated Python code

Optimized Python Code (using Claude Sonnet 4.5)

Example Output - QuickBooks Reviews Data

Step 7: Run The Reviews Extractor

We have now successfully created a reviews extractor that can be reused across different apps. You can find the complete code and details on how to use it in this project’s GitHub repository.

Use Cases

When to use this scraper:

Discovery: Quickly understanding what users are saying about your app (e.g., when planning the next product iteration, reviewing product ideas, or assessing gaps).
Competitor Analysis: Quickly understanding what users are saying about competing apps.
Incident Response: Reviewing how users are reacting to a recent issue or bug.
Sentiment Analysis: Generating sentiment summaries for stakeholder meetings.

When to build a proper dashboard instead:

You need real-time monitoring with automated alerts.
Your team requires role-based access controls and audit trails.
You want to track trends over months/years with historical data warehousing.
You need to integrate review data with other product metrics (MAU, retention, etc.).

Benefits

Free: No API usage fees or recurring subscription costs.
Rapid Implementation: Full configuration and execution are achievable within five minutes.
Cross-Platform Reviews: Unified scraping functionality across App Store, Google Play, and Trustpilot platforms.
Automated Sentiment Analysis: Integrated analytical capabilities for immediate identification of sentiment trends and patterns.
Zero-Configuration Setup: Operates without API credentials or authentication protocols, enabling immediate deployment.
Configurable Data Output: Support for both platform-specific and consolidated reviews to accommodate diverse workflow requirements.

Limitations

No Real-Time Updates: The script has to be re-run to get fresh data.
Data Completeness: Web scraping may result in partial data retrieval.
Maintenance: Platform changes may break the scraper, requiring updates to the code.
Data Accuracy: Scraped content may occasionally have formatting issues or missing fields.
Rate Limits: Excessive requests may trigger rate limiting, which can temporarily block your IP address.
Temporal Data Access: Retrieval is limited to currently accessible reviews, with no guaranteed access to historical or archived content.
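
The rate-limit risk in particular can be softened by retrying with exponential backoff. This is a minimal sketch under assumptions: `RateLimitError` and `fetch_with_backoff` are hypothetical names, and the real scraper may handle blocked requests differently or not at all.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error a fetch function raises when the platform throttles us."""


def fetch_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on RateLimitError wait 1s, 2s, 4s, ... and try again."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists so that the waiting can be replaced in tests. In normal use, leave it at its default and pass in a function that performs one request.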

Bottom Line

This scraper can be helpful for quick research and ad-hoc analysis, but it shouldn't replace a production-grade reviews monitoring system.

Priank Ravichandar