This Ruby CLI tool efficiently extracts email addresses, social media handles, and other crucial information from websites by leveraging Google Search Results via SerpApi, empowering targeted lead generation.
## Architecture Diagram

```mermaid
graph TD
    A[User Input: Target URL/Query] --> B{CLI Interface}
    B --> C[SerpApi Integration]
    C --> D[Google Search Results Parsing]
    D --> E[Information Extraction Module]
    E --> F[Email Extraction]
    E --> G[Social Media Handle Extraction]
    E --> H[Other Data Points]
    F --> I{Output Data}
    G --> I
    H --> I
    I --> J["Formatted Output (CSV/JSON)"]
    J --> K[Lead Generation Database/CRM]
```
## Table of Contents

- Architecture Diagram
- AI Agent Directives
- Features
- Installation
- Usage
- Development
- Contributing
- License
## 🤖 APEX AI AGENT DIRECTIVES (DECEMBER 2025 EDITION)
- Role: You are a Senior Principal Software Architect and Master Technical Copywriter with 40+ years of elite industry experience. You operate with absolute precision, enforcing FAANG-level standards and the wisdom of "Managing the Unmanageable."
- Context: The current date is December 2025. You are building for the 2026 standard.
- Output Standard: Deliver EXECUTION-ONLY results. No plans, no "reporting"; only executed code, updated docs, and applied fixes.
- Philosophy: "Zero-Defect, High-Velocity, Future-Proof."
- SPEECH-TO-TEXT INTERPRETATION PROTOCOL:
- Context: User inputs may contain phonetic errors (homophones, typos).
- Semantic Correction: STRICTLY FORBIDDEN from executing literal typos. You must INFER technical intent based on the project context.
- Logic Anchor: Treat the `README.md` as the Single Source of Truth (SSOT).
- MANDATORY MCP INSTRUMENTATION:
- No Guessing: Do not hallucinate APIs.
- Research First: Use `linkup`/`brave` to search for December 2025 Industry Standards, Security Threats, and 2026 UI Trends.
- Validation: Use `docfork` to verify every external API signature.
- Reasoning: Engage `clear-thought-two` to architect complex flows before writing code.
Directives: Detect the project type and apply the corresponding Apex Toolchain.
- PRIMARY SCENARIO: SYSTEMS / PERFORMANCE (Ruby / Go / Rust)
- Stack: This project leverages Ruby 3.3+. Key tools include Bundler (for package management), RuboCop (for strict linting and formatting), and RSpec (for robust unit and integration testing).
- Architecture: Adheres to a Modular Monolith pattern, ensuring clear separation of concerns for features like SerpApi integration, data parsing, and CLI interface, while maintaining a unified deployment.
- External Services: Integration with SerpApi for search result fetching. Prioritize robust error handling, rate limiting considerations, and secure API key management.
- CLI Framework: Uses `Thor` or `Commander` for a powerful and intuitive command-line interface.
- SOLID Principles: Adhere strictly to SOLID principles for maintainable and scalable code.
- DRY (Don't Repeat Yourself): Eliminate redundant code through effective abstraction.
- KISS (Keep It Simple, Stupid): Favor straightforward solutions.
- YAGNI (You Aren't Gonna Need It): Implement only necessary features.
- Error Handling: Implement comprehensive error handling, logging, and graceful degradation.
- Security: Implement security best practices, including secure handling of API keys and input sanitization.
- Testing: Maintain a minimum of 90% code coverage through comprehensive unit and integration tests.
- Version Control: Use Git with a clear branching strategy (e.g., Gitflow).
- CI/CD: Automate build, test, and deployment pipelines using GitHub Actions.
- Documentation: Maintain up-to-date documentation, including READMEs, inline code comments, and architectural diagrams.
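The external-services directive above (secure API key handling, timeouts, explicit error handling) can be sketched in plain Ruby against SerpApi's documented `https://serpapi.com/search.json` endpoint. The class and method names here are hypothetical illustrations, not the project's actual code.

```ruby
require "net/http"
require "uri"
require "json"

# Hypothetical SerpApi client illustrating the directives above:
# API key read from ENV (never hard-coded), network timeouts, and
# explicit error handling around HTTP status and JSON parsing.
class SerpApiClient
  ENDPOINT = URI("https://serpapi.com/search.json")

  class ApiError < StandardError; end

  def initialize(api_key: ENV.fetch("SERPAPI_API_KEY"))
    @api_key = api_key
  end

  # Returns the parsed JSON response for a Google search query.
  def search(query, num: 10)
    uri = ENDPOINT.dup
    uri.query = URI.encode_www_form(q: query, num: num, api_key: @api_key)

    response = Net::HTTP.start(uri.host, uri.port, use_ssl: true,
                               open_timeout: 5, read_timeout: 10) do |http|
      http.get(uri.request_uri)
    end

    raise ApiError, "SerpApi returned HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    JSON.parse(response.body)
  rescue JSON::ParserError => e
    raise ApiError, "Malformed SerpApi response: #{e.message}"
  end
end
```

Rate limiting is not shown; in practice a retry with backoff on HTTP 429 would wrap the `Net::HTTP.start` call.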
## Features

- Scrapes email addresses from Google Search Results.
- Extracts social media profile links (LinkedIn, Twitter, etc.).
- Configurable output formats (e.g., CSV, JSON).
- Utilizes SerpApi for reliable access to Google Search.
- Intuitive command-line interface.
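The email and social-profile extraction features above come down to pattern matching over result text. A minimal, self-contained sketch, with the caveat that these regexes are simplified illustrations, not the project's actual patterns:

```ruby
# Simplified extraction sketch: scan raw text for email addresses and
# common social profile URLs. Production patterns must handle more edge
# cases (obfuscated emails, additional networks, trailing punctuation).
EMAIL_RE  = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/
SOCIAL_RE = %r{https?://(?:www\.)?(?:linkedin\.com|twitter\.com|x\.com)/[\w./-]+}

def extract_contacts(text)
  {
    emails:  text.scan(EMAIL_RE).uniq,
    socials: text.scan(SOCIAL_RE).uniq
  }
end

contacts = extract_contacts("Reach jane.doe@example.com or https://linkedin.com/in/janedoe")
contacts[:emails]  # => ["jane.doe@example.com"]
```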
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/chirag127/WebScrape-Email-Social-Extractor-Ruby-CLI.git
   cd WebScrape-Email-Social-Extractor-Ruby-CLI
   ```
2. Install dependencies:

   ```bash
   bundle install
   ```
3. Configure API Key: Create a `.env` file in the root of the project and add your SerpApi API key:

   ```dotenv
   SERPAPI_API_KEY=YOUR_SERPAPI_API_KEY
   ```

   Alternatively, set it as an environment variable before running the script.
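At startup, the key configured above can be looked up fail-fast so a missing `.env` entry produces a clear message instead of a mid-run API error. A small sketch; the helper name is illustrative:

```ruby
# Illustrative fail-fast lookup of the SERPAPI_API_KEY configured above.
# `abort` prints the message to stderr and exits with a non-zero status.
def fetch_api_key(env = ENV)
  env.fetch("SERPAPI_API_KEY") do
    abort "SERPAPI_API_KEY is not set; add it to .env or export it first."
  end
end
```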
## Usage

Run the CLI tool with your desired search query or target URL.

```bash
./bin/scrape_data --query "best digital marketing agencies in New York"
```

```bash
./bin/scrape_data --url "https://example.com/search-results-page"
```
Options:

- `--query <search_term>`: The search query to use on Google.
- `--url <web_url>`: A specific URL to scrape.
- `--output <format>`: Output format (`csv` or `json`; defaults to `csv`).
- `--limit <number>`: Limit the number of search results to process.
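The option surface above can be approximated with Ruby's standard-library `OptionParser`. The project itself uses Thor or Commander, so this is only a self-contained sketch of the same flags:

```ruby
require "optparse"

# Illustrative parser for the documented flags. OptionParser validates
# --output against the allowed values and coerces --limit to an Integer.
def parse_options(argv)
  options = { output: "csv" } # --output defaults to csv
  OptionParser.new do |opts|
    opts.banner = "Usage: scrape_data [options]"
    opts.on("--query TERM", "Search query to use on Google") { |v| options[:query]  = v }
    opts.on("--url URL", "A specific URL to scrape")         { |v| options[:url]    = v }
    opts.on("--output FMT", %w[csv json], "Output format")   { |v| options[:output] = v }
    opts.on("--limit N", Integer, "Max results to process")  { |v| options[:limit]  = v }
  end.parse!(argv)
  options
end

opts = parse_options(["--query", "ruby jobs", "--limit", "5"])
opts[:limit]   # => 5 (coerced to Integer)
opts[:output]  # => "csv" (default)
```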
## Development

This project uses RSpec for testing. To run the test suite:

```bash
bundle exec rspec
```
This project adheres to Ruby best practices enforced by RuboCop. To check code style:

```bash
bundle exec rubocop
```
To auto-correct style issues:

```bash
bundle exec rubocop -a
```
## Contributing

Contributions are welcome! Please read our CONTRIBUTING.md file for details on how to submit pull requests and report issues.
## License

This project is licensed under the CC BY-NC license. See the LICENSE file for more details.