Hacker News

Tell HN: YC companies scrape GitHub activity, send spam emails to users

12 min read Via news.ycombinator.com

Mewayz Team

Editorial Team

When Your GitHub Activity Becomes Someone Else's Sales Funnel

Imagine pushing a commit at 11 PM, fixing a gnarly authentication bug in your side project. Two days later, an email lands in your inbox: "Hey, I noticed you've been working on user auth for your SaaS — our tool can help." You never signed up for their mailing list. You never visited their website. You never gave them your email address. Yet somehow, they know exactly what you've been building. That unsettling feeling? It's not paranoia. It's a systematic, industrialized scraping operation that turns your open-source contributions into raw material for someone else's growth metrics.

A recent thread on Hacker News surfaced what many developers had long suspected: a subset of Y Combinator-backed companies — and plenty of non-YC startups following the same playbook — have been programmatically harvesting GitHub activity data to identify and cold-email developers. The backlash was swift and fierce. For the developer community, this crosses a line that no clever growth hack can uncross.

How the Scraping Machine Actually Works

GitHub's public API is, by design, open. It powers legitimate integrations, developer tools, and ecosystem analytics. But the same infrastructure that lets you build a CI/CD dashboard can be repurposed to build a lead generation pipeline. Scrapers ingest commit histories, repository topics, star counts, contributor lists, and, critically, the email addresses that developers expose on their profiles or that Git records as the author address on every commit.

From there, enrichment tools cross-reference GitHub handles against LinkedIn profiles, company domains, and data broker databases. Within minutes, a raw GitHub username transforms into a full contact record: company, title, inferred tech stack, approximate team size. Some operations reportedly process tens of thousands of profiles per day, feeding the results directly into automated email sequences disguised as personalized outreach.

The sophistication of the operation is what makes it particularly invasive. These aren't mass blasts to purchased lists. They're highly targeted, contextually aware emails crafted to feel like the sender actually knows you — because algorithmically, in a hollow data-driven sense, they do. The technical familiarity creates a false sense of legitimate relationship where none exists.

Why Developers Are Uniquely Vulnerable to This Tactic

Most professionals can spot a cold email for what it is. But developers face a specific psychological trap: the email references real, current work. When someone mentions the exact repository you've been contributing to, the specific framework you adopted last month, or the error pattern showing up in your recent commits, it triggers a "how do they know this?" response that can momentarily bypass the spam filter in your brain.

This is compounded by the culture of open-source development. Contributing publicly to GitHub is both a professional practice and a community value. Developers share code openly because transparency and collaboration are foundational to the ecosystem — not as an invitation to be prospected. Exploiting that openness for commercial gain without consent is a fundamental betrayal of the culture that makes the platform valuable in the first place.

"The problem isn't that startups want to find their customers. The problem is that they've confused 'publicly visible' with 'freely available for any commercial purpose.' Public data and consensual data are not the same thing."

There's also a power asymmetry at play. Individual developers have no visibility into who's scraping their activity or how their data is being processed. A startup can build a 50,000-person developer list in a weekend; the developers on that list have no idea it exists until the emails start arriving.

The Real Cost to Startups That Play This Game

From a purely mercenary perspective, the strategy is self-defeating. Developer communities talk. Hacker News threads go viral. Twitter callouts get reshared. When your growth tactic becomes a cautionary tale on the front page of the most influential developer forum on the internet, the reputational damage doesn't just affect one campaign — it taints your brand for years in exactly the audience you were trying to reach.

The numbers tell a damning story. Industry research consistently shows cold email response rates hovering between 1% and 5% for legitimate outreach. Unsolicited emails built on scraped data perform even worse, often triggering spam complaints that damage sender domain reputation and reduce deliverability for all subsequent campaigns. You're not just burning bridges with the people you emailed — you're making it harder to reach anyone via email at all.

Consider the contrast: companies that invest in genuine content marketing, developer relations, and community engagement regularly report conversion rates 3–5x higher than they see from an equivalent spend on cold outreach. The developer community, in particular, responds powerfully to authenticity. Sponsoring an open-source project, writing genuinely useful technical content, or participating honestly in communities like Hacker News and Discord servers builds the kind of trust that no scraped email list can manufacture.

What Ethical Outreach Actually Looks Like

The distinction between invasive prospecting and legitimate outreach isn't always a bright line, but there are clear principles that separate the two. Ethical customer acquisition respects the following boundaries:

  • Consent-based contact: The prospect has given you a way to reach them — through a newsletter signup, a product trial, an event registration, or a direct inquiry.
  • Contextual relevance: Your outreach addresses a problem the prospect has explicitly expressed, not one you've inferred by surveilling their activity.
  • Transparent identity: You're clear about who you are and how you found them. "I found your email by scraping your GitHub commits" is not a foundation for a relationship.
  • Easy opt-out: Every communication includes a genuine, functional way to stop receiving messages — not buried in 4-point font, not disguised as a link to a different page.
  • Data minimization: You collect only what you need for the legitimate purpose at hand, not everything you can technically access.

These aren't just ethical guidelines — they increasingly reflect legal requirements. GDPR in Europe, CASL in Canada, and various US state privacy laws impose real obligations around consent and legitimate interest that scraped-data email campaigns routinely violate. The legal exposure alone should give growth hackers pause, but the reputational risk is arguably more immediate and severe.
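
One way to make the consent and opt-out principles above concrete is to encode them as a check that runs before any message is sent. The Python sketch below is illustrative only: the `Contact` record, the `ContactSource` values, and the `may_email` helper are hypothetical names chosen for this example, not part of any real CRM's API.

```python
from dataclasses import dataclass
from enum import Enum


class ContactSource(Enum):
    """How an address entered the system (hypothetical categories)."""
    NEWSLETTER_SIGNUP = "newsletter_signup"
    PRODUCT_TRIAL = "product_trial"
    EVENT_REGISTRATION = "event_registration"
    DIRECT_INQUIRY = "direct_inquiry"
    SCRAPED = "scraped"  # e.g. harvested from public commit metadata


# Sources where the person handed over the address themselves.
CONSENTED_SOURCES = frozenset({
    ContactSource.NEWSLETTER_SIGNUP,
    ContactSource.PRODUCT_TRIAL,
    ContactSource.EVENT_REGISTRATION,
    ContactSource.DIRECT_INQUIRY,
})


@dataclass(frozen=True)
class Contact:
    email: str
    source: ContactSource
    unsubscribed: bool = False  # set when the person opts out


def may_email(contact: Contact) -> bool:
    """Allow outreach only to consented contacts who have not opted out."""
    return contact.source in CONSENTED_SOURCES and not contact.unsubscribed
```

Under these assumptions a scraped address is rejected no matter how relevant the pitch looks, and an opt-out overrides earlier consent. A real system would also need to record when and how consent was given.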

How Modern Business Platforms Are Rethinking Customer Relationships

The underlying problem that drives scrape-and-spam behavior is a broken mental model of what a customer relationship is. When acquisition is treated as a numbers game — more contacts, more emails, more "touches" — the individual human on the other end of the email disappears. They become a row in a spreadsheet, a conversion probability, an experiment variable.

Platforms built on a different philosophy start from the opposite premise: that the quality of a customer relationship is the asset, not the size of a contact list. This means investing in tools that help businesses understand the customers they already have, engage them meaningfully, and build the kind of product and community that generates genuine inbound interest.

Mewayz, for instance, approaches CRM not as a prospecting machine but as an integrated system for managing real relationships across every stage of the customer journey. With modules spanning CRM, invoicing, HR, analytics, and beyond, all serving over 138,000 users globally, the platform is designed around the reality that businesses succeed by deepening engagement with their existing customer base, not by blasting cold emails to scraped lists. When your CRM, communication tools, and analytics live in the same modular ecosystem, you're working with signal-rich data from people who chose to engage with you, which is far more valuable than any scraped dataset.

Protecting Yourself as a Developer

While the responsibility for ethical behavior lies with the companies doing the scraping, developers can take practical steps to reduce their exposure:

  1. Audit your GitHub profile: Remove your personal email address from your public profile and use a role address (such as a dedicated contact alias on a domain you control) if you want to be reachable.
  2. Configure your Git client carefully: Make sure your global user.email isn't your primary personal address if you're committing to public repositories.
  3. Use GitHub's email privacy settings: GitHub offers a "Keep my email addresses private" option that substitutes a noreply address in web-based operations.
  4. Report and block aggressively: When you receive emails that are clearly built on scraped activity data, mark them as spam and report them. Enough reports affect sender reputation at the infrastructure level.
  5. Name and shame thoughtfully: The Hacker News thread that sparked this conversation is a perfect example of community accountability in action. Public documentation of abusive practices creates real consequences.

None of these steps are perfect. A determined scraper with access to commit metadata and cross-referencing tools can often find contact information even when it's not directly exposed. But friction matters — making it harder to harvest your data reduces the ROI of the scraping operation and pushes operators toward less invasive approaches.
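
As a rough illustration of steps 2 and 3, the following Python sketch counts the author emails recorded in a local clone and flags any that are not GitHub noreply addresses. The function names (`is_noreply`, `exposed_emails`, `audit_repo`) are made up for this example, and it assumes `git` is available on your PATH; treat it as a starting point rather than a complete audit.

```python
import subprocess
from collections import Counter

# GitHub-provided private addresses end with this suffix.
NOREPLY_SUFFIX = "@users.noreply.github.com"


def is_noreply(email: str) -> bool:
    """Return True if the address is a GitHub noreply address."""
    return email.strip().lower().endswith(NOREPLY_SUFFIX)


def exposed_emails(log_output: str) -> Counter:
    """Count non-noreply addresses in newline-separated `git log` output."""
    emails = (line.strip() for line in log_output.splitlines())
    return Counter(e for e in emails if e and not is_noreply(e))


def audit_repo(path: str = ".") -> Counter:
    """List author emails from every commit in the repository at `path`."""
    result = subprocess.run(
        ["git", "-C", path, "log", "--all", "--format=%ae"],
        capture_output=True,
        text=True,
        check=True,  # raise if `path` is not a Git repository
    )
    return exposed_emails(result.stdout)
```

Calling `audit_repo(".")` inside a clone returns a `Counter` of exposed addresses and how many commits carry each one. If your own address shows up, pointing `git config --global user.email` at your GitHub noreply address covers future commits, though commits already pushed keep the old address.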

The Long Game: Trust as Competitive Advantage

There's a broader business lesson embedded in this controversy that transcends developer-targeted spam. We're living through a period of profound recalibration in how companies build customer relationships. The decade-long playbook of growth-at-all-costs, fueled by cheap data and cheap attention, is running into hard limits: regulatory pressure, platform restrictions, rising customer sophistication, and — perhaps most importantly — community-level resistance from exactly the audiences startups most want to reach.

The companies that will win the next decade aren't the ones with the most aggressive prospecting operations. They're the ones that understand that trust compounds. A developer who discovers your product organically, finds it genuinely useful, and recommends it to their team is worth a hundred scraped email addresses. A reputation for respecting developer privacy is a durable competitive asset in a market where that respect is increasingly rare.

The Hacker News thread about GitHub scraping will fade. The emails will keep coming for a while — habits die hard and the tooling is too accessible for the practice to disappear overnight. But the underlying dynamic is shifting. Communities are paying attention. Regulators are catching up. And the developers being spammed are building the next generation of tools, platforms, and products. Alienating them for a few percentage points of open rate is not a trade worth making.

The future belongs to businesses that earn attention rather than harvest it — that build products so genuinely useful, so deeply integrated into how people work, that customers come looking for them. That's not a naive aspiration. It's the only sustainable strategy left.

Frequently Asked Questions

How do these companies get my email address from GitHub activity?

Many GitHub profiles include a public email address, and even when they don't, scrapers cross-reference your username against other public data sources: npm packages, commit metadata, forum posts, and leaked breach data. Automated pipelines then enrich these records with professional emails sourced from services like Hunter.io or Apollo, all without any interaction on your part.

Is it legal to scrape GitHub and cold-email developers?

The practice sits in a legal grey area. While scraping publicly available data is generally not prohibited outright, sending unsolicited commercial email without consent may violate CAN-SPAM, GDPR, or CASL depending on jurisdiction. GitHub's Terms of Service explicitly prohibit scraping for spamming purposes, but enforcement against offending companies remains inconsistent and largely complaint-driven.

How can I reduce my exposure to developer-targeted sales spam?

Hide your email on GitHub by enabling the "Keep my email addresses private" option in your email settings and using a masked noreply address for commits via Git config. Consider using a dedicated developer alias for open-source work. If you're building tools for a team, platforms like Mewayz, a 207-module business OS at $19/mo (app.mewayz.com), let you centralize operations without scattering personal contact details across public repositories.

Why do YC-backed companies rely on GitHub scraping instead of legitimate marketing?

Investor pressure to show rapid user growth creates incentives to prioritize volume over consent. GitHub scraping delivers highly targeted leads — developers actively solving specific problems — at near-zero marginal cost. It's a shortcut that trades long-term brand trust for short-term pipeline metrics. Companies serious about sustainable growth build products worth discovering organically, rather than hijacking developers' workflows as a prospecting database.
