The Smart TV in Your Living Room Is a Node in the AI Scraping Economy

You may not be aware that the devices inside your home could be part of a distributed effort to collect web data for training AI. Bright Data is a data-collection company that sells access to what it markets as the world’s largest residential proxy network, a pool of home IP addresses through which its customers route web-scraping traffic; its marketing cites both 400M+ and 150M+ IPs. The supply behind that network comes from an SDK: a piece of software embedded in partner consumer apps that, with the user’s consent, turns their phone or smart TV into an exit node, a device whose home connection carries other people’s traffic.

AI companies depend on web-scraped content for pre-training, retrieval, agent grounding, and search. But much of the modern web can no longer be scraped reliably from a datacenter: Cloudflare, DataDome, and HUMAN, among others, throttle or block requests from known cloud IP ranges. The workaround is residential proxies. A scraping job routed through a Comcast or T-Mobile subscriber’s connection arrives at the target site from an IP address that belongs to a paying residential customer. Academic measurement studies going back to 2019 show that these networks are overwhelmingly misused, and the FBI issued a formal advisory earlier this year.
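Why residential exit nodes defeat this kind of blocking can be seen in a deliberately simplified IP-reputation check. The sketch below is an illustration only, not how Cloudflare, DataDome, or HUMAN actually work: the "cloud" ranges are hypothetical and use RFC 5737 documentation prefixes, and real bot-management systems combine many more signals than a single range lookup.

```python
import ipaddress

# Hypothetical "known cloud" ranges, using RFC 5737 documentation prefixes
# purely for illustration. Real vendor blocklists are far larger.
CLOUD_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def classify_client(ip: str) -> str:
    """Return 'block' if the client IP falls in a listed cloud range, else 'allow'."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in CLOUD_RANGES):
        return "block"
    return "allow"


# A scraper running directly in a (hypothetical) datacenter range is caught...
print(classify_client("203.0.113.7"))
# ...but the same request relayed through a home connection is not.
print(classify_client("192.0.2.44"))
```

The first call prints `block` and the second prints `allow`. The request contents are identical in both cases; only the source address changes, which is exactly what a residential exit node supplies.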

Connected TV (CTV), a.k.a. smart TV, is a near-perfect residential proxy. Compared with a mobile phone, a smart TV offers several advantages as an exit node: it is always plugged in, stays connected over high-speed WiFi, remains online 24/7 in standby, has effectively unlimited bandwidth, is often unattended, presents its consent UI on a screen navigated with a remote’s arrow keys, and is subject to virtually no corporate or family oversight. Privacy-policy disclosure is the wrong control surface for a TV. A legal document is hard to scroll through with a remote’s arrow keys, and the in-app consent dialog doesn’t convey that a paying Bright Data customer is about to route scraping traffic through the user’s home internet connection.

Petflix, a Roku app documented by The Verge, is a representative case. Its opt-in screen reads: “To enjoy Petflix for free with fewer ads, you are allowing Bright Data to occasionally use your device’s free resources and IP address to download public web data from the internet. Bright Data will only use your IP address for approved business-related use cases. None of your personal information is accessed or collected except your IP address. Period.”

The Petflix dialog says “occasionally.” Yet the SDK’s publicly queryable config sets max_bw_monthly_wifi to 200,000,000,000 bytes, a default monthly WiFi budget of 200 GB.

Bright Data also exposes an unauthenticated partner manifest endpoint that anyone can fetch. Names in the manifest that the author was able to identify with high confidence from public sources include: PlayWorks Digital Ltd, which has 400+ CTV game titles and reaches ~250M TV homes via Comcast, Sky, Cox, LG, Samsung, Vizio, and Roku; CloudTV, integrated across 125+ TV brands and 15+ OEMs; Longvision Media HK (LongTV), with 5M OTT users across Hong Kong and Malaysia; and Viber Media S.à r.l. (Rakuten), whose Viber messenger has 250M to 820M monthly users. At least three CTV-focused entities (PlayWorks, CloudTV, and Longvision) monetized their users’ devices as residential proxy exit nodes.
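To put the 200 GB max_bw_monthly_wifi default in perspective against the word “occasionally,” the arithmetic below converts it to binary units and to the average rate it would allow if spread evenly over a month. The 30-day month is an assumption, and the result is a ceiling implied by the config value, not a measurement of how much traffic any device actually relays.

```python
# Value reported for max_bw_monthly_wifi in the SDK config, in bytes.
MAX_BW_MONTHLY_WIFI = 200_000_000_000

# Assumption: a 30-day month (2,592,000 seconds).
SECONDS_PER_30_DAY_MONTH = 30 * 24 * 60 * 60


def monthly_budget_summary(budget_bytes: int,
                           seconds: int = SECONDS_PER_30_DAY_MONTH) -> dict:
    """Express a monthly byte budget in GB, GiB, and average Mbit/s."""
    return {
        "gb": budget_bytes / 1e9,                         # decimal gigabytes
        "gib": budget_bytes / 2**30,                      # binary gibibytes
        "avg_mbit_s": budget_bytes * 8 / seconds / 1e6,   # if spread evenly
    }


summary = monthly_budget_summary(MAX_BW_MONTHLY_WIFI)
print(f"{summary['gb']:.0f} GB = {summary['gib']:.1f} GiB")
print(f"~{summary['avg_mbit_s']:.2f} Mbit/s sustained, around the clock")
```

This prints `200 GB = 186.3 GiB` and `~0.62 Mbit/s sustained, around the clock`: the default budget is large enough to support continuous relaying for the whole month, which is hard to square with an occasional, idle-time use.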

To read the complete article, see: Read full article

This post is licensed under CC BY 4.0 by the author.