June 11, 2024
No Robots(.txt): How to Ask ChatGPT and Google Bard to Not Use Your Website for Training.
You can prevent ChatGPT and Bard from using the content on your website to train their models. Using your website's "robots.txt" file, you can instruct bots and web crawlers NOT to scrape your content.
While AI may appear to dominate every industry and sector, public sentiment around AI grows increasingly skeptical. We've witnessed its lies (hallucinations), its generic knowledge (regurgitated output) and its biased responses (whitewashed content).
Every new AI release unveils yet another countermeasure, and the arsenal keeps expanding with approaches that block AI's access to data. Today, an oldie-but-goodie approach is highlighted, along with a recap of recent and human-based interventions. Stacking these old-school, new-school and forever-school approaches helps to curb runaway AI.
The robots.txt file has been a staple since the beginning of the internet, a constant line of defense for limiting crawler access to a website's pages and directories. This simple way of providing instructions to web crawlers, and now generative AI algorithms, should be a universal stopgap. It's not. First, adding disallow instructions to this file contradicts a website's purpose, which is to be discovered, so the incentive to update the robots.txt file remains low. Second, following the file's instructions is neither required nor enforceable.
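As a sketch, a robots.txt file that opts out of the AI training crawlers covered here might look like the following. GPTBot is OpenAI's published crawler token, and Google-Extended is the token Google publishes for opting out of AI training; the final catch-all rule keeping ordinary crawlers allowed is one illustrative choice, not a requirement.

```
# Block OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Block Google's AI training use (Bard/Gemini)
User-agent: Google-Extended
Disallow: /

# Keep the site discoverable by everything else
User-agent: *
Allow: /
```

The file must live at the root of the site (for example, https://example.com/robots.txt, where example.com is a placeholder) for crawlers to find it.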
The resilience of the robots.txt file comes from web crawling politeness policies, the most relevant being that web scraping tools respect a site's allow and disallow instructions. So what does that mean for you? If you host a website, revise your robots.txt file to disallow bots from continuing to access, read and ingest your webpage content. Companies adhere to politeness policies because being labeled a bad actor damages their reputation and profits. Follow the step-by-step guide in the article below.
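Before publishing a revised file, you can sanity-check your rules with Python's standard-library robots.txt parser. A minimal sketch, assuming illustrative rules and an example.com placeholder URL:

```python
from urllib.robotparser import RobotFileParser

# Parse the candidate rules directly, without fetching them from a server.
rules = [
    "User-agent: GPTBot",
    "Disallow: /",
    "",
    "User-agent: *",
    "Allow: /",
]
parser = RobotFileParser()
parser.parse(rules)

# A polite crawler identifying as GPTBot should be refused...
print(parser.can_fetch("GPTBot", "https://example.com/post"))  # False

# ...while an ordinary browser-style user agent is still allowed.
print(parser.can_fetch("Mozilla/5.0", "https://example.com/post"))  # True
```

Remember the caveat above: this only tells you what a *compliant* crawler will do; nothing in robots.txt technically prevents access.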
There’s a swell of checks-and-balances methods performed as part of AI systems. Recently, watermarking in AI has gained renewed attention. It amounts to adding a hard-to-remove digital tracker to an algorithm: a small piece of code that keeps a running log of how the algorithm was manipulated. Digital watermarking can help distinguish between what’s AI generated, AI assisted or AI enabled, and it can help indicate what’s perceived as digitally true. Automated content moderation algorithms attempt to identify blatantly inappropriate content. When executed responsibly, they can also help combat AI-enabled fraud, misinformation and disinformation. Otherwise, content moderation dissolves into algorithmic misogynoir.
The human eye, paired with our critical thinking skills, remains one of the best stopgap measures for checking AI’s output. Here’s a suggested shortcut to help you vet responses more quickly.
Read the Entire Article Here!
"People believe that they won’t be able to learn to code, that it’ll take a long time to learn the skill well or they have to be a math prodigy to understand and apply coding concepts. In reality, you don’t have to be “super smart,” but you must be persistent." pg 87
Get Your Copy of Data Conscience Here!
Stay Rebel Techie,
Dr. Brandeis
Thanks for subscribing! If you like what you read or use it as a resource, please share the newsletter signup with three friends!
Learn how to make more responsible data connections. I help educators, researchers and practitioners align data policies, practices and products for equity. Sign up for my Rebel Tech Newsletter!