
Common pitfalls when optimising for LLMs
If you’re creating Markdown versions of your content for large language models (LLMs), make sure those files don’t show up in traditional search results. You don’t want someone searching for your company name on Google to land on a raw .md version of your homepage.
There are a few ways to prevent this:
Block access to .md files in robots.txt
You can tell traditional search engines not to access Markdown files using your robots.txt file:
User-agent: *
Disallow: /*.md$
or, more specifically:
User-agent: Googlebot
Disallow: /*.md$
User-agent: Bingbot
Disallow: /*.md$
Disallowing .md files in your robots.txt will stop traditional search engines from crawling that content. LLM crawlers do not follow or respect robots.txt, so .md files will still be accessible to them (which is our actual goal). Keep in mind that this method is not fully reliable for preventing indexing; it only restricts crawling of those pages.
Use noindex headers or block crawlers entirely
You can add HTTP response headers to tell traditional search engines not to index specific pages:
X-Robots-Tag: noindex
This can be applied selectively based on the user-agent, so only traditional search engines receive the noindex directive while LLM crawlers remain unaffected. To do this, you’ll need access to your web server or a CDN that supports conditional headers.
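As a rough illustration only, here is a minimal sketch of that conditional logic, assuming an edge function such as a Cloudflare Worker in front of the site; the bot list and the .md check are simplified placeholders rather than a complete implementation:

// Hypothetical edge worker (TypeScript): add a noindex header to .md responses,
// but only when the request comes from a traditional search engine crawler.
const SEARCH_BOTS = ["googlebot", "bingbot"]; // simplified example list

export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    const userAgent = (request.headers.get("user-agent") ?? "").toLowerCase();
    const isSearchBot = SEARCH_BOTS.some((bot) => userAgent.includes(bot));
    const originResponse = await fetch(request); // fetch the page from the origin as usual
    if (url.pathname.endsWith(".md") && isSearchBot) {
      // Clone the response so its headers become mutable, then add the directive.
      const response = new Response(originResponse.body, originResponse);
      response.headers.set("X-Robots-Tag", "noindex");
      return response;
    }
    // LLM crawlers and regular visitors receive the response unchanged.
    return originResponse;
  },
};

Doing the user-agent check at the CDN or edge keeps your origin configuration untouched, but the same check could equally live in your web server configuration.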
Alternatively, you can block traditional search engines entirely from accessing Markdown files. This goes a step further than noindex by preventing them from even loading the page. Again, this can be done at the server or CDN level by checking the user-agent and denying access.
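Sticking with the same assumptions (a hypothetical edge worker and a simplified bot list), the blocking variant returns an error response instead of adding a header:

// Hypothetical edge worker (TypeScript): deny traditional search engines access
// to .md files entirely instead of serving them with a noindex header.
const SEARCH_BOTS = ["googlebot", "bingbot"]; // simplified example list

export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    const userAgent = (request.headers.get("user-agent") ?? "").toLowerCase();
    const isSearchBot = SEARCH_BOTS.some((bot) => userAgent.includes(bot));
    if (url.pathname.endsWith(".md") && isSearchBot) {
      // Traditional search engine crawlers never see the Markdown version at all.
      return new Response("Forbidden", { status: 403 });
    }
    // Everyone else, including LLM crawlers, is passed through to the origin.
    return fetch(request);
  },
};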