Crawling through Umbraco with Robots
The robots.txt file’s main purpose is to tell robots (Google Bot, Bing Bot, etc.) what to index for their search engine, and also what not to. Usually you want most of your website crawled by Google, such as blog posts, product pages, etc., but most websites will have some pages/sections that shouldn’t be indexed or listed in search engines.
Ignoring Umbraco ping.aspx from Azure Application Insights
Application Performance Monitors provide you with a lot of data, but some of that data may not be relevant. Specifically, in Umbraco there is a page at \umbraco\ping.aspx that is being called frequently to keep the site alive. This is very useful to prevent the site from "dying" (?), but the data for this request isn't that relevant and could skew your statistics. Using Azure Application Insights ITelemetryProcessor, we can prevent ping request from being sent to Azure Application Insights.
Crawling through Umbraco with Sitemaps
Websites come in all shapes and sizes. Some are fast, some are beautiful, and some are a complete mess. Whether it's a high-quality site is irrelevant if people can’t find it, but search engines are here to help. Though the competition to get on first page is tough, this series will dive into some common practices to make your website crawlable.