How to find broken links using Selenium C#?

  • Script for testing broken links using Selenium C#.

  • I personally wouldn't use Selenium to do this unless I had no other choice. It would be much more efficient to use a dedicated link crawler.

    That said, if I had to use Selenium, I'd do something along these lines (no guarantees on code correctness - I'm doing this from memory early in the morning):

    // Requires: using OpenQA.Selenium; and an initialized IWebDriver named driver
    // First get all anchor elements with an href attribute
    // (//*[@href] would also match <link> tags in <head>, which can't be clicked)
    var links = driver.FindElements(By.XPath("//a[@href]"));
    // Loop through them by index and click each one
    for (int i = 0; i < links.Count; i++)
    {
        IWebElement link = links[i];
        link.Click();
        // Check that the result of clicking the link is not a 404 or other error page.
        // The exact check will depend on the site you're testing.
        // Go back to the previous page
        driver.Navigate().Back();
        // All the elements are now stale, so refresh the list
        links = driver.FindElements(By.XPath("//a[@href]"));
    }

    The reasoning behind this logic is as follows:

    Navigating to each link and then going back detaches the elements in the list from the current DOM (using them afterwards throws a StaleElementReferenceException), so the simpler foreach loop won't work: it would keep walking the original, stale collection even after you refresh the variable. An indexed for loop still works, because the index is just a number that survives the refresh.

    Because the loop counter i isn't reset when the list is refreshed, the next iteration moves on to the next link in the list, assuming the page renders its links in the same order each time.

    I'd recommend a separate method to check whether a page is valid, because some sites use custom error pages, whereas others leave it to the browser's default error page. Which check you use for your site is up to you.
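    A minimal sketch of such a separate check, assuming your error pages can be recognized by text in the page title. The class `PageChecks`, the method `IsPageValid`, and the marker strings are hypothetical placeholders; swap in whatever your site's error pages actually display.

    ```csharp
    using System;

    public static class PageChecks
    {
        // Hypothetical title fragments that mark an error page on the site under test.
        // Replace these with whatever your site's custom error pages actually use.
        private static readonly string[] ErrorTitleMarkers =
        {
            "404",
            "Not Found",
            "Server Error"
        };

        // Returns false if the page title contains any known error marker (case-insensitive).
        public static bool IsPageValid(string pageTitle)
        {
            if (string.IsNullOrEmpty(pageTitle))
            {
                // An empty title is not proof of an error, so treat it as valid here;
                // tighten this if every real page on your site has a title.
                return true;
            }

            foreach (var marker in ErrorTitleMarkers)
            {
                if (pageTitle.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    return false;
                }
            }
            return true;
        }
    }
    ```

    Inside the loop you would call it right after `link.Click()`, e.g. `PageChecks.IsPageValid(driver.Title)`. A title check is only one option; if your error pages share a distinctive element, looking for that element is usually more robust than matching text.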

    It's important to note that this method will not work in all circumstances. If the page you start from was reached by submitting a form, going back will make the browser prompt you to resend the form data. This may or may not cause other failures, depending on your site. If the site has a strict no-caching policy, it may not be possible to go back at all, because the page will have expired; this is particularly common on financial sites. Finally, if the site relies heavily on scripting, the page itself may be a scripted artifact that can only be generated by going through another page. In that case, driver.Navigate().Back() will not return you to the original page.
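    When those limitations get in the way, one workaround in the spirit of the dedicated-crawler suggestion is to skip clicking altogether: collect each link's href with Selenium (for example `link.GetAttribute("href")`) and request the URLs directly over HTTP, so there is no Back() navigation and no stale elements. This is a sketch, assuming the hrefs are already collected as absolute URLs; `LinkStatusChecker` and `FindBrokenLinksAsync` are hypothetical names, and treating any status of 400 or above (or a failed request) as broken is a judgment call.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Net;
    using System.Net.Http;
    using System.Threading.Tasks;

    public static class LinkStatusChecker
    {
        // Requests each URL and returns the ones that look broken: an HTTP status
        // of 400 or above, a network failure, or a timeout.
        public static async Task<List<string>> FindBrokenLinksAsync(HttpClient client, IEnumerable<string> urls)
        {
            var broken = new List<string>();
            foreach (var url in urls)
            {
                // Skip mailto:, tel:, javascript: and anything else that isn't an absolute HTTP(S) URL.
                if (!Uri.TryCreate(url, UriKind.Absolute, out Uri uri) ||
                    (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
                {
                    continue;
                }

                try
                {
                    HttpStatusCode status = await GetStatusAsync(client, HttpMethod.Head, uri);
                    // Some servers reject HEAD; retry with GET before calling the link broken.
                    if (status == HttpStatusCode.MethodNotAllowed)
                    {
                        status = await GetStatusAsync(client, HttpMethod.Get, uri);
                    }
                    if ((int)status >= 400)
                    {
                        broken.Add(url);
                    }
                }
                catch (HttpRequestException)
                {
                    broken.Add(url); // DNS failure, refused connection, etc.
                }
                catch (TaskCanceledException)
                {
                    broken.Add(url); // HttpClient.Timeout elapsed
                }
            }
            return broken;
        }

        // Sends one request and returns only the status code; headers are enough, so the body isn't buffered.
        private static async Task<HttpStatusCode> GetStatusAsync(HttpClient client, HttpMethod method, Uri uri)
        {
            using (var request = new HttpRequestMessage(method, uri))
            using (var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead))
            {
                return response.StatusCode;
            }
        }
    }
    ```

    Because these requests come from HttpClient rather than the browser, they don't carry the browser's session cookies: links behind a login may come back as 401/403 or be silently redirected to the login page (HttpClient follows redirects by default). Reusing one HttpClient instance for all the checks is the usual pattern.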
