Out of Sight, Out of Mind: Detecting Orphaned Web Pages at Internet-Scale


Authors

Stijn Pletinckx, Kevin Borgolte, Tobias Fiebig

Publication

Proceedings of the 28th ACM SIGSAC Conference on Computer and Communications Security (CCS), November 2021

Abstract

Security misconfigurations and neglected updates commonly lead to systems being vulnerable. Especially in the context of websites, we often find pages that were forgotten, that is, they were left online after they served their purpose and never updated thereafter.

In this paper, we introduce a new methodology to detect such forgotten or orphaned web pages. We combine historic data from the Internet Archive with active measurements to identify pages that are no longer reachable via a path from the index page, yet remain accessible through their specific URL. We show the efficacy of our approach and the real-world relevance of orphaned web pages by applying it to a sample of 100,000 domains from the Tranco Top 1M.
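The core idea described above can be illustrated as a set difference followed by a liveness check. The following is only a minimal sketch, not the authors' implementation: the function name `find_orphan_candidates`, its parameters, and the `example.com` URLs are hypothetical, and the liveness check is stubbed out so the example runs offline. In practice, the archived URLs would come from Internet Archive data, the reachable URLs from a crawl starting at the index page, and `is_alive` from an actual HTTP request.

```python
from typing import Callable, Iterable, Set


def find_orphan_candidates(
    archived_urls: Iterable[str],
    reachable_urls: Iterable[str],
    is_alive: Callable[[str], bool],
) -> Set[str]:
    """Return URLs seen historically, no longer linked, yet still responding.

    Hypothetical helper for illustration only.
    """
    # Pages known from historic data but absent from the current link graph.
    unlinked = set(archived_urls) - set(reachable_urls)
    # Keep only those that still respond when requested directly.
    return {url for url in unlinked if is_alive(url)}


# Hypothetical toy data; the example.com URLs are placeholders.
archived = {
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/promo-2004",
    "https://example.com/old-form",
}
reachable = {"https://example.com/", "https://example.com/about"}
still_online = {"https://example.com/promo-2004"}  # stub for a real HTTP check

orphans = find_orphan_candidates(archived, reachable, lambda u: u in still_online)
print(sorted(orphans))
```

A result produced this way is only a list of candidates; a real pipeline would likely need further filtering, for example to exclude pages that respond with generic error content.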

Leveraging our methodology, we find 1,953 pages on 907 unique domains that are orphaned, some of which are 20 years old. Analyzing their security posture, we find that these pages are significantly (p < 0.01 using χ²) more likely to be vulnerable to cross-site scripting (XSS) and SQL injection (SQLi) vulnerabilities than maintained pages. In fact, orphaned pages are almost ten times as likely to suffer from XSS (19.3%) as maintained pages from a random Internet crawl (2.0%), and maintained pages of websites with some orphans are almost three times as vulnerable (5.9%). Concerning SQLi, maintained pages on websites with some orphans are almost as vulnerable (9.5%) as orphans (10.8%), and both are significantly more likely to be vulnerable than other maintained pages (2.7%). Overall, we see a clear hierarchy: Orphaned pages are the most vulnerable, followed by maintained pages on websites with orphans, with fully maintained sites being least vulnerable.
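To make the kind of significance test mentioned above concrete, the sketch below computes a Pearson χ² statistic for a 2×2 table (vulnerable vs. not vulnerable, for two groups of pages) and compares it against the critical value for p = 0.01 with one degree of freedom. The counts are hypothetical and chosen purely for illustration; they are not the paper's data, and the abstract does not state which exact test variant the authors used. The function name `chi_squared_2x2` is likewise an assumption.

```python
def chi_squared_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic for the table [[a, b], [c, d]].

    No continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Critical value of the chi-squared distribution for p = 0.01, 1 degree of freedom.
CRITICAL_P01_DF1 = 6.635

# Hypothetical counts, NOT taken from the paper:
# group A: 20 vulnerable, 80 not vulnerable; group B: 5 vulnerable, 95 not vulnerable.
stat = chi_squared_2x2(20, 80, 5, 95)
significant = stat > CRITICAL_P01_DF1
print(f"chi2 = {stat:.3f}, significant at p < 0.01: {significant}")
```

With small expected cell counts, an exact test would usually be more appropriate than this approximation.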

We share an open source implementation of our methodology to enable the reproduction and application of our results in practice.

BibTeX

@inproceedings{ccs2021-out-of-sight-out-of-mind,
  title     = {{Out of Sight, Out of Mind: Detecting Orphaned Web Pages at Internet-Scale}},
  author    = {Pletinckx, Stijn and Borgolte, Kevin and Fiebig, Tobias},
  booktitle = {Proceedings of the 28th ACM SIGSAC Conference on Computer and Communications Security (CCS)},
  date      = {2021-11},
  doi       = {10.1145/3460120.3485367},
  edition   = {28},
  editor    = {Shi, Elaine and Vigna, Giovanni},
  isbn      = {978-1-4503-8454-4},
  location  = {Seoul, South Korea},
  publisher = {Association for Computing Machinery (ACM)},
  url       = {http://dx.doi.org/10.1145/3460120.3485367}
}