Identifying Disinformation Websites Using Infrastructure Features

Authors

Austin Hounsel, Jordan Holland, Ben Kaiser, Kevin Borgolte, Nick Feamster, Jonathan Mayer

Publication

Proceedings of the 10th USENIX Workshop on Free and Open Communications on the Internet, August 2020

Abstract

Platforms have struggled to keep pace with the spread of disinformation. Current responses like user reports, manual analysis, and third-party fact checking are slow and difficult to scale, and as a result, disinformation can spread unchecked for some time after being created. Automation is essential for enabling platforms to respond rapidly to disinformation.

In this work, we explore a new direction for automated detection of disinformation websites: infrastructure features. Our hypothesis is that while disinformation websites may be perceptually similar to authentic news websites, there may also be significant non-perceptual differences in the domain registrations, TLS/SSL certificates, and web hosting configurations. Infrastructure features are particularly valuable for detecting disinformation websites because they are available before content goes live and reaches readers, enabling early detection.

We demonstrate the feasibility of our approach on a large corpus of labeled website snapshots. We also present results from a preliminary real-time deployment, successfully discovering disinformation websites while highlighting unexplored challenges for automated disinformation detection.
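
To make the idea of infrastructure features concrete, the following is a minimal sketch of how a website's registration, certificate, and hosting metadata might be turned into a feature dictionary for a classifier. It is an illustration only, not the feature set or pipeline used in the paper: the `SiteRecord` type, its fields, the `infrastructure_features` function, and the specific features chosen are all assumptions made for this example, and the metadata is taken as already collected rather than fetched from WHOIS, certificate, or DNS sources.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SiteRecord:
    """Hypothetical snapshot of one website's infrastructure metadata."""
    domain: str            # fully qualified domain name
    registered_on: date    # domain registration date (e.g. from WHOIS data)
    cert_issuer: str       # issuer name on the TLS certificate
    cert_san_count: int    # number of names listed on the certificate
    hosting_asn: int       # autonomous system number of the hosting network


def infrastructure_features(rec: SiteRecord, observed_on: date) -> dict:
    """Derive illustrative, content-independent features from a record.

    None of these features look at page content, so they can be computed
    as soon as the domain, certificate, and hosting are set up.
    """
    # Use only the leftmost label of the domain for the lexical features.
    label = rec.domain.lower().split(".")[0]
    return {
        "domain_age_days": (observed_on - rec.registered_on).days,
        "domain_length": len(label),
        "has_news_keyword": int("news" in label),
        "has_digit": int(any(ch.isdigit() for ch in label)),
        "cert_san_count": rec.cert_san_count,
        "cert_issuer": rec.cert_issuer,
        "hosting_asn": rec.hosting_asn,
    }


# Example with made-up values (the domain, issuer, and ASN are placeholders).
example = SiteRecord(
    domain="Example-News24.com",
    registered_on=date(2020, 1, 1),
    cert_issuer="Example CA",
    cert_san_count=3,
    hosting_asn=64512,
)
features = infrastructure_features(example, observed_on=date(2020, 1, 31))
```

A dictionary like `features` could then be encoded and passed to any standard supervised classifier trained on labeled websites. Which features actually separate disinformation sites from authentic news sites is an empirical question that the paper addresses.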

BibTeX

@inproceedings{foci2020-identifying-disinformation,
  title       = {{Identifying Disinformation Websites Using Infrastructure Features}},
  author      = {Hounsel, Austin and Holland, Jordan and Kaiser, Ben and Borgolte, Kevin and Feamster, Nick and Mayer, Jonathan},
  booktitle   = {Proceedings of the 10th USENIX Workshop on Free and Open Communications on the Internet},
  date        = {2020-08},
  edition     = {10},
  editor      = {Ensafi, Roya and Klein, Hans},
  eprint      = {2003.07684},
  eprintclass = {cs.CY},
  eprinttype  = {arxiv},
  publisher   = {USENIX Association},
  url         = {https://www.usenix.org/conference/foci20/presentation/hounsel}
}