Trilateral examines ethical standards for webscraping technology

Webscraping tools, sometimes referred to as ‘bots’, allow the extraction and collection of code and data from the web for later retrieval or analysis. Their uses range from law enforcement to journalism, to research and business, facilitating data and market analysis, visualisation, and even price comparison.

The RAMSES project is designing a platform to support law enforcement agencies (LEAs) in the cyber world. The project will consider financially-motivated malware and the use of Trojan horses. RAMSES will deploy webscraping tools to extract information from the Internet and build datasets, which will then be processed and analysed through machine learning techniques. The insights gained through RAMSES will deepen LEAs’ understanding of the functioning and spread of various malware and Trojan programmes and will aid criminal investigations.

Trilateral Research is working alongside other partners, focusing on the privacy and ethical aspects of the platform and its components and working towards specific technical and organisational safeguards for the development and deployment of this technology.

Trilateral Research will host an expert workshop to discuss the privacy, ethical and social impacts of the RAMSES platform on 23 November 2017 in Canterbury, UK. For further information, please contact the authors of the article, Anna and Christina.

While using webscraping bots to gather information from websites is not necessarily unethical, it is important to be aware of considerations that support their responsible use.

Respect the website and its creators

It is important to consult a website's Terms of Service and comply with any stated requirements about which parts of the website can or cannot be scraped. Accordingly, do not attempt to access restricted, hidden or personal information, or to collect information beyond what the Terms of Service permit.

Failing to comply with a website's Terms of Service is not only unethical; it can also make you liable for a breach of contract relating to your use of the website.

Furthermore, webscraping can infringe copyright law if it is done in order to republish the original content without substantially developing it further and/or without crediting the original authors of the content.

To respect the website and its creators, consider the following steps:

  • Read the Terms of Service before webscraping
  • Contact the website owners before you begin and/or tell them about your activity, your identity and your contact details
  • Do not simply republish information and/or pass it off as your own
  • Carry out webscraping in a manner that would not result in a Denial-of-Service for other users of the website
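
Some of these steps can be partly built into the scraper itself. The following is a minimal Python sketch, not a definitive implementation, using only the standard library. It assumes the site publishes a robots.txt file, which is a machine-readable signal that complements but does not replace reading the Terms of Service. The bot name, contact address and five-second pause are hypothetical values chosen for illustration.

```python
import time
import urllib.request
import urllib.robotparser

# Hypothetical identity string: replace with your own project name and contact details.
USER_AGENT = "ExampleResearchBot/0.1 (+mailto:researcher@example.org)"
# Assumed pause between requests; adjust to whatever the website asks for.
MIN_DELAY_SECONDS = 5.0


def load_robots(robots_txt):
    """Parse the text of a robots.txt file into a rule checker."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class Throttle:
    """Enforce a minimum pause between consecutive requests."""

    def __init__(self, min_delay=MIN_DELAY_SECONDS):
        self.min_delay = min_delay
        self._last_request = None

    def wait(self):
        now = time.monotonic()
        if self._last_request is not None:
            remaining = self.min_delay - (now - self._last_request)
            if remaining > 0:
                time.sleep(remaining)
        self._last_request = time.monotonic()


def fetch(url, robots, throttle):
    """Fetch a page only if robots.txt permits it, identifying the bot."""
    if not robots.can_fetch(USER_AGENT, url):
        return None  # the site asked crawlers not to visit this path
    throttle.wait()
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

Sending a descriptive User-Agent with contact details lets the website owners identify and reach you, and the pause between requests keeps the load on the server low for other users.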

Respect privacy

Remember not only to consider the interests of the website creators and owners but also of the people publishing information. Issues of privacy and informed consent may arise due to the vast amounts of data collected by webscrapers and the inability to acquire informed consent from people whose data is collected. This is especially important where the identities of persons to whom such data relates may not always be obvious and their age and vulnerability may be impossible to determine.

Moreover, privacy and ethical issues may arise even if only publicly available data is collected. People may have consented to their data being processed for a specific purpose and may not expect it to be visible or used for other purposes, such as research or law enforcement. Even when personal information can be deemed public, its responsible use ought always to be a consideration.

A few simple safeguards to consider:

  • Anonymise any data relating to people as soon as possible
  • Collect as little data as possible
  • Keep the collected data safe and confidential, and do not share it with third parties
  • Consider relevant data protection laws
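
The first two safeguards can be applied at the moment a record is collected. The sketch below is illustrative only: the field names (`post_text`, `posted_at`, `author`, `email`) and the secret key are hypothetical. Note that replacing a name with a keyed hash is pseudonymisation rather than full anonymisation, and free text may still identify a person, so this should be treated as a first step, not a guarantee.

```python
import hashlib
import hmac

# Hypothetical list of the only fields the analysis needs; everything else is dropped.
ALLOWED_FIELDS = {"post_text", "posted_at"}


def pseudonymise(identifier, secret_key):
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def minimise(record, secret_key):
    """Keep only the allowed fields and swap the author name for a pseudonym."""
    cleaned = {key: value for key, value in record.items() if key in ALLOWED_FIELDS}
    if "author" in record:
        cleaned["author_id"] = pseudonymise(record["author"], secret_key)
    return cleaned


# Example usage with made-up data.
raw = {
    "author": "jane_doe",
    "email": "jane@example.org",
    "post_text": "Example post",
    "posted_at": "2017-11-23",
}
safe = minimise(raw, secret_key=b"store-this-key-separately")
```

Here `safe` no longer contains the email address or the original author name. Keeping the key separate from the dataset, or discarding it once linking records is no longer needed, makes it harder to reverse the pseudonyms.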

Take particular care not to collect sensitive data relating to people’s demographic, relational and cultural details, such as age, sex, and political or religious affiliations.

Keep these tips in mind whenever you carry out webscraping, and please contact us if you would like to know more about how to develop technology in a safe and responsible manner.

Authors:

Anna Donovan & Christina Hitrova


