Privacy Issues with Honeypots and Honeynets

This article discusses the legality of the data collected by honeypots and honeynets, and how they relate to liability and entrapment in US and EU law. After you read, you should be able to describe the four core elements of a honeynet and the issues associated with honeynets. How are honeypots classified according to their level of interaction and their purpose?

Privacy and personal data protection

In this section, we discuss selected aspects of privacy and data protection in the area of honeypots. First, we outline the framework of privacy in EU law. Then, we discuss privacy issues concerning data collected by honeypots, IP addresses, and data processing.

Legal framework of privacy and personal data protection in EU law

This section provides an overview of the most important privacy regulations in the EU that are applicable to honeypots. The EU legal framework, applicable to honeypots and honeynets, consists of the following legal instruments:

The primary regulatory instrument, or the lex generalis, of the personal data protection system is currently EU Directive 95/46/EC on the protection of individuals with regard to the processing of personal data and on the free movement of such data. It is commonly known as the EU Data Protection Directive. It ensures an equivalent level of protection of fundamental rights and freedoms, and in particular, the right to privacy, with respect to the processing of personal data. It also ensures the free movement of such data within the EU and sets rules for transborder data flow outside of the EU. The directive will be replaced by EU Regulation No. 2016/679, the General Data Protection Regulation (GDPR), which applies from 25 May 2018. The GDPR is based on the same principles as the Directive, and thus, it will not change the basic premises of the data protection system. However, it adds a number of duties for the data controller (the person who sets the purposes and means of the processing) and rights for data subjects (identifiable natural persons whose data are processed).

EU Directive 2002/58/EC focuses on the processing of personal data and the protection of privacy in the sector of electronic communications. This directive is commonly known as the EU Directive on privacy and electronic communications (e-Privacy Directive). This directive is lex specialis, and it specifically regulates privacy and personal data protection when it comes to electronic communications. It has to be used and interpreted in accordance with the general act, which is the Data Protection Directive, or soon the GDPR. This directive covers and harmonizes certain issues of privacy in electronic communications. Some of its rules are universally binding, e.g., preserving the confidentiality of communication and the specific regulation of cookies; others only regulate the operations of electronic communications providers, e.g., the storing of traffic and location data. In January 2017, the European Commission introduced a proposal for a new e-Privacy Regulation. It is intended to replace the current Directive, and its material scope will probably stay similar. One of the interesting novelties is an explicit mention of machine-to-machine communication as falling within the scope of the Regulation. It is too early to draw any specific conclusions because the proposal must go through the whole legislative process.

The last relevant piece of legislation is the EU Directive on the security of network and information systems (2016/1148/EU; the "NIS Directive") which was enacted on 6 July 2016. The main purpose of the NIS Directive is to harmonize cybersecurity infrastructures of the member states so they can easily share information concerning cybersecurity incidents. Therefore, the NIS Directive may serve as a basis for national legislation, which will put information-sharing duties upon certain honeypot operators. However, the directive explicitly states in Art. 2 that any processing of personal data pursuant to it must be carried out in accordance with legal acts on data protection.

Basic concepts of personal data protection

In this section, we present several basic concepts of the European personal data protection system that are relevant for honeypots, their functions, and their data processing. The data protection system is based on the principle of preventing privacy harm. To achieve this, the Data Protection Directive incorporates a very broad definition of "personal data," so the highest possible number of persons can be considered "data controllers." The most important duty of the controller is to process personal data only for legitimate and legal purposes and based on a legitimate legal ground. All this combined can ensure a high level of protection, as required by recital 10 of the Data Protection Directive and by the Court of Justice of the European Union (CJEU) in recent cases concerning personal data protection, e.g., Google Spain (C-131/12), Ryneš (C-212/13), and Schrems (C-362/14).

Personal data is defined in Art. 2 letter a) of the Data Protection Directive as follows: "any information relating to an identified or identifiable natural person; an identifiable person is one who can be identified, directly or indirectly." The most relevant part of the definition is the notion of indirect identifiability. It means that any information that can be used in the right context to identify a person (the "data subject") is personal data, even though the information in itself (outside the right context) does not directly identify the data subject. The CJEU supports this approach because it is necessary for assuring a high level of protection. As a result, almost any information could be personal data, and operators of honeypots and honeynets should be aware of this.

The purpose is the cornerstone of every instance of personal data processing. It is set by the data controller, and everything that happens to the data during its life cycle is connected with that purpose. Personal data can be processed only in order to achieve the declared purpose, which has to be conveyed to the data subject. Personal data can also be retained only for the period of time that is necessary for fulfilling that purpose. This principle is called "purpose limitation." It is grounded in Art. 6 para. 1 letter b), and it also applies to the legal grounds for processing. Once the purpose changes or the current legal ground can no longer be relied on, the data controller has to find another legal ground or cease the data processing.

The Data Protection Directive recognizes in Art. 7 several legal grounds for data processing, from which the following are relevant for the case of honeypots and honeynets:

The data subject has unambiguously given their consent (letter a))
The processing is necessary to comply with a legal obligation to which the controller is subject (letter c))
The processing is necessary for the purposes of legitimate interests pursued by the controller or by a third party or parties to whom the data are disclosed, except cases where such interests are overridden by the interests for fundamental rights and freedoms of the data subject (letter f))

The Data Protection Directive sets four conditions for the validity of consent: it has to be freely given, specific, informed, and unambiguous. However, there are several practical problems with this concept, both on the side of the data subject (e.g., almost no one reads the terms and conditions, and few can understand them; therefore, it is questionable whether the given consent is in fact informed) and on the side of the data controller (e.g., it is technically almost impossible to obtain legally valid consent from data subjects whose personal data are processed in the course of honeypot and honeynet operation).

Apart from consent, the Directive offers a legal ground for processing that is necessary to comply with a legal duty. This legal duty must be grounded in a public law norm. For legal duties of the data controller that arise from private law, the provision of Art. 7 letter b) of the Data Protection Directive is applicable. The Directive also offers a legal ground for processing personal data for the legitimate interests of the data controller or a third party. The Article 29 Working Party elaborated on this issue in its Opinion No. 6/2014, which can be summed up by stating that personal data can be processed for the legitimate interest of the data controller or a third party, as long as the processing is proportionate to its impact on the data subject's right to privacy.

Collected data

As described in the previous section, almost any data collected by honeypots might be considered personal data. The first aspect of privacy issues within honeypots and honeynets is the type of data that is being collected. There are two general categories:

The contents of communications
Information to establish communication

The first type of collected data, the contents of communications (content data), is regulated by the EU Directive on privacy and electronic communications. According to Article 2 d) of the e-Privacy Directive, communication (content data) means "any information exchanged or conveyed between a finite number of parties by means of a publicly available electronic communications service." Examples of content data are the bodies of email messages, file contents, full packets captured on a network segment, and the reconstructed content of interactive sessions (e.g., commands executed in a shell account, typed passwords). Apart from harmonized European law, the requirement of communication confidentiality is included in national legal regulation and is protected by civil as well as criminal law.

The extent of the collected content data records is related to the honeypot's level of interaction. Low-interaction honeypots capture and collect smaller amounts of content data records than medium-interaction and high-interaction honeypots.

The second type of collected data is the information needed to establish communication (non-content data, transactional data, also known as metadata). These are mostly traffic and location data, which are defined in the EU Directive on Privacy and Electronic Communications as follows:

Traffic data - any data processed for the purpose of conveying a communication on an electronic communications network or for the billing thereof (defined in Article 2 b) of that Directive; its processing is governed by Article 6)

Location data - data processed in an electronic communications network, indicating the geographic position of the terminal equipment of a user of a publicly available electronic communications service (defined in Article 2 c) of that Directive; the processing of location data other than traffic data is governed by Article 9)

Examples of transactional data are IP addresses, network ports, network protocols, account names, email header information, time, date, website URLs, etc.

The categories of transactional data retained in honeypots include:

Data necessary to trace and identify the source and destination of a communication, for example, the IP address and domain name
Data necessary to identify the date, time, and duration of a communication (e.g., timestamp)
Data necessary to identify the type of communication, for example, an Internet protocol (e.g., ftp, ssh, samba)
Data necessary to identify the users' communication equipment or what purports to be their equipment, for example, the operating system

From the perspective of honeypots, the IP address, timestamp, and Internet protocol are data collected in all honeypots. Due to the abovementioned broad definition of personal data, all of this should be considered personal data within the scope of the Data Protection Directive.
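
To make the categories above concrete, the following minimal sketch shows one way a honeypot might represent a transactional record. The class name `TransactionalRecord`, its field names, and the sample values are assumptions made for illustration and do not come from any particular honeypot software. The `PERSONAL_DATA_FIELDS` set simply encodes the cautious position taken above, namely that the IP address, timestamp, and protocol should be treated as personal data.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

# Assumption: fields flagged as personal data, following the cautious reading above.
PERSONAL_DATA_FIELDS = frozenset({"source_ip", "timestamp", "protocol"})


@dataclass(frozen=True)
class TransactionalRecord:
    """Hypothetical metadata record kept by a honeypot (no content data)."""
    source_ip: str                   # source of the communication
    destination_ip: str              # destination of the communication
    timestamp: datetime              # date and time of the communication
    protocol: str                    # type of communication, e.g. "ssh" or "ftp"
    duration_seconds: float = 0.0    # duration of the communication
    client_os: Optional[str] = None  # what purports to be the user's equipment


def personal_data_view(record: TransactionalRecord) -> dict:
    """Return only the fields this sketch flags as personal data."""
    return {k: v for k, v in asdict(record).items() if k in PERSONAL_DATA_FIELDS}


record = TransactionalRecord(
    source_ip="192.0.2.10",          # documentation address range (RFC 5737)
    destination_ip="198.51.100.5",
    timestamp=datetime(2017, 1, 15, 12, 0, tzinfo=timezone.utc),
    protocol="ssh",
)
print(sorted(personal_data_view(record)))  # ['protocol', 'source_ip', 'timestamp']
```

Keeping the flagged fields in one place makes it easier to apply the same handling rules (access control, retention, erasure) to every record type the honeypot produces.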

IP addresses

In this section, we argue that IP addresses are personal data within the meaning of the Data Protection Directive. As stated before, the IP address is a piece of information necessary to trace and identify the source of a communication. In our opinion, it is the most important piece of information in any subsequent analysis.

An IP address is connected with a specific device. However, in many cases, we can assume a strong connection between the device and its user. That is the case for smartphones, tablets, and other smart handheld devices, as well as personal computers. IP addresses are used by electronic communications service providers to help identify a subscriber. IP addresses are also collected and stored by electronic communications providers for the purpose of a possible criminal investigation. This is done under the data retention duty, which is still present in several member states although the Data Retention Directive 2006/24/EC was invalidated by the CJEU. This example shows that IP addresses are used as information that leads to the identification of a person. Therefore, they count as indirectly identifying personal data.

This view is supported both by the Article 29 Data Protection Working Party, which considers IP addresses to constitute personal data within the meaning of Article 2 a) of the EU Data Protection Directive, and by the CJEU. The CJEU dealt with IP addresses in the case Scarlet Extended SA vs. Société belge des auteurs, compositeurs et éditeurs (SABAM) (C-70/10). In this case, the CJEU stated in paragraph 51 that the monitoring of the behavior of Internet users and any further collection of their IP addresses amounts to an interference with their rights to respect for their private life and their correspondence, since IP addresses are personal data.

In this respect, the preliminary question about IP addresses referred by the German Federal Court of Justice (Bundesgerichtshof, "BGH") is crucial. In what is now known as the Breyer case (C-582/14), the BGH filed a preliminary reference to the CJEU on whether dynamic IP addresses are considered personal data, protected by European data protection law, even if no further information on the identity of the terminal holder is available. In his opinion, the Advocate General stated that "an IP address stored by a service provider in connection with access to its web page constitutes personal data for that service provider, insofar as an Internet service provider has available additional data which make it possible to identify the data subject." This case is very similar to data collection in honeypots. On this reading, IP addresses would not be personal data for honeypot and honeynet operators, because the particular natural person is not identifiable by the means the operator has at their disposal. Furthermore, in most situations, the attack is carried out by a machine, not a human, so identifying the natural person is fairly difficult. However, the final ruling stated in paragraph 49 that a "dynamic IP address...when a person accesses a website that the provider makes accessible to the public constitutes personal data within the meaning of that provision, in relation to that provider, where the latter has the legal means which enable it to identify the data subject with additional data which the internet service provider has about that person". This ruling softened the objective approach to the definition of personal data a little. However, the "legal means" which the court mentioned might be, for example, just the possibility of handing the data over to the police, which then has access to retained data.

Therefore, in our opinion, it is safer to consider IP addresses personal data regardless of what other information the operator has. There are three reasons for that. First, the personal data protection system rests on a basic preventive principle: it limits the amount of collected data so that the data cannot be combined and misused. Second, the opinion of the Advocate General is not binding on the CJEU, and should the court keep to the line of its previous decisions, it might decide on the matter more strictly. Third, even though in a number of cases an IP address can be connected only to a device and not to a human being (e.g., in the Internet of things), there is no easy way for the honeypot operator to distinguish such cases.

Legal grounds to process data and purpose limitation

IP addresses collected during the operation of honeypots and honeynets can be personal data of either the operator's customers or third persons whose devices are used for the attack. The customers can provide consent for the personal data processing, but that is not the case for the third persons. Furthermore, it is advisable to rely on a legal ground other than consent when one is available and applicable. The legal ground must be chosen according to the purpose of the processing.

The following may be considered a relevant purpose of personal data processing within honeypots and honeynets:

For production honeypots - safeguarding the security of the service
For research honeypots - research and prevention of future threats

In the first case, the data controller can rely on their legitimate interest in the cybersecurity of their network. The possible privacy harm to the data subjects (those whose IP addresses are processed) is very low. Therefore, the controller can process personal data in accordance with Art. 7 letter f) of the Data Protection Directive. Furthermore, this processing is also in accordance with the legitimate interest of the owners whose devices were used for the attack, since it might help to resolve their unfortunate situation.

In the second case, the situation is more complicated. The legitimate interest of the controller might be the promotion of cybersecurity and the right to carry out their business properly. These interests must be balanced against the data subjects' right to privacy in light of the possible harm done by the processing. Since the possible harm is quite low, we are convinced that the legal ground for processing established in Art. 7 letter f) should be applicable in the case of research honeypots as well.

Furthermore, the data controller must consider an adequate period of time for the retention of the collected data. This is, yet again, connected with the purpose of the processing. As mentioned earlier, the data controller can hold the data only for as long as is necessary to fulfil that purpose.

In the case of production honeypots, the data should be erased periodically after a shorter period of time (e.g., one month) or once the security incident is resolved. In the case of research honeypots, the period could be longer, but it must remain proportionate under the Art. 7 letter f) legal ground. Should the data controller wish to keep the data longer, they would have to obtain consent from the data subjects.
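
The retention rule described above can be sketched as a periodic purge. This is a minimal illustration, assuming records are dictionaries with a timezone-aware `collected_at` field. The function name `purge_expired`, the field name, and the 180-day research period are hypothetical; only the 30-day production period mirrors the "one month" example given above. The appropriate periods depend on the purpose of the processing and remain for the controller to determine.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention periods per honeypot purpose. Only the production value
# mirrors the "one month" example in the text; the research value is hypothetical.
RETENTION = {
    "production": timedelta(days=30),
    "research": timedelta(days=180),
}


def purge_expired(records, purpose, now=None):
    """Return only the records still within the retention period for `purpose`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - RETENTION[purpose]
    return [r for r in records if r["collected_at"] >= cutoff]


now = datetime(2017, 6, 1, tzinfo=timezone.utc)
records = [
    {"source_ip": "192.0.2.10", "collected_at": now - timedelta(days=5)},
    {"source_ip": "192.0.2.20", "collected_at": now - timedelta(days=45)},
]
kept = purge_expired(records, "production", now=now)
print([r["source_ip"] for r in kept])  # ['192.0.2.10']
```

Running such a purge on a schedule covers the periodic erasure; erasing data once a specific incident is resolved would need a separate trigger tied to the incident record.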

Finally, regardless of whether it is a case of a production or research honeypot, the honeypot operator might have a legal duty to share information about cybersecurity incidents based on, for example, the NIS Directive. In such cases, the data can be processed (and transferred) in accordance with the provision of Art. 7 letter c), as mentioned above.

In situations where the data transfer is not prescribed by law, it is necessary to rely on different provisions. In the case of data transfer within the borders of the European Union, the European Free Trade Area, and countries with an adequate level of protection, that would again be the Art. 7 letter f) legal ground. Article 25 of the Data Protection Directive enables the Commission to issue an adequacy decision, which states that the country in question has a level of personal data protection adequate to that of the EU. We argue that in this case, the legitimate interest in the processing might be the interest of users of communication networks, because the sharing of information improves the security of the whole network ecosystem. It is true that this interest may seem quite vague, but as long as it is proportionate to the rights of the data subject, the processing is legal. The proportionality is, in our view, ensured by the fact that IP addresses in themselves do not pose much of a privacy threat. In the case of information sharing with partners seated in other countries, general rules on transborder personal data transfer apply.