Checking open proxies

This advanced lecture in the School of Computer Science covers many methods used to determine whether a host is an open proxy. It assumes extensive understanding of networking, TCP and UDP communications, IPv4 address allocation, proxy servers, and UNIX.

Overview
In recent years, open proxies have become an excellent means of circumventing oppressive internet censorship, protecting online privacy and anonymity, and ensuring safety and security on the web. They have also become a useful tool for script kiddies and others with malicious intent. For this reason, many organizations and individuals have chosen to prohibit or restrict access to their servers from open proxies. Enforcing these restrictions is not easy: it requires a skilled team of open proxy checkers, such as those of the Wikimedia Wikiproject on open proxies, to verify which hosts are open proxies and which are not. This lesson teaches the skills needed to check open proxies, so that you too can become an "open proxy checker."

The basics
The methodology used by open proxy checkers is similar to that used by a detective investigating a crime. This is of course not to say that the users of open proxies are criminals, but rather to draw a helpful parallel in investigative strategies.

With this analogy in mind, let's define the first principle of open proxy checking as "presume innocence until guilt is proven"--that is, always assume that a host is not an open proxy until you can prove otherwise. It follows from this principle that it is impossible to prove that a given host is not an open proxy; the only way to support such a claim is to show that the evidence for the opposite claim is inconclusive.

How much evidence, then, is conclusive? Your job as an open proxy checker is not to prove that you could connect through a proxy, but rather that it is reasonable to expect that unauthorized persons can connect through and use the host as a proxy. This breaks down into the following burden of proof: you must prove that the host is currently running a web proxy and that at least one of the following conditions is also met:
 * 1) The proxy allows access to anyone.
 * 2) The proxy does not allow access to everyone, but uses a weak form of authentication that is easily bypassed.
 * 3) The proxy restricts access, but the details for authenticating to gain access to the proxy are publicly available.
 * 4) The host has been hijacked so as to allow unauthorized access to the proxy to one or more persons.
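
The burden of proof above can be written down as a small decision rule. The following is only a minimal sketch: the `Evidence` record and its field names are hypothetical labels chosen for this illustration, and filling them in truthfully is the actual investigative work. Every field defaults to `False`, which mirrors the first principle of presuming that a host is not an open proxy.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Findings about one host (hypothetical field names, all unproven by default)."""
    runs_web_proxy: bool = False       # host is currently running a web proxy
    open_to_anyone: bool = False       # condition 1
    weak_authentication: bool = False  # condition 2
    credentials_public: bool = False   # condition 3
    host_hijacked: bool = False        # condition 4


def meets_burden_of_proof(evidence: Evidence) -> bool:
    """Return True only if a running proxy AND at least one condition are proven."""
    if not evidence.runs_web_proxy:
        return False
    return any((
        evidence.open_to_anyone,
        evidence.weak_authentication,
        evidence.credentials_public,
        evidence.host_hijacked,
    ))


# Nothing proven yet: the host is presumed not to be an open proxy.
print(meets_burden_of_proof(Evidence()))
# A running proxy whose login details are publicly posted meets the burden.
print(meets_burden_of_proof(Evidence(runs_web_proxy=True, credentials_public=True)))
```

Note that proving one of the four conditions without also proving that a proxy is currently running is not enough under this rule.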

It is also important to keep in mind the second principle of proxy checking: "Once a proxy, not always a proxy." Just because you were able to prove the above yesterday, it doesn't mean that you will be able to prove it today. Proxies very rarely remain enabled or behind the same IP address for more than a few years, and the owners of hosts often disable or secure proxies once they become aware of the problem. As such, if your declaration of a host as an open proxy comes into question at a later time, you must again presume that the host is no longer an open proxy and seek to prove the opposite--you cannot argue that a host is an open proxy on the basis of previously collected evidence.

The process of reaching a conclusion about an open proxy consists of three basic steps:
 * 1) Determining if there is probable cause to suspect that a host may be an open proxy.
 * 2) Investigating the host in depth and collecting evidence to support your suspicion.
 * 3) Determining if the evidence collected conclusively supports your claim that the host is an open proxy.

Probable cause
As with any investigative work, it's necessary to have probable cause to suspect that a host is an open proxy before investing the time needed to perform an in-depth analysis. There is no legal or other obligation to do so; it's simply common sense. There are 4,294,967,296 (2^32) IPv4 addresses out there, and as many as 340,282,366,920,938,463,463,374,607,431,768,211,456 (2^128) IPv6 addresses, and absolutely no one has the time to check them all. As such, we choose to discriminate between hosts that are more and less likely to be open proxies on the basis of several rather superficial behavioral observations.
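
The address counts quoted above follow directly from the address widths: 32 bits for IPv4 and 128 bits for IPv6. The short sketch below reproduces them and adds a rough estimate of how long an exhaustive check would take. The rate of one check per second is purely an assumption made for illustration, not a measured figure.

```python
IPV4_BITS = 32
IPV6_BITS = 128

# Every distinct bit pattern is one address.
ipv4_total = 2 ** IPV4_BITS
ipv6_total = 2 ** IPV6_BITS

print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:,}")

# Assumed rate (illustration only): one host checked per second, nonstop.
CHECKS_PER_SECOND = 1
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years_for_ipv4 = ipv4_total / CHECKS_PER_SECOND / SECONDS_PER_YEAR
# Works out to roughly 136 years for IPv4 alone.
print(f"Years to check all of IPv4: {years_for_ipv4:.0f}")
```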

The following are some (but not all!) common characteristics of an open proxy:
 * Large amounts of traffic from a single IP address, especially if this traffic occurs at all hours of the day and especially if the IP address is consistently accessing the same file(s).
 * Inconsistent traffic from a single IP address or anything indicating that the IP may have been used by multiple unique individuals.
 * Similar traffic coming from multiple unique IP addresses, indicating that the same individual is using more than one IP. For example, if 12 completely unique IP addresses log in to the same account, you can pretty safely guess that at least some of the IPs being used are open proxies.
 * As a general rule of thumb, IP addresses are considered "unique" in this context if a whois report indicates that they originate from different parts of the world. In particular, IP addresses serviced by the same ISP should not be considered unique, especially if they differ only within their 16 low-order bits (that is, they share the same first 16 bits).
 * Indicators of a poorly configured proxy, such as unneeded escape characters (typically backslashes, and commonly before characters such as quotation marks) being submitted in forms.
 * Multiple, recent red-listings on a completewhois report.

None of these criteria constitutes conclusive evidence that a host is an open proxy, nor must an open proxy have any of these characteristics; however, they are useful guidelines to assist you in determining the likelihood of a host being an open proxy.
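
Two of the characteristics listed above lend themselves to a quick mechanical check. The following is a minimal sketch using Python's standard `ipaddress` and `re` modules. The function names are hypothetical, and the sample addresses come from ranges reserved for documentation. The first two functions cover only the bit-level half of the uniqueness rule of thumb: they treat IPv4 addresses that share their first 16 bits as not unique, and they do not replace a whois lookup. The third flags backslashes placed directly before quotation marks in submitted form text.

```python
import ipaddress
import re


def same_slash16(a: str, b: str) -> bool:
    """True if two IPv4 addresses differ only within their 16 low-order bits."""
    network = ipaddress.ip_network(f"{a}/16", strict=False)
    return ipaddress.ip_address(b) in network


def distinct_slash16_networks(addresses):
    """Collapse a list of IPv4 addresses into the set of /16 networks they belong to."""
    return {ipaddress.ip_network(f"{a}/16", strict=False) for a in addresses}


# A backslash immediately followed by a single or double quotation mark.
STRAY_ESCAPE = re.compile(r"\\[\"']")


def has_stray_escapes(form_text: str) -> bool:
    """True if submitted text contains a quotation mark preceded by a backslash."""
    return STRAY_ESCAPE.search(form_text) is not None


# Four addresses seen on one account; the first two share a /16.
seen = ["192.0.2.1", "192.0.2.200", "198.51.100.7", "203.0.113.9"]
print(same_slash16(seen[0], seen[1]))        # True
print(len(distinct_slash16_networks(seen)))  # 3
print(has_stray_escapes('He said \\"hi\\"'))   # True
```

As with the characteristics themselves, a positive result from any of these checks is only grounds for suspicion, not evidence that a host is an open proxy.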