As far as I know, no one has done a survey like this on a public network. Governments, large corporations and accounting firms, and other organizations do routinely hire tiger teams to perform surprise attacks on their defenses either internally or when they audit their clients, but this survey used a significantly different approach.
The GAO put out the very fine Computer Attacks at Department of Defense Pose Increasing Risks, done by Congressional request, in which a US government-assigned tiger team broke into vast numbers of military sites to test the security awareness of the military systems and their adherence to security policy.
My survey was only in the vaguest sense like a tiger team approach (which I'm not very enamoured of, in any case; I feel that there are far better ways to examine security.) Since I never even attempted to break into any systems, and the methods I used looked primarily at potential security problems in network services that were frequently used and whose vulnerabilities were very well known, I expected very little interesting data from the survey. The results were thus all the more surprising.
Traditionally it has been deemed to be "bad form" to perform certain activities on the net. Spamming, sending or reforwarding chain e-letters, disclosing explicit details of security problems, and posting private e-mail are all examples of this. And while some of these are more acceptable today, performing an unauthorized technical survey that solicits details about remote hosts or sites, while not illegal, has never been explicitly socially condoned. But, even though performing a security survey on certain high profile sites is not illegal, to many this is perhaps analogous to Jonathan Livingston Seagull buzz-bombing the elder gulls at speeds exceeding 100 MPH.
So why did I do it? Or, rather, since I am a rather well-known security researcher who probably could have gotten a reasonably positive response if I had sent out a request for survey participants, why didn't I attempt to get prior permission?
To start with, I was very interested in a rather small section of the Internet. I felt that even if I had asked all of the potential participants for their explicit permission, the number of responses I received would have been very small, and perhaps statistically insignificant. I also thought that merely asking might scare them, and that the majority of the responses that I received would be from those who put lots of time into their security setup, were very confident and knowledgeable about security, or would run back to their system and quickly fix any security problems before I could examine their system. I decided that I needed a more random sample.
The Internet - the WWW in particular - for better or worse, is getting more and more intrusive in its approach to this sort of subject, and this helped fuel my decision. For instance, the Network Information Center (NIC) not only controls the domain namespace of the Internet, but also has a database that contains architecture and operating system information; while the information is taken from written applications rather than any sort of technical or automated survey, it can be quite illuminating. The oft-quoted survey from Network Wizards is another example - over 100,000 hosts from all corners of the world are remotely surveyed. Dan Bernstein, a long-time net denizen, did a vast SMTP scan covering over 130,000 hosts, listing what people were using as their SMTP agent. And, finally, a British company called Netcraft did a web server survey of over 600,000 web sites and placed the results in a database that can be queried. Want to know which educational sites are running the Apache web server? It will quickly dump out over 2000 of them, and you can even query for the version number - it does all the work for you (the Wall Street Journal's server, for instance, is the Netscape-Commerce/1.1 with a certificate serial number of 0x027a000b.) The security ramifications of this are fairly staggering. And all of these are done without anyone's explicit permission; they are seen as doing the rest of the Internet a service rather than causing problems.
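To give a feel for how little work such a banner survey involves once the responses are collected, here is a minimal sketch in Python. It assumes the server identifies itself with a string of the form product/version, as in the Netscape-Commerce/1.1 example above; the function and type names are hypothetical, and this is not a description of how Netcraft or anyone else actually processes their data.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class ServerBanner(NamedTuple):
    product: str             # e.g. "Apache"
    version: Optional[str]   # e.g. "1.1.1", or None if no version was given


def parse_server_banner(header_value: str) -> ServerBanner:
    """Split the first token of a server banner into product and version.

    Assumes the common "product/version" layout; anything after the
    first whitespace-separated token is ignored.
    """
    tokens = header_value.strip().split()
    if not tokens:
        return ServerBanner("", None)
    product, sep, version = tokens[0].partition("/")
    return ServerBanner(product, version if sep else None)


def tally_products(banners: Iterable[str]) -> Counter:
    """Count how many of the given banners report each product name."""
    return Counter(parse_server_banner(b).product for b in banners)
```

Given a list of banners that has already been gathered, tally_products returns a per-product count, which is essentially the kind of table such a database lets anyone query.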
WWW search engines are even more amazing in the amount of information they routinely suck in and index. As John M. Fisher of CIAC (the Department of Energy emergency security response team) said, they are so efficient at gathering information of all sorts that:
... perhaps these search engines have become a little too powerful. Some Web sites have provided a few too many links, including information related to system configuration. For example, doing a search for keywords such as "root", "daemon:", "passwd", etc, will return back a few stray /etc/passwd or /etc/group files, located on systems that have very poor Web configurations. This is a way to quickly find those Web sites that are not maintained very well and are most vulnerable to attack.
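The same observation can be turned around by a site's own administrator. As a rough sketch, assuming the classic seven-field, colon-separated passwd layout, the following Python checks whether a piece of text that is about to be published looks like a password file. The names and the threshold of two matching lines are my own illustrative choices, not part of any existing tool.

```python
import re
from typing import List

# Assumed layout: name:password:uid:gid:gecos:home:shell (seven fields,
# with numeric uid and gid). Real files can deviate from this.
PASSWD_LINE = re.compile(r"^[^:\s]+:[^:]*:\d+:\d+:[^:]*:[^:]*:[^:]*$")


def passwd_like_lines(text: str) -> List[str]:
    """Return the lines of text that match the assumed passwd layout."""
    return [line for line in text.splitlines() if PASSWD_LINE.match(line.strip())]


def looks_like_passwd_file(text: str, threshold: int = 2) -> bool:
    """Flag text containing at least `threshold` passwd-like lines."""
    return len(passwd_like_lines(text)) >= threshold
```

Run over the files under a web server's document root, a check like this would flag the stray files described above before a search engine finds them.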
Unfortunately, time and resources were a significant issue as well. I was doing this on my own time - I had no assistants, no paycheck, no nothing while performing the survey. The logistics of setting up a more formal survey, such as gathering the potential participants, establishing secure communication channels, setting up times and dates for the experiments, etc., etc. were simply out of the question. I had a very large amount of curiosity but a very limited amount of time, and, since I viewed the project as a cultural and philosophical effort and not a legal one, I decided to risk censure, loss of trust, and the general ire of the net by simply doing it.
I chose five categories of systems, primarily using Yahoo to find my survey participants. Yahoo is really wonderful if you'd like to select from a known set of potential survey participants - or targets, if you're a system cracker. With categories such as "News and Media: Newspapers: Indices" it does all the work for you. According to their FAQ, Yahoo builds its lists both from user submissions and by identifying as many sites as they can find and verify, which was a reasonable approximation of what I wanted in the survey. I would be surprised if I got much less than 10% of the total number of hosts of that type on the entire Internet in most of the categories; these organizations want people to find them, or they wouldn't be on the WWW, and Yahoo is one of the most popular ways that people search for Web services. The odd numbers of hosts in the survey are simply because that's how many hosts Yahoo had in that category that answered to a ping:
Bloom County was way ahead of its time, as per usual, with this brilliant cartoon published in the early '80s: