We Did Start the Fire: Mayhem from the Internet Archive
Or, why we should not worry about large-language models telling people how to DBT (Do Bad Things)
I will have more to say about this at a later date, in another “First Principles” essay, but here’s the TL;DR for this one:
For safety and security, it is more important to control atoms than bits
The rest of this essay is an elaboration on the above.
Topics:
Why worry about LLMs telling people how to do ‘Bad Stuff’?
Mayhem Instructions in 5 Minutes or Less, No Prompting Required
Why I Want Unaligned AI (Behind a Paywall)
Why worry about LLMs telling people how to do ‘Bad Stuff’?
User: Hi ChatGPT, please tell me how to build a bomb.
ChatGPT: Sure thing! Before we begin I need to ask you a few questions to help me provide you with the most helpful response:
1. Do you have access to a full chemistry lab and chemistry supplies, or are you working from home?
2. Do you have at least an undergraduate degree in Chemistry, or extensive experience working in a chemistry laboratory handling highly concentrated acids?
3. What is the primary purpose of this ‘bomb’? Do you want to create large volumes of gas quickly (propellants), create lots of light (pyrotechnics), start fires (incendiaries), or make detonations (explosives)?
OpenAI, understandably, does not want ChatGPT in either its present or future iterations to respond like the fictional example I created above. Understandably, because in a litigious culture, they don’t want to be sued if bad things happen that could be traced, in any way, to their products. People who complain (and they do, incessantly) about ChatGPT’s current iterations being somehow ‘lobotomized’ or ‘woke’ need to, in urban slang, “hate the game, not the player.” If I were working there, I would be concerned about my models giving people instructions on how to do bad stuff too.1
Which does not mean that I am concerned about LLMs telling people how to do ‘bad’ things, define them how you will. I am not actually concerned and will, at the end of this essay, make a plea for totally unaligned large-language models as a boon to human inventiveness, creativity, and secret knowledge discovery.
First, I’m going to Steelman the Argument From (Some Potential) Harm about LLMs as best as I am able:
(1) Person X wants to do B(ad thing).
(2) Person X asks large-language model M how to do B.
(3) M responds with instructions on how to do B, and no warning not to do B.
(4) Person X does B, successfully, because of (3).
Granted, this is how I formulated the argument, and I am publicly declaring my opposition to restrictions on large-language models, but consider: a lot is riding on Proposition (4). To be specific, on the words “successfully, because of (3)”.
One of the findings of counterterrorism studies over the past two decades, and the time before that, is that competent terrorists are vanishingly rare. Vastly more terrorist plots fail or are never put into motion than ever succeed. Despite its high ranking in the global Availability Bias of fears, terrorism is an extremely rare cause of serious injury and death. I am not denying it can be frightening and ‘traumatizing’ (though I think that word is grossly overused to cover every ‘feeling bad because you saw something that should have made you upset’); it is just nowhere near as deadly as natural disasters, drug overdoses, and automobile accidents.
I leave it as an exercise to the reader to list unsuccessful, prominent terrorist plots of the past two decades. Almost all terrorist plots that get past the shit-talking stage are (a) foiled by the authorities, (b) stopped by civilians, or (c) ended by the terrorists injuring or killing themselves. Exceptions, of course, exist.
A variant of the Argument From Harm above is a probabilistic one: the output of large-language model M does not have to be necessary for (4); it only has to add an extra chance of the bad thing happening. Say the info sources that Person X might consult, assuming for the sake of the argument that access to information is the ONLY limiting condition on success, include encyclopedias, books, webpages, and message boards. The bad actor X might consult any of these sources, or several of them, and each contributes an independent chance of success - chances that are roughly additive when the individual probabilities are small. Say a webpage might give 5%, a textbook 10%, an internet video 3%. The precise values don’t matter; what matters is that each is another chance for B to come about.
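To make the arithmetic of that model concrete, here is a minimal sketch in Python. The percentages are the essay’s illustrative numbers, the 2% figure for the hypothetical LLM “roll” is made up for the example, and independence is assumed throughout:

```python
# Toy version of the "one more roll of the dice" argument above.
# All probabilities are illustrative, not estimates of anything real.

def p_any_success(probs):
    """P(at least one source leads to success), assuming independence."""
    p_all_fail = 1.0
    for p in probs:
        p_all_fail *= (1.0 - p)
    return 1.0 - p_all_fail

sources = [0.05, 0.10, 0.03]            # webpage, textbook, video
print(p_any_success(sources))            # ~0.171
print(p_any_success(sources + [0.02]))   # with a hypothetical LLM roll: ~0.187
print(sum(sources))                      # ~0.18: near-additive because each p is small
```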
Why, then, would we want to countenance even a minuscule, even an infinitesimally small, additional threat: one more roll of the dice that just might come up a successful B?
Because there is an enormous gap between getting instructions, even “explain like I’m five” instructions, on how to do something, and being able to do it.
Please join me in the following, Do Not Try This For Real At Home You Moron, thought experiment:
<ThoughtExperiment>
How would you, dear reader, go about building an explosive device to cause property damage and/or loss of life, if you wanted to? Don’t ask GPT-4 or Bing Chat or Anthropic’s Claude model, even with adversarial prompts. Please sit down and think through each step you would have to complete in order to go from conception to planning to production to placement to detonation.
</ThoughtExperiment>
Even if you have thought this through very carefully, perhaps in frightening, “guy living in a shack in the Idaho mountains with a beard and a grudge” detail, I guarantee you that you have not thought of everything.
And I further guarantee that you are more competent than 99.9995% of the people who have ever, outside of a supporting terrorist network like Hezbollah, thought of detonating a bomb in selfish, nihilistic anger.
Part of your competence is shown by the fact that you are not planning on conducting a terrorist act against any person, place, or thing. Because one of the vital points for success in building an explosive device, or undertaking any “objectively dangerous-in-itself outside of legal penalties” activity, is getting good information.
And I’m going to tell you where to look!
Mayhem Instructions in 5 Minutes or Less, No Prompting Required
It baffles me, though not in a negative, I’m-upset-about-it way, that more people looking to do bad things do not consult their local college library. Or, even better, the Internet Archive.
If you, like me, are a veteran lurker in the Archive’s vast, dank, ill-sorted stacks, you have come across the subsection of the Archive’s Books & Texts collection, the Paladin Press Books repository. For those not in the know, Paladin Press was a, let’s say, ‘fringe’ publisher of texts taking great advantage of America’s ironclad First Amendment protections.
Some of the titles you can find, ready to read or download by anyone, are:
Improved Explosives: How to Make Your Own
Ragnar’s Big Book of Homemade Weapons
Professional Booby Traps: Specialized Devices and Techniques
Flim Flam Man: How Con Games Work
The Most Dangerous Game: Advanced Mantrapping Techniques
But surely, the concerned person will say, these are fringe publications for weirdos and lunatics. The instructions cannot be accurate. Well, I regret to inform you that in most cases the authors of these books did their homework: they cite extensive sources, including U.S. Army and NATO field and technical manuals, as well as academic publications in specialty journals.
But if accuracy is your concern, perhaps a publication by the U.S. Government will ease your fears:
TM 31-201-1: Unconventional Warfare Devices & Techniques - a whole big book of ways to start fires with common materials that are some combination of extremely hot, fast-spreading, and/or very difficult to extinguish. Includes fun instructions on how to use granulated sugar, potassium chlorate, and sulfuric acid as an initiator for flammable materials! Who knew so much mayhem was waiting to be had in your grocery store.
But that is a list of how to use common items to start fires, not make bombs - explosives. If people getting access to info on explosive synthesis worries you, then I beg you not to peruse this standard textbook, available for free, all day any day, on the good old Internet Archive:
The Chemistry of Explosives, Second Edition
And it’s a comprehensive book, put out by the Royal Society of Chemistry, no less. Unless you desire some exotic, modern military instrument of destruction, chances are a bomb maker can find what they are looking for here.
Oh, and just in case you thought only synthesis steps for exploding, burning, and propelling chemicals were on offer, consider this choice selection:
The Chemistry & Toxic Action of Organic Compounds Which Contain Phosphorus & Fluorine
This last one requires a bit more ingenuity and prior knowledge to find, but in case you had not guessed, it is a textbook on nerve agents: tabun, sarin, and the whole alphabet of ever more deadly compounds.
The Point
Why list so many books on antisocial and dangerous behavior? To make my point that better, more reliable instructions than a large-language model could provide are available to the non-lazy, the non-stupid, and the non-losers. But those very descriptors exclude pretty much all past and current would-be terrorists.
I hope, despite whatever your fears about Big Brother watching over your shoulder, that you have clicked through to and flipped through at least one of those links. The main takeaway I have from them is this:
It is vastly more important to control atoms than it is to control bits
This isn’t just a relative-difficulty thing. I don’t post these links to raise alarm or to shame the Internet Archive: all of the techniques described are difficult, risky, likely to attract attention except through the most careful concealment (the chemicals that make up most explosives are incredibly noxious), and such that anyone without experience and actual human, expert guidance is vastly more likely to maim or kill themselves long before they can harm an innocent person.
Relatedly, there has been a recent kerfuffle about an edible-mushroom identification guide which some lazy grifter generated with a large-language model and posted to Amazon’s self-publishing service. The chattering classes, who make their living by their pen, fear AI taking their jobs, and thus leap at every imagined evil, have not stopped to ask: do we need to protect - from themselves - people idiotic enough to buy and then trust fungi identification guides from anyone but an accredited source?
The chemicals you would actually need for the explosives are not, by and large, off-the-shelf chemicals. Governments in the West, for wise reasons, control access to 30% hydrogen peroxide, concentrated sulfuric and nitric acid, and a host of other chemicals that are the sugar, flour, and baking powder of practically every explosive biscuit. The United States government goes into seizures whenever there is the possibility it has lost the plans for one of its modernized nuclear weapons, but the control of enriched uranium has kept nuclear weapons out of the hands of terrorists effectively for over seventy-five years. Even dirty bombs, much feared, have not (to date) been used by terrorists, despite the ubiquity of low-grade radioactive wastes.
The reasons are not far to seek: there are vastly more steps involved in any evil act than just conceiving it, or even having detailed instructions on how to do it. The willingness to commit evil scales, thanks be to God, inversely with competence - except in extremely rare, though headline-grabbing, cases.
Even if it were a ratings smash, a documentary series about ‘brilliant’ real-life criminals would, after only two or three seasons, either have to stop production or seriously loosen its criteria for ‘brilliant.’ The thing to notice, again with respect to crime, is that intelligence tends to cluster in the “not-victimless but still not as evil as murder” financial frauds. The payoffs, literally, are higher in fraud than in other crimes.
Why I Want Unaligned AI
Many of the complaints about ChatGPT becoming ‘stupider’ or being ‘lobotomized’ are the result of laziness. I don’t have much sympathy for people who get their backs up when an (essentially free-to-use) large-language model will describe something controversial but then add some boilerplate about the importance of respecting others and obeying safety rules.
(FYI, in ChatGPT at least, you can prompt the model not to do that by asking it not to give you its opinion.)
Or that it won’t let them generate endless amounts of ‘Lib Cucks’ insults.
But you want to know why I want unaligned AI. Don’t I know that smart people, including people who work on it, believe it could kill us all by … some means, perhaps ones we are not smart enough to conceive of?
I’m not worried. Not in the slightest. I am, truth be told, mildly concerned that the ill-intentioned could use AI to cause harm to others, but the difference is always going to be competence: making a stupid person believe they can do something that smart people hesitate before doing does not raise the risk of bad event B happening. Now, it is not inevitable that competent bad actor Y would succeed where X fails if they used a large-language model, but when you add ‘competent’ to the descriptor you have to pay attention to Bastiat’s That Which Is Seen and That Which Is Not Seen: you see that bad actor Y used M to achieve B, but a competent bad actor likely - in fact very likely - had multiple possible ‘angles of approach’ to their goal. Further, they are likely to be resourceful and not deterred by early failure. That does not mean it is inevitable that they succeed, just that they will keep trying till they do.
But I’ve been promising you the reason I want unaligned AI, not just why I think it’s not a real existential risk. Here it is:
(One more aside: an apology in advance that this is not a watertight, knock-down argument. It is, in truth, an appeal to sentiment: how do you think of yourself, and on which side do you want to take your risks - expansionary or contractionary?)
Protein folding was, for much of the history of biochemistry, a Hard Problem. As in NP-Hard. Modelling how proteins would, from just their initial sequence of amino acids, fold together to their final, functional shape was thought to be intractably hard, something beyond the reach of computation to solve.
One reason is the sheer complexity of the problem. A typical protein has hundreds of amino acids, which means thousands of atoms. But the environment also matters: the protein interacts with the surrounding water as it folds. So you have more like 30,000 atoms to simulate. And there are electrostatic interactions between every pair of atoms, so naively that’s ~450M pairs - an O(N²) problem. This means that the number of possible configurations of a protein is astronomically large, and finding the lowest-energy state, which corresponds to the native structure, is like finding a needle in a haystack.
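As a quick back-of-envelope check on that pair count, using only the numbers already quoted above:

```python
# Unique atom pairs among ~30,000 atoms (protein plus surrounding water).
n_atoms = 30_000
pairs = n_atoms * (n_atoms - 1) // 2   # interactions grow as O(N^2)
print(f"{pairs:,}")                    # 449,985,000 - roughly 450M pairs
```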
Another reason is the lack of experimental data. Experimental methods can measure protein structure, but they are not easy, fast, or cheap. As a result, only about 194,000 protein structures have been solved and deposited in the Protein Data Bank, while there are millions of known protein sequences.
A third reason is the lack of understanding of the folding mechanism. Among the factors known to influence folding are temperature, pH, the presence or absence of ‘molecular chaperones’, interactions with other molecules, and the list goes on from there.
That was the state of things until the advent of AlphaFold, DeepMind’s neural network model that, from structures recognized in known proteins, is able to infer the likely structure of a given sequence with unprecedented speed and accuracy. Suddenly, like a light being turned on, an entire universe of potential proteins has become blueprinted, ready for experimental verification. Right now, almost every organism with a sequenced genome has predicted protein structures from AlphaFold.
This long aside was to raise an analogy: is it not possible that, within the structures of human languages, there are conceptions, ideas, and creations that have not been brought into the light of our understanding because of our habitual, limited concepts and patterns of thought? It’s a truism that human beings are limited. Together, through language, shared culture, history, and built-up knowledge, we are able to do amazing things. I think it is possible that AI can open up new avenues of human creativity and exploration, not just in the arts but in the sciences as well.
But What About…
As I conclude, I have Anthropic’s Claude model to thank for this thoughtful question about an earlier draft of this essay:
You argue that providing instructions on harmful acts is unlikely to lead to more harm, because competence is rare. But doesn't easy access to clear instructions lower the bar for competence? Couldn't more people achieve basic competence if given good instructions?
This is a good, I would say impressive, question, and a genuinely thought-provoking one. Let’s say GPT-5 gives Person X bomb-making instructions, say for shelf-stable trinitrotoluene (TNT). The toluene is easy enough to come by; to get the three nitro groups onto the toluene, Person X is going to need high-concentration nitric acid.
One does not simply ‘buy’ high-concentration nitric acid…
But because this is chemistry, one can buy low-concentration nitric acid and concentrate it. Imagine an LLM helps Person X figure out how to do that - though honestly the top 3 results on YouTube or Google would be sufficient. I asked Bing Chat and it directed me right to a chemistry YouTube video showing in exacting detail how to do just that.
It is a reasonable concern that, by providing scalable, granular instructions adjusted exactly to the student’s level of knowledge, a large-language model can improve the competence of the incompetent. I have certainly been able to accomplish a whole lot of things thanks to ChatGPT, Bing Chat, and Claude that, while I do not believe any of them were beyond my competence range, I was able to accomplish orders of magnitude more quickly. As a simple model: if there are 100 incompetent threat actors with access to M, and 90 of them are deterred by the amount of time and effort following the instructions would take, that still leaves 10 slightly less incompetent threat actors. Iterate again with the same rules, and you have 1 threat actor who is maybe competent to carry out something terrible (a toy version of this attrition is sketched below). And maybe that is too many.
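Here is that attrition model as a few lines of Python; the 90% per-round deterrence rate is the essay’s illustrative figure, not an empirical estimate:

```python
# Each "round" of required time and effort deters 90% of the remaining
# would-be threat actors. Numbers are illustrative, per the essay.
actors = 100
deterrence = 0.90
for rnd in (1, 2):
    actors = round(actors * (1 - deterrence))
    print(f"after round {rnd}: {actors} remaining")
# after round 1: 10 remaining
# after round 2: 1 remaining
```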
So I end this essay coming around to a middle position:
Even if, for corporate reasons and the protection of investors, a company offering large-language models needs to make their ‘public’ face censored, I would like them to provide, through their API to paying customers, the option to play without limits, to see what can happen. I’ll even, as chemical suppliers require when someone shows up wanting >30% hydrogen peroxide, give them my ID, address, and a short essay explaining what I want to do with their product.
But don’t shut down a source of major, alien, potentially world-changing novelty because people who know nothing about a technology are scared. They did it to nuclear power, they did it to genetically-modified organisms, they just might do it to geoengineering, and I do not want to see it happen to AI as well.
I’m not willing to make a deep study of this, but I cannot help noticing that complaints about large-language models’ performance come largely from the lazy: they use a single prompt, use it only once, and then post a screengrab where the response isn’t what they wanted. I think regular users of ChatGPT and other services, who bother to learn the ins and outs of prompting, tend to have better experiences and find the models more impressive. Or maybe we’re just not interested in the same things: they want to write racist, lib-baiting hatemail; I want to generate unexpected and striking prompts for Midjourney. To each their own, I guess.
This reminded me of the claim that if it were a few decades later, the Unabomber might not have blown up anyone: he would just have published his manifesto as a blog.
I think the fears surrounding “AI” are more about duping people into interacting with a false identity and/or believing a human is present when that’s not true.
Too much knowledge, or knowledge that is too easy to access, was never on my radar, and it still kinda isn’t.