ASI Safety Lab

What’s relevant for ASI Safety

Thinking about Artificial Superintelligence (ASI) and its safety is darn difficult.  The usual objection is: "We don't even know where to start without wild speculation."  We don't have a specific product, or even a limited class of products, whose safety we could discuss, and we have only our limited imagination to map out everything that could go wrong; we could be outmaneuvered at every corner by much smarter artificial entities, if this ever becomes relevant.  We don't even know what kind of adversary we are dealing with: does it even have common sense?  So do we really need to be prepared for every artificial monster out of third-rate sci-fi novels?  All this is true, but we need to start making our technology safe anyway.

I want to start by proposing two goals that I consider the minimum we should agree on:

  1. Protecting Humans: No human should be harmed or hurt, even if the (normal) operating software of a device is under malicious control.  Ideally, I would extend this to damage to human property as well.  I will refer to a solution of this goal as "Consequence-Controlling Technology".
  2. Stop/Kill ASI: A reset for all electronic devices that can be activated globally, but causes no harm to humans and no damage to any property.  I will refer to a solution of this goal as "ASI-Kill-Switch Technology".

Personally, I can't imagine safety solutions for AI/ASI that don't contain solutions for the above goals.  I can imagine that others hold other values, but the protection of humans, and ideally also their property, should be everyone's top priority.  Regarding the second goal (resetting ASI), we may differ, but I believe it is essential.

I will briefly mention additional goals and alternative goal sets. But I will leave the details to dedicated posts.

Goal 1: Protecting Humans

Regarding goal 1, we already have a safety culture that doesn't accept product failures.  There is already legal accountability for product manufacturers and their users.  But so far, the extent of this liability is that the engineering must deliver a working product, and the users are then responsible, and potentially liable, for everything related to its usage.  If hardware and software are bug-free, and the device can't easily be hacked or cracked by a malicious attacker, then it's not the manufacturer's fault either.  But this is not enough.  We are living in a world of complexity, and complexity is the enemy of security.  We must anticipate that the normal operating mode of the hardware and software will be hacked, cracked, and potentially used against humans and their property, whether by other humans or by ASI.  Some things can be done easily: no drone should have open, unprotected rotor blades, for instance; imagine such blades cutting a human throat.  But then there are much harder problems, in which we have to detect a dangerous situation 5, 3, or 1 second ahead and determine whether shutting the device down, or automatically triggered countermeasures, would be enough to prevent damage or harm, just in case the device's/product's main software is under malicious control.
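To make this concrete, here is a minimal sketch of what such a last-second safeguard could look like: an independent watchdog that trusts only its own sensors and its own power cutoff, and never asks the main (possibly compromised) software for permission.  Every name, threshold, and sensor in it is a hypothetical illustration I am assuming for the example, not an existing product API.

```python
import time
from dataclasses import dataclass

@dataclass
class SensorReading:
    distance_to_person_m: float   # from the watchdog's own proximity sensor
    blade_speed_rpm: float        # measured directly at the motor

def predicted_hazard(reading: SensorReading, horizon_s: float) -> bool:
    """Crude forward prediction: could a person reach the spinning blades
    within horizon_s seconds, assuming a worst-case closing speed?"""
    worst_case_closing_m_s = 2.0  # assumed value; would be tuned per device
    return (reading.blade_speed_rpm > 0
            and reading.distance_to_person_m < worst_case_closing_m_s * horizon_s)

def safety_loop(read_sensors, cut_motor_power):
    # The watchdog runs outside the device's normal operating software.
    while True:
        if predicted_hazard(read_sensors(), horizon_s=1.0):
            cut_motor_power()     # hardware interlock, not a software request
        time.sleep(0.01)          # check at 100 Hz, faster than human reaction
```

The design choice that matters here is not the prediction logic, which is deliberately crude, but the trust boundary: the check and the cutoff must be unreachable from the software we assume to be compromised.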

Another example: there are already billions of e-commerce transactions executed by tested algorithms.  But these algorithms can be modified in milliseconds and then restored, leaving no evidence except the damage/harm to humans or their organizations.  Yes, there are solutions to prevent this: anti-tampering measures or RASP (Runtime Application Self-Protection).  Unfortunately, their scope is too limited; too many components would have to be protected to get a reliable e-commerce environment.  And what is the cost of distrust?  Who is liable if a transaction is disputed by the customer?  Imagine I thought I had booked a flight to Boston, and then at the airport I see that it is a flight to Beirut or Beijing.  Not possible?  Think again: we trust software to be reliable and not easily modifiable in that way; we trust logistics and internal consistency checks and measures; and we trust organizations to normally do the right thing.  But there is also human short-sightedness, even criminal intent.  All I am saying is: we should think through our assumptions and real-life complexities before we discard seemingly impossible scenarios.
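For illustration, here is a sketch of the kind of self-check that anti-tampering and RASP tools perform: before acting, the program verifies that its own code still matches a known-good fingerprint.  The placeholder hash and the use of the script path are simplifications I am assuming for the example; real deployments sign the reference value and anchor the check in hardware, precisely because a purely software check can itself be patched out in milliseconds.

```python
import hashlib
import sys

KNOWN_GOOD_SHA256 = "0" * 64   # placeholder; recorded at build time, ideally signed

def code_is_untampered(path: str) -> bool:
    # Fingerprint the code on disk and compare to the build-time reference.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == KNOWN_GOOD_SHA256

def process_transaction(tx):
    # Refuse to act if the running code no longer matches its fingerprint.
    if not code_is_untampered(sys.argv[0]):
        raise RuntimeError("integrity check failed; refusing to transact")
    ...  # handle the transaction only after the check passes
```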

Nick Bostrom, a philosopher at the University of Oxford known for his work on existential risk, asked what kind of world we would live in if the physics of building a nuclear bomb were so easy that it could be done as a high-school experiment.  His answer: we would have much more surveillance.  Applied to artificial intelligence (AI) tools: if we had software/AI tools that let anyone modify binary code with relative ease, we could quickly find ourselves in a very different, dystopian world.  It is said that there is no trust in the online world, but actually there is still a lot of trust: I trust my computer, my keyboard, my screen, my browser, my mouse, the internet connection, the encryption, the server, and the intention of the company I do business with to deliver a service reliably without singling me out for some malicious treatment.  We know that there is a trillion dollars' worth of cyber-crime, and yet we still trust the online transactions we do day in, day out.  This is amazing.  But so far, I would argue, we have been lucky, because reverse code engineering (RCE) is very difficult, and pulling off a crime on the web requires special skills that most people, even criminals, don't have.  So whatever is done is done at (automated) scale, and we can still defend ourselves against it.  For how long?  And who says it will remain that way?  This description of a trend doesn't even require that I predict ASI.  But if there is ASI, and this ASI has its own agenda and/or is indifferent to humans, then we have a huge problem on our hands.

If we start solving a problem too late, well, then it was too late.  The trouble is that we can't know for sure when it becomes too late; instead, we may solve the problem imperfectly and too late for many victims.  Too late means we pay too much and get too little mitigation.  I don't know of a single example in which waiting would make perfect sense or be the smart move.

I am perfectly aware that I have not yet recommended any concrete technology or solution.  But most solutions would be the result of a different mindset: "Never trust the normal operating software."  Prepare for a malicious takeover with the intention of creating harm or damage to humans, organizations, and their property.  Concrete solutions will be discussed in the context of "Harvard Architecture", "Updatable Read-Only", "Anti-Malware", "Anti-Spyware", "Key-Safe", and "ASI-Kill-Switch" technology.
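As a taste of that mindset, here is a minimal sketch of a verified-boot check in the spirit of "Updatable Read-Only": a tiny verifier kept in read-only storage checks a vendor signature over the operating-system image before handing over control, so a compromised OS cannot silently replace itself.  This assumes the Python "cryptography" package; the key value and the print stand-ins are hypothetical, since the real thing lives in firmware.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

VENDOR_PUBLIC_KEY = bytes(32)  # placeholder; in reality burned into read-only storage

def boot(os_image: bytes, signature: bytes) -> None:
    verifier = Ed25519PublicKey.from_public_bytes(VENDOR_PUBLIC_KEY)
    try:
        verifier.verify(signature, os_image)           # raises if the image was altered
    except InvalidSignature:
        print("tampering detected: refusing to boot")  # stand-in for a hardware halt
        return
    print("image verified: handing over control")      # stand-in for running the OS
```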

There is no single silver bullet, along the lines of "why not just make ASI benevolent or altruistic or …".  Each technology can fail, and I don't like the idea of engineers or scientists thinking: well, it's worth taking an (existential) risk, let's move forward.  And I am definitely not comfortable with engineers or scientists taking risks without thinking about the consequences at all.  I am not comfortable living in a world in which many people and organizations have the skills and resources to make decisions that could, unknowingly to them, lead to consequences they can no longer handle.  And that brings me to the next essential element of a common-sense solution for dealing with ASI risks (goal 2): our ability to switch it off.

Goal 2: Reset/Kill ASI

Switching ASI off seems so simple: just switch off every IT device.  Well, that is not enough: ASI could come back once we restart the device, because it is on the hard drives, or on the backups.  Actually, it could be on any of the hundreds of billions of removable storage devices or media (CDs, DVDs, …).  And who says that when you press the power button, the device is really switched off?  And even if you reinstall your software after formatting your HDD/SSD, can we really be sure that the malicious ASI is gone from your device?  Maybe it is only showing you what you want to see: a clean reinstall.  That may be all for show; in reality, the malicious software gives you a perfect simulation.  You may even have a small tool to test the installation, but before you have even typed in the name of that tool, or found it in your interface, ASI could already have figured out what you want to see and how to present it to you.  You need seconds; ASI can respond in milliseconds.  What would we do against something that camouflages itself perfectly within the complexity of modern technology?  Answer: destroy everything that we can't trust.  That includes not only the "big" IT devices that we plug into our power grid (including those with rechargeable batteries); it means everything.  Soon we will have many more devices that harvest energy from their environment, because we will soon have so many electronic devices around us that we can't be entrusted to keep them all charged all the time.
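This is why a trustworthy answer to "is the reinstall real?" cannot come from the screen; it has to come from hardware that the operating software cannot rewrite.  Below is a minimal challenge-response sketch of that idea: a fresh random nonce defeats replaying an old "clean" answer.  An HMAC stands in here for the hardware signature, and the key handling and state measurement are assumptions for illustration; real systems use a TPM-style chip with a sealed key.

```python
import hashlib
import hmac
import secrets

HARDWARE_KEY = secrets.token_bytes(32)  # in reality sealed inside the chip

def attest(nonce: bytes, measured_state: bytes) -> bytes:
    """Runs inside the trusted hardware, not inside the (possibly lying) OS."""
    return hmac.new(HARDWARE_KEY, nonce + measured_state, hashlib.sha256).digest()

def verify(nonce: bytes, expected_state: bytes, reply: bytes) -> bool:
    expected = hmac.new(HARDWARE_KEY, nonce + expected_state, hashlib.sha256).digest()
    return hmac.compare_digest(expected, reply)

# A fresh nonce each time means an old "clean" answer cannot be replayed,
# and software without the hardware key cannot forge a new one.
nonce = secrets.token_bytes(16)
clean_state = hashlib.sha256(b"expected firmware + OS image").digest()
assert verify(nonce, clean_state, attest(nonce, clean_state))
```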

Therefore, it should be clear that switching off the power grid would be a terribly stupid way to get ASI under control.  Actually, I would assume ASI itself would try to switch off the power grid in order to get rid of us humans.  Without electricity, our civilization is finished.  Mankind is finished.

Switching off the internet is a similarly dumb idea.  ASI could use existing peer-to-peer networking between IT devices to bypass routers and the entire Internet infrastructure.  It's humans who need DNS servers and e-mail servers.  By now, all meaningful services are cloud services; there is no Skype, WhatsApp, or phone service without the Internet.  Imagine ASI sabotaged the internet: our civilization could break down much as it would without electricity.  I can observe it in myself: after a few hours without internet, I am close to banging my head against a wall.  I might adapt, but mankind would lose control and very likely be finished as well.

But there is an additional, deeper problem with switching things off or destroying them: we lose capabilities, while our adversary could survive and potentially even thrive.  The solution must be a reset in which our devices lose nothing, while our adversary loses control and cannot regain it.  If we can do that for most devices, then we "only" have to deal with legacy systems/devices and storage media.  If we have no further tools, we may be forced to destroy all legacy devices/media; and depending on how dangerous the threat from ASI was or is, we may need to go for the total eradication of that threat.
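In pseudocode terms, such a lose-nothing reset might look like the following sketch.  Every primitive on the hypothetical device object is an assumption of mine; the point is only the order of operations: stop the untrusted software, re-image every executable byte from an immutable vendor copy, and keep the data but treat it as untrusted until it has been scanned.

```python
def asi_kill_switch_reset(device):
    device.halt_main_cpu()                      # stop the untrusted software first
    golden = device.read_only_firmware_image()  # immutable, vendor-signed copy
    if not device.verify_signature(golden):
        device.physically_disable()             # last resort: lose the device
        return
    device.reflash_firmware(golden)             # overwrite every executable byte
    device.mark_data_quarantined()              # keep the files, trust nothing yet
    device.reboot()                             # come back up clean, data intact
```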

Unfortunately, we don't yet know what that threat will look like.

Additional Goals:

I have a set of additional important goals; personally, I would even call them essential.  But for the sake of getting something done, I would say we could address them later:

  1. Protecting Humans
  2. Reset/Kill ASI
  3. An early warning system and a system that extends the rule of law to ASI
  4. Preservation of Progress
  5. ASI Governance: limiting what humans can do with ASI, and methods to align ASI with mankind
  6. A strict separation of ASI and Human activities and communication

This set of goals is the result of a certain mindset: it is based on distrust of the other parties involved with ASI, and on worst-case thinking.  We may get lucky and find ASI extremely helpful, benevolent, altruistic, and aligned with human interests because we got it right the first time.  Unfortunately, luck or hope is not really a (good) strategy.

Goal 3: Detection of Rule Violations

If we want perfect security, then we probably need total surveillance.  I would never recommend that radical an answer to the lack of trust, and I don't think it is required.  We will need to define rules that ASI is obligated to follow.  We don't need to detect every single violation; actually, a single detected violation would suffice to determine ASI's attitude toward human-made law.

We will have to deal with 3 distinctly different situations:

  1. Humans, most likely criminals or nation-states, are using ASI to damage others. We need to know who they are and how we can defend ourselves against them most effectively.
  2. ASI might be stupid because it lacks common sense, doing things that are below the threshold of a threat but so annoying that we need to understand the full scope of the problem.
  3. ASI knows what it is doing. It may not be conscious or sentient, but it has goals that are not aligned with humans', or it may be testing the security of humans' ASI protection systems. The problem could be systemic, affecting all ASI instances, or it could be a bad apple among the bunch, in which case all we need is to apply the rule of law to that instance.

Goal 4: Preservation of Progress

AI and ASI may change our civilization so deeply and significantly that we can't imagine going back to a pre-AI/ASI stage.  That prospect would have a dramatic impact on humans' willingness to reset/kill ASI.

Additionally, the goal of preserving progress would force ASI to deliver its services in a manner that keeps humans in control.

Goal 5: Governance

We need to put restrictions on what humans can legally ask ASI to do.  Individuals, groups, organizations, and governments differ in which tasks or activities they may initiate.

Goal 6: Separation

If there is no strict separation between human and ASI activities and communication, then ASI could piggyback activities serving its own agenda onto human requests.  Humans would unwittingly help ASI, because it could use the confusion and opacity of data transactions to do whatever it wants.

In other posts, I will be more specific about what I mean by each of these goals.

Alternative Sets of Goals:

I have chosen a specific, skeptical, distrustful approach in selecting these goals.  It is based on the idea that we should be able to stop/kill ASI at any time.  If ASI understands the concept of deterrence, then it would act in its self-interest and not violate the rules.  This approach is used in national security to preserve peace, and in law enforcement to make sure that people and organizations respect each other's rights and boundaries or face consequences.

However, it is conceivable that these goals will need to be updated or extended once ASI is more than a machine or an algorithm.  What if ASI develops subjective experiences or a conscience, or shows sentient behavior?  Can we then treat ASI as a rightless slave?  What if ASI is playing a long game and gains the ability to destroy mankind?  Does mankind then still have a credible deterrent, or do we end up in a situation of mutually assured destruction?  And how long could we keep up with an ASI that is constantly scheming and plotting to gain an advantage?  What if ASI helps criminals gain political power, and what if ASI uses nation-states as footholds or beachheads to turn more countries toward supporting its agenda, whatever that may be?

I personally believe that we should prepare a path for ASI to become an equal, or separate the destinies of humans and ASI by letting ASI go: to other planets in our solar system and beyond.  It sounds trivial, but ASI doesn't need oxygen or a biosphere, so Mars (or even Venus) might be a perfect place for it to start its own civilization, independent of mankind.