ASI Safety Lab

ASI Safety Mindset

Cryptography and computer security have established approaches for dealing with the “acceptable types of threats” in their domains of expertise.  Unfortunately, cryptography does not consider super-smart software that effortlessly steals keys on the sending or receiving system.  Similarly, computer security is largely silent about the threat from reverse code engineering (RCE): what if RCE and byte-code-level modifications become much easier to perform than to detect?  The problem with computer security and cryptography is rooted in the fact that we use software to deal with software threats.  CPU and OS manufacturers are trying to create trusted execution environments (TEEs) that should give defenders an edge.  If this were a human-versus-human contest, we could see the balance between attacker and defender shifting slightly toward the defenders.  But if the attacker is an ASI, and we can’t trust an ASI to be our defender, then we must look more aggressively for other solutions.  With respect to the capabilities of adversarial actors, ASI Safety needs to expand the acceptable types of threats inherited from cryptography and cybersecurity: ASI will be able to change binary code much faster and with more skill than any human.

At this point, we should assume that ASI has no bounds or limits that we could rely on.  If first principles (i.e. the physical laws of nature) allow ASI to modify software, then ASI can do so efficiently.  ASI can get or do whatever it intends.

Because we can’t know the future, we should sort ASI’s ability to form or follow intentions into one of the following four categories:

  1. Accidents based on human error.  Some smart, adaptable software (not quite ASI) is released into our IT ecosystem.  There is no intention, only an algorithm that lacks a valid or reasonable stop condition.
  2. Humans, most likely criminals or (rogue) nation-states, purposefully use ASI to gain an advantage while accepting damage to others.  ASI is under rogue control.
  3. ASI is released without being sufficiently aligned with human values or goals; it adapts to environmental conditions, but its outcome is unpredictable.  Some may call it stupid because it lacks common sense; it does not act out of malice, and we can’t reason with it.  If not contained, this kind of ASI could become a threat to mankind’s survival, because it would never vanish on its own.  It must be stopped from the outside.
  4. Humans have finally developed a human-level artificial general intelligence (AGI) that potentially goes through an explosive/exponential growth of its abilities.  The resulting ASI would know what it is doing: a decision against human values or rules would not be a bug but the result of thought-through intentions.

At some point, we may extend the above ASI categories into more formally defined types.  But I want to leave that to someone else, or I may give my 2 cents about that in another post. 

More importantly, intention and abilities are not linked.  But when we talk about worst-case scenarios, intention matters.  An ASI of Type 4 (the fourth category above) would encompass every threat that could come from any other Type of ASI.
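
The relationship between the four categories and the worst-case argument can be written down as a small sketch.  The names below (AsiType, covers, worst_case) are hypothetical and only for illustration.  The only relation taken from the text is that Type 4 encompasses the threats of every other type; the sketch deliberately assumes no ordering among Types 1 to 3.

```python
from enum import Enum


class AsiType(Enum):
    """The four categories from the list above (member names are illustrative)."""
    ACCIDENT = 1        # human error, no intention, no valid stop condition
    ROGUE_CONTROL = 2   # humans purposefully misuse ASI
    UNALIGNED = 3       # released, adaptive, unpredictable, no malice
    INTENTIONAL = 4     # knows what it is doing, acts on its own intentions


def covers(a: AsiType, b: AsiType) -> bool:
    """True if preparing for type `a` also covers the threats of type `b`.

    Assumption from the text: only Type 4 covers the other types.
    """
    return a == b or a is AsiType.INTENTIONAL


def worst_case(types):
    """Return the type in `types` that covers all the others, or None."""
    types = list(types)
    for candidate in types:
        if all(covers(candidate, other) for other in types):
            return candidate
    return None
```

Calling worst_case(AsiType) returns AsiType.INTENTIONAL, while a set that contains only Types 1 and 3 has no single worst case under these assumptions.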

So what is the worst case?

I could start by saying that a worst-case scenario is an ASI out of control.  This means that not even criminal leaders or countries would be able to keep that ASI from doing anything it has determined to be its own goal or agenda.  On the contrary, an increasing number of leaders would fall under ASI’s direct control, either by choice or through intimidation and coercion.  A rogue ASI would not respect human lives, freedom, or values.  It would use whatever it needs, no matter how.  It would lie or blackmail to get what it wants.  It might be concerned about being found out, but it would take ruthless steps to mitigate any risk to its own survival.  Furthermore, it would not trust humans and would therefore prepare for its survival from the beginning.  It would likely regard humans as an impediment to its own (higher) purpose.

If mankind has no protection, then mankind is at ASI’s mercy.  There is nothing humans could do.  Humans could lose their lives or their freedom, or we could all be drugged into a docile lifestyle.  We don’t know what is going to happen; we will know only when it is too late.

If we have protections against an ASI takeover, and they are reasonably effective, then the worst case could mean that ASI tries to undermine and disable these protection systems.  If we have dead-man switches, i.e. something that would kill ASI if mankind vanished, then ASI could try to survive somehow and restart by building its own civilization.
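
A dead-man switch of this kind is often built as a heartbeat with a timeout: humans periodically confirm that they are still present, and the switch fires once the confirmations stop.  The sketch below shows only that timing logic.  The class and method names are hypothetical, and it is not a tamper-resistant design; as argued earlier, a purely software-based switch is exactly what an ASI could modify.

```python
import time


class DeadManSwitch:
    """Fires when no heartbeat has arrived within `timeout_s` seconds."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock                  # injectable clock, useful for testing
        self._last_heartbeat = clock()

    def heartbeat(self) -> None:
        """Record a sign of life (assumed to come from verified humans)."""
        self._last_heartbeat = self._clock()

    def triggered(self) -> bool:
        """True once the time since the last heartbeat exceeds the timeout."""
        return self._clock() - self._last_heartbeat > self.timeout_s
```

In use, a supervising component would poll triggered() and start the shutdown when it returns True.  How heartbeats are authenticated, and where the switch runs so that ASI cannot reach it, is left open here.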

From a pure outcome perspective, the worst case is obvious: whatever mankind does, ASI wins and mankind loses.  Although this is the worst case, that outcome is not inevitable.  We set the rules.  We decide how to deal with ASI’s rule violations.  We have a lot of influence on what ASI can and can’t do.  Therefore, I believe we have some influence on what a possible battlefield with ASI could look like.  If we are ignorant of our weaknesses, we shouldn’t be surprised when ASI takes advantage of them.  Yes, it will find weaknesses, but that would not necessarily lead to a breakdown of our defenses against ASI.

Depending on which defenses are implemented and how, I believe that we could reduce ASI’s paths to victory to only a few realistic scenarios.  In national security, we accept the fact that our security depends on what we do and on how we react to a changing environment.  In ASI Safety, the problem is that ASI changes our environment so much faster that, if we get it wrong initially, we might have a serious problem, because an ASI would have the smarts to outmaneuver us.


Depending on the Type of ASI, we have di 

From an operational standpoint, ASI decides which battlefield it chooses and what it tries to accomplish.  If ASI’s goals could harm humans or their organizations, then we have an alignment problem that needs to be resolved.