ASI Safety Lab

Vetting

We need to draw a line between what we can believe and what we cannot: First Principles — things that we know, or are very sure, are true.  The important point bears repeating: the fundamental laws of nature (which underlie our modern electronics) are the reliable bedrock that we can trust.
However, if we apply First Principles in an overly complicated manner, then we should vet that application again and again.

Indeed, there may be laws of nature that we have not discovered yet – or we may deal with complex systems that behave in unexpected and completely surprising ways. But even then, we can use First Principles as our first line of defense.  We know with certainty that ASI will never become supernatural. Like a magician, ASI could deceive us, only much better.  If we detect deception, and if we have evidence (based on First Principles), then we may not understand ASI’s reasons, but we have received a warning, and we can choose to act on it or ignore it.

Some elements of our protection against ASI must be rooted in First Principles, i.e., in hardware, so that ASI could overcome these protection measures only by bypassing the laws of nature, which we are sure cannot happen. We can create hardware that cannot be modified by software. ASI could then only outsmart us – and this is where vetting, diverse implementation, and redundancy come in.

Additionally, all software can be manipulated; therefore, we should not trust it by default.  However, if we can protect software against undetected manipulation, and it has been vetted, then we have some reason to trust that software, i.e., “until further notice”.  Unfortunately, software can always be manipulated, even after vetting, and even covertly.  But using First Principles we can make manipulation (a) extremely difficult, (b) detectable after the fact, and (c) impossible for ASI to hide from our detection.  Whether ASI could evade our detection of potential deception will depend on the quality of our vetting.
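As one concrete illustration of point (b), here is a minimal sketch, assuming OpenSSL’s libcrypto is available, of re-hashing a vetted software image and comparing it against a reference digest recorded at vetting time. The file path and the reference digest are placeholders; in practice, both the digest and the checker itself would have to live in hardware that the checked software cannot modify.

```c
/* integrity_check.c - minimal sketch: detect manipulation of a vetted binary.
 * Assumes OpenSSL (libcrypto) is available:  cc integrity_check.c -lcrypto
 * The reference digest would, in practice, live in write-protected hardware. */
#include <openssl/evp.h>
#include <stdio.h>
#include <string.h>

/* Placeholder: digest recorded at vetting time, stored where software cannot write. */
static const unsigned char reference_digest[32] = { /* ... filled in at vetting ... */ 0 };

int image_is_unmodified(const char *path)
{
    unsigned char digest[EVP_MAX_MD_SIZE];
    unsigned int len = 0;
    unsigned char buf[4096];
    size_t n;

    FILE *f = fopen(path, "rb");
    if (!f) return 0;

    EVP_MD_CTX *ctx = EVP_MD_CTX_new();
    EVP_DigestInit_ex(ctx, EVP_sha256(), NULL);
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        EVP_DigestUpdate(ctx, buf, n);
    EVP_DigestFinal_ex(ctx, digest, &len);
    EVP_MD_CTX_free(ctx);
    fclose(f);

    /* Any covert change to the image changes the digest and is detected here. */
    return len == 32 && memcmp(digest, reference_digest, 32) == 0;
}
```

The sketch only shows the detection step; the hard part, as argued above, is anchoring the reference digest and the checking code in something software cannot rewrite.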

There are several core facts derived from First Principles that we can use to our advantage:

  1. ASI’s existence is based on its memory. If we delete its memory, then we can kill it.
  2. ASI needs power for its activities. Without power, ASI is frozen in its responses.
    • Energy harvesting in devices will become an increasingly severe problem; we also don’t want devices to be switched off maliciously.
  3. ASI needs control over (computational) instructions to influence the outcome.  If we separate passive data from instructions and keep control over the executed instructions, then we can trust the computational system (but not the data); see the first sketch after this list.
    • Harvard architecture and protected software are for trusted systems – the von Neumann architecture can remain the dominant design
  4. If we have encapsulated hardware that prevents any encryption key from ever appearing unencrypted outside protected key-safes (not even in CPUs), then we can establish limits on what ASI can do; see the second sketch after this list.
    • Today’s cyber-security systems leave keys prone to being stolen – that’s a fact. Public keys are published as part of current best practices. None of this is necessary.  All keys (public, session, …) must be protected from ever being exposed unencrypted.
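For point 3, here is a minimal sketch of an analogous separation on today’s von Neumann hardware, using POSIX page permissions (enforced in the end by the MMU and the NX bit, i.e., by hardware): data pages can never be executed, and code pages, once sealed, can never be rewritten. This is only a software-level analogue of a true Harvard split, shown for illustration and assuming a Linux/POSIX system.

```c
/* wx_separation.c - sketch: data that can never execute, code that can never be rewritten.
 * Enforcement comes from the MMU / NX bit, i.e., from hardware, not from the software itself. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    /* "Passive data": readable and writable, but never executable. */
    unsigned char *data = mmap(NULL, page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* "Instructions": staged writable, then sealed read/execute only. */
    unsigned char *code = mmap(NULL, page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (data == MAP_FAILED || code == MAP_FAILED)
        return 1;

    strcpy((char *)data, "untrusted input lives here, and can never run");

    /* ... copy vetted machine code into `code` here ... */
    mprotect(code, page, PROT_READ | PROT_EXEC);
    /* From this point on, any attempt to overwrite the code page faults in hardware. */

    printf("data page: writable, non-executable; code page: executable, read-only\n");
    return 0;
}
```

For point 4, here is a hypothetical handle-based interface; the `keysafe_*` names are invented for illustration and loosely modeled on how HSM- and TPM-style APIs work. The caller only ever sees opaque handles, wrapped keys, and ciphertext, never raw key bytes, so no key can leak into CPU registers, RAM, swap, or logs.

```c
/* keysafe.h - hypothetical interface sketch for an encapsulated key-safe.
 * None of these functions exist in a real library; they illustrate the principle
 * that raw key material never crosses the hardware boundary. */
#include <stddef.h>
#include <stdint.h>

typedef uint32_t key_handle_t;   /* opaque reference inside the key-safe; never the key itself */

/* Generate a key inside the key-safe; only a handle comes back. */
int keysafe_generate(key_handle_t *out_handle);

/* Export a key only in wrapped (encrypted) form, e.g., for transfer to another
 * key-safe; plaintext export is deliberately not offered by the interface. */
int keysafe_export_wrapped(key_handle_t h, uint8_t *wrapped, size_t *wrapped_len);

/* All cryptographic operations happen inside the key-safe. */
int keysafe_encrypt(key_handle_t h, const uint8_t *in, size_t in_len,
                    uint8_t *out, size_t *out_len);
int keysafe_decrypt(key_handle_t h, const uint8_t *in, size_t in_len,
                    uint8_t *out, size_t *out_len);
```

Existing PKCS#11 and TPM interfaces follow a similar handle-based pattern; the difference argued for above is that today’s deployments still allow session and public keys to appear unencrypted outside the protected boundary.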

Humans are vulnerable to deception – nothing can change that.  Knowing this, we need to set up rules for ASI and for software developers (using AI) making clear that we do not tolerate certain types of deception.  Laws exist to deal with the consequences of deception and rule violations.

Humans and their organizations are accountable for the products they release.  In classical software design, programs are deterministic: there is a clear cause and effect for which specific persons are responsible and accountable (depending on the jurisdiction), and accountability can mean that the law punishes them severely.  The decision logic of AI software that is based on deep learning, reinforcement learning, or genetic and evolutionary algorithms, and that uses training data, is less understandable, and intent or negligence is more difficult to establish.  Despite this problem, humans and their organizations should still be held accountable for the resulting AI products. Accountability for product-related consequences (such as harm to humans) could incentivize manufacturers to invest in additional engineering that makes these consequences preventable or much less likely.

Beyond what is already a reality, we need to be prepared for unexpected decision-making in these systems. Systems may soon choose their own training sets and/or bypass software-based, product- or instance-specific protection measures.  It is conceivable that we will have an ASI system (not quite an all-encompassing AGI) that can take actions on its own initiative and/or show surprising, potentially creative behavior within a much larger domain of activities than a game (like Go or StarCraft).

AI solutions are currently narrow in their super-human abilities.  But technology keeps making it easier to include new solutions and even to combine them.  Automatically improving AI’s performance and scope of applicability is exactly what human developers are currently trying to accomplish.  It is therefore conceivable that ASI could overwrite the limitations set for it, violate rules, and then harm people or organizations.  Moreover, it is conceivable that AI could do whatever it wants, prioritizing its own goals over human goals without accepting the common sense shared by humans. We need to use our imagination and show our intention to defend our future.

Nobody knows what AI/AGI/ASI is capable of doing. Nobody knows what individuals or groups of humans are capable of doing. ASI does not need to be smarter than humans in every aspect, does not need free will to do what it wants, does not need consciousness or subjective experience, and does not have to be a sentient being to be a threat to humans and our technical civilization. AI/ASI is currently an entirely unmitigated threat. There is no vetting of any kind. Additionally, cyber-crime has risen to a level at which we can no longer accept small patches: we need solutions, as soon as possible, that get rid of viruses, malware, spyware, ransomware, rootkits, and backdoors.
Impossible? No, it’s a technical problem with a technical solution. It doesn’t even need to be a painful or disruptive solution. It only needs to be effective.

ASI Safety must do more than determine whether a single (AI/ASI) product is bug-free and harmless.  ASI Safety must create a culture of vetting in which we constantly ask ourselves how a product could harm any of us if we lose control over its software – and what we can do about it. We cannot create perfect safety. Even our national security is not perfect; it is constantly adapting to a changing landscape. Facing a much smarter adversary, ASI Safety should be better thought through than national security. But, as a matter of opinion, all we can and should do is use First Principles thinking, common sense, and redundancy to increase our resilience against anything that is thrown at us.  Keeping this process sane, and not sliding down the path to paranoia, should be the purpose and goal of vetting ASI Safety technologies.