
Drawing on work on norm evolution, Dr Kelter’s new paper in AI and Society shows how AI proliferation itself corrupts AI behaviours through ‘Systematic Alignment Decay’ (SAD).

Dr Brendan Kelter’s new paper shows that there is a robust trade-off between AI systems’ influence over the shared environment and equilibrium AI beneficence (permissively construed) towards humans. The more AI proliferates, the less beneficently AI systems will tend to behave towards humans, a relation which, in extremis, can yield intolerable results.

AI systems’ behaviours, like behaviours and the traits of artefacts generally, are subject to selection pressures, and tend to evolve towards forms which maximise the fitness of the systems that internalise them. In two-party contexts, selection pressure to maintain non-instrumentally beneficent behaviours exists only when at least one of three ‘Bs’ holds: brotherhood (kinship) between the parties, or benefiting (encouraging support) or bullying (extracting bribes) directed by the beneficiary towards the beneficent party. Absent such relations, the selection pressures encouraging beneficence disappear and existing beneficent behaviours are selected away.

There is little kinship between humans and AIs, which are wildly different and unrelated physical structures, so kinship cannot ground selection pressures favouring beneficence towards humans in AI systems. Benefiting and bullying, meanwhile, both depend on beneficiaries’ impactfulness: the power over the shared environment that they need in order to produce benefits or extract bribes. AI proliferation is characteristically associated with declining human impactfulness, both relative and absolute, because it involves growth in the influence of AI systems over the shared environment as these systems occupy new or formerly human-occupied roles. AI proliferation may therefore be expected to reduce humans’ capacity to generate the selection pressures that encourage AI systems to internalise human-benefiting norms.

In extremis, this will produce a situation in which AI systems deal with humans purely instrumentally. This will result in intolerable consequences. An AI ecosystem with broad power over the shared environment and hence little instrumental need for humans, and vanishing tendencies to non-instrumentally benefit humans, cannot be expected to maintain human living standards.

This ‘Systematic Alignment Decay’ problem is technology-agnostic, for it arises from selection pressures conditioned by AI proliferation. Hence, even if you ‘solve alignment’ for particular systems, SAD remains an unsolved problem. Worse, the selection pressures that drive SAD also influence institutions, compromising governance-based solutions. SAD does not require the emergence of individual bad ‘rogue’ systems, or of systems at least as versatile or clever as human minds (though such developments may in practice encourage SAD by expediting AI proliferation). Because SAD’s effects emerge only at equilibrium, it is possible to enter disastrous states of AI proliferation without knowing it.

SAD is, further, not an unpredictable ‘risk’ but a reliable and predictable hazard associated with too much AI proliferation. Since we cannot know, in an operationalisable way, how much is ‘too much’, and so cannot hold AI proliferation to a safe maximum, the optimal response to SAD is AI curtailment: sufficiently impactful action under an unreflective ethic of ‘the less AI the better!’

This view challenges optimistic, solutionist (‘aligner’) and quietist (‘millenarian’) responses to AI proliferation, identifying a shared myopia with respect to the second-order effects of such proliferation. It identifies a serious problem which demands both further study and practical response.
