Machine Consciousness and Human Enfeeblement
New research and upcoming talks
This week, I’d like to share a sketch of some of my recent research on two timely topics: the possibility of machine consciousness, and the deleterious existential consequences of over-reliance on AI. The latter is what Berkeley’s Stuart Russell, in his book Human Compatible, has labeled “the problem of human enfeeblement.”
In April I’m giving a talk at the Spring Symposium of the Association for the Advancement of Artificial Intelligence (AAAI). I’ll be part of the symposium on Machine Consciousness: Integrating Theory, Technology and Philosophy, which is being hosted by the California Institute for Machine Consciousness (CIMC).
In my presentation, “Toward Criteria for Artificial Self-Consciousness: Unity, Agency, and Normativity,” I sketch the picture of consciousness that we find in the classical phenomenological tradition (Husserl, Heidegger, Sartre, Merleau-Ponty). In particular, I emphasize the distinction this tradition makes between pre-reflective experiential consciousness (often called “phenomenal” consciousness in contemporary discussions) and reflective self-consciousness.
Pre-reflective experiential consciousness is the basic sense of being awake or “there” in your experience. It is what goes away under general anesthesia and in dreamless sleep. When I perceive, feel, or think, the experience has some felt (or “phenomenal”) character, and it is given as my experience. None of this requires reflection or deliberation.
Reflective self-consciousness is something stronger. A reflectively self-conscious subject doesn’t just process information or get cast adrift in a sea of associations and pattern matching: it can take a stand on what is presented in its experience or stream of information; it can actively make up or change its own mind about what is so and what to do.
Often in my philosophical work, I focus on exploring the pre-reflective dimensions of experience: how we can get around in the world and be aware of the possibilities within it without having to stop, think, and deliberate. Much of the philosophical tradition has paid scant attention to this dimension of experience. But in this context, I found myself wearing the hat of defender and explainer of the reflective dimensions of consciousness. I wanted to contribute to the debates about possible machine consciousness by explaining, as best I can, what reflective self-consciousness does for us and how it distinguishes us from whatever is going on in language models like ChatGPT.
The program committee for the upcoming conference offered me some very helpful and challenging advice on how to sculpt my philosophical arguments for an audience of computer scientists and cognitive scientists interested in the design implications of the philosophical distinctions. It was the first time I have ever attempted to write anything like this.
Toward Criteria for Artificial Self-Consciousness
Here is the introductory section of my paper.
Asking which theories of consciousness apply to artificial systems requires asking what kinds of awareness the theories are supposed to be modeling. Contemporary discussions of consciousness in AI frequently elide an important distinction: that between pre-reflective self-awareness, often called experiential or phenomenal consciousness, and reflective self-consciousness. The latter involves an experiencing subject’s agentic capacity to make up its own mind regarding what to think and do; to assume a unified normative standpoint from which it forms, evaluates, and revises its own beliefs and intentions. This finer-grained picture of human consciousness helps clarify the moral stakes of attempts to engineer artificial consciousness; it also intensifies the question of whether our inherited moral categories are adequate to the novel kinds of minds that may one day emerge (Schwitzgebel 2023).
Much contemporary research on consciousness in AI focuses on phenomenal consciousness, the “something it is like” character of an experience (Nagel 1974; Block 1995). A notable recent survey (Butlin et al. 2023), for example, focuses explicitly and exclusively on phenomenal consciousness. The authors argue that phenomenal consciousness (when combined with the admittedly hotly contested assumption of computational functionalism) is tractable by a range of contemporary neuroscientific theories that also have application to artificial intelligence research (e.g., recurrent processing, global workspace, and higher-order theories).
Notwithstanding the prima facie plausibility of this methodological fastidiousness, research into AI consciousness will be compromised unless it goes beyond a narrow focus on phenomenal consciousness. Any comprehensive attempt to map the space of conscious awareness must also attend to reflective self-consciousness. This experiential capacity lies at the center of our moral agency and practices of mutual accountability. Recognizing this distinction is therefore essential for thinking clearly about the moral dilemmas connected with the possibility of artificial consciousness.
Schwitzgebel and Garza (2015, 2020), for example, argue that developers face a “design dilemma”: either construct systems that clearly fall short of moral status, or deliberately design systems capable of personhood and treat them accordingly from the outset. Other approaches emphasize precaution. Sebo (2023) and others in the discussion about AI welfare (e.g., Metzinger 2021; Long et al. 2024) argue that if there is a non-negligible probability that artificial systems could possess morally relevant forms of experiential consciousness, developers should act under conditions of ethical caution. The distinction developed in this paper complements these proposals by clarifying the form of self-involvement that might underlie these different forms of moral standing. Distinguishing pre-reflective experiential consciousness from reflective self-consciousness helps locate more precisely where precautionary obligations may arise and where the stronger demands associated with moral agency would begin to apply. At the same time, the distinctions introduced here might also help to intensify the impression that new forms of artificial consciousness will fundamentally destabilize the intuitions and categories that have guided much moral theorizing and deliberation over the last two thousand years.

Beyond Enfeeblement: Care as the Foundation of Human-Compatible AI
I have also been busy thinking about what Stuart Russell calls “the problem of human enfeeblement.” I just wrote up the following abstract of my recent research on the topic, which I have submitted to an upcoming conference.
This talk takes up Stuart Russell’s diagnosis of “human enfeeblement” as one of the deepest long-term risks posed by advanced AI systems. Russell warns that as AI systems take over more and more of what humans do, we risk losing the accumulated know-how of civilization, “becoming passengers on a cruise ship that runs itself.” Crucially, he insists that the solution is cultural rather than technical, and that machines might play a role in shaping whatever cultural response we find. This presentation takes up that underdeveloped thread in Russell’s argument and pursues it in dialogue with emerging research by Alison Gopnik and Brian Christian on care-giving and AI. Broadening their focus from care-giving as a specialized activity to caring as a general human capacity, I argue that the cultural movement Russell calls for must be grounded in a richer understanding of what enfeeblement actually threatens.
The danger extends beyond the loss of stored knowledge to the erosion of our capacities to care and to take care: to tend and attend to what matters, to be drawn into the world by what solicits our concern, to coordinate our commitments with others in shared action—a way of being human that is embodied, emotionally involved, and oriented toward what is worth doing. Gopnik’s and Christian’s work on care-giving also points toward how machines themselves might be designed to support rather than supplant these capacities, which is precisely the role Russell envisions for AI in his cultural solution, but does not specify.
This presentation develops that diagnosis into a concrete cultural vision. Drawing on Gopnik, Christian, and the phenomenological tradition that has long insisted on the primacy of care in human life, I argue that what is most endangered by AI enfeeblement is the full constellation of skills through which human beings “give a damn”: receptivity to what matters in our world today, the gumption and courage to undertake risky commitments, the practice of tending to concerns in common, and the embodied know-how that underlies all of these. Russell’s sketch of a remedy—something like a modern Spartan ethos of autonomy and agency—is a start, but remains too thin and too willful.
This talk presents an understanding of care as a set of skills cultivated in communities of practice, and a design orientation that prioritizes supporting human capacities over the mere preservation of human knowledge. These skills represent the essential preconditions for human value alignment: if we lose the capacity to care, the very values we aim to align AI with will have evaporated. This makes addressing the enfeeblement problem a prerequisite for the entire safety project, focusing our attention on the underlying human capacities from which all genuine knowledge, and all genuine meaning, flows.