Shorter thoughts, quick posts, and observations
AI safety field-building in Australia should accelerate. OpenAI and Anthropic opened Sydney offices, OpenAI started building a $4.6B datacenter in Sydney, and the country is a close US/UK ally.
Attendance at the top four AI conferences has been growing at 1.26x/year on average.
80% of MATS alumni who completed the program before 2025 are still working on AI safety today, and 10% co-founded an active AI safety start-up or team during or after the program.
AI safety should scale; research orgs are often more efficient than independent researchers; there is a market for for-profit AI safety start-ups; and new AI safety organizations struggle to get funding and co-founders despite having good ideas.
An Executive trusting their AI assistant over human employees is like a King trusting their Jester over Generals. Jesters can be incredibly sycophantic and flattering! Even worse, sometimes Kings believe the Jester’s ideas are their own: “You’re so brilliant, my liege!”
Worst is when the King becomes convinced of their brilliance and fires the Generals. None can object, as they fear for their jobs and the Jester has wormed their way in too deep. Thus the psychosis deepens.
Without empirical testing, or even checking against the minds of other humans, this AI psychosis can compound on itself. The solution? Touch reality more.
I’ve often said that I feel (or want to feel) more aligned with humanity and conscious beings than with a particular tribe of humans. Recently, my friend told me that he feels (or wants to feel) more aligned with Earth’s biosphere than with humanity on the whole, that even if humans vanished, a large part of the value in the world would remain and maybe another high-potential species would rise in our place. I don’t think I feel the same.
I think my deeper alignment is to my sense of a “worthy telos of the universe”; i.e., what I imagine nearly arbitrary societies of conscious, social beings to be striving towards. I feel deep kinship with other people and my sense is that this is connected to something more universal than humans, where we are the (local) universe’s best chance of actualizing this virtue.
While humans are dependent on many interlinked elements of the biosphere, I think the Shapley value of humanity’s contribution to the most flourishing future dwarfs that of any other species. Also, I feel a sense of fragility; not only are humans important and cherished, but there’s no guarantee that a successor would rise in our place if we fell to infighting/accident. Some existential catastrophes don’t leave much left to evolve, such as AI paperclipper, supernova, or gray goo scenarios.
In summary, I love humans not just because I am one, but because I think we might be the (local) universe’s best chance at bootstrapping a more beautiful, flourishing, and cosmopolitan future for myriad forms of life. This may include beings unrecognizable to us or future children who regard us as moral monsters; so it goes! But I don’t think the good future occurs by default; it will require deep striving, contemplation, and embodying virtue throughout the journey. Eudaimonia will not be built on trampled values.
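To make the Shapley-value framing above a bit more concrete, here is a minimal toy calculation. The species groupings, the “flourishing” characteristic function, and every number in it are invented purely for illustration; the point is only how the method credits a player that is nearly worthless alone but pivotal in combination.

```python
from itertools import combinations
from math import factorial

# Toy "flourishing potential" of coalitions of Earth lineages.
# All groupings and numbers are invented purely for illustration.
players = ["humanity", "other_vertebrates", "microbes"]
v = {
    frozenset(): 0.0,
    frozenset({"humanity"}): 2.0,
    frozenset({"other_vertebrates"}): 1.0,
    frozenset({"microbes"}): 1.0,
    frozenset({"other_vertebrates", "microbes"}): 5.0,
    frozenset({"humanity", "microbes"}): 80.0,
    frozenset({"humanity", "other_vertebrates"}): 70.0,
    frozenset({"humanity", "other_vertebrates", "microbes"}): 100.0,
}

def shapley(player):
    """Average marginal contribution of `player` over all join orders."""
    n = len(players)
    others = [p for p in players if p != player]
    total = 0.0
    for k in range(len(others) + 1):
        for coalition in combinations(others, k):
            s = frozenset(coalition)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (v[s | {player}] - v[s])
    return total

for p in players:
    print(p, round(shapley(p), 1))
# humanity ~57, other_vertebrates ~19, microbes ~24 (summing to 100):
# humanity alone is nearly worthless here (it depends on the biosphere),
# yet it captures most of the credit because the large gains only appear
# in coalitions that include it.
```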
It is a bizarre fact about the world that, at small scales of money/impact, it is generally easier to improve the world through direct service than by increasing one’s earning-to-give potential, but this inverts at larger scales. I.e., many people find it very hard to increase their earning potential, but there are tons of ways they can donate their time to soup kitchens, homeless shelters, Scout groups, etc. However, for high earners, it seems generally harder to find ways to donate their time that are higher impact than just trying to earn more and donate the excess (e.g., to fund soup kitchens or antimalarial bednets). An exception to the latter might be AI alignment or biosecurity work, but these often require specialized skills that not all high earners possess.
Why is debilitating low back pain so common? A compelling hypothesis: the spine is extremely important for survival, and overreacting to a false alarm of spinal damage is survivable, while failing to react to real damage is not. Vertebrates evolved to be hyper-vigilant about potential damage to their central nervous system, because such damage was catastrophic. Better to be safe than sorry!
Modern medical technology means that we can rule out catastrophic damage in almost all cases of low back pain; so why does the pain persist? Because our lizard brains didn’t evolve to be reassured by medical charts! To reduce pain, we have to manually reprogram and soothe the overreactive pain response. As with a trauma survivor who has been triggered, it’s important to slowly increase exposure and avoid “retraumatizing”, or the cycle of pain, catastrophizing, fear, and avoidance will persist.
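The asymmetric-cost argument above can be sketched numerically. The probabilities and costs below are invented; only the asymmetry between them is doing any work.

```python
# Expected cost of sounding a pain "alarm" vs. ignoring the signal, under
# the (invented, illustrative) assumption that missed spinal damage is
# vastly more costly than a false alarm.
p_real_damage = 0.01           # most back twinges are benign
cost_false_alarm = 1.0         # needless rest, guarding, lost foraging
cost_missed_damage = 10_000.0  # paralysis / death: a reproductive dead end

expected_cost_alarm = (1 - p_real_damage) * cost_false_alarm   # pay the alarm cost when nothing is wrong
expected_cost_ignore = p_real_damage * cost_missed_damage      # pay the catastrophe when something is

print(expected_cost_alarm, expected_cost_ignore)  # ~0.99 vs. 100.0
# Even at a 1% base rate of real damage, ignoring is ~100x worse in
# expectation, so a hair-trigger pain response is the "rational"
# evolutionary policy.
```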
Strong take, loosely held: I find that Rationalists, longevity types, “biohackers”, and “body work” advocates are often making a common mistake that I’ll call “trying to solve the problem yourself instead of identifying and deferring to experts.” Here are some (I claim) false beliefs that I commonly encounter:
I think all of the above beliefs are true in specific instances, but I often see them taken to the extreme. I think it’s great to do personal research on health and to listen to your body, but I think one should be wary of trusting one’s own judgement over bona fide expert opinion, especially where it’s easy to trick oneself with magical thinking. In general, the simple explanation is usually true and humans are really good at tricking themselves into feeling exceptional.
I propose a new name for an important metaethical distinction: bosonic vs. fermionic moral theories. Bosons are particles that can degenerately occupy the same quantum state in arbitrary numbers, while fermions obey the exclusion principle: no two can occupy the same state.
Bosonic moral theories value multiple copies of the same moral patient experiencing identical states, like perfect bliss. Under these theories, “tiling the universe in hedonium” is permissible, because new copies experiencing the same qualia have nonzero moral value. A “bosonic moral utopia” could look like the entire universe filled with minds experiencing infinite bliss.
Fermionic moral theories value new moral patients only insofar as they have different experiences. “Moral degeneracy pressure” would disfavor the creation of identical copies, as they would be treated like “pointers” to the original, rather than independent moral patients. Under these theories, inequality and maybe even some suffering entities are permissible if higher value states are already occupied by other entities. A “fermionic moral utopia” could look like the universe filled with minds experiencing infinitesimally varying distinct positive experiences.
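A toy way to state the aggregation difference, where the “experience states” and their values are placeholders rather than a serious formalism:

```python
# Each mind is represented by its experience "state"; the value of a state
# is how good that experience is. The states and numbers are arbitrary.
minds = ["bliss", "bliss", "bliss", "mild_contentment"]
value = {"bliss": 10.0, "mild_contentment": 3.0}

# Bosonic aggregation: every copy counts, identical or not.
bosonic_total = sum(value[m] for m in minds)          # 33.0

# Fermionic aggregation: each distinct state counts only once; extra
# identical copies act like pointers to an already-occupied state.
fermionic_total = sum(value[s] for s in set(minds))   # 13.0

print(bosonic_total, fermionic_total)
# Under the bosonic view, tiling the universe with more "bliss" copies keeps
# adding value; under the fermionic view it adds nothing once the state is
# occupied, so you would rather create a new, slightly different positive state.
```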
TW: graphic suffering, factory farming
I’ve often found it difficult to imagine “s-risk” scenarios unfolding (futures of endless suffering for countless beings). What would these futures even look like? I find factory farming of animals a useful intuition pump: these factories are endless nightmare machines of agony and death, devoid of hope or relief. Imagine an eternal Auschwitz the scale of a planet, where your body is grossly inflated, alien, and wracked with pain, your life is short, but agonizing, bleak, and overlong, and you are surrounded by countless silent sufferers, unable to connect. Imagine a karmic cycle of death and rebirth, but there is only this, now and forever.
Given that we know animals are sentient and experience pain, factory farming seems like the greatest evil in history. Dark, speculative sci-fi can depict much worse futures. Maybe we ought to learn from our mistakes before we introduce a new, godly race to this planet?
If AI automates labor, but doesn’t own capital, this would presumably create a massive surplus for humans and everyone’s lives could improve (assume adequate redistribution for the sake of argument). If AI has rights and owns capital, this might not result in a surplus for humans! In fact, Malthusian dynamics might reduce human quality of life as AI populations boom, particularly if AIs are much better workers and experience higher utility gains per marginal dollar (e.g., via arbitrarily small mind enhancements with high returns to cognitive output or valenced experience).
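A crude sketch of that Malthusian worry, with invented utility curves; the only assumption doing work is that AIs gain more utility/output per marginal dollar than humans do.

```python
import math

# Crude allocation toy: a fixed pot of resources split between humans and
# AIs by whichever gains more utility at the margin. All curves are invented.
TOTAL, STEP = 100.0, 0.01

def human_utility(x):
    return 10.0 * math.log(1.0 + x)    # diminishing returns

def ai_utility(x):
    return 100.0 * math.log(1.0 + x)   # assumed: AIs convert resources into
                                       # welfare/output far more efficiently

human_share = ai_share = 0.0
for _ in range(round(TOTAL / STEP)):
    gain_human = human_utility(human_share + STEP) - human_utility(human_share)
    gain_ai = ai_utility(ai_share + STEP) - ai_utility(ai_share)
    if gain_ai >= gain_human:
        ai_share += STEP
    else:
        human_share += STEP

print(round(human_share, 1), round(ai_share, 1))  # roughly 8 vs. 92 here
# With these made-up curves, marginal-utility logic routes almost all
# resources to AIs; "adequate redistribution" has to be imposed from
# outside, it doesn't fall out of surplus-maximization.
```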
I expect that soon the “ability to leverage AI” will become a massive selective advantage for knowledge work. Consequences:
If e/accs/Landians are all about accelerating natural tendencies, do they aim to build black holes or aid vacuum collapse? Their mission literally seems to be, “let’s make maximally uninteresting things happen faster.” I much prefer “process-based” over “outcome-based” meta-ethics for this reason.
The end state of the universe is static and boring; instead, let’s optimize the integral over time of interestingness! Sometimes, this means slowing down! Algal blooms are very fast, but burn through a lake’s ecosystem, leaving it pretty boring. Cancer is very effective, but living humans are way more interesting.
Speed-running superintelligent AI could be pretty costly to total interestingness if we accidentally build “civilization cancer”! In the long run, slowing down a bit can often be higher value.
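A small illustration of the “integral over time of interestingness” point, using two made-up trajectories: an algal-bloom-style fast burn versus a slower, sustained path.

```python
import math

# Two invented "interestingness over time" curves, compared by their
# integral over time rather than by their peak.
dt, T = 0.01, 100.0
times = [i * dt for i in range(round(T / dt))]

def fast_burn(t):
    # Algal bloom / cancer: a big early spike that exhausts its substrate.
    return 10.0 * math.exp(-t / 2.0)

def slow_sustain(t):
    # Slower growth that doesn't burn through what it depends on.
    return 2.0 * (1.0 - math.exp(-t / 10.0))

integral_fast = sum(fast_burn(t) * dt for t in times)     # ~20
integral_slow = sum(slow_sustain(t) * dt for t in times)  # ~180

print(round(integral_fast), round(integral_slow))
# The fast burn has the higher peak (10 vs. 2) but a far smaller integral,
# which is the quantity the post cares about.
```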
AI might largely decouple capital from labor. Whoever has money can automate work and offer products cheaply due to economies of scale. A few leviathans may capture most wealth. This is probably not good for society.
Parts therapy, except you’re contracting with counterfactual “yous” in different timelines where each can pursue a different one of your terminal goals. Everyone’s content because somewhere someone is doing the things that you don’t have time for.
There’s a certain level of neuroticism that I think is very useful for leading an examined life. If one is satisfied with their current impact on the world, they aren’t looking for ways to improve. Too much contentment can lead to complacency and stagnation. But too much discontentment feels miserable! What do?
I aspire to feel satisfied in the process of improvement. If, every day, I’m improving in some dimension, I’m content to be content! Of course, some types of improvements are more impactful than others; improving my 100 m sprint probably doesn’t translate well to improving others’ lives. But maybe it improves my self-esteem or fitness or resilience or something, which advances my general competence, or maybe I just really like sprinting for its own sake!
While I think “positive impact on the world” should be (and is) my strongest signal for “am I satisfied with this thing called life?”, I value other things intrinsically too! Too much neuroticism about impact (i.e., being a naive impact maximizer) can detract from my other intrinsic goals and probably will result in burnout, dissatisfaction, and less impact overall. As with most goals, impact-chasers should take the middle path.
A couple days ago, I woke up and my phone was dead. It vibrated, but the screen remained black. Nothing fixed it. I ordered a replacement and resigned myself to wait.
In the next two days, I marveled at how necessary my phone had become in my daily life. Without it, I couldn’t enter my workplace, which required a phone app to unlock the doors. I couldn’t listen to music or podcasts on my commute. I couldn’t use 2FA to easily log into websites. I couldn’t track my sets at the gym. I had to rely on a friend to travel via Uber and order me food. I couldn’t perform my end-of-day rituals easily. I couldn’t call people while I walked, or check information in the middle of a conversation. It was oddly liberating.
I have a new phone now. I’m glad to be back, but I feel like I’ve learned something valuable about my experience. Phones are wonderful and I am dependent. I choose to create rituals and practices that are dependent on this external part of myself. I’m glad I could examine this intention. Let me not become attached to practices I may not want.
A friend shared some wisdom with me last night: “When I indulge myself, good things happen.” I reflected deeply on this as I feel I haven’t indulged my intellectual curiosity in a big way for a long time. A hypothesis and possible explanation for the aphorism, informed by Stoicism:
Why are painful experiences more intense and common than pleasurable experiences? Plausibly, because there are many more obvious, permanent ways for organisms to lose the ability to propagate their genes than to improve their chances at success. Finding and eating a sweet berry is good, but not as strongly as eating a poisonous plant is bad.
I’m hopeful that with future technology and a robust ethics system, we can engineer far more useful pleasurable experiences, as a means to signal progress towards “ultimate good” on a societal level, in the same way poisoned berries are a significant step towards “ultimate bad” on an individual level. Imagine reliably feeling ecstatic joy every time you saved someone’s life or otherwise contributed to the common good!
The phenomenon of experiential “terra incognita” is fascinating to me. There are areas of the traversable mental landscape marked “here there be dragons; if you tread here you may not return.” Regions of extreme addiction or depression or other self-sustaining feedback loops or mind-traps.
Reasons that scaling labs might be motivated to sign onto AI safety standards:
However, AI companies that don’t believe in AGI x-risk might tolerate far more risk than the safety standards this community would consider ideal. Also, I think insurance contracts are unlikely to appropriately account for x-risk, if the market is anything to go by.
Is there a name for the theory, “Most/all convincing ‘Chinese room’ programs small enough (in terms of Kolmogorov complexity) to run on wetware (i.e., the human brain) are sentient”?