Notes

Shorter thoughts, quick posts, and observations



August 28, 2025

80% of MATS alumni are working in AI safety

80% of MATS alumni who completed the program before 2025 are still working on AI safety today, and 10% co-founded an active AI safety start-up or team during or after the program.


July 12, 2024

The AI safety community needs help founding projects

AI safety should scale; research orgs are often more efficient than independent researchers; there is a market for for-profit AI safety start-ups; and new AI safety organizations struggle to find funding and co-founders despite having good ideas.


February 06, 2026

AI jesters

An Executive trusting their AI assistant over human employees is like a King trusting their Jester over Generals. Jesters can be incredibly sycophantic and flattering! Even worse, sometimes Kings believe the Jester’s ideas are their own: “You’re so brilliant, my liege!”

Worst is when the King becomes convinced of their brilliance and fires the Generals. None can object, as they fear for their jobs and the Jester has wormed in too deep. Thus the psychosis deepens.

Without empirical testing, or even checking against the minds of other humans, this AI psychosis can compound on itself. The solution? Touch reality more.

AI Safety Facebook X

January 18, 2026

Sentient flourishing

I’ve often said that I feel (or want to feel) more aligned with humanity and conscious beings than with a particular tribe of humans. Recently, my friend told me that he feels (or wants to feel) more aligned with Earth’s biosphere than with humanity as a whole: even if humans vanished, a large part of the value in the world would remain, and maybe another high-potential species would rise in our place. I don’t think I feel the same.

I think my deeper alignment is to my sense of a “worthy telos of the universe”; i.e., what I imagine nearly arbitrary societies of conscious, social beings to be striving towards. I feel deep kinship with other people and my sense is that this is connected to something more universal than humans, where we are the (local) universe’s best chance of actualizing this virtue.

While humans are dependent on many interlinked elements of the biosphere, I think the Shapley value of humanity’s contribution to the most flourishing future dwarfs that of any other species. Also, I feel a sense of fragility; not only are humans important and cherished, but there’s no guarantee that a successor would rise in our place if we fell to infighting or accident. Some existential catastrophes, such as AI paperclipper, supernova, or gray goo scenarios, don’t leave much left to evolve.

In summary, I love humans not just because I am one, but because I think we might be the (local) universe’s best chance at bootstrapping a more beautiful, flourishing, and cosmopolitan future for myriad forms of life. This may include beings unrecognizable to us or future children who regard us as moral monsters; so it goes! But I don’t think the good future occurs by default; it will require deep striving, contemplation, and embodying virtue throughout the journey. Eudaimonia will not be built on trampled values.

AI Safety Philosophy Facebook

November 28, 2025

Donating money vs. time

It is a bizarre fact about the world that, at small scales of money and impact, it seems generally easier to improve the world through direct service than by increasing one’s earning-to-give potential, but this inverts at larger scales. That is, many people find it very hard to increase their earning potential, yet there are tons of ways they can donate their time to soup kitchens, homeless shelters, Scout groups, etc. For high earners, however, it seems generally harder to find ways to donate their time that are higher impact than just trying to earn more and donate the excess (e.g., to fund soup kitchens or antimalarial bednets). An exception to the latter might be AI alignment or biosecurity work, but these often require specialized skills that not all high earners possess.

Service Facebook

November 08, 2025

Low back pain is protective

Why is debilitating low back pain so common? A compelling hypothesis: the spine is extremely important for survival, and overreacting to a false alarm of spinal damage is survivable, while failing to react to real damage is not. Vertebrates evolved to be hyper-vigilant about potential damage to their central nervous system, because such damage was catastrophic. Better to be safe than sorry!

Modern medical technology means that we can rule out catastrophic damage in almost all cases of low back pain; so why does the pain persist? Because our lizard brains didn’t evolve to be reassured by medical charts! To reduce pain, we have to manually reprogram and soothe the overreactive pain response. Like a trauma survivor who has experienced a trigger, it’s important to slowly increase exposure and avoid “retraumatizing”, or the cycle of pain, catastrophizing, fear, and avoidance will persist.

Pain Philosophy Facebook

June 28, 2025

Defer to experts

Strong take, loosely held: I find that Rationalists, longevity types, “biohackers”, and “body work” advocates are often making a common mistake that I’ll call “trying to solve the problem yourself instead of identifying and deferring to experts.” Here are some (I claim) false beliefs that I commonly encounter:

  • I am special. My problems are hyper-specific. The normal solutions don’t work for me. I need special care and treatment. Alternative therapies might work for me. I shouldn’t just do the obvious thing first and stick with it.
  • Scientific consensus isn’t everything; it can be improved on by individual experimentation. My investigations are scientifically valid. I can infer meaningful causal relationships in my self-studies, even when I don’t blind myself or control for extraneous variables.
  • I have unique, reliable knowledge of my body, health, and internal experience that categorically enables me to make better decisions than professionals. All of my perceptions are true and I am not susceptible to placebo, nocebo, and magical thinking.

I think all of the above beliefs are true in specific instances, but I often see them taken to the extreme. I think it’s great to do personal research on health and to listen to your body, but I think one should be wary of trusting one’s own judgement over bona fide expert opinion, especially where it’s easy to trick oneself with magical thinking. In general, the simple explanation is usually true and humans are really good at tricking themselves into feeling exceptional.

Epistemics Pain Facebook

May 29, 2025

Bosonic vs. fermionic moral theories

I propose a new name for an important metaethical distinction: bosonic vs. fermionic moral theories. Bosons are particles that can occupy the same quantum state in arbitrary numbers, while fermions, per the Pauli exclusion principle, must each occupy a distinct state.

Bosonic moral theories value multiple copies of the same moral patient experiencing identical states, like perfect bliss. Under these theories, “tiling the universe in hedonium” is permissible, because new copies experiencing the same qualia have nonzero moral value. A “bosonic moral utopia” could look like the entire universe filled with minds experiencing infinite bliss.

Fermionic moral theories value new moral patients only insofar as they have different experiences. “Moral degeneracy pressure” would disfavor the creation of identical copies, as they would be treated like “pointers” to the original, rather than independent moral patients. Under these theories, inequality and maybe even some suffering entities are permissible if higher-value states are already occupied by other entities. A “fermionic moral utopia” could look like the universe filled with minds experiencing infinitesimally varying distinct positive experiences.
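
A toy formalization of the distinction (my own notation, purely illustrative): let $u(s)$ be the value of one moral patient in experiential state $s$, and let a world contain patients in states $s_1, \dots, s_n$, repeats allowed. Then

$$V_{\mathrm{bosonic}} = \sum_{i=1}^{n} u(s_i), \qquad V_{\mathrm{fermionic}} = \sum_{s \in \{s_1, \dots, s_n\}} u(s),$$

where the fermionic sum ranges over the set of distinct states. A second identical copy of a blissful mind adds $u(s)$ under the bosonic theory and exactly zero under the fermionic one; intermediate theories could discount duplicates rather than zeroing them out.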

Philosophy Facebook X

April 05, 2025

Visualizing s-risks

TW: graphic suffering, factory farming

I’ve often found it difficult to imagine “s-risk” scenarios unfolding (futures of endless suffering for countless beings). What would these futures even look like? I find factory farming of animals a useful intuition pump: these factories are endless nightmare machines of agony and death, devoid of hope or relief. Imagine an eternal Auschwitz at the scale of a planet, where your body is grossly inflated, alien, and wracked with pain, your life is short but agonizing, bleak, and subjectively overlong, and you are surrounded by countless silent sufferers, unable to connect. Imagine a karmic cycle of death and rebirth, but there is only this, now and forever.

Given that we know animals are sentient and experience pain, factory farming seems like the greatest evil in history. Dark, speculative sci-fi can depict much worse futures. Maybe we ought to learn from our mistakes before we introduce a new, godly race to this planet?

Philosophy Prioritization Facebook

March 22, 2025

AI surplus might not benefit humans

If AI automates labor, but doesn’t own capital, this would presumably create a massive surplus for humans and everyone’s lives could improve (assume adequate redistribution for the sake of argument). If AI has rights and owns capital, this might not result in a surplus for humans! In fact, Malthusian dynamics might reduce human quality of life as AI populations boom, particularly if AIs are much better workers and experience higher utility gains per marginal dollar (e.g., via arbitrarily small mind enhancements with high returns to cognitive output or valenced experience).
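
A toy version of the worry (all functional forms assumed for illustration): suppose a surplus $W$ is allocated between $N_H$ humans and $N_A$ AIs so as to equalize marginal utility per dollar. If an AI turns a marginal dollar into $k > 1$ times as much utility as a human, the allocation shifts toward the AIs; and if the AI population can cheaply grow until each AI consumes its replication cost $\bar{c}$, per-capita human consumption becomes

$$c_H = \frac{W - N_A \bar{c}}{N_H},$$

which falls toward zero as $N_A$ booms, even under a redistribution rule that treats every mind equally.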

AI Safety Economics Facebook

March 18, 2025

Leveraging AI is a selective advantage

I expect that soon the “ability to leverage AI” will become a massive selective advantage for knowledge work. Consequences:

  • Younger workers will generally find it easier to adapt and many older workers might be left behind.
  • Early adopter companies will outcompete others, with the potential for significant disruption by small, fast startups.
  • Countries with stronger AI infrastructure or less red tape will dominate global knowledge work and have more memetic influence.

AI Safety Economics Facebook

March 16, 2025

Accelerationism is boring

If e/accs and Landians are all about accelerating natural tendencies, do they aim to build black holes or aid vacuum collapse? Their mission literally seems to be, “let’s make maximally uninteresting things happen faster.” I much prefer “process-based” over “outcome-based” meta-ethics for this reason.

The end state of the universe is static and boring; instead, let’s optimize the integral over time of interestingness! Sometimes, this means slowing down! Algal blooms are very fast, but burn through a lake’s ecosystem, leaving it pretty boring. Cancer is very effective, but living humans are way more interesting.
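
Schematically, with $I(t)$ as an assumed stand-in for how interesting the world is at time $t$: prefer maximizing $\int_0^{\infty} I(t)\,dt$ over maximizing the terminal state alone. A fast burn can briefly spike $I(t)$ while wrecking the integral.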

Speed-running superintelligent AI could be pretty costly to total interestingness if we accidentally build “civilization cancer”! In the long run, slowing down a bit can often be higher value.

AI Safety Philosophy Facebook

March 13, 2025

AI leviathans

AI might largely decouple capital from labor. Whoever has money can automate work and offer products cheaply due to economy of scale. A few leviathans may capture most wealth. This is probably not good for society.

AI Safety Economics Facebook

March 12, 2025

Counterfactual IFS

Parts therapy (à la Internal Family Systems), except you’re contracting with counterfactual “yous” in different timelines, where each can pursue a different one of your terminal goals. Everyone’s content because somewhere, someone is doing the things that you don’t have time for.

Philosophy Facebook

December 08, 2024

Useful neuroticism

There’s a certain level of neuroticism that I think is very useful for leading an examined life. If one is satisfied with their current impact on the world, they aren’t looking for ways to improve. Too much contentment can lead to complacency and stagnation. But too much discontentment feels miserable! What do?

I aspire to feel satisfied in the process of improvement. If, every day, I’m improving in some dimension, I’m content to be content! Of course, some types of improvements are more impactful than others; improving my 100 m sprint probably doesn’t translate well to improving others’ lives. But maybe it improves my self-esteem or fitness or resilience or something, which advances my general competence, or maybe I just really like sprinting for its own sake!

While I think “positive impact on the world” should be (and is) my strongest signal for “am I satisfied with this thing called life?”, I value other things intrinsically too! Too much neuroticism about impact (i.e., being a naive impact maximizer) can detract from my other intrinsic goals and will probably result in burnout, dissatisfaction, and less impact overall. As with most goals, impact-chasers should take the middle path.

Philosophy Facebook

September 19, 2024

Liberation from phone

A couple days ago, I woke up and my phone was dead. It vibrated, but the screen remained black. Nothing fixed it. I ordered a replacement and resigned myself to wait.

In the next two days, I marveled at how necessary my phone had become in my daily life. Without it, I couldn’t enter my workplace, which required a phone app to unlock the doors. I couldn’t listen to music or podcasts on my commute. I couldn’t use 2FA to easily log into websites. I couldn’t track my sets at the gym. I had to rely on a friend to travel via Uber and order me food. I couldn’t perform my end-of-day rituals easily. I couldn’t call people while I walked, or check information in the middle of a conversation. It was oddly liberating.

I have a new phone now. I’m glad to be back, but I feel like I’ve learned something valuable about my experience. Phones are wonderful and I am dependent. I choose to create rituals and practices that are dependent on this external part of myself. I’m glad I could examine this intention. Let me not become attached to practices I may not want.

Facebook

February 10, 2024

Beneficial indulgence

A friend shared some wisdom with me last night: “When I indulge myself, good things happen.” I reflected deeply on this, as I feel I haven’t indulged my intellectual curiosity in a big way for a long time. A hypothesis and possible explanation for the aphorism, informed by Stoicism:

  • A felt sense of “indulgence” might be how my subconscious indicates actions with a positive outcome, or high exploration value. If I indulge myself and good things don’t happen (as my subconscious is an imperfect judge), I’ll probably recalibrate and do something else! But if I never indulge myself, I’m probably throwing away useful information.
  • Whether or not things that feel indulgent are truly “good” for me, feeling good sufficiently often is an essential part of a positive homeostatic process. Non-depression is its own reward! I don’t have to always rationalize seeking pleasure.
  • Doing things that feel good (and aren’t destructive) can create a self-sustaining feedback loop. If I feel stagnant, indulging myself might provide the free energy I need to break the current cycle and start another! Indulgence can be a useful, powerful mechanism for change.

Philosophy Facebook

January 15, 2024

Valence engineering

Why are painful experiences more intense and common than pleasurable experiences? Plausibly, because there are many more obvious, permanent ways for organisms to lose the ability to propagate their genes than to improve their chances at success. Finding and eating a sweet berry is good, but not nearly as good as eating a poisonous plant is bad.

I’m hopeful that with future technology and a robust ethics system, we can engineer far more useful pleasurable experiences as a means to signal progress towards “ultimate good” on a societal level, in the same way poisonous berries are a significant step towards “ultimate bad” on an individual level. Imagine reliably feeling ecstatic joy every time you saved someone’s life or otherwise contributed to the common good!

Philosophy Pain Facebook

September 24, 2023

Terra incognita

The phenomenon of experiential “terra incognita” is fascinating to me. There are areas of the traversable mental landscape marked “here there be dragons; if you tread here you may not return.” Regions of extreme addiction or depression or other self-sustaining feedback loops or mind-traps.

Philosophy Facebook

March 02, 2023

Mutually beneficial AI safety standards

Reasons that scaling labs might be motivated to sign onto AI safety standards:

  • Companies that are wary of being sued for unsafe deployment that causes harm might want to be able to credibly demonstrate that they did their best to prevent harm.
  • Big tech companies like Google might not want to risk premature deployment, but might feel forced to if smaller companies with less to lose undercut their “search” market. Standards that prevent unsafe deployment fix this.

However, AI companies that don’t believe in AGI x-risk might tolerate higher x-risk than ideal safety standards by the lights of this community. Also, I think insurance contracts are unlikely to appropriately account for x-risk, if the market is anything to go by.

AI Safety Incentive Mechanisms Facebook LessWrong

February 13, 2023

Chinese rooms

Is there a name for the theory, “Most/all convincing ‘Chinese room’ programs small enough (in terms of Kolmogorov complexity) to run on wetware (i.e., the human brain) are sentient”?

Philosophy Consciousness Facebook