Why AI Mental Health Apps Need Three Layers of Memory
Xuan Zhao, CEO at Flourish AI
The difference between a generic chatbot and an emotionally intelligent AI companion isn’t in the language model. It’s in what the model remembers.
ChatGPT has no memory of you. Claude has no memory of you. They process each conversation fresh, with only the current chat history as context. This works fine for coding or writing. It’s catastrophic for mental health support.
Xuan Zhao, CEO of Flourish AI, built Sunny — the AI companion that passed Harvard’s emotional manipulation audit — by solving the memory problem differently than every other AI company.
The Three-Layer Memory System
Sunny runs three distinct memory systems in parallel: long-term memory, short-term memory, and working memory. The architecture is modeled after human cognition.
“We have three layers of memory,” Xuan explains. “Long-term, short-term, and also working memory modeled after human cognition.”
This isn’t just storage. The different layers serve different purposes in how Sunny understands and responds to you.
Long-term memory is the foundation. Everything you’ve shared over weeks and months goes into this layer. Stories about your family, your struggles, your goals, patterns the system has noticed. If you mentioned three weeks ago that you’re a morning person who hates being rushed, Sunny knows this. If you shared that calling your brother helps when you’re anxious, Sunny remembers.
Short-term memory is what’s happened this week. Recent conversations, actions you’ve taken, habits you’ve been building. This layer gives Sunny context for understanding whether things are improving or getting worse. Are you having better days than last week? Are you following through on the habits you committed to?
Working memory is what’s happening right now. The current conversation, the emotion you just expressed, the thing you’re struggling with in this moment. Working memory is where the immediate response happens.
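The three layers can be pictured as a simple data structure. This is a toy sketch, not Flourish’s actual implementation: the class names, the weekly consolidation rule, and the idea that entries flow from working to short-term to long-term memory are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class MemoryEntry:
    text: str
    timestamp: datetime

@dataclass
class ThreeLayerMemory:
    """Toy three-layer memory; structure is illustrative only."""
    long_term: list[MemoryEntry] = field(default_factory=list)   # weeks and months of facts and patterns
    short_term: list[MemoryEntry] = field(default_factory=list)  # roughly the past week
    working: list[MemoryEntry] = field(default_factory=list)     # the current conversation

    def remember(self, text: str, now: datetime) -> None:
        entry = MemoryEntry(text, now)
        self.working.append(entry)     # everything enters through working memory
        self.short_term.append(entry)  # and is also logged for the week

    def consolidate(self, now: datetime) -> None:
        """Promote week-old short-term entries into long-term memory."""
        cutoff = now - timedelta(days=7)
        aged = [e for e in self.short_term if e.timestamp < cutoff]
        self.long_term.extend(aged)
        self.short_term = [e for e in self.short_term if e.timestamp >= cutoff]
```

The key design point the sketch captures: the layers are separate stores with separate lifetimes, not one log queried three ways.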
This three-layer approach is critical because it lets Sunny be simultaneously general and specific, universal and personal, evidence-based and customized.
Why Generic Chatbots Fail
When you talk to ChatGPT about anxiety, it gives you generic advice because it knows nothing about you. It can’t distinguish between your anxiety (triggered by social situations, eased by exercise) and someone else’s anxiety (triggered by uncertainty, eased by planning). Both get the same response.
This is why ChatGPT’s responses feel hollow in mental health conversations. The advice is technically correct but contextually useless. It doesn’t account for who you actually are.
In Flourish’s Tier 1 battle tests (where users try Flourish and ChatGPT and pick which they prefer), users chose Flourish 2.4 times more often. For Claude, it was even worse — users preferred Flourish 5.2 times more often.
“Claude is too academic,” Xuan notes. “It’s like, ‘I don’t care about you. Get over it.’ It’s really interesting. It’s not just that the foundation model is more powerful or more capable. It’s the design of the entire system.”
The memory layers are part of that system design. They let Sunny understand you specifically, not generically.
The Crisis Protocol: When Memory Matters Most
The three-layer system becomes critical in crisis moments.
If Sunny detects suicidal ideation or high-risk behavior, it needs to understand the severity immediately. This requires both long-term context (has this person had suicidal thoughts before? what’s their support system?) and working memory (are they in immediate danger right now?).
“If someone is going through a difficult moment, we recommend resources, both from the general resource and local resources,” Xuan explains. “And also guide them to do safety planning, which is a classic suicide prevention technique.”
Safety planning requires understanding the person. Who are their safe people? What are their safe places? What are their coping strategies? Generic responses like “call 988” miss the point. The person already knows that. What they need is Sunny to remind them of the specific people and places that matter to them.
“With safety planning, in the future we can serve instead of just pushing people or guiding people to generic resources like 988, we can also encourage them to reach out to their safe people and to go to their safe places,” Xuan says.
This personalization — knowing the specific person, not just the clinical category — is what separates adequate crisis support from actually helpful support.
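As a sketch of that difference, here is what a personalized crisis response could look like once a safety plan lives in long-term memory. The `SafetyPlan` fields and the response logic are assumptions for illustration, not Flourish’s schema or clinical protocol; a real system would involve clinical review at every step.

```python
from dataclasses import dataclass, field

@dataclass
class SafetyPlan:
    """Illustrative safety-plan record; field names are assumptions."""
    safe_people: list[str] = field(default_factory=list)
    safe_places: list[str] = field(default_factory=list)
    coping_strategies: list[str] = field(default_factory=list)

def crisis_response(plan: SafetyPlan) -> str:
    """Prefer personal resources from memory; always include the generic line."""
    parts = []
    if plan.safe_people:
        parts.append(f"Could you reach out to {plan.safe_people[0]}?")
    if plan.coping_strategies:
        parts.append(f"Last time, {plan.coping_strategies[0]} helped.")
    parts.append("If you're in immediate danger, call 988.")
    return " ".join(parts)
```

With an empty plan, the function falls back to exactly the generic response the article critiques; the memory is what upgrades it.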
The Habit Building System: When Memory Enables Change
Memory isn’t just for crisis moments. It’s how Sunny helps you build habits.
When you tell Sunny you want to start exercising, the system stores this in long-term memory. When you actually go for a run, you report it to Sunny. This goes into short-term memory as progress. The next time you’re tempted to skip a workout, Sunny can remind you of the pattern: “You mentioned wanting to get stronger. Last week you went running three times and felt great after. What’s going on today?”
This is vastly different from a generic fitness app. The system isn’t just tracking reps or distance. It’s understanding the person, the motivation, the obstacles, and the patterns.
“When you are talking about taking actions and sending a guide if you want to tell us a good habits and also to check in with you on your habits,” Xuan describes. The memory enables this checking in. Without memory, every habit building conversation starts from zero.
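A minimal version of that check-in logic, combining a long-term goal with short-term progress, might look like the function below. The message wording and the parameters (`goal`, `runs_this_week`, `target`) are hypothetical, chosen to mirror the running example above.

```python
def habit_checkin(goal: str, runs_this_week: int, target: int) -> str:
    """Toy check-in contrasting a long-term goal with this week's progress."""
    if runs_this_week >= target:
        return f"You hit your target of {target} for '{goal}'. Nice consistency."
    return (f"You mentioned wanting to {goal}. "
            f"You're at {runs_this_week} of {target} this week. What's going on today?")
```

The point of the sketch: without the stored goal and the week’s log, neither branch is possible, and the conversation restarts from zero every time.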
Why Memory Is Hard and Why Most Companies Skip It
Memory systems are hard to build. They require:
- Robust data storage and retrieval
- Privacy and security (you’re storing sensitive mental health information)
- Sophisticated retrieval logic (knowing which memory is relevant right now)
- Constant updating and refinement
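The third item, retrieval logic, is the subtle one. A naive version is just keyword overlap, sketched below; production systems would typically use embeddings and recency weighting instead. This is a generic illustration, not Flourish’s retrieval code.

```python
def retrieve(query: str, memories: list[str], k: int = 3) -> list[str]:
    """Naive keyword-overlap retrieval: rank memories by shared words with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        memories,
        key=lambda m: len(q_words & set(m.lower().split())),
        reverse=True,  # most overlapping memory first
    )
    return scored[:k]
```

Even this crude version shows why retrieval is its own problem: the hard part isn’t storing memories, it’s deciding which one is relevant right now.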
It’s easier to just use a foundation model API and let the context window handle it. Simpler, faster, cheaper.
But for mental health specifically, easier isn’t better. Cheaper isn’t better. You need the person to feel understood. And you can’t feel understood by a system with no memory of you.
This is why Xuan emphasizes psychology expertise in product design. A company optimizing for engineering speed will skip the memory layer. A company with a psychologist building the product knows that memory is foundational.
“There are lots of ways to think about building the app that’s like, how can we guide people to build more emotional awareness?” Xuan explains. “That’s not an inherent part of AI, but it’s about how you design the system.”
The memory layers are part of that design choice.
FAQ
What’s the difference between context window and memory?
Context window is what the model can see in the current conversation (usually the last 10K-100K tokens). Memory is persistent information about the user stored outside the model. A 100K context window looks huge but resets with every conversation. Memory persists across conversations and enables understanding someone over time.
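The reset-versus-persist distinction can be shown in a few lines. Here a module-level dict stands in for a database (an assumption for the sketch); the context list is rebuilt empty at the start of every session, while the store survives.

```python
# Survives across sessions; would be a database in practice.
persistent_memory: dict[str, str] = {}

def new_conversation(user_id: str) -> list[str]:
    """The context window resets every session; persistent memory does not."""
    context: list[str] = []  # fresh, empty context window
    if user_id in persistent_memory:
        context.append(f"Known about user: {persistent_memory[user_id]}")
    return context
```

A returning user starts each conversation with their history injected; a model relying on the context window alone starts from nothing every time.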
Why do I need three layers of memory, not just one?
Different layers serve different purposes. Long-term memory tells Sunny who you are and what you care about. Short-term memory shows patterns and progress. Working memory is the immediate context for response. Together, they let Sunny be both consistent (long-term) and responsive (working) and adaptive (short-term).
How does Flourish keep my sensitive mental health data safe?
Xuan doesn’t detail the security architecture in the interview, but mentions that Flourish uses privacy-first design. The company would need to be HIPAA-compliant for healthcare use cases and GDPR-compliant for international users. Any mental health app storing sensitive data must have security as a core design principle, not an afterthought.
Can I delete my memories if I want to?
The transcript doesn’t specify, but any responsible mental health app should allow users to delete their data. Xuan’s philosophy of “real life focused” suggests they respect user autonomy, which would include the right to erase your mental health history.
Does the memory system make AI more addictive?
It could. A system that knows you deeply could be more compelling to use. Flourish mitigates this by explicitly directing people away from the app toward real relationships and real actions. But the memory system itself could be misused by an app optimizing for engagement instead of outcomes.
Why is personality memory important for mental health?
Because the same advice works differently for different people. Your anxiety might respond best to exercise. Someone else’s might respond better to meditation. A system that remembers your personality can give you personalized coping strategies instead of generic advice. This is what makes Sunny feel like it understands you, rather than like a chatbot.
Could you build this with just prompt engineering and a large context window?
Theoretically, yes. But practically, no. Context windows cost money (every token is processed and billed) and suffer from context decay (relevance drops as length grows). More importantly, a 100K-token context window containing your entire history is overwhelming: the model can’t distinguish signal from noise. Memory systems separate what’s relevant (long-term, short-term, working) so the model can make good decisions.
How is this different from Duolingo’s memory system?
Duolingo remembers which lessons you’ve done and adjusts difficulty. That’s transactional memory. Flourish’s memory is biographical — it understands your emotional patterns, your relationships, your goals, your history. The depth is different because the stakes are different. Duolingo is teaching you Spanish. Flourish is supporting your mental health.
Do other mental health apps use three-layer memory?
Not to Xuan’s knowledge. Most use simple context windows or basic user profiles. The three-layer memory system modeled on human cognition is specific to Flourish’s architecture. This is part of why they’re the only app that passed Harvard’s emotional manipulation audit.
Will the memory system get better when better foundation models come out?
Somewhat. Better models can use memory with more nuance. But the three-layer architecture itself isn’t dependent on model capability. It’s a product design choice. A better model would make the system more capable at using memory, but the underlying structure would remain the same.
Full episode coming soon
This conversation with Xuan Zhao is on its way. Check out other episodes in the meantime.
Visit the Channel