Designing AI products without real data leads to guesswork. This article explains why traditional UX approaches break down, how real data changes design decisions, and what it takes to build a truly AI-native product.
For months, we designed Norvana the way we’d designed every product before. Screens in Figma. Flows. Prototypes. Health dashboards with energy scores that looked polished and meant nothing — because without real user data, none of it could be tested on real people.
Then we built a small demo in code, pulled actual data from Apple HealthKit, and fed it to a model. Within two days, we knew the demo was useless. But we didn’t guess — we felt why. We also knew something more important: this is the only way to design AI. That realization has changed how we’ve worked on everything since.
Designing without data
When Norvana came to us, the brief was bold and open: build a personal AI health companion. The founder had an ambitious product vision but needed help shaping it into a defined experience. We had broad freedom to define the product.
We did what designers do. We mapped user needs. We explored information architectures. We designed screens — onboarding flows, health dashboards, nutrition trackers, and activity summaries. Energy scores with nice gradients. Sleep analysis with clean typography. All of it looked like a product. None of it was one.
Why this approach breaks in AI products
The problem wasn’t the craft. The screens were well-designed. The problem was that every decision was a guess. How would the AI respond to a user who sleeps well but eats badly? What would personalized nutrition advice actually look like when it accounts for someone’s bloodwork, medication, and eating habits simultaneously? We didn’t know. We couldn’t know — because the AI wasn’t running. We were drawing pictures of intelligence.
We spent months in this loop. New screens, new flows, feedback, revisions. We kept designing. We kept guessing.
Then we tried something different. We put a designer in front of Cursor, connected Apple HealthKit, pulled real step counts, heart rate data, and sleep patterns into a small iOS demo, and fed it to a model. Everything ran on-device — no server, no privacy concerns, easy to test. The whole thing took days, not weeks.
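For flavor, here is a minimal sketch of the kind of on-device HealthKit pull that demo started from. The HealthKit calls are real API; the function around them is our illustration, not Norvana’s code.

```swift
import HealthKit

// Requires a prior HKHealthStore.requestAuthorization for .stepCount.
let store = HKHealthStore()

func fetchTodaysSteps(completion: @escaping (Double) -> Void) {
    let stepType = HKQuantityType.quantityType(forIdentifier: .stepCount)!
    let startOfDay = Calendar.current.startOfDay(for: Date())
    let predicate = HKQuery.predicateForSamples(withStart: startOfDay,
                                                end: Date(),
                                                options: .strictStartDate)

    // Sum today's step samples across all sources (iPhone + Watch).
    let query = HKStatisticsQuery(quantityType: stepType,
                                  quantitySamplePredicate: predicate,
                                  options: .cumulativeSum) { _, stats, _ in
        completion(stats?.sumQuantity()?.doubleValue(for: .count()) ?? 0)
    }
    store.execute(query)
}
```

The same pattern covers heart rate and sleep stages: query, aggregate, hand the numbers to the model as context.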
The demo was rough. The UI was basic. But for the first time, we could feel the product. Not look at a picture of it — feel how the AI responded to actual data, see where its recommendations were generic, notice where the experience broke. Two days in, we scrapped the demo. It wasn’t good enough. But it told us more about what Norvana needed to be than months of mockups ever had.
That was the turn. From that point, Figma became a sketch tool — a place to rough out ideas. We’d build the actual experience in code, test it, and come back to Figma to refine the UI.
What we cut and why
Once you feel the product working with real data, you start seeing what doesn’t belong. Health apps love scores. 87 out of 100. Four stars. Streak counters. Badge collections. The logic is simple: gamification drives retention. Users chase numbers, numbers drive daily opens, and daily opens look good in investor decks.
Why scores don’t work
Scores teach users to chase a number instead of understanding their health. A sleep score of 83 tells you nothing about why you woke up three times last night. A nutrition score of 91 doesn’t explain that your vitamin D intake looks fine on paper, but your bloodwork says otherwise. The number becomes the goal. The actual health insight disappears behind it.
A 2014 study published by the American Psychological Association found that people who were told they had poor REM sleep performed worse on cognitive tests — even though the sleep data itself was fake. The belief about the number shaped the outcome more than how rested they actually felt.
That changes how you think about health scores. A number doesn’t just reflect your condition — it can start influencing it. Once your health becomes a daily performance metric, the score itself becomes part of the experience. Sometimes louder than the actual signal.
How we upgraded the scores
We replaced scores with word-based status labels:
- Optimized
- Needs attention
- On track
No leaderboards. No streaks. The language tells you where you stand without turning your health into a game.
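In code terms, the swap is from a number to a small vocabulary. A sketch with invented thresholds, not Norvana’s actual rules:

```swift
// Word-based statuses instead of a 0-100 score.
enum HealthStatus: String {
    case optimized = "Optimized"
    case needsAttention = "Needs attention"
    case onTrack = "On track"
}

// Illustrative mapping: these thresholds are made up for the sketch.
func sleepStatus(avgHours: Double, wakeUps: Int) -> HealthStatus {
    if avgHours >= 7 && wakeUps <= 1 { return .optimized }
    if avgHours >= 6 && wakeUps <= 3 { return .onTrack }
    return .needsAttention
}
```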
Cutting the feature overload
We also cut the feature list. The early vision included doctor booking, meal planning across every dietary framework, workout programs for every sport, and mindfulness modules. Each one was reasonable on its own. Together, they made a product that did everything and felt like nothing.
We narrowed to three areas: activity, nutrition, and sleep. Three things with clear data inputs, measurable signals, and room for AI to add real value. Mindfulness stayed on the roadmap — but we understood it required a fundamentally different approach to context and couldn’t be rushed into a prototype.

The principle behind every cut
Every cut followed the same rule:
"if it doesn’t get better with more data about you, it doesn’t belong in an AI-native product. "
Gamification works — behavioral science confirms that variable reward schedules drive habit formation. But it’s a shortcut. It creates dependency on the mechanic, not on the value.
A streak counter works the same for every user. A personalized sleep recommendation that knows your medication, your stress levels, and your room temperature gets sharper every week. One manufactures retention. The other builds a relationship.
Garbage in, garbage out
The quality of an AI experience is defined by the data behind it. In health, this is unforgiving. The less you know about someone, the worse the recommendations. Tell an AI you sleep badly, and it will suggest a dark room, white noise, and no screens before bed. Useful if you’ve never thought about sleep hygiene. Useless if you already do all of that and still wake up at 3 a.m.
What personalization actually means
The difference between generic and personal is context. If Norvana knows you sleep in a dark room with white noise, that your bloodwork shows low magnesium, and that your stress levels spike on weekday evenings — now it can say something worth hearing. Maybe the problem isn’t your environment. Maybe it’s the cortisol pattern, and the magnesium isn’t helping because you’re taking it in the morning instead of before bed.
That’s a fundamentally different experience. The same principle applies to nutrition. Scan an apple and Norvana tells one user it’s a great source of fiber. It tells another — a diabetic — to watch the sugar content and consider timing it away from evening meals. One product, same input, different advice. That’s what personalization means when it’s backed by real context.
This meant we needed a lot of information upfront. Not eventually, not over time through passive tracking — before the first recommendation. Norvana’s usefulness on day one depends directly on how much context it has on day one.
We started with what we could pull automatically. Apple HealthKit gave us steps, heart rate, sleep stages, and activity history. That’s a baseline — enough to know someone is active or sedentary, sleeping six hours or eight. But it’s not enough for personalization that feels real. The deeper layer requires information only the user can provide. Bloodwork results, chronic conditions, medication, dietary preferences, allergies, how they feel on a given day. The kind of detail a doctor would spend a consultation collecting. We needed an onboarding experience that could gather all of this without feeling like a medical intake form.
That’s what led us to voice.
Apple HealthKit provides raw metrics — step counts, heart rate, sleep duration. It doesn’t tell you what someone eats, how they feel, what medication they take, or what their bloodwork says.
Voice onboarding as a relationship
The industry solves onboarding the same way every time: fewer fields, fewer screens, faster conversion. Remove friction. Get the user to the product as quickly as possible. Skip what you can, ask what you must, defer the rest.
We needed the opposite. Norvana’s first experience had to collect deep personal context — sleep habits, eating patterns, energy levels, medical history, and how you feel day to day. Without that, the AI has nothing meaningful to work with. The question wasn’t how to reduce onboarding. It was how to make a long onboarding feel effortless.
Voice solved the paradox.
People speak faster than they type. But speed wasn’t the main reason. From a behavioral science perspective, people enjoy talking about themselves — especially when someone is listening, asking follow-ups, and responding with genuine curiosity. A 20-minute voice conversation doesn’t feel like a form. It feels like meeting someone.
Norvana’s onboarding takes 15 to 30 minutes. That’s roughly the length of a short doctor’s visit — but it doesn’t feel like one.
How trust is built
The voice starts by acknowledging what it already knows. If your Apple HealthKit data shows 15,000 average daily steps, Norvana opens with that — “You walk a lot, that’s impressive, tell me more about your routine.” It mirrors how humans build rapport: start with appreciation, then go deeper. The conversation doesn’t begin with “tell me about yourself.” It begins with proof that the AI has already been paying attention.
We didn’t create a strict list of questions. We gave the AI a list of topics it needed to cover — sleep, nutrition, activity, medical history — but the order and phrasing were up to the model. It chose its next question based on where the conversation was going, not a predefined sequence. If you started talking about stress, it followed that thread before circling back to diet. This made the onboarding feel like a real conversation rather than a voice-operated questionnaire.
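A sketch of what a topic-driven system prompt can look like. The wording is illustrative, not Norvana’s production prompt:

```swift
// Topics to cover, not questions to read. Order and phrasing stay
// with the model; it follows the user's threads and circles back.
let onboardingPrompt = """
You are Norvana, a personal health companion meeting a new user.
Hold a natural conversation. Over its course, cover these topics:
- sleep: habits, environment, how rested they feel
- nutrition: typical meals, restrictions, allergies
- activity: routines, goals, past injuries
- medical history: conditions, medication, recent bloodwork

Choose your next question from where the conversation is going.
If the user opens a new thread, follow it before circling back.
Open by acknowledging something you already know from their
HealthKit data, then go deeper.
"""
```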
And that’s the key insight about voice in AI products. Voice onboarding mirrors how humans actually meet — the first conversation. You don’t fill out a form when you meet someone new. You talk. You share. You build trust through the act of speaking. We needed Norvana’s first interaction to work the same way.
Giving AI a personality
Voice alone wasn’t enough. A voice that reads questions from a script still feels like a form — even if it sounds human. The onboarding needed to feel like a conversation with someone who has a character.
That’s why the AI needed a personality. Not a gimmick — a real sense of character that makes the user feel like they’re building a relationship, not feeding data into a system. AI is perceived as an entity. People anthropomorphize it, whether you design for that or not. We decided to design it deliberately.
We were heavily inspired by Sesame when their conversational AI appeared. At the time, it felt like the most natural AI voice interaction we’d experienced. We studied their approach to personality and adapted elements of it — the slight playfulness, the willingness to interrupt itself, and the way it handles pauses.
What changed everything
One prompt change made a dramatic difference: telling the model “you have your own opinion.” Before that line, Norvana agreed with everything. Tell it you want to run a marathon after three strength sessions and it would say, “great idea.” After that line, it pushed back. “That’s ambitious — let’s talk about recovery first.” One sentence in the system prompt turned a compliant assistant into a companion that actually cared about your health.
Another lesson from prompt engineering: describe what the AI should do, not what it shouldn’t. “Don’t speak in long sentences” doesn’t work as well as “skip the acknowledgment and go straight to the answer.” Affirmative instructions shape behavior more reliably than restrictions. It’s a small technical detail, but it changed how every interaction in Norvana feels.
“You have your own opinion” — one line in the system prompt that changed the entire character of the AI from agreeable assistant to opinionated companion.
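To make the contrast concrete, here is an illustrative personality block. The opinion line is the one quoted above; the rest is our sketch of the affirmative style:

```swift
// The first line is the one that changed the AI's character.
// Note the affirmative phrasing: what to do, not what to avoid.
let personalityPrompt = """
You have your own opinion. When a user's plan conflicts with their
data, say so and explain why.

Speak in short sentences. Skip the acknowledgment and go straight
to the answer. Ask one follow-up question at a time.
"""
```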
The combination of open conversation and genuine personality created an onboarding that people didn’t want to skip. The friction wasn’t removed. It was transformed into the experience itself.
Building for the model you don’t have yet
The current model is the worst model you’ll ever use. That sounds like optimism. It’s a design principle.
When we were building Norvana’s voice onboarding, GPT-4o was the best available model. It worked — but not the way we wanted. The conversation felt slightly wooden. The AI would acknowledge too much, repeat parts of what you said, and lose the thread in longer exchanges. We spent weeks tuning the system prompt, adjusting phrasing, trying to make it feel natural. We got it to maybe 70%.
Then OpenAI released GPT-4.1. We switched the model. Changed nothing else — same system prompt, same architecture, same personality instructions. The conversation came alive. Weeks of tuning became irrelevant overnight. The model just got it.
We hit the same wall with image generation. We didn’t wait — we built our own scaffolding that worked at the time but is already redundant with today’s models. That’s a fun story, and it comes next.
Build for where the models are going, not where they are today. If something works at 50% now and you can see the trajectory, build it. The risk is real — you ship experiences that aren’t fully polished. But the alternative is worse: waiting for perfect models and building nothing.
The art director agent
Every workout image in Norvana is AI-generated. Personalized to the user’s age, body type, fitness level, and goals. Figures are headless, cut at the neck, so you see yourself instead of a stranger. The body is rendered slightly above your current level — a future version of you, not your present condition.
The concept was clear. The execution was a disaster.
We fed exercise descriptions directly to first-generation Gemini image models. The results looked like anatomical horror. Heads rotated 180 degrees. Barbells fused to necks. Limbs bent in directions that don’t exist in human physiology. Out of seven generated images, four were unusable. Yoga and stretching were the worst — anything with complex body positioning broke completely. Gym exercises with equipment were even more chaotic, with weights merging into bodies and proportions collapsing.
The fix wasn’t better prompts
We tried that. More detail in the image prompt didn’t help because the image model couldn’t interpret exercise descriptions the way a human would. “Bench press” means something specific to you and me. To an image model, it’s a loose collection of concepts — person, lying down, bar, weight — with no anatomical logic connecting them.
So we added a middle layer. An agent that acts as an art director.
The pipeline works in three steps:
- One model generates the weekly workout plan — exercises, sets, reps, rest periods.
- A separate model takes that plan, selects the most visually representative exercise from each session, and describes it in precise anatomical detail: body position, limb angles, grip width, equipment dimensions, weight specifications, and how light falls on the figure. Not “person doing a deadlift.” More like “standing figure, feet hip-width apart, knees slightly bent, arms fully extended gripping a barbell at 60cm width, torso hinged forward at 35 degrees from vertical, posterior chain engaged.”
- That description feeds the image model.
The art director agent doesn’t generate images. It translates exercise concepts into anatomical descriptions that image models can actually render correctly.
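A sketch of the pipeline. The two helper functions are hypothetical wrappers around whatever text and image model clients you use:

```swift
// Hypothetical model clients; wire these to your providers.
func callTextModel(system: String, user: String) async throws -> String {
    fatalError("LLM client goes here")
}
func callImageModel(prompt: String) async throws -> Data {
    fatalError("image model client goes here")
}

func workoutImage(for userProfile: String) async throws -> Data {
    // Step 1: one model generates the weekly workout plan.
    let plan = try await callTextModel(
        system: "You are a fitness coach. Produce a weekly workout plan: exercises, sets, reps, rest periods.",
        user: userProfile
    )

    // Step 2: the art director agent translates the plan into precise
    // anatomical language the image model can actually follow.
    let description = try await callTextModel(
        system: """
        You are an art director. Pick the most visually representative
        exercise from each session and describe it for an image model:
        body position, limb angles, grip width, equipment dimensions,
        lighting. The figure is headless, cut at the neck, rendered
        slightly above the user's current fitness level.
        """,
        user: plan
    )

    // Step 3: the description, not the exercise name, feeds the image model.
    return try await callImageModel(prompt: description)
}
```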

The difference was immediate. Error rate dropped from four-of-seven broken images to one-of-seven. The remaining errors were minor — a slightly disconnected finger, an odd shadow. Not the anatomical nightmares we started with.
"The pattern is stealable: when a generative model can’t handle a complex input directly, add an interpretation layer that translates your domain language into the model’s language. "
The art director agent doesn’t make the image model smarter. It gives the image model instructions it can actually follow.
Food images, by contrast, worked out of the box. AI-generated meal photography looked great from the start — no pipeline needed. The lesson: not every generation task requires scaffolding. Know which ones do.
Generative UI — components, not templates
In 2024, when we were building Norvana, most AI products solved the output problem in one of two ways. Either the AI returned plain text — a wall of words the user has to parse — or the output was locked into predefined templates that the designer built in advance. Text is flexible but hard to scan. Templates are scannable but rigid.
Both approaches miss the point. If you template the output too strictly, you kill the model’s intelligence. You’re not using AI — you’re filling a form with extra steps. But if you give the model complete freedom over how it responds, the output goes off the rails: inconsistent formatting, unpredictable layouts, responses that feel different every time in ways that erode trust.
The middle ground: give the AI a set of building blocks and let it compose
In Norvana, we defined a component library for AI responses — text blocks, metric cards, actionable buttons, biomarker visualizations, and nutritional breakdowns. The model doesn’t decide what these components look like. That’s designed. The model decides which components to use, in what order, and with what content. The structure is flexible. The building blocks are controlled.
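In code, the library is a typed vocabulary. The component names are the ones above; the encoding is our illustration:

```swift
// The model composes from a fixed, designed vocabulary. Swift 5.5+
// synthesizes Decodable for enums with associated values.
enum ResponseComponent: Decodable {
    case text(String)
    case metricCard(title: String, value: String)
    case actionButton(label: String, action: String)
    case biomarkerChart(marker: String, value: Double, normalRange: ClosedRange<Double>)
    case nutritionBreakdown(calories: Int, protein: Int, carbs: Int, fat: Int)
}

// A response is an ordered composition: the model decides which
// components appear and in what order; how each looks stays designed.
struct AIResponse: Decodable {
    let components: [ResponseComponent]
}
```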

Some sections are predefined. Scanning food always returns a nutritional breakdown in a consistent format — that experience is designed end-to-end. But within that format, the AI can add components based on your context. If a scanned meal affects a biomarker that’s out of range, it injects a card showing the connection between what you’re eating and your blood data. That card wasn’t part of a static template. The AI decided it was relevant and composed it from the available building blocks.
At the other end of the spectrum, the AI can generate full custom pages. A health report, a weekly summary, a deep dive into your sleep trends — these aren’t screens we designed in advance. They’re assembled by the model from the component library based on what’s relevant to you right now.
This changes the designer’s job. You’re not designing screens anymore. You’re designing a system of components and the rules for when they appear. The AI becomes the layout engine. Your job is to make sure every possible composition looks and feels right — which means testing hundreds of response combinations, not drawing dozens of screen states.
The result is an interface that feels alive. The same question can produce different response formats depending on what the AI knows and what’s relevant to your health context. The format matches the intent.
We implemented this through tool calls — the AI decides which response format to invoke based on conversation context. In 2024, this wasn’t standard. Today, products like Claude and Cursor use similar approaches for structured responses.
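A sketch of what that looks like as tool declarations. The tool names are hypothetical:

```swift
// The model invokes one of these to pick a response format,
// instead of answering in free-form text.
let responseTools: [[String: String]] = [
    ["name": "show_nutrition_breakdown",
     "description": "Render the designed breakdown card for a scanned meal."],
    ["name": "show_biomarker_connection",
     "description": "Inject a card linking a meal or habit to an out-of-range biomarker."],
    ["name": "compose_custom_page",
     "description": "Assemble a full page, such as a health report or weekly summary, from the component library."],
]
```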
Context is the actual product
The UI you see in Norvana is three tabs. Health, Today, Focus. An “Ask Norvana” bar at the bottom of every screen. Clean typography. Calm colors. Nothing remarkable on the surface.
The product isn’t the surface. Every recommendation, every insight, every response Norvana generates is shaped by what it knows about you — and how that knowledge is structured, prioritized, and delivered to the model at the right moment. That’s the actual product. The UI is what sits on top.
How context works
Context architecture in Norvana works in layers. The first layer is always available: what happened in the last three days — your recent sleep, activity, meals, and conversations. This data loads into every interaction because it’s what you’re most likely to ask about. The second layer is your biomarkers that fall outside normal ranges. If your iron is low or your cortisol is elevated, Norvana keeps that accessible because any health question you ask might connect to it. You say, “I’ve been tired lately,” and the model already has the data to connect fatigue to your bloodwork without making a separate retrieval call.
The third layer is everything else — your full medical history, past conversations, dietary preferences, workout history, and medication. This information is stored but not loaded into every request. The AI can retrieve it when needed, but it doesn’t carry the weight of your entire health profile in every interaction. This keeps responses fast and token costs manageable.
Which data stays hot and which stays retrievable is a design decision that directly affects latency, token cost, and response quality. Get it wrong and the AI either responds slowly or responds with generic advice.
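A sketch of the layering, assuming a hypothetical UserStore as the data layer:

```swift
// Hypothetical data layer.
struct Biomarker { let summary: String; let isInRange: Bool }

protocol UserStore {
    func recentEvents(days: Int) -> [String]  // sleep, meals, activity, chats
    var biomarkers: [Biomarker] { get }
}

func buildHotContext(for user: UserStore) -> [String] {
    var hot: [String] = []

    // Layer 1: always loaded. The last three days of everything.
    hot += user.recentEvents(days: 3)

    // Layer 2: out-of-range biomarkers stay hot, because any question
    // ("I've been tired lately") might connect to them.
    hot += user.biomarkers.filter { !$0.isInRange }.map(\.summary)

    // Layer 3 (full history, past conversations, medication) is not
    // loaded here; the model retrieves it through tools when needed.
    return hot
}
```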
Then there’s screen awareness. The Ask Norvana bar isn’t a separate chat — it’s a layer that knows which view you’re on. Ask a question while looking at your workout plan and the model knows which exercises you’ve completed, which are next, what your energy level is today, and what hurt last time. Ask the same question from the nutrition screen and the context shifts entirely. You don’t have to explain where you are or what you’re looking at. The AI already knows.
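A sketch of how the active screen can ride along with every query. The screens and payload fields are illustrative:

```swift
// Each query to the model is prefixed with the active screen's state.
enum ActiveScreen {
    case workoutPlan(completed: [String], next: String, energyToday: String)
    case nutrition(lastMeal: String, openGoals: [String])
    case health(summary: String)
}

func contextPrefix(for screen: ActiveScreen) -> String {
    switch screen {
    case let .workoutPlan(completed, next, energy):
        return "User is viewing their workout plan. Completed: \(completed.joined(separator: ", ")). Next: \(next). Energy today: \(energy)."
    case let .nutrition(lastMeal, goals):
        return "User is on the nutrition screen. Last meal: \(lastMeal). Open goals: \(goals.joined(separator: ", "))."
    case let .health(summary):
        return "User is on the health overview. \(summary)"
    }
}
```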
This is where the chicken-or-turkey moment happens. You scan a meal and the AI isn’t sure what it sees. Instead of guessing, it asks back — as a tappable single-select, not a text question. It knows your dietary context, your allergies, and your goals. If it’s confident, it skips the question entirely and gives you the breakdown. If it’s not, it asks the one thing it needs to know. That decision — ask or answer — is driven by the context layer, not by a designer mapping every possible state in advance.
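That ask-or-answer branch can be as simple as a confidence gate. A sketch with an invented threshold:

```swift
struct MealBreakdown { let summary: String }

enum ScanOutcome {
    case answer(MealBreakdown)
    case clarify(question: String, options: [String])  // tappable single-select
}

// Confident enough: answer. Otherwise: ask the one thing needed.
func resolveScan(confidence: Double,
                 best: MealBreakdown,
                 candidates: [String]) -> ScanOutcome {
    if confidence >= 0.85 { return .answer(best) }
    return .clarify(question: "Which one is it?", options: candidates)
}
```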
The deeper insight is this: in AI-native products, the experience isn’t designed in the interface. It’s designed in the data architecture underneath. Which context loads when, how memory persists across sessions, what the model knows versus what it retrieves versus what it asks — these are design decisions that shape the experience.
The same scanned apple gets different advice for different users. One person hears it’s a great source of fiber. A diabetic user gets a note about sugar content and timing. Same input, different context, different response.
Final insight
Designing an AI product is designing the environment where humans and AI meet. The UI is the interface for the human. The context is the interface for the AI. You need to design both.
That’s what Norvana taught us. The UI is the surface. The context is the product.
How would this work for your product?
Discuss with your AI.

Oleg leads the UI for AI direction — shaping how people interact with intelligent systems and ensuring that AI feels intuitive, human, and useful.

