Aurora Model — Best Practices

*Create ultra-realistic avatar videos with a single image using the Aurora engine.*

Updated over 3 weeks ago

Aurora enables natural, cinematic avatar generation for a wide variety of use cases — from talking heads and product demos to expressive performances.

This guide outlines **best practices** for achieving the most realistic and expressive results.

🧠 Overview

Aurora turns any photo or single image into a lifelike talking avatar.

It works best when the audio, avatar, and prompt align naturally in emotion and context.

The Aurora editor allows you to:

  • Upload or select an avatar photo

  • Add a voice input (script or audio)

  • Choose a prompt to define the avatar’s behavior and tone


🖼️ Image Guidelines

Aurora is designed to be flexible and works well with a variety of image types and styles — from realistic portraits to stylized art or full-body compositions.

To help the model interpret your input effectively:

  • You can use photos, renders, or character art — Aurora adapts automatically.

  • Ensure the main subject is clear and distinguishable in the image.

  • If motion or expressions look unnatural, try using an image with a clearer face or pose.

  • For multi-scene consistency, use images with similar style and framing (e.g., all portrait shots or all illustrations).

There are no strict limitations on angle, composition, or lighting — Aurora dynamically adjusts to your chosen image.


🔊 Audio & Voice Settings

🎧 Recommended Settings

  • Voice Model: Always use Voice Model V3 for the most natural sync and expressive range.

  • Voice Speed: Keep speech moderate and clear — avoid overly fast delivery.

  • Audio Source: You can upload recorded audio or type a script for TTS generation.

⚙️ Tips for Best Results

  • If lip-sync or movement feels off, slow down the voice slightly.

  • Use natural pauses between sentences to improve rhythm and breathing.

  • Keep tone consistent across multiple clips for seamless storytelling.


🧩 Prompt Design — The Key to Realism

Prompts define how the avatar behaves — body movement, emotion, and style.

Use concise, descriptive prompts that match your voice tone and scene context.


🏗️ Base Prompt

Use this as the foundation for all Aurora generations:

4K studio interview, medium close‑up (shoulders‑up crop). Solid light‑grey seamless backdrop, uniform soft key‑light—no lighting change. Presenter faces lens, steady eye‑contact. Hands remain below frame, body perfectly still. Ultra‑sharp.

💡 Tip:

Use GPT to automatically optimize your final prompt by combining this base style with your specific use case. For example: “Generate an optimized Aurora prompt for a skincare product demo” — GPT will blend cinematic framing with the expressive behavior needed for that context.
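If you assemble prompts programmatically (for example, when generating many clips), the "base style + behavior" pattern above is simple string composition. The sketch below is illustrative only — it is not an official Aurora API, and the `build_prompt` helper is a hypothetical name:

```python
# Illustrative sketch, not an Aurora API: compose a final prompt by
# appending a use-case behavior description to the base cinematic setup.

BASE_PROMPT = (
    "4K studio interview, medium close-up (shoulders-up crop). "
    "Solid light-grey seamless backdrop, uniform soft key-light, no lighting change. "
    "Presenter faces lens, steady eye-contact. "
    "Hands remain below frame, body perfectly still. Ultra-sharp."
)

def build_prompt(behavior: str) -> str:
    """Combine the base cinematic setup with a behavior description."""
    return f"{BASE_PROMPT} {behavior.strip()}"

# Example: a natural talking-avatar clip.
prompt = build_prompt(
    "The person talks while facing the camera directly and naturally, "
    "with mild breathing movement and natural explanatory gestures."
)
print(prompt)
```

Keeping the base setup fixed and varying only the behavior sentence is what gives multi-clip projects a consistent look.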


🎭 Prompt Examples

Below are Aurora prompts for common use cases:


🎤 Natural Talking Avatar

The person talks while facing the camera directly and naturally, with mild chest movement from breathing, natural explanatory gestures, and eye movements.


🎶 Singing Avatar

The person moves in sync with the rhythm of the song, their body expressing the flow and emotion of the music. They sing with emotion, their voice and facial expressions reflecting the depth of the lyrics and music, and every note resonating with heartfelt feeling and passion for the performance.


🎙️ Podcast

The person looks and faces to the side, as if talking to someone in that direction, with an engaging expression showing interest in the topic.


🎥 Any Angles

The person looks at the camera and talks naturally to viewers.


📱 Product Avatars — Holding Phone

The person holds the phone, showing its screen to the camera while talking, casually glancing at the phone and pointing at it from time to time while speaking.


💧 Product Avatars — Facial Skin Care

The person holds the product, showing its label to the camera while explaining, and touches her face to demonstrate the product's effects on her skin.


🎁 Product Avatar — Any Object

The person holds the product, showing its label to the camera while explaining, and points at the object in her hand from time to time.


💬 Product Avatars — Enthusiastic Explanation

The person holds the product, showing its label to the camera while explaining, and her hands move enthusiastically to convey the product's benefits.


🤳 Selfie Avatar

The person talks in front of the camera with one hand out of frame. The camera shakes slightly, as if hand-held.


🐾 Pet and Avatar

The person is stroking the pet's fur while talking in front of the camera.


⚡ Intense Expressions

The person speaks with intense emotion and enthusiasm.


🧍 Animation

The character speaks with its own unique characteristics and natural movements.


🐕 Animals — Four-Legged

The animal speaks with natural animal mouth movements while standing on all four legs.


🦊 Animal — Humanoid

The animal speaks with natural animal mouth movements while moving like a human being to express the topic.


🧪 Troubleshooting & Optimization

If your video doesn’t look quite right, try these adjustments:

  1. Refine your prompt — small tweaks can make big improvements in realism.

  2. Adjust voice speed — slightly slower pacing often improves sync.

  3. Try a different image — helpful when motion feels mismatched to the pose.

  4. Experiment with angles — e.g., “slightly tilted,” “shoulders-up,” or “three-quarter view.”


🧭 Quick Checklist

| Category | Best Practice |
| --- | --- |
| Voice | Use Voice Model V3, moderate speech speed |
| Prompt | Use base cinematic setup + behavior description |
| Audio | Include natural pauses for breathing and emphasis |
| Regeneration | Adjust prompt or voice pace if sync feels off |
| Use Case Fit | Match prompt type to content (product, podcast, music, etc.) |


🌟 Final Notes

  • Aurora adapts beautifully to many creative styles — experiment freely.

  • Keep voice, image, and prompt emotionally consistent for the best results.

  • Use expressive cues like “gentle gestures,” “confident smile,” or “animated energy” to fine-tune realism.

  • If results vary, re-render with a slightly revised description or pacing.
