
Visurai — Visual Learning Copilot

Date: 2025-10-17
Stacks: React, TypeScript, Python, LangChain, LLM, Lovable, Supabase, Next.js, Image Generation
Tags: Hackathon, AI


🏆 Built at the Good Vibes Only AI/ML Buildathon @ USC (2025)

Service Link

https://visurai-story-maker.lovable.app/

Overview

Project demo: https://drive.google.com/file/d/16_YFVfVJoDPQqLkXXaRXSv_Dyr98bxey/view?usp=sharing

Visurai helps dyslexic and visual learners comprehend material by converting text into a sequence of AI-generated images with optional narration.

Paste any text and get:

  • A title and segmented scenes that preserve key facts and names
  • High-quality images per scene (Flux via Replicate or OpenAI gpt-image-1)
  • Per‑scene TTS audio and a single merged audio track with a timeline
  • Optional OCR to start from an image instead of text

Features

  • Context‑aware scene segmentation and detail‑preserving visual prompts (GPT‑4o)
  • Image generation providers: Flux via Replicate or OpenAI gpt-image-1
  • Narration: per‑scene TTS audio plus a single merged track with a timeline
  • Live progress via SSE (/generate_visuals_events)
  • OCR routes: generate from image URL or upload
  • Absolute asset URLs using PUBLIC_BASE_URL (e.g., ngrok) for frontend access
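The SSE progress stream can be consumed with a small client. A minimal sketch, assuming the backend runs at http://localhost:8000 and emits a JSON payload in each `data:` line (both are assumptions, not confirmed by the repo):

```python
import json

def parse_sse_line(line: str):
    """Parse one Server-Sent Events line; return the decoded JSON
    payload from a `data:` line, or None for comments/other fields."""
    if line.startswith("data:"):
        return json.loads(line[len("data:"):].strip())
    return None

def iter_events(lines):
    """Yield decoded payloads from an iterable of SSE lines."""
    for line in lines:
        payload = parse_sse_line(line)
        if payload is not None:
            yield payload

# Example wiring with requests (URL is an assumption):
# import requests
# with requests.get("http://localhost:8000/generate_visuals_events",
#                   stream=True) as resp:
#     for event in iter_events(resp.iter_lines(decode_unicode=True)):
#         print(event)
```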

Architecture

    Repository Structure

    Prerequisites

  • Python 3.10+ (tested up to 3.13)
  • ffmpeg installed (required for merged audio)
  • Provider keys as needed (Replicate and/or OpenAI)

Backend — Quick Start (run from repo root)

    From the repo root:
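The repo's actual entry point isn't shown here, so the following is only a sketch of a typical Python ASGI startup; the module path `backend.main:app`, the requirements file location, and the use of uvicorn are all assumptions:

```shell
# Hypothetical quick start — adjust paths to the actual repo layout.
python -m venv .venv && source .venv/bin/activate
pip install -r backend/requirements.txt
uvicorn backend.main:app --reload --port 8000
```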

    backend/.env (example)
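An illustrative env file. `PUBLIC_BASE_URL` appears in the feature list above; the provider key names follow the standard conventions of the OpenAI and Replicate Python clients, but verify them against the repo:

```shell
# backend/.env — illustrative values only
PUBLIC_BASE_URL=https://your-ngrok-subdomain.ngrok-free.app
OPENAI_API_KEY=sk-...
REPLICATE_API_TOKEN=r8_...
```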

    Verify
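One way to smoke-check the running backend is to POST a short text to /generate_visuals. This is a sketch: the base URL and the request field name `text` are assumptions about the API schema:

```python
import json
import urllib.request

def visuals_request(base_url: str, text: str) -> urllib.request.Request:
    """Build a POST /generate_visuals request (the `text` field name
    is an assumption about the request schema)."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/generate_visuals",
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    req = visuals_request("http://localhost:8000", "A short test story.")
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))
```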

    Frontend — Quick Start (pnpm)

    Configure your frontend to call the backend base URL (e.g., PUBLIC_BASE_URL).

    Typical React workflow:
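The usual pnpm commands, assuming the standard Vite/React script names (`dev`, `build`); check the repo's package.json for the actual scripts:

```shell
pnpm install
pnpm dev       # local dev server
pnpm build     # production build
```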

    Ensure your frontend uses absolute URLs from the backend responses (e.g., image_url, audio_url), which already include the PUBLIC_BASE_URL when set.

    If your frontend needs an explicit base URL, set it (e.g., Vite):
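A sketch of a Vite env file; the variable name `VITE_API_BASE_URL` is hypothetical, but Vite does require the `VITE_` prefix for variables exposed to client code:

```shell
# .env.local — variable name is an assumption
VITE_API_BASE_URL=https://your-ngrok-subdomain.ngrok-free.app
```

Client code can then read it via `import.meta.env.VITE_API_BASE_URL`.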

    Engine Switch: LangGraph vs Imperative

    The backend can run either:

  • Imperative flow (default): sequential segmentation → prompts → images
  • LangGraph flow: graph-based orchestration

    Enable LangGraph by setting an env var and restarting the server.

    Endpoints are the same (e.g., POST /generate_visuals), but execution uses the graph.
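A sketch of how such a flag-based engine switch might look; the variable name `USE_LANGGRAPH` is hypothetical, not taken from the repo:

```python
import os

def pick_engine(env=None) -> str:
    """Return 'langgraph' or 'imperative' based on an env flag.
    The flag name USE_LANGGRAPH is hypothetical."""
    env = os.environ if env is None else env
    flag = str(env.get("USE_LANGGRAPH", "")).lower()
    return "langgraph" if flag in {"1", "true", "yes"} else "imperative"

# The request handlers can then dispatch on pick_engine() so that
# endpoints like POST /generate_visuals stay identical while the
# execution backend changes.
```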

    API Highlights

  • POST /generate_visuals → scenes with image URLs and a title
  • POST /generate_visuals_with_audio → scenes + per‑scene audio URLs + durations
  • POST /generate_visuals_single_audio → merged audio_url, total duration, timeline, scenes
  • GET /generate_visuals_events → Server‑Sent Events stream for progress
  • POST /visuals_from_image_url and /visuals_from_image_upload → OCR then visuals
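The merged-audio endpoint, for example, can be exercised with a small client. This is a sketch: the JSON body with a `text` field, the response shape, and the local base URL are all assumptions about the API:

```python
import json
import urllib.request

def post_json(url: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON response."""
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def scene_image_urls(result: dict) -> list:
    """Collect absolute image URLs from a response shaped like
    {"scenes": [{"image_url": ...}, ...]} (shape is an assumption)."""
    return [s["image_url"] for s in result.get("scenes", [])]

# result = post_json("http://localhost:8000/generate_visuals_single_audio",
#                    {"text": "Paste any text here."})
# print(result["audio_url"], scene_image_urls(result))
```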

Troubleshooting

  • Audio fails to load after revisiting a story: make sure the backend is still running and that PUBLIC_BASE_URL (e.g., the ngrok tunnel) has not changed since the audio URLs were generated
  • OpenAI Images "invalid size" error: gpt-image-1 only accepts specific sizes (e.g., 1024x1024, 1536x1024, 1024x1536)
  • Replicate credit errors: check your Replicate account's billing and remaining credits
  • Mixed content blocked: serve the backend over HTTPS (e.g., via ngrok) when the frontend is hosted on HTTPS
  • CORS: allow the frontend's origin in the backend's CORS configuration

License

    MIT License © 2025 Visurai Team

    Made with care for learners who think in pictures.
