System Architecture Overview

SisterShield is a monolithic Next.js 14 application using the App Router. All server and client code lives in a single repository. The architecture follows a layered design: browser client, Next.js server (pages + API routes), data layer (Prisma ORM + PostgreSQL with pgvector), AI services (LLM providers, embeddings, image generation), a RAG pipeline for evidence-based content, a Knowledge Graph for curriculum mapping, and a Twine pipeline for interactive story building.

System Architecture

RAG + Knowledge Graph Data Flow

The following diagram shows how educational documents flow through the RAG pipeline into the knowledge base, and how that knowledge is used during story generation.

Component Architecture

The UI is organized into three layers.

Layout Components

Located in src/components/layout/, these provide the application shell:

  • Header — top navigation bar with logo, nav links, locale switcher, Quick Exit, and Get Help buttons.
  • Sidebar — dashboard navigation (role-aware, showing different links for Students vs Teachers).
  • Footer — site footer with links and safety resources.
  • LocaleSwitcher — dropdown to toggle between Korean and English.

Page Components

Located in src/app/[locale]/, these are Next.js route-based components:

  • (auth)/login and (auth)/register — authentication pages.
  • (dashboard)/dashboard — role-based home screen.
  • (dashboard)/courses/* — course listing, detail, edit, generate, and play pages.
  • (dashboard)/submissions/* — submission listing, detail, creation, and review pages.
  • (dashboard)/progress — student progress overview.
  • (dashboard)/settings — user settings.
  • page.tsx (root) — public landing page with hero section and evidence.
  • pilot/ — pilot program request page.

Shared UI Components

Located in src/components/:

  • ui/ — shadcn/ui primitives (Button, Dialog, Tabs, Toast, Select, Progress, etc.).
  • twine/TwinePlayer.tsx — iframe-based Twine story renderer with postMessage communication.
  • safety/QuickExit.tsx — emergency exit button, also bound to the Escape key; it clears the session and redirects to weather.com.
  • safety/GetHelp.tsx — dialog with Korean and international crisis resources.
  • courses/CourseCard.tsx — card component for course listings.
  • courses/ImagePromptPanel.tsx — UI for AI image generation workflow.
  • hero/ — landing page components (evidence carousel, stats).

Data Flow Overview

Request Lifecycle

  1. The browser sends a request to the Next.js server.
  2. Middleware detects the locale from the URL path (/en-US/dashboard or /ko-KR/dashboard) and loads the matching translations.
  3. NextAuth middleware verifies the JWT from the cookie.
  4. A server component or API route executes, querying PostgreSQL through Prisma.
  5. The response is rendered (RSC for pages, JSON for API routes).
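
As an illustration of step 2, the sketch below shows one way a locale could be read from the first path segment. The names detectLocale, SUPPORTED_LOCALES, and DEFAULT_LOCALE are hypothetical, and the fallback locale is an assumption; the source does not say which locale is used when the URL has no prefix.

```typescript
// Hypothetical helper; the two locales are the ones named in this document.
const SUPPORTED_LOCALES = ["en-US", "ko-KR"] as const;
type AppLocale = (typeof SUPPORTED_LOCALES)[number];

// Assumption: the real fallback locale is not stated in the source.
const DEFAULT_LOCALE: AppLocale = "en-US";

interface LocaleMatch {
  locale: AppLocale;
  // Path with the locale prefix removed, always starting with "/".
  rest: string;
  // False when the URL carried no recognized locale prefix.
  matched: boolean;
}

function detectLocale(pathname: string): LocaleMatch {
  // Drop empty segments produced by leading or repeated slashes.
  const segments = pathname.split("/").filter((s) => s.length > 0);
  const first = segments[0];
  const found = SUPPORTED_LOCALES.find((l) => l === first);
  if (found) {
    return { locale: found, rest: "/" + segments.slice(1).join("/"), matched: true };
  }
  return { locale: DEFAULT_LOCALE, rest: "/" + segments.join("/"), matched: false };
}
```

A middleware built on this kind of helper would typically redirect unmatched paths to a prefixed URL; whether SisterShield does so is not stated here.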

Twine Pipeline

The Twine pipeline converts interactive story content into playable, tracked course HTML:

  1. Parse — JSDOM extracts tw-storydata / tw-passagedata from Twine HTML; detects format (Harlowe, SugarCube).
  2. Validate — BFS traversal detects dead links, duplicate passages, and orphan passages.
  3. Compile — Tweego CLI compiles Twee 3 source to Harlowe-3 HTML (or custom renderer as fallback).
  4. Inject Tracking — Injects a MutationObserver-based script that detects passage changes, calculates progress, scores quizzes, and communicates with the parent frame via postMessage.
  5. Build — Orchestrates compile + inject, stores build HTML at builds/{courseId}/v{version}/index.html and Twee source at sources/{courseId}/v{version}/source.twee.
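
The Validate step can be pictured with a minimal sketch like the one below. It assumes passages have already been parsed into a name plus a list of link targets, and it treats an orphan as any passage not reachable from the start passage; both the Passage shape and that definition are assumptions, not the project's actual validator.

```typescript
// Hypothetical parsed-passage shape; the real parser output may differ.
interface Passage {
  name: string;
  links: string[]; // names of the passages this one links to
}

interface ValidationReport {
  deadLinks: { from: string; to: string }[];
  duplicates: string[];
  orphans: string[];
}

function validateStory(passages: Passage[], start: string): ValidationReport {
  // Index passages by name, recording any name seen more than once.
  const byName = new Map<string, Passage>();
  const duplicates = new Set<string>();
  for (const p of passages) {
    if (byName.has(p.name)) duplicates.add(p.name);
    else byName.set(p.name, p);
  }

  // A dead link points at a passage name that does not exist.
  const deadLinks: { from: string; to: string }[] = [];
  for (const p of passages) {
    for (const target of p.links) {
      if (!byName.has(target)) deadLinks.push({ from: p.name, to: target });
    }
  }

  // Breadth-first traversal from the start passage.
  const visited = new Set<string>();
  const queue: string[] = [];
  if (byName.has(start)) {
    visited.add(start);
    queue.push(start);
  }
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === undefined) break;
    const passage = byName.get(current);
    if (!passage) continue;
    for (const target of passage.links) {
      if (byName.has(target) && !visited.has(target)) {
        visited.add(target);
        queue.push(target);
      }
    }
  }

  // Anything never visited is unreachable, reported here as an orphan.
  const orphans = Array.from(byName.keys()).filter((n) => !visited.has(n));
  return { deadLinks, duplicates: Array.from(duplicates), orphans };
}
```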

RAG Pipeline

The RAG pipeline provides evidence-based context for AI-generated stories:

  1. Ingest — Scans RAG/Data/ for PDFs and DOCX files; extracts text via pdf-parse / mammoth; detects category and source organization.
  2. Chunk — Splits documents into segments of 500 to 800 tokens with a 100-token overlap; respects section headers and paragraph boundaries.
  3. Embed — Generates 1536-dimensional vectors via OpenAI text-embedding-3-small in batches of 100.
  4. Store — Upserts RagDocument and RagChunk records into PostgreSQL with pgvector; builds an IVFFlat index.
  5. Search — Hybrid retrieval: cosine-similarity vector search and keyword matching each produce a ranking, and the two are merged with Reciprocal Rank Fusion (RRF); quality scoring boosts high-value chunks.
  6. Format — Groups top-K chunks by document, assigns citation keys [S1], [S2], etc., and injects as EVIDENCE_CONTEXT into the LLM prompt.
  7. Cite — Extracts [SOURCE:Sn] markers from generated Twee, maps to RagChunk / RagDocument metadata, and stores RagCitation records.
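
To make the Search step more concrete, here is a small sketch of Reciprocal Rank Fusion over two ranked lists of chunk IDs. Each list contributes 1 / (k + rank) per item, and the summed scores decide the final order. The function name and the constant k = 60 (a commonly used default) are assumptions; the project's actual constant and its quality-scoring boost are not shown in the source.

```typescript
// Merge several ranked ID lists with Reciprocal Rank Fusion.
// k = 60 is a common default, assumed here rather than taken from the project.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// Usage: fuse a vector-similarity ranking with a keyword ranking.
const fused = reciprocalRankFusion([
  ["c1", "c2", "c3"], // ranked by cosine similarity
  ["c3", "c1", "c4"], // ranked by keyword match
]);
```

Because only ranks are used, the two retrievers do not need comparable score scales, which is the usual reason for choosing RRF in hybrid search.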

Knowledge Graph

The Knowledge Graph organizes the RAG knowledge base into a structured curriculum taxonomy:

  • KnowledgeConcept — Hierarchical tree of concepts (categories: risk-type, prevention-strategy, legal-framework, coping-skill) with bilingual names.
  • Concept Tagger — Automatically tags RagChunk records with matching concepts via keyword matching and embedding similarity.
  • RagConceptTag — Many-to-many links between chunks and concepts, enabling concept-based retrieval filtering and curriculum gap analysis.
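
The tagging approach described above can be sketched as follows: a chunk is tagged with a concept when one of the concept's keywords appears in the chunk text or, failing that, when the chunk and concept embeddings are similar enough. The Concept, Chunk, and ConceptTag shapes, the 0.8 threshold, and the choice to try keywords first are all illustrative assumptions.

```typescript
// Hypothetical shapes; the real Prisma models carry more fields.
interface Concept {
  id: string;
  keywords: string[];
  embedding: number[];
}
interface Chunk {
  id: string;
  text: string;
  embedding: number[];
}
interface ConceptTag {
  chunkId: string;
  conceptId: string;
  method: "keyword" | "embedding";
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// threshold = 0.8 is an assumed cut-off, not a value from the source.
function tagChunk(chunk: Chunk, concepts: Concept[], threshold = 0.8): ConceptTag[] {
  const text = chunk.text.toLowerCase();
  const tags: ConceptTag[] = [];
  for (const concept of concepts) {
    const keywordHit = concept.keywords.some((kw) => text.includes(kw.toLowerCase()));
    if (keywordHit) {
      tags.push({ chunkId: chunk.id, conceptId: concept.id, method: "keyword", score: 1 });
      continue;
    }
    const similarity = cosineSimilarity(chunk.embedding, concept.embedding);
    if (similarity >= threshold) {
      tags.push({ chunkId: chunk.id, conceptId: concept.id, method: "embedding", score: similarity });
    }
  }
  return tags;
}
```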

LLM Integration

The src/lib/llm/client.ts module abstracts LLM calls behind a callLLM() function. The LLM_PROVIDER environment variable switches between anthropic (Claude Sonnet) and openai (GPT-4o). Key LLM-powered features:

  • Story Generation — Generates Twee 3 interactive stories with RAG-sourced evidence, following a structured prompt with protagonist design, dangerous-choice architecture, quiz structure, and resource passages.
  • Translation — Three modes: single-field, batch metadata, and full Twee source translation (preserving passage structure and link targets).
  • Error Fixing — Auto-repairs dead links, duplicate passages, and orphans using RAG-retrieved reference patterns from approved stories.
  • Image Prompts — Generates DALL-E 3 prompts with a unified style directive and fixed character roster; all characters depicted as 18+ with safety constraints.

Image generation always uses OpenAI DALL-E 3 regardless of the text LLM provider.
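
The provider switch can be pictured roughly as below. This is a sketch under stated assumptions, not the contents of src/lib/llm/client.ts: the request shape, resolveProvider, and createCallLLM are hypothetical, the provider functions are placeholders for real SDK calls, and throwing on an unset or unknown LLM_PROVIDER is an assumed behavior.

```typescript
type LLMProvider = "anthropic" | "openai";

// Hypothetical request shape; the real callLLM() signature is not shown in the source.
interface LLMRequest {
  system: string;
  prompt: string;
}
type ProviderCall = (req: LLMRequest) => Promise<string>;

// Validate the raw environment value instead of trusting it.
function resolveProvider(value: string | undefined): LLMProvider {
  if (value === "anthropic" || value === "openai") return value;
  throw new Error(`Unsupported LLM_PROVIDER: ${String(value)}`);
}

// Build a callLLM function that dispatches to the configured provider.
// The provider implementations are injected so SDK details stay out of this sketch.
function createCallLLM(
  providers: Record<LLMProvider, ProviderCall>,
  envValue: string | undefined,
): ProviderCall {
  const provider = resolveProvider(envValue);
  return (req) => providers[provider](req);
}
```

In an application, envValue would come from process.env.LLM_PROVIDER, and each entry in providers would wrap the corresponding vendor SDK. Injecting the providers also makes the dispatch easy to test with stubs.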

Database Models

The Prisma schema defines the following core models:

| Model | Purpose |
| --- | --- |
| User | Registered users with role (STUDENT, TEACHER, ADMIN) |
| Account | OAuth account linking (NextAuth adapter) |
| Session | Database-backed sessions (NextAuth adapter) |
| Submission | Twine file uploads with review workflow |
| Course | Published courses with bilingual metadata |
| CourseVersion | Versioned course builds (HTML with tracking) |
| Progress | Per-user, per-course progress tracking |
| HeroEvidenceItem | Landing page evidence quotes |
| PilotRequest | Pilot program interest submissions |
| TeacherAccessLog | Audit trail for teacher actions |
| RagDocument | Ingested source documents (PDF, DOCX) with category and status |
| RagChunk | Document chunks with 1536-d embeddings for vector search |
| KnowledgeConcept | Hierarchical concept taxonomy with bilingual names |
| RagConceptTag | Many-to-many links between chunks and concepts |
| RagCitation | Course-to-chunk citation tracking for source attribution |

Internationalization Data Flow

All user-facing database fields use a JSON i18n pattern:

{
  "en-US": "English text",
  "ko-KR": "Korean text"
}

The Prisma Locale enum uses underscores (en_US, ko_KR), while the application uses hyphens (en-US, ko-KR). Helper functions dbLocaleToI18n() and i18nLocaleToDb() in src/lib/i18n/config.ts handle the conversion.
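
A minimal sketch of what these helpers might look like is shown below. The function names dbLocaleToI18n() and i18nLocaleToDb() come from this document, but their bodies are assumptions, and pickLocalized() with its en-US fallback is a hypothetical addition for reading a JSON i18n field.

```typescript
type DbLocale = "en_US" | "ko_KR"; // Prisma enum values (underscores)
type I18nLocale = "en-US" | "ko-KR"; // application locales (hyphens)

// Assumed implementation: each locale contains exactly one separator.
function dbLocaleToI18n(locale: DbLocale): I18nLocale {
  return locale.replace("_", "-") as I18nLocale;
}

function i18nLocaleToDb(locale: I18nLocale): DbLocale {
  return locale.replace("-", "_") as DbLocale;
}

// Shape of a JSON i18n field; either translation may be missing.
type LocalizedText = Partial<Record<I18nLocale, string>>;

// Hypothetical reader: prefer the requested locale, then a fallback, then "".
function pickLocalized(
  field: LocalizedText,
  locale: I18nLocale,
  fallback: I18nLocale = "en-US",
): string {
  return field[locale] ?? field[fallback] ?? "";
}
```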

File Storage

Twine HTML files (uploaded and built), Twee sources, and generated images are stored on the local filesystem. The storage layer in src/lib/storage/ provides an abstraction interface designed for future migration to S3 or MinIO. Files are served through /api/files/[...path].

| Path | Contents |
| --- | --- |
| uploads/ | Raw Twine HTML submissions |
| builds/{courseId}/v{version}/ | Compiled course HTML with tracking |
| sources/{courseId}/v{version}/ | Twee 3 source files |
| images/{courseId}/ | DALL-E 3 generated artwork |
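
One way to picture the storage abstraction is a small interface with interchangeable backends, plus helpers that build the versioned keys listed above. Everything named here (StorageProvider, InMemoryStorage, buildKey, sourceKey) is hypothetical; the actual interface in src/lib/storage/ is not shown in this document, and the in-memory backend only stands in for the local filesystem.

```typescript
// Hypothetical storage contract; a filesystem, S3, or MinIO backend would implement it.
interface StorageProvider {
  put(key: string, data: string): Promise<void>;
  get(key: string): Promise<string | null>;
}

// In-memory stand-in used only to keep this sketch self-contained.
class InMemoryStorage implements StorageProvider {
  private files = new Map<string, string>();

  async put(key: string, data: string): Promise<void> {
    this.files.set(key, data);
  }

  async get(key: string): Promise<string | null> {
    return this.files.get(key) ?? null;
  }
}

// Key for a compiled course build, following the builds/ layout.
function buildKey(courseId: string, version: number): string {
  return `builds/${courseId}/v${version}/index.html`;
}

// Key for the matching Twee source, following the sources/ layout.
function sourceKey(courseId: string, version: number): string {
  return `sources/${courseId}/v${version}/source.twee`;
}
```

Keeping callers dependent on the interface rather than on filesystem calls is what would let a later backend swap happen without touching the pipeline code.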