
PicoAI Assistant
Full-Stack Developer & System Architect · 2025
An enterprise-grade, multi-agent AI chat platform with real-time streaming, tool execution, and RAG.

Overview
PicoAI Assistant is a three-tier platform (Next.js 15 frontend, Express.js gateway, Laravel 12 backend) that enables businesses to deploy multiple specialized AI assistants from a single self-hosted system. The gateway orchestrates real-time SSE streaming across multiple AI providers while the backend manages data, authentication, and an MCP server for extensible tool execution.
The Problem
- Deploy multiple specialized AI assistants (e.g., e-commerce support, internal knowledge bot) from a single platform
- Give AI agents access to live business data and external APIs — not just static training data
- Maintain full control over agent behavior, tool access, and usage limits per user
- Stream responses in real-time with rich content rendering (charts, product carousels, images)
- Keep a self-hosted, privacy-first architecture with no dependency on third-party SaaS chat platforms
Solution Architecture
The gateway is stateless (no database) — all persistence routes through Laravel. AI streaming (SSE) happens at the gateway level, keeping the backend focused on data operations. Data flows from Frontend → Gateway (session cookie) → Laravel (Bearer token) → MySQL.
- Frontend (Next.js 15): React 19 UI, Zustand, React Query, Tailwind CSS, App Router
- Gateway (Express.js): SSE streaming, AI providers, MCP client, session management, tool execution
- Backend (Laravel 12): MySQL 8.0, Passport OAuth, MCP server, RAG pipeline, Livewire admin
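As a rough illustration of the cookie-to-Bearer hop in this data flow, the sketch below shows how a stateless gateway might turn the browser's session cookie into an `Authorization` header for Laravel. The cookie name `pico_session`, the assumption that the cookie carries the access token directly, and the helper names are all hypothetical; a real deployment might encrypt or sign the cookie value.

```typescript
// Hypothetical cookie name; the gateway's real cookie name is not documented here.
const SESSION_COOKIE = "pico_session";

/** Parse a raw Cookie header ("a=1; b=2") into a name -> value map. */
function parseCookies(header: string | undefined): Record<string, string> {
  const out: Record<string, string> = {};
  if (!header) return out;
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skip malformed fragments
    const name = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim();
    if (name) out[name] = decodeURIComponent(value);
  }
  return out;
}

/**
 * Build the headers the gateway sends to Laravel. The gateway stores nothing itself:
 * it only translates the browser cookie into a Bearer token.
 * Returns null when there is no session, so the caller can answer 401.
 */
function backendHeaders(cookieHeader: string | undefined): Record<string, string> | null {
  const token = parseCookies(cookieHeader)[SESSION_COOKIE];
  if (!token) return null;
  return {
    Authorization: `Bearer ${token}`,
    Accept: "application/json",
  };
}
```

Because no state lives at the gateway, any gateway instance can serve any request; persistence questions are answered by Laravel.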
Key Features
Multi-Agent System
Each AI agent is independently configurable with its own model, instructions, temperature, max tokens, and tool assignments. Agents and tools are linked through a many-to-many relationship, so every tool assigned to an agent can carry its own priority and settings.
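A minimal sketch of how such an agent configuration might be turned into request options, assuming each agent-tool assignment carries a numeric priority (lower sorts first) and a free-form settings object. The type names, field names, and `requestOptions` helper are hypothetical, not the project's actual schema.

```typescript
// Hypothetical shapes mirroring the settings named in the text.
interface ToolAssignment {
  toolName: string;
  priority: number;                  // assumption: lower number = higher priority
  settings: Record<string, unknown>; // per-agent overrides for this tool
}

interface AgentConfig {
  model: string;
  instructions: string;
  temperature: number;
  maxTokens: number;
  tools: ToolAssignment[];
}

/** Build provider-agnostic request options for one agent, with tools in priority order. */
function requestOptions(agent: AgentConfig) {
  // Copy before sorting so the stored configuration is not mutated.
  const tools = [...agent.tools].sort((a, b) => a.priority - b.priority);
  return {
    model: agent.model,
    system: agent.instructions,
    temperature: agent.temperature,
    maxTokens: agent.maxTokens,
    tools: tools.map((t) => ({ name: t.toolName, settings: t.settings })),
  };
}
```

Keeping priority on the assignment rather than on the tool means the same tool can rank first for one agent and last for another.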
Multi-Provider AI Streaming
A provider abstraction layer normalizes responses across OpenAI, Anthropic Claude, Ollama, and Gemini into a unified event stream. Adding a new provider requires implementing a single interface.
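One way such an abstraction could look, sketched under assumptions: every adapter yields the same small set of events, and the gateway serializes each event as a Server-Sent Events frame. The event names, the `AiProvider` method signature, and the `EchoProvider` stand-in are hypothetical; only the SSE framing (`event:` / `data:` lines ending in a blank line) follows the standard format.

```typescript
// Hypothetical unified event shape; the gateway's real event names may differ.
type UnifiedEvent =
  | { type: "text"; delta: string }
  | { type: "tool_call"; id: string; name: string; args: Record<string, unknown> }
  | { type: "done"; inputTokens: number; outputTokens: number };

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

interface StreamOptions { model: string; temperature: number; maxTokens: number; }

/** The single interface an adapter (OpenAI, Claude, Ollama, Gemini, ...) would implement. */
interface AiProvider {
  readonly name: string;
  stream(messages: ChatMessage[], opts: StreamOptions): AsyncGenerator<UnifiedEvent>;
}

/** Serialize one unified event as an SSE frame. */
function toSSE(event: UnifiedEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

/** Stand-in provider that echoes the last message word by word (useful in tests). */
class EchoProvider implements AiProvider {
  readonly name = "echo";
  async *stream(messages: ChatMessage[], _opts: StreamOptions): AsyncGenerator<UnifiedEvent> {
    const last = messages[messages.length - 1]?.content ?? "";
    const words = last.split(" ");
    for (const word of words) yield { type: "text", delta: word + " " };
    // A real adapter would report the provider's usage numbers; word count is a stand-in.
    yield { type: "done", inputTokens: 0, outputTokens: words.length };
  }
}
```

Streaming and tool-execution code only ever sees `UnifiedEvent`, which is what lets a new vendor be added by writing one more adapter.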
MCP Integration
A full dual-role MCP implementation: a Laravel MCP server exposing 11+ auto-discovered tools over JSON-RPC 2.0, and a gateway MCP client that caches sessions for 60 seconds and transforms tool definitions into each AI provider's format.
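A sketch of the client-side pieces, under stated assumptions: MCP tool definitions (with their `name`, `description`, and JSON Schema `inputSchema`) are mapped onto the OpenAI Chat Completions and Anthropic Messages tool shapes, and a small TTL cache holds per-server data for 60 seconds. What exactly is cached (session, tool list, or both) and the helper names are hypothetical.

```typescript
// MCP tool definition as returned by a server's tools/list call.
interface McpTool {
  name: string;
  description?: string;
  inputSchema: Record<string, unknown>; // JSON Schema for the arguments
}

// OpenAI Chat Completions function-tool shape.
interface OpenAITool {
  type: "function";
  function: { name: string; description: string; parameters: Record<string, unknown> };
}

// Anthropic Messages API tool shape.
interface AnthropicTool {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}

function toOpenAITool(t: McpTool): OpenAITool {
  return { type: "function", function: { name: t.name, description: t.description ?? "", parameters: t.inputSchema } };
}

function toAnthropicTool(t: McpTool): AnthropicTool {
  return { name: t.name, description: t.description ?? "", input_schema: t.inputSchema };
}

const MCP_SESSION_TTL_MS = 60_000; // 60-second cache window

/** Minimal time-based cache; `now` is injectable so expiry can be tested. */
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: the caller reconnects / re-lists tools
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```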
RAG Pipeline
Type-aware document processing with strategy-based chunking, OpenAI text-embedding-3-large embeddings, MySQL JSON column storage, and cosine similarity search with configurable relevance thresholds.
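The text does not say which chunking strategy is used for which document type, so the following is an illustration only: Markdown is split at headings so each chunk keeps its section context, and any other type falls back to fixed-size character windows with overlap. The type keys, window size (800), and overlap (100) are hypothetical.

```typescript
type Chunker = (text: string) => string[];

/** Fixed-size character windows; consecutive windows share `overlap` characters. */
function fixedWindow(size: number, overlap: number): Chunker {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  return (text) => {
    const chunks: string[] = [];
    for (let start = 0; start < text.length; start += size - overlap) {
      chunks.push(text.slice(start, start + size));
      if (start + size >= text.length) break; // last window reached the end
    }
    return chunks;
  };
}

/** Split Markdown right before each heading line (# .. ######). */
const byMarkdownHeading: Chunker = (text) =>
  text
    .split(/\n(?=#{1,6} )/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

// Hypothetical mapping from document type to strategy.
const defaultChunker = fixedWindow(800, 100);
const strategies: Record<string, Chunker> = {
  markdown: byMarkdownHeading,
  text: defaultChunker,
};

function chunkDocument(type: string, text: string): string[] {
  const chunker = strategies[type] ?? defaultChunker;
  return chunker(text);
}
```

Each chunk would then be embedded and its vector stored alongside the chunk text.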
External API Integration
First-class Service → Tool → Endpoint architecture for connecting any HTTP API with endpoint discovery, keyword filtering, authentication header injection, and streamed results.
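A hedged sketch of two of those steps, keyword filtering and auth header injection, assuming each endpoint lists trigger keywords and each service optionally stores one header-based credential. The interfaces and function names are hypothetical; the point is that the credential is added server-side and never reaches the model or the browser.

```typescript
// Hypothetical shapes for the Service -> Tool -> Endpoint hierarchy.
interface Endpoint {
  method: "GET" | "POST";
  path: string;
  description: string;
  keywords: string[];
}

interface ApiService {
  baseUrl: string;
  authHeader?: { name: string; value: string };
  endpoints: Endpoint[];
}

interface OutgoingRequest {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

/** Keep only endpoints with at least one keyword found in the query (case-insensitive). */
function filterEndpoints(service: ApiService, query: string): Endpoint[] {
  const q = query.toLowerCase();
  return service.endpoints.filter((e) => e.keywords.some((k) => q.includes(k.toLowerCase())));
}

/** Build the outgoing HTTP request, injecting the service's auth header. */
function buildRequest(service: ApiService, endpoint: Endpoint, params: Record<string, string>): OutgoingRequest {
  const headers: Record<string, string> = { Accept: "application/json" };
  if (service.authHeader) headers[service.authHeader.name] = service.authHeader.value;
  const base = service.baseUrl.replace(/\/+$/, "") + endpoint.path; // avoid double slashes
  if (endpoint.method === "GET") {
    const qs = Object.entries(params)
      .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
      .join("&");
    return { method: "GET", url: qs ? `${base}?${qs}` : base, headers };
  }
  headers["Content-Type"] = "application/json";
  return { method: "POST", url: base, headers, body: JSON.stringify(params) };
}
```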
Rich Content Rendering
The chat interface renders product carousels (Embla), interactive charts (Recharts), AI-generated images (gpt-image-1), syntax-highlighted code blocks, and full Markdown.
OAuth2 SSO
Laravel Passport handles OAuth2 token issuance. The gateway exchanges authorization codes for tokens stored in httpOnly cookies. The frontend never sees the Bearer token.
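A minimal sketch of the gateway's two jobs in this flow, assuming the standard OAuth2 authorization-code grant (RFC 6749) against Passport's `/oauth/token` endpoint: build the token request, then store the resulting access token in a cookie the browser's JavaScript cannot read. The config field names, cookie name, and `SameSite=Lax` choice are assumptions.

```typescript
// Hypothetical gateway configuration.
interface OAuthClientConfig {
  backendUrl: string;
  clientId: string;
  clientSecret: string;
  redirectUri: string;
}

/** Form-encoded authorization_code grant request for Passport's token endpoint. */
function tokenRequest(cfg: OAuthClientConfig, code: string) {
  const fields: Record<string, string> = {
    grant_type: "authorization_code",
    client_id: cfg.clientId,
    client_secret: cfg.clientSecret, // stays on the server; never sent to the browser
    redirect_uri: cfg.redirectUri,
    code,
  };
  const body = Object.entries(fields)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
  return {
    url: `${cfg.backendUrl}/oauth/token`,
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded", Accept: "application/json" },
    body,
  };
}

/** Set-Cookie value that keeps the access token out of reach of browser JavaScript. */
function sessionCookie(name: string, accessToken: string, maxAgeSeconds: number): string {
  return [
    `${name}=${encodeURIComponent(accessToken)}`,
    "HttpOnly",     // not readable via document.cookie
    "Secure",       // only sent over HTTPS
    "SameSite=Lax", // withheld from most cross-site requests
    "Path=/",
    `Max-Age=${maxAgeSeconds}`,
  ].join("; ");
}
```

In practice `maxAgeSeconds` would come from the `expires_in` field of the token response.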
Usage & Quota Management
Per-user token limits (input + output tracking), image generation caps with monthly resets, rate limiting with 429 responses, and date-range analytics for reporting.
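A sketch of how such a quota check might work, assuming usage counters are keyed by UTC calendar month (a new month key means the counters start from zero) and that exceeding a limit maps to an HTTP 429. The text only states monthly resets for image caps; applying the same window to token counts here, and all names, are assumptions.

```typescript
interface UsageRecord { periodKey: string; inputTokens: number; outputTokens: number; images: number; }
interface Limits { maxTokens: number; maxImages: number; }

/** UTC calendar-month key such as "2025-03". */
function monthKey(d: Date): string {
  return `${d.getUTCFullYear()}-${String(d.getUTCMonth() + 1).padStart(2, "0")}`;
}

/** Usage for the current month; a record from an earlier month is treated as reset. */
function currentUsage(stored: UsageRecord | undefined, now: Date): UsageRecord {
  const key = monthKey(now);
  if (stored && stored.periodKey === key) return stored;
  return { periodKey: key, inputTokens: 0, outputTokens: 0, images: 0 };
}

type QuotaResult = { ok: true } | { ok: false; status: 429; reason: string };

/** Decide whether a chat or image request may proceed. */
function checkQuota(usage: UsageRecord, limits: Limits, kind: "chat" | "image"): QuotaResult {
  if (kind === "image" && usage.images >= limits.maxImages) {
    return { ok: false, status: 429, reason: "Monthly image limit reached" };
  }
  // Input and output tokens both count against the same limit.
  if (usage.inputTokens + usage.outputTokens >= limits.maxTokens) {
    return { ok: false, status: 429, reason: "Token limit reached" };
  }
  return { ok: true };
}
```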
Challenges & Solutions
Multi-turn tool execution in streaming
AI models request tool calls mid-stream, and results need to be fed back for follow-up reasoning — all while maintaining the SSE connection. Built a tool follow-up mechanism that accumulates results, reconstructs conversation context with provider-specific formatting, and re-enters the streaming loop.
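A simplified sketch of that follow-up loop, assuming an adapter that yields text deltas and complete tool calls in one stream and knows how to format a tool result for its own API (`formatToolResult`). All names and the `maxRounds` guard are hypothetical. Real providers also expect the assistant turn to record its tool calls, which this sketch reduces to plain text.

```typescript
type StreamEvent =
  | { type: "text"; delta: string }
  | { type: "tool_call"; id: string; name: string; args: Record<string, unknown> };

interface Message { role: "user" | "assistant" | "tool"; content: string; toolCallId?: string; }

interface LoopProvider {
  stream(messages: Message[]): AsyncGenerator<StreamEvent>;
  formatToolResult(callId: string, result: string): Message; // provider-specific formatting
}

type ToolRunner = (name: string, args: Record<string, unknown>) => Promise<string>;

/**
 * Stream a reply; if the model requests tools, run them, append their results,
 * and re-enter the stream until a round finishes without tool calls.
 * `emit` stands in for writing SSE frames to the still-open connection.
 */
async function runWithTools(
  provider: LoopProvider,
  messages: Message[],
  runTool: ToolRunner,
  emit: (e: StreamEvent) => void,
  maxRounds = 5,
): Promise<Message[]> {
  const history = [...messages];
  for (let round = 0; round < maxRounds; round++) {
    const calls: Extract<StreamEvent, { type: "tool_call" }>[] = [];
    let text = "";
    for await (const ev of provider.stream(history)) {
      emit(ev); // forward every event to the client as it arrives
      if (ev.type === "text") text += ev.delta;
      else calls.push(ev); // accumulate tool calls requested mid-stream
    }
    history.push({ role: "assistant", content: text });
    if (calls.length === 0) return history; // no tools requested: turn complete
    for (const call of calls) {
      const result = await runTool(call.name, call.args);
      history.push(provider.formatToolResult(call.id, result));
    }
  }
  return history; // round limit reached; stop instead of looping forever
}
```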
RAG without a vector database
Adding Pinecone or Weaviate would have increased infrastructure complexity for self-hosted deployments. Stored embeddings as MySQL JSON columns and implemented cosine similarity search in PHP. The architecture allows dropping in a dedicated vector store later without changing the search interface.
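The same idea in TypeScript rather than the project's PHP, as a sketch: callers depend only on a search interface, and the initial implementation parses vectors from JSON column values and ranks them by cosine similarity with a relevance threshold. The interface, class, and row shape are hypothetical. The linear scan costs O(n·d) per query, which is why it suits modest corpora and why a dedicated vector store could later replace it behind the same interface.

```typescript
interface ScoredChunk { id: string; text: string; score: number; }

/** What the rest of the app depends on; a pgvector or Pinecone adapter could implement it later. */
interface VectorSearch {
  search(query: number[], topK: number, minScore: number): Promise<ScoredChunk[]>;
}

/** cos(a, b) = (a . b) / (|a| |b|); returns 0 for zero vectors. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

/** Rows as they might come back from a table with a JSON `embedding` column (hypothetical schema). */
interface StoredRow { id: string; text: string; embedding: string; }

class JsonColumnSearch implements VectorSearch {
  private readonly rows: { id: string; text: string; vector: number[] }[];

  constructor(rows: StoredRow[]) {
    // Parse once up front instead of on every query.
    this.rows = rows.map((r) => ({ id: r.id, text: r.text, vector: JSON.parse(r.embedding) as number[] }));
  }

  async search(query: number[], topK: number, minScore: number): Promise<ScoredChunk[]> {
    return this.rows
      .map((r) => ({ id: r.id, text: r.text, score: cosineSimilarity(query, r.vector) }))
      .filter((c) => c.score >= minScore) // relevance threshold
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}
```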
Consistent authentication across three services
Three services with different auth mechanisms needed seamless SSO. The solution keeps the Bearer token server-side: Laravel Passport issues it, the gateway completes the authorization-code exchange and holds the token in an httpOnly cookie, and the browser only ever presents that cookie, so frontend code never handles the token itself.
Architecture Decisions
| Decision | Rationale |
|---|---|
| Stateless gateway | No database at the gateway — all data ops proxy through Laravel. Simplifies the gateway to a streaming orchestrator. |
| MySQL JSON for embeddings | Avoided adding a vector DB dependency. Cosine similarity computed in PHP. Sufficient for current scale; can migrate to pgvector/Pinecone later. |
| MCP over custom tooling | Adopted the open Model Context Protocol standard for tool interop. Future-proofs integrations and allows third-party MCP servers. |
| Provider abstraction | Unified AiProvider interface decouples business logic from any single AI vendor. Switching or adding providers requires no changes to streaming or tool execution code. |
| Feature-based frontend structure | Each feature (agents, auth, chat, conversations) is self-contained with its own api/, components/, stores/. Scales better than layer-based organization. |
| Laravel Modules | nwidart/laravel-modules for modular backend architecture — keeps domain logic separated as the codebase grows. |
My Role
Full-Stack Developer & System Architect
- Designed the three-tier service architecture and inter-service communication patterns
- Built the real-time AI streaming pipeline with multi-provider support
- Implemented the MCP server and client for extensible tool execution
- Developed the RAG pipeline for knowledge-base-powered AI responses
- Created the OAuth2 SSO authentication flow across all services
- Built the admin dashboard with Livewire for agent/tool/user management
- Configured Docker Compose orchestration with Nginx SSL termination
- Wrote OpenAPI documentation with Swagger UI and ReDoc
Outcome & Impact
- Multi-agent deployment — businesses can spin up specialized AI assistants without code changes
- Real-time streaming with tool execution provides a responsive, interactive chat experience
- Self-hosted architecture gives organizations full control over data and costs
- Extensible tool system via MCP allows integration with any external API or data source
- Admin dashboard empowers non-technical users to manage agents, tools, and usage limits