Lightning ⚡ RAG User Documentation

Introduction

Lightning RAG (L⚡RAG) is a Retrieval-Augmented Generation (RAG) platform for working with your data conversationally. It combines document processing, database connectivity, and natural language understanding so that you can ask questions across all your information sources through a chat interface.

This documentation provides comprehensive guidance on how to use Lightning RAG effectively, from creating your first collection to building sophisticated data interactions.

Getting Started

System Requirements

Modern web browser (Chrome, Firefox, Safari, or Edge)
Internet connection for cloud-based features
Minimum screen resolution: 1280x800

Accessing Lightning RAG

Navigate to your organization's Lightning RAG instance URL
Log in with your credentials
You'll be directed to the Collections dashboard

Core Concepts

Collections

Collections are the fundamental building blocks in Lightning RAG. A collection is a set of related data from a specific source type that has been processed and optimized for conversational AI interaction.

Collection Types

Lightning RAG supports three primary data structure categories:

Unstructured Data
- PDF Collections
  - Document-based collections with free-form text, tables, images
  - Supports multiple document formats (PDF, DOCX, TXT)
  - Uses OCR and document understanding technology
Semi-structured Data
- MongoDB Collections
  - Works with NoSQL document databases
  - Handles nested document structures
  - Supports aggregation pipelines
- API Collections
  - Connects to external APIs
  - Auto-generates schema from OpenAPI/Swagger definitions
  - Handles authentication and parameter mapping
Structured Data
- SQL Collections
  - Connects to relational databases with rigid schemas
  - Supports schema understanding and query generation
  - Compatible with PostgreSQL, MySQL, SQL Server, and more
- Excel Collections
  - Processes tabular spreadsheet data
  - Handles multiple sheets and complex formulas
  - Auto-converts to optimized SQL structures internally

Embedding Types

For unstructured data collections (PDF), Lightning RAG offers three embedding types. Each one uses a different engine to parse documents before embeddings are generated:

PaddleOCR
- High-accuracy document processing optimized for complex layouts
- Best for documents with mixed content (text, tables, images)
- Default processing engine
Llama Parse (Cloud)
- Advanced cloud-based parsing for sophisticated document structures
- Superior handling of tables and structured data
- Requires internet connectivity
Docling (On-Premise)
- Secure, locally-hosted solution for sensitive document processing
- Ensures data never leaves your infrastructure
- Ideal for confidential or regulated information

User Interface Overview

Navigation

Collections: Manage and interact with your data sources
Dashboards: View insights and analytics across collections
Analytics: Track usage, performance metrics, and user engagement
Settings: Configure system preferences and user access

Collections Dashboard

The Collections dashboard displays all your available collections with key information:

Collection name and type
Item count (documents, tables, etc.)
Status (Ready, Processing)
Action buttons (Build, Chat, Share)

Filtering and Sorting

Filter collections by data structure (Unstructured, Semi-structured, Structured) or specific type
Search collections by name using the search bar
Sort collections by name, type, creation date, or status

Creating Collections

Step 1: Initiate Collection Creation

Click the "+ New Collection" button in the top right corner
Select the data structure category and collection type
Enter a name for your collection

Step 2: Configure Source

Depending on the collection type, you'll see different source options:

For Unstructured Data (PDF Collections)

Choose a source:
- Upload: Upload files from your computer
- Web Scraper: Extract content from websites
- URL: Import documents from direct links
Select embedding type:
- PaddleOCR: Best for general documents
- Llama Parse: Optimal for complex structures
- Docling: For sensitive information

For Semi-structured Data

MongoDB Collections:

Enter connection details:
- Connection URI
- Database name
- Collection names
- Authentication credentials
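
The fields above are commonly combined into a single MongoDB connection URI. The following is a minimal sketch, assuming the standard mongodb:// URI syntax. The helper name build_mongo_uri and the sample host, database, and credentials are hypothetical, and Lightning RAG may assemble or validate the URI differently.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host: str, port: int, database: str,
                    username: str = "", password: str = "") -> str:
    """Assemble a standard mongodb:// URI (hypothetical helper)."""
    credentials = ""
    if username:
        # Reserved characters such as '@' or ':' must be percent-encoded,
        # otherwise the URI is parsed incorrectly.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    return f"mongodb://{credentials}{host}:{port}/{database}"


# Sample values, invented for illustration.
uri = build_mongo_uri("db.example.com", 27017, "sales",
                      username="rag_reader", password="p@ss:word")
print(uri)
```

If a connection fails with an authentication error, unescaped special characters in the password are worth checking first.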

API Collections:

Enter API details:
- API name and description
- Base URL
- Authentication method
- Endpoint configuration
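
When an OpenAPI/Swagger definition is available, the endpoint configuration can be derived from its paths object. The sketch below shows one way to flatten that object into a list of endpoints. The sample specification and the function name list_endpoints are invented for illustration and do not describe how Lightning RAG parses definitions internally.

```python
import json

# A tiny OpenAPI 3 document, invented for illustration.
SPEC = json.loads("""
{
  "openapi": "3.0.0",
  "info": {"title": "Orders API", "version": "1.0.0"},
  "paths": {
    "/orders": {
      "get": {"summary": "List orders",
              "parameters": [{"name": "status", "in": "query"}]}
    },
    "/orders/{id}": {
      "get": {"summary": "Get one order",
              "parameters": [{"name": "id", "in": "path"}]}
    }
  }
}
""")

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_endpoints(spec: dict) -> list[dict]:
    """Flatten the OpenAPI `paths` object into one record per operation."""
    endpoints = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            endpoints.append({
                "method": method.upper(),
                "path": path,
                "summary": operation.get("summary", ""),
                "params": [p["name"] for p in operation.get("parameters", [])],
            })
    return endpoints


for endpoint in list_endpoints(SPEC):
    print(endpoint["method"], endpoint["path"], endpoint["params"])
```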

For Structured Data

SQL Collections:

Enter database connection details:
- Database type (PostgreSQL, MySQL, etc.)
- Host and port
- Database name
- Authentication credentials
- Table selection (optional)
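
To illustrate what schema analysis and the optional table selection involve, here is a minimal sketch that reads table and column definitions from a database. It uses an in-memory SQLite database as a stand-in for a real source. The table names and the function describe_schema are hypothetical, and other database types expose their schema through different catalog queries.

```python
import sqlite3

# In-memory SQLite database standing in for a real SQL source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")


def describe_schema(conn, tables=None):
    """Return {table: [(column, type), ...]}, optionally limited to `tables`."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    schema = {}
    for (table,) in rows:
        if tables is not None and table not in tables:
            continue  # honor the optional table selection
        # PRAGMA table_info yields (cid, name, type, notnull, default, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


print(describe_schema(conn))                     # every table
print(describe_schema(conn, tables={"orders"}))  # selected table only
```

Limiting the selection to the tables you actually need keeps the schema smaller.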

Excel Collections:

Upload Excel files or provide a URL
Select sheets to include
Choose embedding options (Schema only or Data enhanced)

Step 3: Create and Build

Click "Create Collection" to initialize your collection
The system will automatically begin the build process
Building includes:
- Document parsing and OCR (for unstructured data)
- Schema analysis (for structured and semi-structured data)
- Vector embedding generation
- Index optimization
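
The embedding and indexing steps can be pictured with a toy example. The sketch below is a simplified illustration, not Lightning RAG's actual pipeline: it stands in for a real embedding model with plain word counts, builds an index once, and retrieves the chunk most similar to a question using cosine similarity. The sample chunks and all function names are invented.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (real systems use dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Sample document chunks, invented for illustration.
chunks = [
    "Invoices are due within 30 days of receipt.",
    "The warranty covers parts and labor for two years.",
    "Support is available by email on weekdays.",
]
# The index is built once (at build time) and reused for every question.
index = [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


print(retrieve("How long is the warranty?"))
```

In a RAG system, the retrieved chunks are then passed to a language model as context for the answer. This is also why a collection must be rebuilt after its source content changes: the index reflects the content as it was at build time.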

Working with Collections

Collection States

Collections exist in one of two states:

Processing: The collection is being built or updated
Ready: The collection is available for chat interaction

Collection Actions

Build/Rebuild

The Build process prepares your collection for chat interaction:

Click the "Build" button on a collection card
Monitor progress in the detail view
Building time varies based on collection size and complexity

Chat

Start a conversation with your collection:

Click the "Chat" button on a Ready collection
Type natural language questions in the chat interface
Receive AI-generated responses based on your collection data
Follow up with additional questions for context-aware responses

Share

Share collections with team members:

Click the "Share" button on a collection
Set access permissions (View, Chat, Edit)
Enter recipient email addresses or copy the shareable link
Optional: Add an expiration date or password protection

Collection Details

Access detailed information and settings by clicking on a collection card:

Overview Tab

Collection metadata (type, data structure category, creation date, size)
Status and processing information
Recent activity log

Content Tab

List of items in the collection (documents, tables, endpoints)
Preview functionality for supported content types
Item-specific metadata

Settings Tab

Rename collection
Modify embedding type (for unstructured data collections)
Configure refresh settings
Delete collection

Analytics Tab

Usage statistics (query count, user count)
Performance metrics (response time, accuracy)
Popular queries and topics

Advanced Features

Dynamic Collection Mapping

For enterprise users, Lightning RAG supports dynamic collection mapping:

Create collection templates with variable placeholders
Set up mapping rules based on user roles or session parameters
Collections automatically adapt to the current user context
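
As a rough illustration of the idea, the sketch below resolves a collection template to a concrete collection name based on the user's role. The template syntax, role names, and mapping rules are all hypothetical. Lightning RAG's own template format may differ.

```python
from string import Template

# Hypothetical collection template with variable placeholders.
TEMPLATE = Template("sales_${region}_${year}")

# Hypothetical mapping rules: role -> values for the placeholders.
MAPPING_RULES = {
    "emea_analyst": {"region": "emea", "year": "2024"},
    "apac_analyst": {"region": "apac", "year": "2024"},
}


def resolve_collection(role: str) -> str:
    """Resolve the template to a concrete collection name for a role."""
    try:
        values = MAPPING_RULES[role]
    except KeyError:
        raise PermissionError(f"no mapping rule for role {role!r}") from None
    # substitute() raises KeyError if a placeholder has no value.
    return TEMPLATE.substitute(values)


print(resolve_collection("emea_analyst"))  # sales_emea_2024
```

Rejecting roles that have no rule, rather than falling back to a default collection, avoids exposing data to users the mapping did not anticipate.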

Published Links

Create embeddable Lightning RAG dashboards:

Configure a collection for publishing
Generate a MINDSHARE_KEY for secure access
Set refresh rate for dashboard data
Embed the published link in other applications

RBAC Settings

Control access with role-based permissions:

Configure organization-level access policies
Assign roles to team members
Set granular permissions for collections and features
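
The sketch below shows how a role-based permission check might work conceptually, using the View, Chat, and Edit levels from the sharing options. It assumes that each level includes the ones before it, which this documentation does not confirm. The role names, collection names, and the is_allowed function are hypothetical.

```python
from enum import IntEnum


class Permission(IntEnum):
    """Access levels, ordered on the assumption that each implies the previous."""
    VIEW = 1
    CHAT = 2
    EDIT = 3


# Hypothetical roles and grants: role -> {collection: highest permission}.
ROLE_GRANTS = {
    "viewer": {"finance_reports": Permission.VIEW},
    "analyst": {"finance_reports": Permission.CHAT},
    "admin": {"finance_reports": Permission.EDIT, "hr_policies": Permission.EDIT},
}


def is_allowed(role: str, collection: str, needed: Permission) -> bool:
    """Check whether a role holds at least the needed permission."""
    grant = ROLE_GRANTS.get(role, {}).get(collection)
    # No grant at all means no access (deny by default).
    return grant is not None and grant >= needed


print(is_allowed("analyst", "finance_reports", Permission.CHAT))  # True
print(is_allowed("viewer", "finance_reports", Permission.CHAT))   # False
print(is_allowed("analyst", "hr_policies", Permission.VIEW))      # False
```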

Troubleshooting

Common Issues

Collection Building Fails

Check source file formats and compatibility
Verify database connection details
Ensure API endpoints are accessible
Review error logs in the detail view

Chat Responses Are Inaccurate

Rebuild the collection with updated content
Try refining your question with more specific details
Check if the information exists in your collection
For unstructured data, consider changing the embedding type

Performance Issues

Split large collections into smaller, focused collections
Optimize database queries for structured data collections
Reduce the scope of semi-structured collections
Use local embedding types for faster processing of unstructured data

Best Practices

Collection Organization

Create purpose-specific collections rather than catch-all repositories
Use clear, descriptive naming conventions
Consider organizing by data structure type for efficient management
Regularly audit and clean up unused collections

Data Type Selection Guidelines

Unstructured Data: Ideal for narrative content, reports, manuals, and documents with mixed content types
Semi-structured Data: Best for flexible data models, nested information, and varying schemas
Structured Data: Optimal for relational information, tabular data, and precise querying needs

Effective Querying

Start with simple, direct questions
Provide context in your questions
Ask for specific formats when needed (tables, lists, summaries)
Use follow-up questions to refine results

Data Management

Keep source data updated for optimal results
Schedule regular rebuilds for frequently changing collections
Monitor analytics to identify usage patterns and improvement areas
Implement version control for critical collections

Glossary

Collection: A processed dataset optimized for conversational AI
Embedding: A vector representation of content, used for semantic search
MINDSHARE_KEY: Secure access token for published dashboards
OCR: Optical Character Recognition for extracting text from images
RAG: Retrieval-Augmented Generation, which combines retrieval systems with generative AI
RBAC: Role-Based Access Control for managing permissions
Structured Data: Data organized according to a predefined schema (SQL, Excel)
Semi-structured Data: Data with some organizational properties but flexible schema (MongoDB, APIs)
Unstructured Data: Data without predefined organization (PDF, documents)
Vector Database: Database optimized for similarity search of embeddings