Quantum Minds Media Operators
Introduction
Media operators in Quantum Minds let you work with non-text content such as images, audio, and speech. With them, your minds can process and generate multimedia content instead of being limited to text, which makes multimodal applications possible.
Available Media Operators
Operator | Description | Common Use Cases |
---|---|---|
ImageGenerationUrl | Creates images from text prompts with URL output | Visual content creation, marketing materials, illustrations |
ImageGeneration64 | Creates images with base64-encoded output | Embedded visuals, application integration, offline use |
SpeechToText | Converts audio to text transcriptions | Transcription, meeting notes, audio analysis |
TextToSpeech | Converts text to spoken audio | Audio content creation, accessibility, voice interfaces |
ImageGenerationUrl
The ImageGenerationUrl operator creates images based on text descriptions and returns accessible URLs for the generated images.
Inputs
Parameter | Type | Required | Description |
---|---|---|---|
model | string | Yes | Image generation model to use (e.g., "dall-e-3") |
prompt | string | Yes | Description of the image to generate |
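quality | string | Yes | Image quality level (e.g., "standard", "hd"; "hd" is available only with dall-e-3) |
size | string | Yes | Image dimensions (e.g., "1024x1024"); supported sizes depend on the model, and "512x512" is available only with dall-e-2 |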
trigger | string | No | Optional control signal |
Outputs
Parameter | Type | Description |
---|---|---|
type | string | Output format (markdown) |
content | string | Markdown containing the image URL and generation details |
Example Usage
Model: "dall-e-3"
Prompt: "A futuristic smart city with sustainable architecture, flying vehicles, green spaces, and solar panels, in a photorealistic style"
Quality: "hd"
Size: "1024x1024"
Output: Markdown with an embedded image URL showing the described futuristic city
Best Practices
- Provide detailed, specific descriptions
- Mention style, perspective, lighting, and mood
- Include important details about what should and shouldn't be included
- Be aware of content policies that may restrict certain types of images
- Use quality and size parameters appropriate for your use case
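Because valid sizes and quality levels depend on the model, it can help to check a step before it runs. The sketch below builds a step in the same shape as the JSON flow examples later on this page and rejects combinations that OpenAI documents as unsupported for its image API (dall-e-2: 256x256, 512x512, 1024x1024 with "standard" quality only; dall-e-3: 1024x1024, 1792x1024, 1024x1792 with "standard" or "hd"). Whether Quantum Minds enforces exactly these limits is an assumption, and build_image_step is a hypothetical helper, not part of the platform.

```python
# Hypothetical helper: builds an image step dict shaped like the JSON examples
# in this document. The limits mirror OpenAI's published image API options;
# Quantum Minds may apply different ones.
SUPPORTED = {
    "dall-e-2": {"sizes": {"256x256", "512x512", "1024x1024"}, "qualities": {"standard"}},
    "dall-e-3": {"sizes": {"1024x1024", "1792x1024", "1024x1792"}, "qualities": {"standard", "hd"}},
}


def build_image_step(model: str, prompt: str, quality: str, size: str,
                     operator: str = "ImageGenerationUrl") -> dict:
    """Return an operator step dict after checking model-specific limits."""
    if model not in SUPPORTED:
        raise ValueError(f"unknown model: {model!r}")
    limits = SUPPORTED[model]
    if size not in limits["sizes"]:
        raise ValueError(f"{model} does not support size {size!r}")
    if quality not in limits["qualities"]:
        raise ValueError(f"{model} does not support quality {quality!r}")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "operator": operator,  # "ImageGeneration64" takes the same inputs
        "input": {"model": model, "prompt": prompt, "quality": quality, "size": size},
    }


step = build_image_step("dall-e-3", "A futuristic smart city, photorealistic", "hd", "1024x1024")
print(step)
```

Failing early on an unsupported size such as "512x512" with dall-e-3 surfaces the problem before any generation time is spent.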
ImageGeneration64
The ImageGeneration64 operator creates images based on text descriptions and returns them in base64-encoded format for direct embedding.
Inputs
The inputs are the same as for ImageGenerationUrl:
Parameter | Type | Required | Description |
---|---|---|---|
model | string | Yes | Image generation model to use |
prompt | string | Yes | Description of the image to generate |
quality | string | Yes | Image quality level |
size | string | Yes | Image dimensions |
trigger | string | No | Optional control signal |
Outputs
Parameter | Type | Description |
---|---|---|
type | string | Output format (markdown) |
content | string | Markdown with embedded base64-encoded image |
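To save or reuse the image outside the markdown, you need to pull the base64 payload back out. This page does not specify the exact markdown, so the sketch below assumes a standard markdown image whose target is a data URI, such as ![alt](data:image/png;base64,...); adjust the pattern if your output differs. extract_base64_image is a hypothetical helper.

```python
import base64
import re

# Assumed output shape: ![alt](data:image/<type>;base64,<payload>)
DATA_URI = re.compile(r"!\[[^\]]*\]\(data:(image/[\w.+-]+);base64,([A-Za-z0-9+/=\s]+)\)")


def extract_base64_image(markdown: str) -> tuple[str, bytes]:
    """Return (mime_type, raw_bytes) for the first embedded base64 image."""
    match = DATA_URI.search(markdown)
    if match is None:
        raise ValueError("no base64 data URI image found in markdown")
    mime_type, payload = match.group(1), match.group(2)
    # Remove any line breaks inside the payload, then decode strictly.
    raw = base64.b64decode("".join(payload.split()), validate=True)
    return mime_type, raw


# Demo with a tiny fake payload; real output would carry full PNG bytes.
sample = ("Here is your image:\n\n![city](data:image/png;base64,"
          + base64.b64encode(b"\x89PNG demo").decode() + ")")
mime, data = extract_base64_image(sample)
print(mime, len(data))  # image/png 9
```

From there, pathlib.Path("city.png").write_bytes(data) writes the image to disk.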
Supported Models
Both image generation operators support:
Model | Provider | Strengths |
---|---|---|
dall-e-2 | OpenAI | Faster generation, stylized results |
dall-e-3 | OpenAI | Higher quality, better prompt adherence |
Choosing Between Image Operators
Consideration | ImageGenerationUrl | ImageGeneration64 |
---|---|---|
Persistence | Images hosted externally | Image data contained in output |
Loading speed | May be faster for large images | May be slower for large images |
Integration | Requires URL access | Works offline or in closed systems |
Storage | Minimal output size | Larger output size |
Sharing | Easier to share URLs | Requires full data transfer |
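The storage difference follows from how base64 works: every 3 bytes of binary image data become 4 ASCII characters (with padding), so an embedded image is roughly a third larger than the file it encodes, before counting the surrounding markdown. A quick check against Python's standard encoder (the 1.5 MB image size is only an illustrative figure):

```python
import base64


def base64_length(num_bytes: int) -> int:
    """Length of the padded base64 encoding: 4 characters per started 3-byte group."""
    return 4 * ((num_bytes + 2) // 3)


# Compare the formula with the real encoder, including a hypothetical 1.5 MB PNG.
for size in (1, 2, 3, 1_500_000):
    encoded_len = len(base64.b64encode(bytes(size)))
    assert encoded_len == base64_length(size)
    print(f"{size} bytes -> {encoded_len} chars ({encoded_len / size:.2f}x)")
```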
SpeechToText
The SpeechToText operator converts audio content to text transcriptions.
Inputs
Parameter | Type | Required | Description |
---|---|---|---|
file | string | Yes | Audio file to transcribe |
model | string | No | Transcription model to use (default based on file type) |
trigger | string | No | Optional control signal |
Outputs
Parameter | Type | Description |
---|---|---|
type | string | Output format (markdown) |
content | string | Transcribed text, formatted as markdown |
Supported Models
Model | Provider | Best For |
---|---|---|
whisper-large | Fireworks | General transcription, multiple languages |
whisper-large-v3 | Groq | Enhanced accuracy |
Example Usage
File: [Meeting recording audio file]
Model: "whisper-large-v3"
Output: Complete transcription of the meeting
Best Practices
- Use high-quality audio when possible
- Consider pre-processing noisy audio
- Choose appropriate models for your language needs
- Be aware of length limitations for audio files
- Review and correct transcriptions for critical content
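One way to stay within audio length limits is to split a long recording into shorter files and transcribe them in order. The sketch below does this for uncompressed WAV input using only Python's standard wave module. The 10-minute default chunk length is an arbitrary placeholder, not a documented Quantum Minds limit, and compressed formats such as MP3 would need a separate decoder.

```python
import tempfile
import wave
from pathlib import Path


def split_wav(path: str, out_dir: str, max_seconds: float = 600.0) -> list[Path]:
    """Split a WAV file into consecutive chunks of at most max_seconds each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunks: list[Path] = []
    with wave.open(path, "rb") as src:
        frames_per_chunk = max(1, int(src.getframerate() * max_seconds))
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # end of file
                break
            chunk_path = out / f"{Path(path).stem}_part{index:03d}.wav"
            with wave.open(str(chunk_path), "wb") as dst:
                # Copy the audio format; the frame count is written on close.
                dst.setnchannels(src.getnchannels())
                dst.setsampwidth(src.getsampwidth())
                dst.setframerate(src.getframerate())
                dst.writeframes(frames)
            chunks.append(chunk_path)
            index += 1
    return chunks


# Demo: 2.5 seconds of 16-bit mono silence at 8 kHz, split into 1-second chunks.
tmp = tempfile.mkdtemp()
demo = Path(tmp) / "meeting.wav"
with wave.open(str(demo), "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 20000)
parts = split_wav(str(demo), tmp, max_seconds=1.0)
print([p.name for p in parts])
```

Cutting at fixed intervals can split a word; if that matters, overlap the chunks slightly or cut at silences.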
TextToSpeech
The TextToSpeech operator converts text to spoken audio content.
Inputs
Parameter | Type | Required | Description |
---|---|---|---|
text | string | Yes | Text to convert to speech |
language | string | No | Language code (e.g., "en-US", "fr-FR") |
trigger | string | No | Optional control signal |
Outputs
Parameter | Type | Description |
---|---|---|
type | string | Output format (markdown) |
content | string | Markdown with embedded audio player |
Example Usage
Text: "Welcome to our quarterly financial review. In this presentation, we'll cover the key performance indicators, revenue highlights, and our outlook for the coming quarter."
Language: "en-US"
Output: Markdown with an embedded audio player for the spoken version of the provided text
Best Practices
- Break long text into natural paragraphs
- Use punctuation to influence pacing and intonation
- Consider including pronunciation guides for uncommon terms
- Test with smaller segments before processing large content
- Be mindful of total character count for efficiency
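Several of these practices come down to sending text in manageable pieces. A minimal sketch, assuming you call TextToSpeech once per segment: group whole sentences into segments under a character budget so that each request ends at natural punctuation. The split_for_speech name and the 500-character default are placeholders, not documented limits of the operator.

```python
import re


def split_for_speech(text: str, max_chars: int = 500) -> list[str]:
    """Group whole sentences into segments of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own segment
    rather than being cut mid-sentence.
    """
    # Split after ., ! or ? followed by whitespace, keeping the punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            segments.append(current)  # budget exceeded: close the current segment
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


script = (
    "Welcome to our quarterly financial review. In this presentation, we'll cover "
    "the key performance indicators, revenue highlights, and our outlook for the "
    "coming quarter."
)
for segment in split_for_speech(script, max_chars=60):
    print(len(segment), segment)
```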
Combining Media Operators
Media operators can be chained so that each step uses an earlier step's output, as in the following flows:
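The flows in this section pass data between steps with references such as "$TextToSpeech_001.output.content". As a conceptual model only, not the platform's implementation, you can think of a run as replacing each reference with the named step's recorded output before the next step executes. The resolve_references helper and the outputs dictionary below are hypothetical.

```python
import re
from typing import Any

# Matches references such as "$SpeechToText_001.output.content".
REF = re.compile(r"\$([A-Za-z]+_\d+)\.output\.content")


def lookup(step_id: str, outputs: dict[str, str]) -> str:
    """Return a finished step's output, failing loudly if it has not run yet."""
    if step_id not in outputs:
        raise KeyError(f"step {step_id} has not produced output yet")
    return outputs[step_id]


def resolve_references(value: Any, outputs: dict[str, str]) -> Any:
    """Return a copy of value with every reference replaced by that step's output."""
    if isinstance(value, str):
        return REF.sub(lambda m: lookup(m.group(1), outputs), value)
    if isinstance(value, dict):
        return {key: resolve_references(item, outputs) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_references(item, outputs) for item in value]
    return value


step = {
    "operator": "TableToTextSummary",
    "input": {
        "prompt": "Compare the original text with the transcription",
        "dataframe": {"original": "Important announcement...",
                      "transcribed": "$SpeechToText_001.output.content"},
    },
}
outputs = {"SpeechToText_001": "Important announcement regarding the system upgrade."}
print(resolve_references(step, outputs)["input"]["dataframe"]["transcribed"])
```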
Text-to-Speech-to-Text Verification
{
"operator": "TextToSpeech",
"input": {
"text": "Important announcement regarding the system upgrade scheduled for next weekend."
}
}
↓
{
"operator": "SpeechToText",
"input": {
"file": "$TextToSpeech_001.output.content"
}
}
↓
{
"operator": "TableToTextSummary",
"input": {
"prompt": "Compare the original text with the transcription and identify any discrepancies",
"dataframe": { "original": "Important announcement...", "transcribed": "$SpeechToText_001.output.content" }
}
}
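The final step above asks an LLM operator to describe discrepancies. If you also want a repeatable number, word error rate (WER) is the standard transcription metric: the word-level edit distance between the reference and the transcript divided by the number of reference words. A minimal, dependency-free sketch; the lowercase-and-strip-punctuation normalization is an assumption you may want to change.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase and keep only letters, digits, and apostrophes (assumed normalization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference text has no words")
    # prev[j] holds the edit distance between the previous reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


original = "Important announcement regarding the system upgrade scheduled for next weekend."
transcribed = "Important announcement regarding the system update scheduled for next weekend"
print(f"WER: {word_error_rate(original, transcribed):.2%}")  # WER: 10.00%
```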
Image Generation with Audio Description
{
"operator": "ImageGenerationUrl",
"input": {
"model": "dall-e-3",
"prompt": "A visualization of global supply chain networks with highlighted routes and distribution centers",
"quality": "standard",
"size": "1024x1024"
}
}
↓
{
"operator": "TextToSpeech",
"input": {
"text": "This visualization shows our global supply chain network. The red lines represent primary shipping routes, while the blue dots indicate major distribution centers. Note the concentration of activity in Southeast Asia and North America."
}
}
↓
{
"operator": "CardGenerator",
"input": {
"prompt": "Create an interactive card with the image and audio narration"
}
}
Integration with Other Operators
Multi-Modal Content Creation
Media operators can also be chained with operators from other categories:
- Data Visualization with Narration: SQLExecution → TableToGraph → TextToSpeech
- Automated Report Generation: PandasAi → TextToSpeech → CardGenerator
- Image Generation from Data: TableToTextSummary → ImageGenerationUrl
- Audio Transcription Analysis: SpeechToText → TextToNoSQL → MongoExecution
Using with MultiModal Operators
Media operators complement the MultiModal operators:
- Use SpeechToText to prepare audio for GeminiMultiModal analysis
- Use ImageGenerationUrl to create visuals based on ClaudeMultiModal insights
- Process TextToSpeech output with GeminiMultiModal for secondary analysis
Advanced Use Cases
Accessibility Enhancement
Create accessible versions of content:
{
"operator": "RAGSummarize",
"input": {
"prompt": "Summarize the key points from our annual report",
"collection": "company_reports"
}
}
↓
{
"operator": "TextToSpeech",
"input": {
"text": "$RAGSummarize_001.output.content"
}
}
Content Localization
Translate and voice content for multiple languages:
{
"operator": "OpenSearch",
"input": {
"prompt": "Translate the following product description to Spanish: [product description]"
}
}
↓
{
"operator": "TextToSpeech",
"input": {
"text": "$OpenSearch_001.output.content",
"language": "es-ES"
}
}
Repeat both steps for each additional language (for example, French with "fr-FR" and German with "de-DE"), since the language parameter takes a single language code.
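When you voice several translations, each language needs its own TextToSpeech step with a single language code. A minimal sketch that builds those steps and rejects a combined value such as "es-ES, fr-FR" is shown below. It assumes the translations arrive as separate texts keyed by language code; the step IDs OpenSearch_002 and OpenSearch_003 are hypothetical, and the language-REGION pattern is an assumed format.

```python
import re

# Assumed format for a single language code, e.g. "es-ES" (language-REGION).
LANG_CODE = re.compile(r"[a-z]{2,3}-[A-Z]{2}")


def tts_steps(translations: dict[str, str]) -> list[dict]:
    """Build one TextToSpeech step per (language code, text) entry."""
    steps = []
    for code, text in translations.items():
        if not LANG_CODE.fullmatch(code):
            raise ValueError(f"expected one language code like 'es-ES', got {code!r}")
        steps.append({"operator": "TextToSpeech", "input": {"text": text, "language": code}})
    return steps


# Hypothetical step IDs: one translation step per target language.
translations = {
    "es-ES": "$OpenSearch_001.output.content",
    "fr-FR": "$OpenSearch_002.output.content",
    "de-DE": "$OpenSearch_003.output.content",
}
for step in tts_steps(translations):
    print(step["input"]["language"], step["input"]["text"])
```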
Interactive Tutorials
Create rich learning experiences:
{
"operator": "ImageGenerationUrl",
"input": {
"prompt": "Step-by-step illustration of how to configure the system settings",
"model": "dall-e-3",
"quality": "hd",
"size": "1024x1024"
}
}
↓
{
"operator": "TextToSpeech",
"input": {
"text": "In this tutorial, we'll walk through the system configuration process..."
}
}
↓
{
"operator": "CardGenerator",
"input": {
"prompt": "Create an interactive tutorial card with both visual and audio guidance"
}
}
Best Practices for Media Operators
Performance Considerations
- Be aware of processing times for media generation
- Consider asynchronous processing for long audio files
- Optimize image sizes for your specific use case
- Cache frequently used media when possible
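Caching works best when the key captures every input that affects the result. A minimal in-memory sketch: hash the operator name plus its inputs serialized as canonical JSON, and call the generator only on a cache miss. MediaCache and fake_generate are hypothetical stand-ins for however you invoke the operator; if you cache URLs, note that they may stop working if the hosting service expires them.

```python
import hashlib
import json
from typing import Callable


def cache_key(operator: str, inputs: dict) -> str:
    """Stable key: SHA-256 of the operator name and inputs in canonical JSON."""
    canonical = json.dumps({"operator": operator, "input": inputs}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class MediaCache:
    """In-memory cache for generated media, keyed by operator inputs."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get_or_generate(self, operator: str, inputs: dict,
                        generate: Callable[[dict], str]) -> str:
        key = cache_key(operator, inputs)
        if key not in self._store:
            self._store[key] = generate(inputs)  # only runs on a cache miss
        return self._store[key]


calls: list[dict] = []


def fake_generate(inputs: dict) -> str:
    """Stand-in for a real operator call; records each invocation."""
    calls.append(inputs)
    return f"![image](https://example.com/{len(calls)}.png)"


cache = MediaCache()
request = {"model": "dall-e-3", "prompt": "Supply chain map", "quality": "standard", "size": "1024x1024"}
first = cache.get_or_generate("ImageGenerationUrl", request, fake_generate)
# Same inputs in a different key order still hit the cache.
second = cache.get_or_generate("ImageGenerationUrl", dict(reversed(list(request.items()))), fake_generate)
print(first == second, len(calls))  # True 1
```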
Content Quality
- Provide detailed prompts for image generation
- Use clear, well-paced text for speech synthesis
- Ensure audio files have good quality for transcription
- Review generated media for accuracy and appropriateness
Technical Limitations
- Be aware of file size limits for audio processing
- Consider format compatibility across systems
- Understand resolution constraints for image generation
- Plan for potential failures in media processing
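A common way to plan for transient failures is to retry with exponential backoff and a little random jitter. A minimal sketch follows; which exceptions count as transient depends on how you call Quantum Minds, so TimeoutError and ConnectionError are placeholders, and flaky_transcription is a hypothetical stand-in for a SpeechToText call.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 4, base_delay: float = 1.0,
                 retry_on: tuple = (TimeoutError, ConnectionError),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying listed exceptions with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ... by default
            sleep(delay + random.uniform(0, 0.1 * delay))  # up to 10% jitter
    raise AssertionError("unreachable")


failures = iter([TimeoutError("slow"), ConnectionError("reset")])


def flaky_transcription() -> str:
    """Fails twice, then succeeds."""
    error = next(failures, None)
    if error is not None:
        raise error
    return "Transcribed text"


delays: list[float] = []
result = with_retries(flaky_transcription, sleep=delays.append)  # record delays instead of sleeping
print(result, len(delays))  # Transcribed text 2
```

Injecting sleep keeps the demo instant; in production you would leave the default time.sleep.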
Ethical Considerations
- Ensure generated content adheres to appropriate guidelines
- Consider bias in image generation and speech recognition
- Be transparent about AI-generated media when appropriate
- Respect copyright and intellectual property in prompts
Next Steps
Explore how Media Operators can be combined with Excel Operators to create rich, data-driven presentations and reports.