# Speech Integration Plan

## Overview
Extend the existing chat functionality with two endpoints: speech-to-text (OpenAI Whisper) and text-to-speech (OpenAI TTS).

## Implementation Plan

### 1. Extend ChatController

Add two new methods to `ChatController`:

```javascript
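// Requires at the top of the controller file: import { toFile } from 'openai';

// Speech to text using Whisper
static async transcribeAudio(req, res) {
  try {
    // multer's memoryStorage exposes the upload as req.file.buffer
    if (!req.file) {
      return res.status(400).json({
        success: false,
        error: 'No audio file provided'
      });
    }

    // Wrap the buffer so the SDK receives a named, typed file
    const audioFile = await toFile(req.file.buffer, req.file.originalname, {
      type: req.file.mimetype
    });
    const transcription = await openai.audio.transcriptions.create({
      file: audioFile,
      model: "whisper-1"
    });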
    
    return res.status(200).json({
      success: true,
      data: {
        text: transcription.text
      }
    });
  } catch (error) {
    return res.status(500).json({
      success: false,
      error: error.message
    });
  }
}

// Text to speech
static async synthesizeSpeech(req, res) {
  try {
    const { text, voice = 'alloy' } = req.body;
    
    const mp3 = await openai.audio.speech.create({
      model: "tts-1",
      voice: voice,
      input: text
    });

    // Send the complete MP3 as a single response body.
    // Transfer-Encoding is managed by Node and should not be set by hand.
    const buffer = Buffer.from(await mp3.arrayBuffer());
    res.setHeader('Content-Type', 'audio/mpeg');
    res.setHeader('Content-Length', buffer.length);
    res.end(buffer);
  } catch (error) {
    return res.status(500).json({
      success: false,
      error: error.message
    });
  }
}
```
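
`synthesizeSpeech` forwards `text` and `voice` to OpenAI without checking them, so a missing `text` surfaces as a 500 rather than a 400. A minimal validation sketch follows. The helper name `validateSynthesisRequest` and the `MAX_INPUT_CHARS` constant are hypothetical, and the 4096-character cap is an assumption that should be checked against the current OpenAI TTS limits.

```javascript
// Hypothetical helper, not part of the original plan: validates the body of
// POST /speech/synthesize before any OpenAI call is made.
const ALLOWED_VOICES = ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'];
const MAX_INPUT_CHARS = 4096; // assumed limit; verify against the OpenAI docs

function validateSynthesisRequest(body) {
  const { text, voice = 'alloy' } = body ?? {};

  if (typeof text !== 'string' || text.trim().length === 0) {
    return { ok: false, error: '`text` must be a non-empty string' };
  }
  if (text.length > MAX_INPUT_CHARS) {
    return { ok: false, error: `\`text\` must be at most ${MAX_INPUT_CHARS} characters` };
  }
  if (!ALLOWED_VOICES.includes(voice)) {
    return { ok: false, error: `\`voice\` must be one of: ${ALLOWED_VOICES.join(', ')}` };
  }
  return { ok: true, value: { text, voice } };
}
```

Inside the controller, a failed check could return `res.status(400).json({ success: false, error })` before `openai.audio.speech.create` is called.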

### 2. Add New Routes
Extend `/src/agent-PipeLine/routes/chat.js` with new endpoints:

```javascript
/**
 * @swagger
 * /agent/speech/transcribe:
 *   post:
 *     tags:
 *       - AI Pipeline
 *     summary: Transcribe audio to text
 *     description: Uses Whisper to convert speech to text
 *     requestBody:
 *       required: true
 *       content:
 *         multipart/form-data:
 *           schema:
 *             type: object
 *             properties:
 *               file:
 *                 type: string
 *                 format: binary
 */
router.post('/speech/transcribe',
  audioUpload.single('file'), // multer middleware exported from src/config/upload.js
  ChatController.transcribeAudio
);

/**
 * @swagger
 * /agent/speech/synthesize:
 *   post:
 *     tags:
 *       - AI Pipeline
 *     summary: Convert text to speech
 *     description: Uses OpenAI TTS to convert text to speech
 *     requestBody:
 *       required: true
 *       content:
 *         application/json:
 *           schema:
 *             type: object
 *             required:
 *               - text
 *             properties:
 *               text:
 *                 type: string
 *               voice:
 *                 type: string
 *                 enum: [alloy, echo, fable, onyx, nova, shimmer]
 *                 default: alloy
 */
router.post('/speech/synthesize', ChatController.synthesizeSpeech);
```

### 3. Update Upload Middleware
Add audio file handling to the upload configuration:

```javascript
// In src/config/upload.js
import multer from 'multer'; // skip if this file already imports multer

const audioFileFilter = (req, file, cb) => {
  const allowedMimes = [
    'audio/mpeg',
    'audio/mp3',
    'audio/wav',
    'audio/webm'
  ];
  
  if (allowedMimes.includes(file.mimetype)) {
    cb(null, true);
  } else {
    cb(new Error('Invalid file type. Only audio files are allowed.'));
  }
};

export const audioUpload = multer({
  storage: multer.memoryStorage(),
  fileFilter: audioFileFilter,
  limits: {
    fileSize: 25 * 1024 * 1024 // 25MB limit (OpenAI's max)
  }
});
```
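
Errors raised by `audioUpload` (an oversized file or a rejected MIME type) never reach the controller's `try/catch`. Express passes them to the next error-handling middleware instead. The sketch below shows one way to map them to client-error responses. The name `handleUploadError` is hypothetical, and matching on the filter's error message is a brittle shortcut used only to keep the example short.

```javascript
// Hypothetical Express error-handling middleware (four arguments), meant to be
// registered after the speech routes.
function handleUploadError(err, req, res, next) {
  // multer reports limit violations as MulterError instances carrying a `code`.
  if (err && err.name === 'MulterError') {
    const status = err.code === 'LIMIT_FILE_SIZE' ? 413 : 400;
    return res.status(status).json({ success: false, error: err.message });
  }
  // audioFileFilter rejects unsupported types with a plain Error.
  if (err && typeof err.message === 'string' && err.message.startsWith('Invalid file type')) {
    return res.status(415).json({ success: false, error: err.message });
  }
  // Anything else is left to the application's general error handler.
  return next(err);
}
```

It would be attached with `router.use(handleUploadError)` after the route definitions, assuming the router has no earlier error handler that consumes these errors first.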

## Next Steps

1. Update the OpenAI configuration to ensure audio models are properly configured
2. Implement error handling for audio file validation
3. Add rate limiting specific to audio endpoints
4. Add tests for new speech functionality
5. Update API documentation with new endpoints
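
As a starting point for step 3, the sketch below shows a fixed-window limiter keyed by client IP. It is an illustration under stated assumptions rather than the planned implementation: `createAudioRateLimiter` and its options are hypothetical names, the defaults are arbitrary, and the counters live in process memory, so they are neither shared across instances nor evicted.

```javascript
// Hypothetical fixed-window rate limiter for the audio endpoints.
// `now` is injectable so the window logic can be tested without real time.
function createAudioRateLimiter({ windowMs = 60_000, max = 10, now = Date.now } = {}) {
  const hits = new Map(); // client key -> { count, windowStart }

  return function audioRateLimiter(req, res, next) {
    const key = req.ip;
    const t = now();
    const entry = hits.get(key);

    // First request from this client, or the previous window has expired.
    if (!entry || t - entry.windowStart >= windowMs) {
      hits.set(key, { count: 1, windowStart: t });
      return next();
    }
    if (entry.count < max) {
      entry.count += 1;
      return next();
    }
    // Over the limit: tell the client how long until the window resets.
    const retryAfterSec = Math.ceil((entry.windowStart + windowMs - t) / 1000);
    res.setHeader('Retry-After', String(retryAfterSec));
    return res.status(429).json({ success: false, error: 'Too many audio requests' });
  };
}
```

The returned middleware would go in front of the upload middleware, for example `router.post('/speech/transcribe', createAudioRateLimiter({ max: 5 }), audioUpload.single('file'), ChatController.transcribeAudio)`, so rejected requests are turned away before the file is read into memory. A production deployment would more likely use a shared store or an established rate-limiting package.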

## Usage Examples

### Transcribe Audio
```bash
curl -X POST http://your-api/agent/speech/transcribe \
  -F "file=@audio.mp3"
```

### Generate Speech
```bash
curl -X POST http://your-api/agent/speech/synthesize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, how can I help you today?",
    "voice": "alloy"
  }' \
  --output speech.mp3
```