Integrating Deepgram Speech-to-Text into Clinical Consultation Workflows

Nadeem K
Associate Software Engineer

Healthcare documentation is one of those areas where technology can make a real difference. Every day, doctors spend hours writing up consultation notes, often staying late just to catch up on paperwork. I’ve been exploring how speech-to-text technology, specifically Deepgram, can transform this process and make life easier for healthcare providers.

In this post, I’ll walk through the technical considerations, implementation strategies, and real benefits of integrating Deepgram into clinical workflows. Whether you’re a developer working on healthcare software or a tech leader evaluating transcription solutions, this should give you practical insights into what works and what doesn’t.

The Healthcare Documentation Problem

If you’ve ever watched a doctor during a consultation, you’ll notice they’re often split between talking to the patient and typing notes. It’s not ideal for anyone involved. The challenges go beyond just time management:

Medical terminology is complex: Try building a general speech recognition system that can handle “pneumonoultramicroscopicsilicovolcanoconiosis” correctly every time

Privacy regulations are strict: HIPAA isn’t just a suggestion – you need bulletproof data handling

Speed matters: In urgent care situations, waiting hours for transcripts isn’t an option

Everything needs to connect: The transcript has to end up in the EHR system somehow

These constraints make healthcare transcription quite different from, say, transcribing a podcast or meeting.

Why Deepgram Makes Sense for Healthcare

After testing several speech-to-text services, Deepgram stands out for a few key reasons:

They Actually Understand Medical Language:

Deepgram has medical-specific models that know the difference between “hypertension” and “hypotension.” Their nova-medical model handles drug names, anatomy terms, and clinical abbreviations much better than generic models. When a doctor says “patient presents with dyspnea on exertion,” you want that transcribed correctly, not as “patient presents with disneyland exception.”

Real-time Processing That Actually Works:

Some services batch process everything, meaning you wait 10-15 minutes for results. Deepgram can stream transcriptions in real-time or process uploaded files within seconds. In healthcare, especially urgent situations, this speed difference matters.

HIPAA Compliance Baked In:

They’ve done the heavy lifting on compliance certifications. While you still need to implement proper security on your end, having a HIPAA-compliant transcription service as your foundation makes the whole process much more manageable.

It Scales Without Breaking:

Whether you’re handling one small clinic or a health system with thousands of daily consultations, the infrastructure holds up. No one wants their transcription service to crash during peak clinic hours.

System Architecture That Works

Building a healthcare transcription system isn’t just about connecting to an API. You need to think about the entire flow from audio capture to final clinical notes. Here’s what I’ve found works well:

Core Components

The typical setup includes a few key pieces:

  1. Mobile/Web App: Doctors record consultations or upload audio files
  2. API Server: Handles file processing, security, and Deepgram integration
  3. Storage Layer: Secure file storage with proper encryption
  4. Queue System: Processes transcriptions asynchronously
  5. Database: Stores transcripts, metadata, and audit logs

The Processing Flow

Here’s how audio typically moves through the system:

  1. Audio Recording → Doctor captures consultation on device
  2. Secure Upload → Encrypted file transfer to secure storage
  3. Queue Processing → Background job picks up the file
  4. Deepgram API → Medical model processes the audio
  5. Post-processing → Clean up formatting, add structure
  6. Integration → Push results to EHR or clinical system

This flow ensures patient data stays secure while providing fast, accurate transcriptions.
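To make the hand-offs concrete, the stages above can be modeled as a simple status state machine. This is just a sketch — the stage names are illustrative, not from any particular library:

```javascript
// Illustrative pipeline stages matching the flow above (names are hypothetical)
const STAGES = ['recorded', 'uploaded', 'queued', 'transcribing', 'post-processing', 'delivered'];

// Advance a consultation to the next stage, rejecting unknown states
function advanceStage(current) {
  const index = STAGES.indexOf(current);
  if (index === -1) {
    throw new Error(`Unknown stage: ${current}`);
  }
  if (index === STAGES.length - 1) {
    return current; // already delivered, nothing left to do
  }
  return STAGES[index + 1];
}

console.log(advanceStage('uploaded')); // 'queued'
```

Tracking an explicit stage per consultation also gives you a natural place to hang audit logging, which HIPAA will require anyway.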

Technical Implementation

Let’s get into the actual code. I’ll use Node.js examples since they’re pretty straightforward and widely applicable.

Basic Deepgram Integration

Here’s a simple function to send audio to Deepgram for transcription:

```javascript
const axios = require('axios');
const fs = require('fs');

async function transcribeAudio(audioFilePath, options = {}) {
  const {
    model = 'nova-2-medical',
    language = 'en-US',
    smart_format = true,
    diarize = true
  } = options;

  try {
    const audioBuffer = fs.readFileSync(audioFilePath);

    const response = await axios.post(
      `https://api.deepgram.com/v1/listen?model=${model}&language=${language}&smart_format=${smart_format}&diarize=${diarize}`,
      audioBuffer,
      {
        headers: {
          'Authorization': `Token ${process.env.DEEPGRAM_API_KEY}`,
          'Content-Type': 'audio/wav' // adjust based on your audio format
        },
        timeout: 120000 // 2 minute timeout
      }
    );

    return {
      success: true,
      transcript: response.data.results.channels[0].alternatives[0].transcript,
      diarized: response.data.results.channels[0].alternatives[0].paragraphs?.transcript,
      duration: response.data.metadata.duration,
      confidence: response.data.results.channels[0].alternatives[0].confidence
    };
  } catch (error) {
    console.error('Deepgram transcription failed:', error.message);
    return {
      success: false,
      error: error.message,
      status: error.response?.status
    };
  }
}
```

Handling Different Audio Formats

Healthcare environments produce all kinds of audio files. Here’s how to handle various formats:

```javascript
const mime = require('mime-types');

function getContentType(filePath) {
  const mimeType = mime.lookup(filePath);

  // Common healthcare audio formats
  const supportedFormats = {
    'audio/wav': 'audio/wav',
    'audio/mpeg': 'audio/mpeg',
    'audio/mp4': 'audio/mp4',
    'audio/webm': 'audio/webm',
    'audio/x-m4a': 'audio/mp4',
    'audio/aac': 'audio/aac'
  };

  if (!supportedFormats[mimeType]) {
    throw new Error(`Unsupported audio format: ${mimeType}`);
  }

  return supportedFormats[mimeType];
}

async function transcribeAudioFile(filePath, options = {}) {
  const { model = 'nova-2-medical', smart_format = true, diarize = true } = options;

  // Use a dynamic content type instead of hardcoding audio/wav
  const contentType = getContentType(filePath);
  const audioBuffer = fs.readFileSync(filePath);

  const response = await axios.post(
    `https://api.deepgram.com/v1/listen?model=${model}&smart_format=${smart_format}&diarize=${diarize}`,
    audioBuffer,
    {
      headers: {
        'Authorization': `Token ${process.env.DEEPGRAM_API_KEY}`,
        'Content-Type': contentType
      },
      timeout: 120000
    }
  );

  // Returns the raw Deepgram response body
  return response.data;
}
```

Asynchronous Processing with Queues

For production healthcare systems, you don’t want to make doctors wait while audio processes. Here’s how to handle transcription jobs asynchronously:

```javascript
const Queue = require('bull');

const transcriptionQueue = new Queue('audio transcription');

// Add a job to the queue
async function queueTranscription(audioFilePath, patientId, consultationId) {
  const job = await transcriptionQueue.add('transcribe', {
    audioFilePath,
    patientId,
    consultationId,
    timestamp: new Date()
  }, {
    attempts: 3,
    backoff: {
      type: 'exponential',
      delay: 2000
    }
  });

  return job.id;
}

// Process transcription jobs
transcriptionQueue.process('transcribe', async (job) => {
  const { audioFilePath, patientId, consultationId } = job.data;

  try {
    // Update job progress
    job.progress(10);

    // transcribeAudioFile returns the raw Deepgram response body
    const data = await transcribeAudioFile(audioFilePath);
    job.progress(50);

    const alternative = data.results?.channels?.[0]?.alternatives?.[0];

    if (alternative?.transcript) {
      // Store transcript in database
      await storeTranscript({
        patientId,
        consultationId,
        transcript: alternative.transcript,
        diarizedTranscript: alternative.paragraphs?.transcript,
        confidence: alternative.confidence,
        duration: data.metadata.duration
      });
      job.progress(100);

      // Notify frontend that transcription is complete
      await notifyTranscriptionComplete(consultationId, alternative.transcript);

      return { success: true, transcript: alternative.transcript };
    } else {
      throw new Error('Transcription failed: no transcript returned');
    }
  } catch (error) {
    console.error('Transcription job failed:', error);
    throw error;
  }
});
```

Security and Privacy Requirements

Healthcare transcription isn’t like transcribing a business meeting. You’re dealing with protected health information (PHI), which means security can’t be an afterthought.


Advanced Features That Make a Difference

Speaker Diarization

One feature that’s particularly valuable in healthcare is speaker diarization – figuring out who said what. In a typical consultation, you have the doctor asking questions and the patient responding. Being able to automatically separate these speakers makes the transcript much more useful.

```javascript
async function transcribeWithSpeakers(audioFilePath) {
  const result = await transcribeAudioFile(audioFilePath, {
    model: 'nova-2-medical',
    diarize: true,
    smart_format: true
  });

  const alternative = result.results.channels[0].alternatives[0];

  if (alternative.paragraphs) {
    // Diarized transcript with speaker labels
    const speakerTranscript = alternative.paragraphs.transcript;

    // Regular transcript without speakers
    const regularTranscript = alternative.transcript;

    return {
      regular: regularTranscript,
      withSpeakers: speakerTranscript,
      speakers: extractSpeakers(speakerTranscript)
    };
  }

  return { regular: alternative.transcript };
}

function extractSpeakers(diarizedText) {
  // Extract unique speakers from the diarized transcript
  const speakerMatches = diarizedText.match(/Speaker \d+:/g) || [];
  return [...new Set(speakerMatches)].sort();
}
```

Real-time Progress Updates

Users don’t want to upload a file and wonder if anything’s happening. Here’s how to provide progress updates:

```javascript
const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8080 });

// Notify clients of transcription progress
function notifyProgress(consultationId, progress, message) {
  const notification = {
    consultationId,
    progress,
    message,
    timestamp: new Date().toISOString()
  };

  // Send to all connected clients for this consultation
  wss.clients.forEach(client => {
    if (client.readyState === WebSocket.OPEN) {
      client.send(JSON.stringify(notification));
    }
  });
}

// Updated queue processor with progress updates
transcriptionQueue.process('transcribe', async (job) => {
  const { consultationId } = job.data;

  try {
    notifyProgress(consultationId, 10, 'Processing audio file…');

    const result = await transcribeAudioFile(job.data.audioFilePath);
    notifyProgress(consultationId, 60, 'Transcription complete, processing results…');

    await storeTranscript(result);
    notifyProgress(consultationId, 100, 'Transcript ready!');

    return result;
  } catch (error) {
    notifyProgress(consultationId, 0, `Error: ${error.message}`);
    throw error;
  }
});
```

Benefits and Outcomes

Real-World Results (The Good Stuff)

Here’s what I’ve seen when healthcare teams actually start using this technology:

Documentation Time Plummets

The biggest win is time savings. What used to take doctors 15-20 minutes after each patient now happens automatically during the consultation. I’ve seen practices report:

– 30-minute consultations used to require 15 minutes of post-visit documentation

– Now that documentation exists before the doctor even finishes with the patient

– Some practices save 2-3 hours per day per provider
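The arithmetic behind that last figure is easy to sanity-check. With a hypothetical patient load (the visit count below is my illustration, not a reported number), the low end of the range falls out directly:

```javascript
// Hypothetical clinic day: 10 consultations, 15 minutes of post-visit notes each
const consultationsPerDay = 10;
const minutesSavedPerVisit = 15;

const hoursSavedPerDay = (consultationsPerDay * minutesSavedPerVisit) / 60;
console.log(hoursSavedPerDay); // 2.5
```

Busier providers seeing 15-20 patients, or those at the 20-minute end of documentation time, land at the top of that 2-3 hour range.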

Doctors Can Actually Look at Patients Again

This sounds dramatic, but it’s real. When you don’t have to frantically type notes while trying to listen to a patient describe their symptoms, the whole dynamic changes. Doctors report much better patient engagement.

Quality Actually Improves

Here’s something unexpected – the transcripts are often more complete than handwritten notes. Doctors capture casual comments that might be medically relevant but would normally be forgotten by the time they sit down to write notes.

Common Challenges

The Audio Quality Is Terrible:

Healthcare environments are noisy. You’ve got HVAC systems, medical equipment beeping, people talking in hallways. Here’s how to deal with it:

```javascript
// Pre-process audio to improve quality
const ffmpeg = require('fluent-ffmpeg');

async function enhanceAudioQuality(inputPath, outputPath) {
  return new Promise((resolve, reject) => {
    ffmpeg(inputPath)
      .audioChannels(1) // Convert to mono
      .audioFrequency(16000) // Optimize for speech
      .audioFilters([
        'highpass=f=85', // Remove low-frequency noise
        'lowpass=f=8000', // Remove high-frequency noise
        'dynaudnorm' // Normalize audio levels
      ])
      .on('end', resolve)
      .on('error', reject)
      .save(outputPath);
  });
}
```

Environment Setup

Keep your development and production environments separate. Here’s what I recommend:

```bash
# .env.development
DEEPGRAM_API_KEY=your_dev_key_here
DEEPGRAM_MODEL=nova-2-medical
AUDIO_UPLOAD_BUCKET=dev-healthcare-audio
MAX_FILE_SIZE_MB=50

# .env.production
DEEPGRAM_API_KEY=your_prod_key_here
DEEPGRAM_MODEL=nova-2-medical
AUDIO_UPLOAD_BUCKET=prod-healthcare-audio
MAX_FILE_SIZE_MB=100
```
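It's also worth failing fast when one of these variables is missing rather than discovering it mid-consultation. A small validator along these lines — the required-variable list matches the .env files above, and the function name is my own — keeps a misconfigured deployment from silently touching PHI:

```javascript
// Validate required settings from an env-style object and parse numeric limits
function loadConfig(env) {
  const required = ['DEEPGRAM_API_KEY', 'DEEPGRAM_MODEL', 'AUDIO_UPLOAD_BUCKET'];
  for (const name of required) {
    if (!env[name]) {
      throw new Error(`Missing required environment variable: ${name}`);
    }
  }

  const maxFileSizeMb = parseInt(env.MAX_FILE_SIZE_MB || '50', 10);
  if (Number.isNaN(maxFileSizeMb) || maxFileSizeMb <= 0) {
    throw new Error('MAX_FILE_SIZE_MB must be a positive integer');
  }

  return {
    apiKey: env.DEEPGRAM_API_KEY,
    model: env.DEEPGRAM_MODEL,
    bucket: env.AUDIO_UPLOAD_BUCKET,
    maxFileSizeMb
  };
}

// In production you would pass process.env here
const config = loadConfig({
  DEEPGRAM_API_KEY: 'test_key',
  DEEPGRAM_MODEL: 'nova-2-medical',
  AUDIO_UPLOAD_BUCKET: 'dev-healthcare-audio',
  MAX_FILE_SIZE_MB: '50'
});
console.log(config.maxFileSizeMb); // 50
```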

Testing Your Setup

Here’s a simple test script to make sure everything’s working:

```javascript
// test-transcription.js
async function testDeepgramIntegration() {
  const testAudioPath = './sample-consultation.mp3';

  try {
    console.log('Testing Deepgram integration…');

    const result = await transcribeAudioFile(testAudioPath, {
      model: 'nova-2-medical',
      smart_format: true
    });

    if (result.results?.channels?.[0]?.alternatives?.[0]?.transcript) {
      console.log('✅ Integration working!');
      console.log('Sample transcript:', result.results.channels[0].alternatives[0].transcript.substring(0, 100) + '…');
    } else {
      console.log('❌ Integration failed – no transcript returned');
    }
  } catch (error) {
    console.log('❌ Integration failed:', error.message);
  }
}

testDeepgramIntegration();
```

What’s Next?

Real-Time Transcription

Once you’ve got the basic file upload working, you might want to try real-time transcription for live consultations. This is more complex but incredibly powerful:

```javascript
const WebSocket = require('ws');

function setupLiveTranscription() {
  const ws = new WebSocket('wss://api.deepgram.com/v1/listen?model=nova-2-medical', {
    headers: {
      'Authorization': `Token ${process.env.DEEPGRAM_API_KEY}`
    }
  });

  ws.on('open', () => {
    console.log('Connected to Deepgram live transcription');
    // Once open, stream raw audio chunks from your capture source with ws.send(chunk)
  });

  ws.on('message', (data) => {
    const result = JSON.parse(data);
    if (result.channel?.alternatives?.[0]?.transcript) {
      // Update UI with live transcript
      updateLiveTranscript(result.channel.alternatives[0].transcript);
    }
  });

  return ws;
}
```

Custom Medical Vocabularies

You can bias Deepgram’s recognition toward the vocabulary of your specific medical environment. The hosted API exposes this through the `keywords` query parameter, which boosts the terms you pass on each transcription request:

```javascript
// Boost recognition of facility-specific terms via the `keywords` parameter
async function transcribeWithCustomVocabulary(audioFilePath) {
  const customTerms = [
    'myocardial infarction',
    'pneumothorax',
    'appendectomy',
    'cholecystectomy'
    // Add your facility's commonly used terms
  ];

  const keywordParams = customTerms
    .map(term => `keywords=${encodeURIComponent(term)}`)
    .join('&');

  const audioBuffer = fs.readFileSync(audioFilePath);

  const response = await axios.post(
    `https://api.deepgram.com/v1/listen?model=nova-2-medical&smart_format=true&${keywordParams}`,
    audioBuffer,
    {
      headers: {
        'Authorization': `Token ${process.env.DEEPGRAM_API_KEY}`,
        'Content-Type': 'audio/wav'
      },
      timeout: 120000
    }
  );

  return response.data;
}
```


Conclusion

Building a healthcare transcription system with Deepgram is impactful because it directly improves how clinicians spend their time. Instead of being tied to documentation, providers can focus more on patient care. The technology is mature and continues to evolve, making it a practical investment for modern healthcare organizations.

The real challenge isn’t the tech itself—it’s aligning the solution with clinical workflows. Starting small with a single specialty or clinic helps validate adoption and usability. Once it fits naturally into how providers work, scaling becomes far more effective and sustainable.

The returns are both immediate and meaningful. Organizations can reduce documentation time, improve note quality, and enhance patient-provider interactions. When implemented thoughtfully, speech-to-text doesn’t just digitize workflows—it fundamentally improves them.

Nadeem K

Associate Software Engineer

Nadeem is a front-end developer with 1.5+ years of experience in web technologies such as React.js, Redux, and UI frameworks. He specializes in building interactive, responsive web applications, creating reusable components, and writing efficient, optimized, DRY code. He enjoys learning about new technologies.
