OVID Embase | ||
1 | automatic speech recognition/ | 469 |
2 | ((voice or speech) adj (recogni* or respon*)).tw. | 2516 |
3 | or/1-2 | 27490 |
4 | exp research/ | 380483 |
5 | (qualitative* or quantitative* or mixed method* or descriptive* or research*).tw. | 1194784 |
6 | or/4-5 | 14148120 |
7 | 3 and 6 | 483 |
8 | limit 7 to yr="2000 -Current" | 433 |
OVID Medline | ||
1 | Speech Recognition Software | 416 |
2 | ((voice or speech) adj (recogni* or respon*)).tw. | 2081 |
3 | or/1-2 | 2263 |
4 | exp Research/ | 224487 |
5 | (qualitative* or quantitative* or mixed method* or descriptive* or research*).tw. | 840821 |
6 | or/4-5 | 971456 |
7 | 3 and 6 | 360 |
8 | limit 7 to yr="2000 -Current" | 319 |
OVID PreMedline | ||
1 | ((voice or speech) adj (recogni* or respon*)).tw. | 140 |
2 | (qualitative* or quantitative* or mixed method* or descriptive* or research*).tw. | 94513 |
3 | 1 and 2 | 20 |
4 | limit 3 to yr="2000 -Current" | 19 |
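
The numbered search lines combine as set operations: `or/1-2` is the union of lines 1 and 2, `3 and 6` is their intersection, and `limit ... to yr=` filters by publication year. The sketch below replays the four-line PreMedline strategy on a few made-up records. It is an approximation, not OVID's engine: `.tw.` is taken as title plus abstract, `*` truncation becomes `\w*`, and `adj` is assumed to mean "next word, in the written order", which may be stricter than OVID's own rule. `Record`, `textword` and `run_strategy` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    """A made-up bibliographic record used only for illustration."""
    rec_id: int
    title: str
    abstract: str
    year: int


# Line 1: ((voice or speech) adj (recogni* or respon*)).tw.
# Assumption: "adj" is treated as "next word, in the written order".
LINE1 = re.compile(r"\b(?:voice|speech)\s+(?:recogni|respon)\w*", re.IGNORECASE)

# Line 2: (qualitative* or quantitative* or mixed method* or descriptive* or research*).tw.
# A trailing "*" (truncation) becomes \w* (any word ending).
LINE2 = re.compile(
    r"\b(?:qualitative|quantitative|mixed\s+method|descriptive|research)\w*",
    re.IGNORECASE,
)


def textword(pattern: re.Pattern, records: list[Record]) -> set[int]:
    """IDs of records whose title or abstract matches (an approximation of .tw.)."""
    return {r.rec_id for r in records if pattern.search(f"{r.title} {r.abstract}")}


def run_strategy(records: list[Record]) -> tuple[set[int], ...]:
    """Replay the PreMedline strategy: lines 1-2 search, line 3 is 'and', line 4 limits by year."""
    s1 = textword(LINE1, records)
    s2 = textword(LINE2, records)
    s3 = s1 & s2  # "1 and 2": intersection (an "or/1-2" line would be s1 | s2)
    year = {r.rec_id: r.year for r in records}
    s4 = {i for i in s3 if year[i] >= 2000}  # limit 3 to yr="2000 -Current"
    return s1, s2, s3, s4


example_records = [
    Record(1, "Speech recognition in radiology", "A quantitative study of turnaround time.", 2008),
    Record(2, "Voice responsive devices", "A narrative review.", 2005),
    Record(3, "Speech recognition for pathology", "Descriptive evaluation.", 1998),
    Record(4, "Noise in operating rooms", "Qualitative research on alarms.", 2010),
]

for n, hits in enumerate(run_strategy(example_records), start=1):
    print(f"Line {n}: {sorted(hits)}")
```

On these records, line 1 keeps records 1 to 3, line 2 keeps 1, 3 and 4, their intersection is {1, 3}, and the year limit drops the 1998 record, leaving {1}.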
Criterion | Al-Aynati 2003[] | Alapetite, 2008[] | Alapetite, 2009[31] | Callaway, 2002[] | Derman, 2010[] | Devine, 2000[] | Irwin, 2007[] | Kanal, 2001[] | Koivikko, 2008[] | Langer, 2002[] | Mohr, 2003[] | NSLHD 2012[29] | Singh, 2011[] | Zick, 2001[] |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Screening Questions | ||||||||||||||
Clear research questions | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Appropriate data collected | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
1. Qualitative | ||||||||||||||
Appropriate qualitative data sources | ||||||||||||||
Appropriate qualitative method | ||||||||||||||
Description of the context | ||||||||||||||
Discussion of researchers’ reflexivity | ||||||||||||||
2. Randomized controlled | ||||||||||||||
Appropriate randomization | Yes | No | ||||||||||||
Allocation concealment and/or blinding | Yes | No | ||||||||||||
Complete outcome data | Yes | Yes | ||||||||||||
Low withdrawal/drop out | Yes | Yes | ||||||||||||
3. Non-randomized | ||||||||||||||
Recruitment minimized bias | No | |||||||||||||
Appropriate outcome measures | Yes | |||||||||||||
Intervention & control group comparable | Yes | |||||||||||||
Complete outcome data/acceptable response rate | Yes | |||||||||||||
4. Quantitative descriptive | ||||||||||||||
Appropriate sampling¹ | No | Yes | Yes | No | Yes | Yes | No | Yes | Yes | Yes | No | |||
Appropriate sample² | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | |||
Appropriate measurement (valid/standard) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | |||
Acceptable response rate | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | No | Yes | Yes | |||
Total score³ (Yes = 1, No = 0) | 5 | 4 | 6 | 6 | 5 | 6 | 5 | 5 | 6 | 5 | 6 | 4 | 6 | 5 |
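
The total-score row sums each study's answers with Yes = 1 and No = 0. A minimal sketch, assuming blank cells are criteria that do not apply to a study's design and are simply skipped (the footnote that defines the score is not reproduced in this section); `mmat_total` and the example column are hypothetical.

```python
from typing import Optional, Sequence


def mmat_total(answers: Sequence[Optional[str]]) -> int:
    """Total score with Yes = 1 and No = 0; None marks a blank (not applicable) cell."""
    total = 0
    for answer in answers:
        if answer is None:
            continue  # criterion not rated for this study design
        if answer not in ("Yes", "No"):
            raise ValueError(f"unexpected answer: {answer!r}")
        total += 1 if answer == "Yes" else 0
    return total


# Hypothetical column: 2 screening questions + 4 quantitative descriptive criteria.
example_column = ["Yes", "Yes", "No", "Yes", "Yes", "Yes"]
print(mmat_total(example_column))  # 5
```

With two screening questions and four design-specific criteria, the maximum under this reading is 6, which matches the highest value in the row.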
Author, year, country, design | Aim | Setting, sample and speech technology (ST) | Outcome measures | Results |
---|---|---|---|---|
Al-Aynati and Chorneyko 2003 []; Canada; Experimental | To compare SR software with HT for generating pathology reports | Setting: Surgical pathology; Sample: 206 pathology reports; ST: IBM ViaVoice Pro version 8 with pathology vocabulary dictionary | 1. Accuracy rate; 2. Recognition/transcription errors | Accuracy rate (mean %): SR 93.6, HT 99.6; Mean recognition errors: SR 6.7, HT 0.4 |
Mohr et al. 2003 []; USA; Experimental | To compare SR software with HT for clinical notes | Setting: Endocrinology and Psychiatry; Sample: 2,354 reports; ST: Linguistic Technology Systems LTI with clinical notes application | 1. Dictation/recording time + transcription (minutes) = report turnaround time (RTT) | RTT (mins): Endocrinology: SR (recording + transcription) 23.7, HT (dictation + transcription) 25.4; SR 87.3% (CI 83.3, 92.3) productive compared with HT. Psychiatry transcriptionist: SR (recording + transcription) 65.2, HT (dictation + transcription) 38.1; SR 63.3% (CI 54.0, 74.0) productive compared with HT. Psychiatry secretaries: SR (recording + transcription) 36.5, HT (dictation + transcription) 30.5; SR 55.8% (CI 44.6, 68.0) productive compared with HT. Author, secretary and type of notes were predictors of productivity (p < 0.05). |
NSLHD 2012 [29]; Australia; Experimental | To compare accuracy and time between SR software and HT to produce emergency department reports | Setting: Emergency Department; Sample: 12 reports; ST: Nuance Dragon Voice Recognition | 1. RTT | RTT mean (range) in minutes: SR 1.07 (46 sec, 1.32), HT 3.32 (2.45, 4.35); HT: spelling and punctuation errors; SR: occasional misplaced words |
Alapetite, 2008 []; Denmark; Non-experimental | To evaluate the impact of background noise (sounds of alarms, aspiration, metal, people talking, scratch, silence, ventilators) and other factors affecting SR accuracy when used in operating rooms | Setting: Simulation laboratory; Sample: 3600 short anaesthesia commands; ST: Philips Speech Magic 5.1.529 SP3 and Speech Magic Inter Active Danish language, Danish medical dictation adapted by Max Manus | 1. Word Recognition Rate (WRR) | WRR: Microphone: Microphone 1 (headset) 83.2%, Microphone 2 (handset) 73.9%; Recognition mode: command 81.6%, free text 77.1%; Background noise: scratch 66.4%, silence 86.8%; Gender: male 76.8%, female 80.3% |
Alapetite et al. 2009 [31]; Denmark; Non-experimental | To identify physicians’ perceptions, attitudes and expectations of SR technology | Setting: Hospital (various clinical settings); Sample: 186 physicians | 1. Users’ expectation and experience (predominant response noted) | Overall: Q1 expectation positive 44%, Q1 experience negative 46%; Performance: Q8 expectation negative 64%, Q8 experience negative 77%; Time: Q14 expectation negative 85%, Q14 experience negative 95%; Social influence: Q6 expectation negative 54%, Q6 experience negative 59% |
Callaway et al. 2002 []; USA; Non-experimental | To compare off-the-shelf SR software with manual transcription services for radiology reports | Setting: 3 military medical facilities; Sample: Facility 1: 2042 reports, Facility 2: 26600 reports, Facility 3: 5109 reports; ST: Dragon Medical Professional 4.0 | 1. RTT (referred to as TAT); 2. Costs | RTT: Facility 1: decreased from 15.7 hours (HT) to 4.7 hours (SR); completed in <8 h: SR 25%, HT 6.8%; Facility 2: decreased from 89 hours (HT) to 19 hours (SR). Cost: Facility 2: $42,000 saved; Facility 3: $10,650 saved |
Derman et al. 2010 []; Canada; Non-experimental | To compare SR with existing methods of data entry for the creation of electronic progress notes | Setting: Mental health hospital; Sample: 12 mental health physicians; ST: details not provided | 1. Perceived usability; 2. Perceived time savings; 3. Perceived impact | Usability: 50% prefer SR; Time savings: no sig diff (p = 0.19); Impact: quality of care, no sig diff (p = 0.086); documentation, no sig diff (p = 0.375); workflow, no sig improvement (p = 0.59) |
Devine et al. 2000 []; USA; Non-experimental | To compare the ‘out-of-box’ performance of 3 continuous SR software packages for the generation of medical reports | Sample: 12 physicians from Veterans Affairs facilities, New England; ST: System 1 (S1) IBM ViaVoice98 General Medicine Vocabulary; System 2 (S2) Dragon NaturallySpeaking Medical Suite, V 3.0; System 3 (S3) L&H Voice Xpress for Medicine, General Medicine Edition, V 1.2 | 1. Recognition errors (mean error rate); 2. Dictation time; 3. Completion time; 4. Ranking; 5. Preference | Recognition errors (mean %), vocabulary: S1 7.0-9.1%, S3 13.4-15.1%, S2 14.1-15.2%; S1 best with general English and medical abbreviations. Dictation time: no sig diff (P < 0.336). Completion time (mean): S2 12.2 min, S1 14.7 min, S3 16.1 min. Ranking: 1. S1, 2. S2, 3. S3 |
Irwin et al. 2007 []; USA; Non-experimental | To compare SR features and functionality of 4 dental software application systems | Setting: Simulated dental; Sample: 4 participants (3 students, 1 faculty member); ST: System 1 (S1) Microsoft SR with Dragon NaturallySpeaking; System 2 (S2) Microsoft SR; System 3 (S3) and System 4 (S4) default speech engine | 1. Training time; 2. Charting time; 3. Completion; 4. Ranking | Training time: S1 11 min 8 sec, S2 9 min 1 sec (no data reported for S3 and S4). Charting time: S1 5 min 20 sec, S2 9 min 13 sec (no data reported for S3 and S4). Completion (%): S1 100, S2 93, S3 90, S4 82. Ranking: 1. S1 (104/189), 2. S2 (77/189) |
Kanal et al. 2001 []; USA; Non-experimental | To determine the accuracy of continuous SR for transcribing radiology reports | Setting: Radiology department; Sample: 72 radiology reports, 6 participants; ST: IBM MedSpeaker/Radiology software version 1.1 | 1. Error rates | Error rates (mean ± %): overall 10.3 ± 33%; significant errors 7.8 ± 3.4%; subtle significant errors 1.2 ± 1.6% |
Koivikko et al. 2008 []; Finland; Non-experimental | To evaluate the effect of speech recognition on radiology workflow systems over a period of 2 years | Setting: Radiology department; Sample: >20000 reports, 14 radiologists; ST: Finnish Radiology Speech Recognition System (Philips Electronics); HT: cassette-based reporting; SR1: SR in 2006; SR2: SR in 2007; Training: 10-15 minutes training in SR | 1. RTT (referred to as TAT) at 3 collection points: HT: 2005 (n = 6037), SR1: 2006 (n = 6486), SR2: 2007 (n = 9072); 2. Reports completed ≤ 1 hour | RTT (mean ± SD) in minutes: HT 1486 ± 4591, SR1 323 ± 1662, SR2 280 ± 763. Reports ≤ 1 hour (%): HT 26, SR1 58 |
Langer 2002 []; USA; Non-experimental | To compare the impact of SR on radiologist productivity by comparing 4 workflow systems | Setting: Radiology departments; Sample: over 40 radiology sites; System 1: film, report dictated, HT; System 2: film, report dictated, SR; System 3: picture archiving and communication system + HT; System 4: picture archiving and communication system + SR | 1. RTT (referred to as TAT); 2. Report productivity (RP), number of reports per day | RTT (mean ± SD%) in hours / RP: System 1: RTT 48.2 ± 50, RP 240; System 2: RTT 15.5 ± 93, RP 311; System 3: RTT 13.3 ± 119 (t value at 10%), RP 248; System 4: RTT 15.7 ± 98 (t value at 10%), RP 310 |
Singh et al. 2011 []; USA; Non-experimental | To compare accuracy and turnaround times between SR software and a traditional transcription service (TS) when used for generating surgical pathology reports | Setting: Surgical pathology; Sample: 5011 pathology reports; ST: VoiceOver (version 4.1), Dragon NaturallySpeaking software (version 10); Phase 0: 3 years prior to SR; Phase 1: first 35 months of SR use, gross descriptions; Phase 2–4: during use of SR for gross descriptions and final diagnosis | 1. RTT (referred to as TAT); 2. Reports completed ≤ 1 day; 3. Reports completed ≤ 2 days | RTT in days: Phase 0: 4, Phase 1: 4, Phase 2–4: 3. Reports ≤ 1 day (%): Phase 0: 22, Phase 1: 24, Phase 2–4: 36. Reports ≤ 2 days (%): Phase 0: 54, Phase 1: 60, Phase 2–4: 67 |
Zick et al. 2001 []; USA; Non-experimental | To compare accuracy and RTT between SR software and a traditional transcription service (TS) when used for recording in patients’ charts in the ED | Setting: Emergency Department; Sample: two physicians, 47 patients’ charts; ST: Dragon NaturallySpeaking Medical Suite version 4 | 1. RTT (referred to as TAT); 2. Accuracy; 3. Errors per chart; 4. Dictation and editing time; 5. Throughput | RTT in mins: SR 3.55, TS 39.6. Accuracy % (mean and range): SR 98.5 (98.2-98.9), TS 99.7 (99.6-99.8). Average errors/chart: SR 2.5 (2–3), TS 1.2 (0.9-1.5). Average dictation time in mins (mean and range): SR 3.65 (3.35-3.95), TS 3.77 (3.43-4.10). Throughput (words/minute): SR 54.5 (49.6-59.4), TS 14.1 (11.1-17.2) |
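
Several studies in this table report accuracy as a word recognition rate or error rate (Al-Aynati, Alapetite 2008, Kanal, Zick) and report turnaround either as a before/after change (Callaway, Koivikko) or as the share of reports finished within a time limit (Koivikko, Singh). The sketch below shows how such figures are commonly computed. It assumes definitions the table does not state: WRR is taken here as 1 - WER, with WER computed from a word-level edit distance, and each study may have defined its own metric differently. All function names are hypothetical.

```python
from typing import Sequence


def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Fewest word substitutions, deletions and insertions turning reference into hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # reference word deleted
                curr[j - 1] + 1,     # extra word inserted
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def word_recognition_rate(reference: str, hypothesis: str) -> float:
    """Assumed definition: WRR = 1 - WER, where WER = edit distance / reference length."""
    n_words = len(reference.split())
    if n_words == 0:
        raise ValueError("reference must contain at least one word")
    return 1 - word_edit_distance(reference, hypothesis) / n_words


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after`, e.g. mean RTT under HT vs under SR."""
    return (before - after) / before


def share_within(times: Sequence[float], limit: float) -> float:
    """Fraction of reports whose turnaround time is at or under `limit`."""
    if not times:
        raise ValueError("need at least one turnaround time")
    return sum(t <= limit for t in times) / len(times)


# Made-up dictated command with one substituted word.
print(word_recognition_rate("give two milligrams of propofol",
                            "give to milligrams of propofol"))
# Reported mean RTTs: Callaway facility 1 (hours) and Koivikko HT vs SR1 (minutes).
print(round(relative_reduction(15.7, 4.7), 2))
print(round(relative_reduction(1486, 323), 2))
# Made-up per-report turnaround times in minutes, 1-hour limit.
print(share_within([30, 45, 90, 200], 60))
```

Applied to the reported means, the mean RTT fell by about 70% at Callaway's facility 1 and by about 78% between Koivikko's HT and SR1 periods. `share_within` needs per-report times, which the table does not give, so it is shown only on invented values. Because insertions count as errors, this WRR can drop below zero for very poor transcripts.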