Purpose: To develop and gather validity evidence for a novel tool for assessment of cochlear implant (CI) surgery, including virtual reality CI surgery training.
Methods: Prospective study gathering validity evidence according to Messick’s framework. Four experts developed the CI Surgery Assessment Tool (CISAT). A total of 35 true novices (medical students), trained novices (residents) and CI surgeons performed two CI-procedures each in the Visible Ear Simulator, which were rated by three blinded experts. Classical test theory and generalizability theory were used for reliability analysis.
Results: The CISAT significantly discriminated between the three groups (p < 0.001). The generalizability coefficient was 0.76 and most of the score variance (53.3%) was attributable to the participant and only 6.8% to the raters. When exploring a standard setting for CI surgery, the contrasting groups method suggested a pass/fail score of 36.0 points (out of 55), but since the trained novices performed above this, we propose using the mean CI surgeon performance score (45.3 points).
Conclusion: Validity evidence for simulation-based assessment of CI performance supports the CISAT. Together with the standard setting, the CISAT might be used to monitor progress in competency-based training of CI surgery and to determine when the trainee can advance to further training.
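The contrasting groups method mentioned above is commonly implemented by placing the pass/fail score where the score distributions of the less and more experienced groups intersect. A minimal sketch of that idea follows, assuming both groups' scores are approximately normally distributed. The function name and the group means and standard deviations are hypothetical illustration values, not data from the study.

```python
import math


def contrasting_groups_cutoff(mean_low, sd_low, mean_high, sd_high):
    """Return the score between the two group means where two normal
    densities intersect (one common form of the contrasting groups method)."""
    if math.isclose(sd_low, sd_high):
        # With equal spread, the densities cross at the midpoint of the means.
        return (mean_low + mean_high) / 2
    # Setting the two log-densities equal gives a quadratic a*x^2 + b*x + c = 0.
    a = 1 / (2 * sd_high**2) - 1 / (2 * sd_low**2)
    b = mean_low / sd_low**2 - mean_high / sd_high**2
    c = (mean_high**2 / (2 * sd_high**2)
         - mean_low**2 / (2 * sd_low**2)
         + math.log(sd_high / sd_low))
    disc = math.sqrt(b**2 - 4 * a * c)
    roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    lo, hi = sorted((mean_low, mean_high))
    between = [r for r in roots if lo <= r <= hi]
    if not between:
        raise ValueError("The densities do not intersect between the group means")
    return between[0]


# Hypothetical group statistics (NOT taken from the study).
print(contrasting_groups_cutoff(25, 6, 45, 4))
```

When the resulting score is passed even by the intermediate group, as reported for the trained novices here, a criterion-referenced alternative such as the mean expert score may be preferred.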
INTRODUCTION: Simulator-integrated tutoring by green-lighting is a common learning support in virtual reality (VR) simulation-based temporal bone surgical training. However, overreliance on tutoring can negatively affect learning. We therefore wanted to investigate the effects of simulator-integrated tutoring on performance and learning.
METHODS: A prospective, educational cohort study of a learning intervention (simulator-integrated tutoring) during repeated and distributed VR simulation training for directed, self-regulated learning of the mastoidectomy procedure. Two cohorts of novices (medical students) were recruited: 16 participants were trained using the intervention program (intermittent simulator-integrated tutoring) and 14 participants constituted a non-tutored reference cohort. Outcomes were final-product performance assessed by two blinded raters, and simulator-recorded metrics.
RESULTS: Simulator-integrated tutoring had a large and positive effect on the final-product performance while turned on (mean difference 3.8 points, p<0.0001). However, this did not translate to a better final-product performance in subsequent non-tutored procedures. The tutored cohort had a better metrics-based score, reflecting higher efficiency of drilling (mean difference 3.6 %, p=0.001). For the individual metrics, simulator-integrated tutoring had mixed effects both during procedures and on the tutored cohort in general (learning effect).
CONCLUSIONS: Simulator-integrated tutoring by green-lighting did not induce a better final-product performance but increased efficiency. The mixed effects on learning could be caused by tutoring overreliance, resulting from a lack of cognitive engagement when the tutor-function is on. Further learning strategies such as feedback should be explored to support novice learning and cognitive engagement.
Purpose: Reliable assessment of surgical skills is vital for competency-based medical training. Several factors influence not only the reliability of judgments but also the number of observations needed for making judgments of competency that are both consistent and reproducible. The aim of this study was to explore the role of various conditions, through the analysis of data from large-scale, simulation-based assessments of surgical technical skills, by examining the effects of those conditions on reliability using Generalizability theory.
Method: Assessment data from large-scale, simulation-based temporal bone surgical training research studies in 2012-2018 were pooled, yielding collectively 3,574 assessments of 1,723 performances. The authors conducted generalizability analyses using an unbalanced random-effects design, and they performed decision studies to explore the effect of the different variables on projections of reliability.
Results: Overall, five observations were needed to achieve a Generalizability coefficient > 0.8. Several variables modified the projections of reliability: increased learner experience necessitated more observations (5 for medical students, 7 for residents, and 8 for experienced surgeons); the more complex cadaveric dissection required fewer observations than virtual reality simulation (2 vs. 5 observations); and increased fidelity simulation graphics reduced the number of observations needed from 7 to 4. The training structure (either massed or distributed practice) and simulator-integrated tutoring had little effect on reliability. Finally, more observations were needed during initial training when the learning curve was steepest (6 observations) compared with the plateau phase (4 observations).
Conclusions: Reliability in surgical skills assessment seems less stable than it is often reported to be. Training context and conditions influence reliability. The findings from this study highlight that medical educators should exercise caution when using a specific simulation-based assessment in other contexts.
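The decision-study projections reported above rest on a simple relationship: averaging over more observations shrinks the error variance and raises the Generalizability coefficient. The sketch below illustrates this for a simplified single-facet design, assuming one person variance component and one pooled error component. The function names and variance values are hypothetical and are not the study's estimates, which came from an unbalanced random-effects design.

```python
def g_coefficient(var_person, var_error, n_obs):
    """Projected Generalizability coefficient when scores are averaged
    over n_obs observations (simplified single-facet design)."""
    return var_person / (var_person + var_error / n_obs)


def observations_needed(var_person, var_error, target=0.8, max_obs=100):
    """Smallest number of observations with a projected coefficient above
    the target, or None if max_obs is not enough."""
    for n_obs in range(1, max_obs + 1):
        if g_coefficient(var_person, var_error, n_obs) > target:
            return n_obs
    return None


# Hypothetical variance components (NOT the study's estimates).
print(observations_needed(var_person=0.45, var_error=0.55))  # 5
```

In this simplified model, a condition that lowers the share of variance attributable to the person (for example, a more homogeneous group) increases the number of observations required.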
OBJECTIVE: Handheld otoscopy requires both technical and diagnostic skills, and is often reported to be insufficient after medical training. We aimed to develop and gather validity evidence for an assessment tool for handheld otoscopy using contemporary medical educational standards.
STUDY DESIGN: Educational study.
SETTING: University/teaching hospital.
SUBJECTS AND METHODS: A structured Delphi methodology was used to develop the assessment tool: nine key opinion leaders (otologists) in undergraduate training of otoscopy iteratively achieved consensus on the content. Next, validity evidence was gathered by the video-taped assessment of two handheld otoscopy performances of 15 medical students (novices) and 11 specialists in otorhinolaryngology using two raters. Standard setting (pass/fail criteria) was explored using the contrasting groups and Angoff methods.
RESULTS: The developed Copenhagen Assessment Tool of Handheld Otoscopy Skills (CATHOS) consists of 10 items rated using a 5-point Likert scale with descriptive anchors. Validity evidence was collected and structured according to Messick’s framework: for example, the CATHOS had excellent discriminative validity (mean difference in performance between novices and experts 20.4 out of 50 points, p<0.001) and high internal consistency (Cronbach’s alpha=0.94). Finally, a pass/fail score was established at 30 points for medical students and 42 points for specialists in ORL.
CONCLUSION: We have developed and gathered validity evidence for an assessment tool of technical skills of handheld otoscopy and set standards of performance. Standardized assessment allows for individualized learning to the level of proficiency and could be implemented in under- and postgraduate handheld otoscopy training curricula, and is also useful in evaluating training interventions.
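Internal consistency of a multi-item rating scale, as reported above with Cronbach's alpha, can be computed from a table of item scores. The following is a minimal sketch using the standard formula with sample variances. The function name and the example ratings are hypothetical and do not reproduce the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table where scores[r][i] is the rating of
    performance r on item i. Requires at least two items and two rows."""
    k = len(scores[0])
    # Sample variance of each item across performances.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the total score across performances.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: three performances rated on two items.
print(cronbach_alpha([[2, 3], [4, 4], [3, 5]]))
```

The function fails with a division by zero if all total scores are identical, since alpha is undefined in that case.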
Purpose: At graduation from medical school, competency in otoscopy is often insufficient. Simulation-based training can be used to improve technical skills, but the suitability of the training model and assessment must be supported by validity evidence. The purpose of this study was to collect content validity evidence for a simulation-based test of handheld otoscopy skills.
Methods: First, a three-round Delphi study was conducted with a panel of nine clinical teachers in otorhinolaryngology (ORL) to determine the content requirements in our educational context. Next, the authenticity of relevant cases in a commercially available technology-enhanced simulator (Earsi, VR Magic, Germany) was evaluated by specialists in ORL. Finally, an integrated course was developed for the simulator based on these results.
Results: The Delphi study resulted in nine essential diagnoses of normal variations and pathologies that all junior doctors should be able to diagnose with a handheld otoscope. Twelve out of 15 tested simulator cases were correctly recognized by at least one ORL specialist. Fifteen cases from the simulator case library matched the essential diagnoses determined by the Delphi study and were integrated into the course.
Conclusion: Content validity evidence for a simulation-based test of handheld otoscopy skills was collected. This informed a simulation-based course that can be used for undergraduate training. The course needs to be further investigated in relation to other aspects of validity and for future self-directed training.
PURPOSE: Virtual reality (VR) simulation surgical skills training is well established, but self-directed practice is often associated with a learning curve plateau. In this study, we investigate the effects of structured self-assessment as a means to improve performance in mastoidectomy training.
METHODS: The study was a prospective, educational study. Two cohorts of novices (medical students) were recruited for practice of anatomical mastoidectomy in a training program with five distributed training blocks. Fifteen participants performed structured self-assessment after each procedure (intervention cohort). A reference cohort of another 14 participants served as controls. Performances were assessed by two blinded raters using a modified Welling Scale and simulator-recorded metrics.
RESULTS: The self-assessment cohort performed superiorly to the reference cohort (mean difference of final product score 0.87 points, p = 0.001) and substantially reduced the number of repetitions needed. The self-assessment cohort also had more passing performances for the combined metrics-based score reflecting increased efficiency. Finally, the self-assessment cohort made fewer collisions compared with the reference cohort especially with the chorda tympani, the facial nerve, the incus, and the malleus.
CONCLUSIONS: VR simulation training of surgical skills benefits from having learners perform structured self-assessment following each procedure, as this increases performance, accelerates the learning curve (thereby reducing the time needed for training), and induces a safer performance with fewer collisions with critical structures. Structured self-assessment was in itself not sufficient to counter the learning curve plateau, and additional supports for deliberate practice are needed for continued skills development.
OBJECTIVE: Competency-based surgical training involves progressive autonomy given to the trainee. This requires systematic and evidence-based assessment with well-defined standards of proficiency. The objective of this study is to develop standards for the cross-institutional mastoidectomy assessment tool to inform decisions regarding whether a resident demonstrates sufficient skill to perform a mastoidectomy with or without supervision.
METHODS: A panel of fellowship-trained content experts in mastoidectomy was surveyed in relation to the 16 items of the assessment tool to determine the skills needed for supervised and unsupervised surgery. We examined the consensus score to investigate the degree of agreement among respondents for each survey item as well as additional analyses to determine whether the reported skill level required for each survey item was significantly different for the supervised versus unsupervised level.
RESULTS: Ten panelists representing different US training programs responded. There was considerable consensus on cut-off scores for each item and trainee level between panelists, with moderate (0.62) to very high (0.95) consensus scores depending on assessment item. Further analyses demonstrated that the difference between supervised and unsupervised skill levels was significantly meaningful for all items. Finally, minimum-passing scores for each item were established.
CONCLUSION: We defined performance standards for the cross-institutional mastoidectomy assessment tool using the Angoff method. These cut-off scores can be used to determine when trainees can progress from performance under supervision to performance without supervision. This can be used to guide training in a competency-based training curriculum.
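Item-level standards of the kind described above are typically obtained by averaging the panelists' judgments per item. The sketch below shows one such Angoff-style aggregation, assuming each panelist states the minimum acceptable score for every item. The function name and the ratings are hypothetical, and the study's exact procedure may differ.

```python
from statistics import mean


def angoff_cutoffs(ratings):
    """ratings[p][i] is the minimum score panelist p requires on item i.
    Returns the per-item cut-off scores and their sum."""
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("Every panelist must rate every item")
    # The cut-off for an item is the mean judgment across panelists.
    item_cutoffs = [mean(row[i] for row in ratings) for i in range(n_items)]
    return item_cutoffs, sum(item_cutoffs)


# Hypothetical judgments from three panelists on four items (NOT study data).
unsupervised = [[4, 5, 4, 3], [5, 5, 4, 4], [4, 4, 5, 4]]
item_cutoffs, total_cutoff = angoff_cutoffs(unsupervised)
print(item_cutoffs, total_cutoff)
```

Running the same aggregation separately on the judgments for the supervised and unsupervised levels yields two sets of cut-off scores that can then be compared item by item.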
OBJECTIVE: To investigate validity evidence, and strengths and limitations of performance metrics in mastoidectomy training.
METHODS: A systematic review following the PRISMA guidelines. Studies reporting performance metrics in mastoidectomy/temporal bone surgery were included. Data on design, outcomes, and results were extracted by two reviewers. Validity evidence according to Messick’s framework and level of evidence were assessed.
RESULTS: The search yielded a total of 1085 studies from the years 1947-2018 and 35 studies were included for full data extraction after abstract and full-text screening. A total of 33 different metrics on mastoidectomy performance were identified and ranked according to the number of reports. Most of the 33 metrics identified had some amount of validity evidence. The metrics with most validity evidence were related to drilling time, volume drilled per time, force applied near vital structures, and volume removed.
CONCLUSIONS: This review provides an overview of current metrics of mastoidectomy performance, their validity, strengths and limitations, and identifies the gap in validity evidence of some metrics. Evidence-based metrics can be used for performance assessment in temporal bone surgery and for providing integrated and automated feedback in virtual reality simulation training. The use of such metrics in simulation-based mastoidectomy training can potentially address some of the limitations in current temporal bone skill assessment and ease assessment in repeated practice. However, at present, automated feedback based on metrics in VR simulation does not have a sufficient empirical basis and has not been generally accepted for use in training and certification.
OBJECTIVE: Often the assessment of mastoidectomy performance requires time-consuming manual rating. Virtual reality (VR) simulators offer potentially useful automated assessment and feedback but should be supported by validity evidence. We aimed to investigate simulator metrics for automated assessment based on the expert performance approach, comparison with an established assessment tool, and the consequences of standard setting.
METHODS: The study included the performances of 11 experienced otosurgeons and 37 otorhinolaryngology residents. Participants performed three mastoidectomies in the Visible Ear Simulator. Nine residents contributed additional data on repeated practice in the simulator. One hundred and twenty-nine different performance metrics were collected by the simulator and final-product files were saved. These final products were analyzed using a modified Welling Scale by two blinded raters.
RESULTS: Seventeen metrics could discriminate between resident and experienced surgeons’ performances. These metrics mainly expressed various aspects of efficiency: Experts demonstrated more goal-directed behavior and less hesitancy, used less time, and selected large and sharp burrs more often. The combined metrics-based score (MBS) demonstrated significant discriminative ability between experienced surgeons and residents with a mean difference of 16.4% (95% confidence interval [12.6-20.2], P << 0.001). A pass/fail score of 83.6% was established. The MBS correlated poorly with the final-product score but excellently with the final-product score per time.
CONCLUSION: The MBS mainly reflected efficiency components of the mastoidectomy procedure, and although it could have some uses in self-directed training, it fails to measure and encourage safe routines. Supplemental approaches and feedback are therefore required in VR simulation training of mastoidectomy.
BACKGROUND: Virtual reality surgical simulation of mastoidectomy is a promising training tool for novices. Final-product analysis for assessing novice mastoidectomy performance could be limited by a peak or ceiling effect. These may be countered by simulator-integrated tutoring.
METHODS: Twenty-two participants completed a single session of self-directed practice of the mastoidectomy procedure in a virtual reality simulator. Participants were randomised for additional simulator-integrated tutoring. Performances were assessed at 10-minute intervals using final-product analysis.
RESULTS: In all, 45.5 per cent of participants peaked before the 60-minute time limit. None of the participants achieved the maximum score, suggesting a ceiling effect. The tutored group performed better than the non-tutored group but tutoring did not eliminate the peak or ceiling effects.
CONCLUSION: Timing and adequate instruction are important when using final-product analysis to assess novice mastoidectomy performance. Improved real-time feedback and tutoring could address the limitations of final-product-based assessment.
BACKGROUND: Temporal bone surgery requires integration of complex knowledge and technical skills. This can be difficult to accomplish with traditional cadaveric dissection training, which is often organized as single-instance participation in a temporal bone course. Simulator-integrated tutoring in virtual reality (VR) surgical simulators can visually guide the procedure and facilitate self-directed surgical skills acquisition. This study aims to explore the performances of novice otorhinolaryngology residents in a freeware VR simulator and in cadaveric dissection training of mastoidectomy.
METHODS: Thirty-four novice otorhinolaryngology residents performed a single and self-directed mastoidectomy procedure in a freeware VR temporal bone simulator before performing a similar procedure on a cadaveric temporal bone. VR simulation and cadaveric dissection performances were assessed by two blinded expert raters using final product analysis.
RESULTS: Participants achieved a higher mean final product score in VR simulation compared with cadaveric dissection (14.9 and 13.2, respectively; P = 0.02). Significantly more of the participants had their best performance in VR simulation (P = 0.04). No differences in computer experience and interest were found between the group that performed better in VR simulation and the group that performed better in cadaveric dissection.
CONCLUSIONS: Novice performance in a freeware VR temporal bone simulator was significantly better than in cadaveric dissection. The simulator-integrated tutor function and reduced complexity of the procedure in VR simulation could be possible explanations for this finding. VR simulation training could be used in the initial training of novices, reserving dissection training for more advanced training after basic competencies have been acquired with VR simulation.
OBJECTIVES/HYPOTHESIS: The future development of integrated automatic assessment in temporal bone virtual surgical simulators calls for validation against currently established assessment tools. This study aimed to explore the relationship between mastoidectomy final-product performance assessment in virtual simulation and traditional dissection training.
STUDY DESIGN: Prospective trial with blinding.
METHODS: A total of 34 novice residents performed a mastoidectomy on the Visible Ear Simulator and on a cadaveric temporal bone. Two blinded senior otologists assessed the final-product performance using a modified Welling scale. The simulator gathered basic metrics on time, steps, and volumes in relation to the on-screen tutorial and collisions with vital structures.
RESULTS: Substantial inter-rater reliability (kappa = 0.77) for virtual simulation and moderate inter-rater reliability (kappa = 0.59) for dissection final-product assessment were found. The simulation and dissection performance scores had significant correlation (P = .014). None of the basic simulator metrics correlated significantly with the final-product score except for number of steps completed in the simulator.
CONCLUSIONS: A modified version of a validated final-product performance assessment tool can be used to assess mastoidectomy on virtual temporal bones. Performance assessment of virtual mastoidectomy could potentially save the use of cadaveric temporal bones for more advanced training when a basic level of competency in simulation has been achieved.
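Inter-rater reliability of the kind reported above is often expressed with a kappa statistic, which corrects the observed agreement between two raters for agreement expected by chance. The sketch below computes unweighted Cohen's kappa for two raters scoring the same items. Whether the study used this exact variant is an assumption, and the function name and example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters who scored the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical dichotomous item ratings (1 = adequate, 0 = inadequate).
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

Kappa is undefined when chance agreement equals 1, for example when both raters give every item the same single score, and the function then fails with a division by zero.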
IMPORTANCE: Repeated and deliberate practice is crucial in surgical skills training, and virtual reality (VR) simulation can provide self-directed training of basic surgical skills to meet the individual needs of the trainee. Assessment of the learning curves of surgical procedures is pivotal in understanding skills acquisition and best-practice implementation and organization of training.
OBJECTIVE: To explore the learning curves of VR simulation training of mastoidectomy and the effects of different practice sequences with the aim of proposing the optimal organization of training.
DESIGN, SETTING, AND PARTICIPANTS: A prospective trial with a 2 × 2 design was conducted at an academic teaching hospital. Participants included 43 novice medical students. Of these, 21 students completed time-distributed practice from October 14 to November 29, 2013, and a separate group of 19 students completed massed practice on May 16, 17, or 18, 2014. Data analysis was performed from June 6, 2014, to March 3, 2015.
INTERVENTIONS: Participants performed 12 repeated virtual mastoidectomies using a temporal bone surgical simulator in either a distributed (practice blocks spaced in time) or massed (all practice in 1 day) training program with randomization for simulator-integrated tutoring during the first 5 sessions.
MAIN OUTCOMES AND MEASURES: Performance was assessed using a modified Welling Scale for final product analysis by 2 blinded senior otologists.
RESULTS: Compared with the 19 students in the massed practice group, the 21 students in the distributed practice group were older (mean age, 25.1 years), more often male (15 [62%]), and had slightly higher mean gaming frequency (2.3 on a 1-5 Likert scale). Learning curves were established and distributed practice was found to be superior to massed practice, reported as mean end score (95% CI) of 15.7 (14.4-17.0) in distributed practice vs. 13.0 (11.9-14.1) with massed practice (P = .002). Simulator-integrated tutoring accelerated the initial performance, with mean score for tutored sessions of 14.6 (13.9-15.2) vs. 13.4 (12.8-14.0) for corresponding nontutored sessions (P < .01) but at the cost of a drop in performance once tutoring ceased. The performance drop was less with distributed practice, suggesting a protective effect when acquired skills were consolidated over time. The mean performance of the nontutored participants in the distributed practice group plateaued on a score of 16.0 (15.3-16.7) at approximately the ninth repetition, but the individual learning curves were highly variable.
CONCLUSIONS AND RELEVANCE: Novices can acquire basic mastoidectomy competencies with self-directed VR simulation training. Training should be organized with distributed practice, and simulator-integrated tutoring can be useful to accelerate the initial learning curve. Practice should be deliberate and toward a standard set level of proficiency that remains to be defined rather than toward the mean learning curve plateau.
A variety of structured assessment tools for use in surgical training have been reported, but extant assessment tools often employ paper-based rating forms. Digital assessment forms for evaluating surgical skills could potentially offer advantages over paper-based forms, especially in complex assessment situations. In this paper, we report on the development of cross-platform digital assessment forms for use with multiple raters in order to facilitate the automatic processing of surgical skills assessments that include structured ratings. The FileMaker 13 platform was used to create a database containing the digital assessment forms, because this software has cross-platform functionality on both desktop computers and handheld devices. The database is hosted online, and the rating forms can therefore also be accessed through most modern web browsers. Cross-platform digital assessment forms were developed for the rating of surgical skills. The database platform used in this study was reasonably priced, intuitive for the user, and flexible. The forms have been provided online as free downloads that may serve as the basis for further development or as inspiration for future efforts. In conclusion, digital assessment forms can be used for the structured rating of surgical skills and have the potential to be especially useful in complex assessment situations with multiple raters, repeated assessments in various times and locations, and situations requiring substantial subsequent data processing or complex score calculations.