jagomart
digital resources
picture1_Mcq Internal Medicine Pdf 117052 | Kelly 2012


 134x       Filetype PDF       File size 0.17 MB       Source: www.cpass.umontreal.ca


File: Mcq Internal Medicine Pdf 117052 | Kelly 2012
teaching and learning in medicine 24 3 187 193 copyright c 2012 taylor francis group llc issn 1040 1334 print 1532 8015 online doi 10 1080 10401334 2012 692239 applied ...

icon picture PDF Filetype PDF | Posted on 05 Oct 2022 | 3 years ago
Partial capture of text on file.
               Teaching and Learning in Medicine, 24(3), 187–193
                       
               Copyright C 2012, Taylor & Francis Group, LLC
               ISSN: 1040-1334 print / 1532-8015 online
               DOI:10.1080/10401334.2012.692239
               APPLIED RESEARCH
               Comparing a Script Concordance Examination to a
               Multiple-Choice Examination on a Core Internal
               Medicine Clerkship
               William Kelly, Steven Durning, and Gerald Denton
               Department of Medicine, Uniformed Services University of Health Sciences, Bethesda, Maryland, USA
                                                                                         preferred by students. SC was more reliable and preferred when
                  Background: Script concordance (SC) questions, in which a              administered to house staff.
               learnerisgivenabriefclinicalscenariothenaskedifadditionalin-
               formation makes one hypothesis more or less likely, with answers
               compared to a panel of experts, are designed to reflect a learner’s        BACKGROUND
               clinical reasoning. Purpose: The purpose is to compare reliability,
               validity, and learner satisfaction between a three-option modified             Although descriptive evaluation (i.e., narrative comments
               SCexaminationtoamultiple-choicequestion(MCQ)examination                   madebyclinicalteachers regarding student performance) is the
               amongmedicalstudentsduringa3rd-yearinternalmedicineclerk-                 predominant method of evaluation used by internal medicine
               ship, to compare reliability and learner satisfaction of SC between       clerkships, written examinations account for up to one third of
               medical students and a convenience sample of house staff, and to                              1
               comparelearnersatisfactionwithSCbetween1st-and4th-quarter                 a student’s grade. The National Board of Medical Examiners
               medical students. Methods: Using a prospective cohort design, we          (NBME)subjectexamination in Medicine is frequently admin-
               compared the reliability of 20-item SC and MCQ examinations,              istered, but in a 2005 national survey,1 only 20% of internal
               sequentially administered on the same day. To measure validity,           medicine clerkships used it as the sole examination, compared
               scores were compared to scores on the National Board of Medical           with 50% in 1999. Over that same period, administration of
               Examiners(NBME)subjectexaminationinmedicineandtoaclin-                    local, faculty-developed examinations had become more com-
               ical performance measure. SC and MCQ were also administered
               to a convenience sample of internal medicine house staff. Medical         mon (36% vs. 27%) although the contribution of these exami-
               studentandhousestaffwereanonymouslysurveyedregardingsat-                  nations to final grades decreased (14% vs. 21%). The content,
               isfaction with the examinations. Results: There were 163 students         reliability, and validity of these examinations have not been
               who completed the examinations. With students, the initial relia-         fully described. The multiple-choice question (MCQ) format
               bility of the SC was half that of MCQ (KR20 0.19 vs. 0.41), but           is most common (unpublished data from the 2009 Clerkship
               withhousestaff(n=15),reliabilitywasthesame(KR20=0.52for                   Directors in Internal Medicine survey). MCQs can efficiently
               both examinations). SC performance correlated with student clin-
               ical performance, whereas MCQ did not (r = .22, p = .005 vs. .11,         and reliably assess medical knowledge, but the answer options
               p=.159).StudentsreportedthatSCquestionswerenomorediffi-                    provided may“cue”theexamineeandmayseemremovedfrom
               cult and were answered more quickly than MCQ questions. Both              real clinical situations.2 MCQs do not precisely mirror clinical
               exams were considered easier than NBME, and all 3 were consid-            practice as physicians do not get to choose a single best an-
               ered equally fair. More students preferred MCQ over SC (55.8%
               vs. 18.0%), whereas house staff preferred SC (46% vs. 23%; p =            swer out of four provided possibilities when evaluating a real
   Downloaded by [Bibliothèques de l'Université de Montréal] at 08:02 19 September 2013 .03). Conclusions: This SC examination was feasible and was morepatient.
               validthantheMCQexaminationbecauseofbettercorrelationwith                                    3
               clinical performance, despite being initially less reliable and less          Script theory postulates that in real clinical practice, physi-
                                                                                         cians apply prestored knowledge sets (or “scripts”) to under-
                                                                                         stand a patient’s clinical presentation and then either accept
                                                                                         or reject this hypothesis when presented with additional in-
                                                                                         formation.4,5 The scripts of experienced clinicians may vary
                  Portions of this data were presented in abstract form at the 2009      substantially, but essential elements are believed to be similar,
               CDIMand2010AAMCmeetings.Theopinionsinthisarticlearethose
               of the authors and do not represent the official policy of the Uniformed   and students can be measured by their agreement—or concor-
               Services University. We thank Mrs. Robin Howard and Dr. Audrey            dance with—this standard.6 A recent PubMed search returned
               Changfortheir assistance with review and statistical analysis.            48 studies utilizing script concordance (SC) testing in differ-
                  Correspondence may be sent to William F. Kelly, Medicine-EDP,          ent settings. In addition to various medicine specialties, these
               4301 Jones Bridge Road, Bethesda, MD 20814-4599, USA. E-mail:                                                                                  7,8
               william.kelly@usuhs.edu                                                   studies included assessing intraoperative decision making,
                                                                                      187
             188                                                 W.KELLY,S.DURNING,G.DENTON
             cultural competence,9 and as screening for poorly performing           Seventy-five SC items can be completed in 1 hour.14 In pooled
             physicians.10 SC testing has been studied in medical students in       analysis of three studies of physicians in residency training and
                                                      11                                                                         15
             their preclinical (basic science) years    and to assess improve-      nurse practitioner trainees, Gagnon et al.      found that two to
                                             12                 13
             mentafter curriculum change.       Lubarskyetal.     recently sum-     four questions nested into each of 15 to 25 cases may be the
             marized published validity evidence for SC testing, noting its         best combination for content validity and reliability, though this
             content and internal structure (reliability) is strongly supported,    hasn’t been studied in medical students. Reports of 20 cases
             with response process and educational consequences being less          with 60 items (i.e., three SC questions per case) have shown
             well established.                                                      good reliability (Cronbach’s α>.75).16,17
                 SC examinations can be constructed using published guid-              SCquestions may use a 3-point or 5-point Likert scale, with
             ance,14 starting with brief clinical vignettes. Questions (items)      5-point scales (“aggregate scoring, with five-points”) in which
             containing supplemental information are then nested in these           there is no one, true correct answer being more complex and
             cases. Rather than choosing one best answer as in MCQs, ex-            difficult to answer10,14 (Figure 1a). Charlin et al.,18 among their
             aminees indicate on a Likert-type scale whether the proposed           many foundational contributions to SC, have shown that more
             diagnosis,investigation,ortreatmentismoreorlesslikelybased             variability within the expert panel responses is associated with
             on new information provided. Thus SC items are hypothesized            a high effect size for discrimination between groups. Discrim-
             totestclinicalreasoningmoredirectlythanMCQs.Forasample                 ination may come at the expense of reliability (measurement
                                                                                                                                                     19
             SC vignette with nested items, see Figure 1a. The examinee’s           error) if the expert panel includes deviant responses. Gagnon
             score for each item is based on how much his or her answers            demonstrated that such responses may have a negligible impact
             match those of an expert panel. Many test items can be ob-             if the panel size is large (>15 experts), but if removing them,
             tained from one vignette, allowing for more items per unit time.       Gagnonrecommendeddoingsobasedontheirdistancefromthe
                                     A 61-year-old female with diabetes is admitted to your ward team. You note her serum 
                                     sodium is 125 (normal 135–142) and you will need to explain this on rounds. In terms of 
                                     the hyponatremia: 
                                     Hypothesis:  If you were       … and then this new            … What is the change in 
                                     thinking …                     information becomes            YOUR HYPOTHESIS 
                                                                    available … 
                                     (1) Pseudohyponatremia         Serum glucose is 190           Much more likely (+2) 
                                     from her diabetes              (normal 80–131)                More likely (+1) 
                                                                                                   No change 
                                                                                                   Less likely (–1) 
                                                                                                   Much less likely (–2) 
                                     (2) Compulsive water           Urine specific gravity <       Much more likely (+2) 
                                     drinking                       1.003                          More likely (+1) 
                                                                    Urine osmolarity is 50         No change 
                                                                                                   Less likely (–1) 
   Downloaded by [Bibliothèques de l'Université de Montréal] at 08:02 19 September 2013            Much less likely (–2) 
                                     (3) Overcorrection of earlier  She was given one liter of     Much more likely (+2) 
                                     hypernatremia                  D5W (free h2o) when her        More likely (+1) 
                                                                    serum sodium was 135. 
                                                                                                   No change 
                                                                                                   Less likely (–1) 
                                                                                                   Much less likely (–2) 
             FIG. 1a.  Example of Script Concordance format. Note. Several questions (items) can be nested into one clinical vignette. Scores are based on how close the
             learner’s answers are to a panel of experts that have already taken the test. The common 5-point scale is shown here, though in our study we used 3 points (more
             likely, no change,orless likely).
                                                                   SCRIPTCONCORDANCEVS.MULTIPLECHOICE                                                                  189
                                          Examples of SC answers and reasoning combinations from Item 1 in Figure 1a 
                                          Correct answer/Correct Reasoning             “Less Likely” – “The serum glucose is not high 
                                                                                       enough to account for that degree of 
                                                                                       hyponatremia” 
                                          Correct answer/Incorrect reasoning           “Less Likely” – High glucose causes 
                                                                                       hypERnatremia due to osmotic diuresis of water 
                                          Incorrect answer/Correct reasoning           “More Likely” – High glucose osmotically shifts 
                                                                                       water from the cells to the extravascular space.” 
                                          Incorrect answer/Incorrect reasoning  “No change” – Glucose has nothing to do with 
                                                                                       sodium, but can cause intracellular potassium 
                                                                                       shifts” 
                                                   FIG. 1b.   With the addition of free text justification, there are four possible outcomes.
               modalscore.Blandetal.20comparedmultipledifferentwaysof                          isting, low-stakes 30-item MCQ with two 20-item exams, one
               scoring and showed 3-point mode “single best answer” scoring                    MCQ,andoneSC,giventoall3rd-yearmedicalstudentsatthe
               to have a similar validity coefficient (a score correlating with                 end of their internal medicine clerkship during the 2008–2009
               level of training) with an only slightly lower reliability. As they             academicyear.TheInstitutionalReviewBoardattheUniformed
               pointed out, Likert scoring can erroneously judge examinees                     Services University determined that this project did not require
               who fail to recognize the direction of a changing hypothesis                    review.
               (as more or less likely) to be the same as examinees that ap-                   ExamandSurveyDevelopment
               preciate the direction but fail to agree with the expert panel on
               the degree of change. Single-best answer scoring may also be                        For the MCQ portion, the 20 best questions from our ex-
               more appropriate when assessing junior learners like medical                    isting exam that historically performed well psychometrically
               undergraduates. Three-point scales offer the advantage of ease                  (in terms of difficulty, discrimination, and interitem reliability)
               of standard-setting and test administration and are more appro-                 were retained. For the SC portion, questions were created to
               priate for medical students, whose clinical reasoning is often                  cover the same content areas as in the MCQs but with 20 ques-
               still in an early developmental stage. More recently, Charlin                   tions nested within five clinical vignettes. A published SC exam
               et al.21 have recommended transforming raw scores into a scale                  writing guide14 was followed including creation by two of the
               in which the expert panel’s mean is set as the value of refer-                  authors (WFKandGDD),adequatecontentsampling,anditem
               ence and the standard deviation is used to measure examinee                     selectionandscoringbasedonareferencestandardof10experts
               performance.                                                                    (our internal medicine clerkship site directors). Questions that
                   WedecidedtoreplaceourlocallydevelopedMCQexamina-                            theexpertpanelfeltwerepoorlywordedwereeditedoromitted,
               tionwithanSCexaminationinthehopesofhavingbetterassess-                          with a goal of unanimity in expert responses. Students then an-
               ment of medical knowledge and clinical reasoning, and a more                    swered along a 3-point Likert scale of less likely, no change,or
               reliable and valid test instrument, that is, a format perceived as              morelikely.Weusedmodalscoringinwhichthemostcommon,
               more representative of actual clinical practice by students and                 or modal, answer given by the experts was considered to be the
               teachers. This change was prospectively studied to assess test                  correct answer.
               instrument quality and learner satisfaction.                                        Theauthorsdevelopedapaper-basedlearnersatisfactionsur-
                   The purposes of this project were (a) to compare feasibility,               veyregardingtheNBME,MCQ,andSCexams,includingques-
    Downloaded by [Bibliothèques de l'Université de Montréal] at 08:02 19 September 2013 reliability, validity, and learner satisfaction of a three-option SCtions with 7-point Likert-scaled responses concerning test diffi-
               examinationwithaMCQexaminationduringa3rd-yearinternal                           culty (too easy to too difficult), time pressure (too much time to
               medicine clerkship; (b) to compare feasibility and acceptability                not enough time), clinical relevance (not relevant at all to com-
               of SC questions between medical students and a convenience                      pletely relevant), and fairness (completely fair to totally unfair).
               sample of house staff; and (c) to compare learner satisfaction                  The survey included space for participants to write free-text
               with SC between first- and fourth-quarter medical students.                      commentsandtoindicate their format preferences.
                                                                                               House Staff Sampling
               METHODS                                                                             Prior to administering the exams and survey to medical stu-
                                                                                               dents, the MCQ and SC examinations and the survey were
               Design                                                                          administered to a convenience sample of 15 internal medicine
                   This is a prospective comparison study of faculty-written                   residents at a house staff meeting to assess feasibility of the for-
               MCQand faculty-written SC questions. We replaced our ex-                        mats, validity of the instrument, and opinions of these learners.
             190                                               W.KELLY,S.DURNING,G.DENTON
             HousestaffdidnottaketheNBMEexamination,sotheydidnot                 RESULTS
             answer survey questions regarding that examination.                    All 163 students completed the exams, and all but two par-
                                                                                 ticipated in the learner satisfaction survey. Student performance
             ExamandSurveyAdministration                                         on the MCQ and SC exams was similar when scored dichoto-
                The MCQ and SC tests were administered to medical stu-           mously; mean (standard deviation) correct out of 20 items was
             dents independently and sequentially during the final week           13.24(2.32)versus13.69(1.90).Eachexamhadthesamenum-
             of their 12-week internal medicine clerkship. The order of          ber (15; 75%) of “acceptably difficult” items (ones that no less
             administration varied with each quarterly exam administration.      than30%ofstudentsbutnomorethan90%answeredcorrectly)
             Twenty-fiveminuteswereallowedforeachoftheexams,which                 with difficulty Index (M, SD) being 66.23 (24.77) versus 68.50
             were closed-book and proctored. Remaining time could not be         (26.91), p = .67, respectively. Our MCQ and SC items both had
             usedtoreturntotheotherexamformat.Answerswereenteredon               lowbutsimilardiscriminating power (ability to separate the top
             standardbubble-sheetsforcomputerizedscoring.Studentswere            and bottom third of students based on overall raw score) with
             also asked to write a very brief, free-text justification of their   mean (standard deviation) of 8.00 (5.32) versus 6.53 (4.77),
             clinical reasoning for each SC answer in a separate test booklet.   p=.33,andpoint-biserial coefficient (correlation between per-
             On the same test day, students also completed the NBME ex-          formanceonanindividualitemandperformanceoverall),mean
             amination, which is a 100-question multiple-choice exam that        (standard deviation) of 0.28 (0.13) versus 0.24 (0.12), p = .14.
             lasts 2.5 hours. At the end of the test day, all students completed    Neither exam had good reliability, but the SC exam had half
             thelearnersatisfactionsurvey,whichwasanonymousandpaper              the reliability of the MCQ (KR20 0.19 vs. 0.41). Among inter-
             based.                                                              nal medicine house staff (N = 15), there was no difference in
                                                                                 reliability (KR20 = 0.52 for both formats). Test item analysis
                                                                                 allowedimprovementofreliabilityofthestudent’sSCexamina-
             Analysis                                                            tionwitheliminationofonequestion(KR20=0.32).Reliability
                Testquestionsweremachinescoredandstandardpsychome-               oftheSCexaminationdidnotchangewhenscoredasacontinu-
             tric properties of the two tests were determined. SC scores were    ousvariable, that is, distance from the mode, in which a student
             calculated dichotomously as correct or incorrect using 3-point      could score 0, 1, or 2 points away from each designated correct
             Likert modal scoring, and, in a separate analysis, exams were       answer.
             scored using 3-point Likert “distance from mode” scoring in            Overall, the SC items were answered correctly 65.0% of the
             which a student could be 0, 1, or 2 points away from the most       time.Thefree-textjustificationforthesecorrectanswersdemon-
             common expert answer to each item. For the purposes of this         strated good clinical reasoning almost all the time (94.8%).
             project, free-text answers were reviewed by one of the authors      Documentation of reasoning was missing or incorrect in the
             (WFK)andmatchedwithSCquestionstocreatefourcategories:               other 5.2%. An example of right-for-the-wrong-reason would
             correct, with or without good clinical reasoning, and incorrect,    be noting that in a renal failure vignette a fractional excretion
             with or without good reasoning (Figure 1b). Kuder-Richardson        of sodium (FENA) of 4% would make a prerenal etiology “less
             20(KR20)wasusedtocomparereliabilityoftheMCQandSC                    likely” but then writing “because the FENA should be higher”
             exam scores of students and house staff. KR20 is a measure of       or simply writing “I have no idea.” When SC items were an-
             internal consistency for instruments with dichotomous choices.      sweredincorrectly,40.8%stillhadsomecorrectmedicalknowl-
             It is analogous to Cronbach’s alpha, which can be used for con-     edge in the free-text answer. The most common example was
                                22 Values greater than 0.9 indicate excellent    in a vignette with a post-op orthopedic surgery patient with
             tinuous measures.
             internal consistency.                                               chest tightness and shortness of breath. Given that pulmonary
                Student clinical performance, one of our usual methods of        embolism was suspected, a normal chest X-ray would make
             student assessment during this clerkship, is a weighted summa-      pulmonary embolism “more likely.” Many students indicated
   Downloaded by [Bibliothèques de l'Université de Montréal] at 08:02 19 September 2013 tion of teachers’ clinical grade recommendations based on the“no change” or “less likely” while still noting correctly in their
             RIMEscheme.23TheRIMEschemeisavalidatedframeworkfor                  free-text comments that “PE doesn’t normally show up on x-
             evaluation widely used in undergraduate medical education. As       ray,” “unlikely to see Hampton’shump,”or“x-rayisinsensitive,
             a measure of validity, we correlated MCQ and SC exam scores         need a VQ scan or spiral CT.” Whereas most students demon-
             to student clinical performance using Pearson’s correlation co-     strated sound clinical reasoning, many seemed uncomfortable
             efficient. For the learner satisfaction survey, we reported means    withreal-timediagnosticuncertaintysuchasbeingunwillingto
             and summarized free-text comments. Mean Likert-scale differ-        let an ECG without ischemic changes decrease their suspicion
             ences on the learner satisfaction survey between students and       for myocardial infarction without yet having the cardiac en-
             house staff and between first and fourth quarter students were       zymes.Othersappreciatedlabresultsstrictlyaseithernormalor
             compared using parametric and nonparametric tests including         abnormal rather than as a continuum. After reviewing free-text
             WilcoxonSignedRankstest.Dataanalysiswasperformedusing               responses and the psychometric analysis of test performance,
             SPSS, with statistical significance level of .05.                    werewroteafewofthequestions,andtheresulting30-itemSC
The words contained in this file might help you see if this file matches what you are looking for:

...Teaching and learning in medicine copyright c taylor francis group llc issn print online doi applied research comparing a script concordance examination to multiple choice on core internal clerkship william kelly steven durning gerald denton department of uniformed services university health sciences bethesda maryland usa preferred by students sc was more reliable when background questions which administered house staff learnerisgivenabriefclinicalscenariothenaskedifadditionalin formation makes one hypothesis or less likely with answers compared panel experts are designed reect learner s clinical reasoning purpose the is compare reliability validity satisfaction between three option modied although descriptive evaluation i e narrative comments scexaminationtoamultiple choicequestion mcq madebyclinicalteachers regarding student performance amongmedicalstudentsduringard yearinternalmedicineclerk predominant method used ship clerkships written examinations account for up third medical con...

no reviews yet
Please Login to review.