PRP005: A Natural Language Processing Approach to Examining Subtle Bias Towards Patients in Electronic Health Records
Isabel Bilotta, MA; Michael Hansen, MD; Winston Liaw, MD, MPH; Yang Xiang, PhD; Scott Tonidandel, PhD
Abstract
Context: Subtle bias by health care clinicians is linked to negative outcomes for racial minority, particularly African American, patients. This bias refers to an evaluation, decision, perception, or action in favor of or against a person or group compared with another.
Objective: Applying natural language processing to Electronic Health Record (EHR) notes, our objective is to examine differences, indicating possible bias, in clinicians’ communication with patients of diverse racial and ethnic backgrounds.
Study Design: In this cross-sectional study, we use the natural language processing tool, Sentiment Analysis and Social Cognition Engine (SEANCE), to assess multiple linguistic markers in the EHR text.
Dataset: We extracted EHR encounters (n = 15,460 encounters) for individuals at least 18 years of age diagnosed with type 2 diabetes, who received care from family physicians, general internists, or endocrinologists practicing in an academic, urban network of clinics between 2006 and 2015.
Outcome Measures: The SEANCE scores are given as proportions of total words within a text. We utilized SEANCE component indices, which are macro-level summaries of different linguistic categories. Specifically, we examine the following components: negative adjectives, positive adjectives (e.g., “acceptable”), well-being, trust verbs (e.g., “acknowledge”), failure and disgust, joy (e.g., “happy”), politics (e.g., “ally”), and respect (e.g., “cooperation”). The data are analyzed through cross-classified random effects models that include fixed effects of race and age, and random effects of patient and provider.
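The cross-classified structure described above (each encounter nested within both a patient and a provider, with the two random effects crossed rather than hierarchical) can be sketched as follows. The abstract does not specify the software used; this is an illustrative Python sketch using statsmodels, with fully simulated data, and all variable names (`neg_adj`, `patient`, `provider`) are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate encounter-level data: each row is one EHR encounter,
# cross-classified by patient and provider (values are synthetic).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "patient": rng.integers(0, 80, n),
    "provider": rng.integers(0, 20, n),
    "race": rng.choice(["African American", "Hispanic", "White"], n),
    "age": rng.integers(18, 90, n),
})
# Outcome: SEANCE score, a proportion of total words (simulated noise here).
df["neg_adj"] = rng.normal(0.45, 0.1, n)

# statsmodels fits crossed random effects by placing all rows in a single
# group and declaring each crossed factor as a variance component.
df["all"] = 1
model = smf.mixedlm(
    "neg_adj ~ C(race) + age",        # fixed effects: race and age
    df,
    groups="all",
    vc_formula={
        "patient": "0 + C(patient)",   # random intercept per patient
        "provider": "0 + C(provider)", # random intercept per provider
    },
)
result = model.fit()
print(result.params.filter(like="race"))
```

With simulated noise the race coefficients are of course uninformative; the point is the model specification, which mirrors the fixed and crossed random effects named in the abstract.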
Results: The cross-classified random effects model for the SEANCE linguistic component of negative adjectives was significant at the p < 0.05 level, such that providers (n = 273) used more negative adjectives in the EHR for African American patients (β = .50) compared with Hispanic (β = .453) and White patients (β = .434), controlling for patient age. Data analyses for the other SEANCE components of interest (positive adjectives, well-being, trust verbs, failure and disgust, joy, politics, and respect) are in progress.
Tim Riley
triley1@pennstatehealth.psu.edu
11/22/2020
Very interesting work! What applications do you anticipate for this data to reduce bias and discrimination in healthcare? I wonder if it would be possible to use this data to target providers for implicit bias training?