PRP005: A Natural Language Processing Approach to Examining Subtle Bias Towards Patients in Electronic Health Records

Isabel Bilotta, MA; Michael Hansen, MD; Winston Liaw, MD, MPH; Yang Xiang, PhD; Scott Tonidandel, PhD


Context: Subtle bias by health care clinicians is linked to negative outcomes for patients from racial minority groups, particularly African American patients. Bias here refers to an evaluation, decision, perception, or action in favor of or against a person or group compared with another.
Objective: To apply natural language processing to electronic health record (EHR) notes in order to examine differences in clinicians’ communication, which may indicate bias, across patients of diverse racial and ethnic backgrounds.
Study Design: In this cross-sectional study, we use the natural language processing tool, Sentiment Analysis and Social Cognition Engine (SEANCE), to assess multiple linguistic markers in the EHR text.
Dataset: We extracted EHR encounters (n = 15,460) for individuals at least 18 years of age diagnosed with type 2 diabetes who received care from family physicians, general internists, or endocrinologists practicing in an academic, urban network of clinics between 2006 and 2015.
Outcome Measures: SEANCE scores are given as proportions of total words within a text. We use SEANCE component indices, which are macro-level summaries of different linguistic categories. Specifically, we examine the following components: negative adjectives, positive adjectives (e.g., “acceptable”), well-being, trust verbs (e.g., “acknowledge”), failure and disgust, joy (e.g., “happy”), politics (e.g., “ally”), and respect (e.g., “cooperation”). The data are analyzed with cross-classified random effects models that include fixed effects of race and age and random effects of patient and provider.
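To make the "proportion of total words" scoring concrete, here is a minimal Python sketch of how a single category score could be computed for one note. It is not SEANCE itself: the function name `category_proportion`, the tokenizer, and the tiny word list are assumptions for illustration only. Apart from "acceptable", which the abstract gives as an example, the words in the list are invented, and SEANCE component indices combine several categories rather than one flat list.

```python
import re

# Illustrative mini-lexicon. "acceptable" is the example given in the abstract;
# the other entries are hypothetical and are NOT taken from the SEANCE word lists.
POSITIVE_ADJECTIVES = {"acceptable", "good", "stable"}


def category_proportion(text, lexicon):
    """Return the share of word tokens in `text` that appear in `lexicon`."""
    # Simple lowercase word tokenizer (an assumption, not SEANCE's tokenizer).
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in lexicon)
    return hits / len(tokens)


# Hypothetical note text, not drawn from the study data.
note = "Control is acceptable and patient is stable today."
score = category_proportion(note, POSITIVE_ADJECTIVES)
print(f"positive-adjective proportion: {score:.3f}")
```

Because the score is normalized by note length, long and short notes can be compared on the same scale, which is what allows scores to be modeled across encounters.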
Results: The cross-classified random effects model for the SEANCE linguistic component of negative adjectives was significant at the p < 0.05 level, such that providers (n = 273) use more negative adjectives in the EHR for African American patients (β = .50) compared to Hispanic (β = .453) and White patients (β = .434), controlling for patient age. Data analyses for the other SEANCE components of interest (positive adjectives, well-being, trust verbs, failure and disgust, joy, politics, and respect) are in progress.
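For readers unfamiliar with cross-classified random effects models, one common way to write such a model is sketched below. This is a generic textbook form under stated assumptions, not the authors' exact specification: the notation, the indicator coding of race, and the normality assumptions are all illustrative. Here i indexes encounters, j patients, and k providers, and y is a SEANCE component score.

```latex
y_{i(jk)} = \beta_0
          + \sum_{r} \beta_r \, \mathbf{1}[\mathrm{race}_j = r]
          + \beta_{\mathrm{age}} \, \mathrm{age}_{i(jk)}
          + u_j + v_k + \varepsilon_{i(jk)},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
v_k \sim \mathcal{N}(0, \sigma_v^2), \quad
\varepsilon_{i(jk)} \sim \mathcal{N}(0, \sigma^2)
```

The patient effect u_j and the provider effect v_k are crossed rather than nested, because one patient may see several providers and one provider sees many patients.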
Leave a Comment
Tim Riley 11/22/2020

Very interesting work! What applications do you anticipate for this data to reduce bias and discrimination in healthcare? I wonder if it would be possible to use this data to target providers for implicit bias training?

Jessica Kram

Very interesting work! Study really helps us understand other implicit biases we may not have realized we had.

Mack Ruffin
mruffin@pennstatehealth.psu 11/22/2020

How many of the Hispanic patients were seen with a translator? I don’t recall using most of the words listed in SEANCE. The words I do use are adverse, advise, blood, clinic, and nutrition. I would review some of the charts to check how the words were used.

Jaky Kueper 11/24/2020

Engaging work! I look forward to seeing the next steps.
