· 7 years ago · Oct 26, 2018, 09:22 AM
1Hypertension sample (code review)
2===========
3
4
5# Motivation
6
7This project requires selecting a sample of patients who meet the following criteria.
8
91. aged 18-85
102. had 2 encounters between `2015-01-01` and `2017-12-31` with a hypertension ICD9 diagnosis
11
12These data live in RDB flat files, a collection of 17 tab-delimited datasets from the Clarity data warehouse. The data had been imported into a MySQL database, but was later relocated into a MS SQL Server database.
13
14This documents covers changes to the query that creates a sample of 23632 patients who meet the above criteria.
15
16## The MySQL Query
17
18The first attempt to create this sample was in MySQL. The query is outlined below.
19
20### 1 - Create a subset from ENCOUNTERS
21
22This is a list of OUTPATIENT encounters approved by the PI. Three additional Encounter_Types were chosen after review by another member in the PHDI group ('Education', 'Procedure visit', 'Initial consult'). They get stored in the `tmp_HtnEncounterTypeList` table
23
24```sql
25CREATE TABLE tmp_HtnEncounterTypeList
26SELECT * FROM
27ENCOUNTERS
28-- these encounters were approved by KBD
29WHERE Encounter_Type IN('Appointment', 'Office Visit', 'History',
30'Nurse Only', 'Care Coordination', 'Routine Prenatal',
31'Off License Non Med Ctr', 'Home Care Visit', 'Evaluation',
32'Therapy', 'Nutrition', 'Genetic Counseling', 'Social Work',
33'Anti-coag visit', 'Initial Prenatal', 'Day Treatment',
34'Postpartum Visit', 'Home Health Admission', 'Ophth Exam',
35'Remote CHF Monitoring', 'LACTATION CONSULT',
36'Office Visit - Admin Closed', 'Left without being seen',
37'Community Orders', 'Care Coordination - Home Health',
38'Medication Management', 'Rx Refill Authorize', 'Nurse Triage',
39-- MBL recommended adding the following encounter_types
40'Education','Procedure visit', 'Initial consult');
41```
42
43Now this table can be used to get the larger sample of patients who meet the additional encounter criteria in the `DIAGNOSES` table.
44
45```sql
46CREATE TABLE
47 PHDI_HTN_SAMP
48SELECT
49-- temp_enc is the tmp_HtnEncounterTypeList table
50tmp_enc.EncounterDate,
51tmp_enc.EncounterKey,
52tmp_enc.Encounter_ID,
53tmp_enc.Encounter_Type,
54tmp_enc.Encounter_Is_Inpatient,
55tmp_enc.Encounter_Is_Walk_In,
56tmp_enc.Encounter_Age,
57
58/*
59get Patient_ID from DIAGNOSES (not from ENCOUNTERS)
60tmp_enc.Patient_ID,
61*/
62
63/* DIAGNOSES table conditions */
64
65diag.ICD9_Code,
66/* this links to Encounter_ID in the ENCOUTNERS file*/
67diag.Diagnoses_Encounter_ID,
68/* type of diagnosis */
69diag.Diagnosis_Type,
70/*source table for diagnosis*/
71diag.Diagnosis_Status,
72/*status can be active, deleted, or resolved. */
73diag.Diagnosis_Present_On_Admission,
74/*yes or no*/
75diag.Diagnosis_Name,
76/* name of dx from dx_name in Clarity */
77diag.Patient_ID
78/* get Patient_ID to link back to PATIENTS, etc */
79FROM
80tmp_HtnEncounterTypeList AS tmp_enc
81INNER JOIN DIAGNOSES AS diag
82ON tmp_enc.Encounter_ID = diag.Diagnoses_Encounter_ID
83-- these are all the diagnosis codes for hypertension
84WHERE diag.ICD9_Code
85IN ("401", "401.0", "401.1", "401.9","402",
86"402.0","402.00","402.01", "402.1","402.10","402.11","402.9",
87"402.90","402.91","403","403.00","403.01", "403.1","403.10",
88"403.11","403.9","403.91","404","404.0","404.00","404.01",
89"404.02","404.03","404.1","404.10","404.11","404.12","404.13",
90"404.9","404.90", "404.91","404.92","404.93","405","405.0",
91"405.01","405.09","405.1","405.11", "405.19", "405.9",
92"405.91", "405.99")
93
94 AND (tmp_enc.EncounterDate
95-- 2015 & 2017 data (updated)
96 BETWEEN '2015-01-01' AND '2017-12-31')
97-- age between 18 and 85
98 AND (tmp_enc.Encounter_Age BETWEEN 18 AND 85);
99
100/*
101213813 row(s) affected
102Records: 213813
103Duplicates: 0
104Warnings: 0
10512137.804 sec
106*/
107
108/* CHECK COUNT ON NEW TABLE */
109SELECT COUNT(*) FROM PHDI_HTN_SAMP;
110/*
111+----------+
112| COUNT(*) |
113+----------+
114| 213813 |
115+----------+
1161 row in set (0.16 sec)
117*/
118```
119
120
121## 2 - REDUCE TO 2+ Hypertension Encounters
122
123Reduce this to distinct `Patient_ID`s for patients with a hypertension diagnosis and at least 2 outpatient encounters. I can do this by counting the number of encounter IDs per `Patient_ID`
124
125
126```sql
127CREATE TABLE
128 temp_HtnCountPatIDEncID
129SELECT
130COUNT(*) AS count,
131 Encounter_ID,
132 Patient_ID
133FROM
134 PHDI_HTN_SAMP
135GROUP BY
136 Encounter_ID, Patient_ID
137HAVING
138 count > 1
139ORDER BY
140 count DESC;
141
142/*
14330311 row(s) affected
144Records: 30311
145Duplicates: 0
146Warnings: 0
147*/
148
149/* check this new temp_HtnCountPatIDEncID table */
150-- =======================================================
151
152SELECT * FROM temp_HtnCountPatIDEncID
153ORDER BY count
154DESC LIMIT 5;
155
156/*
157
158+-------+-----------------+-----------------+
159| count | Encounter_ID | Patient_ID |
160+-------+-----------------+-----------------+
161| 15 | 808640570379794 | 942523706238717 |
162| 12 | 152675774414092 | 243384487461299 |
163| 12 | 378120614681393 | 744728490710259 |
164| 12 | 273427328560501 | 388629973400384 |
165| 9 | 889970415737480 | 529602517839521 |
166+-------+-----------------+-----------------+
1675 rows in set (0.03 sec)
168*/
169
170SELECT * FROM temp_HtnCountPatIDEncID
171ORDER BY count LIMIT 5;
172
173/*
174+-------+-----------------+-----------------+
175| count | Encounter_ID | Patient_ID |
176+-------+-----------------+-----------------+
177| 2 | 471763866953552 | 434451227076352 |
178| 2 | 862538943067193 | 16760296188295 |
179| 2 | 453247585333884 | 131788644008338 |
180| 2 | 106287259142846 | 302721116691828 |
181| 2 | 604085796978325 | 977933479938656 |
182+-------+-----------------+-----------------+
1835 rows in set (0.02 sec)
184*/
185/* these range from 15 t0 2 */
186/*
187```
188
189
190## 3 - DISTINCT PATIENT IDs: get the DISTINCT Patient_IDs
191
192Now get the distinct `PATIENT_ID`s from the `temp_HtnCountPatIDEncID` table and put in separate table
193
194
195```sql
196CREATE TABLE PHDI_HTN_2ENC_IDs
197SELECT DISTINCT Patient_ID FROM temp_HtnCountPatIDEncID;
198SELECT * FROM PHDI_HTN_2ENC_IDs;
199-- 19347 rows
200/* DROP TEMP TABLES */
201-- DROP TABLE temp_HtnCountPatIDEncID;
202-- DROP TABLE tmp_HtnEncounterTypeList;
203```
204
205## MySQL Code Review
206
207The MySQL query was very slow. This was due to a combination of 1) using free MySQL community edition, and 2) not properly indexing the new table of encounter types.
208
209## The MS SQL Server query
210
211This is the query saved as a stored procedure in the MS SQL server database (`[dbo].[deprecated_spc_HTN]`). It was reviewed and a couple changes were made to get the correct sample (and to improve speed).
212
213
214```sql
215USE [phdi]
216GO
217/****** Object: StoredProcedure [dbo].[deprecated_spc_HTN] ******/
218/*
219SET ANSI_NULLS ON
220GO
221SET QUOTED_IDENTIFIER ON
222GO
223*/
224
225/*
226exec spc_HTN
227this proc has hard-coded encounter dates
228BETWEEN '2015-01-01' AND '2017-12-31'
229*/
230```
231
232My previous strategy was to create a subset of the `Encounter_Type`s, then join to the `DIAGNOSES` table using the `Diagnoses_Encounter_ID`. This slowed the process down because there wasn't a unique index in the temp table (`tmp_HtnEncounterTypeList`).
233
234This code chunk creates the `PHDI_HTN_SAMP` table and formats each column before entering data into the table. Minor difference in workflow between MySQL and SQL Server.
235
236```sql
237CREATE TABLE #PHDI_HTN_SAMP(
238 [Encounterdate] [date] NULL,
239 [encounterkey] [varchar](250) NULL,
240 [Encounter_ID] [bigint] NULL,
241 [Encounter_type] [varchar](250) NULL,
242 [Encounter_Is_Inpatient] [varchar](250) NULL,
243 [Encounter_Is_Walk_In] [varchar](250) NULL,
244 [Encounter_age] [smallint] NULL,
245 [ICD9_code] [varchar](250) NULL,
246 [Diagnoses_Encounter_ID] [bigint] NULL,
247 [Diagnosis_Type] [varchar](250) NULL,
248 [Diagnosis_Status] [varchar](250) NULL,
249 [Diagnosis_Present_On_Admission] [varchar](250) NULL,
250 [Diagnosis_Name] [varchar](250) NULL,
251 [Patient_ID] [bigint] NULL
252)
253```
254
255Now the `SELECT` statement follows the `INSERT INTO` command.
256
257```sql
258INSERT into #PHDI_HTN_SAMP
259SELECT
260 enc.EncounterDate,
261 enc.EncounterKey,
262 enc.Encounter_ID,
263 enc.Encounter_Type,
264 enc.Encounter_Is_Inpatient,
265 enc.Encounter_Is_Walk_In,
266 enc.Encounter_Age,
267 /* DIAGNOSES table conditions */
268 diag.ICD9_Code,
269 /* this links to Encounter_ID in the ENCOUTNERS file*/
270 diag.Diagnoses_Encounter_ID,
271 /* type of diagnosis */
272 diag.Diagnosis_Type,
273 /*source table for diagnosis*/
274 diag.Diagnosis_Status,
275 /*status can be active, deleted, or resolved. */
276 diag.Diagnosis_Present_On_Admission,
277 /*yes or no*/
278 diag.Diagnosis_Name,
279 /* name of dx from dx_name in Clarity */
280 diag.Patient_ID
281 /* get Patient_ID to link back to PATIENTS, etc */
282FROM
283 ENCOUNTERS AS enc -- >> ENCOUNTERS alias
284INNER JOIN
285 DIAGNOSES AS diag -- >> DIAGNOSES alias
286
287ON enc.Encounter_ID = diag.Diagnoses_Encounter_ID
288 -- >> these are all the diagnosis codes for hypertension
289WHERE diag.ICD9_Code
290 IN ('401', '401.0', '401.1', '401.9','402',
291 '402.0','402.00','402.01', '402.1','402.10','402.11','402.9',
292 '402.90','402.91','403','403.00','403.01', '403.1','403.10',
293 '403.11','403.9','403.91','404','404.0','404.00','404.01',
294 '404.02','404.03','404.1','404.10','404.11','404.12','404.13',
295 '404.9','404.90', '404.91','404.92','404.93','405','405.0',
296 '405.01','405.09','405.1','405.11', '405.19', '405.9',
297 '405.91', '405.99') AND
298 (enc.EncounterDate
299 -- >> date parameters (2015 & 2017 data (updated))
300 BETWEEN '2015-01-01' AND '2017-12-31')
301 -- >> age parameters (age between 18 and 85)
302
303 AND (enc.Encounter_Age BETWEEN 18 AND 85)
304
305/*
306This was the previous temporary table that was slowing the process down. By
307adding this as a WHERE condition it becomes part of the JOIN (and has indexes).
308*/
309 AND enc.Encounter_Type IN('Appointment', 'Office Visit',
310 'History', 'Nurse Only', 'Care Coordination', 'Routine Prenatal',
311 'Off License Non Med Ctr', 'Home Care Visit', 'Evaluation',
312 'Therapy', 'Nutrition', 'Genetic Counseling', 'Social Work',
313 'Anti-coag visit', 'Initial Prenatal', 'Day Treatment',
314 'Postpartum Visit', 'Home Health Admission', 'Ophth Exam',
315 'Remote CHF Monitoring', 'LACTATION CONSULT',
316 'Office Visit - Admin Closed', 'Left without being seen',
317 'Community Orders', 'Care Coordination - Home Health',
318 'Medication Management', 'Rx Refill Authorize', 'Nurse Triage',
319 -- MBL recommended adding the following encounter_types
320 'Education','Procedure visit', 'Initial consult')
321```
322
323This portion is where the code was rewritten to get the sample we wanted. Previously the grouping statement included the `Patient_ID` and `Encounter_ID`, but this was not correct.
324
325Instead create a table with the `DISTINCT` `Patient_ID` and `Encounter_ID`s, then we only have to count the number of times a `Patient_ID` shows up in the `temp` table.
326
327```sql
328 CREATE TABLE #temp (Encounter_ID bigint, Patient_ID bigint)
329 INSERT INTO #temp
330 SELECT DISTINCT
331 Encounter_ID, Patient_ID
332 FROM #PHDI_HTN_SAMP
333```
334
335There are multiple `Encounter_ID`s per `Patient_ID`. Moreover, there can be multiple `ICD9_Code`s on a single `Encounter_ID`. So grouping by `Encounter_ID` and `Patient_ID` is unnecessary.
336
337```sql
338 TRUNCATE TABLE PHDI_HTN_2ENC_IDs -- << deletes the table if it exists
339 INSERT INTO PHDI_HTN_2ENC_IDs
340 SELECT
341 patient_id
342 FROM #temp
343 GROUP BY patient_id
344 HAVING count(patient_id)>1
345 ORDER BY patient_id
346```
347
348Now drop the temporary tables.
349
350```sql
351 drop table #temp
352 drop table #PHDI_HTN_SAMP
353```
354<!--stackedit_data:
355eyJoaXN0b3J5IjpbLTEyNTA4NTUyMjcsLTM0NTY4NTEzNSwyMT
356IxMDA2NjI0LDEwNDA4NjY5NTQsMTY1MTUyODUxMSwtMTI5NDQ2
357MDQ3MSwtMTUyOTM5OTg2MywyNDA4NDg1OTcsMTg3MTIyNDc0Ny
358wtMTkxNjcxODUxOSwzMDI0NjY2NTcsNDU4Nzc1OTgwLC02MTMx
359NDkxMjMsMTY4NDU2OTEzMCwtMTgzOTUxNDQyMF19
360-->