[R] Kaplan-Meier [Survival Analysis]

2023. 1. 10. 11:44ㆍ🧑🏻‍💻 With Data/Data Analysis

Let's implement the Kaplan-Meier method in R.

This model is used mostly on medical data, but here we move it to a business setting and analyze customer survival and churn.

The data is a practice dataset, survivalDataExercise.csv. (Leave a comment and I will share it.)

First, load the packages.

 

library(dplyr)
library(ggplot2)
library(survival)
library(survminer)

# If a package is missing, install it first with install.packages("")

 

setwd("") # set the working directory (fill in your own path)
surv = read.csv("survivalDataExercise.csv") # load the data

๋ณ€์ˆ˜๋Š” ์ตœ๊ทผ ๊ตฌ๋งค๋กœ๋ถ€ํ„ฐ ๊ฒฝ๊ณผ๋œ ์ผ์ˆ˜์™€ ์„ฑ๋ณ„, ์ƒํ’ˆ๊ถŒ ์ด์šฉ์—ฌ๋ถ€, ํ™˜๋ถˆ์—ฌ๋ถ€, ์žฌ๊ตฌ๋งค์—ฌ๋ถ€๊ฐ€ ์žˆ๋‹ค.

surv = surv %>% mutate(event = ifelse(boughtAgain==1, 0, 1),
                       gender = as.factor(gender),
                       voucher = as.factor(voucher),
                       returned = as.factor(returned),
                       boughtAgain = as.factor(boughtAgain))

# Convert to categorical variables and add a churn indicator called event (boughtAgain 1 = survived / boughtAgain 0 = churned, i.e. "died")
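As a quick sanity check on the recoding above, here is a minimal sketch on a made-up five-row data frame. The toy values are an assumption for illustration, not rows from survivalDataExercise.csv. It applies the same ifelse rule in base R and cross-tabulates the result, so a flipped coding would show up immediately.

```r
# Toy data, made up for illustration; not taken from survivalDataExercise.csv
toy <- data.frame(boughtAgain = c(1, 0, 0, 1, 0))

# Same rule as above: bought again (1) -> no event (0), did not buy again (0) -> churn event (1)
toy$event <- ifelse(toy$boughtAgain == 1, 0, 1)

# Cross-tabulate to confirm that every row landed in the expected cell
check <- table(boughtAgain = toy$boughtAgain, event = toy$event)
print(check)
```

Only the two off-diagonal cells should be non-zero; any count on the diagonal would mean the event flag was not inverted.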

 

survobj = Surv(surv$daysSinceFirstPurch, surv$event)

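# Identify the survival status. Passing (time, event) to Surv() records, for each customer, how long they were observed and whether churn occurred.

A + after a time marks a censored observation: the customer had not churned by that point, that is, they survived.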
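To see the + notation in isolation, here is a small sketch with three made-up customers (the durations and flags are assumptions for illustration). It only uses Surv() from the survival package.

```r
library(survival)

# Toy durations and event flags, made up for illustration
days  <- c(10, 25, 40)
event <- c(1, 0, 1)   # 1 = churned, 0 = still a customer (censored)

s <- Surv(days, event)
print(s)              # the censored second customer is printed with a trailing "+"

# The same information as text labels and as a logical vector
labels   <- as.character(s)
censored <- s[, "status"] == 0
print(sum(censored))  # number of censored customers
```

The second customer was observed for 25 days without churning, so the entry is shown as 25+ rather than 25.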

km.model <- survfit(survobj ~ 1, data = surv, type="kaplan-meier") # fit the model (no covariates)
summary(km.model)
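To make it clearer what survfit computes here, the sketch below reproduces the Kaplan-Meier (product-limit) estimate by hand on five made-up customers and compares it with survfit. The toy data and the variable names are assumptions for illustration; the hand calculation relies on the times being sorted and unique.

```r
library(survival)

# Toy data, made up for illustration: 5 customers, 3 churn events
days  <- c(1, 2, 3, 4, 5)
event <- c(1, 0, 1, 1, 0)

fit <- survfit(Surv(days, event) ~ 1)

# Product-limit estimate by hand: S(t) = product over t_i <= t of (1 - d_i / n_i)
n_at_risk <- sapply(days, function(d) sum(days >= d))               # n_i: still under observation
n_events  <- sapply(days, function(d) sum(days == d & event == 1))  # d_i: churn events at that time
km_by_hand <- cumprod(1 - n_events / n_at_risk)

print(round(km_by_hand, 3))
print(round(fit$surv, 3))
```

Censored customers (days 2 and 5) leave the risk set without lowering the curve, which is why the estimate stays flat at those times.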

 

ggsurvplot(km.model,
           data = surv, # data used to fit the curve
           risk.table = "nrisk_cumcensor", # Add risk table
           ggtheme = theme_bw(), # Change ggplot2 theme
           palette = c("#2E9FDF"),
           title ='Kaplan-Meier Survival Model', 
           legend = 'none')

# Visualization

The x-axis is time and the y-axis is the survival probability, so the curve shows how the share of surviving customers changes over time.

Now let's compare survival across each categorical variable.
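The curve can also be read off numerically. As a minimal sketch on made-up data (the five toy customers are an assumption for illustration), summary() with the times argument returns the estimated survival probability at chosen time points.

```r
library(survival)

# Toy data, made up for illustration
days  <- c(1, 2, 3, 4, 5)
event <- c(1, 0, 1, 1, 0)
fit <- survfit(Surv(days, event) ~ 1)

# Survival probability at day 2 and day 4
at <- summary(fit, times = c(2, 4))
result <- data.frame(time = at$time, survival = at$surv)
print(result)
```

On the real data, the same call on km.model with times such as c(30, 90, 180) would give the share of customers still active at those points.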

 

surv_plots <- list() # list to hold the plots
surv_plots[[1]] = ggsurvplot(survfit(Surv(daysSinceFirstPurch, event) ~ gender, data = surv), conf.int = TRUE,
                             title = 'Kaplan-Meier for Gender', xlab = 'Time')
surv_plots[[2]] = ggsurvplot(survfit(Surv(daysSinceFirstPurch, event) ~ voucher, data = surv), conf.int = TRUE,
                             title = 'Kaplan-Meier for voucher', xlab = 'Time')
surv_plots[[3]] = ggsurvplot(survfit(Surv(daysSinceFirstPurch, event) ~ returned, data = surv), conf.int = TRUE,
                             title = 'Kaplan-Meier for returned', xlab = 'Time')

arrange_ggsurvplots(surv_plots, nrow = 2, ncol = 2)

 

Reading the plots above gives some useful insights.

For example, the differences in survival by gender, return status, and voucher use, and how quickly each curve drops, can be seen at a glance.
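A number to go with the visual comparison is the median survival time per group. Since the practice CSV is not bundled here, the sketch below uses the lung dataset that ships with the survival package as a stand-in (an assumption for illustration; its sex column plays the role of gender).

```r
library(survival)

# Stand-in data: the lung dataset bundled with the survival package,
# used here only because survivalDataExercise.csv is not available
fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Median survival time for each group, taken from the summary table
med <- summary(fit)$table[, "median"]
print(med)
```

With the blog's data, the same idea would apply to survfit(Surv(daysSinceFirstPurch, event) ~ gender, data = surv).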

 

The difference between two survival curves can also be tested statistically, using the log-rank test.

The calls are as follows.

survdiff(Surv(daysSinceFirstPurch, event) ~ gender, data = surv) # by gender
survdiff(Surv(daysSinceFirstPurch, event) ~ voucher, data = surv) # by voucher use
survdiff(Surv(daysSinceFirstPurch, event) ~ returned, data = surv) # by return status

 

As an example, let's look at the log-rank result by gender (the output of the first call above).

The p-value is <2e-16, far below the 0.05 significance level.

We therefore reject the null hypothesis that the two survival curves are the same and conclude that they differ.
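If the p-value is needed as a number rather than read from the printed output, it can be recomputed from the chi-square statistic that survdiff returns. This is a minimal sketch on the lung dataset bundled with the survival package, used as a stand-in for the practice CSV (an assumption for illustration).

```r
library(survival)

# Stand-in data: lung ships with the survival package
res <- survdiff(Surv(time, status) ~ sex, data = lung)
print(res)

# Log-rank p-value from the chi-square statistic; degrees of freedom = groups - 1
df <- length(res$n) - 1
p_value <- pchisq(res$chisq, df = df, lower.tail = FALSE)
print(p_value)

# Decision at the 0.05 significance level
reject_null <- p_value < 0.05
```

The same three lines after the survdiff call would work on any of the gender, voucher, or returned comparisons above.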

 

In other words, the test shows that the difference between the male and female curves is statistically significant, and the gender plot shows that the female survival rate falls faster.

The takeaway is that retaining female customers calls for marketing effort early on.

Interpretations like these give marketing very useful insights and a direction to move in.

 

Related concepts

https://seollane22.tistory.com/4

 

Kaplan-Meier estimation [Survival analysis]
