
This post introduces RRCF, which is also used in Amazon SageMaker.

 

Introduction to RRCF

  • The RRCF (Robust Random Cut Forest) algorithm is an ensemble method for detecting outliers in streaming data
  • Key features
    • Designed to handle streaming data
    • Performs well on high-dimensional data
    • Reduces the influence of irrelevant dimensions
    • Gracefully handles duplicates and near-duplicates that could otherwise mask the presence of outliers
    • Features an anomaly-scoring algorithm with a clear underlying statistical meaning
  • Some existing anomaly detection methods
    • One-class support vector machines (OC-SVM)
    • Robust covariance estimation
    • Local Outlier Factor (LOF)
    • Replicator neural networks (RNN)
  • These methods have several problems, and the Isolation Forest (IF) algorithm proposed a new ensemble method to address them
  • IF detects outliers well, but it has some limitations
    • Once a tree is built, points cannot be inserted into or deleted from an isolation tree, so it is not designed for use with streaming data
    • IF is sensitive to "irrelevant dimensions": partitions are often wasted on dimensions that carry relatively little information
    • Tree depth has shown empirical success at detecting outliers, but there is little theoretical justification for using it as an anomaly score
  • The RRCF algorithm was devised to overcome these limitations

 

How the algorithm works

  • RRCF takes random samples of the data points, cuts each sample down to the same number of points, and builds a tree from each
  • Combined, the trees form a forest over the data points, which is used to decide whether a given point is an outlier
  • Each point is scored according to where the algorithm places it
  • The orange point receives a high score
  • Every point inside the circle scores lower than the outlier

  • The anomaly score reflects how far a point lies from the circle
  • The lower the score, the more normal the point; the higher the score, the more anomalous
  • A point is treated as anomalous when its score is more than 3 standard deviations above the mean score
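The three-standard-deviation rule can be sketched in a few lines. The scores below are made-up numbers rather than output from a real forest, and `flag_anomalies` is a hypothetical helper name introduced only for this illustration.

```python
from statistics import mean, pstdev

def flag_anomalies(scores, n_sigma=3.0):
    """Return indices of scores more than n_sigma standard deviations above the mean."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation of the scores
    threshold = mu + n_sigma * sigma
    return [i for i, s in enumerate(scores) if s > threshold]

# Made-up anomaly scores: 20 ordinary points and one that stands out.
scores = [1.0, 2.0] * 10 + [20.0]
print(flag_anomalies(scores))  # [20]
```

Note that the rule needs a reasonable number of points: with only a handful of scores, a single large value inflates the standard deviation so much that it can never exceed the threshold.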

How the score is computed

  1. Build a bounding box around the data from the minimum and maximum of each dimension
  2. Pick one dimension and cut its range at a random position. In the example figure, the cut is along the x-axis
  3. Rebuild a bounding box for each of the left and right sides
  4. Make a random cut inside each new bounding box
  5. Points that get cut off and isolated near the root of the tree score higher; the closer to the root, the higher the score
  6. Repeat until every point in the tree is fully isolated
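The six steps above can be sketched in plain Python. This is a simplified illustration, not the RRCF implementation: it uses the average depth at which a point is isolated as a rough stand-in for the score (shallower means more anomalous, matching step 5), whereas the actual RRCF score is based on collusive displacement. Choosing the cut dimension with probability proportional to its range follows the RRCF paper; the names `isolate` and `average_depths` and the synthetic data are assumptions made for the example.

```python
import random

def isolate(points, idx, depth, depths, rng):
    """Cut the points in `idx` at random until each is alone; record leaf depths."""
    dims = range(len(points[0]))
    # Step 1: bounding box from the per-dimension minimum and maximum.
    lo = [min(points[i][d] for i in idx) for d in dims]
    hi = [max(points[i][d] for i in idx) for d in dims]
    spans = [h - l for l, h in zip(lo, hi)]
    if len(idx) == 1 or sum(spans) == 0:
        # Isolated (or only exact duplicates left): this is a leaf.
        for i in idx:
            depths[i] = depth
        return
    # Step 2: pick a dimension (wider ones are more likely) and a random cut.
    d = rng.choices(list(dims), weights=spans)[0]
    left, right = [], []
    while not left or not right:  # retry in the rare case a side is empty
        cut = rng.uniform(lo[d], hi[d])
        left = [i for i in idx if points[i][d] <= cut]
        right = [i for i in idx if points[i][d] > cut]
    # Steps 3-6: recurse on both sides until every point is isolated.
    isolate(points, left, depth + 1, depths, rng)
    isolate(points, right, depth + 1, depths, rng)

def average_depths(points, n_trees=100, seed=0):
    """Average isolation depth of each point over a forest of random cut trees."""
    rng = random.Random(seed)
    totals = [0.0] * len(points)
    for _ in range(n_trees):
        depths = {}
        isolate(points, list(range(len(points))), 0, depths, rng)
        for i, dep in depths.items():
            totals[i] += dep
    return [t / n_trees for t in totals]

data_rng = random.Random(1)
cluster = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(30)]
points = cluster + [(10.0, 10.0)]  # the last point is a planted outlier
avg = average_depths(points)
print(avg[-1], min(avg[:-1]))  # the outlier is isolated at a much shallower depth
```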



When you visualize data with the Python library Matplotlib, you will often see Korean text come out broken.

Of course, this would not happen if everything were written in English.....

Anyway, today's post covers how to fix the broken Korean text.

 

Development environment - Jupyter Notebook

 

Looking up Korean fonts in Matplotlib

1. Listing every font currently available to Matplotlib's font_manager, together with its path

import matplotlib.font_manager as fm
font_list = fm.findSystemFonts(fontpaths=None, fontext='ttf')
font_list[:]

 

2. Checking by name whether a font you already know is installed

import matplotlib.font_manager as fm

font_name = 'Nanum'
# (font name, font file path) for every registered font whose name contains font_name
[(f.name, f.fname) for f in fm.fontManager.ttflist if font_name in f.name]

 

Either of the two methods will do for checking whether the font is available.

 

Applying a Korean font in Matplotlib

Applying the font only takes an import and one setting. Once a Korean font is in use, the minus sign ( - ) can also render broken, so the fix for that is included below as well.

import matplotlib.pyplot as plt

# Font setting: put the name of the font you want to use in family.
plt.rc('font', family='NanumGothicCoding')
# Fix the broken minus sign
plt.rc('axes', unicode_minus=False)
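If the font named in `family` is not installed, Matplotlib falls back to its default font and the Korean text stays broken. A minimal sketch of guarding against that follows; `pick_font` is a hypothetical helper and the candidate font names are assumptions, so adjust them to whatever the lookup above shows on your system.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import matplotlib.font_manager as fm
import matplotlib.pyplot as plt

def pick_font(candidates):
    """Return the first candidate font name that Matplotlib has registered, else None."""
    available = {f.name for f in fm.fontManager.ttflist}
    for name in candidates:
        if name in available:
            return name
    return None

# Candidate names are assumptions; replace them with fonts installed on your machine.
font = pick_font(["NanumGothicCoding", "NanumGothic", "Malgun Gothic", "AppleGothic"])
if font is not None:
    plt.rc("font", family=font)
plt.rc("axes", unicode_minus=False)
print(font)  # None means no candidate was found and the default font is still active
```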

 


I plan to keep updating this post as I find good material

 

HIRA OAK Repository: Statistical methods for outlier detection

repository.hira.or.kr

 


Seaborn : statistical data visualization

seaborn.boxplot - seaborn 0.11.2 documentation

seaborn.pydata.org

Matplotlib : visualization with Python

Matplotlib: Python plotting - Matplotlib 3.4.3 documentation

matplotlib.org

Pandas : data manipulation and analysis

pandas documentation - pandas 1.3.3 documentation

pandas.pydata.org

 


Description

- Known as a 'box-and-whisker plot' (box-and-whisker diagram), box plot, or box chart

- A chart that efficiently visualizes the maximum, minimum, median, and quartiles of the data

- Well suited to examining the distribution of numeric data grouped by a categorical variable

- Gives a rough picture of the center, spread, and shape of the data

- Used to check whether outliers are present

- Effective for comparing center and spread between groups

- Makes the five-number summary, the mean of the data, and the interquartile range easy to read

- Assessing the skewness of the distribution → compare the mean and the median

- Graph

  1. 25th percentile, lower quartile, first quartile (Q1)
    ☞ The 25% position
    ☞ Q1 = the value at position (n+1) * (1 / 4) in the sorted data
  2. Median, second quartile (Q2)
    ☞ The 50% position
    ☞ With an even number of values there are two middle values, and the median is their mean
    ☞ With an odd number of values there is a single middle value
  3. 75th percentile, upper quartile, third quartile (Q3)
    ☞ The 75% position
    ☞ Q3 = the value at position (n+1) * (3 / 4) in the sorted data
  4. Box, IQR (interquartile range)
    ☞ The box encloses the values from 25% (Q1) to 75% (Q3)
    ☞ IQR = Q3 - Q1
  5. Whisker
    ☞ Runs from each edge of the box (Q1, Q3) to the farthest data point within 1.5 times the IQR
    ☞ The lines extending out of the box (up and down, or left and right)
    ☞ Upper limit : Q3 + (1.5 * IQR)
    ☞ Lower limit : Q1 - (1.5 * IQR)
  6. Outlier, extreme value
    ☞ Data lying beyond the whiskers
    ☞ Outlier : data beyond the upper or lower whisker limit
                   (more than 1.5 times the box length away from the box)
    ☞ Extreme value : more than 3 times the box length away from the box
                   data above Q3 + (3.0 * IQR)
                   data below Q1 - (3.0 * IQR)
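The quantities in the list above can be computed directly with the standard library. This is a minimal sketch on made-up data; `box_stats` is a hypothetical helper name. `statistics.quantiles` with `method="exclusive"` places each quartile at position (n+1) * p, which matches the formulas above. Plotting libraries may use a different quartile convention, so their box edges can differ slightly.

```python
from statistics import quantiles

def box_stats(values):
    """Quartiles, IQR, whisker ends, outliers and extreme values of a sample."""
    data = sorted(values)
    # method="exclusive" places the p-th quantile at position (n + 1) * p
    q1, q2, q3 = quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr          # whisker limits
    ext_lower, ext_upper = q1 - 3.0 * iqr, q3 + 3.0 * iqr  # extreme-value limits
    inside = [x for x in data if lower <= x <= upper]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        # Whiskers end at the farthest data points still inside the limits.
        "whiskers": (min(inside), max(inside)),
        "outliers": [x for x in data if x < lower or x > upper],
        "extremes": [x for x in data if x < ext_lower or x > ext_upper],
    }

# Made-up sample with n = 11, so Q1, Q2, Q3 are the 3rd, 6th and 9th values.
stats = box_stats([2, 4, 4, 5, 6, 7, 8, 9, 10, 20, 30])
print(stats)
# Q1 = 4, median = 7, Q3 = 10, IQR = 6, whiskers (2, 10),
# outliers [20, 30], extremes [30]
```

Every extreme value is also an outlier; the 3.0 * IQR limit only marks the more severe cases.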

Python code

Done in Jupyter Notebook; more details will be added later

Vertical

Horizontal

Multiple
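A minimal sketch of the three layouts named above (vertical, horizontal, multiple), assuming synthetic data with one planted outlier per group. It is an illustration under those assumptions, not the code originally used for this post.

```python
import random
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import matplotlib.pyplot as plt

rng = random.Random(0)
# Synthetic samples (an assumption for illustration), each with one planted outlier.
groups = [[rng.gauss(mu, 1.0) for _ in range(50)] + [mu + 8.0] for mu in (0, 2, 4)]

fig, (ax_v, ax_h, ax_m) = plt.subplots(1, 3, figsize=(12, 4))

# Vertical (the default orientation)
vertical = ax_v.boxplot(groups[0])
ax_v.set_title("Vertical")

# Horizontal; newer Matplotlib releases prefer orientation="horizontal" over vert=False
horizontal = ax_h.boxplot(groups[0], vert=False)
ax_h.set_title("Horizontal")

# Multiple: passing a list of samples draws one box per sample
multiple = ax_m.boxplot(groups)
ax_m.set_xticklabels(["A", "B", "C"])
ax_m.set_title("Multiple")

fig.tight_layout()
# In a notebook the figure renders below the cell; elsewhere save it with fig.savefig(...).
```

`boxplot` returns a dictionary of the drawn artists (`boxes`, `medians`, `whiskers`, `caps`, `fliers`), which is handy for restyling individual parts or reading back the outlier points.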

 

Using features



 
