Using Random Forest Feature Importance

hyeori

Machine Learning

혜오리이 2024. 5. 8. 22:43

Ensemble methods - random forest

Train a RandomForestClassifier model
Read the importances from the feature_importances_ attribute

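Train a random forest with 500 trees on the Wine dataset.
Rank the 13 features by their respective importance.

* Tree-based models do not need standardized or normalized features: each split compares a single feature against a threshold, so only the order of the values matters.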
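To check the note about scaling, here is a minimal sketch that trains two forests with identical settings and seed, one on raw features and one on standardized features, and compares their test predictions. It assumes scikit-learn's bundled copy of the Wine data (`load_wine`) and an assumed 70/30 stratified split; `n_estimators=100` is chosen only to keep it fast.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Assumed setup: sklearn's Wine data with a 70/30 stratified split
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Standardize using statistics from the training split only
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Same hyperparameters and seed, so bootstrap samples and feature draws match
forest_raw = RandomForestClassifier(n_estimators=100, random_state=1)
forest_raw.fit(X_train, y_train)
forest_std = RandomForestClassifier(n_estimators=100, random_state=1)
forest_std.fit(X_train_std, y_train)

pred_raw = forest_raw.predict(X_test)
pred_std = forest_std.predict(X_test_std)

# Fraction of test samples on which both forests predict the same class
# (expected to be 1.0, up to floating-point edge cases near thresholds)
agreement = np.mean(pred_raw == pred_std)
print(f"prediction agreement: {agreement:.3f}")
print(f"accuracy raw: {forest_raw.score(X_test, y_test):.3f}, "
      f"standardized: {forest_std.score(X_test_std, y_test):.3f}")
```

Standardization is a monotonic per-feature transform, so the split thresholds simply move along with the data and every node partitions the samples the same way.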

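import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Assumption: df_wine, X_train and y_train come from earlier posts in this series.
# To make this snippet run on its own they are rebuilt here from scikit-learn's copy
# of the Wine data (column 0 = class label, columns 1-13 = features); split settings are assumed.
wine = load_wine()
df_wine = pd.DataFrame(wine.data, columns=wine.feature_names)
df_wine.insert(0, 'Class label', wine.target)
X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

feat_labels = df_wine.columns[1:]

# 500 trees; no feature scaling is needed for a tree-based model
forest = RandomForestClassifier(n_estimators=500,
                                random_state=1)

forest.fit(X_train, y_train)
importances = forest.feature_importances_

# Feature indices sorted by importance, highest first
indices = np.argsort(importances)[::-1]

# Print the ranking: position, feature name (padded to 30 chars), importance
for f in range(X_train.shape[1]):
    print("%2d) %-*s %f" % (f + 1, 30,
                            feat_labels[indices[f]],
                            importances[indices[f]]))

# Bar chart of the importances in ranked order
plt.title('Feature Importance')
plt.bar(range(X_train.shape[1]),
        importances[indices],
        align='center')

plt.xticks(range(X_train.shape[1]),
           feat_labels[indices], rotation=90)
plt.xlim([-1, X_train.shape[1]])
plt.tight_layout()
# plt.savefig('images/04_09.png', dpi=300)
plt.show()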

The resulting plot ranks the Wine dataset features by their relative importance; the feature importances are normalized so that they sum to 1.
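Why do the bars add up to 1? In scikit-learn, `feature_importances_` of a forest is the impurity-based importance (mean decrease in impurity): each tree's normalized importance vector is averaged over the trees and the result is renormalized. The sketch below recomputes that by hand; the data loading and split are assumptions, as in the previous snippets.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Assumed setup: sklearn's Wine data with a 70/30 stratified split
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

forest = RandomForestClassifier(n_estimators=500, random_state=1)
forest.fit(X_train, y_train)

# Impurity-based importances of each individual tree (each row sums to 1)
per_tree = np.array([tree.feature_importances_ for tree in forest.estimators_])

# Average over the trees, then renormalize so the vector sums to 1
manual = per_tree.mean(axis=0)
manual /= manual.sum()

print("sum of forest importances:", forest.feature_importances_.sum())
print("manual average matches:", np.allclose(manual, forest.feature_importances_))
```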

In a random forest, if two or more features are highly correlated, one of them may be ranked very highly while the information carried by the other feature(s) is not fully captured.
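A minimal sketch of this effect under an extreme, assumed setup: an exact copy of the top-ranked feature is appended, so the two columns are perfectly correlated. Because the trees can split on either copy, the credit the feature earned alone tends to be divided between the two columns, so neither column's importance reflects the information they carry together. The helper `fit_importances` is hypothetical, written only for this illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Assumed setup: sklearn's Wine data with a 70/30 stratified split
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)


def fit_importances(X_fit):
    """Hypothetical helper: fit a forest and return its impurity-based importances."""
    forest = RandomForestClassifier(n_estimators=500, random_state=1)
    forest.fit(X_fit, y_train)
    return forest.feature_importances_


base = fit_importances(X_train)
top = int(np.argmax(base))

# Append an exact copy of the most important feature (perfect correlation)
X_dup = np.hstack([X_train, X_train[:, [top]]])
dup = fit_importances(X_dup)

print(f"feature {top} alone:       {base[top]:.3f}")
print(f"feature {top} with a copy: {dup[top]:.3f}")
print(f"the copy (last column):   {dup[-1]:.3f}")
print(f"pair combined:            {dup[top] + dup[-1]:.3f}")
```

Real correlated features are rarely exact copies, so the split is usually uneven, which is how one feature can end up ranked high while its partner looks unimportant.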

SelectFromModel:

Selects features after the model has been trained, based on a user-specified threshold.
Useful when a RandomForestClassifier serves as the feature selector in an intermediate step of a Pipeline.
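Both uses can be sketched as follows. The threshold `0.1`, the `'median'` threshold, and the `StandardScaler` + `LogisticRegression` steps after the selector are assumed choices for illustration; the data loading and split are assumed as before. `prefit=True` tells `SelectFromModel` to use the already-trained forest instead of refitting it.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Assumed setup: sklearn's Wine data with a 70/30 stratified split
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# 1) Already-trained forest + fixed threshold: keep features with importance >= 0.1
forest = RandomForestClassifier(n_estimators=500, random_state=1)
forest.fit(X_train, y_train)

sfm = SelectFromModel(forest, threshold=0.1, prefit=True)
X_selected = sfm.transform(X_train)
print("features kept:", X_selected.shape[1])
print("kept column indices:", np.flatnonzero(sfm.get_support()))

# 2) Forest as the feature-selection step inside a Pipeline
#    (the selector's forest is fitted on the training data during pipe.fit)
pipe = Pipeline([
    ('select', SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=1),
        threshold='median')),
    ('scale', StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
print("pipeline test accuracy:", pipe.score(X_test, y_test))
```

Inside a Pipeline the selector is refitted on each training fold during cross-validation, so the feature choice never sees the validation data.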