Masking practice: the doctoral degree problem

SiLaure's Data

data_soin 2021. 7. 27. 16:13

import pandas as pd
# Load the other file as well.
data2 = pd.read_csv("../data/kaggle_survey_2020_responses.csv")
data2
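The Kaggle CSV is not included with this post, so here is a minimal sketch that reads a tiny made-up CSV from a string instead. It assumes, as is commonly reported for the Kaggle survey files, that the row right after the header holds the question text rather than an answer; passing skiprows=[1] to pd.read_csv drops that row so it does not appear among the values returned by unique(). The name SAMPLE_CSV and all of its rows are hypothetical.

```python
import io

import pandas as pd

# Hypothetical miniature of the survey file: a header row, one row of
# question text, then the actual answers (all values are made up).
SAMPLE_CSV = """Q3,Q4
Question text for Q3,Question text for Q4
South Korea,Doctoral degree
India,Master's degree
Republic of Korea,Bachelor's degree
United States of America,Doctoral degree
Republic of Korea,Doctoral degree
"""

# skiprows=[1] skips file line 1 (0-based), i.e. the question-text row,
# while line 0 is still used as the header.
survey = pd.read_csv(io.StringIO(SAMPLE_CSV), skiprows=[1])
print(survey)
```

Whether your copy of the real file needs skiprows=[1] is worth checking by looking at the first few rows with data2.head().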



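# Select only the respondents who hold a doctoral degree.
# Q4 is the education question; first check which answers it contains.
data2.Q4.unique()

# -- masking step
data2["Q4"] == "Doctoral degree"
data2[data2.Q4 == "Doctoral degree"]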
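To make the masking step concrete, the sketch below runs it on a small hypothetical stand-in for data2 (five invented respondents, only the Q3 and Q4 columns). It shows that the comparison produces a boolean Series sharing data2's index, and that indexing with it keeps the True rows under their original index labels.

```python
import pandas as pd

# Hypothetical stand-in for data2 (values made up for illustration).
data2 = pd.DataFrame({
    "Q3": ["South Korea", "India", "Republic of Korea",
           "United States of America", "Republic of Korea"],
    "Q4": ["Doctoral degree", "Master's degree", "Bachelor's degree",
           "Doctoral degree", "Doctoral degree"],
})

# The comparison is evaluated row by row and returns a boolean Series
# with the same index as data2.
mask = data2["Q4"] == "Doctoral degree"
print(mask.dtype)      # bool
print(mask.tolist())   # [True, False, False, True, True]

# Indexing the DataFrame with the mask keeps only the True rows,
# and the original index labels are preserved.
doctors = data2[mask]
print(doctors.index.tolist())  # [0, 3, 4]
```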



# Indexing the whole DataFrame with this mask extracts only the rows where the mask is True.
phd = data2["Q4"] == "Doctoral degree"
phd
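Once the mask is stored in a variable such as phd, it can be reused. This sketch, again on invented data, counts the matching rows with sum(), selects rows and a column at once with .loc, and flips the mask with ~ to get everyone else.

```python
import pandas as pd

# Hypothetical stand-in for data2 (made-up values).
data2 = pd.DataFrame({
    "Q3": ["South Korea", "India", "Republic of Korea",
           "United States of America", "Republic of Korea"],
    "Q4": ["Doctoral degree", "Master's degree", "Bachelor's degree",
           "Doctoral degree", "Doctoral degree"],
})

phd = data2["Q4"] == "Doctoral degree"

# True counts as 1 and False as 0, so the sum is the number of matching rows.
print(phd.sum())  # 3

# data2[phd] and data2.loc[phd] select the same rows; .loc also lets you
# pick a column at the same time.
phd_rows = data2[phd]
phd_countries = data2.loc[phd, "Q3"]

# ~ inverts the mask: everyone without a doctoral degree.
non_phd_rows = data2[~phd]
print(non_phd_rows)
```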



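# (OPTIONAL) Select respondents who hold a doctoral degree and whose country (Q3) is South Korea.
# Korea is matched under both "Republic of Korea" and "South Korea".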
# set(phd["Q3"]) 
# data2.Q3.unique() 
data2.Q3.isin(["Republic of Korea", "South Korea"]) 
data2_korea = data2.Q3.isin(["Republic of Korea", "South Korea"]) 
data2[data2_korea & phd] 


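# Earlier attempts, kept for reference. They fail because phd is a boolean
# Series (a mask), not a DataFrame, so it has no "Q3" column.
# phd["Q3"] == "Republic of Korea"
# phd["Q3"] == "South Korea"
# phd_korean = phd[phd["Q3"] == "Republic of Korea"]
# phd_korean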
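The working line data2[data2_korea & phd] and the idea behind the commented-out attempts can be compared side by side. This sketch, on made-up rows, builds the combined mask with &, writes the same filter inline (note the parentheses), and then does it in two steps by filtering to a DataFrame first, which is what the phd["Q3"] attempts need. All three return the same rows.

```python
import pandas as pd

# Hypothetical stand-in for data2 (made-up values).
data2 = pd.DataFrame({
    "Q3": ["South Korea", "India", "Republic of Korea",
           "United States of America", "Republic of Korea"],
    "Q4": ["Doctoral degree", "Master's degree", "Bachelor's degree",
           "Doctoral degree", "Doctoral degree"],
})
korea_names = ["Republic of Korea", "South Korea"]

phd = data2["Q4"] == "Doctoral degree"
data2_korea = data2["Q3"].isin(korea_names)

# & combines two boolean Series element by element. Python's `and` would
# raise an error here, because a Series has no single truth value.
korean_phd = data2[data2_korea & phd]

# When the comparison is written inline it needs parentheses,
# because & binds more tightly than ==.
korean_phd_inline = data2[
    (data2["Q4"] == "Doctoral degree") & data2["Q3"].isin(korea_names)
]

# Two-step version: filter to a DataFrame first, then filter that
# DataFrame on its own Q3 column.
phd_df = data2[phd]
korean_phd_two_step = phd_df[phd_df["Q3"].isin(korea_names)]

print(korean_phd)
```

The single combined mask is usually the tidier choice, since each condition stays a reusable named Series.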

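Masks are boolean arrays, and & combines them element by element on their True/False values, so each mask has to stay a boolean Series. Writing

phd = [data2["Q4"] == "Doctoral degree"]

does not work: the square brackets wrap the Series in a Python list, so phd is no longer a mask.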
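A quick way to see the problem with the bracketed version is to check the types. In this sketch on a hypothetical three-row frame, good is the boolean Series and bad is a one-element list that merely contains it.

```python
import pandas as pd

# Hypothetical stand-in for data2 (made-up values).
data2 = pd.DataFrame({
    "Q4": ["Doctoral degree", "Master's degree", "Doctoral degree"],
})

good = data2["Q4"] == "Doctoral degree"    # boolean Series
bad = [data2["Q4"] == "Doctoral degree"]   # Python list holding one Series

print(type(good).__name__)  # Series
print(type(bad).__name__)   # list
print(len(bad))             # 1, not the number of rows

# The list is only a container around the mask; unwrapping it
# gives back the usable boolean Series.
print(bad[0].equals(good))  # True
```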

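Likewise,
data2_korea = data2[data2.Q3.isin(["Republic of Korea", "South Korea"])]
does not work as a mask, because indexing returns the matching rows (a DataFrame) rather than True/False values. It has to be changed to
data2_korea = data2.Q3.isin(["Republic of Korea", "South Korea"])
so that data2_korea is a boolean Series.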
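The difference between the two data2_korea versions shows up in their types and shapes. This sketch uses a hypothetical three-row frame; the isin result, a boolean Series with one value per row, is the mask that gets combined with phd.

```python
import pandas as pd

# Hypothetical stand-in for data2 (made-up values).
data2 = pd.DataFrame({
    "Q3": ["South Korea", "India", "Republic of Korea"],
    "Q4": ["Doctoral degree", "Doctoral degree", "Master's degree"],
})
phd = data2["Q4"] == "Doctoral degree"
korea_list = ["Republic of Korea", "South Korea"]

# Indexing returns the matching rows: a DataFrame with all columns.
korea_rows = data2[data2.Q3.isin(korea_list)]
# isin alone returns one True/False per row: a mask as long as data2.
korea_mask = data2.Q3.isin(korea_list)

print(type(korea_rows).__name__, korea_rows.shape)  # DataFrame (2, 2)
print(type(korea_mask).__name__, korea_mask.dtype)  # Series bool

# The mask, not the filtered DataFrame, is what gets combined with phd.
print(data2[korea_mask & phd])
```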