[Preprocessing] Analyzing the Relationship Between CCTV Counts and Population by Seoul District


0ㅑ채

2024. 3. 20. 14:22

1. Getting the CCTV status data for each Seoul district

- http://data.seoul.go.kr/


- Search for "CCTV"

- Download the xlsx file (rename the file to cctv)

# Reading the data

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from pandas import Series, DataFrame
import platform
from matplotlib import font_manager, rc

# Load the data
cctv = pd.read_excel('./data/cctv.xlsx')
print(cctv.head())
cctv.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25 entries, 0 to 24
Data columns (total 10 columns):
#   Column    Non-Null Count  Dtype
---  ------    --------------  -----
0   기관명       25 non-null     object
1   소계        25 non-null     int64
2   2011년 이전  23 non-null     float64
3   2012년     23 non-null     float64
4   2013년     23 non-null     float64
5   2014년     25 non-null     int64
6   2015년     25 non-null     int64
7   2016년     25 non-null     int64
8   2017년     25 non-null     int64
9   2018년     25 non-null     int64
dtypes: float64(3), int64(6), object(1)
memory usage: 2.1+ KB

2. Getting the population status data for each Seoul district

- http://data.seoul.go.kr/

- Search for population by district (자치구별 인구)

- pop.txt

# Reading the data

pop = pd.read_csv('./data/pop.txt',  encoding='utf-8', skiprows=2, delimiter='\t', thousands=',')
print(pop.head())
print()
pop.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 14 columns):
#   Column    Non-Null Count  Dtype
---  ------    --------------  -----
0   기간        26 non-null     object
1   자치구       26 non-null     object
2   세대        26 non-null     int64
3   계         26 non-null     int64
4   남자        26 non-null     int64
5   여자        26 non-null     int64
6   계.1       26 non-null     int64
7   남자.1      26 non-null     int64
8   여자.1      26 non-null     int64
9   계.2       26 non-null     int64
10  남자.2      26 non-null     int64
11  여자.2      26 non-null     int64
12  세대당인구     26 non-null     float64
13  65세이상고령자  26 non-null     int64
dtypes: float64(1), int64(11), object(2)
memory usage: 3.0+ KB
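
The `read_csv` call above relies on three options: `skiprows` to skip the title lines, `delimiter='\t'` for tab-separated fields, and `thousands=','` so that values such as `162,913` are parsed as integers. The following is a minimal sketch of that behavior on an in-memory sample. The sample text and its numbers are made up for illustration and are not the real contents of pop.txt.

```python
import io
import pandas as pd

# Hypothetical miniature of a tab-separated file with two title lines
# before the header row (values are made up, not real statistics).
sample = (
    "title line 1\n"
    "title line 2\n"
    "기간\t자치구\t계\n"
    "2019\t합계\t10,049,607\n"
    "2019\t종로구\t162,913\n"
)

# skiprows=2 drops the two title lines; thousands=',' strips the separators
demo = pd.read_csv(io.StringIO(sample), skiprows=2,
                   delimiter='\t', thousands=',')

print(demo.dtypes)        # the '계' column is parsed as an integer column
print(demo['계'].tolist())
```

Without `thousands=','` the '계' column would be read as strings, and the later ratio calculations would fail.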

3. Data preprocessing

# Renaming columns

cctv.rename(columns={cctv.columns[0] : '구별'}, inplace=True)
print(cctv.head())
print()

# Remove spaces inside the district names so the merge keys match
cctv['구별'] = cctv['구별'].str.replace(' ', '', regex=False)

pop.rename(columns={pop.columns[1] : '구별'}, inplace=True)
print(pop.head())

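# Filtering

# Keep only the needed columns from pop
# ('계' = total, '남자' = male, '여자' = female);
# .copy() avoids pandas' SettingWithCopyWarning on the edits below
pop = pop[['기간', '구별', '계', '남자', '여자']].copy()

# The first row of pop is the citywide total, so remove it
pop = pop.drop(index=0)

# Create a new column holding the percentage of women
pop['여성비율'] = pop['여자']/pop['계']*100
pop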

# Merging

# Combine the two frames on the '구별' (district) column
df = pd.merge(cctv, pop, on='구별')
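
An inner merge silently drops any district whose name differs between the two frames, for example because of a stray space. One way to check for this is an outer merge with `indicator=True`. The sketch below uses two small made-up frames (`left` and `right` are hypothetical stand-ins for `cctv` and `pop`, and the numbers are not real data).

```python
import pandas as pd

# Made-up frames: '중 구' contains a space on the left side only
left = pd.DataFrame({'구별': ['강남구', '강동구', '중 구'],
                     '소계': [10, 20, 30]})
right = pd.DataFrame({'구별': ['강남구', '강동구', '중구'],
                      '계': [100, 200, 300]})

# how='outer' keeps every key; the '_merge' column records where it came from
check = pd.merge(left, right, on='구별', how='outer', indicator=True)

# Rows that did not find a partner in the other frame
unmatched = check[check['_merge'] != 'both']
print(unmatched[['구별', '_merge']])
```

If `unmatched` is empty for the real data, the inner merge used above loses no districts.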

# Removing unnecessary columns

df = df.drop(columns=['2011년 이전', '2012년', '2013년', '2014년',
                      '2015년', '2016년', '2017년', '기간'])

# Setting the index

df.set_index('구별', inplace=True)
df

4. Visualization

font_name = font_manager.FontProperties(fname="c:/Windows/Fonts/malgun.ttf").get_name()
rc('font', family=font_name)
df['소계'].plot(kind='barh', grid=True, figsize=(10,10))
plt.show()
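
The font path above exists only on Windows. The following is a minimal sketch of choosing a Korean-capable font by operating system instead. `pick_korean_font` is a hypothetical helper, not part of matplotlib, and the sketch assumes that Malgun Gothic (Windows), AppleGothic (macOS), or NanumGothic (elsewhere) is actually installed.

```python
import platform
from matplotlib import rc

def pick_korean_font(system):
    """Return a font family name for a platform.system() value.

    Hypothetical helper: the font names are assumptions about what is
    installed on each operating system.
    """
    fonts = {
        'Windows': 'Malgun Gothic',
        'Darwin': 'AppleGothic',   # platform.system() returns 'Darwin' on macOS
    }
    return fonts.get(system, 'NanumGothic')

rc('font', family=pick_korean_font(platform.system()))
# Keep minus signs on the axes from rendering as missing glyphs
rc('axes', unicode_minus=False)
```

If the chosen font is not installed, matplotlib falls back to its default font and the Korean labels will not render correctly.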

df['소계'].sort_values().plot(kind='barh', grid=True, figsize=(5,5))
plt.show()

df['cctv비율'] = df['소계']/df['계'] * 100
df['cctv비율'].sort_values().plot(kind='barh', grid=True, figsize=(5,5))
plt.show()

# Visualization - scatter plot

plt.figure(figsize=(6,6))
plt.scatter(df['계'], df['소계'], s=50)
plt.xlabel('인구수')
plt.ylabel('CCTV개수')
plt.grid()
plt.show()

# Visualization - finding the slope and y-intercept and drawing the line

fp1 = np.polyfit(df['계'], df['소계'], 1)
f1 = np.poly1d(fp1)
fx = np.linspace(100000, 700000, 100)

plt.figure(figsize=(5,5))
plt.scatter(df['계'], df['소계'], s=50)
plt.plot(fx, f1(fx), ls='dashed', lw=3, color='g')
plt.xlabel('인구수')
plt.ylabel('CCTV')
plt.grid()
plt.show()
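
`np.polyfit(x, y, 1)` returns the coefficients of the least-squares line with the highest power first, that is slope and then y-intercept, and `np.poly1d` turns them into a callable. The sketch below shows this on made-up points that lie exactly on y = 2x + 1. It also computes the correlation coefficient with `np.corrcoef`, which the original post does not do, as one way to judge how strongly the two variables move together.

```python
import numpy as np

# Made-up points lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1

# Degree-1 fit: coefficients come back as [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
line = np.poly1d([slope, intercept])

print(slope, intercept)   # approximately 2 and 1
print(line(10.0))         # the fitted line evaluated at x = 10

# Pearson correlation between x and y (1.0 means a perfect linear relation)
r = np.corrcoef(x, y)[0, 1]
print(r)
```

Applied to the real data, the same call would be `np.corrcoef(df['계'], df['소계'])[0, 1]`.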

# Showing the error

fp1 = np.polyfit(df['계'], df['소계'], 1)
f1 = np.poly1d(fp1)
fx = np.linspace(100000, 700000, 100)
# Absolute distance of each district from the fitted line
df['오차'] = np.abs(df['소계'] - f1(df['계']))
plt.figure(figsize=(14,10))
plt.scatter(df['계'], df['소계'], c=df['오차'], s=50)
plt.plot(fx, f1(fx), ls='dashed', lw=3, color='g')

# Label every district (range(24) would skip the last of the 25 rows)
for n in range(len(df)):
    plt.text(df['계'].iloc[n]*1.02, df['소계'].iloc[n]*0.98,
             df.index[n], fontsize=12)

plt.xlabel('인구수')
plt.ylabel('CCTV개수')  # the y-axis is the CCTV count, not a per-capita ratio
plt.colorbar()
plt.grid()
plt.show()
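
The colored scatter plot shows which districts sit far from the fitted line. Sorting by the error column gives the same information as a table. The sketch below uses a small made-up frame (`demo` is a hypothetical stand-in for `df`, with placeholder district names and invented numbers).

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the merged frame: '계' = population, '소계' = CCTV count
demo = pd.DataFrame(
    {'계': [100, 200, 300, 400], '소계': [12, 18, 45, 41]},
    index=['A', 'B', 'C', 'D'],
)

# Same steps as above: fit a line, then take the absolute residual
fit = np.poly1d(np.polyfit(demo['계'], demo['소계'], 1))
demo['오차'] = np.abs(demo['소계'] - fit(demo['계']))

# Districts that deviate most from the fitted line come first
top = demo.sort_values('오차', ascending=False).head(2)
print(top)
```

On the real data, `df.sort_values('오차', ascending=False).head()` would list the districts whose CCTV counts differ most from what the fitted line predicts for their population.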
