01.26 softmax (probabilities), inference (batch processing), loss functions

[0] Neural Networks
p90
1. Designing the output layer
1) softmax

(1) y_k = exp(a_k) / sum(exp(a_i))

import numpy as np

A = np.array([1.5, 2, 2.2])         # an array whose entries differ only slightly
exp_A = np.exp(A)                   # e raised to each element
print(exp_A)
sum_exp_A = np.sum(exp_A)           # sum of all the exponentials
Y = exp_A / sum_exp_A               # divide each exponential by the total
print(Y)                            # the three values sum to 1
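Running this, the exponentials and the resulting probabilities come out roughly as follows; even the small input differences already separate the probabilities noticeably:

[4.4817 7.3891 9.0250]
[0.2145 0.3536 0.4319]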

This is written into globalFunc as def softmax.

(2) Fixing the overflow problem

We learned, using the log function, that e^a / sum(e^a) == e^(a - C) / sum(e^(a - C)) for any constant C.
So, to keep the numbers from getting too big in the first place, subtract the max value of the input array inside the exponential; that prevents overflow.

def softmax(A):
    MaxVal = np.max(A)                  # maximum of the input array
    exp_A = np.exp(A - MaxVal)          # subtract that max before exponentiating
    sum_exp_A = np.sum(exp_A)           # sum of all the exponentials
    Y = exp_A / sum_exp_A               # divide each exponential by the total
    return Y                            # the outputs sum to 1
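A quick check with large inputs (the book's [1010, 1000, 990] example) shows why subtracting the max matters: the naive formula overflows while the stable version does not.

a = np.array([1010, 1000, 990])
print(np.exp(a) / np.sum(np.exp(a)))    # naive formula: overflow warning, prints [nan nan nan]
print(softmax(a))                       # stable version: [9.99954600e-01 4.53978686e-05 2.06106005e-09]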

(3) Properties
The softmax function is used when training a neural network.
Because exp is monotonically increasing, a < b implies f(a) < f(b), so softmax never changes the order of its inputs. When you only want the classification result, softmax can be omitted, as the check below shows.
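A minimal check of that property (arbitrary values): argmax of the raw scores already equals argmax of the softmax output.

a = np.array([0.3, 2.9, 4.0])
print(np.argmax(a))             # 2
print(np.argmax(softmax(a)))    # 2, the same index, so softmax adds nothing when only the class matters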

2) MNIST
training -> inference
Grayscale images with pixel values 0~255; the network weights are already trained.


(1) Dataset

from mnist import load_mnist

(x_train, t_train), (x_test, t_test) = load_mnist(flatten=True, normalize=False)

print(x_train.shape)
print(t_train.shape)
print(x_test.shape)
print(t_test.shape)
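With flatten=True each 28x28 image is unrolled into a 784-element vector, and the standard MNIST split has 60000 training and 10000 test images, so this prints:

(60000, 784)
(60000,)
(10000, 784)
(10000,)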



(2) Inference

from mnist import load_mnist
import pickle
import numpy as np
import globalFunc as gf

def get_data():                                                             # the dataset
    (x_train, t_train), (x_test, t_test) = load_mnist(flatten=True, normalize=False, one_hot_label=False)
    return x_test, t_test                                                   # we reuse already-trained weights, so no training data is needed here; x is what the computer sees, t is the human-given label

def init_network():                                                         # open the existing pickle file and load the network from it
    with open("sample_weight.pkl", 'rb') as f:
        network = pickle.load(f)
        return network                                                      # the network read from this file

def predict(net, x):                                                        # takes the network loaded above and an image x
    W1, W2, W3 = net['W1'], net['W2'], net['W3']                            # give the network's weights w1, w2, w3 readable names
    b1, b2, b3 = net['b1'], net['b2'], net['b3']                            # same for the bias values
    H1 = np.dot(x, W1) + b1                                                 # hidden layer = input times weights, plus bias
    Z1 = gf.sigmoid(H1)                                                     # sigmoid squashes into the 0~1 range
    H2 = np.dot(Z1, W2) + b2                                                # next hidden layer = previous activations times weights, plus bias
    Z2 = gf.sigmoid(H2)                                                     # sigmoid squashes into the 0~1 range
    H3 = np.dot(Z2, W3) + b3
    Z3 = gf.softmax(H3)                                                     # softmax at the end for inference: the probabilities sum to 1

    return Z3
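As a quick single-image sanity check before batching (a minimal sketch, not part of the original notes):

x, t = get_data()
network = init_network()
y = predict(network, x[0])      # probabilities over the 10 digit classes
print(np.argmax(y), t[0])       # predicted digit vs. the true label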


(3) Inference -> batch processing

So what is argmax?
>>> import numpy as np
>>> x = np.array([[0.1,0.0,0.1],[0.3,0.1,0.6],[0.2,0.5,0.3],[0.8,0.1,0.1]])
>>> print(np.argmax(x, axis=1)) # for each row, where is the largest value?
[0 2 1 0]
>>> print(np.argmax(x, axis=0)) # for each column, at which row is the largest value?
[3 2 1]


x, t = get_data()
network = init_network()
accuracy_cnt = 0                                        # nothing counted yet
batch_size = 100                                        # limited by memory size; on a good machine you could run 1000 or even 10000 at a time
for i in range(0, len(x), batch_size):
    x_batch = x[i : i + batch_size]                     # slice out a batch of images
    y_batch = predict(network, x_batch)                 # run the network loaded above on the whole batch at once
    p_batch = np.argmax(y_batch, axis=1)                # for each row, which position holds the largest probability?
    accuracy_cnt += np.sum(p_batch == t[i : i + batch_size])    # compare with the supervised labels t; True counts as 1, False as 0, so summing accumulates the hits

print("Accuracy : " + str(float(accuracy_cnt) / len(x)))
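The counting trick works because == between arrays returns a boolean array, and True sums as 1:

>>> np.sum(np.array([2, 7, 1]) == np.array([2, 3, 1]))
2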

2. Loss functions

1) Why split the data into training and test sets

So that the model generalizes: it should work on data sets other than the one it was trained on.

2) Sum of squares error vs. cross entropy error
 (1) Sum of squares error
Square each error, add them all up, and divide by 2: E = 0.5 * sum((y_k - t_k)^2)

def sum_of_squares_error(y, t):         # loss function; the gradient follows the raw loss value, changing like the straight-ish stretch of a cycloid curve
    return 0.5 * np.sum((y - t)**2)

 (2) Cross entropy error

def cross_entropy_error(y, t):              # loss function; the gradient follows the loss value, changing like the (optimal) cycloid curve itself
                                            # estimates for the wrong classes are ignored (t is one-hot, so every other position is 0 and the product vanishes)
    delta = 1e-7                            # a tiny floor so log never sees 0 and returns -inf; not Euler's e, just literal notation for 1 * 10^-7
    return -np.sum(t * np.log(y + delta))   # log of a probability is negative, so negate to get a positive loss
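Why delta matters, in a tiny REPL check: without it, a predicted probability of exactly 0 would send the loss to infinity.

>>> np.log(0)                   # RuntimeWarning: divide by zero
-inf
>>> np.log(0 + 1e-7)
-16.11809565095832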

 (3) Comparing sensitivity
import numpy as np
import globalFunc as gf
t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])                                # target label is 2 (one-hot)
correct = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])    # predicts 2 with probability 0.6
incorrect = np.array([0.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.0, 0.6, 0.0, 0.0])  # predicts 2 with probability only 0.1

print("sum of squares error correct : " + str(gf.sum_of_squares_error(correct, t)))    # the results are floats, so convert to str for printing
print("cross entropy error correct : " + str(gf.cross_entropy_error(correct, t)))
print("sum of squares error incorrect : " + str(gf.sum_of_squares_error(incorrect, t)))
print("cross entropy error incorrect : " + str(gf.cross_entropy_error(incorrect, t)))

# Even with the same inputs, the results differ by more than a factor of 4~5.
# Compared with the sum of squares error, which just squares and adds up the errors, the cross entropy error wraps them in a log and steepens the gradient, so it reacts much more sensitively.

3) Mini-batch learning

Let's use batches to make things faster.
The function from above is modified slightly, shown next.

def cross_entropy_error(y, t):              # loss function; the gradient follows the loss value, changing like the (optimal) cycloid curve
                                            # estimates for the wrong classes are ignored (with one-hot t everything else is 0, so the products vanish)
    delta = 1e-7                            # floor to avoid log(0) = -inf; not Euler's e, literal notation for 1 * 10^-7
    if y.ndim == 1:                         # handling a single sample (1-D)?
        t = t.reshape(1, t.size)            # then reshape to 2-D so we can index it like a matrix
        y = y.reshape(1, y.size)
    batch_size = y.shape[0]
    if t.size != y.size:                    # t holds plain class indices, one label per sample
        return -np.sum(np.log(y[np.arange(batch_size), t] + delta)) / batch_size
    else:                                   # t is one-hot
        return -np.sum(t * np.log(y + delta)) / batch_size     # much simpler
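A small check (reusing the values from the sensitivity comparison above) that the two branches agree whether t is one-hot or a plain label index:

y = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])
print(cross_entropy_error(y, np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])))    # one-hot branch: ~0.5108
print(cross_entropy_error(y, np.array([2])))                               # label-index branch: same value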

Debug F11 -> step into a function and see which function actually runs.
Debug F10 -> step over; watch how the code proceeds line by line.




print("Accuracy : "+str(float(accuracy_cnt)/len(x)))
