Section 10: Using Hidden Layers to Solve Nonlinear Problems (the XOR Problem)

Multi-layer neural networks are easy to understand: we simply add more layers between the input and the output, and each layer can contain multiple neurons. The following example classifies the XOR data by adding a hidden layer.

1 The XOR data set

The so-called "XOR data" comes from the XOR operation. Plotted in a Cartesian coordinate system, it looks like the following figure:
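(The figure can be reproduced with a short plotting sketch; it assumes matplotlib is available and simply draws the four points of the XOR truth table with one marker style per class.)

import numpy as np
import matplotlib.pyplot as plt

# the four XOR points and their class labels (0 or 1)
xor_x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
xor_c = np.array([0, 1, 1, 0])

# class 0 as circles, class 1 as crosses
plt.scatter(xor_x[xor_c == 0, 0], xor_x[xor_c == 0, 1], marker='o', label='class 0')
plt.scatter(xor_x[xor_c == 1, 0], xor_x[xor_c == 1, 1], marker='x', label='class 1')
plt.xlabel('input1')
plt.ylabel('input2')
plt.legend()
plt.show()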

We can see that the two classes cannot be separated by a single straight line. However, much like the kernel function in a support vector machine, we can map the data into a higher-dimensional space with some function and then classify it with a linear classifier. The hidden layer added between the input layer and the output layer plays exactly this kernel-function role.

The code to generate the data set is as follows:

import tensorflow as tf
import numpy as np

'''
Generate the simulated data set
'''
train_x = np.array([[0,0],[0,1],[1,0],[1,1]], dtype=np.float32)

# non-one-hot encoding
#train_y = np.array([[0],[1],[1],[0]], dtype=np.float32)
# number of output layer nodes
#n_label = 1


# one-hot encoding
train_y = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=np.float32)
# number of output layer nodes
n_label = 2
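As a side note, the one-hot rows above can also be generated from the scalar XOR outputs; a minimal sketch using np.eye (just one convenient way to do it):

labels = np.array([0, 1, 1, 0])                  # scalar XOR outputs
one_hot = np.eye(2, dtype=np.float32)[labels]    # row i of the identity matrix is the one-hot vector for class i
print(one_hot)
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]]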

2 Define parameters

'''
define variables
'''
#learning rate
learning_rate = 1e-4
#The number of input layer nodes
n_input = 2
#Number of hidden layer nodes
n_hidden = 2


input_x = tf.placeholder(tf.float32,[None,n_input])
input_y = tf.placeholder(tf.float32,[None,n_label])

'''
Define learning parameters

h1 represents the hidden layer
h2 represents the output layer
'''
weights = {
        'h1':tf.Variable(tf.truncated_normal(shape=[n_input,n_hidden],stddev=0.01)), # standard deviation 0.01
        'h2':tf.Variable(tf.truncated_normal(shape=[n_hidden,n_label],stddev=0.01))
        }


biases = {
        'h1':tf.Variable(tf.zeros([n_hidden])),
        'h2':tf.Variable(tf.zeros([n_label]))
        }

3 Define network structure

'''
Define network model
'''
# hidden layer
layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x,weights['h1']),biases['h1']))


#1 softmax method: the output layer produces raw logits; softmax_cross_entropy_with_logits
#  applies the softmax itself, so the unscaled logits are passed to the loss
logits = tf.add(tf.matmul(layer_1, weights['h2']), biases['h2'])
y_pred = tf.nn.softmax(logits)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=input_y, logits=logits))


#2 tanh method + squared-error loss (output layer)
#y_pred = tf.nn.tanh(tf.add(tf.matmul(layer_1,weights['h2']),biases['h2']))
# quadratic (squared-error) cost function
#loss = tf.reduce_mean((y_pred - input_y)**2)



train = tf.train.AdamOptimizer(learning_rate).minimize(loss)
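As a sanity check on what the softmax cross-entropy loss computes, here is a small numpy sketch of the same formula for a single made-up sample (the logit values are assumptions, chosen only for illustration):

# softmax followed by cross-entropy against the one-hot label
logits_np = np.array([2.0, -1.0])    # example logits for one sample
label_np = np.array([1.0, 0.0])      # one-hot target

probs = np.exp(logits_np) / np.sum(np.exp(logits_np))   # softmax
xent = -np.sum(label_np * np.log(probs))                # cross-entropy
print(probs, xent)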

4 Start training

'''
Start training
'''
training_epochs = 100000
sess = tf.InteractiveSession()

#initialization
sess.run(tf.global_variables_initializer())

for epoch in range(training_epochs):
    _,lo = sess.run([train,loss],feed_dict={input_x:train_x,input_y:train_y})
    if epoch % 10000 == 0:
        print(lo)
    
# Calculate the predicted value
print(sess.run(y_pred,feed_dict={input_x:train_x}))


# View the output of the hidden layer
print(sess.run(layer_1,feed_dict={input_x:train_x}))

Running the code prints two arrays.

The first array has 4 rows and 2 columns. If we round its values, we find that it matches the train_y labels exactly.
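Instead of rounding by eye, the comparison can also be done with argmax; a small sketch that reuses the sess, y_pred, train_x and train_y defined above:

# predicted class = index of the largest output per row; the same for the one-hot labels
pred = sess.run(y_pred, feed_dict={input_x: train_x})
pred_class = np.argmax(pred, axis=1)
true_class = np.argmax(train_y, axis=1)
print(pred_class, true_class)
print('accuracy:', np.mean(pred_class == true_class))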

The second array also has 4 rows and 2 columns: it is the hidden-layer output for the four inputs. These four points are linearly separable, which means the hidden layer maps the input data from the input space into a feature space in which the originally nonlinear problem becomes an ordinary linear classification problem.
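To see why a hidden layer can do this, consider a hand-constructed set of weights (a sketch for illustration only, not the weights the network actually learns):

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
W1 = np.array([[1, 1], [1, 1]], dtype=np.float32)   # hand-picked weights, for illustration only
b1 = np.array([0, -1], dtype=np.float32)

hidden = np.maximum(0, x.dot(W1) + b1)   # ReLU hidden layer
print(hidden)
# [[0. 0.]
#  [1. 0.]
#  [1. 0.]
#  [2. 1.]]

# In this feature space the line h1 - 2*h2 = 0.5 separates the two classes
score = hidden.dot(np.array([1, -2], dtype=np.float32))
print(score > 0.5)   # [False  True  True False] -> matches the XOR outputs 0, 1, 1, 0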

If your run does not classify the data correctly, you may need to adjust the learning rate and the number of training epochs, because training can get stuck in a local optimum.
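One simple remedy is to re-initialise the variables and train again; the sketch below reuses the session and ops defined above (running the initializer again re-samples the truncated-normal weights, so each restart begins from a different random starting point):

# re-initialise and train once more if the previous run got stuck
sess.run(tf.global_variables_initializer())
for epoch in range(training_epochs):
    _, lo = sess.run([train, loss], feed_dict={input_x: train_x, input_y: train_y})
print(sess.run(y_pred, feed_dict={input_x: train_x}))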

Full code:

# -*- coding: utf-8 -*-
"""
Created on Wed Apr 25 20:41:45 2018

@author: zy
"""

'''
Nonlinear classification problem: a problem that cannot be separated by a straight line
We can solve non-linear problems using hidden layers
'''

'''
Fitting an XOR operation using a neural network with hidden layers
input1  input2  output
   0       0       0
   0       1       1
   1       0       1
   1       1       0

'''

import tensorflow as tf
import numpy as np


'''
Generate a simulated dataset
'''
train_x = np.array([[0,0],[0,1],[1,0],[1,1]],dtype=np.float32)
# non-one-hot encoding
#train_y = np.array([[0],[1],[1],[0]],dtype = np.float32)
#Number of output layer nodes
#n_label = 1


# one-hot encoding
train_y = np.array([[1, 0], [0, 1], [0, 1], [1, 0]],dtype = np.float32)
#Number of output layer nodes
n_label = 2


'''
define variables
'''
# learning rate
learning_rate = 1e-4
#The number of input layer nodes
n_input = 2
#Number of hidden layer nodes
n_hidden = 2


input_x = tf.placeholder(tf.float32,[None,n_input])
input_y = tf.placeholder(tf.float32,[None,n_label])

'''
Define learning parameters

h1 represents the hidden layer
h2 represents the output layer
'''
weights = {
        'h1':tf.Variable(tf.truncated_normal(shape=[n_input,n_hidden],stddev=0.01)), # standard deviation 0.01
        'h2':tf.Variable(tf.truncated_normal(shape=[n_hidden,n_label],stddev=0.01))
        }


biases = {
        'h1':tf.Variable(tf.zeros([n_hidden])),
        'h2':tf.Variable(tf.zeros([n_label]))
        }


'''
Define network model
'''
# hidden layer
layer_1 = tf.nn.relu(tf.add(tf.matmul(input_x,weights['h1']),biases['h1']))


#1 softmax method: the output layer produces raw logits; softmax_cross_entropy_with_logits
#  applies the softmax itself, so the unscaled logits are passed to the loss
logits = tf.add(tf.matmul(layer_1, weights['h2']), biases['h2'])
y_pred = tf.nn.softmax(logits)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=input_y, logits=logits))


#2 tanh method + squared-error loss (output layer)
#y_pred = tf.nn.tanh(tf.add(tf.matmul(layer_1,weights['h2']),biases['h2']))
# quadratic (squared-error) cost function
#loss = tf.reduce_mean((y_pred - input_y)**2)



train = tf.train.AdamOptimizer(learning_rate).minimize(loss)


'''
Start training
'''
training_epochs = 100000
sess = tf.InteractiveSession()

#initialization
sess.run(tf.global_variables_initializer())

for epoch in range(training_epochs):
    _,lo = sess.run([train,loss],feed_dict={input_x:train_x,input_y:train_y})
    if epoch % 10000 == 0:
        print(lo)
    
# Calculate the predicted value
print(sess.run(y_pred,feed_dict={input_x:train_x}))


#View the output of the hidden layer
print(sess.run(layer_1,feed_dict={input_x:train_x}))
