Compare commits

...

10 Commits

Author SHA1 Message Date
bada43da23 Update README.md 2025-01-17 08:09:57 +01:00
cc552a63a8 update readme 2025-01-17 01:08:40 -06:00
90207ab834 edit readme 2025-01-17 00:26:17 -06:00
cc944d8cc3 push to git 2025-01-16 22:03:28 -06:00
40a2b072b8 adds backprop (unfinished) 2025-01-14 18:47:35 -06:00
f549f9440c added forward prop 2025-01-12 13:31:31 -06:00
8eab98586c refactor. simplify code - only gonna use mnist 2025-01-12 09:56:33 -06:00
576cb9c490 free layer 2025-01-11 16:04:23 -06:00
6521badd2d create layers functions 2025-01-11 15:56:14 -06:00
8c48894c42 create layer struct 2025-01-11 14:36:46 -06:00
6 changed files with 1123 additions and 79 deletions

README.md

@@ -1,15 +1,40 @@
# nn - Neural Networks in C
This repository implements various neural networks in C, focusing mainly on targeting embedded systems or creating hardware accelerators (FPGA-based, ASIC, etc.). \
This project was created as part of my independent study course, where I am currently researching the design of hardware accelerators for high-performance workloads.

### Current Implementations (project index)
`snn.c` - A simple feedforward neural network written in ~150 LOC. Depends on C native libraries and [GSL](https://www.gnu.org/software/gsl/) \
`cnn.c` - Implements a fully featured CNN library in ~600 LOC. Depends solely on C native libraries \
`cnn-hls.c` - The version of `cnn.c` with HLS-specific optimizations (pragmas, systolic array multiplication, etc.); aims to be synthesized through Vitis HLS to create an FPGA-based CNN accelerator \
`mnist.c` - Driver code for `cnn.c` which trains on the [MNIST](https://yann.lecun.com/exdb/mnist/) dataset

### Usage
`mnist.c` is a great example of how the library is used, but basic usage boils down to a few simple things:

1) Importing `cnn.c` into your code
2) Creating a network and creating layers:
```c
// an example of a lenet-5 inspired 8 layer network
Network* network = create_network(8);
network->layers[0] = create_input(IMG_HEIGHT, IMG_WIDTH, 1);
network->layers[1] = create_conv(IMG_HEIGHT, IMG_WIDTH, 1, 6, 5, 1, 2);
network->layers[2] = create_maxpool(network->layers[1]->height, network->layers[1]->width, network->layers[1]->channels, 2, 2);
network->layers[3] = create_conv(network->layers[2]->height, network->layers[2]->width, network->layers[2]->channels, 16, 5, 1, 0);
network->layers[4] = create_maxpool(network->layers[3]->height, network->layers[3]->width, network->layers[3]->channels, 2, 2);
network->layers[5] = create_fc(120, network->layers[4]->height * network->layers[4]->width * network->layers[4]->channels, a_sigmoid);
network->layers[6] = create_fc(84, 120, a_sigmoid);
network->layers[7] = create_fc(NUM_CLASSES, 84, a_softmax);
```
3) Forward and backpropagation through the network (see the sketch below)!
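
For step 3, a minimal training-step sketch might look like the following. It assumes `images`, `labels`, and `num_images` have already been loaded and normalized the way `mnist.c` does (those names are stand-ins for `train_images`, `train_labels`, and `num_train_images`), and that `IMG_HEIGHT`, `IMG_WIDTH`, `NUM_CLASSES`, and `LEARNING_RATE` are defined as in that driver:
```c
// one pass of SGD over the training set, mirroring the loop in mnist.c
for (int i = 0; i < num_images; i++) {
    // forward pass: each layer's output feeds the next layer
    network_forward(network, &images[i * IMG_HEIGHT * IMG_WIDTH]);

    // backward pass: computes deltas and applies the weight updates in place
    network_backward(network, &labels[i * NUM_CLASSES], LEARNING_RATE);
}

// class probabilities live in the output of the final (softmax) layer
float* probs = network->layers[network->num_layers - 1]->output;
```
Once training is done, call `destroy_network(network)` to release all layer buffers.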
## Project Information
### Abstract
For my project, I propose an implementation of a Convolutional Neural Network based handwritten digit classifier using the MNIST dataset on a Field Programmable Gate Array (FPGA). I utilize a High Level Synthesis (HLS) tool called Vitis HLS, developed by [AMD/Xilinx](https://www.xilinx.com/products/boards-and-kits.html), to implement the accelerator in C, eliminating the need to write any code in HDL languages such as Verilog/VHDL. To reduce any performance losses, I implement a systolic array based architecture and utilize techniques such as pipelining, loop unrolling, and memory partitioning. Through this project, I aim to highlight the feasibility and viability of FPGAs for low latency, highly energy efficient machine learning workloads, possibly placing them in consideration as a replacement for GPUs for inference-based tasks.
### What is an FPGA?
A Field Programmable Gate Array, or FPGA, is a type of integrated circuit made up of a massive collection of unconnected digital logic blocks. When someone designs *gateware* for an FPGA, they are essentially connecting these logic blocks together in a way that creates a new piece of hardware. FPGAs are also "field programmable," meaning that they can be reconfigured on the fly as the designer's needs change. While often used as tools for rapidly prototyping hardware designs, the highly specialized and customizable nature of an FPGA design also allows it to achieve very low latency, high throughput, and very high energy efficiency.
#### What is High Level Synthesis (HLS)?
High Level Synthesis is a method of designing gateware that allows a programmer to describe the hardware in a higher-level language like C, C++, or even [Python](https://fastmachinelearning.org/hls4ml/). A High Level Synthesis tool takes this description of the hardware's intended function and synthesizes it into RTL code (such as Verilog or VHDL). Since writing in languages such as Verilog can be tedious and time consuming, HLS serves as an alternative for designers who want to efficiently build and verify hardware in a language that is much easier to write, and it also invites programmers with no HDL experience to start developing hardware. In this project, I chose HLS not only to work within my time constraint, but also to evaluate how approachable the HLS workflow truly is for an individual with little to no experience in HDL languages. A rough sketch of what HLS C code looks like is shown below.
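As an illustration only (this snippet is not part of the repository, and `dot_product` is a hypothetical function), a small fixed-point dot product written for HLS might look like the following; the pragmas are hints that tell the tool how to partition the arrays and pipeline/unroll the loop when it generates RTL:
```c
// illustrative HLS-style C: a fixed-point dot product with common Vitis HLS pragmas
#include "ap_fixed.h"

typedef ap_fixed<16,8> data_t;   // 16-bit fixed point, 8 integer bits
typedef ap_fixed<32,16> acc_t;   // wider accumulator to avoid overflow

acc_t dot_product(const data_t a[128], const data_t b[128]) {
#pragma HLS ARRAY_PARTITION variable=a cyclic factor=8
#pragma HLS ARRAY_PARTITION variable=b cyclic factor=8
    acc_t sum = 0;
    DOT: for (int i = 0; i < 128; i++) {
#pragma HLS PIPELINE II=1
#pragma HLS UNROLL factor=8
        sum += a[i] * b[i];  // after unrolling, 8 multiply-accumulates issue per iteration
    }
    return sum;
}
```
`cnn-hls.c` below applies the same kinds of pragmas (plus a systolic-array formulation) to the full network.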
### Reflection and Next Steps
This project was an amazing way to get involved with both the FPGA and hardware/accelerator design space. I was able to gain a lot of hands-on experience with the workflow for developing gateware for an FPGA, and I also gained insight into performance optimization concepts and methods such as systolic arrays, loop pipelining/unrolling, and code inlining. Furthermore, I was able to work more with the mathematics and theory behind Deep Learning and Neural Networks, which is valuable knowledge to have given the rapid development of artificial intelligence. The next steps of this project include cleaning up and optimizing the code, and possibly implementing quantization, batch normalization, and other types of layers such as residual blocks to further improve the network's performance. On the hardware side, next steps include obtaining a physical FPGA development board to actually deploy this design onto, and possibly rewriting the code without HLS, building the neural network from scratch in an HDL language such as Verilog.

cnn-hls.c (new file, 310 lines)

@@ -0,0 +1,310 @@
#include "ap_fixed.h"
#include "hls_stream.h"
#include "hls_math.h"
#include <string.h>
// Fixed point definitions for better hardware efficiency
typedef ap_fixed<16,8> data_t; // 16 bits total, 8 integer bits
typedef ap_fixed<16,8> weight_t;
typedef ap_fixed<32,16> acc_t; // Wider accumulator to prevent overflow
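// note: ap_fixed<W,I> is a signed fixed-point type with W total bits and I integer bits (sign included),
// so data_t covers roughly [-128, 127.996] in steps of 1/256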
// Enums remain the same
typedef enum {
input,
conv,
max_pool,
fully_connected
} ltype;
typedef enum {
fc_input,
fc_hidden,
fc_output,
} fcpos;
typedef enum {
a_sigmoid,
a_softmax,
} activation;
// Maximum size definitions for static arrays
#define MAX_LAYER_SIZE 1024
#define MAX_FILTER_SIZE 11
#define MAX_CHANNELS 256
#define MAX_FILTERS 256
// Layer struct optimized for HLS
struct Layer {
ltype type;
int height;
int width;
int channels;
union {
struct {
int num_filters;
int filter_size;
int stride;
int zero_padding;
int input_height;
int input_width;
int input_channels;
weight_t weights[MAX_FILTERS][MAX_CHANNELS][MAX_FILTER_SIZE][MAX_FILTER_SIZE];
data_t biases[MAX_FILTERS];
} conv_params;
struct {
int pool_size;
int stride;
int input_height;
int input_width;
} pool_params;
struct {
int output_size;
weight_t weights[MAX_LAYER_SIZE][MAX_LAYER_SIZE];
data_t biases[MAX_LAYER_SIZE];
activation type;
} fc_params;
} params;
data_t output[MAX_LAYER_SIZE];
data_t delta[MAX_LAYER_SIZE];
data_t pre_activation[MAX_LAYER_SIZE];
};
// Helper functions
data_t sigmoid(data_t x) {
#pragma HLS INLINE
return 1.0 / (1.0 + hls::exp(-x));
}
data_t relu(data_t x) {
#pragma HLS INLINE
return (x > 0) ? x : 0;
}
// Systolic array matrix multiplication for fully connected layers
void systolic_matrix_multiply(
const weight_t weights[MAX_LAYER_SIZE][MAX_LAYER_SIZE],
const data_t input[MAX_LAYER_SIZE],
acc_t output[MAX_LAYER_SIZE],
int M, int N) {
#pragma HLS PIPELINE II=1
#pragma HLS ARRAY_PARTITION variable=weights cyclic factor=16 dim=2
#pragma HLS ARRAY_PARTITION variable=input cyclic factor=16
static acc_t pe_array[MAX_LAYER_SIZE];
#pragma HLS ARRAY_PARTITION variable=pe_array cyclic factor=16
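// pe_array models a 1-D row of processing elements: each PE accumulates one output element
// while the current input value is broadcast across the row on every iteration of k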
// Initialize processing elements
for (int i = 0; i < M; i++) {
#pragma HLS UNROLL factor=16
pe_array[i] = 0;
}
// Systolic computation
for (int k = 0; k < N; k++) {
for (int i = 0; i < M; i++) {
#pragma HLS PIPELINE II=1
#pragma HLS UNROLL factor=16
pe_array[i] += weights[i][k] * input[k];
}
}
// Write results
for (int i = 0; i < M; i++) {
#pragma HLS UNROLL factor=16
output[i] = pe_array[i];
}
}
// Optimized convolution forward pass
void conv_forward(Layer& layer, const data_t input[MAX_LAYER_SIZE]) {
#pragma HLS INLINE off
const int padding = layer.params.conv_params.zero_padding;
const int stride = layer.params.conv_params.stride;
const int filter_size = layer.params.conv_params.filter_size;
const int num_filters = layer.params.conv_params.num_filters;
const int input_height = layer.params.conv_params.input_height;
const int input_width = layer.params.conv_params.input_width;
const int input_channels = layer.params.conv_params.input_channels;
// zero padding is handled implicitly by the bounds check in the inner loop; no padded copy is materialized
const int padded_height = input_height + 2 * padding;
const int padded_width = input_width + 2 * padding;
const int output_height = (padded_height - filter_size) / stride + 1;
const int output_width = (padded_width - filter_size) / stride + 1;
// Main convolution loops
CONV_FILTERS: for(int f = 0; f < num_filters; f++) {
CONV_OUTPUT_H: for(int oh = 0; oh < output_height; oh++) {
CONV_OUTPUT_W: for(int ow = 0; ow < output_width; ow++) {
#pragma HLS PIPELINE II=1
acc_t sum = 0;
CONV_CHANNELS: for(int c = 0; c < input_channels; c++) {
CONV_KERNEL_H: for(int fh = 0; fh < filter_size; fh++) {
CONV_KERNEL_W: for(int fw = 0; fw < filter_size; fw++) {
#pragma HLS UNROLL factor=3
// map the output position back into the unpadded input; taps that fall in the zero padding contribute nothing
int ih = oh * stride + fh - padding;
int iw = ow * stride + fw - padding;
if (ih >= 0 && ih < input_height && iw >= 0 && iw < input_width) {
sum += input[c * input_height * input_width + ih * input_width + iw] *
layer.params.conv_params.weights[f][c][fh][fw];
}
}
}
}
}
sum += layer.params.conv_params.biases[f];
int output_idx = f * output_height * output_width + oh * output_width + ow;
layer.pre_activation[output_idx] = sum;
layer.output[output_idx] = relu(sum);
}
}
}
}
// Optimized max pooling forward pass
void maxpool_forward(Layer& layer, const data_t input[MAX_LAYER_SIZE]) {
#pragma HLS INLINE off
const int pool_size = layer.params.pool_params.pool_size;
const int stride = layer.params.pool_params.stride;
const int input_height = layer.height;
const int input_width = layer.width;
const int input_channels = layer.channels;
const int output_height = (input_height - pool_size) / stride + 1;
const int output_width = (input_width - pool_size) / stride + 1;
POOL_CHANNELS: for(int c = 0; c < input_channels; c++) {
POOL_OUTPUT_H: for(int oh = 0; oh < output_height; oh++) {
POOL_OUTPUT_W: for(int ow = 0; ow < output_width; ow++) {
#pragma HLS PIPELINE II=1
data_t max_val = -128; // minimum representable data_t; safer than converting -INFINITY to fixed point
POOL_WINDOW_H: for(int ph = 0; ph < pool_size; ph++) {
POOL_WINDOW_W: for(int pw = 0; pw < pool_size; pw++) {
#pragma HLS UNROLL
int ih = oh * stride + ph;
int iw = ow * stride + pw;
data_t val = input[c * input_height * input_width + ih * input_width + iw];
max_val = (val > max_val) ? val : max_val;
}
}
layer.output[c * output_height * output_width + oh * output_width + ow] = max_val;
}
}
}
}
// Optimized fully connected forward pass using systolic array
void fc_forward(Layer& layer, const data_t input[MAX_LAYER_SIZE]) {
#pragma HLS INLINE off
const int output_size = layer.params.fc_params.output_size;
const int input_size = layer.height * layer.width * layer.channels;
// Use systolic array for matrix multiplication
acc_t temp_output[MAX_LAYER_SIZE];
systolic_matrix_multiply(layer.params.fc_params.weights, input, temp_output, output_size, input_size);
// Add biases and apply activation
FC_OUTPUT: for(int o = 0; o < output_size; o++) {
#pragma HLS PIPELINE II=1
acc_t sum = temp_output[o] + layer.params.fc_params.biases[o];
if(layer.params.fc_params.type == a_sigmoid) {
layer.pre_activation[o] = sum;
layer.output[o] = sigmoid(sum);
} else {
layer.output[o] = sum; // For softmax, store raw values
}
}
// Apply softmax if needed
if(layer.params.fc_params.type == a_softmax) {
acc_t max_val = layer.output[0];
acc_t sum = 0;
// Find max value for numerical stability
SOFTMAX_MAX: for(int i = 1; i < output_size; i++) {
#pragma HLS PIPELINE II=1
max_val = (layer.output[i] > max_val) ? layer.output[i] : max_val;
}
// Compute exponentials and sum
SOFTMAX_EXP: for(int i = 0; i < output_size; i++) {
#pragma HLS PIPELINE II=1
layer.output[i] = hls::exp(layer.output[i] - max_val);
sum += layer.output[i];
}
// Normalize
SOFTMAX_NORM: for(int i = 0; i < output_size; i++) {
#pragma HLS PIPELINE II=1
layer.output[i] /= sum;
}
}
}
// Top-level function for HLS synthesis
void cnn_forward(
data_t input[MAX_LAYER_SIZE],
data_t output[MAX_LAYER_SIZE],
Layer layers[],
int num_layers) {
#pragma HLS INTERFACE m_axi port=input offset=slave bundle=gmem0
#pragma HLS INTERFACE m_axi port=output offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi port=layers offset=slave bundle=gmem2
#pragma HLS INTERFACE s_axilite port=num_layers bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control
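// m_axi ports become AXI4 master interfaces for bulk data in external memory;
// s_axilite exposes the scalar arguments and the start/done handshake as control registers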
data_t layer_input[MAX_LAYER_SIZE];
data_t layer_output[MAX_LAYER_SIZE];
// Copy input to local buffer
memcpy(layer_input, input, MAX_LAYER_SIZE * sizeof(data_t));
// Process each layer
LAYER_LOOP: for(int i = 0; i < num_layers; i++) {
#pragma HLS LOOP_TRIPCOUNT min=1 max=20
Layer& current_layer = layers[i];
switch(current_layer.type) {
case conv:
conv_forward(current_layer, layer_input);
break;
case max_pool:
maxpool_forward(current_layer, layer_input);
break;
case fully_connected:
fc_forward(current_layer, layer_input);
break;
default:
break;
}
// Copy output to input buffer for next layer
memcpy(layer_input, current_layer.output, MAX_LAYER_SIZE * sizeof(data_t));
}
// Copy final output
memcpy(output, layer_input, MAX_LAYER_SIZE * sizeof(data_t));
}

cnn.c (656 changed lines)

@@ -1,85 +1,605 @@
// convolutional neural network c header library
// inspired by euske's nn1
// meant to be synthesized into RTL through Vitis HLS for an FPGA implementation
#include <stdlib.h>
#include <math.h>
#include <string.h>
typedef enum {
input,
conv,
max_pool,
fully_connected
} ltype;
typedef enum {
fc_input,
fc_hidden,
fc_output,
} fcpos;
typedef enum {
a_sigmoid,
a_softmax,
} activation;
typedef struct {
ltype type;
int height;
int width;
int channels; // in this case, "channels" are the number of filters that are coming in
union {
struct {
int num_filters;
int filter_size; // single integer b/c filter will usually be square shaped
int stride;
int zero_padding; // single integer for how many layers of zero padding
int input_height;
int input_width;
int input_channels;
float* weights;
float* biases;
} conv_params;
struct {
int pool_size; // single integer again
int stride;
int input_height;
int input_width;
} pool_params;
struct {
int output_size;
float* weights;
float* biases;
activation type;
} fc_params;
} params;
float* output;
float* delta;
float* pre_activation;
float (*activation_g)(float);
} Layer;
typedef struct {
Layer** layers;
int num_layers;
} Network;
Network* create_network(int capacity) {
Network* network = (Network*)malloc(sizeof(Network));
network->layers = (Layer**)malloc(capacity * sizeof(Layer*));
network->num_layers = capacity;
return network;
}
float he_init(int fan_in) {
float scale = sqrt(2.0f / fan_in);
float random = (float)rand() / RAND_MAX * 2 - 1;
return random * scale;
}
float glorot_init(int fan_in, int fan_out) {
float limit = sqrt(6.0f / (fan_in + fan_out));
float random = (float)rand() / RAND_MAX;
return random * 2 * limit - limit;
}
float relu(float x) {
return x > 0 ? x : 0;
}
float sigmoid(float x) {
return 1 / (1 + exp(-x));
}
float relu_g(float x) {
return x > 0 ? 1 : 0;
}
float sigmoid_g(float x) {
float sig = sigmoid(x);
return sig * (1 - sig);
}
void softmax(float* input, float* output, int size) {
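// shift by the max before exponentiating so exp() cannot overflow (numerical stability)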
float max = input[0];
for(int i = 1; i < size; i++) {
if(input[i] > max) {
max = input[i];
}
}
float sum = 0;
for(int i = 0; i < size; i++) {
output[i] = exp(input[i] - max);
sum += output[i];
}
for(int i = 0; i < size; i++) {
output[i] /= sum;
}
}
Layer* create_input(int height, int width, int channels) {
Layer* layer = (Layer*)malloc(sizeof(Layer));
layer->type = input;
layer->height = height;
layer->width = width;
layer->channels = channels;
layer->output = (float*)calloc(height * width * channels, sizeof(float));
return layer;
}
Layer* create_conv(int input_height, int input_width, int input_channels, int num_filters, int filter_size, int stride, int padding) {
Layer* layer = (Layer*)malloc(sizeof(Layer));
layer->type = conv;
layer->params.conv_params.num_filters = num_filters;
layer->params.conv_params.filter_size = filter_size;
layer->params.conv_params.stride = stride;
layer->params.conv_params.zero_padding = padding;
layer->params.conv_params.input_height = input_height;
layer->params.conv_params.input_width = input_width;
layer->params.conv_params.input_channels = input_channels;
// output dimensions
// https://cs231n.github.io/convolutional-networks/
int output_h = (input_height + 2 * padding - filter_size) / stride + 1;
int output_w = (input_width + 2 * padding - filter_size) / stride + 1;
layer->height = output_h;
layer->width = output_w;
layer->channels = num_filters;
layer->activation_g = relu_g;
// conv layer uses relu, use HE init
int weights_size = num_filters * input_channels * filter_size * filter_size;
int fan_in = input_channels * filter_size * filter_size;
layer->params.conv_params.weights = (float*)calloc(weights_size, sizeof(float));
for (int i = 0; i < weights_size; i++) {
layer->params.conv_params.weights[i] = he_init(fan_in);
}
layer->params.conv_params.biases = (float*)calloc(num_filters, sizeof(float));
layer->output = (float*) calloc(output_h * output_w * num_filters, sizeof(float));
layer->delta = (float*) calloc(output_h * output_w * num_filters, sizeof(float));
layer->pre_activation = (float*)calloc(output_h * output_w * num_filters, sizeof(float));
return layer;
}
Layer* create_maxpool(int input_height, int input_width, int input_channels, int pool_size, int stride) {
Layer* layer = (Layer*)malloc(sizeof(Layer));
layer->type = max_pool;
layer->params.pool_params.pool_size = pool_size;
layer->params.pool_params.stride = stride;
layer->params.pool_params.input_height = input_height;
layer->params.pool_params.input_width = input_width;
// output dimensions
// https://cs231n.github.io/convolutional-networks/
int output_h = (input_height - pool_size) / stride + 1;
int output_w = (input_width - pool_size) / stride + 1;
layer->height = output_h;
layer->width = output_w;
layer->channels = input_channels;
layer->output = (float*) calloc(output_h * output_w * input_channels, sizeof(float));
layer->delta = (float*) calloc(output_h * output_w * input_channels, sizeof(float));
return layer;
}
Layer* create_fc(int output_size, int input_size, activation type) {
Layer* layer = (Layer*)malloc(sizeof(Layer));
layer->type = fully_connected;
layer->params.fc_params.output_size = output_size;
layer->params.fc_params.type = type; // activation type can either be sigmoid or softmax (output layer)
layer->activation_g = (type == a_sigmoid) ? sigmoid_g : NULL; // NULL is softmax (doesn't have a gradient)
// use glorot initialization
layer->params.fc_params.weights = (float*)calloc(output_size * input_size, sizeof(float));
for (int i = 0; i < (output_size * input_size); i++) {
layer->params.fc_params.weights[i] = glorot_init(input_size, output_size);
}
layer->params.fc_params.biases = (float*)calloc(output_size, sizeof(float));
layer->height = 1;
layer->width = 1;
layer->channels = output_size;
layer->output = (float*) calloc(output_size, sizeof(float));
layer->delta = (float*) calloc(output_size, sizeof(float));
layer->pre_activation = (float*) calloc(output_size, sizeof(float));
return layer;
}
void free_layer(Layer* layer) {
switch (layer->type) {
case input:
free(layer->output);
free(layer);
break;
case conv:
free(layer->params.conv_params.weights);
free(layer->params.conv_params.biases);
free(layer->output);
free(layer->delta);
free(layer->pre_activation);
free(layer);
break;
case max_pool:
free(layer->output);
free(layer->delta);
free(layer);
break;
case fully_connected:
free(layer->params.fc_params.weights);
free(layer->params.fc_params.biases);
free(layer->output);
free(layer->delta);
free(layer->pre_activation);
free(layer);
break;
}
}
void destroy_network(Network* network) {
if (!network) return;
for (int i = 0; i < network->num_layers; i++) {
if (network->layers[i]) {
free_layer(network->layers[i]);
}
}
free(network->layers);
free(network);
}
void conv_forward(Layer* layer, float* input) {
int padding = layer->params.conv_params.zero_padding;
int stride = layer->params.conv_params.stride;
int filter_size = layer->params.conv_params.filter_size;
int num_filters = layer->params.conv_params.num_filters;
int input_height = layer->params.conv_params.input_height;
int input_width = layer->params.conv_params.input_width;
int input_channels = layer->params.conv_params.input_channels;
int padded_height = input_height + 2 * padding;
int padded_width = input_width + 2 * padding;
float* padded_input = (float*) calloc(padded_height * padded_width * input_channels, sizeof(float));
for (int c = 0; c < input_channels; c++) {
for (int h = 0; h < input_height; h++) {
for (int w = 0; w < input_width; w++) {
padded_input[c * padded_height * padded_width + (h + padding) * padded_width + (w + padding)] = input[c * input_height * input_width + h * input_width + w];
}
}
}
int output_height = (padded_height - filter_size) / stride + 1;
int output_width = (padded_width - filter_size) / stride + 1;
int output_size = output_height * output_width * num_filters;
// for every filter
for(int f = 0; f < num_filters; f++) {
for(int oh = 0; oh < output_height; oh++) {
for(int ow = 0; ow < output_width; ow++) {
float sum = 0;
for(int c = 0; c < input_channels; c++) {
for(int fh = 0; fh < filter_size; fh++) {
for(int fw = 0; fw < filter_size; fw++) {
int ih = oh * stride + fh;
int iw = ow * stride + fw;
if (ih >= 0 && ih < padded_height && iw >= 0 && iw < padded_width) {
int input_idx = c * padded_height * padded_width + ih * padded_width + iw;
int weight_idx = f * input_channels * filter_size * filter_size +
c * filter_size * filter_size +
fh * filter_size + fw;
sum += padded_input[input_idx] * layer->params.conv_params.weights[weight_idx];
}
}
}
}
sum += layer->params.conv_params.biases[f];
int output_idx = f * output_height * output_width + oh * output_width + ow;
layer->pre_activation[output_idx] = sum;
layer->output[output_idx] = relu(sum);
}
}
}
free(padded_input);
}
void maxpool_forward(Layer* layer, float* input) {
int pool_size = layer->params.pool_params.pool_size;
int stride = layer->params.pool_params.stride;
// prev layer
int input_height = layer->height;
int input_width = layer->width;
int input_channels = layer->channels;
int output_height = (input_height - pool_size) / stride + 1;
int output_width = (input_width - pool_size) / stride + 1;
int output_size = output_height * output_width * input_channels;
for(int c = 0; c < input_channels; c++) {
for(int oh = 0; oh < output_height; oh++) {
for(int ow = 0; ow < output_width; ow++) {
float max_val = -INFINITY;
for(int ph = 0; ph < pool_size; ph++) {
for(int pw = 0; pw < pool_size; pw++) {
int ih = oh * stride + ph;
int iw = ow * stride + pw;
float val = input[c * input_height * input_width + ih * input_width + iw];
if(val > max_val) {
max_val = val;
}
}
}
layer->output[c * output_height * output_width + oh * output_width + ow] = max_val;
}
}
}
}
void fc_forward(Layer* layer, float* input) {
int output_size = layer->params.fc_params.output_size;
int input_size = layer->height * layer->width * layer->channels;
// flatten
float* flattened_input = (float*) calloc(input_size, sizeof(float));
for(int i = 0; i < input_size; i++) {
flattened_input[i] = input[i];
}
// matmul (output = bias + (input * weight))
float* temp_output = (float*) calloc(output_size, sizeof(float));
for(int o = 0; o < output_size; o++) {
float sum = 0;
for(int i = 0; i < input_size; i++) {
sum += flattened_input[i] * layer->params.fc_params.weights[o * input_size + i];
}
sum += layer->params.fc_params.biases[o];
temp_output[o] = sum;
}
// apply the correct activation (sigmoid for non output layers, softmax for output)
if(layer->params.fc_params.type == a_sigmoid) {
for(int o = 0; o < output_size; o++) {
layer->pre_activation[o] = temp_output[o];
layer->output[o] = sigmoid(temp_output[o]);
}
} else if(layer->params.fc_params.type == a_softmax) {
softmax(temp_output, layer->output, output_size);
}
free(temp_output);
free(flattened_input);
}
void forward_propagation(Layer* layer, float* input_fc) {
int input_size;
switch(layer->type) {
case input:
// input to layer->output
input_size = (layer->height * layer->width * layer->channels);
for(int i = 0; i < input_size; i++) {
layer->output[i] = input_fc[i];
}
break;
case conv:
conv_forward(layer, input_fc);
break;
case max_pool:
maxpool_forward(layer, input_fc);
break;
case fully_connected:
fc_forward(layer, input_fc);
break;
}
}
void network_forward(Network* network, float* input) {
float* current_input = input;
for (int i = 0; i < network->num_layers; i++) {
forward_propagation(network->layers[i], current_input);
current_input = network->layers[i]->output;
}
}
void fc_backward(Layer* layer, float* prev_delta, float* input, float learning_rate) {
int output_size = layer->params.fc_params.output_size;
int input_size = layer->height * layer->width * layer->channels;
float* gradient;
if(layer->params.fc_params.type == a_softmax) {
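// softmax + cross-entropy: the combined gradient simplifies to (predicted probability - one-hot target),
// so prev_delta is interpreted here as the one-hot label vector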
gradient = (float*)malloc(output_size * sizeof(float));
for(int i = 0; i < output_size; i++) {
gradient[i] = layer->output[i];
if(prev_delta[i] > 0.5) { // one hot encoded
gradient[i] -= 1.0;
}
}
} else {
gradient = prev_delta;
}
// update weights and biases
for(int o = 0; o < output_size; o++) {
for(int i = 0; i < input_size; i++) {
layer->params.fc_params.weights[o * input_size + i] -=
learning_rate * gradient[o] * input[i];
}
layer->params.fc_params.biases[o] -= learning_rate * gradient[o];
}
// gradient
if(layer->activation_g) {
for(int i = 0; i < input_size; i++) {
float sum = 0;
for(int o = 0; o < output_size; o++) {
sum += layer->params.fc_params.weights[o * input_size + i] * gradient[o];
}
layer->delta[i] = sum * layer->activation_g(layer->pre_activation[i]);
}
}
if(layer->params.fc_params.type == a_softmax) {
free(gradient);
}
}
void conv_backward(Layer* layer, float* prev_delta, float* input, float learning_rate) {
int num_filters = layer->params.conv_params.num_filters;
int channels = layer->channels;
int filter_size = layer->params.conv_params.filter_size;
int input_height = layer->height;
int input_width = layer->width;
int padding = layer->params.conv_params.zero_padding;
int stride = layer->params.conv_params.stride;
int output_height = (input_height + 2 * padding - filter_size) / stride + 1;
int output_width = (input_width + 2 * padding - filter_size) / stride + 1;
// gradient w/respect to filters
for(int f = 0; f < num_filters; f++) {
for(int c = 0; c < channels; c++) {
for(int fh = 0; fh < filter_size; fh++) {
for(int fw = 0; fw < filter_size; fw++) {
float grad = 0;
for(int oh = 0; oh < output_height; oh++) {
for(int ow = 0; ow < output_width; ow++) {
int ih = oh * stride + fh - padding;
int iw = ow * stride + fw - padding;
if(ih >= 0 && ih < input_height && iw >= 0 && iw < input_width) {
grad += input[c * input_height * input_width + ih * input_width + iw] * prev_delta[f * output_height * output_width + oh * output_width + ow];
}
}
}
int index = f * channels * filter_size * filter_size + c * filter_size * filter_size + fh * filter_size + fw;
layer->params.conv_params.weights[index] -= learning_rate * grad;
}
}
}
}
// gradient w/respect to biases
for(int f = 0; f < num_filters; f++) {
float grad = 0;
for(int oh = 0; oh < output_height; oh++) {
for(int ow = 0; ow < output_width; ow++) {
grad += prev_delta[f * output_height * output_width + oh * output_width + ow];
}
}
layer->params.conv_params.biases[f] -= learning_rate * grad;
}
// gradient with respect to inputs
for(int c = 0; c < channels; c++) {
for(int ih = 0; ih < input_height; ih++) {
for(int iw = 0; iw < input_width; iw++) {
float grad = 0;
for(int f = 0; f < num_filters; f++) {
for(int fh = 0; fh < filter_size; fh++) {
for(int fw = 0; fw < filter_size; fw++) {
int oh = (ih - fh + padding) / stride;
int ow = (iw - fw + padding) / stride;
if((ih - fh + padding) % stride == 0 && (iw - fw + padding) % stride == 0 && oh < output_height && ow < output_width) {
int w_index = f * channels * filter_size * filter_size + c * filter_size * filter_size + fh * filter_size + fw;
grad += layer->params.conv_params.weights[w_index] * prev_delta[f * output_height * output_width + oh * output_width + ow];
}
}
}
}
layer->delta[c * input_height * input_width + ih * input_width + iw] = grad * layer->activation_g(layer->pre_activation[c * input_height * input_width + ih * input_width + iw]);
}
}
}
}
void maxpool_backward(Layer* layer, float* prev_delta, float* input, float learning_rate) {
int pool_size = layer->params.pool_params.pool_size;
int stride = layer->params.pool_params.stride;
int input_height = layer->params.pool_params.input_height;
int input_width = layer->params.pool_params.input_width;
int channels = layer->channels;
// Zero initialize deltas
memset(layer->delta, 0, input_height * input_width * channels * sizeof(float));
int output_height = layer->height;
int output_width = layer->width;
for(int c = 0; c < channels; c++) {
for(int oh = 0; oh < output_height; oh++) {
for(int ow = 0; ow < output_width; ow++) {
// finds max value
int maxI = -1, maxJ = -1;
float maxVal = -INFINITY;
for(int ph = 0; ph < pool_size; ph++) {
for(int pw = 0; pw < pool_size; pw++) {
int ih = oh * stride + ph;
int iw = ow * stride + pw;
// checks bounds
if (ih < input_height && iw < input_width) {
float val = input[c * input_height * input_width + ih * input_width + iw];
if(val > maxVal) {
maxVal = val;
maxI = ih;
maxJ = iw;
}
}
}
}
// only propagate gradient if a valid max position is found
if(maxI != -1 && maxJ != -1) {
int delta_idx = c * output_height * output_width + oh * output_width + ow;
layer->delta[c * input_height * input_width + maxI * input_width + maxJ] =
prev_delta[delta_idx];
}
}
}
}
}
void backward_propagation(Layer* layer, float* prev_delta, float* input_fc, float learning_rate) {
switch(layer->type) {
case fully_connected:
fc_backward(layer, prev_delta, input_fc, learning_rate);
break;
case conv:
conv_backward(layer, prev_delta, input_fc, learning_rate);
break;
case max_pool:
maxpool_backward(layer, prev_delta, input_fc, learning_rate);
break;
case input:
// No backpropagation for input layer
break;
}
}
void network_backward(Network* network, float* label, float learning_rate) {
// output
Layer* output_layer = network->layers[network->num_layers - 1];
// output gradient
for(int o = 0; o < output_layer->channels; o++) {
output_layer->delta[o] = output_layer->output[o] - label[o];
}
// backprop
for(int i = network->num_layers - 2; i >= 0; i--) {
Layer* current_layer = network->layers[i];
Layer* next_layer = network->layers[i + 1];
backward_propagation(current_layer, next_layer->delta, current_layer->output, learning_rate);
}
}

mnist.c (new file, 189 lines)

@@ -0,0 +1,189 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "cnn.c"
#define IMG_HEIGHT 28
#define IMG_WIDTH 28
#define NUM_CLASSES 10
#define BATCH_SIZE 32
#define LEARNING_RATE 0.01
#define NUM_EPOCHS 10
float* read_mnist_images(const char* filename, int* num_images) {
FILE* fp = fopen(filename, "rb");
if (!fp) {
printf("Error opening file %s\n", filename);
return NULL;
}
int magic_number = 0;
fread(&magic_number, sizeof(int), 1, fp);
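// the IDX format stores integers big-endian; swap bytes for little-endian hosts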
magic_number = ((magic_number & 0xff000000) >> 24) |
((magic_number & 0x00ff0000) >> 8) |
((magic_number & 0x0000ff00) << 8) |
((magic_number & 0x000000ff) << 24);
if (magic_number != 2051) {
printf("Invalid MNIST image file format\n");
fclose(fp);
return NULL;
}
fread(num_images, sizeof(int), 1, fp);
*num_images = ((*num_images & 0xff000000) >> 24) |
((*num_images & 0x00ff0000) >> 8) |
((*num_images & 0x0000ff00) << 8) |
((*num_images & 0x000000ff) << 24);
int rows, cols;
fread(&rows, sizeof(int), 1, fp);
fread(&cols, sizeof(int), 1, fp);
rows = ((rows & 0xff000000) >> 24) |
((rows & 0x00ff0000) >> 8) |
((rows & 0x0000ff00) << 8) |
((rows & 0x000000ff) << 24);
cols = ((cols & 0xff000000) >> 24) |
((cols & 0x00ff0000) >> 8) |
((cols & 0x0000ff00) << 8) |
((cols & 0x000000ff) << 24);
if (rows != IMG_HEIGHT || cols != IMG_WIDTH) {
printf("Invalid image dimensions\n");
fclose(fp);
return NULL;
}
float* images = (float*)malloc(*num_images * IMG_HEIGHT * IMG_WIDTH * sizeof(float));
unsigned char* temp = (unsigned char*)malloc(IMG_HEIGHT * IMG_WIDTH);
for (int i = 0; i < *num_images; i++) {
fread(temp, 1, IMG_HEIGHT * IMG_WIDTH, fp);
for (int j = 0; j < IMG_HEIGHT * IMG_WIDTH; j++) {
images[i * IMG_HEIGHT * IMG_WIDTH + j] = temp[j] / 255.0f;
}
}
free(temp);
fclose(fp);
return images;
}
float* read_mnist_labels(const char* filename, int* num_labels) {
FILE* fp = fopen(filename, "rb");
if (!fp) {
printf("Error opening file %s\n", filename);
return NULL;
}
int magic_number = 0;
fread(&magic_number, sizeof(int), 1, fp);
magic_number = ((magic_number & 0xff000000) >> 24) |
((magic_number & 0x00ff0000) >> 8) |
((magic_number & 0x0000ff00) << 8) |
((magic_number & 0x000000ff) << 24);
if (magic_number != 2049) {
printf("Invalid MNIST label file format\n");
fclose(fp);
return NULL;
}
fread(num_labels, sizeof(int), 1, fp);
*num_labels = ((*num_labels & 0xff000000) >> 24) |
((*num_labels & 0x00ff0000) >> 8) |
((*num_labels & 0x0000ff00) << 8) |
((*num_labels & 0x000000ff) << 24);
float* labels = (float*)calloc(*num_labels * NUM_CLASSES, sizeof(float));
unsigned char* temp = (unsigned char*)malloc(*num_labels);
fread(temp, 1, *num_labels, fp);
for (int i = 0; i < *num_labels; i++) {
labels[i * NUM_CLASSES + temp[i]] = 1.0f;
}
free(temp);
fclose(fp);
return labels;
}
int main() {
// load mnist
int num_train_images, num_train_labels;
float* train_images = read_mnist_images("train-images.idx3-ubyte", &num_train_images);
float* train_labels = read_mnist_labels("train-labels.idx1-ubyte", &num_train_labels);
// creating a lenet-5 inspired network
Network* network = create_network(8);
network->layers[0] = create_input(IMG_HEIGHT, IMG_WIDTH, 1);
network->layers[1] = create_conv(IMG_HEIGHT, IMG_WIDTH, 1, 6, 5, 1, 2);
network->layers[2] = create_maxpool(network->layers[1]->height, network->layers[1]->width, network->layers[1]->channels, 2, 2);
network->layers[3] = create_conv(network->layers[2]->height, network->layers[2]->width, network->layers[2]->channels, 16, 5, 1, 0);
network->layers[4] = create_maxpool(network->layers[3]->height, network->layers[3]->width, network->layers[3]->channels, 2, 2);
network->layers[5] = create_fc(120, network->layers[4]->height * network->layers[4]->width * network->layers[4]->channels, a_sigmoid);
network->layers[6] = create_fc(84, 120, a_sigmoid);
network->layers[7] = create_fc(NUM_CLASSES, 84, a_softmax);
// training loop
for (int epoch = 0; epoch < NUM_EPOCHS; epoch++) {
float total_loss = 0.0f;
int correct = 0;
for (int i = 0; i < num_train_images; i++) {
// forward pass
network_forward(network, &train_images[i * IMG_HEIGHT * IMG_WIDTH]);
// accuracy
float* output = network->layers[network->num_layers - 1]->output;
int predicted = 0;
float max_prob = output[0];
for (int j = 1; j < NUM_CLASSES; j++) {
if (output[j] > max_prob) {
max_prob = output[j];
predicted = j;
}
}
int true_label = 0;
for (int j = 0; j < NUM_CLASSES; j++) {
if (train_labels[i * NUM_CLASSES + j] > 0.5f) {
true_label = j;
break;
}
}
if (predicted == true_label) correct++;
// backprop
network_backward(network, &train_labels[i * NUM_CLASSES], LEARNING_RATE);
// cross entropy loss
float loss = 0.0f;
for (int j = 0; j < NUM_CLASSES; j++) {
if (train_labels[i * NUM_CLASSES + j] > 0.5f) {
loss -= log(output[j] + 1e-10);
}
}
total_loss += loss;
// progress
if ((i + 1) % 100 == 0) {
printf("Epoch %d/%d, Step %d/%d, Loss: %.4f, Accuracy: %.2f%%\n",
epoch + 1, NUM_EPOCHS, i + 1, num_train_images,
total_loss / (i + 1), 100.0f * correct / (i + 1));
}
}
printf("Epoch %d/%d completed, Average Loss: %.4f, Accuracy: %.2f%%\n",
epoch + 1, NUM_EPOCHS, total_loss / num_train_images,
100.0f * correct / num_train_images);
}
// Clean up
free(train_images);
free(train_labels);
destroy_network(network);
return 0;
}

train-images.idx3-ubyte (new binary file, not shown)

train-labels.idx1-ubyte (new binary file, not shown)