What is a GPU? Are They Needed for Deep Learning?

Buse Yaren Tekin
11 min read · Aug 20, 2022


Photo by Rafael Pol on Unsplash

In the age we live in, new concepts emerge every day as technology develops. One of them is undoubtedly "Artificial Intelligence". Although the term artificial intelligence is used widely, "Deep Learning" is what we meet when we dig deeper into the current state of the field. Today we develop deep learning models to accomplish specific tasks in many areas. In this article, we explore the GPU for deep learning, with code samples.

The code for this tutorial is available on GitHub, and its full implementation is also available on Google Colab.

“In the era where artificial intelligence and algorithms make more decisions in our lives and in organizations, the time has come for people to tap into their intuition as an adjunct to today’s technical capabilities. Our inner wisdom can embed empirical data with humanity.”

- Abhishek Ratna

Table of Contents

  1. What is a GPU?
  2. Are They Needed for Deep Learning?
  3. What are the differences between GPU and CPU?
  4. How to develop a deep learning model with GPU?
  5. Comparison of bandwidth for CPUs and GPUs over time

What is a GPU?

A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device.

A GPU, short for graphics processing unit, is a computer chip that performs rapid mathematical calculations for the purpose of rendering images. It may come as a dedicated (discrete) graphics card or be integrated into the processor.

GPUs are found in many places, from embedded systems to personal computers and workstations. Thanks to their parallel processing structure, GPUs typically outperform general-purpose processors in image and video processing, and they are becoming increasingly popular for artificial intelligence (AI) work.

One of the most important pieces of hardware you will need when developing a deep learning model is the GPU.

Photo by Florian Olivo on Unsplash

Are They Needed for Deep Learning?

The GPU, the hardware so frequently mentioned when deep learning methods are used, is constantly working through possibilities in the background, so the idea is not actually very abstract. Having covered what a GPU is in general terms above, we will now discuss what the GPU means for deep learning.

More speed and performance are needed when it comes to training models, whether in machine learning or data science. To be more specific, we know how complex the structure of artificial neural networks is. Whether these networks work with small or large data sets at the back end, training time increases as the model passes over the training set. As the data set grows, training can sometimes take up to a week.

Suppose you have a sample data set of images. Algorithmically, the feed-forward and back-propagation passes of an ANN must process every sample in this data set. If you don't have a GPU, the CPU takes on the whole processing load, and it will take a very long time to return a result.
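
To make this concrete, here is a minimal sketch (assuming TensorFlow 2.x and a machine with at least one visible GPU) that times the same large matrix multiplication, the core operation behind a dense layer, on the CPU and on the GPU:

import time
import tensorflow as tf

def time_matmul(device_name, size=4000):
    # Place the tensors and the multiplication on the requested device.
    with tf.device(device_name):
        a = tf.random.uniform((size, size))
        b = tf.random.uniform((size, size))
        start = time.time()
        c = tf.matmul(a, b)
        _ = c.numpy()  # force the computation to finish before stopping the timer
    return time.time() - start

print("CPU:", time_matmul("/CPU:0"), "seconds")
if tf.config.list_physical_devices("GPU"):
    print("GPU:", time_matmul("/GPU:0"), "seconds")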

What are the differences between GPU and CPU?

Figure 1. CPU vs GPU Architecture

A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program.

Traditional CPU (Central Processing Unit) based approaches might not be the best solution for parallel computing because of their cost and scalability issues. As seen in Figure 1, the CPU performs the basic arithmetic, logic, control, and input/output (I/O) operations specified by the instructions in the program.

Photo by Oleg Gospodarec on Unsplash

A GPU (graphics processing unit) is a specialized type of microprocessor, primarily designed for quick image rendering. GPUs appeared as a response to graphically intense applications that put a burden on the CPU and degraded computer performance.

CPUs and GPUs are not interchangeable; each does its job in a different way. The brain and the muscle are usually given as examples of this pair. Because the CPU is literally the central processing unit, it is called the computer's brain. Of course, the GPU would not make any sense without a CPU. Since the CPU is the brain, it handles many different types of computation, while the GPU focuses on a specific task.

As for the advantages of the GPU: while the CPU works through its tasks one after the other, the GPU can efficiently work on many tasks at the same time. In a way, the two complement each other.

Beyond switching between devices, multiple GPUs can also be used together to increase the performance of a deep learning model. This is called multi-GPU. You can run multiple GPUs in parallel or without parallelism; without parallelism, each GPU works separately, so no speed-up is gained.

The use of multiple GPUs is very flexible in PyTorch and TensorFlow, two popular deep learning frameworks (see the TensorFlow sketch below).

PyTorch is an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook’s AI Research lab.

TensorFlow is an open source framework that flexibly enables model parallelism.

TensorFlow is a free and open-source software library for machine learning. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks.
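
For example, in TensorFlow 2.x, tf.distribute.MirroredStrategy replicates a Keras model on every visible GPU and splits each training batch among them. The following is a minimal sketch, not a full training script:

import tensorflow as tf

# MirroredStrategy copies the model to each GPU and keeps the copies in sync.
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created inside the scope are mirrored across all GPUs.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")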

🎥 Let’s examine how an image is processed in Adobe Illustrator to see the differences in the working mechanism of the GPU and CPU.

Adobe Illustrator CC: NVIDIA GPUs vs CPU

NVIDIA CEO Jensen Huang describes a three-pronged effort to bring accelerated and AI computing to Arm CPU platforms in his GPU Technology Conference (GTC) keynote, "Computing for the Age of AI". The keynote also summarizes the many announcements made to advance the AI era.

The MythBusters, Adam Savage and Jamie Hyneman, demonstrate the power of GPU computing.

Photo by Lars Kienle on Unsplash

As additional information, besides the CPU and GPU there is also a data processing unit, the DPU. For many years the CPU was arguably the only processing element in a computer. More recently, the graphics processing unit, which we call the GPU, took on part of the work. Most recently, the DPU has appeared in data centers, combining software-programmable multi-core CPUs with data-movement and acceleration hardware.

How to develop a deep learning model with GPU?

When developing deep learning models, you must first make sure that your computer has a GPU and that it is available.

⚙️OS: Windows 10 Pro
⚙️CUDA Toolkit: 10
⚙️cuDNN: 7.4
⚙️TensorFlow GPU: 1.14.0
⚙️Keras: 2.2.5

There are many options on the software side. I use TensorFlow, the framework with the most extensive documentation.

You can determine the processor suitable for your machine at the link. For example, in the image below, we learned the processing capacity by checking the processor of the machine I actively work with.

The following command is run to get information about the graphics card in your machine.

wmic path win32_VideoController get name
Checking for the Quadro RTX 5000 graphics card

To use the GPU, you need to install the cuDNN tool and CUDA Toolkit versions appropriate for your machine; otherwise, you will not be able to use the GPU. To use the GPU with TensorFlow, it is also necessary to install the tensorflow-gpu library. If you install it with conda, the appropriate CUDA and cuDNN versions will be displayed during the process.

Installing CUDA Toolkit and cuDNN Tool

We will need to install the CUDA and cuDNN versions that match the TensorFlow version we will be using. I would like to warn you: if you download mismatched versions, you will encounter many errors. With TensorFlow 2.x versions, you may receive log or shape errors. After multiple installations, I have found that tensorflow-gpu works fine with 1.14.0 or 1.15.0.

Available TensorFlow Versions for CUDA [Res][4]

The build requirements for tensorflow-gpu==1.14.0 list cuDNN 7.4 and version 10 of the CUDA Toolkit.

When installing with conda, you will be asked to approve the installation. While giving this approval, you can also see the CUDA and cuDNN versions that will be installed, which confirms that we are on the right track.
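
As an illustration (assuming your conda channel carries this build), a single command pulls in matching cudatoolkit and cudnn packages and lists them in the approval prompt:

conda install tensorflow-gpu=1.14.0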

Image by author

In addition to TensorFlow, the Keras library must also be installed. Keras is an open-source software library that provides a Python interface for neural networks, and it acts as an interface for the TensorFlow library.

📚 Deep Learning with Python, written by François Chollet, the creator of Keras, is very successful for those who want to work in this field.

To create a GPU setup environment with Deep Learning, visit the article Creating a Deep Learning Environment with TensorFlow GPU.

Virtual environments are often used to avoid breaking the machine's base environment with an incorrect installation. In this step, a virtual environment with a specific Python version is created.

conda create -n virtualenv python=3.6
conda activate virtualenv

Conda or pip commands are used to install TensorFlow GPU. In this step, tensorflow-gpu will be installed.

pip install tensorflow-gpu

If no version is specified when installing tensorflow-gpu, the latest version will be installed. If a specific version is required, it is installed by writing out the version number.

pip install tensorflow-gpu==1.15.0

After TensorFlow GPU is installed, you should run the following lines of code as a check.

import tensorflow as tf
tf.test.gpu_device_name()

To check which tensorflow-gpu version you have installed, use the pip show command.

pip show tensorflow-gpu

To check the availability of the GPU, the following snippet is run.

%tensorflow_version 2.x
import tensorflow as tf

device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

The snippet above selects a TensorFlow 2.x version (for example 2.4.1). For 1.x versions, it is used as follows.

%tensorflow_version 1.x
import tensorflow as tf

device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Computers may have more than one GPU. By default, Google Colab assigns a GPU to your environment. Another device can be used instead of this default GPU; the following snippet selects device /GPU:1.

import tensorflow as tf
tf.device('/device:GPU:1')

In this way, the GPUs can be used side by side. I should also note that it is possible to select the GPU from your terminal before even opening your notebook.

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

In a Colab notebook, four devices are listed: CPU, GPU, XLA_CPU, and XLA_GPU. Two of them are concepts beyond the plain CPU and GPU.

As mentioned in the docs, XLA stands for "Accelerated Linear Algebra". It is TensorFlow's relatively new optimizing compiler, which can further speed up your ML models' GPU operations by combining what used to be multiple CUDA kernels into one.
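
As a small illustrative sketch (assuming TensorFlow 2.5 or newer), you can ask XLA to compile a function yourself by setting jit_compile=True on tf.function:

import tensorflow as tf

@tf.function(jit_compile=True)  # XLA fuses these operations into a single kernel
def dense_step(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.uniform((32, 64))
w = tf.random.uniform((64, 10))
b = tf.zeros((10,))
print(dense_step(x, w, b).shape)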

import tensorflow as tf

try:
    tf.device('/job:localhost/replica:0/task:0/device:GPU:1')
except RuntimeError as e:
    print(e)

If a physical GPU does not exist as device 1, I suggest trying the code below as well. It returns the name of the device in use when a GPU is available.

import tensorflow as tf
tf.test.gpu_device_name()

Model training is sometimes run to test the GPU's speed. Below, a small network with a few dense layers is trained on data loaded from the Fashion-MNIST data set.

import tensorflow as tf

mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images = training_images / 255.0
test_images = test_images / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5)
test_loss = model.evaluate(test_images, test_labels)

Here, as the amount of data grows and the problem becomes harder, the gap between training times widens. Since this example does not use a very large data set, there is no big difference between CPU and GPU. With big data, however, the difference becomes significant, as the sketch below illustrates.
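
As a rough, illustrative comparison (assuming TensorFlow 2.x; device placement under tf.device is best-effort for Keras), the same training run can be wrapped in tf.device to measure wall-clock time on the CPU and on the GPU:

import time
import tensorflow as tf

def train_on(device_name):
    (x, y), _ = tf.keras.datasets.fashion_mnist.load_data()
    x = x / 255.0
    with tf.device(device_name):
        # Build, compile, and train the same small model on the chosen device.
        model = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation=tf.nn.relu),
            tf.keras.layers.Dense(10, activation=tf.nn.softmax)
        ])
        model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
        start = time.time()
        model.fit(x, y, epochs=1, verbose=0)
    return time.time() - start

print('CPU:', train_on('/CPU:0'), 'seconds')
if tf.config.list_physical_devices('GPU'):
    print('GPU:', train_on('/GPU:0'), 'seconds')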

The following command is run to log which devices TensorFlow places its operations on (TensorFlow 1.x session API).

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

The Keras framework, mentioned frequently above, now ships with TensorFlow and is used as tf.keras. It is a framework well worth using in deep learning environments.

pip install keras==2.2.5

Comparison of bandwidth for CPUs and GPUs over time

Bandwidth refers to the external bandwidth between a GPU and its associated system. It’s a measure of the data transfer speed across the bus that connects the two (for example, PCIe or Thunderbolt). Bandwidth doesn’t refer to the internal bandwidth of a GPU, which is a measure of the data transfer speed between components within the GPU [7].

During deep learning, the CPU uses a great deal of memory while a model is being trained, especially when processing large data sets. With a GPU, however, training runs in the GPU's own memory, called VRAM, and the CPU's memory remains free for other tasks. Thus, even complex problems can be completed far more quickly.
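
While a model trains, you can watch how much VRAM it occupies with NVIDIA's nvidia-smi tool, which reports per-GPU memory usage and utilization (here refreshing every second):

nvidia-smi --loop=1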

Figure 2. GPU vs CPU Performance [Res][2]

The high-end NVIDIA GPUs have much, much wider buses and higher memory clock rates than any CPU. The maximum memory bandwidth is the maximum rate at which data can be read from or stored into a semiconductor memory by the processor (in GB/s).

The theoretical maximum memory bandwidth for Intel Core X-Series processors can be calculated by multiplying the memory frequency (one half of the data rate, then times two for double data rate) by the bus width in bytes and by the number of memory channels supported by the processor.

For example:
For DDR4-2933, supported by some Core X-Series processors: (1466.67 × 2) × 8 (bytes of width) × 4 (channels) = 93,866.88 MB/s bandwidth, or about 94 GB/s [Res][3].
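
The same arithmetic can be written out in a few lines of Python, using the numbers from the example above:

memory_frequency_mhz = 1466.67   # half of the DDR4-2933 data rate
bytes_of_width = 8               # 64-bit channel width
channels = 4                     # quad-channel memory

bandwidth_mb_s = (memory_frequency_mhz * 2) * bytes_of_width * channels
print(f"{bandwidth_mb_s:,.2f} MB/s, about {bandwidth_mb_s / 1000:.0f} GB/s")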

A lower-than-expected memory bandwidth may be seen due to many system variables, such as software workloads and system power states.

Considering the Intel Core i7 processor with the highest memory bandwidth at the time of that comparison, it had a 192-bit-wide memory bus and an effective memory speed of up to 800 MHz; the fastest NVIDIA GPU of that era was the GTX 285 [Res][3].

🎉 You can also follow my GitHub, YouTube and Twitter accounts for more content!

References

[1] Wikipedia, "Graphics processing unit", https://en.wikipedia.org/wiki/Graphics_processing_unit.

[2] https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html.

[3] https://searchvirtualdesktop.techtarget.com/definition/GPU-graphics-processing-unit.

[4] Wikipedia, "Central processing unit", https://en.wikipedia.org/wiki/Central_processing_unit.

[5] https://blogs.nvidia.com/blog/2020/05/20/whats-a-dpu-data-processing-unit/.

[6] https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-a-gpu/.

[7] https://developer.apple.com/documentation/metal/gpu_selection_in_macos/understanding_gpu_bandwidth.

[8] Wikipedia, "PyTorch", https://en.wikipedia.org/wiki/PyTorch.

[9] Wikipedia, "Memory bandwidth", https://en.wikipedia.org/wiki/Memory_bandwidth.

[10] Wikipedia, "TensorFlow", https://en.wikipedia.org/wiki/TensorFlow.

[11] Wikipedia, "Keras", https://en.wikipedia.org/wiki/Keras.

Resources

[1]https://lambdalabs.com/blog/choosing-a-gpu-for-deep-learning/

[2]D. R. V. L. B. Thambawita, Roshan Ragel and Dhammika Elkaduwe, To Use or Not to Use: Graphics Processing Units (GPUs) for Pattern Matching Algorithms, 2015.

[3]https://forums.developer.nvidia.com/t/why-gpu-has-large-memory-bandwidth-than-cpu/10294.

[4]https://www.tensorflow.org/install/source_windows.

[5] François Chollet, Deep Learning with Python, https://g.co/kgs/CRASoC.

Written by Buse Yaren Tekin

Academician, Kastamonu University —AI Specialist, SimurgAI
