Hello everyone! In this tutorial we will learn how to create and manage NumPy arrays for data analysis. Efficiency and performance are crucial when handling large datasets, and Python’s NumPy (Numerical Python) library addresses both: it provides a high-performance, multi-dimensional array type along with a wide range of mathematical functions. Unlike Python lists, NumPy arrays (ndarrays) store elements of a single data type, typically in one contiguous block of memory, which allows faster computation, more compact storage, and operations on whole arrays at once. That makes them an essential component of Data Analysis, Data Science, and Machine Learning.
This tutorial covers different ways to initialize arrays, change their structure, and access and modify their elements. Whether you’re a beginner or an experienced programmer, getting comfortable with NumPy will make it much easier to work with numerical data in Python.
Why use NumPy?
You might wonder why, when Python already has collection types like lists, tuples, and sets, we need yet another one. The answer is simple: in data analysis, mathematical operations are essential for producing meaningful results. Lists and tuples store ordered elements but do not support direct element-wise arithmetic the way NumPy arrays do. Sets store unordered collections of unique elements and support neither indexing nor direct arithmetic. All three are general-purpose data structures rather than numerical vectors, so to do vector math with their contents we first have to convert them. This is where NumPy comes in: np.array() copies the values from a list or tuple (or from a set once it has been sorted into a list) into a new array with a single data type, a structured format optimized for numerical computation that enables vectorized operations.
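import numpy as np

# Avoid naming variables list, tuple, or set: that would shadow Python's built-ins
numList = [1, 2, 3]
numTuple = (4, 5, 6)
numSet = {7, 8, 9}

numpyList = np.array(numList)
numpyTuple = np.array(numTuple)
numpySet = np.array(sorted(numSet))  # sets are unordered, so sort before converting

print(numpyList + numpyTuple)  # element-wise addition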
Output
[5 7 9]
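To make the contrast with plain lists concrete, here is a minimal sketch. The names prices and taxes are made up for illustration and do not come from the example above.

```python
import numpy as np

# Hypothetical sample data, used only for this illustration
prices = [1, 2, 3]
taxes = [4, 5, 6]

# With plain lists, + concatenates instead of adding element by element
concatenated = prices + taxes

# Element-wise addition on lists needs an explicit loop or comprehension
summed_list = [p + t for p, t in zip(prices, taxes)]

# With NumPy arrays, + is applied element by element
summed_array = np.array(prices) + np.array(taxes)

print(concatenated)   # [1, 2, 3, 4, 5, 6]
print(summed_list)    # [5, 7, 9]
print(summed_array)   # [5 7 9]
```

The array version expresses the same arithmetic in a single operation, and that is the pattern the rest of this tutorial builds on.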
Creating a NumPy Array
1. 0-dim Array
array0d = np.array(1)
print(f"array0d: {array0d}")         # the value stored in the array
print(f"shape: {array0d.shape}")     # length along each axis; empty for a 0-dim array
print(f"dimension: {array0d.ndim}")  # number of axes (dimensions)
Output
array0d: 1
shape: ()
dimension: 0
2. 1-dim Array
array1d = np.array([1, 2, 3])
print(f"array1d: {array1d}")
print(f"shape: {array1d.shape}")
print(f"dimension: {array1d.ndim}")
Output
array1d: [1 2 3]
shape: (3,)
dimension: 1
3. 2-dim Array
array2d = np.array([[1, 2, 3], [4, 5, 6]])
print("array2d: ")
print(array2d)
print(f"shape: {array2d.shape}")
print(f"dimension: {array2d.ndim}")
Output
array2d:
[[1 2 3]
 [4 5 6]]
shape: (2, 3)
dimension: 2
4. 3-dim Array
array3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print("array3d: ")
print(array3d)
print(f"shape: {array3d.shape}")
print(f"dimension: {array3d.ndim}")
Output
array3d:
[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
shape: (2, 2, 3)
dimension: 3
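The shape of a 3-dim array is easier to read once you match each number to an axis. The sketch below rebuilds the same array and unpacks its shape; the names blocks, rows, and cols are just labels chosen for this illustration, not NumPy terminology.

```python
import numpy as np

array3d = np.array([[[1, 2, 3], [4, 5, 6]],
                    [[7, 8, 9], [10, 11, 12]]])

# shape lists the length of each axis, outermost first
blocks, rows, cols = array3d.shape
print(blocks, rows, cols)   # 2 2 3

# One index per axis picks out a single element
element = array3d[1, 0, 2]  # second block, first row, third column
print(element)              # 9

# Fewer indices than axes returns a lower-dimensional array
first_block = array3d[0]
print(first_block.shape)    # (2, 3)
```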
5. Array of Zeros
arrayz = np.zeros((2, 3))
print("arrayz: ")
print(arrayz)
print(f"shape: {arrayz.shape}")
print(f"dimension: {arrayz.ndim}")
Output
arrayz:
[[0. 0. 0.]
 [0. 0. 0.]]
shape: (2, 3)
dimension: 2
6. Array of Ones
arrayo = np.ones((2, 3))
print("arrayo: ")
print(arrayo)
print(f"shape: {arrayo.shape}")
print(f"dimension: {arrayo.ndim}")
Output
arrayo:
[[1. 1. 1.]
 [1. 1. 1.]]
shape: (2, 3)
dimension: 2
7. Array of Specific Number
arrayf = np.full((2, 3), 5)
print("arrayf: ")
print(arrayf)
print(f"shape: {arrayf.shape}")
print(f"dimension: {arrayf.ndim}")
Output
arrayf:
[[5 5 5]
 [5 5 5]]
shape: (2, 3)
dimension: 2
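You may have noticed that np.zeros and np.ones printed values like 0. and 1. while np.full printed plain 5. That comes down to the data type (dtype): np.zeros and np.ones default to floating point, whereas np.full infers the type from the fill value. A small sketch, with variable names chosen only for this illustration:

```python
import numpy as np

zeros_default = np.zeros((2, 3))         # float64 unless told otherwise
zeros_int = np.zeros((2, 3), dtype=int)  # request integers explicitly

full_int = np.full((2, 3), 5)            # integer fill value gives an integer array
full_float = np.full((2, 3), 5.0)        # float fill value gives a float array

print(zeros_default.dtype)  # float64
print(full_float.dtype)     # float64
# The exact integer type (for example int64 or int32) can depend on the platform
print(zeros_int.dtype, full_int.dtype)
```

Passing dtype explicitly is a reasonable habit whenever the type matters for later calculations.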
8. Identity Matrix
arrayi = np.eye(3)
print("arrayi: ")
print(arrayi)
print(f"shape: {arrayi.shape}")
print(f"dimension: {arrayi.ndim}")
Output
arrayi:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
shape: (3, 3)
dimension: 2
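As a quick illustration of what makes the identity matrix special, the sketch below multiplies an arbitrary matrix by it and gets the same values back. The sample matrix is made up for this example. It also shows the optional k argument of np.eye, which shifts the diagonal of ones.

```python
import numpy as np

# Arbitrary sample values, chosen only for illustration
matrix = np.array([[2, 0, 1],
                   [1, 3, 5],
                   [4, 1, 6]])
identity = np.eye(3)

# Matrix multiplication with the identity leaves the values unchanged
product = matrix @ identity
print(np.array_equal(product, matrix))  # True

# k=1 places the ones on the diagonal just above the main one
shifted = np.eye(3, k=1)
print(shifted)
```

Note that product is a float array, because np.eye returns floats by default; the values are the same even though the dtype differs.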
9. Array with Specific Ranges
There are two functions for creating an array over a specific range. Both generate sequences of numbers, but they differ in how they define the sequence: one takes the step between values, the other takes the number of values.
- np.arange(start, stop, step)
# Using np.arange(start, stop, step)
# step is optional and defaults to 1; stop itself is not included
arrayr = np.arange(1, 10, 2)
print(f"arrayr: {arrayr}")
print(f"type: {type(arrayr)}")  # shows the type of the object
print(f"shape: {arrayr.shape}")
print(f"dimension: {arrayr.ndim}")
Output
arrayr: [1 3 5 7 9]
type: <class 'numpy.ndarray'>
shape: (5,)
dimension: 1
- np.linspace(start, stop, number of elements)
# Using np.linspace(start, stop, number of elements)
# The number of elements is optional and defaults to 50; stop is included by default
arrayl = np.linspace(0, 1, 5)
print(f"arrayl: {arrayl}")
print(f"type: {type(arrayl)}")
print(f"shape: {arrayl.shape}")
print(f"dimension: {arrayl.ndim}")
Output
arrayl: [0.   0.25 0.5  0.75 1.  ]
type: <class 'numpy.ndarray'>
shape: (5,)
dimension: 1
When to use what?
Use np.arange() when you know the step size.
Use np.linspace() when you know how many values you need.
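The rule of thumb above can be seen side by side in a short sketch. It builds the interval from 0 to 1 both ways; the variable names are chosen only for this illustration.

```python
import numpy as np

# Known step size: arange stops before reaching the stop value
by_step = np.arange(0, 1, 0.25)
print(by_step)        # 0, 0.25, 0.5, 0.75 (1 is excluded)

# Known number of values: linspace includes the stop value by default
by_count = np.linspace(0, 1, 5)
print(by_count)       # 0, 0.25, 0.5, 0.75, 1

# endpoint=False makes linspace leave out the stop value, like arange
by_count_open = np.linspace(0, 1, 4, endpoint=False)
print(by_count_open)  # 0, 0.25, 0.5, 0.75

# retstep=True also returns the spacing that linspace worked out
values, step = np.linspace(0, 1, 5, retstep=True)
print(step)           # 0.25
```

With non-integer steps, np.arange can produce an unexpected number of elements because of floating-point rounding, so np.linspace is usually the safer choice when the number of values matters.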
Managing a NumPy Array
1. Reshaping an Array
# Reshaping (converting a 1-dim array to a 2-dim array)
arrayDummy = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(f"arrayDummy: {arrayDummy}")
print(f"shape: {arrayDummy.shape}")
print(f"size: {arrayDummy.size}")
print(f"dimension: {arrayDummy.ndim}")

reshapedDummy = arrayDummy.reshape(3, 3)
print("reshapedDummy: ")
print(reshapedDummy)
print(f"shape: {reshapedDummy.shape}")
print(f"size: {reshapedDummy.size}")
print(f"dimension: {reshapedDummy.ndim}")
Output
arrayDummy: [1 2 3 4 5 6 7 8 9]
shape: (9,)
size: 9
dimension: 1
reshapedDummy:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
shape: (3, 3)
size: 9
dimension: 2
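Reshaping only works when the new shape holds exactly as many elements as the original (the size stays the same). The sketch below, using a made-up 12-element array, shows two related details: passing -1 lets NumPy work out one dimension for you, and an incompatible shape raises a ValueError.

```python
import numpy as np

values = np.arange(1, 13)  # 12 elements: 1 through 12

# -1 asks NumPy to infer that dimension from the total size (12 / 4 = 3 rows)
inferred = values.reshape(-1, 4)
print(inferred.shape)      # (3, 4)

# 5 * 3 = 15 does not match 12 elements, so reshape refuses
try:
    values.reshape(5, 3)
    reshape_failed = False
except ValueError as error:
    reshape_failed = True
    print(f"Cannot reshape: {error}")
```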
2. Indexing and Slicing
# Indexing and Slicing
print(f"Index 3 of arrayDummy: {arrayDummy[3]}")  # Specific access
print(f"1st index of row and 0th index of column of reshapedDummy: {reshapedDummy[1, 0]}")  # Specific access

reshapedDummy[1, 0] = 99  # Modifying an element (reshape returned a view, so arrayDummy[3] changes too)
print("Updated reshapedDummy: ")
print(reshapedDummy)

print(f"Slicing the column: {reshapedDummy[:, 1]}")  # All rows, second column
print(f"Slicing the row: {reshapedDummy[1, :]}")  # Second row, all columns
Output
Index 3 of arrayDummy: 4
1st index of row and 0th index of column of reshapedDummy: 4
Updated reshapedDummy:
[[ 1  2  3]
 [99  5  6]
 [ 7  8  9]]
Slicing the column: [2 5 8]
Slicing the row: [99  5  6]
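One detail is worth knowing when you modify a reshaped array: reshape returns a view of the original data whenever it can, so a change made through the reshaped array also shows up in the original. The sketch below uses fresh, made-up arrays (original, view, independent) to show this, and how .copy() gives you an independent array instead.

```python
import numpy as np

original = np.arange(1, 10)     # 1 through 9
view = original.reshape(3, 3)   # shares memory with original here

view[1, 0] = 99                 # row 1, column 0 is position 3 in the flat array
print(original[3])              # 99: the original changed as well
print(np.shares_memory(original, view))  # True

# .copy() creates separate storage, so edits no longer propagate
independent = original.reshape(3, 3).copy()
independent[0, 0] = -1
print(original[0])              # 1: the original is untouched
```

If you run such code repeatedly in a notebook, this sharing is also why printed values can differ between the first run and later runs.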
NumPy arrays are a game-changer when it comes to handling and analyzing numerical data efficiently in Python. They provide speed, flexibility, and powerful mathematical operations that traditional Python collections lack. In this tutorial, we explored how to create and manage NumPy arrays, from basic initialization to advanced operations like reshaping, slicing, and working with specific number sequences. By mastering NumPy, you’ll be better equipped to handle data-intensive tasks in fields like Data Science, Machine Learning, and Analytics.
Happy Coding!