NumPy data access and functions

Reading and writing data in CSV files

CSV file
CSV (Comma-Separated Values) is a common plain-text file format for storing tabular data.

np.savetxt(frame, array, fmt='%.18e', delimiter=' ')

frame: file name (string) or open file object; a file name ending in .gz is saved gzip-compressed.
array: the array to store in the file.
fmt: the format applied to each element when writing, for example: %d, %.2f, %.18e.
delimiter: the string that separates columns; the default is a single space.
Example: saving a file with savetxt()

In [1]: import numpy as np

In [2]: a = np.arange(100).reshape(5,20)  # 0 to 99, arranged as 5 rows of 20 values

In [3]: np.savetxt('a.csv', a, fmt='%d', delimiter=',')

The contents of “a.csv” are as follows:

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59
60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79
80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99
In [4]: np.savetxt('a1.csv', a, fmt='%.1f', delimiter=',')

The contents of “a1.csv” are as follows:

0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0,16.0,17.0,18.0,19.0
20.0,21.0,22.0,23.0,24.0,25.0,26.0,27.0,28.0,29.0,30.0,31.0,32.0,33.0,34.0,35.0,36.0,37.0,38.0,39.0
40.0,41.0,42.0,43.0,44.0,45.0,46.0,47.0,48.0,49.0,50.0,51.0,52.0,53.0,54.0,55.0,56.0,57.0,58.0,59.0
60.0,61.0,62.0,63.0,64.0,65.0,66.0,67.0,68.0,69.0,70.0,71.0,72.0,73.0,74.0,75.0,76.0,77.0,78.0,79.0
80.0,81.0,82.0,83.0,84.0,85.0,86.0,87.0,88.0,89.0,90.0,91.0,92.0,93.0,94.0,95.0,96.0,97.0,98.0,99.0
np.loadtxt(frame, dtype=float, delimiter=None, unpack=False)

frame: file name, file object, or generator of lines; .gz and .bz2 compressed files are supported.
dtype: data type of the resulting array, optional; the default is float.
delimiter: the string that separates values; the default is any whitespace.
unpack: if True, the returned array is transposed so that each column can be assigned to a separate variable.
Example: reading a file with loadtxt()

In [5]: b = np.loadtxt('a1.csv', delimiter=',')

In [6]: b
Out[6]:
array([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.,
         11., 12., 13., 14., 15., 16., 17., 18., 19.],
       [ 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30.,
         31., 32., 33., 34., 35., 36., 37., 38., 39.],
       [ 40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50.,
         51., 52., 53., 54., 55., 56., 57., 58., 59.],
       [ 60., 61., 62., 63., 64., 65., 66., 67., 68., 69., 70.,
         71., 72., 73., 74., 75., 76., 77., 78., 79.],
       [80., 81., 82., 83., 84., 85., 86., 87., 88., 89., 90.,
         91., 92., 93., 94., 95., 96., 97., 98., 99.]])

In [7]: b = np.loadtxt('a.csv', dtype=int, delimiter=',')  # np.int was removed from NumPy; read the integer file with the built-in int

In [8]: b
Out[8]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
        37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56,
        57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
        77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
        97, 98, 99]])
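
The round trip above can be sketched as a single script. This is a minimal illustration rather than part of the original session: the temporary directory and the use of the `usecols` argument are assumptions made to keep the example self-contained and short.

```python
import os
import tempfile

import numpy as np

a = np.arange(100).reshape(5, 20)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.csv")  # hypothetical temporary file
    np.savetxt(path, a, fmt="%d", delimiter=",")

    # Without dtype, loadtxt returns floating-point values.
    as_float = np.loadtxt(path, delimiter=",")
    # With dtype=int, the integer text is parsed directly into integers.
    as_int = np.loadtxt(path, dtype=int, delimiter=",")
    # unpack=True transposes the result, so each selected column
    # can be bound to its own variable.
    col0, col1 = np.loadtxt(path, delimiter=",", usecols=(0, 1), unpack=True)

print(as_float.dtype, as_int.dtype)
print(col0, col1)
```

Reading the integer file with `dtype=int` avoids parsing text such as `0.0` as an integer, which recent NumPy versions discourage.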

Limitations of CSV files

A CSV file can only represent one- and two-dimensional arrays effectively, so np.savetxt() and np.loadtxt() are limited to one- and two-dimensional arrays.
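
A short sketch of this limitation, assuming an in-memory `io.StringIO` buffer in place of a real file: saving a three-dimensional array raises `ValueError`, and one possible workaround is to flatten the trailing dimensions before saving and restore the shape after loading. The shape has to be remembered separately, because the CSV text does not record it.

```python
import io

import numpy as np

a3 = np.arange(24).reshape(2, 3, 4)

# savetxt rejects arrays with more than two dimensions.
buf = io.StringIO()
try:
    np.savetxt(buf, a3, fmt="%d", delimiter=",")
    saved_3d = True
except ValueError:
    saved_3d = False

# Workaround: collapse to 2-D for writing, then reshape after reading.
buf = io.StringIO()
np.savetxt(buf, a3.reshape(a3.shape[0], -1), fmt="%d", delimiter=",")
buf.seek(0)
restored = np.loadtxt(buf, dtype=int, delimiter=",").reshape(a3.shape)

print(saved_3d, restored.shape)
```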

Reading and writing multidimensional data

a.tofile(frame, sep='', format='%s')

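frame: file name (string) or open file object.
sep: the separator between elements; if it is an empty string, the file is written as binary.
format: the format applied to each element when writing text; it has no effect on binary output.
Example: storing multidimensional data with tofile()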

In [9]: a = np.arange(100).reshape(5,10,2)

In [10]: a.tofile('b.dat', sep=',', format='%d')

The contents of “b.dat” are as follows:

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99
In [11]: a.tofile('b1.dat', format='%d')

The contents of “b1.dat” (a binary file), shown as a hex dump, are as follows:

 0000 0000 0100 0000 0200 0000 0300 0000
0400 0000 0500 0000 0600 0000 0700 0000
0800 0000 0900 0000 0a00 0000 0b00 0000
0c00 0000 0d00 0000 0e00 0000 0f00 0000
1000 0000 1100 0000 1200 0000 1300 0000
1400 0000 1500 0000 1600 0000 1700 0000
1800 0000 1900 0000 1a00 0000 1b00 0000
1c00 0000 1d00 0000 1e00 0000 1f00 0000
2000 0000 2100 0000 2200 0000 2300 0000
2400 0000 2500 0000 2600 0000 2700 0000
2800 0000 2900 0000 2a00 0000 2b00 0000
2c00 0000 2d00 0000 2e00 0000 2f00 0000
3000 0000 3100 0000 3200 0000 3300 0000
3400 0000 3500 0000 3600 0000 3700 0000
3800 0000 3900 0000 3a00 0000 3b00 0000
3c00 0000 3d00 0000 3e00 0000 3f00 0000
4000 0000 4100 0000 4200 0000 4300 0000
4400 0000 4500 0000 4600 0000 4700 0000
4800 0000 4900 0000 4a00 0000 4b00 0000
4c00 0000 4d00 0000 4e00 0000 4f00 0000
5000 0000 5100 0000 5200 0000 5300 0000
5400 0000 5500 0000 5600 0000 5700 0000
5800 0000 5900 0000 5a00 0000 5b00 0000
5c00 0000 5d00 0000 5e00 0000 5f00 0000
6000 0000 6100 0000 6200 0000 6300 0000
np.fromfile(frame, dtype=float, count=-1, sep='')

frame: file name (string) or open file object.
dtype: the data type of the elements to read.
count: the number of elements to read; -1 means reading the entire file.
sep: the separator between elements; if it is an empty string, the file is read as binary.
Example: reading multidimensional data with fromfile()

In [9]: c = np.fromfile('b.dat', dtype=int, sep=',')

In [10]: c
Out[10]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

In [11]: c = np.fromfile('b.dat', dtype=int, sep=',').reshape(5,10,2)

In [12]: c
Out[12]:


array([[[ 0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9],
        [10, 11],
        [12, 13],
        [14, 15],
        [16, 17],
        [18, 19]],

       [[20, 21],
        [22, 23],
        [24, 25],
        [26, 27],
        [28, 29],
        [30, 31],
        [32, 33],
        [34, 35],
        [36, 37],
        [38, 39]],

       [[40, 41],
        [42, 43],
        [44, 45],
        [46, 47],
        [48, 49],
        [50, 51],
        [52, 53],
        [54, 55],
        [56, 57],
        [58, 59]],

       [[60, 61],
        [62, 63],
        [64, 65],
        [66, 67],
        [68, 69],
        [70, 71],
        [72, 73],
        [74, 75],
        [76, 77],
        [78, 79]],

       [[80, 81],
        [82, 83],
        [84, 85],
        [86, 87],
        [88, 89],
        [90, 91],
        [92, 93],
        [94, 95],
        [96, 97],
        [98, 99]]])

In [13]: c = np.fromfile('b1.dat', dtype=int).reshape(5,10,2)

In [14]: c
Out[14]:


array([[[ 0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9],
        [10, 11],
        [12, 13],
        [14, 15],
        [16, 17],
        [18, 19]],

       [[20, 21],
        [22, 23],
        [24, 25],
        [26, 27],
        [28, 29],
        [30, 31],
        [32, 33],
        [34, 35],
        [36, 37],
        [38, 39]],

       [[40, 41],
        [42, 43],
        [44, 45],
        [46, 47],
        [48, 49],
        [50, 51],
        [52, 53],
        [54, 55],
        [56, 57],
        [58, 59]],

       [[60, 61],
        [62, 63],
        [64, 65],
        [66, 67],
        [68, 69],
        [70, 71],
        [72, 73],
        [74, 75],
        [76, 77],
        [78, 79]],

       [[80, 81],
        [82, 83],
        [84, 85],
        [86, 87],
        [88, 89],
        [90, 91],
        [92, 93],
        [94, 95],
        [96, 97],
        [98, 99]]])

Note:
When reading, this method needs to know the shape and element type the array had when it was written, because the file itself stores neither. For this reason a.tofile() and np.fromfile() need to be used together.
The additional information can be stored in a separate metadata file. The array shape and element type can also be encoded in the file name (example: b1_int_5_10_2.dat).
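
The file-name convention mentioned in the note could look like the following sketch. The helpers `save_raw` and `load_raw` are hypothetical names introduced here for illustration, and the scheme assumes the stem contains no underscore; it is one possible layout, not an established NumPy convention.

```python
import os
import tempfile

import numpy as np


def save_raw(directory, stem, arr):
    """Write arr with tofile(), encoding dtype and shape in the file name."""
    shape_part = "_".join(str(n) for n in arr.shape)
    path = os.path.join(directory, f"{stem}_{arr.dtype.name}_{shape_part}.dat")
    arr.tofile(path)  # empty sep (the default) writes raw binary
    return path


def load_raw(path):
    """Recover dtype and shape from the file name, then read the raw bytes."""
    name = os.path.splitext(os.path.basename(path))[0]
    _stem, dtype_name, *dims = name.split("_")
    shape = tuple(int(d) for d in dims)
    return np.fromfile(path, dtype=np.dtype(dtype_name)).reshape(shape)


a = np.arange(100).reshape(5, 10, 2)
with tempfile.TemporaryDirectory() as tmp:
    raw_path = save_raw(tmp, "b1", a)
    file_name = os.path.basename(raw_path)
    restored = load_raw(raw_path)

print(file_name, restored.shape, restored.dtype)
```

The raw bytes are still platform dependent (byte order and integer width), so this sketch is only safe when the file is written and read on compatible machines.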

Numpy’s convenient file access

np.save(fname, array) or np.savez(fname, array)

fname: file name; np.save() writes a .npy file, and np.savez() writes a .npz archive that can hold several arrays (np.savez_compressed() writes a compressed .npz archive).
array: array variable

np.load(fname)

fname: file name, ending in .npy or .npz
Example: Using save(), load()

In [15]: np.save('a.npy',a)

The contents of “a.npy”, shown as a hex dump, are as follows:

 934e 554d 5059 0100 4600 7b27 6465 7363
7227 3a20 273c 6934 272c 2027 666f 7274
7261 6e5f 6f72 6465 7227 3a20 4661 6c73
652c 2027 7368 6170 6527 3a20 2835 2c20
3130 2c20 3229 2c20 7d20 2020 2020 200a
0000 0000 0100 0000 0200 0000 0300 0000
0400 0000 0500 0000 0600 0000 0700 0000
0800 0000 0900 0000 0a00 0000 0b00 0000
0c00 0000 0d00 0000 0e00 0000 0f00 0000
1000 0000 1100 0000 1200 0000 1300 0000
1400 0000 1500 0000 1600 0000 1700 0000
1800 0000 1900 0000 1a00 0000 1b00 0000
1c00 0000 1d00 0000 1e00 0000 1f00 0000
2000 0000 2100 0000 2200 0000 2300 0000
2400 0000 2500 0000 2600 0000 2700 0000
2800 0000 2900 0000 2a00 0000 2b00 0000
2c00 0000 2d00 0000 2e00 0000 2f00 0000
3000 0000 3100 0000 3200 0000 3300 0000
3400 0000 3500 0000 3600 0000 3700 0000
3800 0000 3900 0000 3a00 0000 3b00 0000
3c00 0000 3d00 0000 3e00 0000 3f00 0000
4000 0000 4100 0000 4200 0000 4300 0000
4400 0000 4500 0000 4600 0000 4700 0000
4800 0000 4900 0000 4a00 0000 4b00 0000
4c00 0000 4d00 0000 4e00 0000 4f00 0000
5000 0000 5100 0000 5200 0000 5300 0000
5400 0000 5500 0000 5600 0000 5700 0000
5800 0000 5900 0000 5a00 0000 5b00 0000
5c00 0000 5d00 0000 5e00 0000 5f00 0000
6000 0000 6100 0000 6200 0000 6300 0000

Inspecting the binary file shows that np.save() stores not only the data in the .npy file, but also a header that records the element type, the memory order, and the shape of the array.
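
The extra information at the start of the file can also be read programmatically. The sketch below assumes an in-memory `io.BytesIO` buffer instead of a file on disk, and it assumes the array is small enough that NumPy writes format version 1.0, which is what `read_array_header_1_0` expects.

```python
import io

import numpy as np
from numpy.lib import format as npy_format

a = np.arange(100).reshape(5, 10, 2)

buf = io.BytesIO()
np.save(buf, a)
buf.seek(0)

# The file starts with a magic string and a format version ...
version = npy_format.read_magic(buf)
# ... followed by a header describing shape, memory order, and dtype.
shape, fortran_order, dtype = npy_format.read_array_header_1_0(buf)

print(version, shape, fortran_order, dtype)
```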

In [16]: b = np.load('a.npy')

In [17]: b
Out[17]:

array([[[ 0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9],
        [10, 11],
        [12, 13],
        [14, 15],
        [16, 17],
        [18, 19]],

       [[20, 21],
        [22, 23],
        [24, 25],
        [26, 27],
        [28, 29],
        [30, 31],
        [32, 33],
        [34, 35],
        [36, 37],
        [38, 39]],

       [[40, 41],
        [42, 43],
        [44, 45],
        [46, 47],
        [48, 49],
        [50, 51],
        [52, 53],
        [54, 55],
        [56, 57],
        [58, 59]],

       [[60, 61],
        [62, 63],
        [64, 65],
        [66, 67],
        [68, 69],
        [70, 71],
        [72, 73],
        [74, 75],
        [76, 77],
        [78, 79]],

       [[80, 81],
        [82, 83],
        [84, 85],
        [86, 87],
        [88, 89],
        [90, 91],
        [92, 93],
        [94, 95],
        [96, 97],
        [98, 99]]])
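
As a compact illustration of why these functions are convenient, the following sketch saves and reloads an array without passing any shape or dtype, and then stores two arrays in one .npz archive. The temporary directory, the file names, and the `weights` array are assumptions made for the example.

```python
import os
import tempfile

import numpy as np

a = np.arange(100).reshape(5, 10, 2)
weights = np.linspace(0.0, 1.0, 5)  # hypothetical second array

with tempfile.TemporaryDirectory() as tmp:
    # .npy: shape and dtype travel with the data.
    npy_path = os.path.join(tmp, "a.npy")
    np.save(npy_path, a)
    restored = np.load(npy_path)

    # .npz: several named arrays in one archive.
    npz_path = os.path.join(tmp, "bundle.npz")
    np.savez(npz_path, a=a, weights=weights)
    with np.load(npz_path) as bundle:
        names = sorted(bundle.files)
        restored_weights = bundle["weights"]

print(restored.shape, restored.dtype, names)
```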


Numpy’s random number function

Numpy’s random sublibrary
Basic format: np.random.*
np.random.rand(), np.random.randn(), np.random.randint()
Random number function of np.random

Example: Function Test

In [18]: a = np.random.rand(3,4,5)  # Uniform values in [0, 1): three layers, four rows and five columns

In [19]: a
Out[19]:
array([[[ 0.97845512, 0.90466706, 0.92576248, 0.77775142, 0.84334893],
        [0.39599821, 0.31917683, 0.7961439, 0.01324569, 0.97660396],
        [0.5049603, 0.80952265, 0.67359257, 0.89334316, 0.94496225],
        [0.04840473, 0.04665257, 0.20956817, 0.62255095, 0.36600489]],

       [[ 0.58059326, 0.28464266, 0.23596248, 0.16677631, 0.86467069],
        [0.14691968, 0.60863245, 0.71725038, 0.69206766, 0.18301705],
        [0.73197901, 0.99051723, 0.10489076, 0.33979432, 0.0354286],
        [0.73696453, 0.48268632, 0.99294233, 0.06285961, 0.93090147]],

       [[ 0.07853777, 0.827061 , 0.66325364, 0.52289669, 0.96894828],
        [0.41912388, 0.01883408, 0.80978245, 0.93082898, 0.98095581],
        [0.58614214, 0.55996867, 0.37734444, 0.79280598, 0.03626233],
        [0.233132, 0.22514788, 0.32245147, 0.13739658, 0.18866422]]])

In [20]: sn = np.random.randn(3,4,5)

In [21]: sn
Out[21]:
array([[[-0.54821321, 0.35733947, 0.74102173, -1.26679716, -0.75072289],
        [0.13182283, 2.32578442, -0.52208189, 2.5041796, -0.96995644],
        [1.00171095, 0.97037733, 1.55386206, -0.94515087, 0.75707273],
        [-1.2481768, 0.53095038, 0.92527818, -0.17261088, -0.13667463]],

       [[ 2.18760173, -0.93813162, 0.19032109, -1.59605908, -0.96802666],
        [0.30649913, 1.32375007, 0.72547761, -1.59253182, -0.72385311],
        [-2.22923637, -1.05462649, 1.82672301, 0.47343961, -0.9786459],
        [-0.36857965, 0.59003624, 1.80140997, 1.00965744, 1.9037593 ]],

       [[ 0.36273071, -0.0447364 , 1.27120325, 0.21076423, -0.40820945],
        [-1.22315321, -1.94670543, 0.17959233, -1.1020581, 0.17423733],
        [-1.16368644, 0.00589158, 1.19701291, -0.4255035, -0.7508364],
        [-1.61788168, 0.50386607, 0.15993032, 0.36881486, -0.41457221]]])

In [22]: b = np.random.randint(100,200,(3,4))

In [23]: b
Out[23]:
array([[163, 171, 163, 168],
       [166, 127, 160, 109],
       [135, 111, 196, 190]])

In [24]: np.random.seed(10)

In [25]: np.random.randint(100,200,(3,4))  # Random integers in [100, 200), arranged as 3 rows and 4 columns
Out[25]:
array([[109, 115, 164, 128],
       [189, 193, 129, 108],
       [173, 100, 140, 136]])

In [26]: np.random.seed(10)

In [27]: np.random.randint(100,200,(3,4))
Out[27]:
array([[109, 115, 164, 128],
       [189, 193, 129, 108],
       [173, 100, 140, 136]])
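
The effect of the seed can be checked directly. The first half of this sketch repeats the session above; the second half uses `np.random.default_rng`, the Generator interface available in NumPy 1.17 and later, which is not used in the original text and is shown only as an alternative.

```python
import numpy as np

# Re-seeding the global state reproduces the same draw.
np.random.seed(10)
first = np.random.randint(100, 200, (3, 4))
np.random.seed(10)
second = np.random.randint(100, 200, (3, 4))

# Two independent generators built from the same seed also agree.
rng_a = np.random.default_rng(10)
rng_b = np.random.default_rng(10)
g1 = rng_a.integers(100, 200, size=(3, 4))
g2 = rng_b.integers(100, 200, size=(3, 4))

print(np.array_equal(first, second), np.array_equal(g1, g2))
```

The two interfaces use different algorithms, so `first` and `g1` are not expected to match each other.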

Random number function of np.random

Example: Function Test

In [28]: a = np.random.randint(100,200,(3,4))

In [29]: a
Out[29]:
array([[116, 111, 154, 188],
       [162, 133, 172, 178],
       [149, 151, 154, 177]])

In [30]: np.random.shuffle(a)  # Shuffles the rows (first axis) in place

In [31]: a
Out[31]:
array([[116, 111, 154, 188],
       [149, 151, 154, 177],
       [162, 133, 172, 178]])

In [32]: np.random.shuffle(a)  # Shuffles the rows (first axis) in place

In [33]: a
Out[33]:
array([[162, 133, 172, 178],
       [116, 111, 154, 188],
       [149, 151, 154, 177]])

In [34]: a = np.random.randint(100,200,(3,4))

In [35]: a
Out[35]:
array([[113, 192, 186, 130],
       [130, 189, 112, 165],
       [131, 157, 136, 127]])

In [36]: np.random.permutation(a)  # Returns a copy with the rows (first axis) in random order; a is unchanged
Out[36]:
array([[113, 192, 186, 130],
       [130, 189, 112, 165],
       [131, 157, 136, 127]])

In [37]: a
Out[37]:
array([[113, 192, 186, 130],
       [130, 189, 112, 165],
       [131, 157, 136, 127]])
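
The difference between the two functions can be sketched as follows, assuming a small fixed array and an arbitrary seed chosen for this example: `permutation` returns a new array and leaves its argument alone, while `shuffle` returns None and reorders the rows of its argument.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, for repeatability only
a = np.arange(12).reshape(3, 4)
original = a.copy()

# permutation: new array, input untouched.
shuffled_copy = np.random.permutation(a)
unchanged = np.array_equal(a, original)

# shuffle: works in place and returns None.
result = np.random.shuffle(a)

# Rows are moved as whole units; their contents are never mixed.
same_rows = sorted(a.tolist()) == sorted(original.tolist())

print(unchanged, result, same_rows)
```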

In [38]: b = np.random.randint(100,200,(8,))

In [39]: b
Out[39]: array([177, 122, 123, 194, 111, 128, 174, 188])

In [40]: np.random.choice(b,(3,2))  # Sampling with replacement: values may repeat
Out[40]:
array([[122, 188],
       [123, 177],
       [174, 188]])

In [41]: np.random.choice(b,(3,2),replace=False)  # Sampling without replacement: no element of b is drawn twice
Out[41]:
array([[123, 111],
       [128, 188],
       [174, 122]])

In [42]: np.random.choice(b,(3,2),p= b/np.sum(b))  # Each element is drawn with probability equal to its value divided by the sum of b
Out[42]:
array([[174, 122],
       [188, 194],
       [174, 123]])
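
A small check of the `replace` and `p` arguments, using the same values of b as above. The seed and the sample size of 10000 are assumptions chosen for this sketch; the observed frequency only approximates the requested probability.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed, for repeatability only
b = np.array([177, 122, 123, 194, 111, 128, 174, 188])

# replace=False: all six drawn values come from different positions of b.
no_repeat = np.random.choice(b, (3, 2), replace=False)

# p must sum to 1; larger values of b are drawn more often.
p = b / np.sum(b)
weighted = np.random.choice(b, 10000, p=p)
freq_194 = np.mean(weighted == 194)

print(no_repeat)
print(freq_194, 194 / np.sum(b))
```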

In [43]: u = np.random.uniform(0,10,(3,4))  # Uniform values in [0, 10), 3 rows and 4 columns

In [44]: u
Out[44]:
array([[ 8.8393648 , 3.25511638, 1.65015898, 3.92529244],
       [0.93460375, 8.21105658, 1.5115202, 3.84114449],
       [9.44260712, 9.87625475, 4.56304547, 8.26122844]])

In [45]: n = np.random.normal(10,5,(3,4))

In [46]: n
Out[46]:
array([[ 12.8882903 , 2.6251256 , 10.39394227, 14.59206826],
       [7.5365132, 10.48231186, 6.73620032, 8.89118781],
       [4.65856717, 3.86153973, 1.00713488, 6.5739633 ]])
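
To see what the parameters of `uniform` and `normal` mean, one can draw a large sample and compare its statistics with the arguments. The sample size and seed below are assumptions for this sketch; the printed values are close to, not exactly equal to, the requested ones.

```python
import numpy as np

np.random.seed(2)  # arbitrary seed, for repeatability only

# uniform(low, high, size): values fall in [low, high).
u = np.random.uniform(0, 10, 100000)

# normal(loc, scale, size): loc is the mean, scale the standard deviation.
n = np.random.normal(10, 5, 100000)

print(u.min(), u.max(), u.mean())
print(n.mean(), n.std())
```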

NumPy statistical functions

Statistical functions directly provided by Numpy
Basic format: np.*
For example: np.std(), np.var(), np.average()
Statistical functions of NumPy

In [47]: a = np.arange(15).reshape(3,5)

In [48]: a
Out[48]:
array([[ 0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9],
       [10, 11, 12, 13, 14]])

In [49]: np.sum(a)
Out[49]: 105

In [50]: np.mean(a,axis=1) # Mean of each row: 2. = (0 + 1 + 2 + 3 + 4)/5
Out[50]: array([ 2., 7., 12.])

In [51]: np.mean(a,axis=0) # Mean of each column: 7. = (2 + 7 + 12)/3
Out[51]: array([ 5., 6., 7., 8., 9.])

In [52]: np.average(a, axis=0, weights=[10,5,1]) # Weighted average: 4.1875 = (2*10 + 7*5 + 1*12)/(10 + 5 + 1)
Out[52]: array([ 2.1875, 3.1875, 4.1875, 5.1875, 6.1875])
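
The weighted average along axis 0 can be reproduced by hand, which makes the role of `weights` explicit: each row is multiplied by its weight, the rows are summed, and the result is divided by the total weight. The variable names are chosen for this sketch.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
w = np.array([10, 5, 1])

# Broadcast the weights over the columns, sum over rows, normalize.
manual = (a * w[:, np.newaxis]).sum(axis=0) / w.sum()
builtin = np.average(a, axis=0, weights=w)

print(manual)
print(np.allclose(manual, builtin))
```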

In [53]: np.std(a)
Out[53]: 4.3204937989385739

In [54]: np.var(a)
Out[54]: 18.666666666666668
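
Two relationships are worth checking here. The standard deviation is the square root of the variance, and both functions compute the population statistic by default (dividing by N). The `ddof=1` argument, not used in the session above, switches to the sample statistic (dividing by N - 1).

```python
import numpy as np

a = np.arange(15).reshape(3, 5)

population_var = np.var(a)      # divides by N = 15
sample_var = np.var(a, ddof=1)  # divides by N - 1 = 14
std_from_var = np.sqrt(population_var)

print(population_var, sample_var)
print(np.isclose(std_from_var, np.std(a)))
```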

In [55]: b = np.arange(15,0,-1).reshape(3,5)

In [56]: b
Out[56]:
array([[15, 14, 13, 12, 11],
       [10, 9, 8, 7, 6],
       [5, 4, 3, 2, 1]])

In [57]: np.max(b)
Out[57]: 15

In [58]: np.argmax(b) # Flattened subscript
Out[58]: 0

In [59]: np.unravel_index(np.argmax(b), b.shape) # Reshape into multi-dimensional subscripts
Out[59]: (0, 0)

In [60]: np.ptp(b)
Out[60]: 14

In [61]: np.median(b)
Out[61]: 8.0
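
Because the maximum of b above sits at position 0, the flattened and multi-dimensional indices look alike. The sketch below uses a different, made-up array in which the maximum is not the first element, so the conversion performed by `np.unravel_index` is visible.

```python
import numpy as np

b = np.array([[3, 9, 1],
              [7, 2, 8]])

flat = int(np.argmax(b))  # index into the flattened array
idx = tuple(int(i) for i in np.unravel_index(flat, b.shape))

value_range = int(np.ptp(b))  # max minus min
middle = float(np.median(b))  # mean of the two middle values, 3 and 7

print(flat, idx, b[idx], value_range, middle)
```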

Numpy’s gradient function

Gradient function of NumPy

In [62]: a = np.random.randint(0,20,(5))

In [63]: a
Out[63]: array([14, 16, 10, 17, 0])

In [64]: np.gradient(a) # Interior element, with neighbors on both sides: -2. = (10-14)/2
Out[64]: array([ 2. , -2. , 0.5, -5. , -17. ])

In [65]: b = np.random.randint(0,20,(5))

In [66]: b
Out[66]: array([17, 9, 16, 9, 12])

In [67]: np.gradient(b) # Edge element, with a neighbor on one side only: -8. = (9-17)/1
Out[67]: array([-8. , -0.5, 0. , -2. , 3. ])
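
The two rules in the comments above (a central difference for interior elements, a one-sided difference at the edges) can be written out by hand. The function `gradient_1d` is a hypothetical helper for illustration; it assumes unit spacing and at least two elements, and it mirrors only the default behavior of `np.gradient`.

```python
import numpy as np


def gradient_1d(values):
    """Unit-spacing gradient: central differences inside, one-sided at the ends."""
    v = np.asarray(values, dtype=float)
    g = np.empty_like(v)
    g[1:-1] = (v[2:] - v[:-2]) / 2.0  # neighbors on both sides
    g[0] = v[1] - v[0]                # only a right neighbor
    g[-1] = v[-1] - v[-2]             # only a left neighbor
    return g


a = np.array([14, 16, 10, 17, 0])
manual = gradient_1d(a)

print(manual)
print(np.allclose(manual, np.gradient(a)))
```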

In [68]: c=np.random.randint(0,50,(3,5))

In [69]: c
Out[69]:
array([[13, 22, 23, 30, 11],
       [28, 10, 24, 9, 15],
       [18, 16, 7, 24, 11]])

In [70]: np.gradient(c)
Out[70]:
[array([[ 15. , -12. , 1. , -21. , 4. ],
        [ 2.5, -3. , -8. , -3. , 0. ],
        [-10. , 6. , -17. , 15. , -4. ]]),  # Gradient along the first axis (down the rows)
 array([[ 9. , 5. , 4. , -6. , -19. ],
        [-18. , -2. , -0.5, -4.5, 6. ],
        [ -2. , -5.5, 4. , 2. , -13. ]])]  # Gradient along the second axis (across the columns)
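
For an N-dimensional input, `np.gradient` returns one array per axis, each with the same shape as the input. The sketch below reuses the values of c shown above, unpacks the two results, and uses the `axis` argument to request a single direction.

```python
import numpy as np

c = np.array([[13, 22, 23, 30, 11],
              [28, 10, 24, 9, 15],
              [18, 16, 7, 24, 11]])

# One result per axis: differences down the rows, then across the columns.
d_rows, d_cols = np.gradient(c)

# axis=1 computes only the column-direction gradient.
only_cols = np.gradient(c, axis=1)

print(d_rows[0], d_cols[0])
print(np.array_equal(only_cols, d_cols))
```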


Unit summary