Reading and writing data with CSV files
CSV file
CSV (Comma-Separated Values) is a common file format used to store batch data.
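np.savetxt(frame, array, fmt='%.18e', delimiter=' ')
frame: the file to write, given as a file name or an open file object; a name ending in .gz is written as a gzip-compressed file. (NumPy itself names the first two parameters fname and X.)
array: the array to store in the file.
fmt: the format used to write each element, for example %d, %.2f, or %.18e.
delimiter: the string that separates the columns; the default is a single space.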
Example: saving a file with savetxt()
In [1]: import numpy as np
In [2]: a = np.arange(100).reshape(5, 20)  # 0 to 99, arranged as 5 rows of 20 values
In [3]: np.savetxt('a.csv', a, fmt='%d', delimiter=',')
The “a.csv” file information is as follows:
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59
60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79
80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99
In [4]: np.savetxt('a1.csv', a, fmt='%.1f', delimiter=',')
The “a1.csv” file information is as follows:
0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0,16.0,17.0,18.0,19.0
20.0,21.0,22.0,23.0,24.0,25.0,26.0,27.0,28.0,29.0,30.0,31.0,32.0,33.0,34.0,35.0,36.0,37.0,38.0,39.0
40.0,41.0,42.0,43.0,44.0,45.0,46.0,47.0,48.0,49.0,50.0,51.0,52.0,53.0,54.0,55.0,56.0,57.0,58.0,59.0
60.0,61.0,62.0,63.0,64.0,65.0,66.0,67.0,68.0,69.0,70.0,71.0,72.0,73.0,74.0,75.0,76.0,77.0,78.0,79.0
80.0,81.0,82.0,83.0,84.0,85.0,86.0,87.0,88.0,89.0,90.0,91.0,92.0,93.0,94.0,95.0,96.0,97.0,98.0,99.0
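The remark about compressed files can be tried out with a short sketch. The file name a.csv.gz and the temporary directory are assumptions made for this illustration; the sketch relies on savetxt() choosing gzip compression from the .gz suffix and on loadtxt() decompressing it again.

```python
import os
import tempfile

import numpy as np

a = np.arange(100).reshape(5, 20)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; the .gz suffix asks savetxt for gzip compression.
    path = os.path.join(tmp, "a.csv.gz")
    np.savetxt(path, a, fmt="%d", delimiter=",")

    # A gzip stream starts with the two magic bytes 0x1f 0x8b.
    with open(path, "rb") as fh:
        magic = fh.read(2)

    # loadtxt decompresses the file based on its extension.
    restored = np.loadtxt(path, dtype=int, delimiter=",")

print(magic == b"\x1f\x8b")
print(np.array_equal(a, restored))
```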
np.loadtxt(frame, dtype=float, delimiter=None, unpack=False)
frame: the file to read, given as a file, a file name, or a generator; it can be a .gz or .bz2 compressed file.
dtype: the data type of the resulting array; optional, float by default.
delimiter: the string that separates the values; by default, any whitespace.
unpack: if True, the result is transposed so that each column can be assigned to a separate variable.
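The effect of unpack is easiest to see on a small file. The following is a minimal sketch with made-up data and a hypothetical file name (xy.csv); it is not part of the original session.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data: three rows, two columns (x, y).
data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xy.csv")  # illustrative file name
    np.savetxt(path, data, fmt="%.1f", delimiter=",")

    # unpack=True transposes the result, so each column lands in its own variable.
    x, y = np.loadtxt(path, delimiter=",", unpack=True)

print(x)
print(y)
```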
Example: reading a file with loadtxt()
In [5]: b = np.loadtxt('a1.csv', delimiter=',')
In [6]: b
Out[6]:
array([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19.],
       [20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39.],
       [40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50., 51., 52., 53., 54., 55., 56., 57., 58., 59.],
       [60., 61., 62., 63., 64., 65., 66., 67., 68., 69., 70., 71., 72., 73., 74., 75., 76., 77., 78., 79.],
       [80., 81., 82., 83., 84., 85., 86., 87., 88., 89., 90., 91., 92., 93., 94., 95., 96., 97., 98., 99.]])
In [7]: b = np.loadtxt('a.csv', dtype=int, delimiter=',')  # np.int no longer exists in current NumPy; a.csv is read here because recent versions may reject float-formatted text such as 0.0 for an integer dtype
In [8]: b
Out[8]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
Limitations of CSV files
CSV can effectively store only one- and two-dimensional arrays, so np.savetxt() and np.loadtxt() can write and read only one- and two-dimensional arrays.
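The limitation can be seen directly: savetxt() raises a ValueError when given a three-dimensional array. The sketch below also shows one possible workaround, which is an assumption of this example rather than something the tutorial prescribes: merge the trailing axes into a 2D table and restore the shape by hand after reading.

```python
import os
import tempfile

import numpy as np

a3 = np.arange(100).reshape(5, 10, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a3.csv")  # hypothetical file name

    # savetxt rejects arrays with more than two dimensions.
    try:
        np.savetxt(path, a3, fmt="%d", delimiter=",")
        saved_directly = True
    except ValueError:
        saved_directly = False

    # Workaround: merge the trailing axes into one, giving a 5 x 20 table.
    np.savetxt(path, a3.reshape(a3.shape[0], -1), fmt="%d", delimiter=",")

    # The reader has to know the original shape to undo the reshape.
    restored = np.loadtxt(path, dtype=int, delimiter=",").reshape(5, 10, 2)

print(saved_directly)
print(np.array_equal(a3, restored))
```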
Reading and writing multidimensional data
a.tofile(frame, sep='', format='%s')
frame: the file to write, given as a file object or a file name.
sep: the string that separates the data items; if it is an empty string, the file is written as binary.
format: the format used to write each item in text mode.
Example: tofile() stores multidimensional data
In [9]: a = np.arange(100).reshape(5, 10, 2)
In [10]: a.tofile('b.dat', sep=',', format='%d')
The “b.dat” file information is as follows:
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99
In [11]: a.tofile('b1.dat', format='%d')
The “b1.dat” file information (binary file, shown as a hex dump) is as follows. Each value occupies four bytes in this dump, stored with the least significant byte first:
0000 0000 0100 0000 0200 0000 0300 0000
0400 0000 0500 0000 0600 0000 0700 0000
0800 0000 0900 0000 0a00 0000 0b00 0000
0c00 0000 0d00 0000 0e00 0000 0f00 0000
1000 0000 1100 0000 1200 0000 1300 0000
1400 0000 1500 0000 1600 0000 1700 0000
1800 0000 1900 0000 1a00 0000 1b00 0000
1c00 0000 1d00 0000 1e00 0000 1f00 0000
2000 0000 2100 0000 2200 0000 2300 0000
2400 0000 2500 0000 2600 0000 2700 0000
2800 0000 2900 0000 2a00 0000 2b00 0000
2c00 0000 2d00 0000 2e00 0000 2f00 0000
3000 0000 3100 0000 3200 0000 3300 0000
3400 0000 3500 0000 3600 0000 3700 0000
3800 0000 3900 0000 3a00 0000 3b00 0000
3c00 0000 3d00 0000 3e00 0000 3f00 0000
4000 0000 4100 0000 4200 0000 4300 0000
4400 0000 4500 0000 4600 0000 4700 0000
4800 0000 4900 0000 4a00 0000 4b00 0000
4c00 0000 4d00 0000 4e00 0000 4f00 0000
5000 0000 5100 0000 5200 0000 5300 0000
5400 0000 5500 0000 5600 0000 5700 0000
5800 0000 5900 0000 5a00 0000 5b00 0000
5c00 0000 5d00 0000 5e00 0000 5f00 0000
6000 0000 6100 0000 6200 0000 6300 0000
np.fromfile(frame, dtype=float, count=-1, sep='')
frame: the file to read, given as a file object or a file name.
dtype: the data type of the elements to read.
count: the number of elements to read; -1 means the entire file.
sep: the string that separates the data items; if it is an empty string, the file is read as binary.
Example: reading multidimensional data with fromfile()
In [9]: c = np.fromfile('b.dat', dtype=int, sep=',')
In [10]: c
Out[10]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
       20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
       40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
       60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
       80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
In [11]: c = np.fromfile('b.dat', dtype=int, sep=',').reshape(5, 10, 2)
In [12]: c
Out[12]:
array([[[ 0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11], [12, 13], [14, 15], [16, 17], [18, 19]],
       [[20, 21], [22, 23], [24, 25], [26, 27], [28, 29], [30, 31], [32, 33], [34, 35], [36, 37], [38, 39]],
       [[40, 41], [42, 43], [44, 45], [46, 47], [48, 49], [50, 51], [52, 53], [54, 55], [56, 57], [58, 59]],
       [[60, 61], [62, 63], [64, 65], [66, 67], [68, 69], [70, 71], [72, 73], [74, 75], [76, 77], [78, 79]],
       [[80, 81], [82, 83], [84, 85], [86, 87], [88, 89], [90, 91], [92, 93], [94, 95], [96, 97], [98, 99]]])
In [13]: c = np.fromfile('b1.dat', dtype=int).reshape(5, 10, 2)  # dtype must match the element type of the array that was written
In [14]: c
Out[14]:
array([[[ 0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11], [12, 13], [14, 15], [16, 17], [18, 19]],
       [[20, 21], [22, 23], [24, 25], [26, 27], [28, 29], [30, 31], [32, 33], [34, 35], [36, 37], [38, 39]],
       [[40, 41], [42, 43], [44, 45], [46, 47], [48, 49], [50, 51], [52, 53], [54, 55], [56, 57], [58, 59]],
       [[60, 61], [62, 63], [64, 65], [66, 67], [68, 69], [70, 71], [72, 73], [74, 75], [76, 77], [78, 79]],
       [[80, 81], [82, 83], [84, 85], [86, 87], [88, 89], [90, 91], [92, 93], [94, 95], [96, 97], [98, 99]]])
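A binary file written by tofile() carries no type or shape information, so the dtype passed to fromfile() decides how the bytes are interpreted. The sketch below fixes the element type to np.int32 (an assumption chosen so the result does not depend on the platform) and also illustrates the count parameter.

```python
import os
import tempfile

import numpy as np

# Fix the element type explicitly so the file layout is the same everywhere.
a = np.arange(100, dtype=np.int32).reshape(5, 10, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "b1.dat")
    a.tofile(path)  # empty sep: raw binary, 100 values of 4 bytes each

    # Matching dtype: all 100 values come back, but only as a flat array.
    same = np.fromfile(path, dtype=np.int32)

    # count limits how many elements are read.
    first_ten = np.fromfile(path, dtype=np.int32, count=10)

    # Mismatched dtype: the same 400 bytes are read as 50 eight-byte values.
    wrong = np.fromfile(path, dtype=np.int64)

print(same.shape, first_ten, wrong.size)
```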
Note:
To read the data back, you need to know the shape and element type that the array had when it was saved, so a.tofile() and np.fromfile() must be used together.
This additional information can be stored in a separate metadata file. The array dimensions and element type can also be encoded in the file name (example: b1_int_5_10_2.dat).
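The file-name idea from the note can be sketched as a pair of small helpers. Both functions, save_with_shape() and load_with_shape(), are hypothetical names invented for this illustration, and the naming scheme (stem, dtype name, then one field per dimension) is only one possible convention; it assumes the stem itself contains no underscore.

```python
import os
import tempfile

import numpy as np


def save_with_shape(arr, directory, stem):
    """Hypothetical helper: write arr as raw binary, encoding dtype and shape in the name."""
    dims = "_".join(str(n) for n in arr.shape)
    path = os.path.join(directory, f"{stem}_{arr.dtype.name}_{dims}.dat")
    arr.tofile(path)
    return path


def load_with_shape(path):
    """Hypothetical helper: recover dtype and shape from the file name, then read."""
    parts = os.path.splitext(os.path.basename(path))[0].split("_")
    # parts is [stem, dtype name, dim0, dim1, ...]; the stem must not contain "_".
    dtype = np.dtype(parts[1])
    shape = tuple(int(n) for n in parts[2:])
    return np.fromfile(path, dtype=dtype).reshape(shape)


a = np.arange(100, dtype=np.int32).reshape(5, 10, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = save_with_shape(a, tmp, "b1")
    name = os.path.basename(path)
    restored = load_with_shape(path)

print(name)
print(np.array_equal(a, restored))
```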
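NumPy’s convenient file access
np.save(fname, array) or np.savez(fname, array)
fname: the file name; np.save() uses the .npy extension, and np.savez() uses the .npz extension for an archive of one or more arrays (np.savez_compressed() writes the same kind of archive with compression).
array: the array variable to save.
np.load(fname)
fname: the file name, with the .npy or .npz extension.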
Example: Using save(), load()
In [15]: np.save('a.npy',a)
The “a.npy” file information is as follows:
934e 554d 5059 0100 4600 7b27 6465 7363
7227 3a20 273c 6934 272c 2027 666f 7274
7261 6e5f 6f72 6465 7227 3a20 4661 6c73
652c 2027 7368 6170 6527 3a20 2835 2c20
3130 2c20 3229 2c20 7d20 2020 2020 200a
0000 0000 0100 0000 0200 0000 0300 0000
0400 0000 0500 0000 0600 0000 0700 0000
0800 0000 0900 0000 0a00 0000 0b00 0000
0c00 0000 0d00 0000 0e00 0000 0f00 0000
1000 0000 1100 0000 1200 0000 1300 0000
1400 0000 1500 0000 1600 0000 1700 0000
1800 0000 1900 0000 1a00 0000 1b00 0000
1c00 0000 1d00 0000 1e00 0000 1f00 0000
2000 0000 2100 0000 2200 0000 2300 0000
2400 0000 2500 0000 2600 0000 2700 0000
2800 0000 2900 0000 2a00 0000 2b00 0000
2c00 0000 2d00 0000 2e00 0000 2f00 0000
3000 0000 3100 0000 3200 0000 3300 0000
3400 0000 3500 0000 3600 0000 3700 0000
3800 0000 3900 0000 3a00 0000 3b00 0000
3c00 0000 3d00 0000 3e00 0000 3f00 0000
4000 0000 4100 0000 4200 0000 4300 0000
4400 0000 4500 0000 4600 0000 4700 0000
4800 0000 4900 0000 4a00 0000 4b00 0000
4c00 0000 4d00 0000 4e00 0000 4f00 0000
5000 0000 5100 0000 5200 0000 5300 0000
5400 0000 5500 0000 5600 0000 5700 0000
5800 0000 5900 0000 5a00 0000 5b00 0000
5c00 0000 5d00 0000 5e00 0000 5f00 0000
6000 0000 6100 0000 6200 0000 6300 0000
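Inspecting the binary file shows that np.save() stores not only the data in the .npy file but also additional information: a header that records the element type ('<i4'), the storage order, and the shape (5, 10, 2). This is why np.load() can restore the array without being told its shape or type.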
In [16]: b = np.load('a.npy')
In [17]: b
Out[17]:
array([[[ 0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11], [12, 13], [14, 15], [16, 17], [18, 19]],
       [[20, 21], [22, 23], [24, 25], [26, 27], [28, 29], [30, 31], [32, 33], [34, 35], [36, 37], [38, 39]],
       [[40, 41], [42, 43], [44, 45], [46, 47], [48, 49], [50, 51], [52, 53], [54, 55], [56, 57], [58, 59]],
       [[60, 61], [62, 63], [64, 65], [66, 67], [68, 69], [70, 71], [72, 73], [74, 75], [76, 77], [78, 79]],
       [[80, 81], [82, 83], [84, 85], [86, 87], [88, 89], [90, 91], [92, 93], [94, 95], [96, 97], [98, 99]]])
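The session above covers np.save() only. As a complement, here is a minimal sketch of np.savez(), which stores several arrays in one .npz archive. The file name arrays.npz and the keyword names ints and floats are assumptions made for this example.

```python
import os
import tempfile

import numpy as np

a = np.arange(100).reshape(5, 10, 2)
f = np.linspace(0.0, 1.0, 5)

with tempfile.TemporaryDirectory() as tmp:
    npz_path = os.path.join(tmp, "arrays.npz")  # hypothetical file name

    # Each keyword becomes the name under which the array is stored.
    np.savez(npz_path, ints=a, floats=f)

    # np.load returns a dictionary-like archive for .npz files.
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
        ints_back = archive["ints"]
        floats_back = archive["floats"]

print(names)
print(ints_back.shape, floats_back.dtype)
```

Shape and element type come back without any extra bookkeeping, which is the main convenience over tofile() and fromfile().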
Numpy’s random number function
Numpy’s random sublibrary
Basic format: np.random.*
np.random.rand(), np.random.randn(), np.random.randint()
Random number function of np.random
Example: Function Test
In [18]: a = np.random.rand(3, 4, 5)  # Three layers, four rows, and five columns
In [19]: a
Out[19]:
array([[[ 0.97845512, 0.90466706, 0.92576248, 0.77775142, 0.84334893],
        [0.39599821, 0.31917683, 0.7961439, 0.01324569, 0.97660396],
        [0.5049603, 0.80952265, 0.67359257, 0.89334316, 0.94496225],
        [0.04840473, 0.04665257, 0.20956817, 0.62255095, 0.36600489]],
       [[ 0.58059326, 0.28464266, 0.23596248, 0.16677631, 0.86467069],
        [0.14691968, 0.60863245, 0.71725038, 0.69206766, 0.18301705],
        [0.73197901, 0.99051723, 0.10489076, 0.33979432, 0.0354286],
        [0.73696453, 0.48268632, 0.99294233, 0.06285961, 0.93090147]],
       [[ 0.07853777, 0.827061 , 0.66325364, 0.52289669, 0.96894828],
        [0.41912388, 0.01883408, 0.80978245, 0.93082898, 0.98095581],
        [0.58614214, 0.55996867, 0.37734444, 0.79280598, 0.03626233],
        [0.233132, 0.22514788, 0.32245147, 0.13739658, 0.18866422]]])
In [20]: sn = np.random.randn(3, 4, 5)
In [21]: sn
Out[21]:
array([[[-0.54821321, 0.35733947, 0.74102173, -1.26679716, -0.75072289],
        [0.13182283, 2.32578442, -0.52208189, 2.5041796, -0.96995644],
        [1.00171095, 0.97037733, 1.55386206, -0.94515087, 0.75707273],
        [-1.2481768, 0.53095038, 0.92527818, -0.17261088, -0.13667463]],
       [[ 2.18760173, -0.93813162, 0.19032109, -1.59605908, -0.96802666],
        [0.30649913, 1.32375007, 0.72547761, -1.59253182, -0.72385311],
        [-2.22923637, -1.05462649, 1.82672301, 0.47343961, -0.9786459],
        [-0.36857965, 0.59003624, 1.80140997, 1.00965744, 1.9037593 ]],
       [[ 0.36273071, -0.0447364 , 1.27120325, 0.21076423, -0.40820945],
        [-1.22315321, -1.94670543, 0.17959233, -1.1020581, 0.17423733],
        [-1.16368644, 0.00589158, 1.19701291, -0.4255035, -0.7508364],
        [-1.61788168, 0.50386607, 0.15993032, 0.36881486, -0.41457221]]])
In [22]: b = np.random.randint(100, 200, (3, 4))
In [23]: b
Out[23]:
array([[163, 171, 163, 168],
       [166, 127, 160, 109],
       [135, 111, 196, 190]])
In [24]: np.random.seed(10)
In [25]: np.random.randint(100, 200, (3, 4))  # Random integers from 100 up to but not including 200, in 3 rows and 4 columns
Out[25]:
array([[109, 115, 164, 128],
       [189, 193, 129, 108],
       [173, 100, 140, 136]])
In [26]: np.random.seed(10)
In [27]: np.random.randint(100, 200, (3, 4))  # Same seed, so the same values are produced again
Out[27]:
array([[109, 115, 164, 128],
       [189, 193, 129, 108],
       [173, 100, 140, 136]])
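The role of the seed in the session above can be checked programmatically. This short sketch repeats the same calls and compares the results instead of relying on printed values, which may differ between NumPy versions and platforms.

```python
import numpy as np

# Seeding the generator fixes the sequence of numbers that follows.
np.random.seed(10)
first = np.random.randint(100, 200, (3, 4))

# Reseeding with the same value replays the same sequence.
np.random.seed(10)
second = np.random.randint(100, 200, (3, 4))

print(np.array_equal(first, second))

# The lower bound is inclusive and the upper bound is exclusive.
print(first.min() >= 100, first.max() < 200)
```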
Random number function of np.random
Example: Function Test
In [28]: a = np.random.randint(100, 200, (3, 4))
In [29]: a
Out[29]:
array([[116, 111, 154, 188],
       [162, 133, 172, 178],
       [149, 151, 154, 177]])
In [30]: np.random.shuffle(a)  # The rows are shuffled in place along the first axis
In [31]: a
Out[31]:
array([[116, 111, 154, 188],
       [149, 151, 154, 177],
       [162, 133, 172, 178]])
In [32]: np.random.shuffle(a)  # The rows are shuffled in place along the first axis
In [33]: a
Out[33]:
array([[162, 133, 172, 178],
       [116, 111, 154, 188],
       [149, 151, 154, 177]])
In [34]: a = np.random.randint(100, 200, (3, 4))
In [35]: a
Out[35]:
array([[113, 192, 186, 130],
       [130, 189, 112, 165],
       [131, 157, 136, 127]])
In [36]: np.random.permutation(a)  # Returns a new array with the rows in random order along the first axis; a itself is not changed
Out[36]:
array([[113, 192, 186, 130],
       [130, 189, 112, 165],
       [131, 157, 136, 127]])
In [37]: a
Out[37]:
array([[113, 192, 186, 130],
       [130, 189, 112, 165],
       [131, 157, 136, 127]])
In [38]: b = np.random.randint(100, 200, (8,))
In [39]: b
Out[39]: array([177, 122, 123, 194, 111, 128, 174, 188])
In [40]: np.random.choice(b, (3, 2))  # Sampling with replacement, so values can repeat
Out[40]:
array([[122, 188],
       [123, 177],
       [174, 188]])
In [41]: np.random.choice(b, (3, 2), replace=False)  # Sampling without replacement, so no element is drawn twice
Out[41]:
array([[123, 111],
       [128, 188],
       [174, 122]])
In [42]: np.random.choice(b, (3, 2), p=b/np.sum(b))  # The probability p of each element is its value divided by the sum of b
Out[42]:
array([[174, 122],
       [188, 194],
       [174, 123]])
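The difference between shuffle() and permutation() is easy to miss in printed output, so the following sketch checks it with comparisons. The seed value and the small arange-based array are assumptions chosen only to keep the example repeatable and the rows easy to identify.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the run repeatable

a = np.arange(12).reshape(3, 4)
original = a.copy()

# permutation returns a shuffled copy; the input array is left untouched.
p = np.random.permutation(a)
unchanged_after_permutation = np.array_equal(a, original)

# shuffle reorders the rows of a in place and returns None.
result = np.random.shuffle(a)

# Both only reorder whole rows: sorting by the first column restores the original.
restored_from_p = p[np.argsort(p[:, 0])]
restored_from_a = a[np.argsort(a[:, 0])]

b = np.array([177, 122, 123, 194, 111, 128, 174, 188])

# replace=False draws six distinct elements of b.
sample = np.random.choice(b, (3, 2), replace=False)

# p gives each element a probability proportional to its value.
weighted = np.random.choice(b, (3, 2), p=b / b.sum())

print(unchanged_after_permutation, result)
print(sample)
```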
In [43]: u = np.random.uniform(0, 10, (3, 4))  # Uniformly distributed values from 0 up to 10, in 3 rows and 4 columns
In [44]: u
Out[44]:
array([[ 8.8393648 , 3.25511638, 1.65015898, 3.92529244],
       [0.93460375, 8.21105658, 1.5115202, 3.84114449],
       [9.44260712, 9.87625475, 4.56304547, 8.26122844]])
In [45]: n = np.random.normal(10, 5, (3, 4))  # Normally distributed values with mean 10 and standard deviation 5
In [46]: n
Out[46]:
array([[ 12.8882903 , 2.6251256 , 10.39394227, 14.59206826],
       [7.5365132, 10.48231186, 6.73620032, 8.89118781],
       [4.65856717, 3.86153973, 1.00713488, 6.5739633 ]])
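With only twelve values it is hard to see what the parameters of uniform() and normal() mean. A rough check, assuming a much larger sample than in the session and an arbitrary seed, is to compare the sample statistics with the parameters:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the run repeatable
size = 100_000     # large sample so the statistics settle near the parameters

# uniform(low, high): values in [0, 10), so the mean should be close to 5.
u = np.random.uniform(0, 10, size)

# normal(loc, scale): mean 10 and standard deviation 5.
n = np.random.normal(10, 5, size)

print(u.min() >= 0, u.max() < 10)
print(round(u.mean(), 1), round(n.mean(), 1), round(n.std(), 1))
```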
NumPy statistical functions
Statistical functions directly provided by Numpy
Basic format: np.*
For example: np.std(), np.var(), np.average()
Statistical functions of NumPy
In [47]: a = np.arange(15).reshape(3, 5)
In [48]: a
Out[48]:
array([[ 0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9],
       [10, 11, 12, 13, 14]])
In [49]: np.sum(a)
Out[49]: 105
In [50]: np.mean(a, axis=1)  # Mean of each row: 2. = (0 + 1 + 2 + 3 + 4)/5
Out[50]: array([ 2., 7., 12.])
In [51]: np.mean(a, axis=0)  # Mean of each column: 7. = (2 + 7 + 12)/3
Out[51]: array([ 5., 6., 7., 8., 9.])
In [52]: np.average(a, axis=0, weights=[10, 5, 1])  # Weighted average: 4.1875 = (2*10 + 7*5 + 1*12)/(10 + 5 + 1)
Out[52]: array([ 2.1875, 3.1875, 4.1875, 5.1875, 6.1875])
In [53]: np.std(a)
Out[53]: 4.3204937989385739
In [54]: np.var(a)
Out[54]: 18.666666666666668
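The arithmetic in the comments can be reproduced by hand. This sketch recomputes the weighted average, variance, and standard deviation from their definitions and compares them with the NumPy functions; the variable names are chosen for this illustration.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
w = np.array([10, 5, 1])

# Weighted average along axis 0: each row is scaled by its weight,
# the rows are summed, and the result is divided by the total weight.
manual_avg = (a * w[:, None]).sum(axis=0) / w.sum()

# Variance is the mean squared deviation from the mean;
# the standard deviation is its square root.
manual_var = ((a - a.mean()) ** 2).mean()
manual_std = np.sqrt(manual_var)

print(manual_avg)
print(np.allclose(manual_avg, np.average(a, axis=0, weights=w)))
print(np.isclose(manual_var, np.var(a)), np.isclose(manual_std, np.std(a)))
```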
In [55]: b = np.arange(15, 0, -1).reshape(3, 5)
In [56]: b
Out[56]:
array([[15, 14, 13, 12, 11],
       [10, 9, 8, 7, 6],
       [5, 4, 3, 2, 1]])
In [57]: np.max(b)
Out[57]: 15
In [58]: np.argmax(b)  # Index of the maximum in the flattened array
Out[58]: 0
In [59]: np.unravel_index(np.argmax(b), b.shape)  # Convert the flat index into a multi-dimensional index
Out[59]: (0, 0)
In [60]: np.ptp(b)  # Difference between the maximum and the minimum
Out[60]: 14
In [61]: np.median(b)
Out[61]: 8.0
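In the session above the maximum happens to sit at flat index 0, which hides what unravel_index() does. The sketch below uses a small made-up array whose maximum is elsewhere and derives the row and column by hand for comparison.

```python
import numpy as np

# Hypothetical array whose maximum is not in the first position.
b = np.array([[3, 9, 1],
              [7, 12, 5]])

flat = np.argmax(b)  # position in the flattened (row-major) array

# For a 2D row-major array: row = flat // ncols, col = flat % ncols.
row, col = divmod(int(flat), b.shape[1])
index = np.unravel_index(flat, b.shape)

# ptp ("peak to peak") is the maximum minus the minimum.
spread = np.ptp(b)

print(flat, (row, col), index)
print(spread, np.median(b))
```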
Numpy’s gradient function
The np.gradient() function
In [62]: a = np.random.randint(0, 20, (5))
In [63]: a
Out[63]: array([14, 16, 10, 17, 0])
In [64]: np.gradient(a)  # Where there are values on both sides: -2. = (10 - 14)/2
Out[64]: array([ 2. , -2. , 0.5, -5. , -17. ])
In [65]: b = np.random.randint(0, 20, (5))
In [66]: b
Out[66]: array([17, 9, 16, 9, 12])
In [67]: np.gradient(b)  # Where there is a value on only one side: -8. = (9 - 17)/1
Out[67]: array([-8. , -0.5, 0. , -2. , 3. ])
In [68]: c = np.random.randint(0, 50, (3, 5))
In [69]: c
Out[69]:
array([[13, 22, 23, 30, 11],
       [28, 10, 24, 9, 15],
       [18, 16, 7, 24, 11]])
In [70]: np.gradient(c)
Out[70]:
[array([[ 15. , -12. , 1. , -21. , 4. ],
        [ 2.5, -3. , -8. , -3. , 0. ],
        [-10. , 6. , -17. , 15. , -4. ]]),  # Gradient along the outermost dimension (axis 0)
 array([[ 9. , 5. , 4. , -6. , -19. ],
        [-18. , -2. , -0.5, -4.5, 6. ],
        [ -2. , -5.5, 4. , 2. , -13. ]])]  # Gradient along the second dimension (axis 1)
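The rule in the comments (half the difference of the two neighbors in the interior, a plain difference at the edges) can be written out by hand and compared with np.gradient(). This sketch reuses the values printed in the session above; the name manual is an assumption of the example.

```python
import numpy as np

a = np.array([14, 16, 10, 17, 0], dtype=float)

manual = np.empty_like(a)
# Interior points: half the difference between the right and left neighbors.
manual[1:-1] = (a[2:] - a[:-2]) / 2
# Edge points: only one neighbor exists, so use the plain difference.
manual[0] = a[1] - a[0]
manual[-1] = a[-1] - a[-2]

print(manual)
print(np.array_equal(manual, np.gradient(a)))

# For a 2D array, np.gradient returns one array per axis.
c = np.array([[13, 22, 23, 30, 11],
              [28, 10, 24, 9, 15],
              [18, 16, 7, 24, 11]])
g_rows, g_cols = np.gradient(c)
print(g_rows[1], g_cols[0])
```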
Unit summary