Using pandas: Series and DataFrame

Learn from:

Pandas Common Functions | Newbie Tutorial: https://www.runoob.com/pandas/pandas-functions.html

Installation

pip3 install pandas -i https://pypi.douban.com/simple

Data structure

The difference between Series and DataFrame

DataFrame and Series are two important data structures in the Pandas library. They have the following differences:

DataFrame:

  • A DataFrame is a two-dimensional table consisting of rows and columns.
  • Each column can be of a different data type (integer, float, string, etc.).
  • A DataFrame can be thought of as a dictionary consisting of multiple Series.
  • Data can be accessed and manipulated through column names.
  • Often used to process structured, tabular data.

Series:

  • A Series is a one-dimensional labeled array, similar to an indexed list.
  • Each element has a label (index); by default the labels are integers starting from 0.
  • A Series can hold data of any type (integer, floating point, string, etc.).
  • Data can be accessed and manipulated through indexes.
  • Often used to process one-dimensional, columnar data.

Therefore, you can think of a DataFrame as a table, where each column is a Series, and each Series has the same index, and together they form a complete data set.
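As a small illustration of this relationship (a sketch with made-up column names and values, not taken from the referenced tutorial), selecting one column of a DataFrame yields a Series that carries the same index as the table:

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame(
    {"site": ["Google", "Runoob", "Wiki"], "age": [10, 12, 13]},
    index=["a", "b", "c"],
)

age = df["age"]  # selecting a single column returns a Series

print(type(df).__name__)           # DataFrame
print(type(age).__name__)          # Series
print(age.index.equals(df.index))  # True: the column shares the table's index
```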

Series

A pandas Series is similar to a column in a table or a one-dimensional array, and can hold any data type.

A Series consists of an index and values. The constructor is as follows:

pandas.Series(data, index, dtype, name, copy)

Parameter Description:

  • data: A set of data (for example an ndarray, list, or dictionary).

  • index: Data index labels; if not specified, they start from 0 by default.

  • dtype: Data type; inferred from the data by default.

  • name: The name of the Series.

  • copy: Whether to copy the input data; default is False.
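To make the parameters above concrete, here is a minimal sketch that passes index, dtype and name explicitly. The labels and the name "demo" are illustrative assumptions, not values from the tutorial:

```python
import pandas as pd

# index, dtype and name are passed explicitly; the values are illustrative
s = pd.Series([1, 2, 3], index=["a", "b", "c"], dtype="float64", name="demo")

print(s.index.tolist())  # ['a', 'b', 'c']
print(s.dtype)           # float64
print(s.name)            # demo
```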

Create a simple Series instance:

import pandas as pd

a = [1, 2, 3]

myvar = pd.Series(a)

print(myvar)

The output is as follows:

0    1
1    2
2    3
dtype: int64

As the output shows, if no index is specified, the index starts from 0. We can read data based on the index value:

import pandas as pd

a = [1, 2, 3]

myvar = pd.Series(a)

print(myvar[1])

The output is as follows:

2

We can specify the index value, as shown in the following example:

import pandas as pd

a = ["Google", "Runoob", "Wiki"]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)

The output is as follows:

x    Google
y    Runoob
z      Wiki
dtype: object

Read data based on index value:

import pandas as pd

a = ["Google", "Runoob", "Wiki"]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar["y"])

The output is as follows:

Runoob

We can also use key/value objects, similar to dictionaries, to create Series:

import pandas as pd

sites = {1: "Google", 2: "Runoob", 3: "Wiki"}

myvar = pd.Series(sites)

print(myvar)

The output is as follows:

1    Google
2    Runoob
3      Wiki
dtype: object

As the output shows, the keys of the dictionary become the index values.

If we need only part of the data in the dictionary, we can specify the index of the required data, as shown in the following example:

import pandas as pd

sites = {1: "Google", 2: "Runoob", 3: "Wiki"}

myvar = pd.Series(sites, index = [1, 2])

print(myvar)

The output is as follows:

1    Google
2    Runoob
dtype: object
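A related case, sketched here as an aside: when the index list contains a label that is not a key of the dictionary (4 below is a made-up label), pandas keeps the label and fills its value with NaN rather than raising an error.

```python
import pandas as pd

sites = {1: "Google", 2: "Runoob", 3: "Wiki"}

# 4 is not a key in `sites`; it is included here only to show the behavior
myvar = pd.Series(sites, index=[1, 2, 4])

print(myvar.index.tolist())   # [1, 2, 4]
print(myvar.isna().tolist())  # [False, False, True]
```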

Set the Series name parameter:

import pandas as pd

sites = {1: "Google", 2: "Runoob", 3: "Wiki"}

myvar = pd.Series(sites, index = [1, 2], name="RUNOOB-Series-TEST" )

print(myvar)

The output is as follows:

1    Google
2    Runoob
Name: RUNOOB-Series-TEST, dtype: object

DataFrame

A DataFrame is a tabular data structure that contains an ordered set of columns, and each column can hold a different value type (numeric, string, Boolean). A DataFrame has both a row index and a column index, and can be viewed as a dictionary of Series that share a single index.

The DataFrame construction method is as follows:

pandas.DataFrame(data, index, columns, dtype, copy)

Parameter Description:

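  • data: A set of data (ndarray, Series, map, list, dict, and similar types).

  • index: Index values, also called row labels.

  • columns: Column labels, default is RangeIndex (0, 1, 2, …, n).

  • dtype: Data type.

  • copy: Whether to copy the input data. Default is False in older pandas versions; recent versions default to None, which copies dict input but not DataFrame or 2-D ndarray input.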

A pandas DataFrame is a two-dimensional structure, similar to a two-dimensional array with labeled rows and columns.
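To illustrate the comparison with a two-dimensional array, the sketch below wraps a 2-D NumPy array in a DataFrame and supplies the index and columns parameters. The labels are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

# A 3x2 NumPy array; the row and column labels are made up for illustration
arr = np.array([[420, 50], [380, 40], [390, 45]])

df = pd.DataFrame(arr, index=["day1", "day2", "day3"],
                  columns=["calories", "duration"])

print(df.shape)             # (3, 2)
print(df.columns.tolist())  # ['calories', 'duration']
print(df.index.tolist())    # ['day1', 'day2', 'day3']
```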

Example – Create using list

import pandas as pd

data = [['Google',10],['Runoob',12],['Wiki',13]]

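df = pd.DataFrame(data, columns=['Site', 'Age'])

# Convert only the numeric column; dtype=float in the constructor would apply to every column, including the 'Site' strings
df['Age'] = df['Age'].astype(float)

print(df)

The output is as follows:

     Site   Age
0  Google  10.0
1  Runoob  12.0
2    Wiki  13.0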

Example – create using a dictionary of lists or ndarrays

The following example creates a DataFrame from a dictionary whose values are lists; ndarrays work the same way. All of the lists or ndarrays must have the same length. If index is passed, its length should equal the length of the arrays. If no index is passed, the index defaults to range(n), where n is the array length. For more on ndarrays, see: NumPy Ndarray object

import pandas as pd

data = {'Site':['Google', 'Runoob', 'Wiki'], 'Age':[10, 12, 13]}

df = pd.DataFrame(data)

print(df)

The output is as follows:

     Site  Age
0  Google   10
1  Runoob   12
2    Wiki   13
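The example above uses plain lists as the dictionary values. As a hedged sketch of the ndarray case, the following uses NumPy arrays instead and also shows that arrays of unequal length are rejected. The variable name mismatch_rejected is introduced here only for illustration:

```python
import numpy as np
import pandas as pd

# Dictionary values as NumPy arrays of equal length
data = {"Site": np.array(["Google", "Runoob", "Wiki"]),
        "Age": np.array([10, 12, 13])}
df = pd.DataFrame(data)
print(df.index.tolist())  # [0, 1, 2]

# Arrays of different lengths are rejected with a ValueError
try:
    pd.DataFrame({"Site": np.array(["Google", "Runoob"]),
                  "Age": np.array([10, 12, 13])})
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```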

Example – create using a list of dictionaries

import pandas as pd

data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]

df = pd.DataFrame(data)

print(df)

The output is:

   a   b     c
0  1   2   NaN
1  5  10  20.0

Entries that have no corresponding data are filled with NaN.
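One way to confirm where the missing entries are, shown as a small sketch on the same data, is to count them per column with isna(). Note that column c ends up as a float column, since NaN is a floating-point value:

```python
import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)

print(df.isna().sum().tolist())  # [0, 0, 1]: one missing value, in column c
print(df['c'].dtype)             # float64, because NaN is a float value
```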

Pandas can use the loc attribute to return the data of a specified row. If no index is set, the index of the first row is 0, the index of the second row is 1, and so on:

import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

# Load data into a DataFrame object
df = pd.DataFrame(data)

# Return the first row
print(df.loc[0])
# Return the second row
print(df.loc[1])

The output is as follows:

calories    420
duration     50
Name: 0, dtype: int64
calories    380
duration     40
Name: 1, dtype: int64

Note: The returned result is a pandas Series.

You can also return multiple rows by using the [[ … ]] format, where … is the list of row index labels, separated by commas. The following example returns the first and second rows:

import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

# Load data into a DataFrame object
df = pd.DataFrame(data)

# Return the first and second rows
print(df.loc[[0, 1]])

The output is:

   calories  duration
0       420        50
1       380        40

Note: The returned result is a pandas DataFrame.
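The difference between the two return types can be checked directly. The sketch below reuses the same data; the variable names are introduced here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

one_row = df.loc[0]        # single label -> Series
two_rows = df.loc[[0, 1]]  # list of labels -> DataFrame
one_row_df = df.loc[[0]]   # one-element list -> still a DataFrame

print(type(one_row).__name__)     # Series
print(type(two_rows).__name__)    # DataFrame
print(type(one_row_df).__name__)  # DataFrame
```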

We can specify the index value, as in the following example:

import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df)

The output is:

      calories  duration
day1       420        50
day2       380        40
day3       390        45

The loc attribute can then return the row that corresponds to a given index label:

import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

# Specify the index label
print(df.loc["day2"])

The output is:

calories    380
duration     40
Name: day2, dtype: int64
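As a further sketch on the same data, loc also accepts a row label and a column label together to select a single value, and it raises a KeyError for a label that is not in the index ("day9" below is a made-up label used to trigger that case):

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# Row label and column label together select a single value
print(df.loc["day2", "calories"])  # 380

# A label that does not exist raises KeyError
try:
    df.loc["day9"]
except KeyError:
    print("day9 is not in the index")
```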
