Differences and basic usage of reset_index(), reindex() and reindex_like() in pandas

reset_index() usage details

reset_index() is a pandas method that replaces the index with the default integer index (0, 1, 2, ...) without changing the content or order of the data.

DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill=''):

level: If the row index is a MultiIndex, level selects which levels to reset. It can be an int, str, tuple, or list; the default None resets all levels.

drop: Whether to discard the old row index after resetting. The default False keeps it and inserts it into the DataFrame as a column.

inplace: Whether to modify the DataFrame in place. The default False leaves the original unchanged and returns a new DataFrame; with True the method modifies the original and returns None.

col_level: If the columns are a MultiIndex, the column level into which the old index labels are inserted. int or str, default 0.

col_fill: If the columns are a MultiIndex, the label used for the other column levels of the inserted column. object, default '' (empty string); if None, the index name is repeated.

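Series.reset_index() has no col_level or col_fill parameters, since a Series has no column index. Instead it has a name parameter that sets the name of the column holding the Series values in the resulting DataFrame (it is ignored when drop=True).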
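As a minimal sketch of the Series case (the Series s and the column name 'value' are illustrative assumptions, not part of the original example), the name parameter controls the name of the column that holds the original values:

```python
import pandas as pd

# Hypothetical Series with a string index and its own name
s = pd.Series([1, 3, 5], index=['A', 'B', 'C'], name='Col-1')

# Without name: the values column takes the Series' own name ('Col-1')
print(s.reset_index())

# With name: the values column is given the new name in the resulting DataFrame
df_named = s.reset_index(name='value')
print(df_named)

# With drop=True the result is still a Series with a 0..n-1 index; name is ignored
s_dropped = s.reset_index(drop=True)
print(s_dropped)
```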

# coding=utf-8
import pandas as pd

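# A small DataFrame with a string row index
df = pd.DataFrame({'Col-1': [1, 3, 5], 'Col-2': [5, 7, 9]}, index=['A', 'B', 'C'])
print(df)
# Default: the old index is kept and inserted as a column named 'index'
df1 = df.reset_index()
print(df1)
# drop=True discards the old index; inplace=True modifies df itself (returns None)
df.reset_index(drop=True, inplace=True)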
print(df)

Output:

   Col-1  Col-2
A      1      5
B      3      7
C      5      9
  index  Col-1  Col-2
0     A      1      5
1     B      3      7
2     C      5      9
   Col-1  Col-2
0      1      5
1      3      7
2      5      9
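The level parameter only matters when the row index is a MultiIndex. The sketch below uses a hypothetical two-level index with levels named group and item to show the difference between resetting one level and resetting all of them:

```python
import pandas as pd

# Hypothetical two-level row index: (group, item)
idx = pd.MultiIndex.from_tuples(
    [('g1', 'A'), ('g1', 'B'), ('g2', 'C')], names=['group', 'item'])
df_mi = pd.DataFrame({'Col-1': [1, 3, 5], 'Col-2': [5, 7, 9]}, index=idx)

# Reset only the 'item' level: it becomes a column, 'group' remains the index
part = df_mi.reset_index(level='item')
print(part)

# Reset all levels (level=None): both become columns, index becomes 0..n-1
full = df_mi.reset_index()
print(full)
```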

Detailed explanation of reindex() usage

reindex() is the basic method to achieve data alignment in pandas. Alignment refers to matching data with a given set of labels (row and column indexes) along a specified axis.

DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=None, level=None, fill_value=nan, limit=None, tolerance=None):

labels: The new sequence of labels. Whether they are applied to the rows or the columns is determined by the axis parameter.

axis: The axis to realign. Accepts an axis name ('index', 'columns') or number (0, 1); defaults to the row index.

index: The new row labels.

columns: The new column labels.

fill_value: The scalar used to fill positions that have no data after realignment. Defaults to NaN.

method: How to fill positions that have no data after realignment. Options are {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}. 'pad'/'ffill' fills with the value of the previous existing label, 'backfill'/'bfill' fills with the value of the next existing label, and 'nearest' uses the closest existing label. Note: "previous" and "next" do not refer to the order in which the data was written but to the sort order of the labels (lexicographic order for strings), and the original index must be monotonically increasing or decreasing for method to work.

limit: Sets the maximum number of consecutive elements to fill forward or backward.

copy: Whether to return a new object even if the passed index is the same as the existing one.

level: If the index is a MultiIndex, the level to broadcast across; the passed labels are matched against the values of that level.

tolerance: The maximum distance between original and new labels for inexact matches. The index values at the matching locations must satisfy abs(index[indexer] - target) <= tolerance.
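method='nearest' and tolerance are easiest to see on a numeric index, since both rely on the distance between labels. A minimal sketch with a hypothetical Series s (not from the original examples):

```python
import pandas as pd

# Hypothetical Series with a monotonic numeric index
s = pd.Series([10, 20, 30], index=[0, 5, 10])

# Each new label takes the value of the closest existing label:
# 1 -> 0, 4 -> 5, 9 -> 10
near = s.reindex([1, 4, 9], method='nearest')
print(near)

# With tolerance=1, a match farther than 1 away is rejected and becomes NaN:
# 1 -> 0 (distance 1), 3 -> no match (closest is 5, distance 2), 9 -> 10 (distance 1)
tol = s.reindex([1, 3, 9], method='nearest', tolerance=1)
print(tol)
```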

reindex() can reorder, add, or remove the labels of a Series or DataFrame. It works as follows:

  • Existing data is matched to the new set of labels and reordered accordingly.

  • Labels that have no corresponding data get a missing value (NaN), which can optionally be filled with a fill value or a fill method.

The following examples illustrate this.

import pandas as pd

df = pd.DataFrame({'Col-1': [1, 3, 5], 'Col-2': [5, 7, 9]}, index=['A', 'B', 'C'])
print(df)
# A list passed positionally goes to the labels parameter (rows by default)
df2 = df.reindex(['C', 'B', 'A'])
print(df2)

Output:

   Col-1  Col-2
A      1      5
B      3      7
C      5      9
   Col-1  Col-2
C      5      9
B      3      7
A      1      5

The most basic usage is to pass the new label order to the labels parameter. In the example above, the original index ['A', 'B', 'C'] is realigned to ['C', 'B', 'A'].

# Use axis to choose rows or columns; the default is the row index
df3 = df.reindex(['Col-2', 'c3', 'Col-1'], axis='columns')
print(df3)
# Reindex rows and columns at the same time
df4 = df.reindex(index=['C', 'B', 'A'], columns=['Col-2', 'c3', 'Col-1'])
print(df4)

Output:

   Col-2  c3  Col-1
A      5 NaN      1
B      7 NaN      3
C      9 NaN      5
   Col-2  c3  Col-1
C    9.0 NaN    5.0
B    7.0 NaN    3.0
A    5.0 NaN    1.0

For both rows and columns, reindex() can expand or shrink the DataFrame according to the label list it receives. When expanding, new labels are filled with NaN by default (the all-NaN column c3 above). When shrinking, labels that are not in the list are dropped, which is equivalent to selecting only the listed rows or columns.
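A short sketch of the shrinking case, rebuilding the same df as above, shows that reindex() with a subset of existing labels gives the same result as selecting them with .loc. The difference is that reindex() inserts NaN for an unknown label (the label 'Z' here is a made-up example), whereas .loc raises a KeyError for it:

```python
import pandas as pd

df = pd.DataFrame({'Col-1': [1, 3, 5], 'Col-2': [5, 7, 9]}, index=['A', 'B', 'C'])

# Passing a subset of existing labels drops the others
cut = df.reindex(['A', 'C'])
print(cut)

# Same rows, values and dtypes as selecting those labels with .loc
print(cut.equals(df.loc[['A', 'C']]))  # True

# reindex() does not raise for a label that is absent; it inserts a NaN row
print(df.reindex(['A', 'Z']))
```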

# Labels with no existing data are filled with NaN by default
df5 = df.reindex(['B', 'C', 'D'])
print(df5)
# Fill new labels with a specified value instead
df6 = df.reindex(['B', 'C', 'D'], fill_value=100)
print(df6)

Output:

   Col-1  Col-2
B    3.0    7.0
C    5.0    9.0
D    NaN    NaN
   Col-1  Col-2
B      3      7
C      5      9
D    100    100

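Use the fill_value parameter to fill new positions with a specific value. Because no NaN is introduced, the columns of df6 keep their integer dtype, whereas in df5 they are converted to float to hold NaN.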

# Forward fill: use the value of the previous existing label
df7 = df.reindex(['1', '2', 'A', 'B', 'C', 'D', 'E'], method='ffill')
print(df7)
print('*' * 30)
# Backward fill: use the value of the next existing label
df8 = df.reindex(['1', '2', 'A', 'B', 'C', 'D', 'E'], method='bfill')
print(df8)

Output:

   Col-1  Col-2
1    NaN    NaN
2    NaN    NaN
A    1.0    5.0
B    3.0    7.0
C    5.0    9.0
D    5.0    9.0
E    5.0    9.0
******************************
   Col-1  Col-2
1    1.0    5.0
2    1.0    5.0
A    1.0    5.0
B    3.0    7.0
C    5.0    9.0
D    NaN    NaN
E    NaN    NaN

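Use the method parameter to choose forward or backward filling. Because '1' and '2' sort before 'A', forward fill has no earlier value to propagate to them and leaves NaN, while 'D' and 'E' sort after 'C' and take its values; backward fill is the mirror image.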
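The limit parameter caps how many consecutive new labels a fill method may fill. A minimal sketch, rebuilding the same df as above:

```python
import pandas as pd

df = pd.DataFrame({'Col-1': [1, 3, 5], 'Col-2': [5, 7, 9]}, index=['A', 'B', 'C'])

# Forward fill, but propagate C's values to at most one consecutive new label:
# 'D' is filled from 'C', while 'E' exceeds the limit and stays NaN
limited = df.reindex(['A', 'B', 'C', 'D', 'E'], method='ffill', limit=1)
print(limited)
```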

Detailed usage of reindex_like()

reindex_like() conforms the current DataFrame to the row and column labels of another object. Labels that do not exist in the original data are filled with NaN by default; the method parameter can be used for forward or backward filling instead. Note: reindex_like() has no fill_value parameter, so filling with a specific value is not supported directly.

DataFrame.reindex_like(other, method=None, copy=True, limit=None, tolerance=None):

other: Another object of the same type (here a DataFrame); its row and column labels become the new labels of the current DataFrame.

method, copy, limit, tolerance are the same as reindex().

dfa = pd.DataFrame({'Col-1': [1, 3, 5], 'Col-2': [5, 7, 9]}, index=['A', 'B', 'C'])
print(dfa)
dfb = pd.DataFrame({'Col-1': [1, 3, 5, 7, 9], 'Col-2': [2, 4, 6, 8, 10]},
                   index=['A', 'B', 'C', 'D', 'E'])
print(dfb)
print('*' * 30)
# Conform dfa to dfb's row and column labels
dfc = dfa.reindex_like(dfb)
print(dfc)

Output:

   Col-1  Col-2
A      1      5
B      3      7
C      5      9
   Col-1  Col-2
A      1      2
B      3      4
C      5      6
D      7      8
E      9     10
******************************
   Col-1  Col-2
A    1.0    5.0
B    3.0    7.0
C    5.0    9.0
D    NaN    NaN
E    NaN    NaN
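Since reindex_like() conforms both axes, it should behave like calling reindex() with the other object's index and columns. The sketch below checks that, using a hypothetical dfb with an extra column 'Col-3' (not in the original example) so that the column change is visible, and uses fillna() as one way to fill the gaps given that reindex_like() has no fill_value:

```python
import pandas as pd

dfa = pd.DataFrame({'Col-1': [1, 3, 5], 'Col-2': [5, 7, 9]}, index=['A', 'B', 'C'])
# Hypothetical dfb: one extra row ('D'), no 'Col-2', and a new column 'Col-3'
dfb = pd.DataFrame({'Col-1': [1, 3, 5, 7], 'Col-3': [0, 0, 0, 0]},
                   index=['A', 'B', 'C', 'D'])

# Rows become A..D, columns become Col-1 and Col-3 ('Col-2' is dropped)
like = dfa.reindex_like(dfb)
print(like)

# Explicit equivalent: conform both axes to dfb's labels
explicit = dfa.reindex(index=dfb.index, columns=dfb.columns)
print(like.equals(explicit))  # True

# reindex_like() has no fill_value; fillna() can fill the gaps afterwards
print(like.fillna(0))
```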

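In summary: reset_index() replaces the index with a default integer index, optionally keeping the old index as a column; reindex() conforms data to a new set of row and/or column labels, reordering existing data and inserting missing (or filled) values for new labels; reindex_like() does the same using the labels of another object.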