Python data analysis: feature engineering techniques and methods

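Feature engineering is a key step in data analysis: it processes and transforms raw data to create more informative and predictive features. The examples below walk through ten business scenarios, each with a feature, a tip, and a short pandas or scikit-learn snippet.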

1. E-commerce business scenario:

 - Feature: Order date
 - Tip: Extract the year, quarter, month, day of the week, and similar information from the order date
 - Code example:

<code>import pandas as pd</code>
<code># Read order data</code>
<code>orders = pd.read_csv('orders.csv')</code>
<code># Parse the date column once and reuse the result</code>
<code>order_dates = pd.to_datetime(orders['OrderDate'])</code>
<code># Extract year</code>
<code>orders['Year'] = order_dates.dt.year</code>
<code># Extract quarter</code>
<code>orders['Quarter'] = order_dates.dt.quarter</code>
<code># Extract month</code>
<code>orders['Month'] = order_dates.dt.month</code>
<code># Extract the day of the week (Monday=0, Sunday=6)</code>
<code>orders['Weekday'] = order_dates.dt.weekday</code>
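To see what the date-part features look like, here is a small self-contained sketch. The three sample dates and the extra `IsWeekend` column are assumptions made for illustration; they are not part of the original `orders.csv`.

```python
import pandas as pd

# Hypothetical sample standing in for orders.csv
orders = pd.DataFrame({"OrderDate": ["2023-01-15", "2023-06-30", "2023-12-25"]})

# Parse once, then derive every feature from the same datetime Series
order_dates = pd.to_datetime(orders["OrderDate"])

orders["Year"] = order_dates.dt.year
orders["Quarter"] = order_dates.dt.quarter
orders["Month"] = order_dates.dt.month
orders["Weekday"] = order_dates.dt.weekday          # Monday=0 ... Sunday=6
orders["IsWeekend"] = order_dates.dt.weekday >= 5   # illustrative extra feature

print(orders)
```

With these dates, `Weekday` comes out as 6, 4, 0 (Sunday, Friday, Monday), so only the first order is flagged as a weekend order.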

2. Market research business scenario:

 - Feature: Text data
 - Tip: Use text feature extraction methods (such as the bag-of-words model or TF-IDF) to convert text data into numerical features
 - Code example:

<code>import pandas as pd</code>
<code>from sklearn.feature_extraction.text import CountVectorizer</code>
<code># Read survey data</code>
<code>survey_data = pd.read_csv('survey_data.csv')</code>
<code># Extract bag-of-words features; missing responses are treated as empty strings</code>
<code>text_data = survey_data['Response'].fillna('')</code>
<code>vectorizer = CountVectorizer()</code>
<code>text_features = vectorizer.fit_transform(text_data)  # sparse document-term matrix</code>
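The tip also mentions TF-IDF, which the snippet for this scenario does not show. A minimal sketch using scikit-learn's `TfidfVectorizer` with default settings follows; the three responses are made up for illustration and stand in for the `Response` column.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical survey responses standing in for survey_data['Response']
responses = [
    "great product and fast delivery",
    "slow delivery",
    "great price",
]

vectorizer = TfidfVectorizer()
# Sparse matrix: one row per response, one column per distinct term
tfidf = vectorizer.fit_transform(responses)

# vocabulary_ maps each term to its column index
terms = sorted(vectorizer.vocabulary_)
print(terms)
print(tfidf.shape)
```

Compared with raw counts, TF-IDF down-weights terms that appear in many responses (here "great" and "delivery"), which often helps when common words carry little signal.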

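3. Healthcare business scenario:

 - Feature: Patient age
 - Tip: Calculate the patient's age from the date of birth
 - Code example:

<code>import pandas as pd</code>
<code># Read patient data</code>
<code>patient_data = pd.read_csv('patient_data.csv')</code>
<code># Calculate age in completed years</code>
<code>birth = pd.to_datetime(patient_data['BirthDate'])</code>
<code>today = pd.Timestamp.now()</code>
<code># True where this year's birthday is still ahead, so one year must be subtracted</code>
<code>birthday_ahead = (birth.dt.month > today.month) | ((birth.dt.month == today.month) & (birth.dt.day > today.day))</code>
<code>patient_data['Age'] = today.year - birth.dt.year - birthday_ahead.astype(int)</code>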

4. Financial business scenario:

 - Feature: Time series data
 - Tip: Extract lagged features from time series data (such as the previous day's closing price or the average trading volume over the past week)
 - Code example:

<code>import pandas as pd</code>
<code># Read stock data (rows are assumed to be in chronological order, one per trading day)</code>
<code>stock_data = pd.read_csv('stock_data.csv')</code>
<code># Extract lag features</code>
<code>stock_data['PreviousClose'] = stock_data['Close'].shift(1)</code>
<code># 7-row rolling mean; note that the window includes the current row</code>
<code>stock_data['AverageVolume'] = stock_data['Volume'].rolling(window=7).mean()</code>
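One detail worth checking with lag features is whether a rolling window includes the current row. A plain `rolling(...).mean()` does, so if the feature is meant to describe only the past, shifting first is one option. The sketch below assumes a `Date` column (not shown in the article) and uses a 3-row window to keep the sample small.

```python
import pandas as pd

# Hypothetical daily data standing in for stock_data.csv; the Date column is an assumption
stock_data = pd.DataFrame({
    "Date": pd.date_range("2024-01-01", periods=5, freq="D"),
    "Close": [10.0, 11.0, 12.0, 13.0, 14.0],
    "Volume": [100, 200, 300, 400, 500],
})

# Lag features only make sense on chronologically ordered rows
stock_data = stock_data.sort_values("Date").reset_index(drop=True)

# Previous row's closing price
stock_data["PreviousClose"] = stock_data["Close"].shift(1)

# shift(1) before rolling, so each window covers only earlier rows
stock_data["PastAvgVolume"] = stock_data["Volume"].shift(1).rolling(window=3).mean()

print(stock_data)
```

The first three rows of `PastAvgVolume` are NaN because a full window of earlier rows is not yet available; how to treat those rows (drop, fill, or use `min_periods`) depends on the model.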

5. Social media business scenario:

 - Feature: User registration time
 - Tip: Calculate the difference between the registration time and the current time to measure how long the user has been on the platform
 - Code example:

<code>import pandas as pd</code>
<code># Read user data</code>
<code>user_data = pd.read_csv('user_data.csv')</code>
<code># Calculate usage duration in whole days</code>
<code>user_data['RegistrationDate'] = pd.to_datetime(user_data['RegistrationDate'])</code>
<code>user_data['UsageDuration'] = (pd.Timestamp.now() - user_data['RegistrationDate']).dt.days</code>

6. Human resources business scenario:

 - Feature: Employee hire date
 - Tip: Extract month and quarter information from the hire date
 - Code example:

<code>import pandas as pd</code>
<code># Read employee data</code>
<code>employee_data = pd.read_csv('employee_data.csv')</code>
<code># Extract month and quarter</code>
<code>hire_dates = pd.to_datetime(employee_data['HireDate'])</code>
<code>employee_data['Month'] = hire_dates.dt.month</code>
<code>employee_data['Quarter'] = hire_dates.dt.quarter</code>

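7. Education business scenario:

 - Feature: Student test scores
 - Tip: Calculate the mean and standard deviation of each student's scores across subjects as new features
 - Code example:

<code>import pandas as pd</code>
<code># Read student performance data</code>
<code>exam_scores = pd.read_csv('exam_scores.csv')</code>
<code># Fix the score columns up front (assumes every numeric column is a subject score),</code>
<code># so that AverageScore is not fed back into the standard deviation</code>
<code>score_columns = exam_scores.select_dtypes(include='number').columns</code>
<code># Calculate the average score and standard deviation per student (row-wise)</code>
<code>exam_scores['AverageScore'] = exam_scores[score_columns].mean(axis=1)</code>
<code>exam_scores['ScoreStd'] = exam_scores[score_columns].std(axis=1)</code>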
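The statistics can also be taken per subject rather than per student. One common use of per-subject mean and standard deviation is to standardize each score (a z-score), which makes subjects with different difficulty comparable. This is a sketch of that variant, not the article's own method; the column names and values are assumptions.

```python
import pandas as pd

# Hypothetical scores; column names are assumptions
exam_scores = pd.DataFrame({
    "Student": ["A", "B", "C"],
    "Math": [60.0, 80.0, 100.0],
    "English": [70.0, 80.0, 90.0],
})
subjects = ["Math", "English"]

# Per-subject statistics: one value per column
subject_mean = exam_scores[subjects].mean()
subject_std = exam_scores[subjects].std()   # sample standard deviation (ddof=1)

# Distance of each score from its subject mean, in standard deviations
z_scores = (exam_scores[subjects] - subject_mean) / subject_std
exam_scores = exam_scores.join(z_scores.add_suffix("_z"))

print(exam_scores)
```

Here student A sits one standard deviation below the mean in both subjects, even though the raw gaps (20 points in Math, 10 in English) differ.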

8. Hotel business scenario:

 - Features: Reservation date and check-in date
 - Tip: Calculate how many days in advance the user made the reservation
 - Code example:

<code>import pandas as pd</code>
<code># Read booking data</code>
<code>booking_data = pd.read_csv('booking_data.csv')</code>
<code># Calculate the number of days booked in advance</code>
<code>booking_data['BookingDate'] = pd.to_datetime(booking_data['BookingDate'])</code>
<code>booking_data['CheckInDate'] = pd.to_datetime(booking_data['CheckInDate'])</code>
<code>booking_data['DaysInAdvance'] = (booking_data['CheckInDate'] - booking_data['BookingDate']).dt.days</code>
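Real booking exports often contain unparseable dates or check-in dates that precede the booking date. The following sketch shows one way to surface such rows instead of failing on them; the sample rows and the `LeadTimeValid` column are hypothetical.

```python
import pandas as pd

# Hypothetical sample standing in for booking_data.csv
booking_data = pd.DataFrame({
    "BookingDate": ["2024-03-01", "2024-03-10", "not a date"],
    "CheckInDate": ["2024-03-15", "2024-03-08", "2024-04-01"],
})

# errors="coerce" turns unparseable values into NaT instead of raising
booked = pd.to_datetime(booking_data["BookingDate"], errors="coerce")
check_in = pd.to_datetime(booking_data["CheckInDate"], errors="coerce")

booking_data["DaysInAdvance"] = (check_in - booked).dt.days

# False for negative lead times and for rows where a date could not be parsed
booking_data["LeadTimeValid"] = booking_data["DaysInAdvance"] >= 0

print(booking_data)
```

Whether invalid rows should be dropped, corrected, or kept with a flag is a business decision; the flag simply keeps that choice explicit.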

9. Marketing business scenario:

 - Features: Ad clicks and impressions
 - Tip: Calculate the click-through rate, the proportion of ad impressions that result in a click
 - Code example:

<code>import pandas as pd</code>
<code># Read advertising data</code>
<code>ad_data = pd.read_csv('ad_data.csv')</code>
<code># Calculate click-through rate</code>
<code>ad_data['ClickThroughRate'] = ad_data['Clicks'] / ad_data['Impressions']</code>
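A ratio feature like this needs a decision about rows with zero impressions, where plain division in pandas yields `inf` or `NaN`. A minimal sketch that marks the rate as missing in that case follows; the sample values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for ad_data.csv
ad_data = pd.DataFrame({
    "Clicks": [5, 0, 3],
    "Impressions": [100, 50, 0],
})

# Replace zero impressions with NaN so the rate is missing rather than infinite
impressions = ad_data["Impressions"].replace(0, np.nan)
ad_data["ClickThroughRate"] = ad_data["Clicks"] / impressions

print(ad_data)
```

The same pattern applies to any ratio feature whose denominator can be zero, such as weight divided by volume.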

10. Logistics business scenario:

 - Features: Cargo weight and volume
 - Tip: Calculate cargo density, the ratio of cargo weight to volume
 - Code example:

<code>import pandas as pd</code>
<code># Read cargo data</code>
<code>shipment_data = pd.read_csv('shipment_data.csv')</code>
<code># Calculate cargo density</code>
<code>shipment_data['Density'] = shipment_data['Weight'] / shipment_data['Volume']</code>

These are some feature engineering techniques and code examples from real business scenarios. The goal of feature engineering is to extract useful features from raw data so that models built on it are more accurate and effective. Choose the methods that fit your specific business needs and the characteristics of your data.
