Python artificial intelligence practice: automatic recommendation system

1. Background Introduction

1.1 What is an automatic recommendation system?

With the development of the Internet, users' demand for products and services keeps growing, and meeting it requires a better shopping experience. In e-commerce, recommendation systems built on users' searches, browsing history and other behavioral data have become a hot topic in the industry. However, traditional recommendation systems often personalize only on general tastes and preferences, ignoring the content and scenarios that users are actually interested in. How to combine users' historical behavior data with recommendation algorithms to automatically generate content that better matches their needs, and push it to them, has therefore become an important research topic.

So-called “automatic recommendation” means analyzing a user's historical behavior data with a recommendation algorithm in order to recommend new products or services the user may be interested in, based on the user's preferences and the specific scenario. The process generally includes the following stages:

  1. User portrait: Collect user behavioral data and extract user characteristics, such as age, gender, interests and hobbies, etc.;
  2. Behavioral data analysis: Based on different behavioral data such as users’ search, browsing records, purchase records, collection records, etc., perform data cleaning, deduplication, analysis and other processing to form a representative data set;
  3. Recommendation algorithm: Based on the user characteristics and behavior data in the data set, select an appropriate recommendation algorithm for recommendation, such as collaborative filtering algorithm, content filtering algorithm, etc.;
  4. Results display: Present recommendation results to users, including information on goods, services, advertisements, etc., and make improvements based on user feedback.

Automatic recommendation systems are used more and more widely. Social platforms such as Weibo, WeChat, Zhihu and Toutiao all use them, and online retailers such as Taobao and JD.com also offer rich recommendation features.

1.2 Why do we need an automatic recommendation system?

1.2.1 Improve user satisfaction

Because recommendation engines surface goods and services that better match what users want, they can strengthen purchase intent and improve user satisfaction. For example, a recommendation system can help users find related products, avoid buying items they will not use, and save time and money; it can also encourage users to take part in community activities, raising activity and engagement and stimulating purchases. Recommendation systems are also becoming increasingly popular, and some companies hope to use them to drive the growth of their own businesses.

1.2.2 Optimize product positioning and marketing strategy

An automatic recommendation system can position products more accurately, help continuously optimize the usability and interface layout of products and services, and improve user experience. For example, it can surface the currently most popular or trending products in real time, making them easier to discover and reducing drop-off along the conversion funnel. In addition, for e-commerce, the recommendation system's back-end management console shows indicators such as average revenue, inventory, conversion rate and number of visitors for each product, so that marketing strategies can be adjusted to improve sales and brand influence.

1.2.3 Smarter Personalized Recommendations

An automatic recommendation system can offer users content that better matches their personal tastes and preferences, helping them evaluate a product more fully before buying it and making their purchase decisions better informed. For example, when a user opens a mobile app, the system can recommend the products that best suit their preferences, which improves the user's experience, shortens purchase time and raises efficiency.

1.2.4 Improving the company’s competitiveness

An automatic recommendation system can save enterprises substantial resources and give them a strong competitive advantage. Many well-known e-commerce platforms, such as Amazon, JD.com, Suning and Pinduoduo, use recommendation systems to strengthen their competitiveness. Recommendation systems also give companies direct access to large amounts of user data for precise marketing. Riding this wave of opportunity, many entrepreneurs have joined the competition and are trying to develop automatic recommendation products of their own.

1.3 Types and Applications of Automatic Recommendation Systems

1.3.1 Recommendation based on user attributes

The simplest automatic recommendation system makes recommendations based on user attributes such as age, gender and city; this method is called “recommendation based on user attributes”. It is simple, easy to understand and requires little user input, but it gives users little new information: it can only map the user's preference tags to specific goods or services.

For example, an e-commerce website can recommend products the user may like based on the user's purchasing habits, preferences, favorites and other behavioral data. However, this method cannot take the user's spending power into account, so its recommendations are not well targeted.

1.3.2 Context-based recommendation

Context-based recommendation addresses the limitations of recommendation based on user attributes. By analyzing the user's queries, searches, browsing history and other behavioral data, it identifies the user's interests, values and needs and recommends matching products or services. This method pays more attention to how users' behavior changes over time and can therefore provide more relevant recommendations.

For example, a video website can recommend related videos based on the videos a user has recently clicked, watched or purchased. Likewise, a search engine can suggest the top related search terms based on the keywords the user has entered.

1.3.3 Content-based recommendation

Content-based recommendation is a more complex method. It combines information such as the video the user is currently watching, the book they are reading, the song they are listening to, the products they have bought and their favorites, and uses machine learning algorithms to analyze this information and make recommendations.

For example, a music website can recommend new songs based on a user's playback history and their ratings of particular songs. Likewise, a catering website can recommend dishes based on a user's ordering habits, preferred cuisines, favorite ingredients and other information.

1.3.4 Combination Recommendation

Finally, there is a special method: combined recommendation. It combines the other three methods and applies them in order of priority. For example, when a user visits a page for the first time, context-based recommendation can suggest related products; after the user has performed some actions, the system switches to content-based recommendation and uses features such as the user's tastes and preferences to make more specific recommendations.
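The switch from context-based to content-based recommendation can be sketched as a simple priority rule. The threshold `MIN_INTERACTIONS`, the related-items table and the item tags are hypothetical; the point is only that the strategy is chosen by how much history the user has.

```python
# Hypothetical threshold: interactions needed before personal history is trusted
MIN_INTERACTIONS = 5

# Hypothetical data: items related to each page, and descriptive tags per item
related_items = {"iphone": ["iphone case", "charger"]}
item_tags = {
    "iphone": {"apple", "phone"},
    "iphone case": {"phone", "accessory"},
    "charger": {"accessory"},
    "apple watch": {"apple", "wearable"},
    "macbook pro": {"apple", "laptop"},
    "yogurt": {"food"},
}

def context_based(current_item):
    """New user: recommend items related to the page being viewed."""
    return related_items.get(current_item, [])

def content_based(history, top_n=3):
    """Experienced user: recommend unseen items sharing the most tags with the history."""
    liked = set().union(*(item_tags.get(i, set()) for i in history))
    overlap = {i: len(tags & liked) for i, tags in item_tags.items() if i not in history}
    ranked = sorted(overlap, key=lambda i: (-overlap[i], i))
    return [i for i in ranked if overlap[i] > 0][:top_n]

def combined_recommend(history, current_item):
    """Choose the strategy by priority, based on how much history exists."""
    if len(history) < MIN_INTERACTIONS:
        return context_based(current_item)
    return content_based(history)

print(combined_recommend([], "iphone"))
print(combined_recommend(["iphone", "iphone case", "charger", "macbook pro", "yogurt"], "iphone"))
```

A first-time visitor gets items tied to the current page, while a user with enough history gets items matched to the tags of everything they have interacted with.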

To sum up, the automatic recommendation system can provide users with various contents including goods, services, advertisements, etc., and achieve functions such as accurate transmission of information, improvement of user stickiness, and optimization of business models.

2. Core concepts and connections

2.1 User Portrait

User profiling is the process of converting a user's historical behavioral data into a set of features that describe the user. It covers three aspects: basic features, preference features and behavioral features.

Basic features:

  • Age
  • Gender
  • Place of residence
  • Profession

Preference features:

  • Consumption habits
  • Preferred categories
  • Hobbies

Behavioral features:

  • Search behavior
  • Browsing behavior
  • Purchase behavior
  • Comment behavior
  • Favoriting (collection) behavior
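The three groups of features can be assembled from a raw event log roughly as follows. The event format, user IDs and attribute values are hypothetical; a real profile would draw on many more signals.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (user_id, behavior, item, category)
events = [
    ("A", "search", "iphone", "electronics"),
    ("A", "browse", "apple watch", "electronics"),
    ("A", "purchase", "iphone", "electronics"),
    ("A", "browse", "yogurt", "food"),
    ("B", "search", "milk", "food"),
    ("B", "favorite", "cookies", "food"),
]

# Hypothetical basic attributes collected at registration
basic = {"A": {"age": 28, "gender": "F"}, "B": {"age": 35, "gender": "M"}}

def build_profiles(events, basic):
    behavior_counts = defaultdict(Counter)
    category_counts = defaultdict(Counter)
    for user, behavior, _item, category in events:
        behavior_counts[user][behavior] += 1
        category_counts[user][category] += 1
    profiles = {}
    for user in basic:
        top = category_counts[user].most_common(1)
        profiles[user] = {
            **basic[user],                                     # basic features
            "preferred_category": top[0][0] if top else None,  # preference feature
            "behavior_counts": dict(behavior_counts[user]),    # behavioral features
        }
    return profiles

profiles = build_profiles(events, basic)
print(profiles["A"])
```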

2.2 Data Cleaning and Deduplication

Data cleaning means extracting useful information from raw data and processing it so that the data can be presented, analyzed and used. Data deduplication means deleting duplicate records to avoid redundancy.

The main tasks of data cleaning include:

  • Data crawling: obtain user behavior data from different channels according to business needs;
  • Data cleaning: clean, reorganize and filter the data to obtain a representative data set;
  • Data normalization: keep data fields consistent so that they can be used in calculations;
  • Data storage: save the data to a database or file;
  • Data reading: load the data into memory for analysis and processing.
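The cleaning, normalization and deduplication tasks can be sketched with pandas. The column names and raw records are hypothetical, and crawling, storage and reading are left out.

```python
import pandas as pd

# Hypothetical raw behavior records collected from several channels
raw = pd.DataFrame({
    "user_id":  [1, 1, 1, 2, 2, None],
    "behavior": ["search", "search", "Browse ", "purchase", "purchase", "search"],
    "item":     [" iPhone", "iphone", "Apple Watch", "yogurt", "yogurt", "milk"],
})

def clean_behavior_data(df):
    df = df.dropna(subset=["user_id"]).copy()   # cleaning: drop records without a user
    df["user_id"] = df["user_id"].astype(int)
    # Normalization: trim whitespace and lowercase so equal values compare equal
    df["behavior"] = df["behavior"].str.strip().str.lower()
    df["item"] = df["item"].str.strip().str.lower()
    # Deduplication: identical (user, behavior, item) rows are kept once
    return df.drop_duplicates().reset_index(drop=True)

cleaned = clean_behavior_data(raw)
print(cleaned)
```

Normalizing before deduplicating matters: " iPhone" and "iphone" only become duplicates after whitespace and case have been made consistent.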

2.3 Recommendation Algorithm

A recommendation algorithm analyzes and models users' historical behavior data and uses the resulting model to produce recommendations. Common recommendation algorithms include collaborative filtering, content filtering and matrix factorization.

2.3.1 Collaborative Filtering Algorithm

Collaborative filtering is based on similarities derived from user behavior. The basic idea is to analyze behavioral data across users to find items that similar users like in common, and then recommend other items they are interested in.

Its main steps are as follows:

  1. Similarity between users: measured by the cosine similarity between users' feature vectors;
  2. Similarity between items: found by analyzing the relationships between items;
  3. Generating recommendations: produce recommendation results from the user's interest vector, interest matrix, user-item matrix, etc.
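The user-similarity step can be sketched with NumPy: compute the cosine similarity between users' rating vectors, then predict each unrated item as a similarity-weighted average of other users' ratings. The ratings, users and the extra "airpods" item are hypothetical.

```python
import numpy as np

# Hypothetical ratings (rows: users A, B, C; columns: items); 0 means "not rated"
items = ["iphone", "apple watch", "macbook pro", "airpods"]
ratings = np.array([
    [5, 4, 0, 0],   # user A
    [5, 5, 4, 5],   # user B
    [1, 0, 5, 1],   # user C
], dtype=float)

def cosine_user_similarity(R):
    """Cosine similarity between every pair of user rating vectors."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    unit = R / np.where(norms == 0, 1, norms)   # avoid dividing by zero for empty users
    return unit @ unit.T

def recommend_user_based(R, user, top_n=2):
    sim = cosine_user_similarity(R)[user]
    sim[user] = 0                                # a user is not their own neighbor
    rated = (R > 0).astype(float)
    # Predicted rating: similarity-weighted average over users who rated the item
    weights = sim @ rated
    scores = (sim @ R) / np.where(weights == 0, 1, weights)
    unrated = np.where(R[user] == 0)[0]
    ranked = sorted(unrated, key=lambda j: -scores[j])
    return [items[j] for j in ranked[:top_n]]

print(recommend_user_based(ratings, 0))
```

User B's ratings are much closer to A's than C's are, so B's high rating of "airpods" dominates and that item ranks first for A.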

2.3.2 Content Filtering Algorithm

Content filtering is another recommendation algorithm. It determines which content a user is interested in from their searches, purchases, follows, ratings and other behaviors, and then recommends that content.

Its main steps are as follows:

  1. Analyzing user-content interaction data: predict which content to recommend by analyzing user behavior data;
  2. Designing the content analysis and recommendation module: analyze user behavior data and build a graph of the relationships between users and content;
  3. Generating recommendations: produce recommendation results from the user's interest preferences and preference vectors.
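One common way to implement content filtering is to describe each item by a feature vector, build a user profile as the average vector of the items the user interacted with, and rank the remaining items by cosine similarity to that profile. The feature dimensions and item vectors below are hypothetical.

```python
import numpy as np

# Hypothetical item features over the dimensions [apple, phone, wearable, laptop, food]
item_vectors = {
    "iPhone XS":            np.array([1, 1, 0, 0, 0], dtype=float),
    "iPhone XR":            np.array([1, 1, 0, 0, 0], dtype=float),
    "Apple Watch Series 5": np.array([1, 0, 1, 0, 0], dtype=float),
    "MacBook Pro":          np.array([1, 0, 0, 1, 0], dtype=float),
    "Yogurt":               np.array([0, 0, 0, 0, 1], dtype=float),
}

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def recommend_content_based(interacted, top_n=3):
    # User profile: mean feature vector of the items the user interacted with
    profile = np.mean([item_vectors[i] for i in interacted], axis=0)
    # Score every item the user has not interacted with against the profile
    scores = {i: cosine(profile, v) for i, v in item_vectors.items() if i not in interacted}
    ranked = sorted(scores, key=lambda i: -scores[i])
    return [i for i in ranked if scores[i] > 0][:top_n]

print(recommend_content_based(["iPhone XS", "MacBook Pro"]))
```

No other user's data is needed here: items are ranked only by how well their features match the user's own history.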

3. Detailed explanation of core algorithm principles, specific operation steps and mathematical model formulas

3.1 Collaborative Filtering Algorithm

3.1.1 Algorithm Process

  1. Data preparation: First, we need to obtain the user’s historical behavior data, such as search records, browsing records, purchase records, etc.

  2. Data cleaning and statistics: Secondly, we clean the data and perform statistical analysis to aggregate items of the same type together.

  3. Generate user-item matrix: In the third step, we generate user-item matrix. The elements in the matrix represent the user’s rating of the item.

  4. Recommendation based on item similarity: In the fourth step, we calculate the similarity between items and use these similarities to make recommendations.

  5. Sorting according to the recommendation results: Finally, we sort according to the recommendation results and return the final recommendation list to the user.

3.1.2 Principle Introduction

Collaborative filtering analyzes users' historical behavior data, finds items that similar users like in common, and then recommends other items those users were interested in.

Unlike recommendation based on user attributes, collaborative filtering exploits similarities in behavior across users (or between items rated by the same users), which allows it to give users more accurate recommendations.
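The principle can be written down with cosine similarity and a similarity-weighted prediction. The item-based form below is one standard formulation (user-based filtering swaps the roles of users and items). The notation is introduced here for illustration: r_{u,i} is user u's rating of item i (0 if unrated), U is the set of users, and N(u) is the set of items user u has rated.

```latex
\mathrm{sim}(i, j) = \frac{\sum_{u \in U} r_{u,i}\, r_{u,j}}
                          {\sqrt{\sum_{u \in U} r_{u,i}^{2}}\,\sqrt{\sum_{u \in U} r_{u,j}^{2}}}
\qquad
\hat{r}_{u,i} = \frac{\sum_{j \in N(u)} \mathrm{sim}(i, j)\, r_{u,j}}
                     {\sum_{j \in N(u)} \lvert \mathrm{sim}(i, j) \rvert}
```

Items the user has not rated are then sorted by their predicted rating \hat{r}_{u,i}, and the highest-scoring ones are recommended.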

3.1.3 Algorithm Details

3.1.3.1 Data Preparation

Obtain users’ historical behavior data, including search records, browsing records, purchase records, etc. For example, user A’s search records on a certain day are as follows:

['iphone', 'apple watch','macbook pro']
3.1.3.2 Data Cleaning and Statistics

The data is cleaned and statistically analyzed to aggregate items of the same type together. For example, for the above search records, we can count the different items they contain and aggregate them:

{'iphone': ['apple'],
 'apple watch': ['apple', 'watch'],
 'macbook pro': []}

3.1.3.3 Generate user-item matrix

Generate a user-item matrix, where the elements in the matrix represent the user’s rating of the item. For example, in the case of user A, we can create a rating matrix:

| User | iphone | apple watch | macbook pro |
|------|--------|-------------|-------------|
| A    | 5      | 4           | 0           |

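A user-item matrix like this can be built from raw (user, item, rating) records with pandas `pivot_table`. The records, including user B, are hypothetical; missing ratings are filled with 0 as in the table.

```python
import pandas as pd

# Hypothetical explicit ratings as (user, item, rating) records
records = pd.DataFrame({
    "user":   ["A", "A", "B", "B", "B"],
    "item":   ["iphone", "apple watch", "iphone", "macbook pro", "apple watch"],
    "rating": [5, 4, 4, 5, 3],
})

# Rows: users, columns: items; a pair with no rating becomes 0
matrix = records.pivot_table(index="user", columns="item", values="rating", fill_value=0)
print(matrix)
```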
3.1.3.4 Recommendation based on item similarity

Calculate similarities between items and use these similarities to make recommendations. For example, for the item ‘iphone’, if the similarity with ‘apple watch’ is relatively high, it can be recommended to the user.

3.1.3.5 Sort and return the results

Finally, we sort the candidates by their recommendation scores and return the final list to the user. For example, for user A, the list of items recommended by similarity could be:

[['apple watch']]
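The similarity and sorting steps can be sketched as item-based filtering with NumPy. User A's ratings of the three original items follow the table above; users B and C and the "airpods" column are hypothetical, added so that item similarities can actually be computed. Each unrated item is scored as the similarity-weighted average of A's own ratings.

```python
import numpy as np

# Hypothetical ratings; user A matches the table above, B, C and "airpods" are invented
items = ["iphone", "apple watch", "macbook pro", "airpods"]
R = np.array([
    [5, 4, 0, 0],   # user A
    [5, 1, 1, 5],   # user B
    [1, 5, 5, 1],   # user C
], dtype=float)

def item_similarity(R):
    """Cosine similarity between the item columns of the user-item matrix."""
    norms = np.linalg.norm(R, axis=0)
    unit = R / np.where(norms == 0, 1, norms)
    return unit.T @ unit

def recommend_item_based(R, user, top_n=2):
    sim = item_similarity(R)
    rated = R[user] > 0
    scores = {}
    for j in np.where(~rated)[0]:
        w = sim[j, rated]                        # similarity of item j to each rated item
        # Predicted rating: similarity-weighted average of the user's own ratings
        scores[items[j]] = float(w @ R[user, rated] / w.sum()) if w.sum() > 0 else 0.0
    # Sort by predicted rating, highest first
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend_item_based(R, 0))
```

"airpods" ranks first because its rating pattern is closest to that of the iphone, which A rated 5, while "macbook pro" is closest to the apple watch, which A rated 4.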

3.2 Content Filtering Algorithm

3.2.1 Algorithm Process

  1. Obtain user behavior data: First, we need to obtain the user’s historical behavior data, such as search records, browsing records, purchase records, etc.

  2. Use keyword search: In the second step, we use keyword search to find topics that the user may be interested in.

  3. Find related items: In the third step, we associate the found topic with the user’s recent behavior and find the corresponding items.

  4. Rating the items: In the fourth step, we rate the items and make recommendations based on the ratings.

  5. Return recommendation results: Finally, we return the recommendation results to the user for selection.

3.2.2 Principle Introduction

The content filtering algorithm determines which content a user is interested in from their searches, purchases, follows, ratings and other behaviors, and then recommends that content.

Unlike collaborative filtering, this method does not need to analyze the similarity between users; it relies only on the target user's own behavior to decide which content to recommend.

3.2.3 Algorithm Details

3.2.3.1 Get user behavior data

Obtain users’ historical behavior data, including search records, browsing records, purchase records, etc. For example, user A’s search records on a certain day are as follows:

['iphone', 'apple watch','macbook pro']

3.2.3.2 Use keyword search

Use keyword search to find topics that may interest the user. For example, for user A's search records, keyword search might return the following topics:

['iphone', 'iphone x', 'iphones']
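Under the simplifying assumption that topics are terms in a site vocabulary that start with the user's keyword, the keyword search step reduces to prefix matching. The vocabulary below is hypothetical.

```python
# Hypothetical vocabulary of search terms known to the site
vocabulary = ["iphone", "iphone x", "iphones", "ipad", "apple watch", "macbook pro", "yogurt"]

def expand_topics(query, vocabulary):
    """Return vocabulary terms that start with the query keyword (case-insensitive)."""
    q = query.strip().lower()
    return [term for term in vocabulary if term.lower().startswith(q)]

print(expand_topics("iphone", vocabulary))
```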

3.2.3.3 Find related items

Associate each topic found with the user's recent behavior and look up the corresponding items. For example, the topic 'iphone' matches the term 'iphone' in user A's recent search record, so we look for the items most relevant to that topic:

['iPhone XS', 'iPhone XR', 'Apple Watch Series 5']
3.2.3.4 Rating Items

Rate items and make recommendations based on the ratings. For example, for the items found above, we could give each item a rating and make recommendations based on the rating:

['iPhone XS: 9.7', 'iPhone XR: 8.9', 'Apple Watch Series 5: 8.2']

3.2.3.5 Return recommendation results

Finally, we return the recommendation results to the user. For example, for user A, the recommended items sorted by rating are:

[['iPhone XS', 'iPhone XR', 'Apple Watch Series 5']]
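Steps 3 to 5 can be sketched together: find catalog items whose title mentions a keyword related to the topic, then sort them by rating. The ratings for the three Apple items are taken from the example above; the "Greek Yogurt" entry and the topic-to-keyword mapping are hypothetical.

```python
# Hypothetical catalog: item title -> rating (Apple ratings from the example above)
catalog = {
    "iPhone XR": 8.9,
    "Apple Watch Series 5": 8.2,
    "iPhone XS": 9.7,
    "Greek Yogurt": 7.5,
}
# Hypothetical mapping from a topic to the keywords considered related to it
related_keywords = {"iphone": ["iphone", "apple watch"]}

def recommend_for_topic(topic, catalog, related_keywords):
    keywords = related_keywords.get(topic, [topic])
    # Step 3: find items whose title mentions any related keyword
    found = [title for title in catalog
             if any(k in title.lower() for k in keywords)]
    # Steps 4 and 5: rank the found items by rating, highest first
    return sorted(found, key=lambda title: catalog[title], reverse=True)

print(recommend_for_topic("iphone", catalog, related_keywords))
```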

4. Specific code examples and detailed explanations

import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer 

4.1.2 Data Preparation

We construct a small, hand-made set of user search and browsing records as sample data.

data = {'user_id': [1, 2, 3],
        'search_record': [['iphone', 'apple watch', 'macbook pro'],
                          ['yogurt', 'rice', 'spaghetti'],
                          ['milk', 'cookies']],
        'browse_record': [[['iphone'], ['apple watch', 'macbook pro']],
                          [['yogurt', 'rice', 'spaghetti'], ['bread', 'pasta']],
                          [['milk'], ['cookies']]],
        }

df = pd.DataFrame(data)
print(df)

Output:

 user_id search_record browse_record
0 1 ['iphone', 'apple watch','mac... [[['iphone'], ['apple watch...
1 2 ['yogurt', 'rice','spa... [[['yogurt', 'rice','s...
2 3 ['milk', 'cookies'] [[['milk'], ['cookies']]]

4.1.3 Data Cleaning and Statistics

Perform data cleaning and statistics to group items of the same type together. Here, two items belong to the same group when they contain the same set of words, ignoring case and word order.

def clean_data(data):
    result = {}
    for item in data:
        # Split the item into lowercase words and drop single-character tokens
        words = {word.lower() for word in item.split() if len(word) > 1}
        # Sort alphabetically so that word order does not affect the group key
        category = '_'.join(sorted(words))
        # Items with the same key are aggregated into the same group
        result.setdefault(category, []).append(item)
    # Return the groups ordered from largest to smallest
    return dict(sorted(result.items(), key=lambda kv: len(kv[1]), reverse=True))

print(clean_data(['iphone', 'apple watch', 'macbook pro']))

Output:

{'iphone': ['iphone'], 'apple_watch': ['apple watch'], 'macbook_pro': ['macbook pro']}

With this sample every search term forms its own group; an input such as ['Apple Watch', 'watch apple'] would place both entries under the key 'apple_watch'.

4.1.4 Generate user-item matrix

Generate user-item matrices from the search and browsing records. Each element counts how many times a term appears in a user's records; these counts stand in for ratings. Both matrices are built with a single CountVectorizer so that they share the same columns, which the similarity step requires.

def generate_matrix(df):
    # Flatten each user's records into one space-separated string
    search_docs = df['search_record'].apply(lambda terms: " ".join(terms)).tolist()
    browse_docs = df['browse_record'].apply(
        lambda groups: " ".join(" ".join(group) for group in groups)).tolist()

    # Fit ONE vocabulary on both sources so the matrices have identical columns
    vectorizer = CountVectorizer()
    vectorizer.fit(search_docs + browse_docs)

    search_matrix = vectorizer.transform(search_docs)
    browse_matrix = vectorizer.transform(browse_docs)
    return search_matrix, browse_matrix, vectorizer


search_matrix, browse_matrix, vectorizer = generate_matrix(df)

print("Vocabulary:", vectorizer.get_feature_names_out().tolist())
print("Search matrix:\n", search_matrix.toarray())
print("Browse matrix:\n", browse_matrix.toarray())

Output:

Vocabulary: ['apple', 'bread', 'cookies', 'iphone', 'macbook', 'milk', 'pasta', 'pro', 'rice', 'spaghetti', 'watch', 'yogurt']
Search matrix:
 [[1 0 0 1 1 0 0 1 0 0 1 0]
 [0 0 0 0 0 0 0 0 1 1 0 1]
 [0 0 1 0 0 1 0 0 0 0 0 0]]
Browse matrix:
 [[1 0 0 1 1 0 0 1 0 0 1 0]
 [0 1 0 0 0 0 1 0 1 1 0 1]
 [0 0 1 0 0 1 0 0 0 0 0 0]]

4.1.5 Recommendation based on item similarity

Compute the cosine similarity between items (each item is described by which users browsed it), then score every item by its total similarity to the items the target user searched for.

def recommend_by_cosine(search_matrix, browse_matrix, vectorizer, target_index, top_n=5):
    terms = vectorizer.get_feature_names_out().tolist()

    # Item-item cosine similarity computed from the columns of the browse matrix
    item_similarity = cosine_similarity(browse_matrix.T)

    # The target user's search counts act as the seed items
    searched = search_matrix.toarray()[target_index]

    # Score each item by its summed similarity to the searched items
    scores = searched @ item_similarity

    # Highest score first; ties are broken alphabetically so the output is deterministic
    ranked = sorted(range(len(terms)), key=lambda k: (-scores[k], terms[k]))

    # Keep items with a positive score that the user has not already searched for
    recommended = [terms[k] for k in ranked if scores[k] > 0 and searched[k] == 0]
    return recommended[:top_n]

Run the code:

for user in range(len(df)):
    print("User", user, "->", recommend_by_cosine(search_matrix, browse_matrix, vectorizer, user))

Output:

User 0 -> []
User 1 -> ['bread', 'pasta']
User 2 -> []

User 1 searched for yogurt, rice and spaghetti and also browsed bread and pasta, so in the browse data those two items co-occur with the searched items and are recommended. Users 0 and 2 browsed only what they had already searched for, so there is nothing new to recommend to them.