“How can data analysis be applied to improve the efficiency of the software development process?”

Article directory

  • One sentence of positive energy every day
  • Foreword
  • What can Python do?
  • Using Python tools for data analysis
  • What are the introductory learning methods and recommended classic textbooks for Python?
  • Data analysis thinking and methods
    • Data analysis ideas
      • Clarify the purpose or problem
      • Start with a “hypothesis” approach
      • Locking in the cause of the problem
  • Basic steps of data analysis
  • Common methods of data analysis
  • Use of software tools
  • Postscript
  • Recommended good book – “Quick Data Analysis Using ChatGPT”
    • Reasons for recommendation
    • Introduction
    • About the author
    • Purchase link
  • Book donation event

One sentence of positive energy every day

Since ancient times, people have valued integrity; a promise is worth a hundred pieces of gold.

Foreword

How important is data analysis in today’s workplace, and how can programmers learn data analysis skills?

Data analysis plays a vital role in today’s workplace. As technology develops, companies have a growing demand for data, and data analyst has become a popular position. However, data analysts are not the only ones who need these skills: other roles, such as marketing, business analysis, product management, and operations management, need them too. Data analysis helps these roles better understand the market, the product, and the users, and provides support for corporate decision-making.

So how does a programmer learn data analysis? First, understand the basic concepts and process of data analysis, including data collection, data cleaning, data analysis, and data visualization. Second, learn the relevant tools and technologies, such as Python, the R language, SQL, and Excel. Finally, gain practical project experience; hands-on work on real problems is the best way to become truly proficient in data analysis.

What can Python do?

After learning Python, you can take on tasks such as:

  1. Data analysis and processing: Python is very powerful in data processing and analysis and can be used for data cleaning, statistical analysis, modeling and prediction, etc.

  2. Web crawlers and data collection: Python can be used for web crawler development to obtain data from web pages, APIs and databases.

  3. Automation and batch processing: Python can write scripts to automate repetitive tasks, such as file operations, website logins, email sending, etc.

  4. Machine learning and artificial intelligence: Python has strong application value in machine learning and artificial intelligence, including deep learning, natural language processing, image processing, etc.

  5. Web development: Python can be used to develop web applications, and frameworks such as Django and Flask can make development efficient and easy to maintain.

  6. Game development: Python can be used for game development, and frameworks such as Pygame can help developers quickly create games.

In short, Python is very flexible and can be applied to many different fields and tasks. As long as you have enough imagination and skills, you can create countless useful tools and applications.

Using Python tools for data analysis

Here are some of the tasks I have worked on:

  1. Data cleaning: Use the Pandas library for data cleaning, including data type conversion, missing value processing, duplicate value processing, etc.

  2. Data visualization: Use Matplotlib and Seaborn libraries for data visualization, which can present data more intuitively through histograms, scatter plots, line charts, etc.

  3. Machine learning: Use the Scikit-learn library for machine learning modeling, including data partitioning, feature engineering, selecting appropriate algorithms for modeling, etc.

  4. Deep learning: Use TensorFlow and PyTorch libraries for deep learning modeling, including data preprocessing, model building, training and evaluation, etc.

  5. Database operations: Use the Pandas library and SQLAlchemy library to connect to the database to perform data query, sorting, filtering, merging and other operations.

  6. Excel: some chart-based data analysis is also done directly in Excel.

In short, Python has a wealth of data analysis tools that can perform data analysis and modeling quickly and efficiently. Continuous learning and practice are required to better utilize the advantages of Python.
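As a taste of what item 1 above looks like in practice, here is a minimal Pandas cleaning sketch covering type conversion, missing-value handling, and duplicate removal. The data, column names, and fill strategy (mean for numbers, a placeholder for text) are invented for illustration:

```python
import pandas as pd

# Hypothetical order data with the usual problems:
# wrong types, missing values, and a duplicate row.
df = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "amount": ["10.5", "20.0", "20.0", None, "15.2"],
    "city": ["Beijing", "Shanghai", "Shanghai", "Beijing", None],
})

df = df.drop_duplicates()                                 # duplicate-value handling
df["amount"] = pd.to_numeric(df["amount"])                # type conversion
df["amount"] = df["amount"].fillna(df["amount"].mean())   # fill numeric gaps with the mean
df["city"] = df["city"].fillna("Unknown")                 # fill text gaps with a placeholder

print(df)
```

Real projects need a fill strategy chosen per column (mean, median, a constant, or dropping the row); the point is that Pandas makes each step a one-liner.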

What are the introductory learning methods and recommended classic textbooks for Python?

There are many introductory learning methods and classic textbooks for Python. Here are some of my personal experiences and recommendations:

  1. Official documentation: The official Python website provides detailed documentation, including Python language reference, standard library reference, tutorials, etc. Beginners can have an overall understanding of Python and an in-depth understanding of Python’s syntax and features by reading the official documentation.

  2. Online courses: Online course platforms such as Coursera, Udacity, and edX provide a large number of Python courses, including many courses from famous teachers. You can choose courses that suit you according to your interests and level.

  3. Books: Python has many classic textbooks suitable for beginners, such as “Python Programming from Introduction to Practice” and “Python Basics Tutorial”. These teaching materials are rich in content, systematic and easy to understand, and can help beginners get started quickly.

  4. Practical projects: The best way to learn Python is to gain an in-depth understanding of the Python language and applications through practical projects. You can find some Python practical projects to participate in through some websites, forums, social media and other channels.

My own introductory path was to start with basic syntax, then improve my Python programming level by reading classic Python textbooks and practicing on projects. I also take online courses to improve my skills, and I use the official Python documentation to gain a deeper understanding of Python’s syntax and features.

Data analysis thinking and methods

Data analysis ideas


Clarify the purpose or problem

Many people have probably had this experience: after receiving a “data analysis” request, they first make a bar chart of how the data has changed over some period, then use a line chart to show each year’s growth relative to the previous year, and then perform an ABC classification. ABC classification sorts items from largest to smallest and divides them into “Class A”, “Class B”, and “Class C” according to their share of the whole; it is a way of analyzing priorities.

Having done these analyses with the data at hand, the work then stalls: “What should we do next…?” This happens because the analyst has focused only on data that represents results, without first considering the purpose of the analysis or the question to be answered. Result-type data, such as sales volume or order volume, reflects how the business is doing; it is easy to count and naturally attracts attention. But the purpose of data analysis is to find problems, analyze causes, and propose solutions, and the cause of a problem can rarely be found from result data alone.

The methods mentioned above, classification, sorting, and trend analysis, are the most commonly used ones that everyone knows. So in data analysis it is not enough to simply master the methods; the purpose of the analysis must be clear first.
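The ABC classification mentioned above can be sketched in a few lines of Pandas. The 70%/90% cumulative-share cut-offs used here are one common convention rather than a fixed rule, and the product sales figures are invented:

```python
import pandas as pd

# Hypothetical product sales, sorted from largest to smallest.
sales = pd.Series(
    {"P1": 500, "P2": 300, "P3": 100, "P4": 60, "P5": 40},
    name="sales",
).sort_values(ascending=False)

# Cumulative share of the total, then cut into A/B/C bands.
cum_share = sales.cumsum() / sales.sum()
abc = pd.cut(cum_share, bins=[0, 0.7, 0.9, 1.0], labels=["A", "B", "C"])
print(abc.to_dict())
```

The output assigns the top products that make up the first 70% of sales to Class A, the next 20% to Class B, and the rest to Class C.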

Start with a “hypothesis” approach

When ideas are lacking at the beginning of a data analysis, “hypotheses” can help us determine the scope of the problem and its likely causes. After clarifying the purpose and the problem, we can use a hypothesis about WHAT is happening to break the problem down. Let’s take a question from [2020 NetEase School Recruitment Data Analysis] as an example:

“You found that the number of paying users this month dropped sharply (more than 30%) from the previous month. How would you explore the reasons behind it? Please describe your ideas and the key indicators involved.”

After confirming the purpose of the analysis, we can first split the problem. After splitting, we will find that there are multiple possible causes, and we need to decide along which dimensions to compare the indicators in order to find the key to the problem. Speculating before turning to the data is itself part of a WHAT-type hypothesis. Anyone familiar with the business will think of many dimensions, such as “product”, “region”, and “customer age”. By first selecting the dimensions most closely related to the problem, and the ones that best reveal differences, you can find the key to the problem.


For this question, suppose the number of new users has decreased. That fixes one analysis direction, and we then break new users down further: for example, the number of new users by channel, which channel has decreased, and whether the advertising on that channel is still effective. We can then verify the hypothesis against user age, region, gender, and so on.
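The channel breakdown described here reduces to a simple group comparison. Below is a sketch with invented channel names and paying-user counts; the idea is just to compute the month-over-month change per dimension value and look at the biggest drop:

```python
import pandas as pd

# Hypothetical paying-user counts by acquisition channel for two months.
df = pd.DataFrame({
    "channel": ["app_store", "ads", "referral"],
    "last_month": [4000, 3000, 3000],
    "this_month": [3900, 1200, 2900],
})

# Month-over-month change per channel; the overall drop is dominated
# by whichever channel fell the most.
df["change_pct"] = (df["this_month"] - df["last_month"]) / df["last_month"] * 100
worst = df.loc[df["change_pct"].idxmin(), "channel"]
print(df)
print("largest drop:", worst)
```

The same pattern applies to any dimension (region, age group, product line): group, compute the change, and rank.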

Locking in the cause of the problem

Once we have identified the key issue through hypotheses, we can apply the various data analysis methods discussed below, such as the RFM model commonly used in user analysis, or the funnel analysis commonly used to study conversion rates. For example, when analyzing conversion rates, comparing different channels might reveal that one channel’s traffic dropped significantly; segmenting that channel’s users might then show that the decrease is concentrated in the 20-30 age group, which happens to be the profile of our main paying users.

Finally, specific solutions and measures need to be formulated based on the analyzed problems and causes.

Basic steps of data analysis

  1. Clarify the purpose of the data analysis
  2. Sort out the analysis logic and choose the methods
  3. Collect and process the data used for the analysis
  4. Use statistics, data mining, etc. to extract valuable information
  5. Present the results using text, tables, and visual graphics
  6. Draw conclusions from the analysis and write the analysis report

Common methods of data analysis

Different data analysis methods are suitable for different scenarios. Moreover, data analysis methods are not only used in work, but are also useful in life. If you think about the problem from a different angle, you may be able to discover a new world. Listed below are 9 common data analysis methods:

  1. Logic tree analysis

    A logic tree is also called a problem tree, deduction tree, or decomposition tree. It is the tool most commonly used at McKinsey to analyze problems: all the sub-problems of a problem are listed in layers, starting from the top level and expanding downwards.

The famous Fermi problems are solved with logic tree analysis.

  2. Multi-dimensional breakdown analysis
    Multi-dimensional analysis is essentially segmentation analysis, and it plays a very important role in refined operations. It rests on two things: the refinement of indicators and the diversity of dimensions, such as the time dimension or the competing-product dimension.

For example, when job hunting, you can evaluate whether a company suits you from multiple dimensions.

This idea is similar to the logic tree: the main purpose is to break a vague problem down into multiple sub-problems.

  3. PEST analysis
    PEST analysis examines the macro environment: P stands for politics, E for economy, S for society, and T for technology. It is generally used during market research.
  4. Comparative analysis

Comparative analysis is a method that compares objective things in order to grasp their essence and laws and make correct evaluations.

It usually compares two interrelated indicators and then displays them in charts. It comes in two forms: absolute comparison and relative comparison.

A sentence that once left a deep impression on me: “Good data indicators must be proportions, and good data analysis must have comparisons.” Indeed, comparison is indispensable in data analysis work.
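The difference between absolute and relative comparison is worth making concrete. With invented revenue figures for two months:

```python
# Hypothetical monthly revenue figures.
last_month = 80_000
this_month = 92_000

# Absolute comparison: the raw difference between the two indicators.
absolute_diff = this_month - last_month

# Relative comparison: the difference as a share of the baseline.
relative_diff = absolute_diff / last_month

print(absolute_diff, f"{relative_diff:.0%}")
```

An absolute increase of 12,000 tells you little without scale; the relative figure (+15%) is what supports comparison across products or periods of different sizes, which is why the quote above insists on proportions.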

  5. Hypothesis testing analysis
    Hypothesis testing analysis is a process of using data to make decisions. The basic steps are: formulate a hypothesis, collect evidence, and draw a conclusion (the conclusion must be supported by the data).

If you want to find the cause of a problem, use hypothesis testing analysis. Detective films often use this method: hypothesize first, then verify.

Using this hypothesize-then-verify approach at work can quickly improve your business thinking.
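As a rough sketch of the “formulate a hypothesis – collect evidence – draw a conclusion” loop, here is a two-proportion z-test written with only the standard library. The scenario and conversion counts are invented: we hypothesize a new landing page converts better than the old one and check whether the data supports it:

```python
from math import sqrt, erf

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)          # pooled proportion
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Evidence: 120/1000 conversions on the new page, 90/1000 on the old page.
z, p = two_proportion_z_test(120, 1000, 90, 1000)
print(round(z, 3), round(p, 4))
```

Here p is below 0.05, so under the usual threshold the data supports the hypothesis that the two pages convert differently; with a smaller gap or sample, the conclusion would be “not enough evidence”, which is also a valid outcome of the loop.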

  6. Correlation analysis
    Correlation analysis looks for the relationship between two data indicators. Correlations can be direct or indirect, and may also be negative, absent, or non-linear. In practice, we often make a scatter plot to examine the correlation between two different things.

For example, if a user repeatedly browses one category of products in an app, related products are pushed to them. Similarly, Douyin and Bilibili recommend videos I like, and Douban uses correlation analysis to recommend movies.
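The usual numeric companion to the scatter plot is the Pearson correlation coefficient. A standard-library-only sketch, with invented browsing/purchase data:

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: pages browsed per session vs. items purchased.
pages = [1, 2, 3, 4, 5, 6]
purchases = [0, 1, 1, 2, 3, 3]
r = pearson_r(pages, purchases)
print(round(r, 3))
```

Values near +1 or -1 indicate a strong linear relationship; near 0, none. Note that Pearson’s r only captures linear correlation, so the scatter plot is still needed to spot the non-linear cases mentioned above.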

  7. Group (cohort) analysis

Divide the data into different groups according to some characteristic, and then compare the groups. Define the groupings flexibly according to the product and business, and then analyze the indicators for each group.

If you want to analyze user retention and churn, you need group analysis: for example, whether a new product release leads to user growth or churn. Users can be grouped by the period in which they started using the product.
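Grouping users by signup period and tracking their later activity is the classic cohort table. A minimal Pandas sketch with invented users and months:

```python
import pandas as pd

# Hypothetical activity log: each row says a user was active in some month.
df = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u2", "u3", "u4", "u4"],
    "signup_month": ["2023-09", "2023-09", "2023-09", "2023-09",
                     "2023-10", "2023-10", "2023-10"],
    "active_month": ["2023-09", "2023-10", "2023-09", "2023-09",
                     "2023-10", "2023-10", "2023-11"],
})

# Rows: signup cohort. Columns: month of activity.
# Cell: how many distinct users of that cohort were active that month.
cohort = (df.drop_duplicates()
            .groupby(["signup_month", "active_month"])["user"]
            .nunique()
            .unstack(fill_value=0))
print(cohort)
```

Reading across a row shows that cohort’s retention over time; comparing rows shows whether, say, users who signed up after a release behave differently from earlier cohorts.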

  8. RFM analysis

RFM groups users by value to support refined operations. It is similar to the matrix method, but turns a two-dimensional matrix into a three-dimensional one. Compared with other grouping methods, its advantage is simplicity.

The RFM method uses three indicators to group users into value segments:

R (Recency): the time since the user’s most recent purchase.

F (Frequency): how often the user purchased within a period.

M (Monetary): how much the user spent within a period.
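A minimal RFM scoring sketch in Pandas. The per-user values are invented, and each indicator is scored 0/1 against its median, which is the simplest possible two-level scheme; real projects typically use 3-5 levels per axis:

```python
import pandas as pd

# Hypothetical per-user RFM inputs.
rfm = pd.DataFrame({
    "recency_days": [5, 40, 10, 90],   # days since last purchase (lower is better)
    "frequency": [12, 2, 8, 1],        # purchases in the period (higher is better)
    "monetary": [900, 120, 450, 60],   # spend in the period (higher is better)
}, index=["u1", "u2", "u3", "u4"])

# Score each indicator against the median: 1 = above-median value user.
rfm["R"] = (rfm["recency_days"] <= rfm["recency_days"].median()).astype(int)
rfm["F"] = (rfm["frequency"] >= rfm["frequency"].median()).astype(int)
rfm["M"] = (rfm["monetary"] >= rfm["monetary"].median()).astype(int)

# Concatenating the three scores gives the user's cell in the R×F×M cube.
rfm["segment"] = rfm[["R", "F", "M"]].astype(str).agg("".join, axis=1)
print(rfm["segment"].to_dict())
```

Segment “111” is the high-value user worth retaining; “000” is a churn risk; the mixed segments each suggest their own operational action (win-back, upsell, and so on).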
  9. Final path method
    The final path method can also be called the funnel method, and AARRR is one kind of funnel. If you want to analyze user behavior or product operations, use this method: refine operations by managing the steps between the starting point and the goal and acting on the data feedback from each step.

For example, analyzing user behavior along the path from click to payment in an online shopping mall, or the click-to-payment conversion in online education.
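A funnel is just a sequence of stage counts plus the conversion between consecutive stages. A sketch with invented counts for the shopping-mall path just mentioned:

```python
# Hypothetical funnel counts for an online-shop purchase path.
funnel = [
    ("visit", 10000),
    ("view_product", 6000),
    ("add_to_cart", 1500),
    ("pay", 300),
]

# Step-to-step conversion: each stage divided by the previous one.
for (prev_name, prev_n), (name, n) in zip(funnel, funnel[1:]):
    print(f"{prev_name} -> {name}: {n / prev_n:.1%}")

# Overall conversion: final stage divided by the first.
overall = funnel[-1][1] / funnel[0][1]
print(f"overall: {overall:.1%}")
```

The step with the sharpest drop (here, add_to_cart → pay at 20%) is where operational effort pays off most.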

This concludes the brief introduction to the 9 common data analysis methods. They must be applied flexibly in real business scenarios; otherwise it is all empty talk.

Use of software tools

  1. Excel
    The first tool to master is Excel, the most widely used: basic data processing, function formulas, pivot tables, and so on. Excel can basically meet the needs of small amounts of data.
  2. Data analysis tools
    Once the data volume reaches hundreds of thousands or even millions of rows, we need visualization tools to complete the analysis.

Common data analysis tools such as Tableau, Power BI, and Haizhi BDP not only handle larger data volumes well but also have advantages in visualization, zero-code analysis, and more.

Several features to consider when choosing a tool:

  1. Easy to use, simple to operate, no programming knowledge required

  2. Support export, sharing, etc.

  3. Provide some case templates

  4. Beautiful visualization

The workflow of this type of analysis tool is broadly similar, usually:

Data upload/access → data processing → create dashboards → drag-and-drop analysis

Take the operation steps of BDP Personal Edition as an example:

1. Data upload
In the data source interface, choose “Add data source”; multiple source types are supported, such as local tables and databases.
Here we take a local Excel table as an example: click “Upload File”, select a local data table, and upload it.


2. Data processing
The worksheet is the interface for storing detailed data. Here you can view the data and perform various processing operations. For example, instead of writing VLOOKUP formulas, you can complete data matching by dragging and dropping.

3. Create a visual dashboard
Create a dashboard in the dashboard interface and click “Add Chart” in the upper right corner. Multiple charts can be added to one dashboard and are laid out automatically.

Drag the fields to be displayed into the dimension and value columns, select the chart type you need, and an intuitive visual chart is generated automatically. Remember to save it.

Postscript

Any business will generate a large amount of data information. Only when we use certain scientific and technological means to analyze and mine effective information can its important value be reflected.

In the Internet, finance, consulting, telecommunications, retail, medical, tourism, and other industries, there is an urgent need for data analysis talent who can collect, clean, process, and analyze data, produce business reports, and provide decision-making support.

In short, data analysis has become one of the indispensable workplace skills. As programmers, learning data analysis helps us better understand our company and the market, provide better support for corporate decision-making, and improve our own professional competitiveness.

Good book recommendation – “Quick Data Analysis Using ChatGPT”

Reason for recommendation

  • Use ChatGPT to subvert data analysis and generate data analysis results in 1 minute!
  • More than 30 carefully selected ChatGPT data analysis cases + more than 50 ChatGPT data analysis strategies
  • Covering the entire process from data preprocessing to advanced analysis
  • Help you stand out in a highly competitive environment
  • Let AI take you to the top of data science.

Super practical: through more than 30 practical cases and operating techniques, readers can quickly learn and flexibly apply data analysis and ChatGPT.

Very comprehensive: more than 50 ChatGPT data analysis strategies cover the entire process from data preprocessing to advanced analysis.

Easy to understand: it explains the principles and applications of data analysis and ChatGPT in plain language with zero barrier to entry, so even workplace novices can master it easily.

High return: studying this book and using ChatGPT in every step of the data analysis process can greatly improve work efficiency and reduce labor costs, bringing a higher return on investment to enterprises and individuals.

Introduction

This book is a practical guide on data analysis and ChatGPT application. It is designed to help readers understand the basic knowledge of data analysis and use ChatGPT for efficient data processing and analysis. With the advent of the big data era, data analysis has become a key driving force for the development of modern enterprises and industries. This book was born to meet this market demand.

This book is divided into 8 chapters, covering the basics of data analysis and common statistical methods, then using ChatGPT for data preparation, data cleaning, feature extraction, data visualization, regression analysis and predictive modeling, and classification and clustering analysis, as well as deep learning and big data analysis. Each chapter explains in detail how to use ChatGPT to solve practical problems in data analysis and provides plenty of examples to help readers quickly master the relevant skills.

This book is suitable for data analysts, data scientists, researchers, business managers, students, and readers who are interested in data analysis and artificial intelligence technology. By reading this book, readers will master the core concepts and methods of data analysis, and learn to use ChatGPT to bring higher efficiency and value to data analysis work.

About the author

Zhu Ning holds a master’s degree in imaging and has a rich AI research background and practical experience. He has worked as an AI algorithm engineer at Huawei and as a senior scientist at Microsoft. He is proficient in the theory and practice of machine learning, deep learning, and data analysis, focusing on cutting-edge research in computer vision and natural language processing. He keeps up with the development of ChatGPT technology and has successfully applied it in real data analysis work, with significant results on AI projects.

Purchase link

Dangdang.com link: http://product.dangdang.com/29606385.html
JD.com link: https://item.jd.com/13810483.html

Book donation event

  • We will give away 1 to 4 books this time (the number depends on the article’s view count: the more views, the more books)
  • Activity time: until November 2, 2023
  • How to participate: follow the blogger and leave a like, favorite, and comment

Reprinted from: https://blog.csdn.net/u014727709/article/details/134064892
Likes, comments, and corrections are welcome.