RFM Analysis Using Python

As a data analyst, marketer, or project manager, you understand the power of data-driven insights. RFM analysis in Python can be a game-changer in this regard. This guide is designed to give you the knowledge and tools you need to make full use of your customer data, so that you can deliver actionable insights to your organization. Before we dive deeper, let's start with a brief overview of RFM analysis.

Understanding the RFM Analysis

RFM analysis, which stands for Recency, Frequency, and Monetary analysis, is widely used to segment customers based on their transaction behavior. The three metrics are defined below, with examples:

  1. Recency:
    How recently a customer made a purchase. For example, if you buy something on March 1st and today is April 1st, your recency is 31 days.
  2. Frequency:
    How often a customer orders or buys products, that is, the number of orders the customer has placed since their first purchase. For example, if you buy a shirt on January 31st and trousers on February 15th, your frequency is 2.
  3. Monetary:
    How much a customer spends on purchases. Continuing the previous example with 2 purchases: the shirt costs $10 and the trousers cost $25, so your monetary value is $35.
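The three definitions above can be checked with a few lines of plain Python. This is only an illustrative sketch: the year 2024 is an assumption (the examples give no year), and the variable names are made up for this snippet. March 1st to April 1st spans 31 days in any year.

```python
from datetime import date

# Recency: last purchase on March 1st, "today" is April 1st.
# The year is an assumption; the gap is 31 days regardless of the year.
last_purchase = date(2024, 3, 1)
today = date(2024, 4, 1)
recency_days = (today - last_purchase).days

# Frequency and monetary value: a shirt and a pair of trousers.
purchases = [
    {"item": "shirt", "date": date(2024, 1, 31), "price": 10.0},
    {"item": "trousers", "date": date(2024, 2, 15), "price": 25.0},
]
frequency = len(purchases)                      # number of orders
monetary = sum(p["price"] for p in purchases)   # total amount spent

print(recency_days, frequency, monetary)  # 31 2 35.0
```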

By analyzing these three metrics for each customer, RFM analysis helps businesses identify various customer segments, such as high-value customers, loyal customers, at-risk customers, and dormant customers. These segments can be targeted with specific marketing strategies tailored to their unique characteristics and needs. RFM analysis provides actionable insights that enable businesses to optimize their marketing efforts, improve customer retention, and maximize revenue.

RFM analysis aims to divide customers into segments, such as high-value, medium-value, or low-value customers. Let's assume our company is named "javaTpoint", and we'll perform an RFM analysis of our customer data.

Now that we understand the meaning of recency, frequency, and monetary value, we can move on to the analysis itself.

Importing Necessary Libraries

To read the downloaded dataset, use the following lines of code

To look at the head of the DataFrame

Complete Code

Output

[Image: the first rows of the DataFrame]

Explanation

In the above code, we first import the pandas, numpy, and datetime modules. The code then reads data from a CSV file named "RFM_Analysis.csv" into a pandas DataFrame called 'df' and displays its first rows.
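A minimal sketch of the loading step described above. So that it runs without the real file, it reads a small made-up CSV from memory; the rows and customer names are invented, and the column names ('Customer_Name', 'Purchase_Date', 'Sales', 'Quantity') are assumed from the explanations later in this guide.

```python
import io
from datetime import datetime  # imported to mirror the description; unused here

import numpy as np  # imported to mirror the description; unused here
import pandas as pd

# Hypothetical stand-in for "RFM_Analysis.csv".
# With the real file you would write: df = pd.read_csv("RFM_Analysis.csv")
sample_csv = io.StringIO(
    "Customer_Name,Purchase_Date,Sales,Quantity\n"
    "Alice,2024-01-10,10.0,1\n"
    "Alice,2024-03-01,25.0,2\n"
    "Bob,2024-02-15,40.0,1\n"
    "Carol,2023-12-20,5.5,4\n"
)
df = pd.read_csv(sample_csv)

# Look at the first rows of the DataFrame
print(df.head())
```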

Calculating Recency

We calculate recency for customers who made a purchase. The recency is the number of days since their last purchase:

Example

Output

     Customer_Name     Purchase_Date  Recency
0    Aaron Ross        2024-11-05     56
1    Abigail Martin    2023-06-14     566
2    Addison Gomez     2024-08-25     128
3    Alexander Allen   2024-09-13     109
4    Alexander Powell  2024-12-11     20
...  ...               ...            ...
92   Victoria Foster   2023-08-01     518
93   William Clark     2024-07-07     177
94   William Ward      2024-01-10     356
95   Wyatt Gonzales    2023-07-17     533
96   Zoe Evans         2023-11-20     407
97 rows × 3 columns

Explanation

The above code converts the 'Purchase_Date' column in a DataFrame to datetime format, groups the data by 'Customer_Name', finds the maximum purchase date for each customer, calculates recency based on the most recent purchase date, and displays the resulting DataFrame with customer names and their corresponding recency values.
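The recency step described above might look like the following sketch. The DataFrame is a small made-up sample, and the reference date is taken to be the latest purchase date in the data, which is an assumption about how "the most recent purchase date" is used.

```python
import pandas as pd

# Made-up sample data (hypothetical customers)
df = pd.DataFrame({
    "Customer_Name": ["Alice", "Alice", "Bob", "Carol"],
    "Purchase_Date": ["2024-01-10", "2024-03-01", "2024-02-15", "2023-12-20"],
})

# Convert the date column to datetime
df["Purchase_Date"] = pd.to_datetime(df["Purchase_Date"])

# Latest purchase per customer
df_recency = df.groupby("Customer_Name", as_index=False)["Purchase_Date"].max()

# Reference date: the most recent purchase in the whole data set (assumption)
recent_date = df_recency["Purchase_Date"].max()

# Recency = days between the reference date and each customer's last purchase
df_recency["Recency"] = (recent_date - df_recency["Purchase_Date"]).dt.days

print(df_recency)
```

With this choice of reference date, the customer who bought most recently always has a recency of 0.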

Calculating Frequency

Next, calculate the frequency of transactions for each customer:

Example

Output

     Customer_Name     Frequency
0    Aaron Ross        1
1    Abigail Martin    1
2    Addison Gomez     1
3    Alexander Allen   1
4    Alexander Powell  1

Explanation

The above code counts the orders placed by each customer. It removes duplicate rows, groups the data by customer name, counts the number of orders in each group, renames the columns, and displays the resulting DataFrame of customer names and their order frequencies.
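A minimal sketch of the frequency step, again on made-up data. It assumes that one row per distinct ('Customer_Name', 'Purchase_Date') pair counts as one order; the real data set may identify orders differently.

```python
import pandas as pd

# Made-up sample data; the first two rows are exact duplicates
df = pd.DataFrame({
    "Customer_Name": ["Alice", "Alice", "Alice", "Bob", "Carol"],
    "Purchase_Date": ["2024-01-10", "2024-01-10", "2024-03-01",
                      "2024-02-15", "2023-12-20"],
})

# Drop duplicate rows, then count the remaining orders per customer
frequency_df = (
    df.drop_duplicates()
    .groupby("Customer_Name", as_index=False)["Purchase_Date"]
    .count()
)

# Rename the count column
frequency_df.columns = ["Customer_Name", "Frequency"]

print(frequency_df.head())
```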

Calculating Monetary Value

Compute the monetary value by summing up the total sales for each customer:

Example

Output

     Customer_Name     Monetary
0    Aaron Ross        70.25
1    Abigail Martin    5983.25
2    Addison Gomez     2887.62
3    Alexander Allen   4983.11
4    Alexander Powell  2355.69

Explanation

The above code multiplies the 'Sales' column in the DataFrame ('df') by the 'Quantity' column and assigns the result to a new column called 'Total'. Then, it groups the DataFrame by 'Customer_Name', sums the 'Total' amount for each customer, and stores the result in a new DataFrame ('monetary_df'). Finally, it renames the columns to 'Customer_Name' and 'Monetary' and displays the first few rows of the resulting DataFrame.
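The monetary step described above can be sketched as follows. The sample rows are invented; only the column and DataFrame names come from the explanation.

```python
import pandas as pd

# Made-up sample data (hypothetical customers and amounts)
df = pd.DataFrame({
    "Customer_Name": ["Alice", "Alice", "Bob", "Carol"],
    "Sales": [10.0, 25.0, 40.0, 5.5],
    "Quantity": [1, 2, 1, 4],
})

# Amount per row = unit sales value * quantity
df["Total"] = df["Sales"] * df["Quantity"]

# Sum the totals per customer
monetary_df = df.groupby("Customer_Name", as_index=False)["Total"].sum()
monetary_df.columns = ["Customer_Name", "Monetary"]

print(monetary_df.head())
```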

Merging All Columns

Merge the recency, frequency, and monetary columns into a single DataFrame:

Example

Output

     Customer_Name     Recency  Frequency  Monetary
0    Aaron Ross        56       1          70.25
1    Abigail Martin    566      1          5983.25
2    Addison Gomez     128      1          2887.62
3    Alexander Allen   109      1          4983.11
4    Alexander Powell  20       1          2355.69

Explanation

This code merges three DataFrames ('df_recency', 'frequency_df', 'monetary_df') based on the 'Customer_Name' column, resulting in a new DataFrame ('rfm_df'). Then, it drops the 'Purchase_Date' column from the merged DataFrame and displays the first few rows.
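A sketch of the merge, with three tiny hand-built DataFrames standing in for the results of the previous steps (the values are made up):

```python
import pandas as pd

names = ["Alice", "Bob", "Carol"]  # hypothetical customers

df_recency = pd.DataFrame({
    "Customer_Name": names,
    "Purchase_Date": pd.to_datetime(["2024-03-01", "2024-02-15", "2023-12-20"]),
    "Recency": [0, 15, 72],
})
frequency_df = pd.DataFrame({"Customer_Name": names, "Frequency": [2, 1, 1]})
monetary_df = pd.DataFrame({"Customer_Name": names, "Monetary": [60.0, 40.0, 22.0]})

# Join the three tables on the customer name, then drop the date column
rfm_df = (
    df_recency.merge(frequency_df, on="Customer_Name")
    .merge(monetary_df, on="Customer_Name")
    .drop(columns="Purchase_Date")
)

print(rfm_df.head())
```

`merge` performs an inner join by default, so a customer missing from any of the three tables would be dropped from 'rfm_df'.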

Ranking Customers

Rank customers on recency, frequency, and monetary value, then normalize the ranks:

Code

Output

[Image: the first rows of 'rfm_df' with the normalized rank columns]

Explanation

The above code calculates ranks for the Recency, Frequency, and Monetary values in the DataFrame 'rfm_df'. Then, it normalizes these ranks to a scale of 0 to 100. Finally, it drops the original rank columns and displays the first few rows of the modified DataFrame.
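One way to write the ranking and normalization described above is sketched below. The column names 'R_rank', 'F_rank', 'M_rank' and their '_norm' counterparts are assumptions made for this example, as is the choice to normalize by dividing each rank by the largest rank.

```python
import pandas as pd

# Made-up RFM table (hypothetical customers)
rfm_df = pd.DataFrame({
    "Customer_Name": ["Alice", "Bob", "Carol"],
    "Recency": [0, 15, 72],
    "Frequency": [2, 1, 1],
    "Monetary": [60.0, 40.0, 22.0],
})

# A smaller recency is better, so rank it in descending order:
# the most recent buyer receives the highest rank.
rfm_df["R_rank"] = rfm_df["Recency"].rank(ascending=False)
# Larger frequency and monetary values are better (ties share the average rank).
rfm_df["F_rank"] = rfm_df["Frequency"].rank(ascending=True)
rfm_df["M_rank"] = rfm_df["Monetary"].rank(ascending=True)

# Normalize each rank so that the best customer scores 100
for prefix in ["R", "F", "M"]:
    rank = rfm_df[f"{prefix}_rank"]
    rfm_df[f"{prefix}_rank_norm"] = rank / rank.max() * 100

# Drop the raw rank columns
rfm_df = rfm_df.drop(columns=["R_rank", "F_rank", "M_rank"])

print(rfm_df.head())
```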

Calculating RFM Score

Assign a score based on customer behavior for each variable.

  • R Score: Higher score for more recent purchases.
  • F Score: Higher score for frequent orders.
  • M Score: Higher score for greater monetary value.

Choose a Scale

Your scale depends on your customer base size.

  • A 1 - 3 scale is used for fewer than 30,000 customers.
  • A 1 - 4 scale is used for 30,000 - 200,000 customers.
  • A 1 - 5 scale is used for more than 200,000 customers.

Here, we will use a scale of 5. The formula used for calculating the RFM score is: 0.15 * Recency score + 0.28 * Frequency score + 0.57 * Monetary score

Code

Output

     Customer_Name     RFM_Score
0    Aaron Ross        3.54
1    Abigail Martin    3.01
2    Addison Gomez     3.48
3    Alexander Allen   3.49
4    Alexander Powell  3.57
...  ...               ...
92   Victoria Foster   3.06
93   William Clark     3.43
94   William Ward      3.22
95   Wyatt Gonzales    3.05
96   Zoe Evans         3.16
97 rows × 2 columns

Explanation

The above code computes an RFM score for each customer in the DataFrame 'rfm_df' by combining the normalized ranks for Recency, Frequency, and Monetary values with the weights 0.15, 0.28, and 0.57. Because the normalized ranks run up to 100, it then multiplies the weighted sum by 0.05 to bring the score onto a scale of 5, rounds it to two decimal places, and displays the 'Customer_Name' and 'RFM_Score' columns.
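The weighted score can be sketched as follows. The normalized rank columns ('R_rank_norm', 'F_rank_norm', 'M_rank_norm') are hypothetical names, and the values are made up so that the arithmetic is easy to follow.

```python
import pandas as pd

# Made-up normalized ranks on a 0-100 scale (hypothetical column names)
rfm_df = pd.DataFrame({
    "Customer_Name": ["Alice", "Bob", "Carol"],
    "R_rank_norm": [100.0, 60.0, 20.0],
    "F_rank_norm": [100.0, 50.0, 50.0],
    "M_rank_norm": [100.0, 80.0, 40.0],
})

# Weighted sum: 0.15 * R + 0.28 * F + 0.57 * M (weights sum to 1)
weighted = (
    0.15 * rfm_df["R_rank_norm"]
    + 0.28 * rfm_df["F_rank_norm"]
    + 0.57 * rfm_df["M_rank_norm"]
)

# Multiply by 0.05 to map 0-100 onto 0-5, then round to two decimals
rfm_df["RFM_Score"] = (weighted * 0.05).round(2)

print(rfm_df[["Customer_Name", "RFM_Score"]])
```

For the second row, for instance, the weighted sum is 9 + 14 + 45.6 = 68.6, which gives a score of 3.43 after scaling.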

Rating Customers Based on RFM

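Based on the RFM score, assign each customer to a segment:

  • RFM score > 4.5: Champions
  • 4 < RFM score ≤ 4.5: Loyal Customers
  • 3 < RFM score ≤ 4: Potential Loyalists
  • 1.6 < RFM score ≤ 3: Recent Customers
  • RFM score ≤ 1.6: Lost Customers

The output below labels these bands by customer value instead; for example, scores between 3 and 4 appear as "Medium Value Customer".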

Program

Output

     Customer_Name     RFM_Score  Customer_segment
0    Aaron Ross        3.54       Medium Value Customer
1    Abigail Martin    3.01       Medium Value Customer
2    Addison Gomez     3.48       Medium Value Customer
3    Alexander Allen   3.49       Medium Value Customer
4    Alexander Powell  3.57       Medium Value Customer
...  ...               ...        ...
92   Victoria Foster   3.06       Medium Value Customer
93   William Clark     3.43       Medium Value Customer
94   William Ward      3.22       Medium Value Customer
95   Wyatt Gonzales    3.05       Medium Value Customer
96   Zoe Evans         3.16       Medium Value Customer
97 rows × 3 columns

Explanation

The above code assigns each customer to a segment based on their RFM score, from "Top Customers" (highest scores) down to "Lost Customers" (lowest scores). It stores the label in a new 'Customer_segment' column and then displays the customer name, score, and assigned segment.
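A possible way to implement the segmentation is sketched below with `numpy.select`. The segment names follow the threshold list in this section; how a score that falls exactly on a threshold is treated (it goes to the lower band) is an assumption, and the scores are made up.

```python
import numpy as np
import pandas as pd

# Made-up scores (hypothetical customers)
rfm_df = pd.DataFrame({
    "Customer_Name": ["Alice", "Bob", "Carol", "Dave", "Eve"],
    "RFM_Score": [4.8, 4.2, 3.43, 1.99, 1.2],
})

# Conditions are checked in order; the first one that matches wins
conditions = [
    rfm_df["RFM_Score"] > 4.5,
    rfm_df["RFM_Score"] > 4,
    rfm_df["RFM_Score"] > 3,
    rfm_df["RFM_Score"] > 1.6,
]
labels = [
    "Champions",
    "Loyal Customers",
    "Potential Loyalists",
    "Recent Customers",
]

# Anything that matches no condition is a lost customer
rfm_df["Customer_segment"] = np.select(conditions, labels, default="Lost Customers")

print(rfm_df[["Customer_Name", "RFM_Score", "Customer_segment"]])
```

To reproduce labels such as "Medium Value Customer", only the strings in `labels` need to change.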

Visualize Customer Segments

Now, we will display the share of customers in each segment as a pie chart. You can do this using either the Matplotlib or seaborn visualization libraries in Python. Here, we are using Matplotlib.

Code

Output

[Image: pie chart of the customer segments]

Explanation

The above code creates a pie chart using the Matplotlib visualization library, showing the distribution of customers across the different segments. It uses the value_counts() method to count the number of customers in each segment, plots a pie chart with segment labels and the percentage of customers shown inside each slice, and finally displays the chart using plt.show().
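A minimal sketch of the chart described above, using a handful of made-up segment labels. The title text and the percentage format are choices made for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up segment assignments (hypothetical data)
rfm_df = pd.DataFrame({
    "Customer_segment": [
        "Potential Loyalists", "Potential Loyalists", "Potential Loyalists",
        "Recent Customers", "Recent Customers",
        "Champions",
    ]
})

# Number of customers in each segment
segment_counts = rfm_df["Customer_segment"].value_counts()

# One slice per segment, with the percentage printed inside each slice
fig, ax = plt.subplots()
ax.pie(
    segment_counts.to_numpy(),
    labels=segment_counts.index.tolist(),
    autopct="%1.1f%%",
)
ax.set_title("Customers by segment")
```

In an interactive session, call `plt.show()` afterwards to display the figure; in a script that runs without a display, `fig.savefig(...)` is the usual alternative.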

Conclusion

In conclusion, RFM analysis conducted using Python offers businesses a powerful tool for understanding customer behavior and optimizing marketing strategies. By leveraging Python's data analysis and visualization capabilities, organizations can extract valuable insights from customer data, segment their customer base effectively, and implement targeted approaches to enhance customer engagement and drive business growth.