以前一直在猜测RFM的实现原理,今天总算了解了一点
直接附上R code,google所得:
##Creating Random Sales Data of the format CustomerId (unique to each customer), Sales.Date,Purchase.Value
sales=data.frame(sample(1000:1999,replace=T,size=10000),abs(round(rnorm(10000,28,13))))
names(sales)=c("CustomerId","Sales Value")
sales.dates <- as.Date("2012/1/1") + 700*sort(stats::runif(10000))
#generating random dates
sales=cbind(sales,sales.dates)
str(sales)
sales$recency=round(as.numeric(difftime(Sys.Date(),sales[,3],units="days")) )
##library(gregmisc)
##if you have existing sales data you need to just shape it in this format
rename.vars(sales, from="Sales Value", to="Purchase.Value")#Renaming Variable Names
## Creating Total Sales(Monetization),Frequency, Last Purchase date for each customer
salesM=aggregate(sales[,2],list(sales$CustomerId),sum)
names(salesM)=c("CustomerId","Monetization")
salesF=aggregate(sales[,2],list(sales$CustomerId),length)
names(salesF)=c("CustomerId","Frequency")
salesR=aggregate(sales[,4],list(sales$CustomerId),min)
names(salesR)=c("CustomerId","Recency")
##Merging R,F,M
test1=merge(salesF,salesR,"CustomerId")
salesRFM=merge(salesM,test1,"CustomerId")
##Creating R,F,M levels
salesRFM$rankR=cut(salesRFM$Recency, 100,labels=F) #rankR 1 is very recent while rankR 5 is least recent
salesRFM$rankF=cut(salesRFM$Frequency, 100,labels=F)#rankF 1 is least frequent while rankF 5 is most frequent
salesRFM$rankM=cut(salesRFM$Monetization, 100,labels=F)#rankM 1 is lowest sales while rankM 5 is highest sales
##Looking at RFM tables
table(salesRFM[,5:6])
table(salesRFM[,6:7])
table(salesRFM[,5:7])
Note-you can also use quantile function instead of cut function. This changes cut to equal length instead of equal interval. or see other methods for finding breaks for categories.
最后
以上就是害羞小蜜蜂最近收集整理的关于R 语言 RFM 模型实现的全部内容,更多相关R内容请搜索靠谱客的其他文章。
发表评论 取消回复