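Overview
A while ago, our lab needed a user-logging module to monitor the lab's web projects and collect users' behavior logs. My first instinct was to build it mainly in JavaScript, but my JavaScript skills were frankly too weak, so in the end I implemented it with Servlets.
Project introduction
I first found a related project on GitHub, clickstream, so let me begin by describing how that project works.
How Clickstream is implemented
It first registers a Listener that listens to ServletContext and HttpSession events; the code is as follows: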
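import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.http.HttpSession;
import javax.servlet.http.HttpSessionEvent;
import javax.servlet.http.HttpSessionListener;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
public class ClickstreamListener implements ServletContextListener, HttpSessionListener {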
private static final Log log = LogFactory.getLog(ClickstreamListener.class);
/** The servlet context attribute key. */
public static final String CLICKSTREAMS_ATTRIBUTE_KEY = "clickstreams";
/**
* The click stream (individual) attribute key: this is
* the one inserted into the HttpSession.
*/
public static final String SESSION_ATTRIBUTE_KEY = "clickstream";
/** The current clickstreams, keyed by session ID. */
private Map<String, Clickstream> clickstreams = new ConcurrentHashMap<String, Clickstream>();
public ClickstreamListener() {
log.debug("ClickstreamLogger constructed");
}
/**
* Notification that the ServletContext has been initialized.
*
* @param sce The context event
*/
public void contextInitialized(ServletContextEvent sce) {
log.debug("ServletContext initialised");
// Store clickstreams in the ServletContext; this does not work when the web container is clustered
sce.getServletContext().setAttribute(CLICKSTREAMS_ATTRIBUTE_KEY, clickstreams);
}
/**
* Notification that the ServletContext has been destroyed.
*
* @param sce The context event
*/
public void contextDestroyed(ServletContextEvent sce) {
log.debug("ServletContext destroyed");
// help gc, but should be already clear except when exception was thrown during sessionDestroyed
clickstreams.clear(); // persistence should be done here
}
/**
* Notification that a Session has been created.
*
* @param hse The session event
*/
public void sessionCreated(HttpSessionEvent hse) {
final HttpSession session = hse.getSession();
if (log.isDebugEnabled()) {
log.debug("Session " + session.getId() + " was created, adding a new clickstream.");
}
Object attrValue = session.getAttribute(SESSION_ATTRIBUTE_KEY);
if (attrValue != null) {
log.warn("Session " + session.getId() + " already has an attribute named " +
SESSION_ATTRIBUTE_KEY + ": " + attrValue);
}
final Clickstream clickstream = new Clickstream();
// bind a Clickstream to the newly created session
session.setAttribute(SESSION_ATTRIBUTE_KEY, clickstream);
clickstreams.put(session.getId(), clickstream);
}
/**
* Notification that a session has been destroyed. This is where the corresponding clickstream should be persisted.
*
* @param hse The session event
*/
public void sessionDestroyed(HttpSessionEvent hse) {
final HttpSession session = hse.getSession();
// check if the session is not null (expired)
if (session == null) {
return;
}
if (log.isDebugEnabled()) {
log.debug("Session " + session.getId() + " was destroyed, logging the clickstream and removing it.");
}
final Clickstream stream = clickstreams.get(session.getId());
if (stream == null) {
log.warn("Session " + session.getId() + " doesn't have a clickstream.");
return;
}
try {
if (stream.getSession() != null) {
ClickstreamLoggerFactory.getLogger().log(stream);
}
}
catch (Exception e) {
log.error(e.getMessage(), e);
}
finally {
clickstreams.remove(session.getId());
}
}
}
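Here the reader should understand the difference between a session and a request: one session can span multiple requests, and all the requests of one session are collected into a single Clickstream. That is why the listener uses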
private Map<String, Clickstream> clickstreams = new ConcurrentHashMap<String, Clickstream>();
to store the mapping between session IDs and Clickstreams, and binds a new Clickstream to every session when it is created.
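Clickstream is defined as follows:
import java.io.Serializable;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;
public class Clickstream implements Serializable {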
private static final long serialVersionUID = 1;
/** The stream itself: a list of click events. */
private List<ClickstreamRequest> clickstream = new CopyOnWriteArrayList<ClickstreamRequest>(); // a List keeps all requests of the session in order
/** The attributes. */
private Map<String, Object> attributes = new HashMap<String, Object>();
/** The host name. */
private String hostname;
/** The original referer URL, if any. */
private String initialReferrer;
/** The stream start time. */
private Date start = new Date();
// It would arguably be better to take a long timestamp from System.currentTimeMillis() directly: it is cheaper and easier to store in a MySQL BIGINT column.
/** The time of the last request made on this stream. */
private Date lastRequest = new Date();
/** Flag indicating this is a bot surfing the site. */
private boolean bot = false;
/**
* The session itself.
*
* Marked as transient so that it does not get serialized when the stream is serialized.
* See JIRA issue CLK-14 for details.
*/
private transient HttpSession session;
/**
* Adds a new request to the stream of clicks. The HttpServletRequest is converted
* to a ClickstreamRequest object and added to the clickstream.
*
* @param request The servlet request to be added to the clickstream
*/
public void addRequest(HttpServletRequest request) {
lastRequest = new Date();
if (hostname == null) {
hostname = request.getRemoteHost();
session = request.getSession();
}
// if this is the first request in the click stream
if (clickstream.isEmpty()) {
// setup initial referrer
if (request.getHeader("REFERER") != null) {
initialReferrer = request.getHeader("REFERER");
}
else {
initialReferrer = "";
}
// decide whether this is a bot
bot = BotChecker.isBot(request);
}
clickstream.add(new ClickstreamRequest(request, lastRequest));
}
/**
* Gets an attribute for this clickstream.
*
* @param name
*/
public Object getAttribute(String name) {
return attributes.get(name);
}
/**
* Gets the attribute names for this clickstream.
*/
public Set<String> getAttributeNames() {
return attributes.keySet();
}
/**
* Sets an attribute for this clickstream.
*
* @param name
* @param value
*/
public void setAttribute(String name, Object value) {
attributes.put(name, value);
}
/**
* Returns the host name that this clickstream relates to.
*
* @return the host name that the user clicked through
*/
public String getHostname() {
return hostname;
}
/**
* Returns the bot status.
*
* @return true if the client is bot or spider
*/
public boolean isBot() {
return bot;
}
/**
* Returns the HttpSession associated with this clickstream.
*
* @return the HttpSession associated with this clickstream
*/
public HttpSession getSession() {
return session;
}
/**
* The URL of the initial referer. This is useful for determining
* how the user entered the site.
*
* @return the URL of the initial referer
*/
public String getInitialReferrer() {
return initialReferrer;
}
/**
* Returns the Date when the clickstream began.
*
* @return the Date when the clickstream began
*/
public Date getStart() {
return start;
}
/**
* Returns the last Date that the clickstream was modified.
*
* @return the last Date that the clickstream was modified
*/
public Date getLastRequest() {
return lastRequest;
}
/**
* Returns the actual List of ClickstreamRequest objects.
*
* @return the actual List of ClickstreamRequest objects
*/
public List<ClickstreamRequest> getStream() {
return clickstream;
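}
}
ClickstreamRequest is a simplified wrapper around HttpServletRequest, defined as follows:
import java.io.Serializable;
import java.util.Date;
import javax.servlet.http.HttpServletRequest;
public class ClickstreamRequest implements Serializable {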
private static final long serialVersionUID = 1;
private final String protocol;
private final String serverName;
private final int serverPort;
private final String requestURI;
private final String queryString;
private final String remoteUser;
private final long timestamp;
public ClickstreamRequest(HttpServletRequest request, Date timestamp) {
protocol = request.getProtocol();
serverName = request.getServerName();
serverPort = request.getServerPort();
requestURI = request.getRequestURI();
queryString = request.getQueryString();
remoteUser = request.getRemoteUser();
this.timestamp = timestamp.getTime();
}
public String getProtocol() {
return protocol;
}
public String getServerName() {
return serverName;
}
public int getServerPort() {
return serverPort;
}
public String getRequestURI() {
return requestURI;
}
public String getQueryString() {
return queryString;
}
public String getRemoteUser() {
return remoteUser;
}
public Date getTimestamp() {
return new Date(timestamp);
}
/**
* Returns a string representation of the HTTP request being tracked.
* Example: <b>www.opensymphony.com/some/path.jsp?arg1=foo&arg2=bar</b>
*
* @return a string representation of the HTTP request being tracked.
*/
@Override
public String toString() {
return serverName + (serverPort != 80 ? ":" + serverPort : "") + requestURI
+ (queryString != null ? "?" + queryString : "");
}
}
So whenever a request arrives, a Filter intercepts it and adds it to the session's Clickstream:
public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain) throws IOException, ServletException {
// Ensure that filter is only applied once per request.
if (req.getAttribute(FILTER_APPLIED) == null) {
log.debug("Applying clickstream filter to request.");
req.setAttribute(FILTER_APPLIED, true);
HttpServletRequest request = (HttpServletRequest)req;
HttpSession session = request.getSession();
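Clickstream stream = (Clickstream) session.getAttribute(ClickstreamListener.SESSION_ATTRIBUTE_KEY);
// guard against a missing clickstream, e.g. if ClickstreamListener is not registered
if (stream != null) {
stream.addRequest(request);
}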
}
else {
log.debug("Clickstream filter already applied, ignoring it.");
}
// pass the request on
chain.doFilter(req, res);
}
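When the session is destroyed, all that remains is to persist the Clickstream, which sessionDestroyed above does by passing it to ClickstreamLoggerFactory.getLogger().log(stream).
Improvements
The Clickstream project stores clickstreams in the ServletContext, which means only a single web container can be used:
otherwise the requests of one session may be spread across several containers and the order of its ClickstreamRequests cannot be guaranteed, so the design does not scale out well.
Therefore, in a clustered setup such as a Tomcat cluster, Redis can be used to store the relevant objects.
Split the Clickstream into three parts:
a Redis List, where each element is a serialized ClickstreamRequest string;
a Redis Hash, storing the ClickstreamRequest parameters, private Map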
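A minimal sketch of how that split might look, assuming the third structure is a Set of session IDs (the article later counts online users via session_ids). ClickstreamStore, InMemoryClickstreamStore and the "timestamp|uri|query" format are hypothetical; the in-memory class only imitates the Redis structures so the sketch runs without a server, and the comments name the Redis commands a real implementation would probably issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction over the Redis layout; each comment names the Redis
// command a real implementation would probably use for that method.
interface ClickstreamStore {
    void appendRequest(String sessionId, String serializedRequest); // RPUSH on a per-session list key
    List<String> requests(String sessionId);                        // LRANGE key 0 -1
    void setAttribute(String sessionId, String name, String value); // HSET on a per-session hash key
    int onlineCount();                                              // SCARD on the session-id set
    void endSession(String sessionId);                              // SREM from the set, DEL the keys
}

// In-memory stand-in so the idea runs without a Redis server; not cluster-safe by itself.
final class InMemoryClickstreamStore implements ClickstreamStore {
    private final Map<String, List<String>> lists = new ConcurrentHashMap<>();
    private final Map<String, Map<String, String>> hashes = new ConcurrentHashMap<>();
    private final Set<String> sessionIds = ConcurrentHashMap.newKeySet();

    // Hypothetical wire format for one ClickstreamRequest: "timestamp|uri|query".
    static String serialize(long timestamp, String uri, String query) {
        return timestamp + "|" + uri + "|" + (query == null ? "" : query);
    }

    @Override
    public void appendRequest(String sessionId, String serializedRequest) {
        sessionIds.add(sessionId);
        lists.computeIfAbsent(sessionId, k -> Collections.synchronizedList(new ArrayList<>()))
             .add(serializedRequest); // appended at the tail, so arrival order is preserved
    }

    @Override
    public List<String> requests(String sessionId) {
        List<String> list = lists.get(sessionId);
        if (list == null) {
            return Collections.emptyList();
        }
        synchronized (list) { // copy under the list's own lock, as synchronizedList requires
            return new ArrayList<>(list);
        }
    }

    @Override
    public void setAttribute(String sessionId, String name, String value) {
        hashes.computeIfAbsent(sessionId, k -> new ConcurrentHashMap<>()).put(name, value);
    }

    @Override
    public int onlineCount() {
        return sessionIds.size();
    }

    @Override
    public void endSession(String sessionId) {
        // Call this only after the stream has been persisted (e.g. from sessionDestroyed).
        sessionIds.remove(sessionId);
        lists.remove(sessionId);
        hashes.remove(sessionId);
    }
}
```

Because every node in the cluster would append to the same per-session Redis list, a session's requests are ordered by when they reach Redis, rather than depending on which Tomcat instance served them.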
Statistics and analysis
Persistence uses the following two tables (simplified):
session table:
id ip referer is_bot request_count start_time end_time
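where:
referer: the entry point, i.e. the initial Referer of the session (where the user came from);
is_bot: whether the client is a search-engine crawler;
request_count: the number of requests in the session; a value of 1 means the user viewed a single page and left, which can be used to compute the bounce rate.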
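A sketch of the bounce-rate calculation, assuming the session rows have already been loaded into memory; SessionRow is a hypothetical stand-in for a row of the session table that models only request_count.

```java
import java.util.List;

final class BounceRate {
    // Hypothetical stand-in for one row of the session table; only request_count is modeled.
    static final class SessionRow {
        final int requestCount;

        SessionRow(int requestCount) {
            this.requestCount = requestCount;
        }
    }

    // Bounce rate = sessions with exactly one request / all sessions (0 when there are none).
    static double compute(List<SessionRow> sessions) {
        if (sessions.isEmpty()) {
            return 0.0;
        }
        long bounces = sessions.stream().filter(s -> s.requestCount == 1).count();
        return (double) bounces / sessions.size();
    }
}
```

In MySQL the same figure could be computed directly with SELECT AVG(request_count = 1) FROM `session`, since a comparison evaluates to 0 or 1 there; filtering on is_bot = 0 first is probably sensible so that crawlers do not skew the rate.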
request table:
id ip referer uri query is_ajax start_time last_time(duration) refresh_count sid
where:
refresh_count: the number of times the request was refreshed;
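Traffic source analysis:
Use the referer in the session table: if it is null (or empty, since addRequest stores "" when there is no Referer header), the user typed the URL directly or used a bookmark; if it is a search-engine URL, the user arrived via a search and the keywords can be extracted; otherwise,
the user arrived via a link on another site;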
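The referer rules above might be sketched as follows. The table of search engines and their keyword parameters (q for Google and Bing, wd for Baidu) is a hypothetical, incomplete example, and keywords are assumed to be UTF-8 encoded. Note also that modern browsers usually send only the origin as the Referer on cross-site navigation, in which case no keyword can be recovered.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class TrafficSource {
    enum Kind { DIRECT, SEARCH, LINK }

    // Hypothetical, incomplete table: host fragment -> query parameter carrying the keywords.
    private static final Map<String, String> SEARCH_ENGINES = new LinkedHashMap<>();
    static {
        SEARCH_ENGINES.put("google.", "q");
        SEARCH_ENGINES.put("bing.com", "q");
        SEARCH_ENGINES.put("baidu.com", "wd");
    }

    // null or "" (what addRequest stores when there is no Referer) counts as direct access.
    static Kind classify(String referer) {
        if (referer == null || referer.isEmpty()) {
            return Kind.DIRECT;
        }
        return keywordParam(referer) != null ? Kind.SEARCH : Kind.LINK;
    }

    // Decoded search keywords, or null if the referer is not a recognised search URL.
    static String keywords(String referer) {
        String param = keywordParam(referer);
        if (param == null) {
            return null;
        }
        String rawQuery = parse(referer).getRawQuery(); // non-null URI: keywordParam parsed it
        if (rawQuery == null) {
            return null;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(param)) {
                try {
                    return URLDecoder.decode(pair.substring(eq + 1), "UTF-8"); // assumes UTF-8
                } catch (UnsupportedEncodingException e) {
                    throw new IllegalStateException(e); // UTF-8 is always supported
                }
            }
        }
        return null;
    }

    private static URI parse(String referer) {
        try {
            return new URI(referer);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Keyword parameter name if the referer host matches a known search engine, else null.
    private static String keywordParam(String referer) {
        if (referer == null || referer.isEmpty()) {
            return null;
        }
        URI uri = parse(referer);
        String host = uri == null ? null : uri.getHost();
        if (host == null) {
            return null;
        }
        for (Map.Entry<String, String> e : SEARCH_ENGINES.entrySet()) {
            if (host.contains(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }
}
```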
Grouping these by time gives weekly, monthly, quarterly and similar statistics;
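Grouping by week, month or quarter could be done with java.time, as in this sketch. It assumes start_time is stored as epoch milliseconds (the BIGINT option suggested in the Clickstream comment) and uses ISO-8601 weeks; both are choices made for illustration, not something the article specifies.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.temporal.IsoFields;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

final class PeriodStats {
    // ISO-8601 week, e.g. "2024-W01"; uses the week-based year, not the calendar year.
    static String week(LocalDate d) {
        return String.format(Locale.ROOT, "%d-W%02d",
                d.get(IsoFields.WEEK_BASED_YEAR), d.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR));
    }

    static String month(LocalDate d) {
        return String.format(Locale.ROOT, "%d-%02d", d.getYear(), d.getMonthValue());
    }

    static String quarter(LocalDate d) {
        return d.getYear() + "-Q" + d.get(IsoFields.QUARTER_OF_YEAR);
    }

    // Counts sessions per period, given start_time values as epoch milliseconds.
    static Map<String, Integer> countBy(List<Long> startTimes, ZoneId zone,
                                        Function<LocalDate, String> period) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by period key
        for (long millis : startTimes) {
            LocalDate day = Instant.ofEpochMilli(millis).atZone(zone).toLocalDate();
            counts.merge(period.apply(day), 1, Integer::sum);
        }
        return counts;
    }
}
```

With ISO week numbering, 30 December 2024 falls in 2025-W01, so weekly reports around New Year will differ from a naive calendar-year week.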
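Visitor analysis:
Visitor overview
Compute daily, weekly and monthly figures from the session table;
Visitor log
Browsing details for each IP, obtained by combining the session and request tables;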
Loyalty analysis
A user is considered loyal if their IP appears in the session table more than twice;
the more often an IP appears, the higher the loyalty;
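A sketch of the loyalty rule, assuming the ip column of the session table has been loaded as a list; the class and method names are hypothetical. The SQL equivalent would be roughly SELECT ip, COUNT(*) AS visits FROM `session` GROUP BY ip HAVING COUNT(*) > 2 ORDER BY visits DESC.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

final class Loyalty {
    // Number of sessions per IP, given the ip column of the session table.
    static Map<String, Long> sessionsPerIp(List<String> sessionIps) {
        return sessionIps.stream().collect(
                Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // IPs seen in more than `threshold` sessions, most frequent first (the article uses 2).
    static List<String> loyalIps(List<String> sessionIps, int threshold) {
        return sessionsPerIp(sessionIps).entrySet().stream()
                .filter(e -> e.getValue() > threshold)
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Since several users can share one IP (NAT, proxies) and one user's IP can change, this is only an approximation of loyalty.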
Geographic distribution
Derived from the IP address.
Page analysis
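Visited pages, exit pages, entry pages and so on can be analyzed using the request table alone. Note that
in the List-typed clickstream, for two adjacent ClickstreamRequest elements A and B,
the time of B minus the time of A can be taken as the dwell time on page A;
if A and B are the same page, the user refreshed page A, which is used to compute refresh_count;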
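The adjacent-pair rule can be sketched like this. Click and PageStats are hypothetical minimal views of a ClickstreamRequest and of per-page results; the sketch treats two requests as the "same page" when their URIs match, ignoring the query string, and whether is_ajax requests should be skipped first is left open.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PageAnalysis {
    // Hypothetical minimal view of a ClickstreamRequest: when it happened and which page.
    static final class Click {
        final long timestamp;
        final String uri;

        Click(long timestamp, String uri) {
            this.timestamp = timestamp;
            this.uri = uri;
        }
    }

    static final class PageStats {
        long dwellMillis; // summed (time of B - time of A) for each A on this page with a successor
        int views;
        int refreshes;    // times this page was immediately followed by itself
    }

    // Walks one session's ordered clickstream, comparing each request with the next one.
    static Map<String, PageStats> analyze(List<Click> stream) {
        Map<String, PageStats> stats = new LinkedHashMap<>();
        for (int i = 0; i < stream.size(); i++) {
            Click a = stream.get(i);
            PageStats s = stats.computeIfAbsent(a.uri, k -> new PageStats());
            s.views++;
            if (i + 1 < stream.size()) {
                Click b = stream.get(i + 1);
                s.dwellMillis += b.timestamp - a.timestamp; // time of B minus time of A
                if (b.uri.equals(a.uri)) {
                    s.refreshes++;                         // same page twice in a row
                }
            }
            // The last request has no successor, so its dwell time stays unknown.
        }
        return stats;
    }
}
```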
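To get the number of online users, simply count the elements in session_ids;
Other features are omitted here...
The user's browser, operating system, etc. can be obtained from the User-Agent header.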
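A deliberately naive sketch of User-Agent parsing with substring checks; the class name and the returned labels are hypothetical. Order matters because UA strings embed each other's tokens: Edge also contains "Chrome/", Chrome also contains "Safari/", Android also contains "Linux", and iOS also contains "Mac OS X". Real-world UA strings are messy enough that a maintained parser library is likely a better choice in production.

```java
final class UserAgents {
    // Rough browser detection; more specific tokens are checked first.
    static String browser(String ua) {
        if (ua == null) return "Unknown";
        if (ua.contains("Edg/")) return "Edge";
        if (ua.contains("OPR/")) return "Opera";
        if (ua.contains("Firefox/")) return "Firefox";
        if (ua.contains("Chrome/")) return "Chrome";
        if (ua.contains("Safari/")) return "Safari";
        if (ua.contains("MSIE ") || ua.contains("Trident/")) return "Internet Explorer";
        return "Other";
    }

    // Rough OS detection; Android before Linux and iPhone/iPad before Mac OS X.
    static String os(String ua) {
        if (ua == null) return "Unknown";
        if (ua.contains("Windows NT")) return "Windows";
        if (ua.contains("Android")) return "Android";
        if (ua.contains("iPhone") || ua.contains("iPad")) return "iOS";
        if (ua.contains("Mac OS X")) return "macOS";
        if (ua.contains("Linux")) return "Linux";
        return "Other";
    }
}
```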
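Since no JavaScript is used, information such as screen resolution cannot be obtained.
Related projects
Google Analytics
Uses JavaScript; reportedly, to protect user privacy, it does not expose an individual user's browsing sequence.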
Piwik
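Uses JavaScript on the front end and PHP on the back end.
JavaScript approach vs. Servlet approach
The JavaScript approach captures user behavior in the browser more easily, and the JS snippet only needs to be added to the shared front-end pages, so coupling is low;
JavaScript trackers often buffer user actions on the client before submitting the logs to the server, so if the user closes the browser, part of the log may be lost;
The Servlet approach can only analyze requests on the server side and cannot capture mouse paths, screen resolution and similar information; it must be deployed together with the project being monitored and consumes server memory, but it can report the number of online users in real time;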
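Questions
1. Some may ask: what happens if the browser is closed?
Tomcat keeps the mapping from session IDs to session objects in a map,
and a background daemon thread periodically checks whether sessions have expired. Closing the browser does not destroy the session immediately,
but once the session times out, the Listener's sessionDestroyed is still triggered.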
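The mechanism described above can be illustrated with a toy model; this is not Tomcat's actual code, and SessionReaper, touch and sweep are hypothetical names. A map holds each session's last access time, a periodic sweep on a daemon thread removes sessions idle longer than the timeout, and a callback stands in for HttpSessionListener.sessionDestroyed. In a real deployment the timeout comes from the session-timeout setting, which Tomcat's default configuration sets to 30 minutes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Toy model of container session expiry; not Tomcat's actual implementation.
final class SessionReaper {
    private final Map<String, Long> lastAccess = new ConcurrentHashMap<>(); // sessionId -> last access
    private final long timeoutMillis;
    private final Consumer<String> onDestroyed; // stands in for HttpSessionListener.sessionDestroyed

    SessionReaper(long timeoutMillis, Consumer<String> onDestroyed) {
        this.timeoutMillis = timeoutMillis;
        this.onDestroyed = onDestroyed;
    }

    // Called on every request for the session.
    void touch(String sessionId, long nowMillis) {
        lastAccess.put(sessionId, nowMillis);
    }

    // One sweep: removes every session idle for longer than the timeout and fires the callback.
    List<String> sweep(long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastAccess.entrySet()) {
            long last = e.getValue();
            // remove(key, value) only succeeds if no request touched the session meanwhile
            if (nowMillis - last > timeoutMillis && lastAccess.remove(e.getKey(), last)) {
                expired.add(e.getKey());
            }
        }
        expired.forEach(onDestroyed);
        return expired;
    }

    // Runs sweep() periodically on a daemon thread, loosely like a container's background thread.
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "session-reaper");
            t.setDaemon(true);
            return t;
        });
        exec.scheduleAtFixedRate(() -> sweep(System.currentTimeMillis()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

So a closed browser simply stops calling touch(); the next sweep after the timeout is what eventually fires the destroy callback.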
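Finally
That is everything 羞涩白羊 has collected and organized on using Servlets to obtain user logs; hopefully this article helps with the development problems you run into when doing the same.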