Using a Servlet to Collect User Logs


I'm 羞涩白羊, a blogger on 靠谱客. This article covers using a Servlet to collect user logs; I'm sharing it here in the hope that it serves as a reference.

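A while ago, our lab needed a user-log module to monitor the lab's web projects and record user behavior. My first thought was to do most of this in JavaScript, but my JavaScript skills were too weak, so I ended up implementing it with servlets.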

Project overview

I first found a related project on GitHub, clickstream, so let me start by describing how it is implemented.

How Clickstream is implemented

It first uses a Listener to observe ServletContext and HttpSession events. The code is as follows:


import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.http.HttpSession;
import javax.servlet.http.HttpSessionEvent;
import javax.servlet.http.HttpSessionListener;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class ClickstreamListener implements ServletContextListener, HttpSessionListener {

    private static final Log log = LogFactory.getLog(ClickstreamListener.class);

    /** The servlet context attribute key. */
    public static final String CLICKSTREAMS_ATTRIBUTE_KEY = "clickstreams";

    /**
     * The click stream (individual) attribute key: this is
     * the one inserted into the HttpSession.
     */
    public static final String SESSION_ATTRIBUTE_KEY = "clickstream";

    /** The current clickstreams, keyed by session ID. */
    private Map<String, Clickstream> clickstreams = new ConcurrentHashMap<String, Clickstream>();

    public ClickstreamListener() {
        log.debug("ClickstreamLogger constructed");
    }

    /**
     * Notification that the ServletContext has been initialized.
     *
     * @param sce The context event
     */
    public void contextInitialized(ServletContextEvent sce) {
        log.debug("ServletContext initialised");
        // Store clickstreams in the ServletContext; this approach does not work
        // when the web container is clustered.
        sce.getServletContext().setAttribute(CLICKSTREAMS_ATTRIBUTE_KEY, clickstreams);
    }

    /**
     * Notification that the ServletContext has been destroyed.
     *
     * @param sce The context event
     */
    public void contextDestroyed(ServletContextEvent sce) {
        log.debug("ServletContext destroyed");
        // help gc, but should be already clear except when exception was thrown during sessionDestroyed
        // Persistence should be completed here, before the map is cleared.
        clickstreams.clear();
    }

    /**
     * Notification that a Session has been created.
     *
     * @param hse The session event
     */
    public void sessionCreated(HttpSessionEvent hse) {
        final HttpSession session = hse.getSession();
        if (log.isDebugEnabled()) {
            log.debug("Session " + session.getId() + " was created, adding a new clickstream.");
        }
        Object attrValue = session.getAttribute(SESSION_ATTRIBUTE_KEY);
        if (attrValue != null) {
            log.warn("Session " + session.getId() + " already has an attribute named " +
                    SESSION_ATTRIBUTE_KEY + ": " + attrValue);
        }
        final Clickstream clickstream = new Clickstream();
        // Bind a clickstream to the newly created session.
        session.setAttribute(SESSION_ATTRIBUTE_KEY, clickstream);
        clickstreams.put(session.getId(), clickstream);
    }

    /**
     * Notification that a session has been destroyed. The corresponding
     * clickstream should be persisted inside this method.
     *
     * @param hse The session event
     */
    public void sessionDestroyed(HttpSessionEvent hse) {
        final HttpSession session = hse.getSession();
        // check if the session is not null (expired)
        if (session == null) {
            return;
        }
        if (log.isDebugEnabled()) {
            log.debug("Session " + session.getId() + " was destroyed, logging the clickstream and removing it.");
        }
        final Clickstream stream = clickstreams.get(session.getId());
        if (stream == null) {
            log.warn("Session " + session.getId() + " doesn't have a clickstream.");
            return;
        }
        try {
            if (stream.getSession() != null) {
                ClickstreamLoggerFactory.getLogger().log(stream);
            }
        }
        catch (Exception e) {
            log.error(e.getMessage(), e);
        }
        finally {
            clickstreams.remove(session.getId());
        }
    }
}


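Here the reader should be clear on the difference between a session and a request: one session can span many requests, and those requests can be wrapped into a single Clickstream. That is why the listener uses

private Map<String, Clickstream> clickstreams = new ConcurrentHashMap<String, Clickstream>();

to store the mapping between sessions and Clickstreams. Each time a session is created, a Clickstream is bound to it.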

Clickstream is defined as follows:


import java.io.Serializable;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

public class Clickstream implements Serializable {

    private static final long serialVersionUID = 1;

    /** The stream itself: a list of click events. A List keeps every request of the session in order. */
    private List<ClickstreamRequest> clickstream = new CopyOnWriteArrayList<ClickstreamRequest>();

    /** The attributes. */
    private Map<String, Object> attributes = new HashMap<String, Object>();

    /** The host name. */
    private String hostname;

    /** The original referer URL, if any. */
    private String initialReferrer;

    /** The stream start time. */
    private Date start = new Date();
    // It would be better to take a long timestamp from System.currentTimeMillis():
    // it is cheaper and easy to store in a MySQL BIGINT column.

    /** The time of the last request made on this stream. */
    private Date lastRequest = new Date();

    /** Flag indicating this is a bot surfing the site. */
    private boolean bot = false;

    /**
     * The session itself.
     *
     * Marked as transient so that it does not get serialized when the stream is serialized.
     * See JIRA issue CLK-14 for details.
     */
    private transient HttpSession session;

    /**
     * Adds a new request to the stream of clicks. The HttpServletRequest is converted
     * to a ClickstreamRequest object and added to the clickstream.
     *
     * @param request The servlet request to be added to the clickstream
     */
    public void addRequest(HttpServletRequest request) {
        lastRequest = new Date();
        if (hostname == null) {
            hostname = request.getRemoteHost();
            session = request.getSession();
        }
        // if this is the first request in the click stream
        if (clickstream.isEmpty()) {
            // setup initial referrer
            if (request.getHeader("REFERER") != null) {
                initialReferrer = request.getHeader("REFERER");
            }
            else {
                initialReferrer = "";
            }
            // decide whether this is a bot
            bot = BotChecker.isBot(request);
        }
        clickstream.add(new ClickstreamRequest(request, lastRequest));
    }

    /**
     * Gets an attribute for this clickstream.
     *
     * @param name
     */
    public Object getAttribute(String name) {
        return attributes.get(name);
    }

    /**
     * Gets the attribute names for this clickstream.
     */
    public Set<String> getAttributeNames() {
        return attributes.keySet();
    }

    /**
     * Sets an attribute for this clickstream.
     *
     * @param name
     * @param value
     */
    public void setAttribute(String name, Object value) {
        attributes.put(name, value);
    }

    /**
     * Returns the host name that this clickstream relates to.
     *
     * @return the host name that the user clicked through
     */
    public String getHostname() {
        return hostname;
    }

    /**
     * Returns the bot status.
     *
     * @return true if the client is bot or spider
     */
    public boolean isBot() {
        return bot;
    }

    /**
     * Returns the HttpSession associated with this clickstream.
     *
     * @return the HttpSession associated with this clickstream
     */
    public HttpSession getSession() {
        return session;
    }

    /**
     * The URL of the initial referer. This is useful for determining
     * how the user entered the site.
     *
     * @return the URL of the initial referer
     */
    public String getInitialReferrer() {
        return initialReferrer;
    }

    /**
     * Returns the Date when the clickstream began.
     *
     * @return the Date when the clickstream began
     */
    public Date getStart() {
        return start;
    }

    /**
     * Returns the last Date that the clickstream was modified.
     *
     * @return the last Date that the clickstream was modified
     */
    public Date getLastRequest() {
        return lastRequest;
    }

    /**
     * Returns the actual List of ClickstreamRequest objects.
     *
     * @return the actual List of ClickstreamRequest objects
     */
    public List<ClickstreamRequest> getStream() {
        return clickstream;
    }
}


ClickstreamRequest is a simplified wrapper around HttpServletRequest, defined as follows:


import java.io.Serializable;
import java.util.Date;

import javax.servlet.http.HttpServletRequest;

public class ClickstreamRequest implements Serializable {

    private static final long serialVersionUID = 1;

    private final String protocol;
    private final String serverName;
    private final int serverPort;
    private final String requestURI;
    private final String queryString;
    private final String remoteUser;
    private final long timestamp;

    public ClickstreamRequest(HttpServletRequest request, Date timestamp) {
        protocol = request.getProtocol();
        serverName = request.getServerName();
        serverPort = request.getServerPort();
        requestURI = request.getRequestURI();
        queryString = request.getQueryString();
        remoteUser = request.getRemoteUser();
        this.timestamp = timestamp.getTime();
    }

    public String getProtocol() {
        return protocol;
    }

    public String getServerName() {
        return serverName;
    }

    public int getServerPort() {
        return serverPort;
    }

    public String getRequestURI() {
        return requestURI;
    }

    public String getQueryString() {
        return queryString;
    }

    public String getRemoteUser() {
        return remoteUser;
    }

    public Date getTimestamp() {
        return new Date(timestamp);
    }

    /**
     * Returns a string representation of the HTTP request being tracked.
     * Example: <b>www.opensymphony.com/some/path.jsp?arg1=foo&arg2=bar</b>
     *
     * @return a string representation of the HTTP request being tracked.
     */
    @Override
    public String toString() {
        return serverName + (serverPort != 80 ? ":" + serverPort : "") + requestURI
                + (queryString != null ? "?" + queryString : "");
    }
}


So each time a request arrives, a Filter intercepts it and adds it to the Clickstream:

public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain) throws IOException, ServletException {
    // Ensure that filter is only applied once per request.
    if (req.getAttribute(FILTER_APPLIED) == null) {
        log.debug("Applying clickstream filter to request.");
        req.setAttribute(FILTER_APPLIED, true);
        HttpServletRequest request = (HttpServletRequest) req;
        HttpSession session = request.getSession();
        Clickstream stream = (Clickstream) session.getAttribute(ClickstreamListener.SESSION_ATTRIBUTE_KEY);
        stream.addRequest(request);
    }
    else {
        log.debug("Clickstream filter already applied, ignoring it.");
    }
    // pass the request on
    chain.doFilter(req, res);
}

When the session is destroyed, all that remains is to persist the Clickstream.

Improvements

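The Clickstream project stores clickstreams in the ServletContext, which means only a single web container can be used; otherwise the order of the ClickstreamRequest entries cannot be guaranteed, which makes it hard to scale out.

So in a clustered setup, such as a Tomcat cluster, Redis can be used to store these objects.
Split the Clickstream into three parts:
A Redis List, in which each element is a serialized ClickstreamRequest string;
A Redis Hash, storing ClickstreamRequest parameters, private Map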
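
To make the List and Hash layout described above more concrete, here is a minimal sketch that uses in-memory maps as a stand-in for Redis, so it runs without a Redis server or client library. The class name ClickstreamStore, the key patterns, and the method names are all assumptions made for illustration; a real deployment would replace the map operations with the corresponding Redis commands (RPUSH, LRANGE, HSET, HGET) through whichever client the project uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/**
 * In-memory stand-in for the Redis List and Hash described above.
 * Key patterns and method names are hypothetical.
 */
final class ClickstreamStore {

    // Plays the role of Redis Lists: one ordered list of serialized requests per session.
    private final Map<String, List<String>> lists = new ConcurrentHashMap<>();

    // Plays the role of Redis Hashes: one field/value map per session.
    private final Map<String, Map<String, String>> hashes = new ConcurrentHashMap<>();

    static String requestsKey(String sessionId) {
        return "clickstream:" + sessionId + ":requests";
    }

    static String attributesKey(String sessionId) {
        return "clickstream:" + sessionId + ":attributes";
    }

    /** Corresponds to RPUSH: appending at the tail preserves arrival order. */
    void appendRequest(String sessionId, String serializedRequest) {
        lists.computeIfAbsent(requestsKey(sessionId), k -> new CopyOnWriteArrayList<>())
             .add(serializedRequest);
    }

    /** Corresponds to HSET. */
    void putAttribute(String sessionId, String field, String value) {
        hashes.computeIfAbsent(attributesKey(sessionId), k -> new ConcurrentHashMap<>())
              .put(field, value);
    }

    /** Corresponds to LRANGE key 0 -1: returns a copy of all requests in order. */
    List<String> requests(String sessionId) {
        List<String> stored = lists.get(requestsKey(sessionId));
        return stored == null ? new ArrayList<String>() : new ArrayList<String>(stored);
    }

    /** Corresponds to HGET: returns null when the field is absent. */
    String attribute(String sessionId, String field) {
        Map<String, String> hash = hashes.get(attributesKey(sessionId));
        return hash == null ? null : hash.get(field);
    }
}
```

Because every node in the cluster would append to the same per-session list, the request order is decided by the shared store rather than by any single container.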

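Statistics and analysis

Persistence uses the following two tables (simplified):

session table:
id ip referer is_bot request_count start_time end_time
where
referer: the page the user arrived from (the entry referer);
is_bot: whether the client is a search engine crawler;
request_count: the number of requests in the session. A value of 1 means the user viewed one page and left, so it can be used to compute the bounce rate.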
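
The bounce-rate idea above can be sketched as follows. This is only an illustration: the BounceRate class and its method are hypothetical, and the input is assumed to be the request_count column already loaded from the session table.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: bounce rate = sessions with exactly one request / all sessions. */
final class BounceRate {

    /**
     * @param requestCounts the request_count value of each session row
     * @return the fraction of sessions with exactly one request, or 0.0 if there are no sessions
     */
    static double bounceRate(List<Integer> requestCounts) {
        if (requestCounts.isEmpty()) {
            return 0.0;
        }
        long bounced = requestCounts.stream().filter(c -> c == 1).count();
        return (double) bounced / requestCounts.size();
    }

    public static void main(String[] args) {
        // Four sessions, two of which ended after a single page view.
        System.out.println(bounceRate(Arrays.asList(1, 5, 1, 3))); // prints 0.5
    }
}
```

In practice it probably makes sense to drop rows with is_bot set before computing the rate, since crawlers often request a single page.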

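request table:
id ip referer uri query is_ajax start_time last_time (duration) refresh_count sid
where
refresh_count: the number of times the request was refreshed;

Traffic source analysis:

Use the referer in the session table: if the referer is null, the user reached the site by typing the URL into the browser; if it is a search engine URL, the user arrived through a search, and the keyword can be extracted; otherwise the user arrived through a link on another site.
Aggregate by time to get weekly, monthly, quarterly, and similar statistics.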
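
A rough sketch of the referer classification described above. The RefererClassifier class, the list of search-engine host fragments, and the query parameter names (q, wd) are assumptions for illustration; the host matching is deliberately crude, and many search engines no longer include the keyword in the referer. An empty string is treated like null because Clickstream stores "" when the header is missing.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.util.Arrays;
import java.util.List;

final class RefererClassifier {

    enum Source { DIRECT, SEARCH, LINK }

    // Hypothetical list of search-engine host fragments; extend as needed.
    private static final List<String> SEARCH_HOSTS = Arrays.asList("google.", "bing.", "baidu.");

    /** Classifies a session's referer as direct entry, search engine, or external link. */
    static Source classify(String referer) {
        if (referer == null || referer.isEmpty()) {
            return Source.DIRECT;
        }
        try {
            String host = new URI(referer).getHost();
            if (host != null) {
                for (String fragment : SEARCH_HOSTS) {
                    if (host.contains(fragment)) {
                        return Source.SEARCH;
                    }
                }
            }
        } catch (URISyntaxException e) {
            // A malformed referer is treated as an ordinary link.
        }
        return Source.LINK;
    }

    /** Returns the decoded search keyword from the q or wd parameter, or null if none is found. */
    static String keyword(String referer) {
        if (referer == null) {
            return null;
        }
        try {
            String query = new URI(referer).getRawQuery();
            if (query == null) {
                return null;
            }
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                if (eq <= 0) {
                    continue;
                }
                String name = pair.substring(0, eq);
                if (name.equals("q") || name.equals("wd")) {
                    return URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
                }
            }
        } catch (URISyntaxException | UnsupportedEncodingException e) {
            // Fall through: no keyword can be recovered.
        }
        return null;
    }
}
```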

Visitor analysis:

Visitor totals

Compute daily, weekly, and monthly statistics from the session table.
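
As an illustration of the daily totals, the sketch below groups sessions by calendar day. It assumes start_time is stored as epoch milliseconds (the long timestamp suggested earlier) and uses UTC as the day boundary; both are assumptions, as are the VisitorTotals class and its method name. Weekly and monthly totals would follow the same pattern with a different grouping key.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class VisitorTotals {

    /**
     * Counts sessions per calendar day.
     *
     * @param startTimesMillis the start_time of each session, in epoch milliseconds
     * @return session counts keyed by day (UTC), sorted by date
     */
    static Map<LocalDate, Integer> sessionsPerDay(List<Long> startTimesMillis) {
        Map<LocalDate, Integer> perDay = new TreeMap<>();
        for (long millis : startTimesMillis) {
            LocalDate day = Instant.ofEpochMilli(millis).atZone(ZoneOffset.UTC).toLocalDate();
            perDay.merge(day, 1, Integer::sum);
        }
        return perDay;
    }
}
```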

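Visitor log

Per-IP browsing details, obtained by combining the session and request tables.

Loyalty analysis

Decide whether a user is loyal by whether their IP appears in the session table more than 2 times;
the more often an IP appears, the higher the loyalty.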
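
The loyalty rule above amounts to counting session rows per IP. A minimal sketch, with the Loyalty class and its method names being hypothetical and the input assumed to be the ip column of the session table:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Loyalty {

    /** Counts how many sessions each IP has in the session table. */
    static Map<String, Integer> sessionsPerIp(List<String> sessionIps) {
        Map<String, Integer> counts = new HashMap<>();
        for (String ip : sessionIps) {
            counts.merge(ip, 1, Integer::sum);
        }
        return counts;
    }

    /** An IP counts as loyal when it appears in more than 2 sessions. */
    static boolean isLoyal(Map<String, Integer> counts, String ip) {
        return counts.getOrDefault(ip, 0) > 2;
    }
}
```

Keep in mind that an IP is only an approximation of a user: several people behind one NAT gateway share an address, and one person may appear under several addresses.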

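Geographic distribution

Derived from the IP.

Page analysis

Analyze visited pages, exit pages, entry pages, and so on using the request table. Note:

In the List-typed clickstream, take two adjacent ClickstreamRequest elements A and B:
B's timestamp minus A's timestamp can be treated as the time spent on page A;
if A and B are the same page, the user is refreshing page A, which is used to compute refresh_count;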
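
The two rules above can be sketched on a simplified stand-in for ClickstreamRequest. The PageStats and Click classes are hypothetical and keep only the URI and the timestamp; note that the last request in a session has no successor, so its time on page cannot be derived this way.

```java
import java.util.ArrayList;
import java.util.List;

final class PageStats {

    /** Simplified stand-in for ClickstreamRequest: only the URI and the timestamp. */
    static final class Click {
        final String uri;
        final long timestampMillis;

        Click(String uri, long timestampMillis) {
            this.uri = uri;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Time on page for each click except the last: next timestamp minus this one. */
    static List<Long> dwellTimes(List<Click> clicks) {
        List<Long> result = new ArrayList<>();
        for (int i = 0; i + 1 < clicks.size(); i++) {
            result.add(clicks.get(i + 1).timestampMillis - clicks.get(i).timestampMillis);
        }
        return result;
    }

    /** Number of adjacent pairs that hit the same URI, i.e. page refreshes. */
    static int refreshCount(List<Click> clicks) {
        int refreshes = 0;
        for (int i = 0; i + 1 < clicks.size(); i++) {
            if (clicks.get(i).uri.equals(clicks.get(i + 1).uri)) {
                refreshes++;
            }
        }
        return refreshes;
    }
}
```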

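To get the number of online users, check the number of elements in session_ids.

Other features are omitted here.

The user's browser, operating system, and so on can be obtained from the User-Agent header.
Since no JavaScript is used, details such as the browser's screen resolution are not available.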
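
As a rough illustration of reading the User-Agent, the sketch below does naive substring matching on the header value, which in a servlet would come from request.getHeader("User-Agent"). The UserAgentSniffer class and its rules are assumptions, not a complete parser; real User-Agent strings are messy, and the check order matters (for example, Chrome also advertises Safari, and Android also advertises Linux).

```java
final class UserAgentSniffer {

    /** Very rough browser detection; more specific tokens are checked first. */
    static String browser(String userAgent) {
        if (userAgent == null) return "Unknown";
        if (userAgent.contains("Edg/")) return "Edge";
        if (userAgent.contains("Chrome/")) return "Chrome";
        if (userAgent.contains("Firefox/")) return "Firefox";
        if (userAgent.contains("Safari/")) return "Safari";
        return "Unknown";
    }

    /** Very rough operating system detection; more specific tokens are checked first. */
    static String os(String userAgent) {
        if (userAgent == null) return "Unknown";
        if (userAgent.contains("Windows")) return "Windows";
        if (userAgent.contains("Android")) return "Android";
        if (userAgent.contains("iPhone") || userAgent.contains("iPad")) return "iOS";
        if (userAgent.contains("Mac OS X")) return "macOS";
        if (userAgent.contains("Linux")) return "Linux";
        return "Unknown";
    }
}
```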

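JavaScript approach vs. Servlet approach

The JavaScript approach makes it easier to capture what the user does in the browser; the JavaScript snippet only needs to be added to the shared front-end pages, so coupling is low.
JavaScript usually buffers user actions on the client and submits the logs to the server later; if the user closes the browser, some logs are lost.
A servlet can only analyze requests on the server, so it cannot capture mouse paths, screen resolution, and similar information; the servlet must be deployed together with the project being monitored; it consumes memory; it can report the number of online users promptly.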