Homework completion status: Natural Language Processing

Overview

Participants:
1. 余艾锶, 2. 程会林, 3. 黄莉婷, 4. 梁清源, 5. 曾伟, 6. 陈南浩

Completion check: blog post (reading notes), answers to the exercises, code, answers to the questions

《Text Mining and Analytics》(12.13)
https://www.coursera.org/learn/text-mining

Week1:

Guiding Questions

Develop your answers to the following guiding questions while watching the video lectures throughout the week.

  1. What does a computer have to do in order to understand a natural language sentence?
  2. What is ambiguity?
  3. Why is natural language processing (NLP) difficult for computers?
  4. What is bag-of-words representation?
  5. Why is this word-based representation more robust than representations derived from syntactic and semantic analysis of text?
  6. What is a paradigmatic relation?
  7. What is a syntagmatic relation?
  8. What is the general idea for discovering paradigmatic relations from text?
  9. What is the general idea for discovering syntagmatic relations from text?
  10. Why do we want to do Term Frequency Transformation when computing similarity of context?
  11. How does BM25 Term Frequency transformation work?
  12. Why do we want to do Inverse Document Frequency (IDF) weighting when computing similarity of context?
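As a study aid for questions 10 to 12, here is a minimal sketch of a BM25-style term frequency transformation and an IDF weight. The function names, the default `k = 1.2`, and the particular IDF form `log((M + 1) / df)` are assumptions chosen for illustration; other variants of both formulas are in common use.

```python
import math


def bm25_tf(count, k=1.2):
    """BM25-style TF transformation: (k + 1) * x / (x + k).

    Grows sublinearly with the raw count and is bounded above by k + 1,
    so one very frequent term cannot dominate a similarity score.
    """
    return (k + 1) * count / (count + k)


def idf(num_docs, doc_freq):
    """One common IDF form (an assumption here): log((M + 1) / df).

    Terms that occur in few documents get a large weight; terms that
    occur in almost every document get a weight close to zero.
    """
    return math.log((num_docs + 1) / doc_freq)


if __name__ == "__main__":
    for x in (0, 1, 2, 10, 100):
        print(x, round(bm25_tf(x), 3))
    print(round(idf(1000, 5), 3), round(idf(1000, 900), 3))
```

The transformed value rises quickly for the first few occurrences and then flattens, which is the behaviour question 10 asks about.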

Not completed:

Completed:

黄莉婷
http://blog.csdn.net/weixin_40962955/article/details/78828721
梁清源
http://blog.csdn.net/qq_33414271/article/details/78802272
http://www.jianshu.com/u/337e85e2a284
曾伟
http://www.jianshu.com/p/9e520d5ccdaa
程会林
http://blog.csdn.net/qq_35159009/article/details/78836340
余艾锶
http://blog.csdn.net/xy773545778/article/details/78829053
陈南浩
http://blog.csdn.net/DranGoo/article/details/78850788

Week2:

Guiding Questions
Develop your answers to the following guiding questions while watching the video lectures throughout the week.

  1. What is entropy? For what kind of random variables does the entropy function reach its minimum and maximum, respectively? 1
  2. What is conditional entropy? 2
  3. What is the relation between conditional entropy H(X|Y) and entropy H(X)? Which is larger? 3
  4. How can conditional entropy be used for discovering syntagmatic relations? 4
  5. What is mutual information I(X;Y)? How is it related to entropy H(X) and conditional entropy H(X|Y)? 5
  6. What’s the minimum value of I(X;Y)? Is it symmetric? 6
  7. For what kind of X and Y, does mutual information I(X;Y) reach its minimum? For a given X, for what Y does I(X;Y) reach its maximum? 1
  8. Why is mutual information sometimes more useful for discovering syntagmatic relations than conditional entropy?
    What is a topic? 2
  9. How can we define the task of topic mining and analysis computationally? What’s the input? What’s the output? 3
  10. How can we heuristically solve the problem of topic mining and analysis by treating a term as a topic? What are the main problems of such an approach? 4
  11. What are the benefits of representing a topic by a word distribution? 5
  12. What is a statistical language model? What is a unigram language model? How can we compute the probability of a sequence of words given a unigram language model? 6
  13. What is Maximum Likelihood estimate of a unigram language model given a text article? 1
  14. What is the basic idea of Bayesian estimation? What is a prior distribution? What is a posterior distribution? How are they related with each other? What is Bayes rule? 2
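The information-theoretic quantities in questions 1 to 7 can be checked numerically. The sketch below is an illustration only: it assumes the joint distribution of two discrete variables is given as a nested list `joint[x][y]`, and the function names are hypothetical.

```python
import math


def entropy(probs):
    """H = -sum p * log2(p), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(X|Y) = H(X, Y) - H(Y), where joint[x][y] = p(X=x, Y=y)."""
    p_y = [sum(col) for col in zip(*joint)]
    h_xy = entropy([p for row in joint for p in row])
    return h_xy - entropy(p_y)


def mutual_information(joint):
    """I(X;Y) = sum over x, y of p(x,y) * log2(p(x,y) / (p(x) * p(y)))."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    total = 0.0
    for x, row in enumerate(joint):
        for y, p_xy in enumerate(row):
            if p_xy > 0:
                total += p_xy * math.log2(p_xy / (p_x[x] * p_y[y]))
    return total


if __name__ == "__main__":
    independent = [[0.25, 0.25], [0.25, 0.25]]
    dependent = [[0.5, 0.0], [0.0, 0.5]]
    print(mutual_information(independent))  # independent X and Y
    print(mutual_information(dependent))    # Y fully determines X
```

With these definitions, I(X;Y) = H(X) - H(X|Y), and H(X|Y) never exceeds H(X); the two example distributions correspond to the minimum and maximum cases raised in question 7.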

Not completed: 陈南浩

Completed:
梁清源
http://blog.csdn.net/qq_33414271/article/details/78871154
程会林
https://www.jianshu.com/p/61614d406b0f
黄莉婷
http://blog.csdn.net/weixin_40962955/article/details/78877103
余艾锶
http://blog.csdn.net/xy773545778/article/details/78848613
曾伟
http://blog.csdn.net/qq_39759159/article/details/78882651

Week3:

Guiding Questions
Develop your answers to the following guiding questions while watching the video lectures throughout the week.

  1. What is a mixture model? In general, how do you compute the probability of observing a particular word from a mixture model? What is the general form of the expression for this probability? 3
  2. What does the maximum likelihood estimate of the component word distributions of a mixture model behave like? In what sense do they “collaborate” and/or “compete”? 4
  3. Why can we use a fixed background word distribution to force a discovered topic word distribution to reduce its probability on the common (often non-content) words? 5
  4. What is the basic idea of the EM algorithm? What does the E-step typically do? What does the M-step typically do? In which of the two steps do we typically apply the Bayes rule? Does EM converge to a global maximum? 6
  5. What is PLSA? How many parameters does a PLSA model have? How is this number affected by the size of our data set to be mined? How can we adjust the standard PLSA to incorporate a prior on a topic word distribution? 1
  6. How is LDA different from PLSA? What is shared by the two models? 2

Not completed:
Completed:
程会林 (open question: why is the normalization in the formulas different?)
https://www.jianshu.com/p/bcef1ad7a530?utm_campaign=haruki&utm_content=note&utm_medium=reader_share&utm_source=qq
曾伟
http://www.cnblogs.com/Negan-ZW/p/8179076.html
梁清源
http://blog.csdn.net/qq_33414271/article/details/78938301
黄莉婷 (the principles of LDA)
http://blog.csdn.net/weixin_40962955/article/details/78941383#t10
陈南浩
http://blog.csdn.net/DranGoo/article/details/78968749
余艾锶
http://blog.csdn.net/xy773545778/article/details/78898000

Week4:

Guiding Questions
Develop your answers to the following guiding questions while watching the video lectures throughout the week.

  1. What is clustering? What are some applications of clustering in text mining and analysis? 3
  2. How can we use a mixture model to do document clustering? How many parameters are there in such a model? 4
  3. How is the mixture model for document clustering related to a topic model such as PLSA? In what way are they similar? Where are they different? 5
  4. How do we determine the cluster for each document after estimating all the parameters of a mixture model? 6
  5. How does hierarchical agglomerative clustering work? How do single-link, complete-link, and average-link work for computing group similarity? Which of these three ways of computing group similarity is least sensitive to outliers in the data? 1
  6. How do we evaluate clustering results? 2
  7. What is text categorization? What are some applications of text categorization? 3
  8. What does the training data for categorization look like?
  9. How does the Naïve Bayes classifier work? 4
  10. Why do we often use logarithm in the scoring function for Naïve Bayes? 5
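Question 10 can be illustrated with a small numerical experiment. The sketch below uses made-up word probabilities and hypothetical function names; it is not the course's reference implementation. It scores the same long document under two categories, once by multiplying probabilities and once by summing their logarithms.

```python
import math


def product_score(words, prior, word_probs):
    """p(category) * product of p(word | category); prone to underflow."""
    score = prior
    for w in words:
        score *= word_probs[w]
    return score


def log_score(words, prior, word_probs):
    """log p(category) + sum of log p(word | category)."""
    return math.log(prior) + sum(math.log(word_probs[w]) for w in words)


if __name__ == "__main__":
    doc = ["game"] * 200              # a long document (toy data)
    sports = {"game": 1e-2}           # assumed p(word | sports)
    politics = {"game": 1e-3}         # assumed p(word | politics)

    # Both products underflow to 0.0, so the categories cannot be ranked.
    print(product_score(doc, 0.5, sports), product_score(doc, 0.5, politics))
    # The log scores stay finite and preserve the ranking.
    print(log_score(doc, 0.5, sports), log_score(doc, 0.5, politics))
```

Because the logarithm is monotonic, ranking categories by log score gives the same decision as ranking by the product would in exact arithmetic.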

Not completed: 陈南浩
Completed:
黄莉婷
https://www.jianshu.com/p/219677177390
程会林
https://www.jianshu.com/p/02ff3ccf98a2
余艾锶
http://blog.csdn.net/xy773545778/article/details/78988705
梁清源
http://blog.csdn.net/qq_33414271/article/details/79032916
Generative classifiers vs. discriminative classifiers
http://blog.csdn.net/qq_33414271/article/details/79092438
曾伟
http://www.cnblogs.com/Negan-ZW/p/8243941.html

Week5:

Not completed: 程会林, 黄莉婷, 梁清源, 曾伟, 陈南浩

Completed:
余艾锶
http://blog.csdn.net/xy773545778/article/details/79093113

Week6:

Not completed: 余艾锶, 程会林, 黄莉婷, 梁清源, 曾伟, 陈南浩
Completed:

《Text Retrieval and Search Engines》(12.13)

https://www.coursera.org/learn/text-retrieval

Week1:

Not completed: 余艾锶, 程会林, 黄莉婷, 梁清源, 曾伟, 陈南浩
Completed:

Week2:

Not completed: 余艾锶, 程会林, 黄莉婷, 梁清源, 曾伟, 陈南浩
Completed:

Week3:

Not completed: 余艾锶, 程会林, 黄莉婷, 梁清源, 曾伟, 陈南浩
Completed:

Week4:

Not completed: 余艾锶, 程会林, 黄莉婷, 梁清源, 曾伟, 陈南浩
Completed:

Week5:

Not completed: 余艾锶, 程会林, 黄莉婷, 梁清源, 曾伟, 陈南浩
Completed:

Week6:

Not completed: 余艾锶, 程会林, 黄莉婷, 梁清源, 曾伟, 陈南浩
Completed: