
Overview

 

Dice's coefficient (also known as the Dice coefficient) is a similarity measure related to the Jaccard index.

For sets X and Y of keywords used in information retrieval, the coefficient may be defined as:[1]

s = \frac{2 |X \cap Y|}{|X| + |Y|}
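
As a minimal sketch of the set definition above, the following Python function computes the coefficient for two keyword sets. The function name `dice_coefficient`, the sample keyword sets, and the choice to return 1.0 when both sets are empty are assumptions made for illustration; the source does not define the empty case.

```python
def dice_coefficient(x: set, y: set) -> float:
    """Return 2|X ∩ Y| / (|X| + |Y|) for two sets."""
    total = len(x) + len(y)
    if total == 0:
        # Assumption: two empty sets are treated as identical.
        # The formula itself is undefined here (division by zero).
        return 1.0
    return 2 * len(x & y) / total


# Hypothetical keyword sets, chosen only to exercise the function.
query_terms = {"dice", "similarity", "measure"}
doc_terms = {"jaccard", "similarity", "measure"}
print(dice_coefficient(query_terms, doc_terms))
```

The two sets share two of their three keywords each, so the result is (2 * 2) / (3 + 3), or two thirds.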

When taken as a string similarity measure, the coefficient may be calculated for two strings, x and y, using bigrams as follows:[2]

s = \frac{2 n_{t}}{n_{x} + n_{y}}

where n_{t} is the number of character bigrams found in both strings, n_{x} is the number of bigrams in string x, and n_{y} is the number of bigrams in string y. For example, to calculate the similarity between:

night
nacht

We would find the set of bigrams in each word:

{ni, ig, gh, ht}
{na, ac, ch, ht}

Each set has 4 elements, and the intersection of these two sets has only one element: ht.

Plugging these values into the formula gives s = (2 * 1) / (4 + 4) = 0.25.
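
The worked example can be reproduced with a short Python sketch. It follows the example in treating each word as a set of distinct character bigrams. The helper names `bigrams` and `bigram_dice`, and the handling of strings too short to contain any bigram, are assumptions for illustration and not part of the cited definition.

```python
def bigrams(s: str) -> set:
    """Return the set of distinct adjacent character pairs in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bigram_dice(x: str, y: str) -> float:
    """Return 2 * n_t / (n_x + n_y) over the bigram sets of x and y."""
    bx, by = bigrams(x), bigrams(y)
    total = len(bx) + len(by)
    if total == 0:
        # Assumption: when neither string has a bigram, fall back to
        # exact string equality instead of dividing by zero.
        return 1.0 if x == y else 0.0
    return 2 * len(bx & by) / total


print(sorted(bigrams("night")))       # ['gh', 'ht', 'ig', 'ni']
print(bigram_dice("night", "nacht"))  # 0.25
```

Because this sketch uses sets, a bigram that repeats within a string is counted once. Counting repeated bigrams separately (a multiset) would give different values for strings such as "aaaa".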

See also

  • Jaccard index
  • Levenshtein distance
  • Sørensen similarity index

Notes

  1. ^ C. J. van Rijsbergen (1979)
  2. ^ Kondrak, G. et al. (2003)

References

  • C. J. van Rijsbergen (1979) Information Retrieval (London: Butterworths)
  • Kondrak, G., Marcu, D. and Knight, K. (2003) "Cognates Can Improve Statistical Translation Models" in Proceedings of HLT-NAACL 2003: Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, pp. 46-48
Retrieved from " http://en.wikipedia.org/wiki/Dice%27s_coefficient"
