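I'm 斯文咖啡, a blogger at 靠谱客. This article, collected during recent development work, covers MySQL similarity retrieval: computing the cosine similarity of all possible text pairs retrieved from 4 MySQL tables. I found it useful and am sharing it here for reference.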

Overview
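The question behind this article pulls its texts from four MySQL tables, while the example below starts from a Python list that is already filled. One way to bridge that gap is a single `UNION ALL` query that tags each row with its source table, so that list position i can always be traced back to a database row. This is a sketch under stated assumptions, not the original author's code: it uses Python's built-in `sqlite3` as a stand-in for a MySQL connection, and the table names (`articles`, `posts`, `comments`, `notes`), columns (`id`, `title`, `body`) and the `load_documents` helper are hypothetical. With a real MySQL driver you would write `CONCAT(title, ' ', body)` instead of `||`, which MySQL treats as logical OR by default.

```python
import sqlite3

# Hypothetical table names standing in for the four MySQL tables.
TABLES = ["articles", "posts", "comments", "notes"]


def load_documents(conn):
    """Return (keys, texts), where keys[i] = (table, row id) identifies texts[i]."""
    # One UNION ALL query returns every row in a single, stable order.
    # Table names come from the fixed TABLES list, never from user input,
    # so formatting them into the SQL string is safe here.
    query = " UNION ALL ".join(
        f"SELECT '{t}' AS source, id AS row_id, title || ' ' || body AS text FROM {t}"
        for t in TABLES
    ) + " ORDER BY source, row_id"
    rows = conn.execute(query).fetchall()
    keys = [(source, row_id) for source, row_id, _ in rows]
    texts = [text for _, _, text in rows]
    return keys, texts


# In-memory demo database (stand-in for the real MySQL server).
conn = sqlite3.connect(":memory:")
for t in TABLES:
    conn.execute(f"CREATE TABLE {t} (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute("INSERT INTO articles (title, body) VALUES ('Food', 'I like beer and pizza')")
conn.execute("INSERT INTO posts (title, body) VALUES ('Food', 'I love pizza and pasta')")
conn.execute("INSERT INTO comments (title, body) VALUES ('Drinks', 'I prefer wine over beer')")
conn.execute("INSERT INTO notes (title, body) VALUES ('Quote', 'Thou shalt not pass')")

keys, texts = load_documents(conn)
for key, text in zip(keys, texts):
    print(key, text)
```

Keeping `keys` alongside `texts` means that, once similarities are computed, any index i can be mapped back to a `(table, id)` pair in the database.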

Below is a minimal example that computes the pairwise cosine similarity between a set of documents (assuming you have already retrieved the titles and texts from the database).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Assume that's the data we have (4 short documents)
data = [
    'I like beer and pizza',
    'I love pizza and pasta',
    'I prefer wine over beer',
    'Thou shalt not pass',
]

# Vectorise the data
vec = TfidfVectorizer()
X = vec.fit_transform(data)  # `X` is now a TF-IDF representation of the data; row i of `X` corresponds to data[i]

# Calculate the pairwise cosine similarities. The result is a dense n x n matrix,
# so time and memory grow quadratically with the number of documents.
S = cosine_similarity(X)

'''
S looks as follows:

array([[ 1.        ,  0.4078538 ,  0.19297924,  0.        ],
       [ 0.4078538 ,  1.        ,  0.        ,  0.        ],
       [ 0.19297924,  0.        ,  1.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])

The first row of `S` contains the cosine similarities of the first document to every element in `X` (including itself).
For example, the cosine similarity of the first sentence to the third sentence is ~0.193.
The similarity of every sentence/document to itself is 1, so the diagonal of the similarity matrix is all ones.
Since all indices are consistent (row i of `X` and of `S` both refer to data[i]), it is straightforward to map each similarity back to the corresponding pair of sentences.
'''
```
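The last point, mapping similarities back to sentences, can be sketched as follows. This is one possible approach rather than the original author's code: it takes the strict upper triangle of `S` with NumPy's `np.triu_indices(n, k=1)`, which yields every unordered pair of distinct documents exactly once (skipping the all-ones diagonal and the mirrored lower half), then sorts the pairs by similarity. The names `rows`, `cols` and `pairs` are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

data = [
    'I like beer and pizza',
    'I love pizza and pasta',
    'I prefer wine over beer',
    'Thou shalt not pass',
]

X = TfidfVectorizer().fit_transform(data)
S = cosine_similarity(X)

# (i, j) index pairs with i < j: each pair of distinct documents appears once.
rows, cols = np.triu_indices(len(data), k=1)
pairs = sorted(
    ((float(S[i, j]), int(i), int(j)) for i, j in zip(rows, cols)),
    reverse=True,  # most similar pairs first
)

for sim, i, j in pairs:
    print(f"{sim:.3f}  {data[i]!r}  <->  {data[j]!r}")
```

For n documents this produces n(n-1)/2 pairs, so with large collections it may be preferable to keep only pairs above a similarity threshold instead of sorting all of them.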


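The text and images in this post were contributed by users or collected from the web for learning and reference; copyright belongs to the original authors.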