Can Python handle very large data files? Chunking the data in a large file for multiprocessing


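I'm 魁梧金鱼, a blogger on 靠谱客 (kaopuke). This article looks at whether Python can handle very large data files, and specifically at how to chunk the data in a large file for multiprocessing. I'm sharing it here in the hope that it serves as a useful reference.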

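When file_obj is large, list(file_obj) can require a lot of memory, because it reads every line into a list at once. We can reduce the memory requirement by using itertools to pull out chunks of rows only when they are needed.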
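
To make the idea of pulling out rows on demand concrete, here is a minimal sketch that reads a file object in fixed-size blocks with itertools.islice instead of calling list(file_obj). The helper name iter_line_blocks, the block size of 3, and the in-memory io.StringIO stand-in for a large file are all assumptions made for illustration; they are not part of the original script.

```python
import io
import itertools


def iter_line_blocks(file_obj, block_size):
    """Yield lists of at most `block_size` lines, reading lazily."""
    while True:
        # islice pulls only the next `block_size` lines from the file iterator.
        block = list(itertools.islice(file_obj, block_size))
        if not block:
            return
        yield block


# Hypothetical stand-in for a large file: 7 short lines held in memory.
file_obj = io.StringIO("".join(f"line {i}\n" for i in range(7)))

block_lengths = [len(block) for block in iter_line_blocks(file_obj, 3)]
print(block_lengths)  # [3, 3, 1]
```

Only one block is held in memory at a time, so peak usage depends on block_size rather than on the size of the file.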

In particular, we can use

    reader = csv.reader(f)
    chunks = itertools.groupby(reader, keyfunc)

to split the file into processable chunks, and

    groups = [list(chunk) for key, chunk in itertools.islice(chunks, num_chunks)]
    result = pool.map(worker, groups)

to have the multiprocessing pool work on num_chunks chunks at a time.
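
The batching behaviour of groupby combined with islice can be checked on a small in-memory example, without any multiprocessing. This is only a sketch: the sample data, the choice of num_chunks = 2, and the batches list are assumptions made for illustration, and each group is summarised by its row count rather than handed to a worker.

```python
import csv
import io
import itertools

# Hypothetical sample data; rows that share the first column are adjacent.
sample = "a,1\na,2\nb,3\nc,4\nc,5\nc,6\nd,7\n"


def keyfunc(row):
    # Group rows by their first column.
    return row[0]


reader = csv.reader(io.StringIO(sample))
chunks = itertools.groupby(reader, keyfunc)

num_chunks = 2
batches = []
while True:
    # Materialise at most num_chunks groups; each group is fully read
    # into a list before islice advances to the next one.
    groups = [list(chunk) for key, chunk in itertools.islice(chunks, num_chunks)]
    if not groups:
        break
    batches.append([len(group) for group in groups])

print(batches)  # [[2, 1], [3, 1]]
```

Each inner list corresponds to one call that would be passed to pool.map: the first batch holds the groups for "a" and "b", the second holds "c" and "d".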

By doing this, we only need enough memory to hold a few (num_chunks) chunks at a time, rather than the whole file.

import multiprocessing as mp
import itertools
import csv


def worker(chunk):
    # `chunk` is a list of CSV rows that all share the same name column.
    # Replace this with your real computation.
    # print(chunk)
    return len(chunk)


def keyfunc(row):
    # `row` is one row of the CSV file.
    # Replace this with the name column.
    # Note: groupby only groups adjacent rows, so rows that share a key
    # must be contiguous in the file (for example, sorted by this column).
    return row[0]


def main():
    pool = mp.Pool()
    largefile = 'test.dat'
    num_chunks = 10
    results = []
    # newline='' is what the csv module recommends for files given to csv.reader
    with open(largefile, newline='') as f:
        reader = csv.reader(f)
        chunks = itertools.groupby(reader, keyfunc)
        while True:
            # make a list of num_chunks chunks
            groups = [list(chunk) for key, chunk in
                      itertools.islice(chunks, num_chunks)]
            if groups:
                result = pool.map(worker, groups)
                results.extend(result)
            else:
                break
    pool.close()
    pool.join()
    print(results)


if __name__ == '__main__':
    main()
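
One point worth keeping in mind with the script above: itertools.groupby starts a new group every time the key changes, so a key that appears in non-adjacent rows produces several separate chunks. The sketch below shows this on a tiny hypothetical list of rows and then merges the per-chunk counts by key with collections.Counter. The rows data and the merging step are assumptions made for illustration; the original worker returns only a length and does not report the key.

```python
import itertools
from collections import Counter

# Hypothetical rows in which the key "a" is NOT contiguous.
rows = [["a", "1"], ["b", "2"], ["a", "3"]]


def keyfunc(row):
    return row[0]


# groupby yields a separate group for each run of equal keys.
keys = [key for key, _ in itertools.groupby(rows, keyfunc)]
print(keys)  # ['a', 'b', 'a']

# If per-key totals are needed, partial results can be merged afterwards.
totals = Counter()
for key, chunk in itertools.groupby(rows, keyfunc):
    totals[key] += len(list(chunk))

print(dict(totals))  # {'a': 2, 'b': 1}
```

If the file is already sorted or grouped by the key column, each key yields exactly one chunk and no merging is needed.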

Published: 2024-05-01 07:35:01
Article link: https://www.kaopuke.com/article/k-p-k_13_u_7_o_26_fw_13__7_g5.html
