Is there a way to use multithreading to write to different columns of the same CSV file?

Question

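My code uses a text file as input and uses `multithreading` to process its lines concurrently. I need to output a CSV file in which every thread fills a different column. I can't find a way to do that. Is there a way?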

My code:

```python
import random
import threading

def generate_values(min, max, amount):
    arr = [None] * amount
    for i in range(amount):
        arr[i] = random.uniform(min, max)
        if thread.is_alive():
            output_csv.write(str(arr[i]))
    return arr

threads = []
input_file = open("max_min.txt", "r")
output_csv = open("uniform_values.csv", "w")
for lines in input_file:
    line = lines.split("\n")
    for fields in line:
        if "\n" in line:
            line.remove("\n")
        for i in fields:
            i = fields.split(",")
        Emin = float(i[0])
        Emax = float(i[1])
        Eamount = int(i[2])
        thread = threading.Thread(target=generate_values, args=(Emin, Emax, Eamount))
        threads.append(thread)

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()
```

The input file is:

```text
2,5,1000
1,7,1000
4,25,1000
```

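The output should have the numbers from each thread in a different column, something like:

```text
3.4 ; 5.6 ; 21.4
4.2 ; 5.8 ; 31.2
...
```

and so on (a semicolon separates each column).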

If not through `multithreading`, how can I make the data inserted into the CSV file go into a different column?



# Answer 1
**Score**: 1

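You can't have threads write to different parts of different lines of a text file. A file is written sequentially, so concurrent writes to the same rows would interleave in an essentially random order, which means writing different columns of data concurrently isn't feasible.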
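A minimal sketch of the usual alternative, assuming made-up sample values borrowed from the expected output in the question: collect each column in full first, then transpose the columns into rows with `zip(*columns)` and write every row in a single call, so no two writers ever share a line. It writes to an in-memory `io.StringIO` buffer instead of a real file so it runs anywhere.

```python
import csv
import io

# Hypothetical sample data: one list per column, as each thread might produce.
columns = [
    [3.4, 4.2],    # values generated from the first input line
    [5.6, 5.8],    # values from the second input line
    [21.4, 31.2],  # values from the third input line
]

# zip(*columns) transposes columns into rows: (3.4, 5.6, 21.4), (4.2, 5.8, 31.2)
rows = list(zip(*columns))

# Each row is written whole, so lines are never assembled piece by piece.
buffer = io.StringIO()
csv.writer(buffer, delimiter=";").writerows(rows)
print(buffer.getvalue())
```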

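Plus, as I said in a comment, I don't think multithreading is going to speed up what you're doing anyway. In CPython, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so threads only actually run in parallel while they're waiting on I/O or running inside extensions written in other languages (such as C) that release the lock. Otherwise they just take turns being executed by the interpreter, so CPU-bound work like generating random numbers doesn't get any faster.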
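Given that, a plain single-threaded loop does the same job, which also answers the "if not through multithreading" part of the question. This is a minimal sketch using hypothetical helper names (`generate_column`, `build_rows`, `write_csv`); it builds every column before writing any rows:

```python
import csv
import random

def generate_column(low, high, amount):
    """Return `amount` uniform random floats between low and high."""
    return [random.uniform(low, high) for _ in range(amount)]

def build_rows(specs):
    """specs: iterable of (low, high, amount) tuples, one per output column."""
    columns = [generate_column(low, high, amount) for low, high, amount in specs]
    # Transpose the columns into rows; zip stops at the shortest column.
    return list(zip(*columns))

def write_csv(path, rows):
    """Write the rows to `path` with ';' between columns."""
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter=";").writerows(rows)
```

For example, `write_csv("uniform_values.csv", build_rows([(2, 5, 1000), (1, 7, 1000), (4, 25, 1000)]))` would produce the three-column file for the question's input. Because `zip` stops at the shortest column, uneven amounts would need `itertools.zip_longest` to keep the extra values.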

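Nevertheless, here's how to do it with threads. To work around the limitation on writing to a file simultaneously, each thread instead puts its column data, along with the column index, into a shared `Queue`. When they've all finished, the data that was put into the `Queue` is written out one row at a time.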
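The index-tagging idea in isolation, as a minimal sketch with hypothetical names (`worker`, `generate_columns`): because threads can finish in any order, each result carries its column index, and the columns are sorted back into input order after all threads have been joined. This version drains the queue with the public `get()` method rather than reaching into the queue's `.queue` attribute, which is an implementation detail.

```python
import queue
import random
import threading

def worker(index, low, high, amount, results):
    """Generate one column and tag it with its column index."""
    column = [random.uniform(low, high) for _ in range(amount)]
    results.put((index, column))  # Queue.put is thread-safe

def generate_columns(specs):
    """specs: list of (low, high, amount); returns columns in input order."""
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(i, low, high, amount, results))
        for i, (low, high, amount) in enumerate(specs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # All threads are done, so exactly len(threads) items are waiting.
    tagged = [results.get() for _ in range(len(threads))]
    tagged.sort(key=lambda item: item[0])  # restore input order
    return [column for _, column in tagged]
```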

The code also uses the `csv` module to handle reading and writing the files, because both the input and output files are in that format.
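A small sketch of just the `csv` parts, with hypothetical helper names (`parse_specs`, `format_rows`): `csv.reader` splits each `min,max,amount` line into fields (a blank line comes back as an empty list, which is skipped here), and `csv.writer` with `delimiter=";"` produces the semicolon-separated output. `lineterminator="\n"` is set only to make the output easy to compare; the default is `"\r\n"`.

```python
import csv
import io

def parse_specs(lines):
    """Parse 'min,max,amount' rows into (float, float, int) tuples."""
    specs = []
    for fields in csv.reader(lines):
        if not fields:  # csv.reader yields [] for a blank line
            continue
        low, high, amount = fields
        specs.append((float(low), float(high), int(amount)))
    return specs

def format_rows(rows):
    """Render rows as ';'-separated CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=";", lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

With a real file you would pass the open file object (opened with `newline=''`, as in the code below) to `parse_specs`.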

```python
import csv
from queue import Queue
from random import uniform
from threading import Thread

def generate_values(index, low, high, amount):
    # Generate one column of values and tag it with its column index.
    col = tuple(uniform(low, high) for _ in range(amount))
    cols.put((index, col))

with open("max_min.txt", "r", newline='') as input_file:
    # Generate the columns of values.
    cols = Queue()  # FIFO
    kinds = float, float, int  # Data types of the input fields.
    threads = []

    for i, fields in enumerate(csv.reader(input_file)):
        # Convert the fields to their proper types and assign them to named variables.
        e_min, e_max, e_amount = (kind(field) for kind, field in zip(kinds, fields))
        # e_amount = 5  # Limit the number of rows for testing.
        thread = Thread(target=generate_values, args=(i, e_min, e_max, e_amount))
        threads.append(thread)
        thread.start()

    # Wait for all the threads to finish.
    for thread in threads:
        thread.join()

    # Store the column data in a dictionary keyed by column index number.
    results = {index: col for index, col in sorted(cols.queue)}

# Write the results to a CSV file, one row at a time.
with open("uniform_values.csv", "w", newline='') as output_csv:
    writer = csv.writer(output_csv, delimiter=';')
    writer.writerows(zip(*results.values()))

print('done')
```



huangapple
  • Published on January 7, 2020 02:30:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59617170.html