January 7, 2020 02:30:54

Is there a way to use multithreading to write to different columns of the same csv file?

Question

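My code takes a text file as input and uses multithreading to process its lines at the same time. I need to output a CSV file in which each thread fills a different column. I can't find a way to do that. Is there one?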

My code:

def generate_values(min, max, amount):
    arr = [None] * amount
    for i in range(amount):
        arr[i] = random.uniform(min, max)
        if thread.is_alive():
            output_csv.write(str(arr[i]))
    return arr

input_file = open("max_min.txt", "r")
output_csv = open("uniform_values.csv", "w")
for lines in input_file:
    line = lines.split("\n")
    for fields in line:
        if "\n" in line:
            line.remove("\n")
        for i in fields:
            i = fields.split(",")
        Emin = float(i[0])
        Emax = float(i[1])
        Eamount = int(i[2])
        thread = threading.Thread(target=generate_values, args=(Emin, Emax, Eamount))
        threads.append(thread)

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

The input file is:

2,5,1000
1,7,1000
4,25,1000

The output should have each thread's numbers in a separate column, something like:

3.4 ; 5.6 ; 21.4
4.2 ; 5.8 ; 31.2
.
.
.
etc. (The semicolon separates the columns.)

If not through multithreading, how can I make the data written to the CSV file go into a different column?



# Answer 1
**Score**: 1

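You can't have several threads write to different parts of each line of a text file at the same time. A CSV file is stored line by line, so a single column is spread across every row, and filling in one column in place would require random access into the middle of each line. Writing different columns concurrently therefore isn't feasible.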
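To illustrate the point, here is a minimal sketch (not part of the original answer) of the non-threaded route: build every column in memory first, then transpose the columns into rows with `zip` and write each row in one go. The helper name `columns_to_csv` and the shortened `specs` list are hypothetical, and the sketch assumes all columns have the same length and fit in memory.

```python
import csv
import io
from random import Random


def columns_to_csv(columns, delimiter=";"):
    """Transpose equal-length columns into rows and return them as CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=delimiter)
    # zip(*columns) yields one tuple per output row: (col0[i], col1[i], ...).
    writer.writerows(zip(*columns))
    return buffer.getvalue()


# Hypothetical stand-in for the lines of max_min.txt, cut down to 2 values each.
specs = [(2, 5, 2), (1, 7, 2), (4, 25, 2)]
rng = Random(0)  # Seeded only so the sketch is repeatable.
columns = [[rng.uniform(low, high) for _ in range(amount)]
           for low, high, amount in specs]
print(columns_to_csv(columns))
```

Writing to a real file instead of a string only needs the `StringIO` buffer swapped for `open("uniform_values.csv", "w", newline="")`.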

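Besides, as I said in a comment, I don't think multithreading will speed up what you're doing. In standard CPython, threads only run truly in parallel while they are doing I/O or calling into extensions written in other languages. Otherwise the global interpreter lock makes them take turns executing Python code.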

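Nevertheless, here's how to do it with threads. To work around the restriction on writing to the file simultaneously, each thread puts its column data, along with the column index, into a shared `Queue`. Once all the threads have finished, the data in the `Queue` is written out one row at a time.

The code also uses the `csv` module to read and write the files, because both the input and output files are in that format.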

```python
import csv
from queue import Queue
from random import uniform
from threading import Thread


def generate_values(index, low, high, amount, cols):
    """Generate one column of random values and queue it with its column index."""
    col = tuple(uniform(low, high) for _ in range(amount))
    cols.put((index, col))


cols = Queue()  # FIFO, safe to share between threads.
kinds = float, float, int  # Data types of the input fields.
threads = []

with open("max_min.txt", "r", newline='') as input_file:
    for i, fields in enumerate(csv.reader(input_file)):
        # Convert the fields to their proper types and assign them to named variables.
        e_min, e_max, e_amount = (kind(field) for kind, field in zip(kinds, fields))
        # e_amount = 5  # Limit the number of rows for testing.
        thread = Thread(target=generate_values, args=(i, e_min, e_max, e_amount, cols))
        threads.append(thread)
        thread.start()

# Wait for all the threads to finish.
for thread in threads:
    thread.join()

# Each thread queued exactly one item; drain the queue and order the columns by index.
columns = [col for _, col in sorted(cols.get() for _ in threads)]

# Write the results to a CSV file, one row at a time.
with open("uniform_values.csv", "w", newline='') as output_csv:
    writer = csv.writer(output_csv, delimiter=';')
    # zip() stops at the shortest column, so the columns should be of equal length.
    writer.writerows(zip(*columns))

print('done')
```
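As a variation on the code above, and not something the original answer proposes, the bookkeeping with column indexes can be avoided by using `concurrent.futures.ThreadPoolExecutor`, whose `map` method returns results in the order of its inputs regardless of which thread finishes first. This is only a sketch: `generate_column` and the hard-coded `specs` list are hypothetical stand-ins for the parsed lines of `max_min.txt`.

```python
from concurrent.futures import ThreadPoolExecutor
from random import uniform


def generate_column(spec):
    """Return one column of random floats for a (low, high, amount) spec."""
    low, high, amount = spec
    return [uniform(low, high) for _ in range(amount)]


# Hypothetical stand-in for the parsed lines of max_min.txt.
specs = [(2.0, 5.0, 1000), (1.0, 7.0, 1000), (4.0, 25.0, 1000)]

with ThreadPoolExecutor() as executor:
    # map() yields results in the order of `specs`, so column i matches line i.
    columns = list(executor.map(generate_column, specs))

rows = list(zip(*columns))  # One tuple per CSV row, ready for csv.writer.writerows().
print(len(rows), len(rows[0]))
```

The same caveat applies as before: for pure-Python number generation like this, the threads are unlikely to be faster than a plain loop.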

