July 24, 2023 18:33:07

How do I extract strings from a CSV file and write them as a list of strings?

Question

I would like to extract strings from certain columns of a CSV file when a condition in another column is met, and then write the extracted strings as a list to a .txt file.

I am new to pandas, so there is probably an obvious solution, but the file generated by the code below turns up empty. If I print my variable "extracted rows" in line 12, I only get this: "Series([], dtype: object)". Any ideas?

import pandas as pd
def process_csv(file_name):
    # Read the CSV file
    df = pd.read_csv(file_name)
    # Assuming the columns are named 'Column5', 'Column4' and 'Column3'
    # Convert 'Column5' to numeric
    df['Column5'] = pd.to_numeric(df['Column5'], errors='coerce')
    # Extract rows where 'Column5' is >= 18
    extracted_rows = df[df['Column5'] >= 18]
    # Create new strings by concatenating 'Column4' and 'Column3' (for my purpose they need to be in reverse order in the generated string)
    combined_strings = extracted_rows['Column4'] + " " + extracted_rows['Column3']

    print(combined_strings)
    # Write the combined strings to a txt file
    with open('file.txt', 'w') as f:
        for item in combined_strings:
            f.write('%s\n' % item)
process_csv('file.csv')
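One plausible reason for the empty result, sketched with made-up sample data (the values below are hypothetical, only the column names follow the question): pd.to_numeric(..., errors='coerce') turns every value that is not a plain number, such as a range like "20-30", into NaN, and NaN >= 18 evaluates to False, so those rows are silently dropped by the filter.

```python
import io

import pandas as pd

# Hypothetical sample data: 'Column5' holds ranges like "20-30" instead of plain numbers
csv_text = """Column3,Column4,Column5
Smith,Anna,20-30
Jones,Ben,25-40
Brown,Cara,12-15
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df["Column5"].dtype)  # object: the column is read as strings

# Values that are not plain numbers are coerced to NaN
numeric = pd.to_numeric(df["Column5"], errors="coerce")
print(numeric.isna().all())  # True

# NaN >= 18 is False, so the boolean mask selects no rows at all
extracted_rows = df[numeric >= 18]
print(extracted_rows.empty)  # True
```

Printing df["Column5"].unique() on the real file is a quick way to check whether this is what is happening there.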

UPDATE: Following a suggestion, I worked with apply and tried to handle cases in which the values in column 5 contain two numbers separated by '-'. But now I only get the rows that actually contained '-'. It drives me a little crazy:

import pandas as pd
def process_csv(file_name):
    # Read the CSV file
    df = pd.read_csv(file_name)
    # Check if strings in column 5 contain '-'
    # If so, split at '-' and take the first part
    # Otherwise, keep the original value
    df.iloc[:, 4] = df.iloc[:, 4].apply(lambda x: str(x).split('-')[0] if len(str(x)) > 3 and '-' in str(x) else x)
    # Convert column 5 to numeric, set invalid parsing as NaN
    df.iloc[:, 4] = pd.to_numeric(df.iloc[:, 4], errors='coerce')
    # Replace NaNs (resulting from invalid parsing) with a negative number;
    # assign the result back, since inplace fillna on an iloc selection may not update df
    df.iloc[:, 4] = df.iloc[:, 4].fillna(-1)
    # Extract rows where column 5 is >= 18
    extracted_rows = df[df.iloc[:, 4] >= 18]
    # Create new strings by concatenating column 4 and column 3
    combined_strings = extracted_rows.iloc[:, 3] + " " + extracted_rows.iloc[:, 2]
    print(combined_strings)
    # Write the combined strings to a txt file
    with open('file.txt', 'w') as f:
        for item in combined_strings:
            f.write("%s\n" % item)
process_csv('file.csv')
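As an alternative to the row-wise apply, the pandas string accessor can take the part before '-' for every value at once, so plain numbers and ranges go through the same path and no length check is needed. This is a sketch under assumptions: the sample data and column names are hypothetical, the positions 2, 3 and 4 follow the question's iloc usage, and process_csv is adapted here to accept a file-like object for testing.

```python
import io

import pandas as pd

# Hypothetical data: the fifth column mixes plain numbers, ranges and text
csv_text = """id,note,last,first,age
1,x,Smith,Anna,20-30
2,x,Jones,Ben,19
3,x,Brown,Cara,12-15
4,x,White,Dan,unknown
"""

def process_csv(file_like, out_path="file.txt"):
    df = pd.read_csv(file_like)
    age = df.iloc[:, 4].astype(str)
    # Keep only the text before the first '-': "20-30" -> "20", "19" -> "19"
    first_part = age.str.split("-").str[0].str.strip()
    # Leftovers such as "unknown" become NaN, which fails the >= 18 test
    numeric = pd.to_numeric(first_part, errors="coerce")
    extracted_rows = df[numeric >= 18]
    # Column 4 (position 3) first, then column 3 (position 2), as in the question
    combined = (extracted_rows.iloc[:, 3].astype(str) + " "
                + extracted_rows.iloc[:, 2].astype(str)).tolist()
    with open(out_path, "w") as f:
        for item in combined:
            f.write(f"{item}\n")
    return combined

result = process_csv(io.StringIO(csv_text))
print(result)  # ['Anna Smith', 'Ben Jones']
```

Because every value is converted with astype(str) before splitting, the same code works whether read_csv parsed the column as strings or as numbers.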

Answer 1

Score: 0

You could use apply. For more information, see the documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html

import pandas as pd
df = pd.DataFrame({'Col1': ['a', 'b', 'c'], 'Col2': ['a', 'b', 'e'], 'Col3': ['e', 'f', 'g']})
def do_something(row):
    # In this function, the first input parameter is the "row"
    # of the DataFrame. You could have more input parameters,
    # but this could be quite complicated.
    # Rows that fail the condition implicitly return None.
    if row['Col1'] == row['Col2']:
        return row['Col1'] + ' ' + row['Col3']

df.apply(do_something, axis=1)

The following is the output:

0     a e
1     b f
2    None
dtype: object
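For a simple condition like this one, a vectorized expression can produce the same strings without calling a Python function for each row. A sketch using Series.where on the same example DataFrame; note that rows failing the condition become NaN here rather than None.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['a', 'b', 'c'], 'Col2': ['a', 'b', 'e'], 'Col3': ['e', 'f', 'g']})

# Build the combined string for every row, then mask rows where Col1 != Col2
combined = (df['Col1'] + ' ' + df['Col3']).where(df['Col1'] == df['Col2'])

# Rows where the condition fails are now missing values
print(combined.tolist())
```

On large files this is usually much faster than apply with axis=1, which calls the function once per row.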

You can, of course, store the output in a new column of your DataFrame:

df.loc[:, 'output'] = df.apply(do_something, axis=1)
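Rows that fail the condition hold None in the new column, so they can be dropped before the strings are written to a text file, which is what the question ultimately needs. A minimal sketch continuing the example; the file name output.txt is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['a', 'b', 'c'], 'Col2': ['a', 'b', 'e'], 'Col3': ['e', 'f', 'g']})

def do_something(row):
    # Combine Col1 and Col3 only when Col1 equals Col2; otherwise return None
    if row['Col1'] == row['Col2']:
        return row['Col1'] + ' ' + row['Col3']
    return None

df.loc[:, 'output'] = df.apply(do_something, axis=1)

# dropna() removes the None entries; tolist() gives a plain list of strings
strings = df['output'].dropna().tolist()

# Write one string per line (hypothetical output file name)
with open('output.txt', 'w') as f:
    f.write('\n'.join(strings) + '\n')
```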

Hope this helps!
