February 16, 2023 09:41:27


Create a condition on a Python DataFrame to get the index of rows where a column matches a certain value

Question


I have read the following discussion and am thinking of implementing it in my code to get the index of rows where a column matches a condition. I have the following file, and I want to extract the rows in which the 'Jabatan' column is '-' and the 'Jumlah Lembar Saham' column is not '-'. Here is my code:

import re
import pandas as pd

# Read the officers/shareholders table, skipping the first 11 lines of the file
input_csv_file = "./CSV/Officers_and_Shareholders.csv"
COLUMNS = ['Nama', 'Jabatan', 'Alamat', 'Klasifikasi Saham', 'Jumlah Lembar Saham', 'Total']
df = pd.read_csv(input_csv_file, skiprows=11, on_bad_lines='skip', names=COLUMNS)
df.fillna('', inplace=True)
NAME = 'Nama'
NUMBER_OF_SHARES = "Jumlah Lembar Saham"
TOTAL = "Total"
POSITION = "Jabatan"
pattern_shareholders = re.compile(r'[A-Z]+\s[]+\s{}[A-Z]+[,]')
# Attempted filters: skip detail rows (NIK, NPWP, TTL, SK number/date)
shareholders_df = df[(~df['Nama'].str.startswith("NIK:") & df[POSITION] != "-")]
shareholders_df = df[(~df['Nama'].str.startswith("NPWP:") & df[POSITION] != "-")]
shareholders_df = df[(~df['Nama'].str.startswith("TTL:") & df[POSITION] != "-")]
shareholders_df = df[(~df['Nama'].str.startswith("Nomor SK") & df[POSITION] != "-")]
shareholders_df = df[(~df['Nama'].str.startswith("Tanggal SK") & df[POSITION] != "-")]
shareholders_df = df[df[POSITION] == True].index.tolist()
# Collect the names and pull out the NAME, tokens with the regex
shareholders_list = df[NAME].tolist()
shareholders_string = ' '.join(shareholders_list)
matches = pattern_shareholders.findall(shareholders_string)
print(matches)

But the code above returns every name in the 'Nama' column:

['ALIF SASETYO,', 'ARIEF HERMAWAN,', 'ARLAN SEPTIA ANANDA RASAM,', 'CHAIRAL TANJUNG,', 'FUAD RIZAL,', 'R AGUS HARYOTO PURNOMO,', 'PT CTCORP INFRASTRUKTUR D INDONESIA,', 'I E S M PT INTRERPORT PATIMBAN AGUNG,', 'PT PATIMBAN MAJU BERSAMA,', 'PT TERMINAL PETIKEMAS SURABAYA,', 'YUKKI NUGRAHAWAN HANAFI,']

Ideally, with the conditions applied, it should return only the following:

['PT CTCORP INFRASTRUKTUR D INDONESIA,', 'I E S M PT INTRERPORT PATIMBAN AGUNG,', 'PT PATIMBAN MAJU BERSAMA,', 'PT TERMINAL PETIKEMAS SURABAYA,']

because these are the only 'Nama' values for which 'Jabatan' is '-' and 'Jumlah Lembar Saham' is not '-'. Is there a way to do this?
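A side note on why the code in the question returns every name (a sketch, not part of the original answer): shareholders_list is built from df[NAME], the unfiltered frame, and each shareholders_df = df[...] line starts again from df, so none of the filters carries through. The filters are also parsed differently than intended, because Python's & binds more tightly than !=, so ~a & b != "-" means (~a & b) != "-". The snippet checks that parse with the standard ast module and then shows the parenthesised form on a tiny frame whose rows (including the DIREKTUR value) are invented for illustration.

```python
import ast

import pandas as pd

# How Python parses the unparenthesised filter from the question:
# unary ~ binds first, then &, and only then the != comparison.
tree = ast.parse('~a & b != "-"', mode="eval").body
print(type(tree).__name__)       # Compare: != is the outermost operation
print(type(tree.left).__name__)  # BinOp: (~a & b) is evaluated first

# Hypothetical toy rows using the question's column names
df = pd.DataFrame({
    "Nama": ["NIK: 0000", "ALIF SASETYO", "PT PATIMBAN MAJU BERSAMA"],
    "Jabatan": ["-", "DIREKTUR", "-"],
})
POSITION = "Jabatan"

# Parenthesising the comparison gives the intended element-wise AND
mask = ~df["Nama"].str.startswith("NIK:") & (df[POSITION] != "-")
print(mask.tolist())  # [False, True, False]
```

The rule of thumb that follows from this: when combining pandas masks with &, | or ~, put every comparison in its own parentheses.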

Answer 1

Score: 1


Sounds like you need df.loc:

df.loc[(df['col1'] == value) & (df['col2'] < value)]
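To make the general pattern concrete, here is a minimal sketch on a made-up frame (col1, col2 and the numbers are placeholders, not data from the question). Each comparison sits in its own parentheses, & and | combine masks element-wise, ~ negates a mask, and df.index[mask] returns the labels of the matching rows, which is what the question title asks for.

```python
import pandas as pd

# Hypothetical toy frame; "col1" and "col2" mirror the placeholder names above
df = pd.DataFrame({"col1": [1, 2, 1, 3], "col2": [10, 5, 3, 8]})

# Wrap every comparison in parentheses: & and | bind tighter than == and <
both = (df["col1"] == 1) & (df["col2"] < 5)
either = (df["col1"] == 1) | (df["col2"] < 5)

print(df.loc[both])                # matching rows as a DataFrame
print(df.index[both].tolist())     # [2]
print(df.index[either].tolist())   # [0, 2]
print(df.index[~either].tolist())  # [1, 3]
```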

So in your case:

print(df.loc[(df['Jabatan'] == '-') & (df['Jumlah Lembar Saham'] != '-')]['Nama'])
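Applied end to end, a sketch might look like the following. It assumes, as the question's code suggests, that detail rows whose 'Nama' starts with 'NIK:', 'NPWP:', 'TTL:', 'Nomor SK' or 'Tanggal SK' should be dropped as well. The frame is an invented stand-in for the CSV, and every value in it (names, Jabatan labels, share counts) is hypothetical. Folding the prefixes into one regular expression with str.match replaces the five separate assignments, and df.loc[mask, NAME] selects the column in one step instead of chaining.

```python
import re

import pandas as pd

NAME = "Nama"
POSITION = "Jabatan"
NUMBER_OF_SHARES = "Jumlah Lembar Saham"

# Invented stand-in for the CSV in the question (all values hypothetical)
df = pd.DataFrame({
    NAME: ["ALIF SASETYO", "NIK: 0000", "PT PATIMBAN MAJU BERSAMA",
           "PT TERMINAL PETIKEMAS SURABAYA", "YUKKI NUGRAHAWAN HANAFI"],
    POSITION: ["DIREKTUR", "-", "-", "-", "KOMISARIS"],
    NUMBER_OF_SHARES: ["-", "", "1.000", "2.000", "-"],  # "" = blank cell after fillna('')
})

# Detail rows the question tries to skip; str.match anchors at the start of each string
prefixes = ("NIK:", "NPWP:", "TTL:", "Nomor SK", "Tanggal SK")
is_detail = df[NAME].str.match("|".join(re.escape(p) for p in prefixes))

mask = (
    ~is_detail
    & (df[POSITION] == "-")
    & (df[NUMBER_OF_SHARES] != "-")
)

shareholder_index = df.index[mask].tolist()  # row labels, as in the question title
shareholders = df.loc[mask, NAME].tolist()   # the matching names
print(shareholder_index)  # [2, 3]
print(shareholders)       # ['PT PATIMBAN MAJU BERSAMA', 'PT TERMINAL PETIKEMAS SURABAYA']
```

Note that fillna('') turns empty cells into '', which is not equal to '-', so a blank 'Jumlah Lembar Saham' cell passes the != '-' test; in this toy frame the NIK row is excluded only by the prefix check.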
