Create a condition on a Python DataFrame to get the index of rows where a column matches a certain value

Question

I have read a related discussion and am thinking of applying it in my code to get the index of rows where a column matches a condition. I have the following file, and I want to extract the rows for which the 'Jabatan' column is '-' and the 'Jumlah Lembar Saham' column is not '-'. Here is my code:

input_csv_file = "./CSV/Officers_and_Shareholders.csv"
COLUMNS = ['Nama', 'Jabatan', 'Alamat', 'Klasifikasi Saham', 'Jumlah Lembar Saham', 'Total']
df = pd.read_csv(input_csv_file, skiprows=11, on_bad_lines='skip', names=COLUMNS)
df.fillna('', inplace=True)

NAME = 'Nama'
NUMBER_OF_SHARES = "Jumlah Lembar Saham"
TOTAL = "Total"
POSITION = "Jabatan"

pattern_shareholders = re.compile(r'[A-Z]+\s[]+\s{}[A-Z]+[,]')
shareholders_df = df[(~df['Nama'].str.startswith("NIK:") & df[POSITION] != "-")]
shareholders_df = df[(~df['Nama'].str.startswith("NPWP:") & df[POSITION] != "-")]
shareholders_df = df[(~df['Nama'].str.startswith("TTL:") & df[POSITION] != "-")]
shareholders_df = df[(~df['Nama'].str.startswith("Nomor SK") & df[POSITION] != "-")]
shareholders_df = df[(~df['Nama'].str.startswith("Tanggal SK") & df[POSITION] != "-")]
shareholders_df = df[df[POSITION] == True].index.tolist()
shareholders_list = df[NAME].tolist()
shareholders_string = ' '.join(officers_list)
matches = pattern_shareholders.findall(officers_string)

print(matches)

But the code above returns every name under the 'Nama' column, such as the following:

['ALIF SASETYO,', 'ARIEF HERMAWAN,', 'ARLAN SEPTIA ANANDA RASAM,', 'CHAIRAL TANJUNG,', 'FUAD RIZAL,', 'R AGUS HARYOTO PURNOMO,', 'PT CTCORP INFRASTRUKTUR D INDONESIA,', 'I E S M PT INTRERPORT PATIMBAN AGUNG,', 'PT PATIMBAN MAJU BERSAMA,', 'PT TERMINAL PETIKEMAS SURABAYA,', 'YUKKI NUGRAHAWAN HANAFI,']

So ideally, if the conditions are met, the returned value should only be like the following:

['PT CTCORP INFRASTRUKTUR D INDONESIA,', 'I E S M PT INTRERPORT PATIMBAN AGUNG,', 'PT PATIMBAN MAJU BERSAMA,', 'PT TERMINAL PETIKEMAS SURABAYA,']

The values above are the only ones under the 'Nama' column for which 'Jabatan' is '-' and 'Jumlah Lembar Saham' is not '-'. Is there a way to do this?

Answer 1

Score: 1

Sounds like you need df.loc with a boolean mask:

df.loc[(df['col1'] == value) & (df['col2'] < value)]
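To make that pattern concrete, here is a minimal sketch on a toy frame. The data and the threshold are placeholders invented for illustration, not values from the question. It also shows one way to get the index labels of the matching rows, which is what the title asks for.

```python
import pandas as pd

# Toy data; col1/col2 and the values are placeholders.
df = pd.DataFrame({
    "col1": ["-", "x", "-", "-"],
    "col2": [5, 1, 9, 2],
})

# & binds tighter than == and <, so each comparison
# needs its own pair of parentheses.
mask = (df["col1"] == "-") & (df["col2"] < 6)

rows = df.loc[mask]               # the matching rows
row_index = rows.index.tolist()   # their index labels
print(row_index)
```

Without the parentheses around each comparison, Python would evaluate the `&` first, which is a common reason a combined condition fails to filter as intended.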

So in your case:

print(df.loc[(df['Jabatan'] == '-') & (df['Jumlah Lembar Saham'] != '-')]['Nama'])
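If a plain list of names or row indices is wanted, the same mask can be reused. The sketch below runs on made-up rows that only mimic the column layout described in the question; the job titles and share counts are invented. It also assumes, based on the startswith calls in the question, that detail rows beginning with prefixes such as "NIK:" or "NPWP:" should be excluded.

```python
import pandas as pd

# Made-up rows that only mimic the layout described in the question.
df = pd.DataFrame({
    "Nama": ["ALIF SASETYO", "NPWP: 000", "PT PATIMBAN MAJU BERSAMA",
             "PT TERMINAL PETIKEMAS SURABAYA", "FUAD RIZAL"],
    "Jabatan": ["DIREKTUR", "-", "-", "-", "KOMISARIS"],
    "Jumlah Lembar Saham": ["-", "", "100", "250", "-"],
})

# Assumption: rows whose name starts with one of these prefixes are detail rows.
# str.match anchors the pattern at the start of each string.
is_detail_row = df["Nama"].str.match(r"(?:NIK:|NPWP:|TTL:|Nomor SK|Tanggal SK)")

# Combine all conditions in one mask instead of reassigning the result repeatedly.
mask = (
    (df["Jabatan"] == "-")
    & (df["Jumlah Lembar Saham"] != "-")
    & ~is_detail_row
)

shareholder_names = df.loc[mask, "Nama"].tolist()
shareholder_index = df.loc[mask].index.tolist()
print(shareholder_names)
print(shareholder_index)
```

Building one combined mask avoids the situation in the question, where each new assignment to shareholders_df discards the previous filter.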

Posted by huangapple on 2023-02-16 09:41:27
Original link: https://go.coder-hub.com/75467045.html
