Create a condition on a pandas DataFrame to get the index of rows where a column matches a certain value
Question
I have read the following discussion and am thinking of implementing it in my code to get the index of rows where a column matches a condition. I have the following file, and I want to extract the rows for which the 'Jabatan' column is '-' and the 'Jumlah Lembar Saham' column is not '-'. Here is my code:
input_csv_file = "./CSV/Officers_and_Shareholders.csv"
COLUMNS = ['Nama', 'Jabatan', 'Alamat', 'Klasifikasi Saham', 'Jumlah Lembar Saham', 'Total']
df = pd.read_csv(input_csv_file, skiprows=11, on_bad_lines='skip', names=COLUMNS)
df.fillna('', inplace=True)
NAME = 'Nama'
NUMBER_OF_SHARES = "Jumlah Lembar Saham"
TOTAL = "Total"
POSITION = "Jabatan"
pattern_shareholders = re.compile(r'[A-Z]+\s[]+\s{}[A-Z]+[,]')
shareholders_df = df[(~df['Nama'].str.startswith("NIK:") & df[POSITION] != "-")]
shareholders_df = df[(~df['Nama'].str.startswith("NPWP:") & df[POSITION] != "-")]
shareholders_df = df[(~df['Nama'].str.startswith("TTL:") & df[POSITION] != "-")]
shareholders_df = df[(~df['Nama'].str.startswith("Nomor SK") & df[POSITION] != "-")]
shareholders_df = df[(~df['Nama'].str.startswith("Tanggal SK") & df[POSITION] != "-")]
shareholders_df = df[df[POSITION] == True].index.tolist()
shareholders_list = df[NAME].tolist()
shareholders_string = ' '.join(officers_list)
matches = pattern_shareholders.findall(officers_string)
print(matches)
But the code above returns every name under the 'Nama' column, such as the following:
['ALIF SASETYO,', 'ARIEF HERMAWAN,', 'ARLAN SEPTIA ANANDA RASAM,', 'CHAIRAL TANJUNG,', 'FUAD RIZAL,', 'R AGUS HARYOTO PURNOMO,', 'PT CTCORP INFRASTRUKTUR D INDONESIA,', 'I E S M PT INTRERPORT PATIMBAN AGUNG,', 'PT PATIMBAN MAJU BERSAMA,', 'PT TERMINAL PETIKEMAS SURABAYA,', 'YUKKI NUGRAHAWAN HANAFI,']
Ideally, when the conditions are met, only the following values should be returned:
['PT CTCORP INFRASTRUKTUR D INDONESIA,', 'I E S M PT INTRERPORT PATIMBAN AGUNG,', 'PT PATIMBAN MAJU BERSAMA,', 'PT TERMINAL PETIKEMAS SURABAYA,']
These are the only values under the 'Nama' column for which 'Jabatan' is '-' and 'Jumlah Lembar Saham' is not '-'. Is there a way to do this?
Answer 1
Score: 1
It sounds like you need df.loc with a boolean mask:
df.loc[(df['col1'] == value) & (df['col2'] < value)]
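The parentheses around each comparison matter here, because in Python `&` binds more tightly than `==`, `!=` or `<`. A minimal sketch with made-up data (the column names and values below are placeholders, not taken from the question's CSV):

```python
import pandas as pd

# Hypothetical toy data, not taken from the question's CSV.
df = pd.DataFrame({"col1": [1, 2, 1, 3], "col2": [10, 20, 30, 40]})

# Each comparison is wrapped in parentheses, so both are evaluated
# first and the two boolean Series are then combined element-wise with &.
mask = (df["col1"] == 1) & (df["col2"] < 25)
selected = df.loc[mask]
print(selected)

# Without parentheses, Python parses  a == 1 & b < 25  as the chained
# comparison  a == (1 & b) < 25, which needs a single True/False from a
# whole Series and therefore fails.
try:
    df.loc[df["col1"] == 1 & df["col2"] < 25]
except (ValueError, TypeError) as exc:
    print("unparenthesized version failed:", type(exc).__name__)
```

The same precedence rule likely explains why conditions written as `~a & b != "-"` do not behave as intended: they are grouped as `(~a & b) != "-"`.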
So in your case:
print(df.loc[(df['Jabatan'] == '-') & (df['Jumlah Lembar Saham'] != '-')]['Nama'])
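Since the question also asks for the index of the matching rows, here is a small sketch of how both the index and the names could be pulled from the same mask. The rows below are hypothetical stand-ins for the CSV, with only the three columns involved:

```python
import pandas as pd

# Hypothetical rows standing in for the real CSV; the values are invented.
df = pd.DataFrame(
    {
        "Nama": ["ALIF SASETYO", "PT PATIMBAN MAJU BERSAMA", "NIK: 123", "FUAD RIZAL"],
        "Jabatan": ["DIREKTUR", "-", "-", "KOMISARIS"],
        "Jumlah Lembar Saham": ["-", "1.000", "-", "-"],
    }
)

# Build the boolean mask once and reuse it.
mask = (df["Jabatan"] == "-") & (df["Jumlah Lembar Saham"] != "-")

# Index labels of the matching rows.
matching_index = df[mask].index.tolist()

# Names of the matching rows; .loc[rows, column] selects both in one step.
matching_names = df.loc[mask, "Nama"].tolist()

print(matching_index)
print(matching_names)
```

Passing the column label as the second argument to `.loc` avoids the chained `df.loc[...]['Nama']` lookup, although both forms return the same values when only reading.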