How to pass a variable to a filter condition on a pandas DataFrame
Question
I have several files (10.230.30.146_480.txt, 10.20.24.16_480.txt, 10.55.30.2_383.txt). I need to use the first part of each file name as variable1 and the second part as variable2.
I used this code to do that:
for txt in na_sheets:
    x = txt.replace('.txt', '')
    y = x.split("_", 1)
    variable1 = y[0]
    variable2 = y[1]
    df1 = df[(df['MSAN_IP'] == 'variable1') & (df['OUTER_VLAN'] == variable2)]
Then I made a for loop to iterate over variable1 and variable2, filter the DataFrame df, and pass these variables to the filter condition.
The output is an empty DataFrame that has only the headers.
Answer 1
Score: 1
If I understand correctly, you can use:
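na_sheets = [
    "10.230.30.146_480.txt",
    "10.20.24.16_480.txt",
    "10.55.30.2_383.txt",
]
# One filtered DataFrame per file name; OUTER_VLAN is compared as an integer
dfs = {
    fn: df.loc[(df["MSAN_IP"] == v1) & (df["OUTER_VLAN"] == int(v2))]
    # removesuffix needs Python 3.9+; rstrip(".txt") strips characters, not the suffix
    for fn in na_sheets
    for v1, v2 in [fn.removesuffix(".txt").split("_", 1)]
}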
NB: This will make a dictionary of DataFrames where the keys are the file names.
Output:
for k, v in dfs.items():
    print(k, v, sep="\n", end="\n\n")
10.230.30.146_480.txt
         MSAN_IP  OUTER_VLAN
2  10.230.30.146         480

10.20.24.16_480.txt
       MSAN_IP  OUTER_VLAN
0  10.20.24.16         480

10.55.30.2_383.txt
      MSAN_IP  OUTER_VLAN
1  10.55.30.2         383
Input used:
import pandas as pd

df = pd.DataFrame({
    "MSAN_IP": ["10.20.24.16", "10.55.30.2", "10.230.30.146"],
    "OUTER_VLAN": [480, 383, 480],
})
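To see why the filter in the question comes back empty, here is a minimal sketch that reuses the sample input above. It assumes, as this answer does, that OUTER_VLAN is stored as integers in the real data, which the question does not confirm. The names quoted, str_vlan and fixed are made up for illustration.

```python
import pandas as pd

# Same sample frame as the "Input used" block above
df = pd.DataFrame({
    "MSAN_IP": ["10.20.24.16", "10.55.30.2", "10.230.30.146"],
    "OUTER_VLAN": [480, 383, 480],
})

# Split one file name the same way the question does
variable1, variable2 = "10.230.30.146_480.txt".replace(".txt", "").split("_", 1)

# Quoted name: compares MSAN_IP against the literal text 'variable1'
quoted = df[(df["MSAN_IP"] == "variable1") & (df["OUTER_VLAN"] == variable2)]

# Unquoted name, but variable2 is still the string '480' while the column holds ints
str_vlan = df[(df["MSAN_IP"] == variable1) & (df["OUTER_VLAN"] == variable2)]

# Unquoted name and a value converted to the column's type (assumed to be int)
fixed = df[(df["MSAN_IP"] == variable1) & (df["OUTER_VLAN"] == int(variable2))]

print(len(quoted), len(str_vlan), len(fixed))
```

If OUTER_VLAN turns out to be stored as text in the real data, the int() conversion should be dropped instead; checking df.dtypes first would settle which case applies.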
Answer 2
Score: 1
You are not getting the result because your for loop overwrites the values on every iteration, so you end up with only the last one. You can do:
na_sheets = (
    "10.230.30.146_480.txt",
    "10.20.24.16_480.txt",
    "10.55.30.2_383.txt",
)
k = [txt.replace('.txt', '').split("_", 1) for txt in na_sheets]
# [['10.230.30.146', '480'], ['10.20.24.16', '480'], ['10.55.30.2', '383']]
variable1 = [x[0] for x in k]
# ['10.230.30.146', '10.20.24.16', '10.55.30.2']
variable2 = [x[1] for x in k]
# ['480', '480', '383']
Now you can use them in a dataframe.
# Note: OUTER_VLAN holds strings here ('480', ...), not integers
df = pd.DataFrame({
    "MSAN_IP": variable1,
    "OUTER_VLAN": variable2,
})
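One possible way to go from that frame of file-name parts to an actual filter is an inner merge against the data. This is a sketch under stated assumptions, not something this answer spells out: the frame called data and its extra non-matching row are hypothetical sample values, keys and matched are illustrative names, and OUTER_VLAN is assumed to be an integer column in the real data.

```python
import pandas as pd

na_sheets = (
    "10.230.30.146_480.txt",
    "10.20.24.16_480.txt",
    "10.55.30.2_383.txt",
)

# Hypothetical data to filter; the last row matches no file name
data = pd.DataFrame({
    "MSAN_IP": ["10.20.24.16", "10.55.30.2", "10.230.30.146", "10.20.24.16"],
    "OUTER_VLAN": [480, 383, 480, 999],
})

# One (IP, VLAN) pair per file name, as in the answer above
pairs = [txt.replace(".txt", "").split("_", 1) for txt in na_sheets]
keys = pd.DataFrame(pairs, columns=["MSAN_IP", "OUTER_VLAN"])

# Convert VLAN text to int so it matches the assumed dtype of data["OUTER_VLAN"]
keys["OUTER_VLAN"] = keys["OUTER_VLAN"].astype(int)

# Keep only the rows of data whose (MSAN_IP, OUTER_VLAN) pair came from a file name
matched = data.merge(keys.drop_duplicates(), on=["MSAN_IP", "OUTER_VLAN"], how="inner")
print(matched)
```

Compared with filtering once per file, this returns a single DataFrame for all file names, which may or may not be what is wanted.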