How to limit rows in pandas dataframe?

Question

How can I limit the number of rows in a pandas dataframe in Python? I need to keep only the last 1000 rows and delete the rest.
For example, a dataframe trimmed to 1000 rows should produce a CSV with 1000 rows.

I tried df.iloc[:1000]

I need to trim the dataframe automatically and save the last 1000 rows.

Answer 1

Score: 2

If you want the first 1000 records, you can use:

df = df.head(1000)

Answer 2

Score: 0

Are you trying to limit the number of rows when importing a csv, or when exporting a dataframe to a new csv file?

Import only the first 1000 rows of a CSV:

df_limited = pd.read_csv(file, nrows=1000)

Get first 1000 rows of a dataframe (for export):

df_limited = df.head(1000)

Get last 1000 rows of a dataframe (for export):

df_limited = df.tail(1000)
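
The three options above can be compared side by side. This is a minimal sketch that assumes a small in-memory CSV built with io.StringIO in place of a real file; the column name "value" and the 5000-row size are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical data: an in-memory CSV with one column and 5000 data rows.
csv_text = "value\n" + "\n".join(str(i) for i in range(5000))

# Limit rows while importing: only the first 1000 data rows are parsed.
df_first_on_import = pd.read_csv(io.StringIO(csv_text), nrows=1000)

# Limit rows after loading the whole file.
df = pd.read_csv(io.StringIO(csv_text))
df_first = df.head(1000)  # first 1000 rows
df_last = df.tail(1000)   # last 1000 rows

print(len(df_first_on_import), len(df_first), len(df_last))
print(df_last["value"].iloc[0], df_last["value"].iloc[-1])
```

Note that nrows only limits rows from the top of the file, so to keep the last 1000 rows this sketch loads everything and then calls tail.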

Edit 1
Since you are exporting to a CSV:
You can select a range with [n:m], where n is the start of the selection and m is the end. Positions are counted from 0, the start is included, and the end is excluded.
It works like this:
If the number is positive, it counts from the top of the list, the beginning of the string, the top of the dataframe, etc.
If the number is negative, it counts from the back.

  • [5:] skips the first 5 elements and selects everything from position 5 to the end (as there is no end point given)
  • [3:8] selects positions 3 through 7, i.e. the 4th through the 8th element
  • [5:-2] selects everything from position 5 up to, but not including, the last 2 elements
  • [-1000:] starts 1000 elements from the back and runs to the last element (this is what you wanted, I think)
  • [:1000] selects the first 1000 rows (the start is the beginning, as no number is given; the end is 1000 elements from the front)
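
To make the slicing rules concrete, here is a small sketch using df.iloc on a 10-row dataframe. The dataframe and its "value" column are made up for illustration, and a window of 3 stands in for 1000 so the results are easy to read.

```python
import pandas as pd

# Hypothetical 10-row dataframe; the values equal the row positions 0..9.
df = pd.DataFrame({"value": range(10)})

from_5 = df.iloc[5:]     # positions 5..9 (no end point given)
middle = df.iloc[3:8]    # positions 3..7 (the end point 8 is excluded)
trimmed = df.iloc[5:-2]  # positions 5..7 (stops before the last 2 rows)
last_3 = df.iloc[-3:]    # the last 3 rows, positions 7..9
first_3 = df.iloc[:3]    # the first 3 rows, positions 0..2

# Asking for more rows than exist does not raise; it returns the whole frame.
everything = df.iloc[-1000:]

for name, part in [("[5:]", from_5), ("[3:8]", middle), ("[5:-2]", trimmed),
                   ("[-3:]", last_3), ("[:3]", first_3)]:
    print(name, part["value"].tolist())
```

The last case matters for an auto-trimming step: df.iloc[-1000:] is safe to call even while the dataframe still has fewer than 1000 rows.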

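Edit 2
After a quick check (and a very simple benchmark), df.tail(1000) looked faster than df.iloc[-1000:] on my machine. Both return the same rows, and tail() is itself built on positional slicing, so treat this as anecdotal and measure on your own data.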
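
If you want to repeat the comparison yourself, a rough timing sketch with the standard timeit module could look like the following. The dataframe size and repetition count are arbitrary assumptions, and the timings will vary by machine and pandas version, so no expected numbers are shown.

```python
import timeit

import pandas as pd

# Hypothetical test data: one million rows in a single column.
df = pd.DataFrame({"value": range(1_000_000)})

# Time each call 1000 times; the lambdas only build the result and discard it.
t_tail = timeit.timeit(lambda: df.tail(1000), number=1000)
t_iloc = timeit.timeit(lambda: df.iloc[-1000:], number=1000)

print(f"tail(1000):   {t_tail:.4f} s for 1000 calls")
print(f"iloc[-1000:]: {t_iloc:.4f} s for 1000 calls")

# Whatever the timings say, both expressions select the same rows.
same_rows = df.tail(1000).equals(df.iloc[-1000:])
print("same rows:", same_rows)
```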

Answer 3

Score: 0

With df.iloc[:1000] you get the first 1000 rows.

Since you want to get the last 1000 rows, you have to change this line a bit to df_last_1000 = df.iloc[-1000:]

To save it as a CSV file, you can use pandas' to_csv() method: df_last_1000.to_csv("last_1000.csv")
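
Put together, the whole round trip might look like this sketch. The 2500-row dataframe and the temporary output directory are assumptions for illustration. It also passes index=False, which the answer above does not use: by default to_csv writes the index as an extra first column.

```python
import os
import tempfile

import pandas as pd

# Hypothetical dataframe with more than 1000 rows.
df = pd.DataFrame({"value": range(2500)})

# Keep only the last 1000 rows.
df_last_1000 = df.iloc[-1000:]

# Write to a temporary directory so the example leaves the working directory alone.
out_path = os.path.join(tempfile.mkdtemp(), "last_1000.csv")
df_last_1000.to_csv(out_path, index=False)  # index=False omits the index column

# Read the file back to confirm that it holds exactly the trimmed rows.
check = pd.read_csv(out_path)
print(len(check), list(check.columns))
```

To trim the dataframe itself rather than create a second one, assign the result back, as in df = df.iloc[-1000:].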

huangapple
  • Posted on February 8, 2023 at 17:27:04
  • If you repost this article, please keep the link: https://go.coder-hub.com/75383650.html