MacOS - Combine multiple text files into one spreadsheet with one file per column?


Question


Using the macOS Terminal, is there a way to take a directory of text files and combine them all into one spreadsheet (either CSV or Numbers format) so that:

  • Every file is in a separate column.
  • Every line of each txt file is in a separate row.
  • The files are placed in the spreadsheet in alphabetical order by file name.

Example 1: here is how my text files look before combining:
[Screenshot of the text files before combining]

Example 2: here is how my text files should look in a spreadsheet after combining:
[Screenshot of the combined spreadsheet]

(These examples are a partial extract. I actually have hundreds of files to combine.)


Steps I have tried:

  1. I searched Stack Overflow for an answer, but all the other questions about this task use Python or pandas. I would prefer a solution that works directly from the macOS Terminal, without needing to install Python or packages like pandas.

  2. From my research, I believe the paste command could be used:

    paste -d '\t' *.txt > ^0-merged.csv

However, when I try this, it produces the error message `paste: Too many open files`. It also produces a CSV file that is completely blank.
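A likely explanation, offered as a sketch rather than a definitive fix: `paste` keeps every input file open at the same time. With hundreds of files, it runs into the per-process open-file limit shown by `ulimit -n`, which is often 256 in a macOS Terminal shell. The output file is blank because the shell creates (truncates) it before `paste` starts.

The helper below, `paste_all` (a hypothetical name), raises the soft limit only inside a subshell and then runs the same `paste` command. The target of 4096 descriptors is an assumption. The soft limit can only be raised up to the hard limit (`ulimit -Hn`). The output is tab-separated, so a `.tsv` name is used here.

```shell
#!/bin/bash
# paste_all OUTFILE FILE...   (hypothetical helper name)
# Runs paste with a raised open-file limit. The limit change happens in a
# subshell, so it does not persist in the calling shell.
paste_all() {
  out=$1
  shift
  (
    want=4096                    # assumption: enough for a few hundred inputs
    cur=$(ulimit -n)
    if [ "$cur" != "unlimited" ] && [ "$cur" -lt "$want" ]; then
      # Raising the soft limit fails if it would exceed the hard limit (ulimit -Hn).
      ulimit -n "$want" 2>/dev/null ||
        echo "paste_all: could not raise open-file limit above $cur" >&2
    fi
    # paste reads all inputs in parallel: line N of each file becomes row N.
    paste -d '\t' "$@" > "$out"
  )
}
```

Usage: `paste_all merged.tsv ./*.txt`. The shell expands the glob in sorted order, which gives the alphabetical column order.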

Answer 1

Score: 1

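You could loop through the files, appending each one:

```shell
touch merged.csv
# paste puts "$f" to the LEFT of the columns merged so far, so the final
# column order is the reverse of the *.txt glob order. The output is tab-separated.
for f in *.txt; do paste -d '\t' "$f" merged.csv > temp; cp temp merged.csv; done; rm temp
```

You have to create the file first, because paste will fail if it can't find the file.

https://unix.stackexchange.com/questions/205642/combining-large-amount-of-files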
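The loop above runs `paste "$f" merged.csv`, so each newly processed file becomes the leftmost column. The finished sheet therefore lists the files in reverse alphabetical order. The first pass against the empty merged.csv also leaves an extra empty column on the right.

Here is a minimal sketch of one way around both problems, assuming bash (it uses arrays): start from the last file and prepend the others from right to left. Prepending also keeps rows aligned when files have different lengths, because a shorter file just produces an empty first field ahead of the columns already merged. The function name `merge_columns` is hypothetical.

```shell
#!/bin/bash
# merge_columns OUTFILE FILE...   (hypothetical helper name)
# Writes one tab-separated column per FILE, in the order the files are given.
merge_columns() {
  local out=$1
  shift
  local -a files
  files=("$@")
  local n=${#files[@]}
  [ "$n" -gt 0 ] || return 1
  local tmp
  tmp=$(mktemp) || return 1
  # Start with the rightmost column...
  cp "${files[n-1]}" "$out"
  # ...then prepend the remaining files from right to left. Only one input
  # file plus the merged file is open at a time, so the open-file limit
  # does not come into play.
  local i
  for (( i = n - 2; i >= 0; i-- )); do
    paste -d '\t' "${files[i]}" "$out" > "$tmp" && cp "$tmp" "$out"
  done
  rm -f "$tmp"
}
```

Usage: `merge_columns merged.tsv ./*.txt`.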

Adding a new idea for files with spaces in their names:

```shell
#!/bin/bash
touch merged.csv
# save and change IFS
OLDIFS=$IFS
IFS=$'\n'
# read all file names into an array (find also searches subdirectories)
fileArray=($(find ./ -name "*.txt" | sort))
# restore IFS
IFS=$OLDIFS
# get the length of the array
tLen=${#fileArray[@]}
# use a for loop to read all file names
for (( i=0; i<${tLen}; i++ ));
do
  paste -d '\t' "${fileArray[$i]}" merged.csv > temp;
  cp temp merged.csv;
done
rm temp
```
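Another way to collect names that contain spaces is to read `find` output one line at a time instead of changing `IFS`. This sketch assumes bash 3.2, the version bundled with macOS, which lacks `mapfile`. `-maxdepth 1` stops `find` from descending into subdirectories, which the script above otherwise does. `sort -f` folds case, so upper- and lower-case names interleave alphabetically. The array name `FILES` and function name `collect_txt_files` are hypothetical, and names containing newlines are not handled.

```shell
#!/bin/bash
# collect_txt_files DIR   (hypothetical helper name)
# Fills the global array FILES with DIR's *.txt files, sorted case-insensitively.
collect_txt_files() {
  FILES=()
  local f
  # IFS= and -r keep leading/trailing spaces and backslashes in names intact.
  while IFS= read -r f; do
    FILES+=("$f")
  done < <(find "$1" -maxdepth 1 -type f -name '*.txt' | sort -f)
}
```

Usage: `collect_txt_files .` followed by `printf '%s\n' "${FILES[@]}"`. The quoted `"${FILES[@]}"` can then be passed to any of the loops above in place of `*.txt`.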
Answer 2

Score: 1
Ruby is part of MacOS.

Given:

```shell
head -n 3 *.txt
==> GOOD THINGS IN LIFE.txt <==
Art
Fun
Hugs
==> IN THE BACKYARD.txt <==
Hose
Tree
Soil
==> KITCHEN CUPBOARD ESSENTIALS.txt <==
Tea
Rice
Milk
==> KNITTING STITCHES.txt <==
Rib
Dip
Seed
# and the rest of your lines in each case...
```

You can do:

```shell
ruby -e '
a=[]
ARGV.sort.each{|fn|
  a<<[fn]+File.read(fn).split(/\R/)
}
a.transpose.each{|sa|
  puts sa.join(",")
}
' *.txt
```

Prints:

```
GOOD THINGS IN LIFE.txt,IN THE BACKYARD.txt,KITCHEN CUPBOARD ESSENTIALS.txt,KNITTING STITCHES.txt
Art,Hose,Tea,Rib
Fun,Tree,Rice,Dip
Hugs,Soil,Milk,Seed
Earth,Fence,Salt,Tile
Honor,Porch,Pesto,Linen
Space,Patio,Flour,Cable
Sport,Grass,Honey,Wicker
Intelligence,Wading Pool,Baking Powder,Knotted Boxes
Innovation,Welcome Mat,Vegetable Oil,Chinese Wave
Confidence,Back Stoop,Tomato Paste,Checkerboard
Good Deeds,Fruit Tree,Black Pepper,Herringbone
Creativity,Downspout,Baking Soda,Stockinette
Education,Birdbath,Ketchup,Garter
Kindness,Terrace,Surer,Waffle
Integrity,Planter,Sugar,Puri Ridge
Faith,Carport,Coffee,Netted
Friends,Flowerbed,Cinnamon,Elongated
Respect,Shovel,Cheese,Farrow Rib
People,Hedges,Bread,Plaited
Yourself,Rocks,Olive Oil,Clamshell
Happiness,Lawnmower,Crackers,Bamboo
Heart,Hot Tub,Pasta,English Rib
Religion,Garden,Scissors,Basket
Wisdom,Stoop,Garlic,Raspberry
```

If you want a 'proper' csv with quoted fields that works better with Excel, you can use the CSV module included with Ruby:

```shell
ruby -r csv -e '
a=[]
ARGV.sort.each{|fn|
  a<<[fn]+File.read(fn).split(/\R/)
}
a=a.transpose
puts CSV.generate(**{headers:true, quote_empty:true, force_quotes:true}){|csv|
  csv<<a[0]
  a[1..-1].each{|row|
    csv<<row
  }
}
' *.txt
```

Prints:

```
"GOOD THINGS IN LIFE.txt","IN THE BACKYARD.txt","KITCHEN CUPBOARD ESSENTIALS.txt","KNITTING STITCHES.txt"
"Art","Hose","Tea","Rib"
"Fun","Tree","Rice","Dip"
"Hugs","Soil","Milk","Seed"
"Earth","Fence","Salt","Tile"
"Honor","Porch","Pesto","Linen"
"Space","Patio","Flour","Cable"
"Sport","Grass","Honey","Wicker"
"Intelligence","Wading Pool","Baking Powder","Knotted Boxes"
"Innovation","Welcome Mat","Vegetable Oil","Chinese Wave"
"Confidence","Back Stoop","Tomato Paste","Checkerboard"
"Good Deeds","Fruit Tree","Black Pepper","Herringbone"
"Creativity","Downspout","Baking Soda","Stockinette"
"Education","Birdbath","Ketchup","Garter"
"Kindness","Terrace","Surer","Waffle"
"Integrity","Planter","Sugar","Puri Ridge"
"Faith","Carport","Coffee","Netted"
"Friends","Flowerbed","Cinnamon","Elongated"
"Respect","Shovel","Cheese","Farrow Rib"
"People","Hedges","Bread","Plaited"
"Yourself","Rocks","Olive Oil","Clamshell"
"Happiness","Lawnmower","Crackers","Bamboo"
"Heart","Hot Tub","Pasta","English Rib"
"Religion","Garden","Scissors","Basket"
"Wisdom","Stoop","Garlic","Raspberry"
```
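Ruby's `Array#transpose` raises an `IndexError` when the files do not all have the same number of lines, so it can help to check the counts before running the scripts above. Below is a small sketch in plain shell; the function name `report_line_counts` is hypothetical. It prints each file's line count as awk counts it and returns a non-zero status when the counts differ. Like Ruby's `split(/\R/)`, awk counts a final line that has no trailing newline. Unlike `split`, it also counts trailing blank lines, so treat a mismatch as a hint rather than a guarantee.

```shell
#!/bin/sh
# report_line_counts FILE...   (hypothetical helper name)
# Prints "<count><TAB><file>" per file; returns 1 if the counts are not all equal.
report_line_counts() {
  status=0
  first=
  for f in "$@"; do
    # Redirect instead of passing the name, so awk never treats it as an operand.
    n=$(awk 'END { print NR }' < "$f") || return 2
    printf '%s\t%s\n' "$n" "$f"
    if [ -z "$first" ]; then
      first=$n
    elif [ "$n" != "$first" ]; then
      status=1
    fi
  done
  return "$status"
}
```

Usage: `report_line_counts ./*.txt`. A non-zero exit status means the padding variant below is the safer choice.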

Comments:

> It also puts the file name at the top of every column. Is there any
> way to omit the file name? Also, it seems to treat Uppercase A-Z and
> lowercase a-z as separate (e.g. so file names with A-Z will come first
> and then file names with a-z) Thanks!

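To omit the file names, drop `[fn]`. To sort the names case-insensitively, use `sort_by{|s| s.downcase}`. If you have files of different lengths, you can also pad the end of the shorter files so that you still have a proper matrix to transpose: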

```shell
ruby -r csv -e '
a=[]
ARGV.sort_by{|s| s.downcase}.each{|fn|
  a<<File.read(fn).split(/\R/)
}
max_length=a.max_by{|sa| sa.length}.length
a.each.with_index{|sa,i|
  if sa.length<max_length then a[i].concat [""]*(max_length-sa.length) end }
a=a.transpose
puts CSV.generate(**{headers:true, quote_empty:true, force_quotes:true}){|csv|
  csv<<a[0]
  a[1..-1].each{|row|
    csv<<row
  }
}
' *.txt
```
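If you would rather not depend on the system Ruby, the same idea (read every file, pad the shorter ones, write quoted CSV without a file-name header) can be sketched with awk, which also ships with macOS. This is an alternative technique, not the answer's method, and the function name `txt_columns_to_csv` is hypothetical.

awk reads its input files one after another rather than all at once, so the open-file limit that broke `paste` does not apply, but the whole table is held in memory. Every field is quoted and embedded double quotes are doubled, following the usual CSV convention. Columns appear in the order the file names are passed.

```shell
#!/bin/sh
# txt_columns_to_csv FILE... > merged.csv   (hypothetical helper name)
# Writes one CSV column per FILE, in argument order, padding short files.
txt_columns_to_csv() {
  [ "$#" -gt 0 ] || return 1      # with no operands awk would read stdin
  awk '
    BEGIN {
      # Column number of each file = its position among the operands.
      for (i = 1; i < ARGC; i++) col[ARGV[i]] = i
      ncols = ARGC - 1
    }
    {
      sub(/\r$/, "")                 # tolerate Windows (CRLF) line endings
      cell[FNR, col[FILENAME]] = $0
      if (FNR > nrows) nrows = FNR
    }
    END {
      for (r = 1; r <= nrows; r++) {
        line = ""
        for (c = 1; c <= ncols; c++) {
          if ((r, c) in cell) v = cell[r, c]
          else v = ""                # pad shorter files with empty cells
          gsub(/"/, "\"\"", v)       # CSV: double any embedded quotes
          line = line (c > 1 ? "," : "") "\"" v "\""
        }
        print line
      }
    }' "$@"
}
```

Usage: `txt_columns_to_csv ./*.txt > merged.csv`.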

huangapple
  • Posted on April 17, 2023, 18:51:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76034355.html