MacOS - Combine multiple text files into one spreadsheet with one file per column?

Question

Using the macOS Terminal, is there a way to take a directory of text files and combine them all into one spreadsheet (either CSV or Numbers format) so that:

  • Every file is in a separate column
  • Every line of each txt file is in a separate row
  • The files are placed in the spreadsheet in alphabetical order (by the first letter of each file name)

Example 1: here is how my text files look before combining:
[image]

Example 2: here is how my text files should look in a spreadsheet after combining:
[image]

(These examples are a partial extract. I actually have 100s of files to combine).

<hr>

Steps I have tried:

  1. I searched Stack Overflow for an answer, but all the other questions about this task use Python or pandas. I would prefer a solution that can be run directly from the macOS Terminal without installing packages like Python or pandas.

  2. From researching, I believe the paste command could be used:

    paste -d '\t' *.txt > ^0-merged.csv

However, when I try this, it produces the following error message: paste: Too many open files. It also produces a CSV file that is completely blank.

Answer 1

Score: 1

You could loop through the files, appending each one as a new column:

    touch merged.csv
    for f in *.txt; do paste -d '\t' merged.csv "$f" > temp; cp temp merged.csv; done; rm temp

You have to create merged.csv first, as paste will fail if it can't find the file. Quoting "$f" keeps file names with spaces intact, and listing merged.csv first keeps the columns in the same alphabetical order as the glob. Because merged.csv starts out empty, the first column of the result is empty.

https://unix.stackexchange.com/questions/205642/combining-large-amount-of-files

For file names that contain spaces, another option is to read the names into an array:

    #!/bin/bash
    touch merged.csv

    # save and change IFS so that names are split on newlines only
    OLDIFS=$IFS
    IFS=$'\n'

    # read all file names into an array, sorted
    fileArray=($(find ./ -name "*.txt" | sort))

    # restore IFS
    IFS=$OLDIFS

    # get the length of the array
    tLen=${#fileArray[@]}

    # loop over the file names, pasting each file as a new column
    for (( i=0; i<tLen; i++ ));
    do
      paste -d '\t' merged.csv "${fileArray[$i]}" > temp
      cp temp merged.csv
    done
    rm temp
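The loop works around paste opening every input file at once. Another possible workaround, sketched here as an assumption and not taken from the answer, is to raise the shell's open-file limit with ulimit -n inside a subshell and call paste a single time. merge_txt_columns is a hypothetical helper name, and whether the limit can be raised depends on the hard limit configured on the machine. A single paste call also keeps the columns aligned when the files have different numbers of lines.

```shell
#!/bin/bash
# merge_txt_columns DIR OUT  (hypothetical helper, not from the answer)
# Pastes every *.txt in DIR side by side, tab-separated, into OUT.
merge_txt_columns() {
  local dir=$1 out=$2
  (
    cd "$dir" || exit 1
    set -- *.txt                     # the glob expands in sorted order
    [ -e "$1" ] || exit 1            # stop if nothing matched
    need=$(( $# + 16 ))              # one descriptor per input, plus headroom
    cur=$(ulimit -n)
    if [ "$cur" != "unlimited" ] && [ "$cur" -lt "$need" ]; then
      ulimit -n "$need" || exit 1    # fails if the hard limit is lower
    fi
    paste -d '\t' "$@"
  ) > "$out"
}
```

For example, merge_txt_columns . merged.tsv would merge the text files in the current directory. The limit change stays inside the subshell, so the calling shell keeps its original setting.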



# Answer 2
**Score**: 1

Ruby is part of MacOS.

Given:

    head -n 3 *.txt
    ==> GOOD THINGS IN LIFE.txt <==
    Art
    Fun
    Hugs

    ==> IN THE BACKYARD.txt <==
    Hose
    Tree
    Soil

    ==> KITCHEN CUPBOARD ESSENTIALS.txt <==
    Tea
    Rice
    Milk

    ==> KNITTING STITCHES.txt <==
    Rib
    Dip
    Seed

    # and the rest of your lines in each case...

You can do:

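    ruby -e '
    a=[]
    ARGV.sort.each{|fn|
        # first cell of each column is the file name, then one cell per line
        a<<[fn]+File.read(fn).split(/\R/)
    }
    # transpose turns the per-file columns into spreadsheet rows
    a.transpose.each{|sa|
        puts sa.join(",")
    }
    ' *.txt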

Prints:

GOOD THINGS IN LIFE.txt,IN THE BACKYARD.txt,KITCHEN CUPBOARD ESSENTIALS.txt,KNITTING STITCHES.txt
Art,Hose,Tea,Rib
Fun,Tree,Rice,Dip
Hugs,Soil,Milk,Seed
Earth,Fence,Salt,Tile
Honor,Porch,Pesto,Linen
Space,Patio,Flour,Cable
Sport,Grass,Honey,Wicker
Intelligence,Wading Pool,Baking Powder,Knotted Boxes
Innovation,Welcome Mat,Vegetable Oil,Chinese Wave
Confidence,Back Stoop,Tomato Paste,Checkerboard
Good Deeds,Fruit Tree,Black Pepper,Herringbone
Creativity,Downspout,Baking Soda,Stockinette
Education,Birdbath,Ketchup,Garter
Kindness,Terrace,Surer,Waffle
Integrity,Planter,Sugar,Puri Ridge
Faith,Carport,Coffee,Netted
Friends,Flowerbed,Cinnamon,Elongated
Respect,Shovel,Cheese,Farrow Rib
People,Hedges,Bread,Plaited
Yourself,Rocks,Olive Oil,Clamshell
Happiness,Lawnmower,Crackers,Bamboo
Heart,Hot Tub,Pasta,English Rib
Religion,Garden,Scissors,Basket
Wisdom,Stoop,Garlic,Raspberry

If you want a 'proper' CSV with quoted fields that works better with Excel, you can use the CSV module included with Ruby:

    ruby -r csv -e '
    a=[]
    ARGV.sort.each{|fn|
        a<<[fn]+File.read(fn).split(/\R/)
    }
    a=a.transpose
    # force_quotes wraps every field in double quotes
    puts CSV.generate(**{headers:true, quote_empty:true, force_quotes:true}){|csv|
        csv<<a[0]
        a[1..].each{|row|
            csv<<row
        }
    }
    ' *.txt

Prints:

"GOOD THINGS IN LIFE.txt","IN THE BACKYARD.txt","KITCHEN CUPBOARD ESSENTIALS.txt","KNITTING STITCHES.txt"
"Art","Hose","Tea","Rib"
"Fun","Tree","Rice","Dip"
"Hugs","Soil","Milk","Seed"
"Earth","Fence","Salt","Tile"
"Honor","Porch","Pesto","Linen"
"Space","Patio","Flour","Cable"
"Sport","Grass","Honey","Wicker"
"Intelligence","Wading Pool","Baking Powder","Knotted Boxes"
"Innovation","Welcome Mat","Vegetable Oil","Chinese Wave"
"Confidence","Back Stoop","Tomato Paste","Checkerboard"
"Good Deeds","Fruit Tree","Black Pepper","Herringbone"
"Creativity","Downspout","Baking Soda","Stockinette"
"Education","Birdbath","Ketchup","Garter"
"Kindness","Terrace","Surer","Waffle"
"Integrity","Planter","Sugar","Puri Ridge"
"Faith","Carport","Coffee","Netted"
"Friends","Flowerbed","Cinnamon","Elongated"
"Respect","Shovel","Cheese","Farrow Rib"
"People","Hedges","Bread","Plaited"
"Yourself","Rocks","Olive Oil","Clamshell"
"Happiness","Lawnmower","Crackers","Bamboo"
"Heart","Hot Tub","Pasta","English Rib"
"Religion","Garden","Scissors","Basket"
"Wisdom","Stoop","Garlic","Raspberry"

Comments:

> It also puts the file name at the top of every column. Is there any
> way to omit the file name? Also, it seems to treat Uppercase A-Z and
> lowercase a-z as separate (e.g. so file names with A-Z will come first
> and then file names with a-z) Thanks!

The version below addresses both points in the comment: it no longer puts the file name at the top of each column, and it sorts the file names case-insensitively. It also handles files of different lengths by padding the end of the shorter ones with empty strings, so that you still have a proper matrix to transpose:

    ruby -r csv -e '
    a=[]
    # sort file names case-insensitively; no [fn] header cell is added
    ARGV.sort_by{|s| s.downcase}.each{|fn|
        a<<File.read(fn).split(/\R/)
    }
    # pad shorter columns with empty strings so the matrix is rectangular
    max_length=a.map(&:length).max
    a.each{|sa| sa.concat([""]*(max_length-sa.length)) }
    puts CSV.generate(quote_empty:true, force_quotes:true){|csv|
        a.transpose.each{|row| csv<<row }
    }
    ' *.txt
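For readers who would rather not use Ruby, the same pad-and-transpose idea can be sketched with awk, which reads the files one after another instead of opening them all at once. This is an illustrative sketch, not part of the answer: columns_to_csv is a hypothetical helper name, the files are taken in the order given on the command line (the shell's glob order, which may differ from a case-insensitive sort), and an empty file contributes no column.

```shell
#!/bin/bash
# columns_to_csv FILE...  (hypothetical helper, not from the answer)
# Prints the given files side by side as CSV with every field quoted.
columns_to_csv() {
  awk '
    FNR == 1 { ncols++ }                 # a new input file starts a new column
    {
      sub(/\r$/, "")                     # tolerate CRLF line endings
      gsub(/"/, "\"\"")                  # CSV escapes a double quote by doubling it
      cell[FNR, ncols] = $0
      if (FNR > nrows) nrows = FNR       # remember the longest file
    }
    END {
      for (r = 1; r <= nrows; r++) {
        line = ""
        for (c = 1; c <= ncols; c++) {
          sep = (c > 1) ? "," : ""
          line = line sep "\"" cell[r, c] "\""   # missing cells print as ""
        }
        print line
      }
    }
  ' "$@"
}
```

A typical call would be columns_to_csv *.txt > merged.csv. Cells that a shorter file does not have come out as empty quoted fields, which plays the same role as the padding step in the Ruby version.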
