Using a nested for loop, write a bash script to validate whether the characters and data types in any two field records match

Question


Sample data:

```plaintext
Header | <Transaction_ID> | <Item_ Name> |<Item_Type> | <Customer_ID> | <Type_of_Transaction> | <Payment_Method>|Amount
Data |1001 |Samsung |Handset |R2R003 |Online |Credit Card |100|
Data | 1004|LG |TV | R2R042| Online | Debit card|150.24|
Trailer | 2
```

Here the number of fields in the header is 7. We need to check whether the characters in any two field records match, and also whether the data type of each field matches its records.
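The requirement is open to interpretation. One reading of "the data type of each field matches its records" is that every record in a column should hold the same kind of value. Below is a minimal sketch under that assumption: `value_type` sorts a value into coarse, hypothetical categories (`int`, `decimal`, `alpha`, `alnum`, `other`), and `check_column_types` uses a nested `for` loop (fields outside, records inside) to flag any record whose type differs from the first record in that column. It assumes the first line is the header and the last line is the trailer, as in the sample; neither function name comes from the question.

```bash
#!/bin/bash
# Hypothetical helper: classify one value into a coarse "data type".
value_type() {
  local v=$1
  local int_re='^-?[0-9]+$'
  local dec_re='^-?[0-9]+\.[0-9]+$'
  local alpha_re='^[A-Za-z][A-Za-z ]*$'   # letters, possibly with inner spaces ("Credit Card")
  local alnum_re='^[A-Za-z0-9]+$'        # letters and digits mixed ("R2R003")
  if [[ $v =~ $int_re ]]; then echo int
  elif [[ $v =~ $dec_re ]]; then echo decimal
  elif [[ $v =~ $alpha_re ]]; then echo alpha
  elif [[ $v =~ $alnum_re ]]; then echo alnum
  else echo other
  fi
}

# Hypothetical helper: for each field number given as an argument, check that every
# data record has the same type as the first data record. Reads the file on stdin.
check_column_types() {
  local -a rows
  mapfile -t rows                                # read all lines from stdin
  (( ${#rows[@]} >= 3 )) || return 1             # header + at least one record + trailer
  rows=("${rows[@]:1:${#rows[@]}-2}")            # keep only the data records
  local col row value actual expected status=0
  for col in "$@"; do                            # outer loop: fields
    expected=""
    for row in "${rows[@]}"; do                  # inner loop: records
      value=$(printf '%s\n' "$row" | cut -d'|' -f "$col")
      value="${value#"${value%%[![:space:]]*}"}"   # strip leading whitespace
      value="${value%"${value##*[![:space:]]}"}"   # strip trailing whitespace
      actual=$(value_type "$value")
      if [[ -z $expected ]]; then
        expected=$actual                         # the first record sets the expected type
      elif [[ $actual != "$expected" ]]; then
        echo "field $col: '$value' is $actual, expected $expected" >&2
        status=1
      fi
    done
    echo "field $col: $expected"
  done
  return "$status"
}
```

For the gzipped file in the question this could be called as `zcat "$file" | check_column_types 2 3 8`. With the two sample records, field 8 would be reported, because `100` is classified as `int` and `150.24` as `decimal`.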

Requirement:

I need to use a nested for loop to validate any two or three field records.

I tried the code below, but it only works for single-field records.

```bash
#!/bin/bash
now=`date +"%d_%m_%y"`
file=transaction${now}.dat.gz
#header_fieldc is a parameter which has each header field on a new line
header_fieldc=`zcat $file | head -1 | tr "|" "\n"`
a=( $header_fieldc )
for (( i=0; i<=${#a[@]}; i++ ));do
  echo "${a[i]}"
  if [ $i == 0 ];then
    i=`expr $i + 1`
    rec_fieldc=`zcat $file | sed '1d;$d' | cut -d\| -f $i`
  fi
  #rec_fieldc parameter contains the records of the ith header field
  b=( $rec_fieldc )
  for (( j=0; j<=${#b[@]}; j++ ));do
    echo "${b[j]}"
    var=`echo "${b[j]}"`
    if [[ "${var}" =~ ^[0-9]+$ ]];then
      echo " ${b[j]} valid"
    else
      echo "invalid character present in ${a[j]} field" > exception.txt
      exit 0
    fi
  done
done
```

Output:

```plaintext
<TransactionID>
1001 is a valid record
1004 is a valid record
```
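Why only one field gets checked: `rec_fieldc` is assigned only inside `if [ $i == 0 ]`, so a single column is ever extracted. In addition, the inner loop runs to `j<=${#b[@]}`, so its last iteration reads an empty `${b[j]}`, which fails `^[0-9]+$` and triggers `exit 0`; the script therefore always stops after the first column. Other details: `a=( $header_fieldc )` splits on spaces as well as newlines (so `<Item_ Name>` becomes two elements), the error message indexes the header with `j` instead of `i`, and `>` overwrites `exception.txt` on every error. Below is a minimal sketch of the same nested-loop idea with those points addressed. It assumes the columns to check are passed as 1-based field numbers and, like the original `^[0-9]+$` test, that they must hold non-negative integers; `validate_columns` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: check that each given field (1-based, '|'-separated) holds a
# non-negative integer in every data record. Reads the file on stdin and treats the
# first line as the header and the last line as the trailer.
validate_columns() {
  local -a rows
  local col row value status=0
  mapfile -t rows                              # read all lines from stdin
  (( ${#rows[@]} >= 3 )) || return 1           # header + at least one record + trailer
  rows=("${rows[@]:1:${#rows[@]}-2}")          # drop the header and the trailer
  for col in "$@"; do                          # outer loop: one pass per field
    for row in "${rows[@]}"; do                # inner loop: one pass per record
      # take the field and delete all whitespace (acceptable for a digits-only check)
      value=$(printf '%s\n' "$row" | cut -d'|' -f "$col" | tr -d '[:space:]')
      if [[ $value =~ ^[0-9]+$ ]]; then
        echo "field $col: $value is valid"
      else
        echo "field $col: invalid characters in '$value'" >&2
        status=1                               # record the failure, keep checking
      fi
    done
  done
  return "$status"
}
```

With the question's file this could be called as `zcat "$file" | validate_columns 2 8` to check the `Transaction_ID` and `Amount` fields (field 1 is the `Header`/`Data` label). With the sample data, `150.24` would be reported, since the integer pattern does not accept decimals; unlike the original, the function keeps checking after a failure and signals it through a non-zero return status.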

Answer 1

Score: 0

<details>
<summary>English:</summary>
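**NOTE: be careful with comparison operators in `if` tests: inside `[ ]` and `[[ ]]`, `=` and `==` compare strings, while integers such as `$i` are compared with `-eq` (for example `[ "$i" -eq 0 ]`). Other than that, I think your code might work if you correct this.**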
I replicated the setup as follows, adding a few lines with more or fewer fields for the demo. Contents of `sample_data.txt`:

```plaintext
Header | <Transaction ID> | <Item Name> |<Item Type> | <Customer ID> | <Type of Transaction> | <Payment Method>| Amount
Data |1001 |Samsung |Handset |R2R003 |Online |Credit Card |100|
Data |1001 |Samsung |Handset |R2R003 |Online |extra |Credit Card |100|
Data |1001 |Samsung |Online |Credit Card |100|
Data | 1004|LG |TV | R2R042| Online | Debit card|150.24|
Data |1001 |Samsung |Handset |R2R003 |Online |extra |Credit Card |100|
Trailer | 2
```

Here is the script `test.sh`:

```bash
#!/bin/bash
header_field_count=$(cat sample_data.txt | awk -F '|' '{print NF}' | head -1)
echo 'header_field_count:' $header_field_count
number_of_lines=$(wc -l < sample_data.txt)
echo 'number of lines to process in file:' $number_of_lines
let current_line=2 # skip 1st line because that is the header
data_field_count_array=($(cat sample_data.txt | awk -F '|' '{print NF - 1}')) # note NF - 1 because these lines have an extra separator at the end
while [ $current_line -lt $number_of_lines ]; do
  echo 'line:' $current_line 'has' ${data_field_count_array[$current_line-1]} 'fields' # bash arrays are zero-indexed, hence the -1 (line 1 is index 0)
  if [[ ${data_field_count_array[$current_line-1]} == $header_field_count ]]; then
    echo 'EQUAL to the Header'
  else
    echo 'NOT EQUAL to the Header'
  fi
  let current_line+=1
done

# this will print any duplicate lines in the sample data
echo -n 'duplicate lines: '
echo $(sort sample_data.txt | uniq -d)
```

When run, this is the output:

```plaintext
header_field_count: 8
number of lines to process in file: 6
line: 2 has 8 fields
EQUAL to the Header
line: 3 has 9 fields
NOT EQUAL to the Header
line: 4 has 6 fields
NOT EQUAL to the Header
line: 5 has 8 fields
EQUAL to the Header
duplicate lines: Data |1001 |Samsung |Handset |R2R003 |Online |extra |Credit Card |100|
```
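A caveat on the `while` loop above: `wc -l` counts newline characters, so if the last line has no trailing newline (the output reports 6 lines for the 7-line sample, which suggests this), the `-lt` condition stops before line 6 and that data row is never compared. A minimal alternative sketch, assuming the same layout (header first, data rows ending in `|`, a `Trailer` record last), splits each line with `IFS='|' read -ra` instead of awk. `read` drops the empty field after a trailing `|`, so the header and the data rows are counted on the same basis as the `NF` / `NF - 1` pair above. `check_field_counts` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: report data rows whose field count differs from the header's.
# Reads stdin; a trailing "|" does not create an extra empty field.
check_field_counts() {
  local line header_count count n=0 status=0
  local -a fields
  # "|| [[ -n $line ]]" also processes a last line that has no trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    n=$((n + 1))
    IFS='|' read -ra fields <<< "$line"
    count=${#fields[@]}
    if (( n == 1 )); then
      header_count=$count                   # the first line is the header
      continue
    fi
    [[ $line == Trailer* ]] && continue     # skip the trailer record
    if (( count == header_count )); then
      echo "line $n: $count fields, EQUAL to the header"
    else
      echo "line $n: $count fields, NOT EQUAL to the header"
      status=1
    fi
  done
  return "$status"
}
```

It can be run as `check_field_counts < sample_data.txt`; every data row is checked whether or not the file ends with a newline.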

It will work with any number of data lines.

You can do more research and add more validation, for example to check whether fields are valid numbers. See: https://stackoverflow.com/questions/806906/how-do-i-test-if-a-variable-is-a-number-in-bash

Example validation of character types. You can play with this script by calling it: `./script.sh Test this string for me` will pass the string check, and `./script.sh -187.8` will pass the number check.

```bash
#!/bin/bash
# regex example: the whole argument string is tested against each pattern
input="$*"
echo 'testing string:"'"$input"'"'
re='^-?[0-9]+(\.[0-9]+)?$' # digits, with an optional minus sign and decimal part
if ! [[ $input =~ $re ]]; then # compare the input with the allowed pattern
  echo 'not a decimal or negative number'
fi
re='^[a-z ,]+$' # lowercase letters, SPACE and COMMA allowed
if ! [[ $input =~ $re ]]; then
  echo 'not a lowercase string with SPACE or COMMA'
fi
re='^[a-zA-Z ,.;]+$' # lower- and uppercase letters, SPACE, COMMA, DOT and SEMICOLON allowed
if ! [[ $input =~ $re ]]; then
  echo 'not a text string'
fi
```
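Two things to watch with bracket-expression checks: a pattern without `+$` at the end, such as `^[0-9]`, only looks at the first character, and inside brackets a sequence like ` -.` is read as a range (in the C locale, space through dot, which also admits `!`, `#`, `+` and `,`). The sketch contrasts a first-character check with a whole-string number pattern and, as an assumption, shows one way an `@` extension could look for e-mail-like values. `matches` and `is_email_like` are hypothetical helpers, and the e-mail pattern is intentionally loose, not a complete address validator.

```bash
#!/bin/bash
# Checks only the first character: '12abc' passes because it starts with a digit.
first_char_re='^[0-9]'
# Whole-string number check: optional minus sign, digits, optional decimal part.
number_re='^-?[0-9]+(\.[0-9]+)?$'

# Succeeds if $1 matches the extended regular expression in $2.
matches() { [[ $1 =~ $2 ]]; }

# Hypothetical, deliberately loose e-mail check: allowed characters, a single @,
# then a domain containing at least one dot.
is_email_like() {
  local re='^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
  [[ $1 =~ $re ]]
}

# Small demo table: which checks accept which sample values
for value in '-187.8' '12abc' '1.2.3' 'user@example.com'; do
  printf '%-17s first-char:' "$value"
  matches "$value" "$first_char_re" && printf ' yes' || printf ' no'
  printf '  number:'
  matches "$value" "$number_re" && printf ' yes' || printf ' no'
  printf '  email:'
  is_email_like "$value" && printf ' yes\n' || printf ' no\n'
done
```

Anchoring with `^...$` and a `+` quantifier is what turns a character class into a whole-value check; the same idea applies to any extra characters added to the brackets.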

You can add other characters, such as `@`, to the brackets if you want to check for e-mail addresses or similar.

This is my last edit (the question goes too far into details):

```bash
#!/bin/bash
header_field_count=$(cat sample_data.txt | awk -F '|' '{print NF}' | head -1)
echo 'header_field_count:' $header_field_count
number_of_lines=$(wc -l < sample_data.txt)
echo 'number of lines to process in file:' $number_of_lines
let current_line=2 # skip 1st line because that is the header
data_field_count_array=($(cat sample_data.txt | awk -F '|' '{print NF - 1}')) # note NF - 1 because these lines have an extra separator at the end
while [ $current_line -lt $number_of_lines ]; do
  echo 'line:' $current_line 'has' ${data_field_count_array[$current_line-1]} 'fields' # bash arrays are zero-indexed, hence the -1 (line 1 is index 0)
  if [[ ${data_field_count_array[$current_line-1]} == $header_field_count ]]; then
    echo 'EQUAL to the Header'
  else
    echo 'NOT EQUAL to the Header'
  fi
  let current_line+=1
done
fieldstocheck=("Amount" "<Item Name>") # names of fields to check; the array may be expanded
fieldtypecheck=("num" "string") # the check each field needs (e.g. Customer ID needs to be a number, and so on)
# find which field index in the header corresponds to each entry of fieldstocheck;
# the indexes go into a separate array so the names stay intact for later matches
fieldindex=()
for i in $(seq 1 1 $header_field_count); do
  field=$(head -1 sample_data.txt | awk -F '|' -v col="$i" '{print $col}')
  let finalindex=${#fieldstocheck[@]}-1
  for j in $(seq 0 1 $finalindex); do
    if [[ "$field" =~ "${fieldstocheck[j]}" ]]; then
      echo "$field" '==' "${fieldstocheck[j]}" 'at index:' $i
      fieldindex[j]=$i
    fi
  done
done

# check the column entries to see if they are valid
let finalindex=${#fieldstocheck[@]}-1
for i in $(seq 0 1 $finalindex); do
  echo 'column' "${fieldindex[$i]}" "(${fieldstocheck[$i]})" 'needs to be a' "${fieldtypecheck[$i]}"
  # one value per line (so values such as "Credit Card" stay whole), trimmed of
  # surrounding spaces, taken only from "Data" rows so the header and trailer are skipped
  while IFS= read -r j; do
    case "${fieldtypecheck[$i]}" in
      num)
        re='^-?[0-9]+(\.[0-9]+)?$' # digits, with an optional minus sign and decimal part
        if [[ "$j" =~ $re ]]; then
          echo 'OK number' "$j"
        else
          echo 'ERROR not a decimal or negative number' "$j"
        fi
        ;;
      string)
        re='^[a-zA-Z ,.;]+$' # upper/lowercase letters, SPACE, COMMA, DOT and SEMICOLON allowed
        if [[ "$j" =~ $re ]]; then
          echo 'OK string' "$j"
        else
          echo 'ERROR not a string' "$j"
        fi
        ;;
      *)
        echo 'not valid variable type'
        ;;
    esac
  done < <(awk -F '|' -v col="${fieldindex[$i]}" '$1 ~ /^Data/ {v = $col; gsub(/^[ \t]+|[ \t]+$/, "", v); print v}' sample_data.txt)
done
```
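As a design alternative (not the answer's approach), the header lookup and the per-row checks can run in a single awk pass, which avoids re-reading the file once per column and keeps values with spaces intact. This is a sketch under assumptions: the `name=type` spec format, the `num`/`string` rules (mirroring the patterns in the script above) and the function name `check_types_awk` are hypothetical, and the order in which a row's fields are reported is not guaranteed, because awk's `for (name in ...)` order is unspecified.

```bash
#!/bin/bash
# Hypothetical alternative: find the columns by header name on line 1, then check
# every "Data" row. Usage: check_types_awk 'Amount=num,<Item Name>=string' file
check_types_awk() {
  local spec="$1" file="$2"
  awk -F '|' -v spec="$spec" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    NR == 1 {
      n = split(spec, pairs, ",")
      for (p = 1; p <= n; p++) {
        split(pairs[p], kv, "=")
        # substring match on the header cell, like the answer`s =~ "name" test
        for (c = 1; c <= NF; c++)
          if (index(trim($c), kv[1])) { col[kv[1]] = c; ftype[kv[1]] = kv[2] }
      }
      next
    }
    $1 ~ /^Data/ {
      for (name in col) {
        v = trim($(col[name]))
        ok = (ftype[name] == "num") ? (v ~ /^-?[0-9]+(\.[0-9]+)?$/) : (v ~ /^[A-Za-z ,.;]+$/)
        printf "%s line %d %s: %s\n", (ok ? "OK" : "ERROR"), NR, name, v
        if (!ok) bad = 1
      }
    }
    END { exit bad }
  ' "$file"
}
```

With the answer's data this would be run as `check_types_awk 'Amount=num,<Item Name>=string' sample_data.txt`; the exit status is non-zero if any checked value fails.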

</details>

huangapple
  • Published on January 9, 2023 at 19:08:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75056397.html