Get the value between HTML tags and store it in a text file

Question

Peace everyone,

I am trying to store the values between tags in a text file.
Is this possible using CMD?

Examples:

mybat.bat

@echo off
findstr /i "<p>" "file.html" >output_01.txt
findstr /i "<p>.*</p>" "file.html" >output_02.txt
findstr /i "<td>.*</td>" "file.html" >output_03.txt

file.html

<body>
<p>Try to save me</p>
<p>Try to store me
    If you can :-)</p><table><tr><th>Am Table</th></tr><tr><td>Table 001</td></tr></table>
    <table><tr><th>Am Table</th></tr><tr><td>Table 002</td></tr></table>
</body>

output_01.txt

<p>Try to save me</p>
<p>Try to store me

output_02.txt

<p>Try to save me</p>

output_03.txt

    If you can :-)</p><table><tr><th>Am Table</th></tr><tr><td>Table 001</td></tr></table>
    <table><tr><th>Am Table</th></tr><tr><td>Table 002</td></tr></table>

I need the output to contain only the values between the tags.

Example:

output.txt

Try to save me
Try to store me If you can :-)
Table 001
Table 002

Answer 1

Score: 1

Refer to this article

You can try something like this in batch, using a regex with VBScript:

@echo off
Title Remove All HTML Tags using Regex with vbscript
Set "InputFile=input.html"
Set "OutPutFile=output.txt"
Call :RemoveHTML "%InputFile%" "%OutPutFile%"
Start /MAX Notepad "%OutputFile%" & Exit
::--------------------------------------------------------
:RemoveHTML <InputFile> <OutPutFile>
(
    echo WScript.StdOut.WriteLine RemoveHTML(Data^)
    echo Function RemoveHTML(Data^)
    echo Dim strPattern, strReplace, strResult,oRegExp
    echo Data = "%~1"
    echo Data = WScript.StdIn.ReadAll
    echo strPattern = "<[^>]*>"
    echo strReplace = ""
    echo Set oRegExp = New RegExp
    echo oRegExp.Global = True
    echo oRegExp.IgnoreCase = True
    echo oRegExp.Pattern = strPattern
    echo strResult = oRegExp.Replace(Data,strReplace^)
    echo RemoveHTML = strResult
    echo End Function
)>"%tmp%\%~n0.vbs"
cscript //nologo "%tmp%\%~n0.vbs" < "%~1" > "%~2"
If Exist "%tmp%\%~n0.vbs" Del "%tmp%\%~n0.vbs"
Exit /B
::-------------------------------------------------------
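
To see what the pattern <[^>]*> does to the question's file.html, here is a minimal sketch of the same substitution in Python rather than VBScript. Python is used only for illustration and is not part of the answer. The names remove_html, tidy_lines and SAMPLE are made up for this example, and tidy_lines is an extra step the batch script does not perform.

```python
import re

# Same pattern as the VBScript above: "<", any run of non-">" characters, ">".
TAG_PATTERN = re.compile(r"<[^>]*>")


def remove_html(data, replacement=""):
    """Replace every tag-like span; the text between tags is left untouched."""
    return TAG_PATTERN.sub(replacement, data)


def tidy_lines(text):
    """Hypothetical extra step (not in the batch script): trim lines, drop empty ones."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Contents of file.html as given in the question.
SAMPLE = """<body>
<p>Try to save me</p>
<p>Try to store me
    If you can :-)</p><table><tr><th>Am Table</th></tr><tr><td>Table 001</td></tr></table>
    <table><tr><th>Am Table</th></tr><tr><td>Table 002</td></tr></table>
</body>
"""

if __name__ == "__main__":
    for line in tidy_lines(remove_html(SAMPLE)):
        print(line)
```

Run on the sample, this suggests that stripping tags alone will not reproduce the output.txt requested in the question: the <th> text "Am Table" is kept, and text from adjacent tags runs together on one line (for example "If you can :-)Am TableTable 001"). Passing a newline as the replacement separates the pieces, but the header text is still present, so dropping it would need additional filtering.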

huangapple
  • Published on January 6, 2020, 15:05:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59607971.html