Get the value between HTML tags and store it in a text file
Question
Peace everyone,
I am trying to store the values between tags in a text file.
Is this possible using CMD?
Examples:
mybat.bat
@echo off
findstr /i "<p>" "file.html" >output_01.txt
findstr /i "<p>.*</p>" "file.html" >output_02.txt
findstr /i "<td>.*</td>" "file.html" >output_03.txt
file.html
<body>
<p>Try to save me</p>
<p>Try to store me
If you can :-)</p><table><tr><th>Am Table</th></tr><tr><td>Table 001</td></tr></table>
<table><tr><th>Am Table</th></tr><tr><td>Table 002</td></tr></table>
</body>
output_01.txt
<p>Try to save me</p>
<p>Try to store me
output_02.txt
<p>Try to save me</p>
output_03.txt
If you can :-)</p><table><tr><th>Am Table</th></tr><tr><td>Table 001</td></tr></table>
<table><tr><th>Am Table</th></tr><tr><td>Table 002</td></tr></table>
I need the output to contain only the values between the tags!
Example:
output.txt
Try to save me
Try to store me If you can :-)
Table 001
Table 002
Answer 1
Score: 1
Refer to this article.
You can try something like this in batch, using a regular expression through VBScript:
@echo off
Title Remove All HTML Tags using Regex with vbscript
Set "InputFile=input.html"
Set "OutPutFile=output.txt"
Call :RemoveHTML "%InputFile%" "%OutPutFile%"
Start /MAX Notepad "%OutputFile%" & Exit
::--------------------------------------------------------
:RemoveHTML <InputFile> <OutPutFile>
(
echo WScript.StdOut.WriteLine RemoveHTML(Data^)
echo Function RemoveHTML(Data^)
echo Dim strPattern, strReplace, strResult,oRegExp
echo ' Read the whole HTML document from standard input
echo Data = WScript.StdIn.ReadAll
echo strPattern = "<[^>]*>"
echo strReplace = ""
echo Set oRegExp = New RegExp
echo oRegExp.Global = True
echo oRegExp.IgnoreCase = True
echo oRegExp.Pattern = strPattern
echo strResult = oRegExp.Replace(Data,strReplace^)
echo RemoveHTML = strResult
echo End Function
)>"%tmp%\%~n0.vbs"
cscript //nologo "%tmp%\%~n0.vbs" < "%~1" > "%~2"
If Exist "%tmp%\%~n0.vbs" Del "%tmp%\%~n0.vbs"
Exit /B
::-------------------------------------------------------
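Note that the pattern `<[^>]*>` in the script above strips every tag, so the `<th>` text ("Am Table") stays in the output and the two-line paragraph stays on two lines. If only the contents of `<p>` and `<td>` are wanted, as in the question's `output.txt`, a variation along the following lines might work. This is a minimal sketch, not a tested solution: the `Script` variable and the temporary file name are made up for illustration, the input is assumed to be the question's `file.html` in an ANSI-compatible encoding, and a regular expression is only an approximation of real HTML parsing (nested or malformed tags will not be handled).

```bat
@echo off
setlocal
rem Assumed file names, taken from the question; adjust as needed.
set "InputFile=file.html"
set "OutputFile=output.txt"
rem Hypothetical temporary script path, used for this sketch only.
set "Script=%tmp%\%~n0_extract.vbs"
(
echo ' Print the inner text of every p and td element, one per line
echo Dim re, m, text
echo Set re = New RegExp
echo re.Global = True
echo re.IgnoreCase = True
echo ' [\s\S]*? matches lazily across line breaks; \1 closes the same tag
echo re.Pattern = "<(p|td)\b[^>]*>([\s\S]*?)</\1>"
echo For Each m In re.Execute(WScript.StdIn.ReadAll^)
echo   ' SubMatches(1^) is the second group: the text between the tags
echo   text = Replace(Replace(m.SubMatches(1^), vbCr, ""^), vbLf, " "^)
echo   WScript.StdOut.WriteLine Trim(text^)
echo Next
)>"%Script%"
cscript //nologo "%Script%" < "%InputFile%" > "%OutputFile%"
if exist "%Script%" del "%Script%"
endlocal
```

With the question's `file.html`, this should write the four lines shown in the desired `output.txt`, because the line break inside the second paragraph is replaced by a single space and the `<th>` cells are never matched.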