Data filter: removing all tags from an HTML file


I am 清秀时光, a blogger on 靠谱客. This post covers a data filter that removes all tags from an HTML file; I am sharing it here as a reference.

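Write a C++ program that reads a file, filters out all the tags, and writes the remaining content to a new file.

1. Read one character from the file

2. Determine whether the character is part of an HTML tag

3. Output every character that is not part of an HTML tag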

/* --------------------------------------------
 * This program reads an HTML file and writes
 * the text without the tags to a new file.
 * -------------------------------------------- */
#include <iostream>  // Required for cin, cout, cerr
#include <fstream>   // Required for ifstream, ofstream
#include <string>    // Required for string
#include <cstdlib>   // Required for exit
using namespace std;

int main()
{
    // Declare objects
    char ch;
    bool text_state(true);  // true while outside a tag, false while inside one
    string infile, outfile;
    ifstream html;
    ofstream htmltext;
    const string separator = "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n";

    // Prompt user for name of input file
    cout << "Enter the name of the input file:\n(*.*, such as: demo.html)\n";
    cout << "Make sure the file is in the current project directory!\n";
    cin >> infile;
    cout << separator;

    // Prompt user for name of output file
    cout << "Enter the name of the output file:\n";
    cin >> outfile;

    // Open the input file
    html.open(infile.c_str());
    if (html.fail())
    {
        cout << separator;
        cerr << "Error opening input file" << endl;
        exit(1);
    }

    // Open the output file
    htmltext.open(outfile.c_str());
    if (htmltext.fail())
    {
        cout << separator;
        cerr << "Error opening output file" << endl;
        exit(1);
    }

    // Read one character at a time; the loop ends when a read fails (end of file)
    while (html.get(ch))
    {
        if (text_state)
        {
            if (ch == '<')
                text_state = false;  // Beginning of a tag, change states
            else
                htmltext << ch;      // Still text, write to the file
        }
        else
        {
            // Command state, no output required
            if (ch == '>')
                text_state = true;   // End of tag, change states
        }
    }

    html.close();
    htmltext.close();

    cout << separator;
    cout << "Conversion succeeded!\n";
    cout << "Look for " << outfile << " in the current directory.\n";
    cout << separator;
    return 0;
}

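Now you can try the program on an HTML file. Keep in mind that it only strips the tags and still leaves room for improvement: if the text outside the tags contains a lot of irrelevant material, the result will be disappointing. It is best to test with a typical HTML file, such as: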

<html>
<head>
<title>My first HTML page</title>
</head>
<body>
<p>The content of the body element is displayed in the browser.</p>
<p>The content of the title element is displayed in the browser's title bar.</p>
</body>
</html>

Reposted from: https://www.cnblogs.com/Genesis2018/p/8304749.html


Category: c/c++
Published: 2023-10-18 02:46:19
Link: https://www.kaopuke.com/article/k-p-k_13_u_23_o_26_f2_13__23__18_5.html
