Overview
    // ... (the start of the method, including the declarations of `paragraphs` and `sections`, is not shown)
    section = new ArrayList<>();
    for (Element para : paragraphs) {
        section.add(new Paragraph(para));
        // Total visible text length of the paragraphs collected so far
        int plc = 0;
        for (Paragraph p : section)
            plc += p.letter_count;
        // Close the current section once it reaches the threshold
        if (plc >= (SECTION_WORD_COUNT / 0.9)) {
            StringBuilder sb = new StringBuilder();
            for (Paragraph p : section) {
                // Strip presentation attributes from images before emitting the HTML
                p.paragraph.select("img").removeAttr("width").removeAttr("height").removeAttr("style").removeAttr("class");
                sb.append(p.paragraph.outerHtml());
            }
            sections.add(sb.toString());
            section.clear();
        }
    }
    // Flush whatever paragraphs remain as a final section
    if (section.size() > 0) {
        StringBuilder sb = new StringBuilder();
        for (Paragraph p : section) {
            p.paragraph.select("img").removeAttr("width").removeAttr("height").removeAttr("style").removeAttr("class");
            sb.append(p.paragraph.outerHtml());
        }
        sections.add(sb.toString());
    }
    // If the last section is too short, merge it into the second-to-last one
    int last_sec_idx = sections.size() - 1;
    int last_sec_idx2 = sections.size() - 2;
    if (last_sec_idx2 >= 0) {
        String lastSection = sections.get(last_sec_idx);
        if (lastSection.length() < SECTION_WORD_COUNT / 3) {
            sections.set(last_sec_idx2, sections.get(last_sec_idx2) + lastSection);
            sections.remove(last_sec_idx);
        }
    }
    return sections;
}

private static class Paragraph {
    private Element paragraph;
    private int letter_count;

    public Paragraph(Element p) {
        this.paragraph = p;
        Element tmp = p.clone();
        // Exclude <pre> blocks so code listings do not count toward the text length
        try {
            tmp.select("pre").remove();
        } catch (Exception e) {}
        this.letter_count = tmp.text().length();
    }
}
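The grouping and tail-merge logic can be tried out without jsoup. The following is a minimal sketch under stated assumptions, not the original method: plain strings stand in for the parsed paragraph elements, the class name `SectionSplitter`, the method `split`, and its `sectionWordCount` parameter are hypothetical, and a running total replaces the re-summation done on every iteration above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SectionSplitter {

    /**
     * Groups paragraphs into sections. A section is closed once its accumulated
     * length reaches sectionWordCount / 0.9; a final section shorter than
     * sectionWordCount / 3 is merged into the one before it.
     * (Hypothetical helper: strings stand in for jsoup elements.)
     */
    static List<String> split(List<String> paragraphs, int sectionWordCount) {
        List<String> sections = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int letters = 0; // running length of the section being built

        for (String para : paragraphs) {
            current.append(para);
            letters += para.length();
            if (letters >= sectionWordCount / 0.9) {
                sections.add(current.toString());
                current.setLength(0);
                letters = 0;
            }
        }
        // Flush the remaining paragraphs as a final section
        if (current.length() > 0) {
            sections.add(current.toString());
        }

        // Merge a too-short final section into the second-to-last one
        int last = sections.size() - 1;
        if (last >= 1 && sections.get(last).length() < sectionWordCount / 3) {
            sections.set(last - 1, sections.get(last - 1) + sections.get(last));
            sections.remove(last);
        }
        return sections;
    }

    public static void main(String[] args) {
        // Target 9 gives a close threshold of about 10 and a merge threshold of 3
        System.out.println(split(Arrays.asList("aaaaaa", "bbbbbb", "c"), 9));
        System.out.println(split(Arrays.asList("aaaaaa", "bbbbbb", "cccc"), 9));
    }
}
```

With a target of 9, the first call merges the one-character tail into the preceding section, while the second keeps the four-character tail as a section of its own. As in the original, the close threshold is compared against text length, whereas the merge check uses the length of the emitted string; with real HTML the latter includes markup, so the two are not measured on the same scale.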