Counting Word Frequencies in a File


I am 虚心微笑, a blogger on 靠谱客. This post shows how to count how often each word appears in a file, and I hope it serves as a useful reference.

I. Requirement: count how often each word appears in the file The Old Man and the Sea .txt

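Analysis:
First the file has to be read, so consider which way of reading suits the task and performs well.
Then decide how to store what was read and how to count it: identical words accumulate into one total.
Approach:
      Use the BufferedReader buffered stream. Its readLine() returns one line per call, and its internal buffer reduces the number of underlying reads, which improves I/O performance.
      Use a Map to store the counts. A Map stores key-value pairs; a HashMap keeps no particular order and its keys are unique. Because keys are unique, a word that appears again is not stored as a new key; its value is incremented by 1 instead, which produces the count.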
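
The counting idea can be tried on an in-memory string before any file is involved. This is a minimal sketch, not the post's Util class: the class name MapCountingSketch and the method count are made up for illustration, and Map.merge is swapped in for the explicit containsKey check used later in the post.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class, used only to illustrate the unique-key counting idea.
final class MapCountingSketch {

    // Counts every word of an in-memory string.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split can return an empty element, which is not a word
            }
            // Stores 1 for a new key, otherwise adds 1 to the existing value.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // HashMap does not guarantee any iteration order for the printed entries.
        System.out.println(count("the old man and the sea"));
    }
}
```

Each distinct word ends up as exactly one key, so the map size equals the number of distinct words.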

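II. Implementation:
Utility class: steps that perform the same operation are wrapped into methods in one class. This avoids redundant code, keeps the logic clear, and lets the methods be reused.
   Methods in the Util class:
  1. Reading the file contents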

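//Reads the file and returns its words
//Needs from java.io: BufferedReader, File, FileReader, IOException
    public  String[] readFile(File filename) throws IOException {
        //Collects the lines; StringBuilder is enough because only one thread uses it
        StringBuilder sbu = new StringBuilder();
        //try-with-resources closes the reader even if reading fails
        //BufferedReader can read whole lines and buffers the underlying reads
        try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
            String line;
            while (null != (line = br.readLine())) {
                //readLine() drops the line terminator, so append a space;
                //otherwise the last word of one line fuses with the first word of the next
                sbu.append(line).append(' ');
            }
        }
        //Split on runs of non-word characters; the backslash must be escaped in a Java string
        //Note: \W also splits at apostrophes, so "don't" becomes "don" and "t"
        return sbu.toString().split("\\W+");
    }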
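
Joining the whole book into one string is not strictly required. The sketch below splits each line as soon as it is read, so line boundaries can never fuse two words. It is an alternative under stated assumptions, not the post's method: the names LineReaderSketch and readWords are hypothetical, the file is assumed to be UTF-8, and java.nio.file.Files is used in place of FileReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: reads a file line by line and collects its words.
final class LineReaderSketch {

    static List<String> readWords(Path file) throws IOException {
        List<String> words = new ArrayList<>();
        // Assumption: the text is UTF-8 encoded.
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                for (String word : line.split("\\W+")) {
                    if (!word.isEmpty()) { // blank lines and leading separators yield ""
                        words.add(word);
                    }
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real text.
        Path tmp = Files.createTempFile("words", ".txt");
        Files.write(tmp, "He was an old man\nwho fished alone".getBytes(StandardCharsets.UTF_8));
        System.out.println(readWords(tmp));
        Files.delete(tmp);
    }
}
```

Returning a List instead of an array also avoids the empty leading element that split produces when the text starts with a separator.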

2. Core code for the requirement: counting frequencies

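    //Needs java.util.Map and java.util.HashMap in addition to the java.io imports
    public static void countWord(File filename) throws IOException {
        Util util = new Util();
        String[] str = util.readFile(filename);
        Map<String,Integer>  map = new  HashMap<String,Integer>();
        //Iterate over the array (that is, the whole text)
        //The comparison is case-sensitive, so "The" and "the" are counted separately
        for(String word : str){
            //Splitting can produce empty strings; they are not words
            if(word.isEmpty()){
                continue;
            }
            if(map.containsKey(word)){
                //Word already present: add 1
                map.put(word, map.get(word)+1);
            }else{
                //First occurrence: start at 1
                map.put(word, 1);
            }
        }
        util.printCount(map);
    }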

3. The print method in the utility class

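//Prints every word with its count
public  void printCount(Map<String,Integer> map) {
    //Typed entries avoid the raw Map and a second lookup per key
    for (Map.Entry<String,Integer> entry : map.entrySet()) {
        String str = entry.getKey() + " -> " + entry.getValue();
        System.out.println(str);
    }
}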
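
A HashMap prints its entries in no particular order, which makes a long word list hard to read. One possible refinement, sketched below, is to sort the entries by count before printing. The names SortedPrintSketch and sortByCount are hypothetical and not part of the post's Util class; ties are broken alphabetically, which is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: orders word counts from most to least frequent.
final class SortedPrintSketch {

    static List<Map.Entry<String, Integer>> sortByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            // Higher counts first; equal counts fall back to alphabetical order.
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("sea", 1);
        counts.put("the", 2);
        counts.put("man", 1);
        for (Map.Entry<String, Integer> entry : sortByCount(counts)) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

With the sample map in main, "the" is listed first, followed by "man" and "sea".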

The code above fulfills the requirement.

To count the occurrences of one specific word as well, the following method can be used:

filename: the file to read; w: the word to count

public  static void countWord(File filename,String w) throws IOException {
    Util util = new Util();
    //Call the file-reading method
    String[] str =  util.readFile(filename);
    Map<String,Integer>  map = new  HashMap<String,Integer>();
    //Loop over the words that were read
    for(int i= 0 ; i<str.length; i++) {
        //Compare each word with the parameter; words that differ are simply skipped
        if(w.equals(str[i])){
            if(map.containsKey(w)) {
                //Word already present: add 1
                map.put(w, map.get(w)+1);
            }else{
                //First occurrence: start at 1
                map.put(w, 1);
            }
        }
    }
    //Call the print method; nothing is printed if the word never occurs
    util.printCount(map);
}
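
Since only one word is tracked here, a plain counter would also do the job without a Map. The sketch below shows that variant with an optional case-insensitive comparison. The names SingleWordCountSketch and countOccurrences and the ignoreCase parameter are hypothetical additions, not something the post's code provides.

```java
// Hypothetical class: counts one target word with a plain int counter.
final class SingleWordCountSketch {

    static int countOccurrences(String[] words, String target, boolean ignoreCase) {
        int count = 0;
        for (String word : words) {
            boolean match = ignoreCase ? word.equalsIgnoreCase(target) : word.equals(target);
            if (match) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] words = "The old man and the sea".split("\\W+");
        System.out.println(countOccurrences(words, "the", false)); // prints 1
        System.out.println(countOccurrences(words, "the", true));  // prints 2
    }
}
```

Unlike the Map-based version, this one reports 0 when the word never occurs instead of printing nothing.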

To write the content that was read into a .txt file, the following method can be used

readname: the source file

writefile: the target file

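//Copies the source file to the target file
//Needs from java.io: File, FileInputStream, FileOutputStream, InputStream, OutputStream, IOException
public  void writeFile(File readname, File writefile) throws IOException {
    //try-with-resources closes both streams even if copying fails
    try (InputStream in = new FileInputStream(readname);
         OutputStream out = new FileOutputStream(writefile)) {
        //A larger buffer than 5 bytes means far fewer read and write calls
        byte[] bytes = new byte[8192];
        int len;
        while ((len = in.read(bytes)) != -1) {
            //Write only the len bytes actually read; writing the whole buffer
            //would append stale bytes whenever a read does not fill it
            out.write(bytes, 0, len);
        }
    }
}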
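
For a plain file-to-file copy, the standard library can also do the looping itself. The following is a minimal sketch using java.nio.file.Files.copy, offered as an alternative rather than as the post's method; the class name CopyFileSketch and the choice to overwrite an existing target are assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical class: copies a file without a hand-written read/write loop.
final class CopyFileSketch {

    static void copy(Path source, Path target) throws IOException {
        // Assumption: an existing target file should be overwritten.
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for the real source and target.
        Path source = Files.createTempFile("source", ".txt");
        Path target = Files.createTempFile("target", ".txt");
        Files.write(source, "old man".getBytes(StandardCharsets.UTF_8));
        copy(source, target);
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
        Files.delete(source);
        Files.delete(target);
    }
}
```

Because the copy is byte for byte, a source whose length is not a multiple of any buffer size still arrives unchanged.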
