Android FFmpeg Audio Resampling

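I'm 个性猎豹, a blogger on 靠谱客. This article covers audio resampling with FFmpeg on Android, and I'm sharing it in the hope that it serves as a useful reference.

I needed to resample PCM audio from a 44100 Hz sample rate to 16000 Hz. After a day of experimenting, it finally works!

There is little material on this online, and what I found was scattered, so I had to work it out step by step. It made me wish I had a team: learning alone is slower and more roundabout, although it does build your ability to learn on your own.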

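Basic principles

Getting to the point: audio resampling, simply put, means interpolating the original PCM (going from a lower sample rate to a higher one) or decimating it (going from a higher sample rate to a lower one). Plain linear interpolation or sample dropping is not enough, though: decimation can cause aliasing, and interpolation can produce spectral images. So an anti-aliasing filter is applied before decimation and an anti-imaging filter after interpolation. Both are implemented as low-pass filters.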
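To make the filtering idea concrete, here is a minimal sketch of band-limited resampling for mono 16-bit PCM. It is not how libswresample is implemented internally; it only shows the principle: each output sample is a weighted sum of nearby input samples, with weights from a Hann-windowed sinc low-pass filter whose cutoff is half the lower of the two sample rates. The function name `resampleSinc` and the default of 16 taps per side are hypothetical, illustrative choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: band-limited resampling of mono 16-bit PCM with a
// Hann-windowed sinc kernel. The kernel is a low-pass filter whose cutoff is
// half of the LOWER of the two rates, so it acts as the anti-aliasing filter
// when downsampling and as the anti-imaging filter when upsampling.
std::vector<int16_t> resampleSinc(const std::vector<int16_t>& in,
                                  int inRate, int outRate, int halfTaps = 16) {
    assert(inRate > 0 && outRate > 0 && halfTaps > 0);
    const double kPi = 3.14159265358979323846;
    // Cutoff in cycles per input sample (0.5 would be the input Nyquist frequency).
    const double cutoff = 0.5 * std::min(inRate, outRate) / inRate;
    const size_t outLen = static_cast<size_t>(
        static_cast<uint64_t>(in.size()) * static_cast<uint64_t>(outRate) /
        static_cast<uint64_t>(inRate));
    const long inLen = static_cast<long>(in.size());
    std::vector<int16_t> out(outLen);
    for (size_t n = 0; n < outLen; ++n) {
        // Position of output sample n on the input time axis.
        const double t = static_cast<double>(n) * inRate / outRate;
        const long center = static_cast<long>(std::floor(t));
        double acc = 0.0;
        for (long k = center - halfTaps + 1; k <= center + halfTaps; ++k) {
            if (k < 0 || k >= inLen) continue;  // samples outside the signal count as silence
            const double x = t - static_cast<double>(k);  // distance in input samples
            const double arg = 2.0 * cutoff * x;
            const double sinc = (std::fabs(arg) < 1e-12) ? 1.0 : std::sin(kPi * arg) / (kPi * arg);
            const double window = 0.5 * (1.0 + std::cos(kPi * x / halfTaps));  // Hann taper
            acc += in[static_cast<size_t>(k)] * 2.0 * cutoff * sinc * window;
        }
        // Round and clip back into the 16-bit range.
        acc = std::max(-32768.0, std::min(32767.0, std::round(acc)));
        out[n] = static_cast<int16_t>(acc);
    }
    return out;
}
```

Because the cutoff follows the lower rate, the same kernel covers both directions: for 44100 to 16000 Hz it removes everything above 8000 Hz before samples are picked, and for upsampling it suppresses the images above the original Nyquist frequency. More taps give a sharper filter at a higher cost.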

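Audio basics

Sample rate:

Sound in nature is an analog signal. Sampling means measuring that signal at fixed intervals, and the number of measurements taken per second is the sample rate (44100 Hz, for example, means 44100 samples per second per channel).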
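As a small illustration, the sketch below samples a sine tone: sample n is taken at time n / sampleRate seconds, so one second of audio is 44100 samples at 44100 Hz but only 16000 samples at 16000 Hz. `sampleSine` and its amplitude parameter are hypothetical; a helper like this is handy for generating test input for a resampler.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: sample a sine tone of frequencyHz at sampleRate samples
// per second and store it as 16-bit PCM. amplitude is a fraction of full scale.
std::vector<int16_t> sampleSine(double frequencyHz, int sampleRate,
                                double seconds, double amplitude = 0.5) {
    assert(sampleRate > 0 && seconds >= 0.0);
    assert(amplitude >= 0.0 && amplitude <= 1.0);
    const double kPi = 3.14159265358979323846;
    const size_t count = static_cast<size_t>(std::llround(seconds * sampleRate));
    std::vector<int16_t> pcm(count);
    for (size_t n = 0; n < count; ++n) {
        const double t = static_cast<double>(n) / sampleRate;  // time of sample n in seconds
        const double value = amplitude * std::sin(2.0 * kPi * frequencyHz * t);
        pcm[n] = static_cast<int16_t>(std::lround(value * 32767.0));
    }
    return pcm;
}
```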

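Sample precision (bit depth):

Each sample has to be stored as a number, so how finely should values be represented? That is what precision is for. If the largest value is 255 and the smallest is 0, every sample must lie between 0 and 255: the precision is 1/256 and the bit depth is 8 bits. If the largest value is 65535 and the smallest is 0, every sample must lie between 0 and 65535: the precision is 1/65536 and the bit depth is 16 bits. Precision is the smallest step a sample can resolve; the smaller the step (that is, the more bits), the closer the sampled values are to the original analog signal. In practice, 16-bit PCM such as FFmpeg's AV_SAMPLE_FMT_S16 is signed and ranges from -32768 to 32767, but it has the same 65536 levels.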
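The relation between bit depth and precision fits in a few lines of code. This sketch assumes floating-point input in [-1.0, 1.0] and signed output codes, as in signed 16-bit PCM; both helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: number of distinct values an N-bit sample can take (2^N).
uint32_t quantizationLevels(int bits) {
    assert(bits > 0 && bits < 32);
    return 1u << bits;
}

// Hypothetical helper: quantize x in [-1.0, 1.0] to a signed N-bit code
// (for 16 bits: -32768..32767). Values outside the range are clipped.
int32_t quantizeSigned(double x, int bits) {
    assert(bits > 1 && bits < 32);
    const int32_t maxCode = (1 << (bits - 1)) - 1;  // e.g. 32767 for 16 bits
    const int32_t minCode = -(1 << (bits - 1));     // e.g. -32768 for 16 bits
    const long code = std::lround(x * maxCode);
    if (code > maxCode) return maxCode;
    if (code < minCode) return minCode;
    return static_cast<int32_t>(code);
}
```

An 8-bit sample has 256 levels and a 16-bit sample 65536, so the 16-bit step is 256 times finer.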

Channels:

As the name says, this is the number of channels: mono (1), stereo (2), three channels, and so on.

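Audio frame:

An audio frame typically holds 1024 samples per channel (AAC frames, for example, have 1024; MP3 frames have 1152). The data size of one frame = channels × samples per frame × bytes per sample. Raw PCM has no inherent frames, so when processing a PCM file the "frame" is simply how many samples you read at a time.

This concept is important for understanding the FFmpeg resampling code below!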
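The frame-size formula, plus how long one frame lasts, as a sketch with hypothetical helper names. For packed formats the byte count matches what `av_samples_get_buffer_size` reports with `align = 1`, which the resampling code below uses to size its read buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes in one frame of packed PCM
// = channels * samples per channel * bytes per sample.
size_t frameBytes(int channels, int samplesPerChannel, int bytesPerSample) {
    assert(channels > 0 && samplesPerChannel > 0 && bytesPerSample > 0);
    return static_cast<size_t>(channels) * samplesPerChannel * bytesPerSample;
}

// Hypothetical helper: duration of one frame in milliseconds.
double frameDurationMs(int samplesPerChannel, int sampleRate) {
    assert(samplesPerChannel > 0 && sampleRate > 0);
    return 1000.0 * samplesPerChannel / sampleRate;
}
```

For 16-bit stereo, a 1024-sample frame is 2 × 1024 × 2 = 4096 bytes, and at 44100 Hz it lasts about 23.2 ms.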

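PCM format

Since we are resampling PCM, we need a basic understanding of the PCM format.

PCM data comes in two layouts: planar and packed (interleaved).

So far I have only dealt with the packed format, shown in the figure below:
[Figure: PCM format]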
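The difference between the two layouts for stereo 16-bit samples, as a sketch: packed stores L0 R0 L1 R1 ... in a single buffer, while planar stores all left samples in one buffer and all right samples in another. In FFmpeg terms, `AV_SAMPLE_FMT_S16` is packed and `AV_SAMPLE_FMT_S16P` is planar. The two conversion functions are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: split packed (interleaved) samples into one plane per channel.
std::vector<std::vector<int16_t>> packedToPlanar(const std::vector<int16_t>& packed,
                                                 int channels) {
    assert(channels > 0 && packed.size() % channels == 0);
    const size_t frames = packed.size() / channels;
    std::vector<std::vector<int16_t>> planes(channels, std::vector<int16_t>(frames));
    for (size_t i = 0; i < frames; ++i)
        for (int c = 0; c < channels; ++c)
            planes[c][i] = packed[i * channels + c];
    return planes;
}

// Hypothetical helper: interleave per-channel planes back into one packed buffer.
std::vector<int16_t> planarToPacked(const std::vector<std::vector<int16_t>>& planes) {
    assert(!planes.empty());
    const size_t channels = planes.size();
    const size_t frames = planes[0].size();
    std::vector<int16_t> packed(frames * channels);
    for (size_t c = 0; c < channels; ++c) {
        assert(planes[c].size() == frames);
        for (size_t i = 0; i < frames; ++i)
            packed[i * channels + c] = planes[c][i];
    }
    return packed;
}
```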

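We usually use the higher-precision 16-bit format, where each PCM sample takes 2 bytes stored little-endian (low byte first, high byte second). In Java, with `array` holding the raw bytes:

 byte low = array[0];   // low byte comes first
 byte high = array[1];  // high byte comes second
 // & 0xff stops the low byte's sign extension from clobbering the high bits
 short pcm = (short) ((low & 0xff) | (high << 8));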
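The same little-endian decoding in C++, together with the inverse, as a minimal sketch; `decodeS16LE` and `encodeS16LE` are hypothetical names. Combining the two bytes as unsigned values first and only then converting to `int16_t` sidesteps the sign-extension issue that the `& 0xff` mask guards against in Java.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decode 16-bit little-endian PCM bytes (low byte first).
std::vector<int16_t> decodeS16LE(const std::vector<uint8_t>& bytes) {
    assert(bytes.size() % 2 == 0);
    std::vector<int16_t> samples(bytes.size() / 2);
    for (size_t i = 0; i < samples.size(); ++i) {
        const uint16_t lo = bytes[2 * i];
        const uint16_t hi = bytes[2 * i + 1];
        // Combine as unsigned, then reinterpret as a two's-complement int16.
        samples[i] = static_cast<int16_t>(static_cast<uint16_t>(lo | (hi << 8)));
    }
    return samples;
}

// Hypothetical helper: encode samples back into little-endian bytes.
std::vector<uint8_t> encodeS16LE(const std::vector<int16_t>& samples) {
    std::vector<uint8_t> bytes(samples.size() * 2);
    for (size_t i = 0; i < samples.size(); ++i) {
        const uint16_t u = static_cast<uint16_t>(samples[i]);
        bytes[2 * i] = static_cast<uint8_t>(u & 0xFF);  // low byte
        bytes[2 * i + 1] = static_cast<uint8_t>(u >> 8);  // high byte
    }
    return bytes;
}
```

For example, the bytes `34 12` decode to 0x1234, `FF FF` to -1, and `00 80` to -32768.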

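Resampling

Straight to the code. The JNI function below reads a 16-bit packed PCM file one frame at a time, resamples it with libswresample, and writes the result to the target file: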

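// FFmpeg is a C library, so its headers are wrapped in extern "C" in C++.
#include <jni.h>
#include <cstdint>
#include <cstdio>
extern "C" {
#include <libavutil/channel_layout.h>
#include <libavutil/mathematics.h>
#include <libavutil/mem.h>
#include <libavutil/opt.h>
#include <libavutil/samplefmt.h>
#include <libswresample/swresample.h>
}

/**
 * Resample a raw PCM file (16-bit signed, packed).
 * @param env              JNI environment
 * @param clazz            Java class declaring the native method (unused)
 * @param sourcePath       source PCM file
 * @param targetPath       target PCM file
 * @param sourceSampleRate source sample rate
 * @param targetSampleRate target sample rate
 * @param sourceChannels   source channel count
 * @param targetChannels   target channel count
 * @return 0 on success, -1 on failure
 */
static jint jniResample(JNIEnv *env, jclass clazz,
                        jstring sourcePath, jstring targetPath,
                        jint sourceSampleRate, jint targetSampleRate,
                        jint sourceChannels, jint targetChannels) {
    int result = -1;
    FILE *source = nullptr;
    FILE *target = nullptr;
    SwrContext *context = nullptr;
    int64_t sourceChannelLayout;
    int64_t targetChannelLayout;
    // 16-bit signed, packed (interleaved) PCM for both input and output
    const AVSampleFormat sampleFormat = AV_SAMPLE_FMT_S16;
    const int bytesPerSample = av_get_bytes_per_sample(sampleFormat);
    int sourceLineSize;
    int sourceBufferSize;
    int sourceSamples;
    uint8_t **sourceData = nullptr;
    int targetLineSize;
    int targetBufferSize;
    int targetSamples;
    int targetMaxSamples;
    uint8_t **targetData = nullptr;
    size_t readBytes;
    int inputSamples;
    int converted;
    const char *_sourcePath = env->GetStringUTFChars(sourcePath, nullptr);
    const char *_targetPath = env->GetStringUTFChars(targetPath, nullptr);
    // Open the files
    source = fopen(_sourcePath, "rb");
    target = fopen(_targetPath, "wb");
    if (!source || !target) {
        goto cleanup;
    }
    // Resampling context
    context = swr_alloc();
    if (!context) {
        goto cleanup;
    }
    // Channel layouts. getChannelLayout() is the author's helper and is not shown;
    // presumably it maps a channel count to an AV_CH_LAYOUT_* mask (older FFmpeg
    // versions also offer av_get_default_channel_layout() for this).
    sourceChannelLayout = getChannelLayout(sourceChannels);
    targetChannelLayout = getChannelLayout(targetChannels);
    // Configuration. "in_channel_layout"/"out_channel_layout" are the older options;
    // newer FFmpeg releases use AVChannelLayout-based options instead, so check
    // which API your FFmpeg build provides.
    av_opt_set_int(context, "in_channel_layout", sourceChannelLayout, 0);
    av_opt_set_int(context, "in_sample_rate", sourceSampleRate, 0);
    av_opt_set_sample_fmt(context, "in_sample_fmt", sampleFormat, 0);
    av_opt_set_int(context, "out_channel_layout", targetChannelLayout, 0);
    av_opt_set_int(context, "out_sample_rate", targetSampleRate, 0);
    av_opt_set_sample_fmt(context, "out_sample_fmt", sampleFormat, 0);
    // Initialize
    if (swr_init(context) < 0) {
        goto cleanup;
    }
    // Input: read 1024 samples (per channel) at a time
    sourceSamples = 1024;
    // Input frame size in bytes = channels * samples * bytes per sample
    sourceBufferSize = av_samples_get_buffer_size(&sourceLineSize, sourceChannels,
                                                  sourceSamples, sampleFormat, 1);
    if (sourceBufferSize < 0) {
        goto cleanup;
    }
    // Allocate the input buffer
    if (av_samples_alloc_array_and_samples(&sourceData, &sourceLineSize, sourceChannels,
                                           sourceSamples, sampleFormat, 0) < 0) {
        goto cleanup;
    }
    // Output: (maximum) number of output samples per input frame, rounded up
    targetMaxSamples = targetSamples = (int) av_rescale_rnd(sourceSamples, targetSampleRate,
                                                            sourceSampleRate, AV_ROUND_UP);
    // Allocate the output buffer
    if (av_samples_alloc_array_and_samples(&targetData, &targetLineSize, targetChannels,
                                           targetSamples, sampleFormat, 0) < 0) {
        goto cleanup;
    }
    // Read the file one frame at a time
    while ((readBytes = fread(sourceData[0], 1, sourceBufferSize, source)) > 0) {
        // The last read may be a partial frame: convert only the samples actually read
        inputSamples = (int) (readBytes / (sourceChannels * bytesPerSample));
        if (inputSamples <= 0) {
            break;
        }
        // Output samples needed: samples still buffered in the resampler plus the
        // new input, converted to the output rate and rounded up
        targetSamples = (int) av_rescale_rnd(swr_get_delay(context, sourceSampleRate) + inputSamples,
                                             targetSampleRate, sourceSampleRate, AV_ROUND_UP);
        if (targetSamples > targetMaxSamples) {
            av_freep(&targetData[0]);
            if (av_samples_alloc(targetData, &targetLineSize, targetChannels,
                                 targetSamples, sampleFormat, 1) < 0) {
                goto cleanup;
            }
            targetMaxSamples = targetSamples;
        }
        // Resample; the return value is the number of samples per channel produced
        converted = swr_convert(context, targetData, targetSamples,
                                (const uint8_t **) sourceData, inputSamples);
        if (converted < 0) {
            goto cleanup;
        }
        // Packed format: all channels are interleaved in targetData[0]
        targetBufferSize = converted * targetChannels * bytesPerSample;
        // Write to file
        fwrite(targetData[0], 1, targetBufferSize, target);
    }
    // Flush: passing NULL input drains the samples still buffered in the resampler
    while ((converted = swr_convert(context, targetData, targetMaxSamples, nullptr, 0)) > 0) {
        targetBufferSize = converted * targetChannels * bytesPerSample;
        fwrite(targetData[0], 1, targetBufferSize, target);
    }
    result = converted < 0 ? -1 : 0;
cleanup:
    // Release everything that was acquired.
    // av_freep() and swr_free() tolerate NULL; fclose() does not.
    if (sourceData) {
        av_freep(&sourceData[0]);
    }
    av_freep(&sourceData);
    if (targetData) {
        av_freep(&targetData[0]);
    }
    av_freep(&targetData);
    swr_free(&context);
    if (source) {
        fclose(source);
    }
    if (target) {
        fclose(target);
    }
    env->ReleaseStringUTFChars(sourcePath, _sourcePath);
    env->ReleaseStringUTFChars(targetPath, _targetPath);
    return result;
}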
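The output-buffer sizing in the loop (`swr_get_delay` plus `av_rescale_rnd` with `AV_ROUND_UP`) is plain ceiling division. Here is that arithmetic as a dependency-free sketch; `rescaleUp` and `maxOutputSamples` are hypothetical stand-ins, and unlike `av_rescale_rnd` they do not guard against overflow of `a * b`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for av_rescale_rnd(a, b, c, AV_ROUND_UP) restricted to
// non-negative inputs whose product fits in 64 bits: returns ceil(a * b / c).
int64_t rescaleUp(int64_t a, int64_t b, int64_t c) {
    assert(a >= 0 && b >= 0 && c > 0);
    return (a * b + c - 1) / c;
}

// Hypothetical helper: upper bound on the output samples one conversion call can
// produce, i.e. (input samples still buffered + new input samples), rescaled from
// the input rate to the output rate and rounded up.
int64_t maxOutputSamples(int64_t delayedInputSamples, int64_t inputSamples,
                         int64_t inRate, int64_t outRate) {
    return rescaleUp(delayedInputSamples + inputSamples, outRate, inRate);
}
```

For a 1024-sample frame going from 44100 Hz to 16000 Hz with nothing buffered, this gives 372 output samples (1024 × 16000 / 44100 ≈ 371.5, rounded up); with 16 samples still buffered it gives 378. Rounding up guarantees the output buffer is never too small.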