Keras baseline using MFCC for Sound Classification
Mel-Frequency Cepstral Coefficients (MFCC) feature extraction for Sound Classification
https://www.kaggle.com/seriousran/mfcc-feature-extraction-for-sound-classification
Using MFCC features for sound classification tasks such as the Cornell Birdcall Identification competition is common. Feature extraction takes a few hours on the full Cornell Birdcall Identification dataset, so I will share the extracted features as a dataset after running the extraction in Colab.
In this notebook, I use only 3 mp3 files per bird class (see the LIMIT variable).
Please enjoy it, and don't forget to vote.
Feel free to offer advice.
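As a sketch of what the per-file extraction might look like (`extract_mfcc`, `pad_or_truncate`, and the 40-coefficient / 500-frame sizes here are illustrative choices, not the notebook's exact code):

```python
import numpy as np

def extract_mfcc(path, sr=22050, n_mfcc=40):
    """Load one audio file and return its MFCC matrix of shape (n_mfcc, n_frames).
    Requires librosa, which can decode mp3 via audioread/ffmpeg."""
    import librosa
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

def pad_or_truncate(feat, max_len=500):
    """Zero-pad (or cut) along the time axis so every clip yields
    a fixed-size input suitable for a Keras model."""
    n_mfcc, n_frames = feat.shape
    if n_frames >= max_len:
        return feat[:, :max_len]
    out = np.zeros((n_mfcc, max_len), dtype=feat.dtype)
    out[:, :n_frames] = feat
    return out
```

Fixing the time dimension this way is one simple option; masking or pooling over variable-length inputs are common alternatives.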
Mel-Frequency Cepstral Coefficients (MFCCs)

The log-spectrum already takes perceptual sensitivity into account on the magnitude axis, by expressing magnitudes on a logarithmic scale. The remaining dimension to treat is the frequency axis.
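For instance, the log power spectrum of one windowed frame can be computed directly (a minimal numpy sketch; the 512-sample frame and dB scaling are illustrative choices):

```python
import numpy as np

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)              # one second of a 440 Hz tone
frame = x[:512] * np.hanning(512)            # one Hann-windowed frame
power = np.abs(np.fft.rfft(frame)) ** 2      # power spectrum of the frame
log_spectrum = 10 * np.log10(power + 1e-12)  # magnitudes on a logarithmic (dB) axis
```

The spectral peak lands near bin 440 * 512 / 16000 ≈ 14, the FFT bin closest to the tone's frequency.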
There exists a multitude of different criteria with which to quantify accuracy on the frequency scale and, correspondingly, a multitude of perceptually motivated frequency scales, including the equivalent rectangular bandwidth (ERB) scale, the Bark scale, and the mel-scale. Probably due mainly to tradition rather than any principled choice, in this context we will focus on the mel-scale. This scale describes the perceptual distance between pitches of different frequencies.
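The mel-scale is commonly approximated by a closed-form mapping (the O'Shaughnessy formula used by most MFCC implementations; a sketch):

```python
import numpy as np

def hz_to_mel(f):
    # perceptual pitch distance: roughly linear below ~1 kHz, logarithmic above
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # exact inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)
```

By construction, 1000 Hz maps to approximately 1000 mel; the scale is anchored there.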
Though the argumentation for the MFCCs is not without problems, they have become the most widely used feature in speech and audio recognition applications. They are used because they work, have relatively low computational complexity, and are straightforward to implement. Simply stated:
if you are unsure which inputs to give to a speech or audio recognition engine, try the MFCCs first.

The beneficial properties of the MFCCs include:
They quantify the gross shape of the spectrum (the spectral envelope), which is important for, e.g., identifying vowels, while discarding fine spectral structure (micro-level detail), which is often less important. They thus focus on the part of the signal that is typically most informative.
Their calculation is straightforward and computationally reasonably efficient.
Their performance is well tested and well understood.
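The computation described above — power spectrum, triangular mel filterbank, logarithm, DCT — can be sketched end to end in plain numpy (the filter and coefficient counts are typical but arbitrary choices, and library implementations such as librosa differ in details like normalization):

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    # triangular filters with centers spaced evenly on the mel scale
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(0.0, hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):      # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct2(x):
    # unnormalized DCT-II along the last axis: keeps the gross spectral shape
    # in the low-order coefficients, fine structure in the high-order ones
    N = x.shape[-1]
    n = np.arange(N)
    basis = np.cos(np.pi * np.outer(np.arange(N), 2 * n + 1) / (2 * N))
    return basis @ x

def mfcc_frame(frame, sr, n_filters=26, n_coeffs=13):
    # one frame in, n_coeffs cepstral coefficients out
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    mel_energy = mel_filterbank(n_filters, len(frame), sr) @ power
    return dct2(np.log(mel_energy + 1e-10))[:n_coeffs]
```

Truncating the DCT to the first 13 coefficients is exactly the envelope/fine-structure split described above: it keeps the smooth spectral envelope and throws away the micro-level detail.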
Some of the issues with the MFCCs include:
The choice of perceptual scale is not well motivated. Scales such as the ERB scale or gammatone filterbanks might be better suited. However, these alternative filterbanks have not demonstrated consistent benefit, whereby the mel-scale has persisted.
MFCCs are not robust to noise. That is, the performance of MFCCs in the presence of additive noise, in comparison to other features, has not always been good.
The choice of triangular weighting filters w_{k,h} is arbitrary and not based on well-grounded motivations. Alternatives have been presented, but they have not gained popularity, probably due to their minor effect on the outcome.
MFCCs work well for analysis, but they are problematic for synthesis. Namely, it is difficult to find an inverse transform (from MFCCs back to power spectra) that is simultaneously unbiased (accurate) and congruent with its physical representation (a power spectrum must be non-negative).
ref: https://wiki.aalto.fi/display/ITSP/Cepstrum+and+MFCC
ref: https://melon1024.github.io/ssc/