Vienna University of Technology
Institute of Software Technology and Interactive Systems
Information & Software Engineering Group
Music Information Retrieval
Automatically Recognizing Environments and Situations Based on Sound Analysis
Each of us hears sounds every day in different places, and there are features that distinguish each source and make it unique.
We can easily tell the difference between the noise we hear while taking a shower and the noise of a cafeteria, but we might have trouble recognizing an approaching underground train or tram.
In this document we present a study on classifying everyday sounds like these automatically with machine learning, an approach that could also be applied to other purposes such as film genre classification or music genre classification. We carry out several experiments with different features and algorithms to compare their performance.
Environmental Classification
In order to create the classifier, we first record sounds with a voice recorder in different environments (21 categories) and cut short samples from the streams. We then extract audio features that characterise the samples (RH, RP, SSD and combinations thereof) and create WEKA files from these feature vectors to train the classifier with several well-known algorithms (k-nearest neighbours (kNN), Random Forests, Naive Bayes and support vector machines). We carry out a number of experiments, comparing and analysing the results and attempting to explain them. We also make some suggestions for future work to improve the classifier's accuracy.
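As an illustration, a WEKA training file (in the ARFF format) pairs each feature vector with its category label. The attribute names, values and category list below are hypothetical toy values, not the actual extracted features:

```text
@relation environments

@attribute rh_1 numeric
@attribute rh_2 numeric
@attribute class {shower,cafeteria,tram}

@data
0.12,0.87,shower
0.55,0.31,cafeteria
```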
How to obtain the training set from the samples
Evaluation
In order to train the classifier, we use WEKA, a suite of machine learning algorithms developed at the University of Waikato in New Zealand. It provides tools such as data visualization as well as the algorithms we need for the training process.
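For instance, evaluating the kNN classifier (WEKA's IBk implementation) with 10-fold cross-validation on a feature file can be invoked from the command line roughly as follows; the jar path and the ARFF file name are placeholders:

```sh
java -cp weka.jar weka.classifiers.lazy.IBk -K 3 -t features_ssd.arff -x 10
```

Here -K sets the number of neighbours, -t names the training file and -x the number of cross-validation folds.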
We will run four experiments:

- the first with 100 samples belonging to 10 of the 21 categories;
- the second with all 1357 samples across the 21 categories (both trials evaluated with cross-validation on the training set);
- the third using the feature vectors of the samples from the second trial as training set and the vectors from a new stream as test set;
- the fourth using the stream from the third experiment filtered to a lower frequency so that it has mobile-phone quality.
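The kNN algorithm used in these trials can be sketched, in its simplest 1-nearest-neighbour form, as follows. The feature vectors and labels here are hypothetical toy values, not the actual RH/RP/SSD descriptors:

```java
public class NearestNeighbourDemo {
    // Euclidean distance between two feature vectors of equal length.
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    // Return the label of the training vector closest to the query (1-NN).
    static String classify(double[][] train, String[] labels, double[] query) {
        int best = 0;
        for (int i = 1; i < train.length; i++)
            if (dist(train[i], query) < dist(train[best], query)) best = i;
        return labels[best];
    }

    public static void main(String[] args) {
        // Two toy training vectors, one per category.
        double[][] train = {{0.1, 0.9}, {0.8, 0.2}};
        String[] labels = {"shower", "tram"};
        System.out.println(classify(train, labels, new double[]{0.2, 0.8}));
    }
}
```

For k > 1, the k closest training vectors vote on the label; WEKA's IBk handles this, as well as distance weighting, internally.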
The best results in all the experiments can be seen in the following table.
Experiment | Precision | Recall |
---|---|---|
1st experiment | 84.5% | 83% |
2nd experiment | 74.5% | 74.29% |
3rd experiment | 21.1% | 11.50% |
4th experiment | 9.5% | 11.91% |
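Precision and recall are averaged over the categories: per class, precision is the fraction of samples assigned to that class that truly belong to it, and recall is the fraction of the class's samples that were found. A small sketch computing macro-averaged values from a confusion matrix (the matrix below holds hypothetical counts, not results from the study):

```java
import java.util.Arrays;

public class PrecisionRecall {
    // Macro-averaged precision and recall from a confusion matrix,
    // where m[actual][predicted] counts the classified samples.
    static double[] macroPrecisionRecall(int[][] m) {
        int n = m.length;
        double p = 0, r = 0;
        for (int c = 0; c < n; c++) {
            int tp = m[c][c], colSum = 0, rowSum = 0;
            for (int i = 0; i < n; i++) { colSum += m[i][c]; rowSum += m[c][i]; }
            p += colSum == 0 ? 0 : (double) tp / colSum; // precision of class c
            r += rowSum == 0 ? 0 : (double) tp / rowSum; // recall of class c
        }
        return new double[]{p / n, r / n};
    }

    public static void main(String[] args) {
        // Toy 2-class confusion matrix (hypothetical numbers).
        int[][] m = {{8, 2}, {4, 6}};
        System.out.println(Arrays.toString(macroPrecisionRecall(m)));
    }
}
```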
The detailed results for every algorithm and vector file are shown in the tables below:
First experiment
Vector file | Algorithm | Parameter | Results |
---|---|---|---|
RH | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
RP | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
SSD | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
RP+RH | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
RP+SSD | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
RP+SSD+RH | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
SSD+RH | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result |
Second experiment
Vector file | Algorithm | Parameter | Results |
---|---|---|---|
RH | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
RP | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
SSD | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
RP+RH | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
RP+SSD | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
RP+SSD+RH | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
SSD+RH | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result |
Third experiment
First sub-experiment
Vector file | Algorithm | Parameter | Results |
---|---|---|---|
SSD | Random Forests | I=50 | Result |
SMO | E=1 | Result | |
E=2 | Result | ||
SSD+RH | SMO | E=2 | Result |
Second sub-experiment
Vector file | Algorithm | Parameter | Results |
---|---|---|---|
SSD | Random Forests | I=50 | Result |
SMO | E=1 | Result | |
E=2 | Result | ||
SSD+RH | SMO | E=2 | Result |
Fourth experiment
Vector file | Algorithm | Parameter | Results |
---|---|---|---|
SSD | Random Forests | I=50 | Result |
SMO | E=1 | Result | |
E=2 | Result | ||
SSD+RH | SMO | E=2 | Result |
Programmes
We required eight different tools to complete this study: AMR to MP3 Converter to convert the AMR files recorded with a mobile phone to MP3,
MP3 Cutter to cut the streams into samples, Audacity to process the sound (adding silences, converting from WAV to MP3 and changing the sampling frequency of the samples),
the programming language Perl to run the scripts, WEKA to train the classifier and run the experiments, Eclipse to develop the Java programmes, the programming
language Java to write those programmes, and LaTeX to typeset the paper.
Corpus
- Streams for first and second experiment (MP3)
- Samples for first experiment (MP3)
- Stream for third and fourth experiment (MP3)
Downloads
- AMR to MP3 Converter
- MP3 Cutter
- Audacity
- WEKA
- Eclipse
- Java
- LaTeX Maker LaTeX Editor, LaTeX distribution (for Windows)
- Perl scripts
- Java Feature Extraction
- Java programme for the third experiment: executable | source code
- Java programme that executes Perl scripts (source code included)
- Ground truth files: Third experiment, first sub-experiment | Third experiment, second sub-experiment | Fourth experiment

Paper
- Juan Valentín Cortés Fábregas: "Automatically Recognizing Environments and Situations Based on Sound Analysis"
created in July 2010 by Juan Valentín Cortés Fábregas