Vienna University of Technology
Institute of Software Technology and Interactive Systems
Information & Software Engineering Group
Music Information Retrieval
Automatically Recognizing Environments and Situations Based on Sound Analysis
Each of us hears sounds every day in different places, and there are features that distinguish each source and make it unique.
We can easily tell the difference between the noise we hear while taking a shower and the noise of a cafeteria, but we might have trouble recognizing an approaching underground train or tram.
In this document we present a study on classifying everyday sounds like these automatically with machine learning, an approach that could also be applied to other purposes such as film genre classification or music genre classification. We carry out several experiments with different features and algorithms to compare their performance.
Environmental Classification
In order to create the classifier, we first record sounds with a voice recorder in different environments (21 categories) and cut short samples from the streams. We then extract audio features that characterise the samples (RH, RP, SSD and combinations thereof) and create WEKA files from these feature vectors to train the classifier with several well-known algorithms (k-nearest neighbours (kNN), Random Forests, Naive Bayes and support vector machines). We carry out a number of experiments, comparing and analysing the results and attempting to explain them. We also make some suggestions for future work to improve the classifier's accuracy.
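As an illustration, a WEKA training file (in the ARFF format) pairs each feature vector with its category label. The attribute names, values and category list below are hypothetical toy values, not the actual extracted features:

```text
@relation environments

@attribute rh_1 numeric
@attribute rh_2 numeric
@attribute class {shower,cafeteria,tram}

@data
0.12,0.87,shower
0.55,0.31,cafeteria
```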
How to obtain the training set from the samples
Evaluation
In order to train the classifier, we use WEKA, a suite of machine learning algorithms developed at the University of Waikato in New Zealand. It provides tools such as data visualization as well as the algorithms we need for the training process.
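For instance, evaluating the kNN classifier (WEKA's IBk implementation) with 10-fold cross-validation on a feature file can be invoked from the command line roughly as follows; the jar path and the ARFF file name are placeholders:

```sh
java -cp weka.jar weka.classifiers.lazy.IBk -K 3 -t features_ssd.arff -x 10
```

Here -K sets the number of neighbours, -t names the training file and -x the number of cross-validation folds.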
We will run four experiments:

- the first with 100 samples belonging to 10 of the 21 categories;
- the second with all 1357 samples across the 21 categories (both trials evaluated with cross-validation on the training set);
- the third using the feature vectors of the samples from the second trial as training set and the vectors from a new stream as test set;
- the fourth using the stream from the third experiment filtered to a lower frequency so that it has mobile-phone quality.
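The kNN algorithm used in these trials can be sketched, in its simplest 1-nearest-neighbour form, as follows. The feature vectors and labels here are hypothetical toy values, not the actual RH/RP/SSD descriptors:

```java
public class NearestNeighbourDemo {
    // Euclidean distance between two feature vectors of equal length.
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    // Return the label of the training vector closest to the query (1-NN).
    static String classify(double[][] train, String[] labels, double[] query) {
        int best = 0;
        for (int i = 1; i < train.length; i++)
            if (dist(train[i], query) < dist(train[best], query)) best = i;
        return labels[best];
    }

    public static void main(String[] args) {
        // Two toy training vectors, one per category.
        double[][] train = {{0.1, 0.9}, {0.8, 0.2}};
        String[] labels = {"shower", "tram"};
        System.out.println(classify(train, labels, new double[]{0.2, 0.8}));
    }
}
```

For k > 1, the k closest training vectors vote on the label; WEKA's IBk handles this, as well as distance weighting, internally.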
The best results in all the experiments can be seen in the following table.
Experiment | Precision | Recall |
---|---|---|
1st experiment | 84.5% | 83% |
2nd experiment | 74.5% | 74.29% |
3rd experiment | 21.1% | 11.50% |
4th experiment | 9.5% | 11.91% |
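Precision and recall are averaged over the categories: per class, precision is the fraction of samples assigned to that class that truly belong to it, and recall is the fraction of the class's samples that were found. A small sketch computing macro-averaged values from a confusion matrix (the matrix below holds hypothetical counts, not results from the study):

```java
import java.util.Arrays;

public class PrecisionRecall {
    // Macro-averaged precision and recall from a confusion matrix,
    // where m[actual][predicted] counts the classified samples.
    static double[] macroPrecisionRecall(int[][] m) {
        int n = m.length;
        double p = 0, r = 0;
        for (int c = 0; c < n; c++) {
            int tp = m[c][c], colSum = 0, rowSum = 0;
            for (int i = 0; i < n; i++) { colSum += m[i][c]; rowSum += m[c][i]; }
            p += colSum == 0 ? 0 : (double) tp / colSum; // precision of class c
            r += rowSum == 0 ? 0 : (double) tp / rowSum; // recall of class c
        }
        return new double[]{p / n, r / n};
    }

    public static void main(String[] args) {
        // Toy 2-class confusion matrix (hypothetical numbers).
        int[][] m = {{8, 2}, {4, 6}};
        System.out.println(Arrays.toString(macroPrecisionRecall(m)));
    }
}
```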
The detailed results for every algorithm and vector file are shown in the tables below:
First experiment
Vector file | Algorithm | Parameter | Results |
---|---|---|---|
RH | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
RP | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
SSD | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
RP+RH | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
RP+SSD | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
RP+SSD+RH | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
SSD+RH | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result |
Second experiment
Vector file | Algorithm | Parameter | Results |
---|---|---|---|
RH | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
RP | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
SSD | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
RP+RH | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
RP+SSD | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
RP+SSD+RH | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result | ||
SSD+RH | kNN | k=1 | Result |
k=3 | Result | ||
k=5 | Result | ||
Random Forests | I=10 | Result | |
I=20 | Result | ||
I=50 | Result | ||
Naive-Bayes | Default | Result | |
Support vector machines | E=1 | Result | 
E=2 | Result |
Third experiment
First sub-experiment
Vector file | Algorithm | Parameter | Results |
---|---|---|---|
SSD | Random Forests | I=50 | Result |
SMO | E=1 | Result | |
E=2 | Result | ||
SSD+RH | SMO | E=2 | Result |
Second sub-experiment
Vector file | Algorithm | Parameter | Results |
---|---|---|---|
SSD | Random Forests | I=50 | Result |
SMO | E=1 | Result | |
E=2 | Result | ||
SSD+RH | SMO | E=2 | Result |
Fourth experiment
Vector file | Algorithm | Parameter | Results |
---|---|---|---|
SSD | Random Forests | I=50 | Result |
SMO | E=1 | Result | |
E=2 | Result | ||
SSD+RH | SMO | E=2 | Result |
Programmes
We required eight different tools to complete this study: AMR to MP3 Converter to convert the AMR files recorded with a mobile phone to MP3,
MP3 Cutter to cut the streams into samples, Audacity to process the sound (adding silences, converting from WAV to MP3 and changing the sampling frequency of the samples),
the programming language Perl to run the scripts, WEKA to train the classifier and run the experiments, Eclipse to develop the Java programmes, the programming
language Java to write those programmes, and LaTeX to typeset the paper.
Corpus
- Streams for first and second experiment (MP3)
- Samples for first experiment (MP3)
- Stream for third and fourth experiment (MP3)
Downloads
- AMR to MP3 Converter
- MP3 Cutter
- Audacity
- WEKA
- Eclipse
- Java
- LaTeX Maker LaTeX Editor, LaTeX distribution (for Windows)
- Perl scripts
- Java Feature Extraction
- Java programme for the third experiment: executable | source code
- Java programme that executes Perl scripts (source code included)
- Ground truth files: Third experiment, first sub-experiment | Third experiment, second sub-experiment | Fourth experiment

Paper
- Juan Valentín Cortés Fábregas: "Automatically Recognizing Environments and Situations Based on Sound Analysis"
created in July 2010 by Juan Valentín Cortés Fábregas