# FAST-RIR: FAST NEURAL DIFFUSE ROOM IMPULSE RESPONSE GENERATOR (ICASSP 2022)
This is the official implementation of our neural-network-based fast diffuse room impulse response generator ([**FAST-RIR**](https://arxiv.org/pdf/2110.04057.pdf)) for generating room impulse responses (RIRs) for a given rectangular acoustic environment. Our model is inspired by the [**StackGAN**](https://github.com/hanzhanggit/StackGAN-Pytorch) architecture. Audio examples and spectrograms of the generated RIRs are available [here](https://anton-jeran.github.io/FRIR/).
**NEWS: We have generalized our FAST-RIR to generate RIRs for any 3D indoor scene represented as a mesh. The official code of our network [**MESH2IR**](https://anton-jeran.github.io/M2IR/) is available.**
## Requirements
```
Python 3.6
PyTorch
python-dateutil
easydict
pandas
torchfile
gdown
librosa
soundfile
acoustics
wavefile
wavfile
pyyaml==5.4.1
pickle
```
## Embedding
Each normalized embedding is created as follows. If you are using our trained model, you may need to add an extra correction term (CRR) to T60:
```
Listener Position  = LP
Source Position    = SP
Room Dimension     = RD
Reverberation Time = T60
Correction         = CRR

CRR = 0.1 if 0.5 < T60 < 0.6
CRR = 0.2 if T60 > 0.6
CRR = 0   otherwise

Embedding = ([LP_X, LP_Y, LP_Z, SP_X, SP_Y, SP_Z, RD_X, RD_Y, RD_Z, (T60 + CRR)] / 5) - 1
```
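For concreteness, here is a minimal Python sketch of this normalization. The function name and the tuple layout are our own; only the CRR rules and the scale-and-shift formula come from the description above.
```python
import numpy as np

def normalize_embedding(listener_pos, source_pos, room_dim, t60):
    """Build a normalized 10-D embedding from room parameters.

    listener_pos, source_pos, room_dim: (x, y, z) in meters.
    t60: reverberation time in seconds.
    """
    # Correction term needed when using the released pre-trained model.
    if 0.5 < t60 < 0.6:
        crr = 0.1
    elif t60 > 0.6:
        crr = 0.2
    else:
        crr = 0.0

    raw = np.array(list(listener_pos) + list(source_pos) + list(room_dim)
                   + [t60 + crr], dtype=np.float32)
    # Scale by 1/5 and shift by -1, as in the formula above.
    return raw / 5.0 - 1.0

# Example: a 9m x 7m x 3m room with T60 = 0.55s (so CRR = 0.1 applies).
emb = normalize_embedding((4.0, 3.0, 1.5), (6.0, 2.0, 1.5), (9.0, 7.0, 3.0), 0.55)
```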
## Generate RIRs using the trained model
Download the trained model using this command:
```
source download_generate.sh
```
Create a normalized embedding list in pickle format. You can run the following command to generate an example embedding list:
```
python3 example1.py
```
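If you would rather build your own embedding list than run `example1.py`, the sketch below shows one plausible way to serialize a batch of embeddings with pickle. The output file name, the batch layout, and the sweep values are assumptions; check `example1.py` for the exact format the evaluation script expects.
```python
import pickle
import numpy as np

def normalize(lp, sp, rd, t60, crr=0.0):
    # Same formula as the Embedding section: pack, scale by 1/5, shift by -1.
    return np.array(list(lp) + list(sp) + list(rd) + [t60 + crr],
                    dtype=np.float32) / 5.0 - 1.0

# Hypothetical batch: sweep the source along the x-axis of a fixed room.
embeddings = [normalize((4.0, 3.0, 1.5), (sx, 2.0, 1.5), (9.0, 7.0, 3.0), 0.3)
              for sx in np.linspace(1.0, 8.0, num=16)]

# Assumed file name; see example1.py for the path RIR_eval.yml expects.
with open("embeddings.pickle", "wb") as f:
    pickle.dump(embeddings, f, protocol=pickle.HIGHEST_PROTOCOL)
```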
Run the following command inside **code_new** to generate RIRs corresponding to the normalized embedding list. You can find the generated RIRs inside **code_new/Generated_RIRs**:
```
python3 main.py --cfg cfg/RIR_eval.yml --gpu 0
```
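A generated RIR can then be convolved with dry speech to synthesize reverberant audio, the typical downstream use for augmentation. Below is a minimal sketch using `soundfile` and NumPy; the file names are placeholders, and we assume the generated RIRs are mono WAV files at the same sampling rate as the speech.
```python
import numpy as np
import soundfile as sf

# Placeholder paths: one generated RIR and a dry (anechoic) speech clip.
rir, sr_rir = sf.read("code_new/Generated_RIRs/RIR-0.wav")
speech, sr = sf.read("dry_speech.wav")
assert sr == sr_rir, "resample one of the signals so the rates match"

# Full linear convolution simulates the room; rescale to avoid clipping.
reverberant = np.convolve(speech, rir)
reverberant /= np.max(np.abs(reverberant)) + 1e-9

sf.write("reverberant_speech.wav", reverberant, sr)
```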
## Range
Our trained NN-DAS is capable of accurately generating RIRs within the following ranges.
```
Room Dimension X   --> 8m to 11m
Room Dimension Y   --> 6m to 8m
Room Dimension Z   --> 2.5m to 3.5m
Listener Position  --> Any position within the room
Speaker Position   --> Any position within the room
Reverberation Time --> 0.2s to 0.7s
```
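Because accuracy is only claimed inside these ranges, it can help to validate a configuration before embedding it. A small sketch (this helper is our own, not part of the repository):
```python
def in_trained_range(listener_pos, source_pos, room_dim, t60):
    """Return True if the configuration lies within the trained ranges."""
    rx, ry, rz = room_dim
    if not (8.0 <= rx <= 11.0 and 6.0 <= ry <= 8.0 and 2.5 <= rz <= 3.5):
        return False
    if not 0.2 <= t60 <= 0.7:
        return False
    # Listener and source may be anywhere inside the room.
    for pos in (listener_pos, source_pos):
        if not all(0.0 <= p <= d for p, d in zip(pos, room_dim)):
            return False
    return True

assert in_trained_range((4, 3, 1.5), (6, 2, 1.5), (9, 7, 3), 0.3)
assert not in_trained_range((4, 3, 1.5), (6, 2, 1.5), (12, 7, 3), 0.3)
```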
## Training the Model
Run the following command to download the training dataset we created using a [**Diffuse Acoustic Simulator**](https://github.com/GAMMA-UMD/pygsound). You can also train the model using your own dataset.
```
source download_data.sh
```
Run the following command to train the model. You can pass which GPUs to use as an input argument. In this example, we use 2 GPUs:
```
python3 main.py --cfg cfg/RIR_s1.yml --gpu 0,1
```
## Related Works
1) [**IR-GAN: Room Impulse Response Generator for Far-field Speech Recognition (INTERSPEECH 2021)**](https://github.com/anton-jeran/IR-GAN)
2) [**TS-RIR: Translated synthetic room impulse responses for speech augmentation (IEEE ASRU 2021)**](https://github.com/GAMMA-UMD/TS-RIR)
## Citations
If you use our **FAST-RIR** for your research, please consider citing
```
@INPROCEEDINGS{9747846,
  author={Ratnarajah, Anton and Zhang, Shi-Xiong and Yu, Meng and Tang, Zhenyu and Manocha, Dinesh and Yu, Dong},
  booktitle={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={Fast-Rir: Fast Neural Diffuse Room Impulse Response Generator},
  year={2022},
  pages={571-575},
  doi={10.1109/ICASSP43922.2022.9747846}}
```
Our work is inspired by
```
@inproceedings{han2017stackgan,
  author={Han Zhang and Tao Xu and Hongsheng Li and Shaoting Zhang and Xiaogang Wang and Xiaolei Huang and Dimitris Metaxas},
  title={StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks},
  year={2017},
  booktitle={{ICCV}},
}
```
If you use our training dataset generated using the [**Diffuse Acoustic Simulator**](https://github.com/GAMMA-UMD/pygsound) in your research, please consider citing
```
@inproceedings{9052932,
  author={Z. {Tang} and L. {Chen} and B. {Wu} and D. {Yu} and D. {Manocha}},
  booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={Improving Reverberant Speech Training Using Diffuse Acoustic Simulation},
  year={2020},
  pages={6969-6973},
}
```