# AAC Encoder/Decoder Assignment (Multimedia – AUTh)

## Overview

This repository contains a staged implementation of a simplified AAC-like audio encoder/decoder pipeline, developed in the context of the **Multimedia** course at Aristotle University of Thessaloniki (AUTh).

The project follows a progressive, level-based structure:

- **Level 1:** Core analysis/synthesis pipeline
- **Level 2:** Full transform-domain encoding with quantization
- **Level 3:** Psychoacoustic modeling and perceptual coding enhancements

The goal of this work is to:

- Faithfully implement the processing chain specified in the assignment
- Validate correctness using structured and reproducible tests
- Maintain a clean and reproducible project architecture
- Ensure separation between development logic and submission packaging

---

## System Architecture

The implemented pipeline follows a simplified AAC-style structure:

```
Input WAV
↓
SSC (Sequence Segmentation Control)
↓
Filterbank (MDCT)
↓
[TNS / Psychoacoustic Model] (Level 3)
↓
Quantization & Coding (Level 2+)
↓
Bitstream Structuring
↓
-----------------------------------------
↓
Inverse Quantization
↓
Inverse Filterbank (IMDCT)
↓
OLA Reconstruction
↓
Output WAV
```

Each level progressively enables more blocks of this pipeline.

---

## Repository Structure

The repository is organized into source code, material, and report files.

```
root/
│
├── source/
│   ├── level_1/
│   ├── level_2/
│   ├── level_3/
│   ├── core/
│   └── material/
│
├── report/
├── README.md
└── LICENSE
```

### `source/`

Contains all implementation code.

#### `level_x/`

Each level directory contains:

- `level_x.py` (main module entry point)
- `core/` (hard links to the shared implementation files)
- `material/` (hard links to the required helper material)
- `tests/` (level-specific tests)

Each level is **self-contained** to satisfy submission requirements.

#### `core/`

This directory contains the centralized implementation of:

- SSC
- MDCT / IMDCT filterbank
- Quantizer / dequantizer
- Psychoacoustic model
- TNS
- Bitstream handling
- Encoder/decoder pipelines

All development happens here.

Each `level_x/core/` directory references these files using **hard links**, ensuring:

- No code duplication
- No synchronization errors
- Clean development workflow

#### `material/`

Contains helper files provided by the assignment:

- Sample audio
- Reference data
- Required constants or auxiliary files

---

## Development Workflow Design

One of the project requirements was to deliver `level_x` directories that contain all required files and do not reference external directories.

Naively copying files across levels would introduce:

- Code redundancy
- High maintenance cost
- Risk of inconsistencies
- Debugging complexity

To avoid this:

- All implementation lives in `source/core/`
- Each `level_x` directory contains hard links to the files in `core/` and `material/`

This ensures:

- Single source of truth
- Clean modular structure
- Instructor-compliant submission format
- Safe iterative development
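
The hard-link layout described above could be set up with a small helper along the following lines. This is a minimal sketch, not the repository's actual tooling: `link_tree` and `demo_link_tree` are hypothetical names, and `filterbank.py` is only a placeholder file name.

```python
import os
import tempfile
from pathlib import Path


def link_tree(src_dir: Path, dst_dir: Path) -> list:
    """Hard-link every regular file under src_dir into dst_dir, mirroring subfolders."""
    created = []
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file():
            continue  # directories cannot be hard-linked; they are recreated instead
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        if dst.exists():
            dst.unlink()  # drop a stale copy or an old link
        os.link(src, dst)  # both names now refer to the same file on disk
        created.append(dst)
    return created


def demo_link_tree() -> bool:
    """Show that an in-place edit in core/ is visible through level_1/core/."""
    with tempfile.TemporaryDirectory() as tmp:
        core = Path(tmp) / "core"
        level_core = Path(tmp) / "level_1" / "core"
        core.mkdir()
        (core / "filterbank.py").write_text("VERSION = 1\n")
        link_tree(core, level_core)
        (core / "filterbank.py").write_text("VERSION = 2\n")  # in-place rewrite
        same = os.path.samefile(core / "filterbank.py", level_core / "filterbank.py")
        return same and (level_core / "filterbank.py").read_text() == "VERSION = 2\n"
```

Hard links only work within a single filesystem, and a tool that saves by writing a new file and renaming it over the old one breaks the link, so re-running a helper like this before packaging is a cheap safeguard.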

---

# Level Descriptions

---

## Level 1 – Core Transform Pipeline

### Goal

Implement the baseline transform-domain analysis/synthesis chain.

### Implemented Components

- Sequence Segmentation Control (SSC)
- MDCT analysis filterbank
- IMDCT synthesis filterbank
- Overlap-Add (OLA) reconstruction
- End-to-end encoder/decoder:
  - `aac_coder_1()`
  - `i_aac_coder_1()`
- Demo:
  - `demo_aac_1()`

### Testing Coverage

- SSC unit tests
- MDCT / IMDCT correctness tests
- Perfect reconstruction validation
- OLA consistency tests
- Encoder/decoder integration tests

This level ensures transform-domain correctness and signal integrity.
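
The perfect-reconstruction property that the Level 1 tests check can be illustrated with a small NumPy sketch. It assumes a sine window, 50% overlap, and a direct matrix MDCT with a tiny frame length. The function names, the frame length `N=16`, and the use of NumPy are illustrative assumptions and are not taken from `core/`.

```python
import numpy as np


def sine_window(N):
    """Sine window; satisfies w[n]^2 + w[n + N/2]^2 = 1 (Princen-Bradley)."""
    n = np.arange(N)
    return np.sin(np.pi / N * (n + 0.5))


def mdct_basis(N):
    """Cosine matrix of shape (N/2, N) shared by the MDCT and the IMDCT."""
    M = N // 2
    n0 = (M + 1) / 2
    k = np.arange(M)
    n = np.arange(N)
    return np.cos(2 * np.pi / N * np.outer(k + 0.5, n + n0))


def mdct(frame, window, basis):
    """N windowed samples -> N/2 coefficients."""
    return basis @ (window * frame)


def imdct(coeffs, window, basis):
    """N/2 coefficients -> N windowed, time-aliased samples."""
    M = coeffs.shape[0]
    return window * (2.0 / M) * (basis.T @ coeffs)


def analysis_synthesis(x, N=16):
    """Run MDCT -> IMDCT -> overlap-add over 50%-overlapping frames."""
    x = np.asarray(x, dtype=float)
    M = N // 2
    w = sine_window(N)
    C = mdct_basis(N)
    # Zero-pad so every input sample is covered by exactly two frames.
    tail = M + (-len(x)) % M
    padded = np.concatenate([np.zeros(M), x, np.zeros(tail)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - N + 1, M):
        frame = padded[start:start + N]
        out[start:start + N] += imdct(mdct(frame, w, C), w, C)  # OLA
    return out[M:M + len(x)]
```

The aliasing introduced by each frame cancels against its neighbour during overlap-add, so `analysis_synthesis(x)` should match `x` to floating-point precision when no quantization is applied.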

---

## Level 2 – Quantization and Coding

### Goal

Extend Level 1 by implementing transform-domain quantization and coding.

### Implemented Components

- Scalar quantization
- Dequantization
- Basic bitstream formatting
- Integration into encoder/decoder pipeline:
  - `aac_coder_2()`
  - `i_aac_coder_2()`
- Demo:
  - `demo_aac_2()`

### Validation

- SNR-based quality evaluation
- Consistency tests between quantizer and inverse quantizer
- End-to-end reconstruction tests

This level introduces compression and controlled signal degradation.
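
A quantizer/dequantizer consistency check of the kind listed above might look like the sketch below. This README does not state which quantization rule is used, so the code assumes the non-uniform power-law rule familiar from AAC, with a single scale factor gain `a`. The names `quantize`, `dequantize`, and `MAGIC` are hypothetical.

```python
import numpy as np

MAGIC = 0.4054  # rounding offset of the AAC-style power-law quantizer (assumed here)


def quantize(X, a):
    """S(k) = sgn(X(k)) * floor((|X(k)| * 2^(-a/4))^(3/4) + MAGIC)."""
    X = np.asarray(X, dtype=float)
    scaled = np.abs(X) * 2.0 ** (-a / 4.0)
    return (np.sign(X) * np.floor(scaled ** 0.75 + MAGIC)).astype(int)


def dequantize(S, a):
    """X_hat(k) = sgn(S(k)) * |S(k)|^(4/3) * 2^(a/4)."""
    S = np.asarray(S, dtype=float)
    return np.sign(S) * np.abs(S) ** (4.0 / 3.0) * 2.0 ** (a / 4.0)
```

With `a = 0`, the coefficients `[0.0, 1.0, -8.0, 100.0]` quantize to `[0, 1, -5, 32]` and dequantize to roughly `[0, 1, -8.55, 101.59]`. The mismatch is the controlled degradation this level introduces. A larger `a` shrinks the coefficients before rounding and so generally yields smaller integer symbols.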

---

## Level 3 – Psychoacoustic Model & Perceptual Coding

### Goal

Incorporate perceptual modeling to improve compression efficiency.

### Implemented Components

- Psychoacoustic model
- Masking threshold estimation
- TNS (Temporal Noise Shaping)
- Adaptive quantization
- Full encoding/decoding pipeline:
  - `aac_coder_3()`
  - `i_aac_coder_3()`
- Demo:
  - `demo_aac_3()`

### Validation

- Perceptual improvements compared to Level 2
- Stability tests
- End-to-end evaluation

This level approximates a simplified perceptual AAC-like encoder.
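
The core idea behind TNS, linear prediction applied along the frequency axis of one frame's MDCT coefficients, can be sketched as follows. This is a simplified illustration under stated assumptions rather than the code in `core/`: it omits coefficient normalization, predictor quantization, and stability checks, and the names `lpc`, `tns_analysis`, and `tns_synthesis` are hypothetical.

```python
import numpy as np


def lpc(x, order):
    """Predictor coefficients from the autocorrelation normal equations R a = r."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[:len(x) - m], x[m:]) for m in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])


def tns_analysis(X, a):
    """Encoder side (FIR): E[k] = X[k] - sum_l a[l] * X[k - 1 - l]."""
    X = np.asarray(X, dtype=float)
    E = X.copy()
    for k in range(len(X)):
        for l in range(min(len(a), k)):
            E[k] -= a[l] * X[k - 1 - l]
    return E


def tns_synthesis(E, a):
    """Decoder side (all-pole): X[k] = E[k] + sum_l a[l] * X[k - 1 - l]."""
    X = np.array(E, dtype=float)
    for k in range(len(E)):
        for l in range(min(len(a), k)):
            X[k] += a[l] * X[k - 1 - l]  # uses already reconstructed values
    return X
```

In this sketch the decoder filter exactly undoes the encoder filter, so without quantization `tns_synthesis(tns_analysis(X, a), a)` returns `X`. In a full codec the residual `E` is what gets quantized, which shapes the quantization noise in time.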

---

# How to Run

All commands assume you are inside:

```
source/
```

---

## Run Level Demo

Navigate to the desired level (replace `x` with 1, 2, or 3):

```
cd level_x
```

Run:

```bash
python -m level_x <input.wav> <output.wav>
```

Example:

```bash
python -m level_1 material/LicorDeCalandraca.wav material/LicorDeCalandraca_out.wav
```

The demo prints:

- Overall SNR (dB)
- Processing information
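
For reference, an overall SNR figure like the one the demo reports can be computed as shown below. This README does not define the metric, so the sketch assumes the usual ratio of signal energy to reconstruction-error energy, evaluated over the common length of the two signals. `snr_db` is a hypothetical helper name.

```python
import numpy as np


def snr_db(reference, decoded):
    """Overall SNR in dB between the original and the decoded signal."""
    reference = np.asarray(reference, dtype=float)
    decoded = np.asarray(decoded, dtype=float)
    n = min(len(reference), len(decoded))  # compare the common length only
    noise = reference[:n] - decoded[:n]
    noise_power = np.sum(noise ** 2)
    if noise_power == 0:
        return float("inf")  # bit-exact reconstruction
    return 10.0 * np.log10(np.sum(reference[:n] ** 2) / noise_power)
```

For example, a unit-amplitude signal reconstructed with an error of ±0.1 on every sample gives an SNR of about 20 dB.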

---

# Running Tests

Tests are written using `pytest`.

A `pytest.ini` file is included in `source/` to ensure proper module resolution.

From inside `source/`:

```bash
pytest -v
```

Run specific level tests:

```bash
pytest -v level_1/tests
pytest -v level_2/tests
pytest -v level_3/tests
```

Run a specific test file:

```bash
pytest -v level_1/tests/test_SSC.py
```

---

# Reproducibility

- Python version: 3.x
- Tests validated with `pytest`
- No external dependencies beyond assignment requirements
- Deterministic pipeline execution

---

# Disclaimer

This project was developed solely for educational purposes as part of the Multimedia course at AUTh.
It is provided **"as is"**, without any express or implied warranties.
The author assumes no responsibility for any misuse, data loss, security incidents, or damages resulting from the use of this software.
This implementation should not be used in production environments.

All work, modifications, and results are the sole responsibility of the author.