feat: Enhance metadata extraction and formatting, improve extractor architecture, and update documentation

sHa
2025-12-26 13:38:17 +00:00
parent 8f68624529
commit 91df347727
13 changed files with 170 additions and 76 deletions

@@ -7,27 +7,45 @@ This is a Python Terminal User Interface (TUI) application for managing media fi
Key features:
- Recursive directory scanning
- Tree-based file navigation
- Detailed metadata extraction and display
- Detailed metadata extraction and display from multiple sources
- Color-coded information
- Keyboard and mouse navigation
- Extensible for future renaming and editing features
- Extensible extractor and formatter architecture
## Technology Stack
- Python 3.11+
- Textual (TUI framework)
- Mutagen (audio/video metadata)
- PyMediaInfo (detailed track information)
- Mutagen (embedded metadata)
- Python-Magic (MIME type detection)
- UV (package manager)
## Code Structure
- `main.py`: Main application code
- `main.py`: Main application entry point
- `pyproject.toml`: Project configuration and dependencies
- `README.md`: User documentation
- `todo.txt`: Development task list
- `AI_AGENT.md`: This file
- `renamer/`: Main package
- `app.py`: Main Textual application class
- `extractor.py`: MediaExtractor class coordinating multiple extractors
- `extractors/`: Individual extractor classes
- `mediainfo_extractor.py`: PyMediaInfo-based extraction
- `filename_extractor.py`: Filename parsing
- `metadata_extractor.py`: Mutagen-based metadata
- `fileinfo_extractor.py`: Basic file information
- `formatters/`: Data formatting classes
- `media_formatter.py`: Main formatter coordinating display
- `track_formatter.py`: Track information formatting
- `size_formatter.py`: File size formatting
- `date_formatter.py`: Timestamp formatting
- `duration_formatter.py`: Duration formatting
- `resolution_formatter.py`: Resolution formatting
- `text_formatter.py`: Text styling utilities
- `constants.py`: Application constants
- `screens.py`: Additional UI screens
- `test/`: Unit tests
## Instructions for AI Agents
@@ -42,19 +60,35 @@ Key features:
### Development Workflow
1. Read the current code and understand the structure
1. Read the current code and understand the architecture
2. Check the TODO list for pending tasks
3. Implement features incrementally
4. Test changes by running the app with `uv run python main.py [directory]`
5. Update TODO list as tasks are completed
5. Update tests as needed
6. Ensure backward compatibility
### Key Components
- `RenamerApp`: Main application class inheriting from Textual's App
- `MediaTree`: Custom Tree widget with file-specific styling
- `get_media_tracks`: Function to extract media track information
- Various helper functions for formatting and detection
- `MediaExtractor`: Coordinates multiple specialized extractors
- `MediaFormatter`: Formats extracted data for TUI display
- Various extractor classes for different data sources
- Various formatter classes for different data types
### Extractor Architecture
Extractors are responsible for gathering raw data from different sources:
- Extractors share no common base class; each follows the same pattern: an `__init__(file_path)` constructor plus `extract_*()` methods
- The `MediaExtractor` class coordinates multiple extractors and provides a unified `get()` interface
- Extractors return raw data (strings, numbers, dicts) without formatting
### Formatter Architecture
Formatters are responsible for converting raw data into display strings:
- Each formatter provides static methods like `format_*()`
- The `MediaFormatter` coordinates formatters and applies them based on data types
- Formatters handle text styling, color coding, and human-readable representations
### Future Enhancements
@@ -69,6 +103,7 @@ Key features:
- Test navigation, selection, and display
- Verify metadata extraction accuracy
- Check for any errors or edge cases
- Run unit tests with `uv run pytest`
### Contribution Guidelines
@@ -76,5 +111,6 @@ Key features:
- Update documentation as needed
- Ensure the app runs without errors
- Follow the existing code patterns
- Update tests for new functionality
This document should be updated as the project evolves.
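Here is a minimal sketch of the extractor/formatter split described above. It assumes only the conventions the document states: an `__init__(file_path)` constructor, `extract_*()` methods that return raw values, and static `format_*()` methods that turn those values into display strings. `ExampleFileExtractor`, `ExampleSizeFormatter`, and the binary-unit rounding are hypothetical. They are not the project's `FileInfoExtractor` or `SizeFormatter`.

```python
from pathlib import Path


class ExampleFileExtractor:
    """Hypothetical extractor: gathers raw data and applies no formatting."""

    def __init__(self, file_path):
        self.file_path = Path(file_path)

    def extract_size(self) -> int:
        # Raw integer byte count straight from the filesystem
        return self.file_path.stat().st_size

    def extract_file_name(self) -> str:
        return self.file_path.name


class ExampleSizeFormatter:
    """Hypothetical formatter: static methods turn raw values into display text."""

    @staticmethod
    def format_size(size: int) -> str:
        # Step up through binary units until the value drops below 1024
        value = float(size)
        for unit in ("B", "KB", "MB", "GB"):
            if value < 1024:
                return f"{value:.1f} {unit}"
            value /= 1024
        return f"{value:.1f} TB"
```

Keeping the byte count as a raw `int` in the extractor lets tests assert on exact types and values, as `test_extract_size` does with `isinstance(size, int)`. Display decisions such as units and rounding stay in the formatter.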

@@ -6,9 +6,11 @@ A terminal-based (TUI) application for scanning directories, viewing media file
- Recursive directory scanning for video files
- Tree view navigation with keyboard and mouse support
- File details display (size, extensions, metadata)
- Detailed metadata extraction from multiple sources (MediaInfo, filename parsing, embedded metadata)
- Color-coded information display
- Command-based interface with hotkeys
- Container type detection using Mutagen
- Extensible extractor and formatter system
- Support for video, audio, and subtitle track information
## Installation
@@ -54,7 +56,23 @@ renamer /path/to/media/directory
- Mouse clicks supported
- Select a video file to view its details in the right panel
## Development
## Architecture
The application uses a modular architecture with separate extractors and formatters:
### Extractors
- **MediaInfoExtractor**: Extracts detailed track information using PyMediaInfo
- **FilenameExtractor**: Parses metadata from filenames
- **MetadataExtractor**: Extracts embedded metadata using Mutagen
- **FileInfoExtractor**: Provides basic file information
### Formatters
- **MediaFormatter**: Formats extracted data for display
- **TrackFormatter**: Formats video/audio/subtitle track information
- **SizeFormatter**: Formats file sizes
- **DateFormatter**: Formats timestamps
- **DurationFormatter**: Formats time durations
- **ResolutionFormatter**: Formats video resolutions
### Setup Development Environment
```bash
@@ -79,7 +97,7 @@ uv run python main.py /path/to/directory
### Uninstall
```bash
uv tool uninstall renamerq
uv tool uninstall renamer
```
## Supported Video Formats
@@ -96,4 +114,6 @@ uv tool uninstall renamerq
## Dependencies
- textual: TUI framework
- mutagen: Media metadata detection
- pymediainfo: Detailed media track information
- mutagen: Embedded metadata extraction
- python-magic: MIME type detection

@@ -36,7 +36,7 @@ MEDIA_TYPES = {
}
SOURCE_DICT = {
"WEB-DL": ["WEB-DL", "WEBRip", "WEB-Rip", "WEB"],
"WEB-DL": ["WEB-DL", "WEBRip", "WEB-Rip", "WEB", "WEB-DLRip"],
"BDRip": ["BDRip", "BD-Rip", "BDRIP"],
"BDRemux": ["BDRemux", "BD-Remux", "BDREMUX"],
"DVDRip": ["DVDRip", "DVD-Rip", "DVDRIP"],

@@ -64,6 +64,15 @@ class MediaExtractor:
'extension': [
('FileInfo', lambda: self.fileinfo_extractor.extract_extension())
],
'video_tracks': [
('MediaInfo', lambda: self.mediainfo_extractor.extract_video_tracks())
],
'audio_tracks': [
('MediaInfo', lambda: self.mediainfo_extractor.extract_audio_tracks())
],
'subtitle_tracks': [
('MediaInfo', lambda: self.mediainfo_extractor.extract_subtitle_tracks())
],
}
# Conditions for when a value is considered valid
@@ -76,8 +85,10 @@ class MediaExtractor:
'aspect_ratio': lambda x: x is not None,
'hdr': lambda x: x is not None,
'audio_langs': lambda x: x is not None,
'metadata': lambda x: x is not None,
'tracks': lambda x: x is not None and any(x.get(k, []) for k in ['video_tracks', 'audio_tracks', 'subtitle_tracks'])
'tracks': lambda x: x is not None and any(x.get(k, []) for k in ['video_tracks', 'audio_tracks', 'subtitle_tracks']),
'video_tracks': lambda x: x is not None and len(x) > 0,
'audio_tracks': lambda x: x is not None and len(x) > 0,
'subtitle_tracks': lambda x: x is not None and len(x) > 0,
}
def get(self, key: str, source: str | None = None):

@@ -31,23 +31,49 @@ class FilenameExtractor:
def extract_year(self) -> str | None:
"""Extract year from filename"""
year_match = re.search(r'\((\d{4})\)|(\d{4})', self.file_name)
return (year_match.group(1) or year_match.group(2)) if year_match else None
# First try to find year in parentheses (most common and reliable)
paren_match = re.search(r'\((\d{4})\)', self.file_name)
if paren_match:
return paren_match.group(1)
# Fallback: look for year in dots (like .1971.)
dot_match = re.search(r'\.(\d{4})\.', self.file_name)
if dot_match:
return dot_match.group(1)
# Last resort: any 4-digit number (but this is less reliable)
any_match = re.search(r'\b(\d{4})\b', self.file_name)
if any_match:
year = any_match.group(1)
# Basic sanity check: years should be between 1900 and current year + a few years
current_year = 2025 # Update this as needed
if 1900 <= int(year) <= current_year + 10:
return year
return None
def extract_source(self) -> str | None:
"""Extract video source from filename"""
temp_name = re.sub(r'\s*\(\d{4}\)\s*|\s*\d{4}\s*|\.\d{4}\.', '', self.file_name)
temp_name = re.sub(r'\s*\(\d{4}\)\s*|\s*\d{4}\s*|\.\d{4}\.', ' ', self.file_name)
for src, aliases in SOURCE_DICT.items():
for alias in aliases:
if re.search(r'\b' + re.escape(alias) + r'\b', temp_name, re.IGNORECASE):
if alias.upper() in temp_name.upper():
return src
return None
def extract_frame_class(self) -> str | None:
"""Extract frame class from filename (480p, 720p, 1080p, 2160p, etc.)"""
# First check for specific numeric resolutions
match = re.search(r'(\d{3,4})[pi]', self.file_name, re.IGNORECASE)
if match:
height = int(match.group(1))
return self._get_frame_class_from_height(height)
# If no specific resolution found, check for quality indicators
unclassified_indicators = ['SD', 'LQ', 'HD', 'QHD']
for indicator in unclassified_indicators:
if re.search(r'\b' + re.escape(indicator) + r'\b', self.file_name, re.IGNORECASE):
return 'Unclassified'
return 'Unclassified'

@@ -37,10 +37,6 @@ class MetadataExtractor:
return type(self.info).__name__
return self._detect_by_mime()
def extract_meta_description(self) -> str:
"""Extract meta description"""
meta_type = self.extract_meta_type()
return {info['meta_type']: info['description'] for info in MEDIA_TYPES.values()}.get(meta_type, f'Unknown type {meta_type}')
def _detect_by_mime(self) -> str:
"""Detect meta type by MIME"""

@@ -132,7 +132,10 @@ class MediaFormatter:
"label_formatters": [TextFormatter.bold, TextFormatter.uppercase],
}
]
for item in self.extractor.get("tracks").get("video_tracks"):
# Get video tracks
video_tracks = self.extractor.get("video_tracks", "MediaInfo") or []
for item in video_tracks:
data.append(
{
"group": "Tracks Info",
@@ -142,9 +145,10 @@ class MediaFormatter:
"display_formatters": [TextFormatter.green],
}
)
for i, item in enumerate(
self.extractor.get("tracks").get("audio_tracks"), start=1
):
# Get audio tracks
audio_tracks = self.extractor.get("audio_tracks", "MediaInfo") or []
for i, item in enumerate(audio_tracks, start=1):
data.append(
{
"group": "Tracks Info",
@@ -154,9 +158,10 @@ class MediaFormatter:
"display_formatters": [TextFormatter.yellow],
}
)
for i, item in enumerate(
self.extractor.get("tracks").get("subtitle_tracks"), start=1
):
# Get subtitle tracks
subtitle_tracks = self.extractor.get("subtitle_tracks", "MediaInfo") or []
for i, item in enumerate(subtitle_tracks, start=1):
data.append(
{
"group": "Tracks Info",
@@ -195,13 +200,6 @@ class MediaFormatter:
"value": self.extractor.get("artist", "Metadata") or "Not extracted",
"display_formatters": [TextFormatter.grey],
},
{
"label": "Description",
"label_formatters": [TextFormatter.bold],
"value": self.extractor.get("meta_description", "Metadata")
or "Not extracted",
"display_formatters": [TextFormatter.grey],
},
]
return [self._format_data_item(item) for item in data]
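The formatter hunk builds a list of dicts with `label`, `value`, `label_formatters`, and `display_formatters`, then passes each one to `_format_data_item`, whose body isn't shown. Below is a minimal sketch of how such items could be rendered. It assumes formatters are plain `str -> str` callables applied in list order, with styling expressed as Rich-style markup tags. `DemoTextFormatter`, `apply_formatters`, and `format_data_item` are hypothetical stand-ins for `TextFormatter` and `_format_data_item`.

```python
from typing import Callable

Formatter = Callable[[str], str]


class DemoTextFormatter:
    """Hypothetical text styler emitting Rich-style markup tags."""

    @staticmethod
    def uppercase(text: str) -> str:
        return text.upper()

    @staticmethod
    def bold(text: str) -> str:
        return f"[bold]{text}[/bold]"

    @staticmethod
    def green(text: str) -> str:
        return f"[green]{text}[/green]"


def apply_formatters(text: str, formatters: list[Formatter]) -> str:
    # Each formatter receives the previous result, so list order matters
    for fmt in formatters:
        text = fmt(text)
    return text


def format_data_item(item: dict) -> str:
    label = apply_formatters(item["label"], item.get("label_formatters", []))
    value = apply_formatters(str(item["value"]), item.get("display_formatters", []))
    return f"{label}: {value}"
```

Because each formatter wraps the previous result, applying `uppercase` after `bold` in this sketch would also uppercase the tag names (`[BOLD]X[/BOLD]`). This diff does not show whether the project's `TextFormatter` has the same interaction.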

@@ -90,7 +90,6 @@ The Invention of Lying (2009) [720p,ukr,eng].mkv
The Island of Dr. Moreau.(1977).[720p,ukr].mp4
The Killing.(1956).[SD,ukr,eng].mkv
The Love Guru.(2008).[SD,ukr].avi
The Love Guru.(2008).[SD,ukr].avi
The Manchurian Candidate.(2004).[720p,ukr,eng].mkv
The Mortal Instruments. City of Bones.(2013).[720p,ukr,eng].mkv
The Mutant Chronicles.(2008).[SD,ukr,eng].mkv
@@ -203,3 +202,10 @@ Upgrade.(2018).[SD,eng].mkv
Человек с бульвара Капуцинов (1987) [1080p,rus] [tmdbid-45227].mkv
Человек-амфибия (1961) [SD,rus] [tmdbid-43685].avi
Чук и Гек (1953) [SD,rus] [tmdbid-148412].avi
The long title.(2008).[SD 720p,ukr].avi
The_long_title.(2008).2K.1440p.ukr.avi
The long title (2008) SD 720p UKR.avi
The long title (2008) UHD 1440p ENG.mp4
The long title (2008) UHD 1440 ENG.mp4
The long title (2008) 8K 4320p ENG.mp4

@@ -4,32 +4,35 @@ from renamer.extractors.fileinfo_extractor import FileInfoExtractor
class TestFileInfoExtractor:
@pytest.fixture
def extractor(self, test_file):
return FileInfoExtractor(test_file)
@pytest.fixture
def test_file(self):
"""Use the filenames.txt file for testing"""
return Path(__file__).parent / "filenames.txt"
def test_extract_size(self, test_file):
def test_extract_size(self, extractor):
"""Test extracting file size"""
size = FileInfoExtractor.extract_size(test_file)
size = extractor.extract_size()
assert isinstance(size, int)
assert size > 0
def test_extract_modification_time(self, test_file):
def test_extract_modification_time(self, extractor):
"""Test extracting modification time"""
mtime = FileInfoExtractor.extract_modification_time(test_file)
mtime = extractor.extract_modification_time()
assert isinstance(mtime, float)
assert mtime > 0
def test_extract_file_name(self, test_file):
def test_extract_file_name(self, extractor):
"""Test extracting file name"""
name = FileInfoExtractor.extract_file_name(test_file)
name = extractor.extract_file_name()
assert isinstance(name, str)
assert name == "filenames.txt"
def test_extract_file_path(self, test_file):
def test_extract_file_path(self, extractor):
"""Test extracting file path"""
path = FileInfoExtractor.extract_file_path(test_file)
path = extractor.extract_file_path()
assert isinstance(path, str)
assert "filenames.txt" in path
assert str(test_file) == path
assert "filenames.txt" in path

@@ -17,7 +17,8 @@ def load_test_filenames():
def test_extract_title(filename):
"""Test title extraction from filename"""
file_path = Path(filename)
title = FilenameExtractor.extract_title(file_path)
extractor = FilenameExtractor(file_path)
title = extractor.extract_title()
# Print filename and extracted title clearly
print(f"\nFilename: \033[1;36m{filename}\033[0m")
print(f"Extracted title: \033[1;32m{title}\033[0m")
@@ -29,7 +30,8 @@ def test_extract_title(filename):
def test_extract_year(filename):
"""Test year extraction from filename"""
file_path = Path(filename)
year = FilenameExtractor.extract_year(file_path)
extractor = FilenameExtractor(file_path)
year = extractor.extract_year()
# Print filename and extracted year clearly
print(f"\nFilename: \033[1;36m{filename}\033[0m")
print(f"Extracted year: \033[1;32m{year}\033[0m")
@@ -42,7 +44,8 @@ def test_extract_year(filename):
def test_extract_source(filename):
"""Test source extraction from filename"""
file_path = Path(filename)
source = FilenameExtractor.extract_source(file_path)
extractor = FilenameExtractor(file_path)
source = extractor.extract_source()
# Print filename and extracted source clearly
print(f"\nFilename: \033[1;36m{filename}\033[0m")
print(f"Extracted source: \033[1;32m{source}\033[0m")
@@ -54,7 +57,8 @@ def test_extract_source(filename):
def test_extract_frame_class(filename):
"""Test frame class extraction from filename"""
file_path = Path(filename)
frame_class = FilenameExtractor.extract_frame_class(file_path)
extractor = FilenameExtractor(file_path)
frame_class = extractor.extract_frame_class()
# Print filename and extracted frame class clearly
print(f"\nFilename: \033[1;36m{filename}\033[0m")
print(f"Extracted frame_class: \033[1;32m{frame_class}\033[0m")

@@ -5,8 +5,8 @@ from renamer.extractors.mediainfo_extractor import MediaInfoExtractor
class TestMediaInfoExtractor:
@pytest.fixture
def extractor(self):
return MediaInfoExtractor()
def extractor(self, test_file):
return MediaInfoExtractor(test_file)
@pytest.fixture
def test_file(self):
@@ -15,18 +15,18 @@ class TestMediaInfoExtractor:
def test_extract_resolution(self, extractor, test_file):
"""Test extracting resolution from media info"""
resolution = extractor.extract_resolution(test_file)
resolution = extractor.extract_resolution()
# Text files don't have video resolution
assert resolution is None
def test_extract_hdr(self, extractor, test_file):
"""Test extracting HDR info"""
hdr = extractor.extract_hdr(test_file)
hdr = extractor.extract_hdr()
# Text files don't have HDR
assert hdr is None
def test_extract_audio_langs(self, extractor, test_file):
"""Test extracting audio languages"""
langs = extractor.extract_audio_langs(test_file)
langs = extractor.extract_audio_langs()
# Text files don't have audio tracks
assert langs == ''

@@ -4,35 +4,29 @@ from renamer.extractors.metadata_extractor import MetadataExtractor
class TestMetadataExtractor:
@pytest.fixture
def extractor(self, test_file):
return MetadataExtractor(test_file)
@pytest.fixture
def test_file(self):
"""Use the filenames.txt file for testing"""
return Path(__file__).parent / "filenames.txt"
def test_extract_title(self, test_file):
def test_extract_title(self, extractor):
"""Test extracting title from metadata"""
title = MetadataExtractor.extract_title(test_file)
title = extractor.extract_title()
# Text files don't have metadata, so should be None
assert title is None
def test_extract_duration(self, test_file):
def test_extract_duration(self, extractor):
"""Test extracting duration from metadata"""
duration = MetadataExtractor.extract_duration(test_file)
duration = extractor.extract_duration()
# Text files don't have duration
assert duration is None
def test_extract_artist(self, test_file):
def test_extract_artist(self, extractor):
"""Test extracting artist from metadata"""
artist = MetadataExtractor.extract_artist(test_file)
artist = extractor.extract_artist()
# Text files don't have artist
assert artist is None
def test_extract_all_metadata(self, test_file):
"""Test extracting all metadata"""
metadata = MetadataExtractor.extract_all_metadata(test_file)
expected = {
'title': None,
'duration': None,
'artist': None
}
assert metadata == expected
assert artist is None