feat: Enhance metadata extraction and formatting, improve extractor architecture, and update documentation

sHa
2025-12-26 13:38:17 +00:00
parent 8f68624529
commit 91df347727
13 changed files with 170 additions and 76 deletions

View File

@@ -7,27 +7,45 @@ This is a Python Terminal User Interface (TUI) application for managing media fi
Key features: Key features:
- Recursive directory scanning - Recursive directory scanning
- Tree-based file navigation - Tree-based file navigation
- Detailed metadata extraction and display - Detailed metadata extraction and display from multiple sources
- Color-coded information - Color-coded information
- Keyboard and mouse navigation - Keyboard and mouse navigation
- Extensible for future renaming and editing features - Extensible extractor and formatter architecture
## Technology Stack ## Technology Stack
- Python 3.11+ - Python 3.11+
- Textual (TUI framework) - Textual (TUI framework)
- Mutagen (audio/video metadata)
- PyMediaInfo (detailed track information) - PyMediaInfo (detailed track information)
- Mutagen (embedded metadata)
- Python-Magic (MIME type detection) - Python-Magic (MIME type detection)
- UV (package manager) - UV (package manager)
## Code Structure ## Code Structure
- `main.py`: Main application code - `main.py`: Main application entry point
- `pyproject.toml`: Project configuration and dependencies - `pyproject.toml`: Project configuration and dependencies
- `README.md`: User documentation - `README.md`: User documentation
- `todo.txt`: Development task list
- `AI_AGENT.md`: This file - `AI_AGENT.md`: This file
- `renamer/`: Main package
- `app.py`: Main Textual application class
- `extractor.py`: MediaExtractor class coordinating multiple extractors
- `extractors/`: Individual extractor classes
- `mediainfo_extractor.py`: PyMediaInfo-based extraction
- `filename_extractor.py`: Filename parsing
- `metadata_extractor.py`: Mutagen-based metadata
- `fileinfo_extractor.py`: Basic file information
- `formatters/`: Data formatting classes
- `media_formatter.py`: Main formatter coordinating display
- `track_formatter.py`: Track information formatting
- `size_formatter.py`: File size formatting
- `date_formatter.py`: Timestamp formatting
- `duration_formatter.py`: Duration formatting
- `resolution_formatter.py`: Resolution formatting
- `text_formatter.py`: Text styling utilities
- `constants.py`: Application constants
- `screens.py`: Additional UI screens
- `test/`: Unit tests
## Instructions for AI Agents ## Instructions for AI Agents
@@ -42,19 +60,35 @@ Key features:
### Development Workflow ### Development Workflow
1. Read the current code and understand the structure 1. Read the current code and understand the architecture
2. Check the TODO list for pending tasks 2. Check the TODO list for pending tasks
3. Implement features incrementally 3. Implement features incrementally
4. Test changes by running the app with `uv run python main.py [directory]` 4. Test changes by running the app with `uv run python main.py [directory]`
5. Update TODO list as tasks are completed 5. Update tests as needed
6. Ensure backward compatibility 6. Ensure backward compatibility
### Key Components ### Key Components
- `RenamerApp`: Main application class inheriting from Textual's App - `RenamerApp`: Main application class inheriting from Textual's App
- `MediaTree`: Custom Tree widget with file-specific styling - `MediaTree`: Custom Tree widget with file-specific styling
- `get_media_tracks`: Function to extract media track information - `MediaExtractor`: Coordinates multiple specialized extractors
- Various helper functions for formatting and detection - `MediaFormatter`: Formats extracted data for TUI display
- Various extractor classes for different data sources
- Various formatter classes for different data types
### Extractor Architecture
Extractors are responsible for gathering raw data from different sources:
- Extractors do not share a base class; each follows the same convention of `__init__(file_path)` plus `extract_*()` methods
- The `MediaExtractor` class coordinates multiple extractors and provides a unified `get()` interface
- Extractors return raw data (strings, numbers, dicts) without formatting
### Formatter Architecture
Formatters are responsible for converting raw data into display strings:
- Each formatter provides static methods like `format_*()`
- The `MediaFormatter` coordinates formatters and applies them based on data types
- Formatters handle text styling, color coding, and human-readable representations
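A rough sketch of the formatter side, under stated assumptions: the method names `format_size` and `format_hhmmss`, the `apply_formatters` helper, and the exact output formats are hypothetical. The project does have `TextFormatter.bold` and `TextFormatter.green`, but their bodies here are guesses that emit Rich-style console markup.

```python
class SizeFormatter:
    @staticmethod
    def format_size(num_bytes: int) -> str:
        """Turn a raw byte count into a human-readable string."""
        size = float(num_bytes)
        for unit in ("B", "KB", "MB", "GB"):
            if size < 1024:
                return f"{size:.2f} {unit}"
            size /= 1024
        return f"{size:.2f} TB"


class DurationFormatter:
    @staticmethod
    def format_hhmmss(seconds: float) -> str:
        """Turn a raw duration in seconds into HH:MM:SS."""
        hours, rest = divmod(int(seconds), 3600)
        minutes, secs = divmod(rest, 60)
        return f"{hours:02d}:{minutes:02d}:{secs:02d}"


class TextFormatter:
    # Assumed implementation: wrap text in console markup tags.
    @staticmethod
    def bold(text: str) -> str:
        return f"[bold]{text}[/bold]"

    @staticmethod
    def green(text: str) -> str:
        return f"[green]{text}[/green]"


def apply_formatters(value, formatters) -> str:
    """Apply formatters left to right, as a coordinator might."""
    for fmt in formatters:
        value = fmt(value)
    return str(value)
```

Because every formatter is a plain function from value to string, a display item can carry a list such as `[SizeFormatter.format_size, TextFormatter.green]` and the coordinator only needs to chain them.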
### Future Enhancements ### Future Enhancements
@@ -69,6 +103,7 @@ Key features:
- Test navigation, selection, and display - Test navigation, selection, and display
- Verify metadata extraction accuracy - Verify metadata extraction accuracy
- Check for any errors or edge cases - Check for any errors or edge cases
- Run unit tests with `uv run pytest`
### Contribution Guidelines ### Contribution Guidelines
@@ -76,5 +111,6 @@ Key features:
- Update documentation as needed - Update documentation as needed
- Ensure the app runs without errors - Ensure the app runs without errors
- Follow the existing code patterns - Follow the existing code patterns
- Update tests for new functionality
This document should be updated as the project evolves. This document should be updated as the project evolves.

View File

@@ -6,9 +6,11 @@ A terminal-based (TUI) application for scanning directories, viewing media file
- Recursive directory scanning for video files - Recursive directory scanning for video files
- Tree view navigation with keyboard and mouse support - Tree view navigation with keyboard and mouse support
- File details display (size, extensions, metadata) - Detailed metadata extraction from multiple sources (MediaInfo, filename parsing, embedded metadata)
- Color-coded information display
- Command-based interface with hotkeys - Command-based interface with hotkeys
- Container type detection using Mutagen - Extensible extractor and formatter system
- Support for video, audio, and subtitle track information
## Installation ## Installation
@@ -54,7 +56,23 @@ renamer /path/to/media/directory
- Mouse clicks supported - Mouse clicks supported
- Select a video file to view its details in the right panel - Select a video file to view its details in the right panel
## Development ## Architecture
The application uses a modular architecture with separate extractors and formatters:
### Extractors
- **MediaInfoExtractor**: Extracts detailed track information using PyMediaInfo
- **FilenameExtractor**: Parses metadata from filenames
- **MetadataExtractor**: Extracts embedded metadata using Mutagen
- **FileInfoExtractor**: Provides basic file information
### Formatters
- **MediaFormatter**: Formats extracted data for display
- **TrackFormatter**: Formats video/audio/subtitle track information
- **SizeFormatter**: Formats file sizes
- **DateFormatter**: Formats timestamps
- **DurationFormatter**: Formats time durations
- **ResolutionFormatter**: Formats video resolutions
### Setup Development Environment ### Setup Development Environment
```bash ```bash
@@ -79,7 +97,7 @@ uv run python main.py /path/to/directory
### Uninstall ### Uninstall
```bash ```bash
uv tool uninstall renamerq uv tool uninstall renamer
``` ```
## Supported Video Formats ## Supported Video Formats
@@ -96,4 +114,6 @@ uv tool uninstall renamerq
## Dependencies ## Dependencies
- textual: TUI framework - textual: TUI framework
- mutagen: Media metadata detection - pymediainfo: Detailed media track information
- mutagen: Embedded metadata extraction
- python-magic: MIME type detection

View File

@@ -36,7 +36,7 @@ MEDIA_TYPES = {
} }
SOURCE_DICT = { SOURCE_DICT = {
"WEB-DL": ["WEB-DL", "WEBRip", "WEB-Rip", "WEB"], "WEB-DL": ["WEB-DL", "WEBRip", "WEB-Rip", "WEB", "WEB-DLRip"],
"BDRip": ["BDRip", "BD-Rip", "BDRIP"], "BDRip": ["BDRip", "BD-Rip", "BDRIP"],
"BDRemux": ["BDRemux", "BD-Remux", "BDREMUX"], "BDRemux": ["BDRemux", "BD-Remux", "BDREMUX"],
"DVDRip": ["DVDRip", "DVD-Rip", "DVDRIP"], "DVDRip": ["DVDRip", "DVD-Rip", "DVDRIP"],

View File

@@ -64,6 +64,15 @@ class MediaExtractor:
'extension': [ 'extension': [
('FileInfo', lambda: self.fileinfo_extractor.extract_extension()) ('FileInfo', lambda: self.fileinfo_extractor.extract_extension())
], ],
'video_tracks': [
('MediaInfo', lambda: self.mediainfo_extractor.extract_video_tracks())
],
'audio_tracks': [
('MediaInfo', lambda: self.mediainfo_extractor.extract_audio_tracks())
],
'subtitle_tracks': [
('MediaInfo', lambda: self.mediainfo_extractor.extract_subtitle_tracks())
],
} }
# Conditions for when a value is considered valid # Conditions for when a value is considered valid
@@ -76,8 +85,10 @@ class MediaExtractor:
'aspect_ratio': lambda x: x is not None, 'aspect_ratio': lambda x: x is not None,
'hdr': lambda x: x is not None, 'hdr': lambda x: x is not None,
'audio_langs': lambda x: x is not None, 'audio_langs': lambda x: x is not None,
'metadata': lambda x: x is not None, 'tracks': lambda x: x is not None and any(x.get(k, []) for k in ['video_tracks', 'audio_tracks', 'subtitle_tracks']),
'tracks': lambda x: x is not None and any(x.get(k, []) for k in ['video_tracks', 'audio_tracks', 'subtitle_tracks']) 'video_tracks': lambda x: x is not None and len(x) > 0,
'audio_tracks': lambda x: x is not None and len(x) > 0,
'subtitle_tracks': lambda x: x is not None and len(x) > 0,
} }
def get(self, key: str, source: str | None = None): def get(self, key: str, source: str | None = None):

View File

@@ -31,8 +31,26 @@ class FilenameExtractor:
def extract_year(self) -> str | None: def extract_year(self) -> str | None:
"""Extract year from filename""" """Extract year from filename"""
year_match = re.search(r'\((\d{4})\)|(\d{4})', self.file_name) # First try to find year in parentheses (most common and reliable)
return (year_match.group(1) or year_match.group(2)) if year_match else None paren_match = re.search(r'\((\d{4})\)', self.file_name)
if paren_match:
return paren_match.group(1)
# Fallback: look for year in dots (like .1971.)
dot_match = re.search(r'\.(\d{4})\.', self.file_name)
if dot_match:
return dot_match.group(1)
# Last resort: any 4-digit number (but this is less reliable)
any_match = re.search(r'\b(\d{4})\b', self.file_name)
if any_match:
year = any_match.group(1)
# Basic sanity check: years should be between 1900 and current year + a few years
current_year = 2025 # Update this as needed
if 1900 <= int(year) <= current_year + 10:
return year
return None
def extract_source(self) -> str | None: def extract_source(self) -> str | None:
"""Extract video source from filename""" """Extract video source from filename"""
@@ -40,14 +58,22 @@ class FilenameExtractor:
for src, aliases in SOURCE_DICT.items(): for src, aliases in SOURCE_DICT.items():
for alias in aliases: for alias in aliases:
if re.search(r'\b' + re.escape(alias) + r'\b', temp_name, re.IGNORECASE): if alias.upper() in temp_name.upper():
return src return src
return None return None
def extract_frame_class(self) -> str | None: def extract_frame_class(self) -> str | None:
"""Extract frame class from filename (480p, 720p, 1080p, 2160p, etc.)""" """Extract frame class from filename (480p, 720p, 1080p, 2160p, etc.)"""
# First check for specific numeric resolutions
match = re.search(r'(\d{3,4})[pi]', self.file_name, re.IGNORECASE) match = re.search(r'(\d{3,4})[pi]', self.file_name, re.IGNORECASE)
if match: if match:
height = int(match.group(1)) height = int(match.group(1))
return self._get_frame_class_from_height(height) return self._get_frame_class_from_height(height)
# If no specific resolution found, check for quality indicators
unclassified_indicators = ['SD', 'LQ', 'HD', 'QHD']
for indicator in unclassified_indicators:
if re.search(r'\b' + re.escape(indicator) + r'\b', self.file_name, re.IGNORECASE):
return 'Unclassified'
return 'Unclassified' return 'Unclassified'

View File

@@ -37,10 +37,6 @@ class MetadataExtractor:
return type(self.info).__name__ return type(self.info).__name__
return self._detect_by_mime() return self._detect_by_mime()
def extract_meta_description(self) -> str:
"""Extract meta description"""
meta_type = self.extract_meta_type()
return {info['meta_type']: info['description'] for info in MEDIA_TYPES.values()}.get(meta_type, f'Unknown type {meta_type}')
def _detect_by_mime(self) -> str: def _detect_by_mime(self) -> str:
"""Detect meta type by MIME""" """Detect meta type by MIME"""

View File

@@ -132,7 +132,10 @@ class MediaFormatter:
"label_formatters": [TextFormatter.bold, TextFormatter.uppercase], "label_formatters": [TextFormatter.bold, TextFormatter.uppercase],
} }
] ]
for item in self.extractor.get("tracks").get("video_tracks"):
# Get video tracks
video_tracks = self.extractor.get("video_tracks", "MediaInfo") or []
for item in video_tracks:
data.append( data.append(
{ {
"group": "Tracks Info", "group": "Tracks Info",
@@ -142,9 +145,10 @@ class MediaFormatter:
"display_formatters": [TextFormatter.green], "display_formatters": [TextFormatter.green],
} }
) )
for i, item in enumerate(
self.extractor.get("tracks").get("audio_tracks"), start=1 # Get audio tracks
): audio_tracks = self.extractor.get("audio_tracks", "MediaInfo") or []
for i, item in enumerate(audio_tracks, start=1):
data.append( data.append(
{ {
"group": "Tracks Info", "group": "Tracks Info",
@@ -154,9 +158,10 @@ class MediaFormatter:
"display_formatters": [TextFormatter.yellow], "display_formatters": [TextFormatter.yellow],
} }
) )
for i, item in enumerate(
self.extractor.get("tracks").get("subtitle_tracks"), start=1 # Get subtitle tracks
): subtitle_tracks = self.extractor.get("subtitle_tracks", "MediaInfo") or []
for i, item in enumerate(subtitle_tracks, start=1):
data.append( data.append(
{ {
"group": "Tracks Info", "group": "Tracks Info",
@@ -195,13 +200,6 @@ class MediaFormatter:
"value": self.extractor.get("artist", "Metadata") or "Not extracted", "value": self.extractor.get("artist", "Metadata") or "Not extracted",
"display_formatters": [TextFormatter.grey], "display_formatters": [TextFormatter.grey],
}, },
{
"label": "Description",
"label_formatters": [TextFormatter.bold],
"value": self.extractor.get("meta_description", "Metadata")
or "Not extracted",
"display_formatters": [TextFormatter.grey],
},
] ]
return [self._format_data_item(item) for item in data] return [self._format_data_item(item) for item in data]

View File

@@ -90,7 +90,6 @@ The Invention of Lying (2009) [720p,ukr,eng].mkv
The Island of Dr. Moreau.(1977).[720p,ukr].mp4 The Island of Dr. Moreau.(1977).[720p,ukr].mp4
The Killing.(1956).[SD,ukr,eng].mkv The Killing.(1956).[SD,ukr,eng].mkv
The Love Guru.(2008).[SD,ukr].avi The Love Guru.(2008).[SD,ukr].avi
The Love Guru.(2008).[SD,ukr].avi
The Manchurian Candidate.(2004).[720p,ukr,eng].mkv The Manchurian Candidate.(2004).[720p,ukr,eng].mkv
The Mortal Instruments. City of Bones.(2013).[720p,ukr,eng].mkv The Mortal Instruments. City of Bones.(2013).[720p,ukr,eng].mkv
The Mutant Chronicles.(2008).[SD,ukr,eng].mkv The Mutant Chronicles.(2008).[SD,ukr,eng].mkv
@@ -203,3 +202,10 @@ Upgrade.(2018).[SD,eng].mkv
Человек с бульвара Капуцинов (1987) [1080p,rus] [tmdbid-45227].mkv Человек с бульвара Капуцинов (1987) [1080p,rus] [tmdbid-45227].mkv
Человек-амфибия (1961) [SD,rus] [tmdbid-43685].avi Человек-амфибия (1961) [SD,rus] [tmdbid-43685].avi
Чук и Гек (1953) [SD,rus] [tmdbid-148412].avi Чук и Гек (1953) [SD,rus] [tmdbid-148412].avi
The long title.(2008).[SD 720p,ukr].avi
The_long_title.(2008).2K.1440p.ukr.avi
The long title (2008) SD 720p UKR.avi
The long title (2008) UHD 1440p ENG.mp4
The long title (2008) UHD 1440 ENG.mp4
The long title (2008) 8K 4320p ENG.mp4

View File

@@ -4,32 +4,35 @@ from renamer.extractors.fileinfo_extractor import FileInfoExtractor
class TestFileInfoExtractor: class TestFileInfoExtractor:
@pytest.fixture
def extractor(self, test_file):
return FileInfoExtractor(test_file)
@pytest.fixture @pytest.fixture
def test_file(self): def test_file(self):
"""Use the filenames.txt file for testing""" """Use the filenames.txt file for testing"""
return Path(__file__).parent / "filenames.txt" return Path(__file__).parent / "filenames.txt"
def test_extract_size(self, test_file): def test_extract_size(self, extractor):
"""Test extracting file size""" """Test extracting file size"""
size = FileInfoExtractor.extract_size(test_file) size = extractor.extract_size()
assert isinstance(size, int) assert isinstance(size, int)
assert size > 0 assert size > 0
def test_extract_modification_time(self, test_file): def test_extract_modification_time(self, extractor):
"""Test extracting modification time""" """Test extracting modification time"""
mtime = FileInfoExtractor.extract_modification_time(test_file) mtime = extractor.extract_modification_time()
assert isinstance(mtime, float) assert isinstance(mtime, float)
assert mtime > 0 assert mtime > 0
def test_extract_file_name(self, test_file): def test_extract_file_name(self, extractor):
"""Test extracting file name""" """Test extracting file name"""
name = FileInfoExtractor.extract_file_name(test_file) name = extractor.extract_file_name()
assert isinstance(name, str) assert isinstance(name, str)
assert name == "filenames.txt" assert name == "filenames.txt"
def test_extract_file_path(self, test_file): def test_extract_file_path(self, extractor):
"""Test extracting file path""" """Test extracting file path"""
path = FileInfoExtractor.extract_file_path(test_file) path = extractor.extract_file_path()
assert isinstance(path, str) assert isinstance(path, str)
assert "filenames.txt" in path assert "filenames.txt" in path
assert str(test_file) == path

View File

@@ -17,7 +17,8 @@ def load_test_filenames():
def test_extract_title(filename): def test_extract_title(filename):
"""Test title extraction from filename""" """Test title extraction from filename"""
file_path = Path(filename) file_path = Path(filename)
title = FilenameExtractor.extract_title(file_path) extractor = FilenameExtractor(file_path)
title = extractor.extract_title()
# Print filename and extracted title clearly # Print filename and extracted title clearly
print(f"\nFilename: \033[1;36m{filename}\033[0m") print(f"\nFilename: \033[1;36m{filename}\033[0m")
print(f"Extracted title: \033[1;32m{title}\033[0m") print(f"Extracted title: \033[1;32m{title}\033[0m")
@@ -29,7 +30,8 @@ def test_extract_title(filename):
def test_extract_year(filename): def test_extract_year(filename):
"""Test year extraction from filename""" """Test year extraction from filename"""
file_path = Path(filename) file_path = Path(filename)
year = FilenameExtractor.extract_year(file_path) extractor = FilenameExtractor(file_path)
year = extractor.extract_year()
# Print filename and extracted year clearly # Print filename and extracted year clearly
print(f"\nFilename: \033[1;36m{filename}\033[0m") print(f"\nFilename: \033[1;36m{filename}\033[0m")
print(f"Extracted year: \033[1;32m{year}\033[0m") print(f"Extracted year: \033[1;32m{year}\033[0m")
@@ -42,7 +44,8 @@ def test_extract_year(filename):
def test_extract_source(filename): def test_extract_source(filename):
"""Test source extraction from filename""" """Test source extraction from filename"""
file_path = Path(filename) file_path = Path(filename)
source = FilenameExtractor.extract_source(file_path) extractor = FilenameExtractor(file_path)
source = extractor.extract_source()
# Print filename and extracted source clearly # Print filename and extracted source clearly
print(f"\nFilename: \033[1;36m{filename}\033[0m") print(f"\nFilename: \033[1;36m{filename}\033[0m")
print(f"Extracted source: \033[1;32m{source}\033[0m") print(f"Extracted source: \033[1;32m{source}\033[0m")
@@ -54,7 +57,8 @@ def test_extract_source(filename):
def test_extract_frame_class(filename): def test_extract_frame_class(filename):
"""Test frame class extraction from filename""" """Test frame class extraction from filename"""
file_path = Path(filename) file_path = Path(filename)
frame_class = FilenameExtractor.extract_frame_class(file_path) extractor = FilenameExtractor(file_path)
frame_class = extractor.extract_frame_class()
# Print filename and extracted frame class clearly # Print filename and extracted frame class clearly
print(f"\nFilename: \033[1;36m{filename}\033[0m") print(f"\nFilename: \033[1;36m{filename}\033[0m")
print(f"Extracted frame_class: \033[1;32m{frame_class}\033[0m") print(f"Extracted frame_class: \033[1;32m{frame_class}\033[0m")

View File

@@ -5,8 +5,8 @@ from renamer.extractors.mediainfo_extractor import MediaInfoExtractor
class TestMediaInfoExtractor: class TestMediaInfoExtractor:
@pytest.fixture @pytest.fixture
def extractor(self): def extractor(self, test_file):
return MediaInfoExtractor() return MediaInfoExtractor(test_file)
@pytest.fixture @pytest.fixture
def test_file(self): def test_file(self):
@@ -15,18 +15,18 @@ class TestMediaInfoExtractor:
def test_extract_resolution(self, extractor, test_file): def test_extract_resolution(self, extractor, test_file):
"""Test extracting resolution from media info""" """Test extracting resolution from media info"""
resolution = extractor.extract_resolution(test_file) resolution = extractor.extract_resolution()
# Text files don't have video resolution # Text files don't have video resolution
assert resolution is None assert resolution is None
def test_extract_hdr(self, extractor, test_file): def test_extract_hdr(self, extractor, test_file):
"""Test extracting HDR info""" """Test extracting HDR info"""
hdr = extractor.extract_hdr(test_file) hdr = extractor.extract_hdr()
# Text files don't have HDR # Text files don't have HDR
assert hdr is None assert hdr is None
def test_extract_audio_langs(self, extractor, test_file): def test_extract_audio_langs(self, extractor, test_file):
"""Test extracting audio languages""" """Test extracting audio languages"""
langs = extractor.extract_audio_langs(test_file) langs = extractor.extract_audio_langs()
# Text files don't have audio tracks # Text files don't have audio tracks
assert langs == '' assert langs == ''

View File

@@ -4,35 +4,29 @@ from renamer.extractors.metadata_extractor import MetadataExtractor
class TestMetadataExtractor: class TestMetadataExtractor:
@pytest.fixture
def extractor(self, test_file):
return MetadataExtractor(test_file)
@pytest.fixture @pytest.fixture
def test_file(self): def test_file(self):
"""Use the filenames.txt file for testing""" """Use the filenames.txt file for testing"""
return Path(__file__).parent / "filenames.txt" return Path(__file__).parent / "filenames.txt"
def test_extract_title(self, test_file): def test_extract_title(self, extractor):
"""Test extracting title from metadata""" """Test extracting title from metadata"""
title = MetadataExtractor.extract_title(test_file) title = extractor.extract_title()
# Text files don't have metadata, so should be None # Text files don't have metadata, so should be None
assert title is None assert title is None
def test_extract_duration(self, test_file): def test_extract_duration(self, extractor):
"""Test extracting duration from metadata""" """Test extracting duration from metadata"""
duration = MetadataExtractor.extract_duration(test_file) duration = extractor.extract_duration()
# Text files don't have duration # Text files don't have duration
assert duration is None assert duration is None
def test_extract_artist(self, test_file): def test_extract_artist(self, extractor):
"""Test extracting artist from metadata""" """Test extracting artist from metadata"""
artist = MetadataExtractor.extract_artist(test_file) artist = extractor.extract_artist()
# Text files don't have artist # Text files don't have artist
assert artist is None assert artist is None
def test_extract_all_metadata(self, test_file):
"""Test extracting all metadata"""
metadata = MetadataExtractor.extract_all_metadata(test_file)
expected = {
'title': None,
'duration': None,
'artist': None
}
assert metadata == expected