feat: Enhance metadata extraction and formatting, improve extractor architecture, and update documentation
54
AI_AGENT.md
@@ -7,27 +7,45 @@ This is a Python Terminal User Interface (TUI) application for managing media fi
Key features:
- Recursive directory scanning
- Tree-based file navigation
- Detailed metadata extraction and display
- Detailed metadata extraction and display from multiple sources
- Color-coded information
- Keyboard and mouse navigation
- Extensible for future renaming and editing features
- Extensible extractor and formatter architecture

## Technology Stack

- Python 3.11+
- Textual (TUI framework)
- Mutagen (audio/video metadata)
- PyMediaInfo (detailed track information)
- Mutagen (embedded metadata)
- Python-Magic (MIME type detection)
- UV (package manager)

## Code Structure

- `main.py`: Main application code
- `main.py`: Main application entry point
- `pyproject.toml`: Project configuration and dependencies
- `README.md`: User documentation
- `todo.txt`: Development task list
- `AI_AGENT.md`: This file
- `renamer/`: Main package
- `app.py`: Main Textual application class
- `extractor.py`: MediaExtractor class coordinating multiple extractors
- `extractors/`: Individual extractor classes
- `mediainfo_extractor.py`: PyMediaInfo-based extraction
- `filename_extractor.py`: Filename parsing
- `metadata_extractor.py`: Mutagen-based metadata
- `fileinfo_extractor.py`: Basic file information
- `formatters/`: Data formatting classes
- `media_formatter.py`: Main formatter coordinating display
- `track_formatter.py`: Track information formatting
- `size_formatter.py`: File size formatting
- `date_formatter.py`: Timestamp formatting
- `duration_formatter.py`: Duration formatting
- `resolution_formatter.py`: Resolution formatting
- `text_formatter.py`: Text styling utilities
- `constants.py`: Application constants
- `screens.py`: Additional UI screens
- `test/`: Unit tests

## Instructions for AI Agents
@@ -42,19 +60,35 @@ Key features:

### Development Workflow

1. Read the current code and understand the structure
1. Read the current code and understand the architecture
2. Check the TODO list for pending tasks
3. Implement features incrementally
4. Test changes by running the app with `uv run python main.py [directory]`
5. Update TODO list as tasks are completed
5. Update tests as needed
6. Ensure backward compatibility

### Key Components

- `RenamerApp`: Main application class inheriting from Textual's App
- `MediaTree`: Custom Tree widget with file-specific styling
- `get_media_tracks`: Function to extract media track information
- Various helper functions for formatting and detection
- `MediaExtractor`: Coordinates multiple specialized extractors
- `MediaFormatter`: Formats extracted data for TUI display
- Various extractor classes for different data sources
- Various formatter classes for different data types

### Extractor Architecture

Extractors are responsible for gathering raw data from different sources:
- Each extractor inherits from no base class but follows the pattern of `__init__(file_path)` and `extract_*()` methods
- The `MediaExtractor` class coordinates multiple extractors and provides a unified `get()` interface
- Extractors return raw data (strings, numbers, dicts) without formatting
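
Reviewer note: the extractor pattern described above can be sketched roughly as follows. This is a minimal illustration, not the project's code. The class names `FileInfoSketch`, `FilenameSketch`, and `CoordinatorSketch` are hypothetical, and the lookup order in `get()` is an assumption based on the `(source, lambda)` lists and validity conditions visible in the `MediaExtractor` hunks of this commit.

```python
from __future__ import annotations

from pathlib import Path


class FileInfoSketch:
    """Hypothetical extractor: __init__(file_path) plus extract_*() methods."""

    def __init__(self, file_path: Path):
        self.file_path = file_path

    def extract_extension(self) -> str | None:
        # Raw value only; no styling or display formatting here.
        ext = self.file_path.suffix.lstrip(".").lower()
        return ext or None


class FilenameSketch:
    """Second hypothetical extractor that works on the bare file name."""

    def __init__(self, file_path: Path):
        self.file_name = file_path.name

    def extract_extension(self) -> str | None:
        _, dot, ext = self.file_name.rpartition(".")
        return ext.lower() if dot else None


class CoordinatorSketch:
    """Hypothetical stand-in for a coordinator with a unified get()."""

    def __init__(self, file_path: Path):
        fileinfo = FileInfoSketch(file_path)
        filename = FilenameSketch(file_path)
        # Each key maps to an ordered list of (source name, callable).
        self._sources = {
            "extension": [
                ("FileInfo", fileinfo.extract_extension),
                ("Filename", filename.extract_extension),
            ],
        }
        # Per-key predicate deciding whether a value counts as valid.
        self._conditions = {"extension": lambda x: x is not None}

    def get(self, key: str, source: str | None = None):
        is_valid = self._conditions.get(key, lambda x: x is not None)
        for name, extract in self._sources.get(key, []):
            if source is not None and name != source:
                continue  # caller asked for one specific source
            value = extract()
            if is_valid(value):
                return value
        return None


coordinator = CoordinatorSketch(Path("The Love Guru.(2008).[SD,ukr].AVI"))
print(coordinator.get("extension"))              # avi
print(coordinator.get("extension", "Filename"))  # avi
```

Keeping the sources as an ordered list lets a caller either accept the first valid value or pin a single source by name, which matches how the formatter hunks call `get("video_tracks", "MediaInfo")`.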

### Formatter Architecture

Formatters are responsible for converting raw data into display strings:
- Each formatter provides static methods like `format_*()`
- The `MediaFormatter` coordinates formatters and applies them based on data types
- Formatters handle text styling, color coding, and human-readable representations
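
Reviewer note: a small sketch of the static `format_*()` style described above. The class names, method names, and unit thresholds here are assumptions for illustration only; the real `SizeFormatter` and `DurationFormatter` are not shown in this diff and may behave differently.

```python
class SizeFormatterSketch:
    """Hypothetical formatter exposing a static format_*() method."""

    @staticmethod
    def format_size(size_bytes: int) -> str:
        size = float(size_bytes)
        for unit in ("B", "KB", "MB", "GB"):
            if size < 1024:
                return f"{size:.1f} {unit}"
            size /= 1024
        return f"{size:.1f} TB"


class DurationFormatterSketch:
    """Hypothetical formatter for durations given in seconds."""

    @staticmethod
    def format_hhmmss(seconds: float) -> str:
        hours, rem = divmod(int(seconds), 3600)
        minutes, secs = divmod(rem, 60)
        return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def apply_formatters(value, formatters):
    """Apply a list of callables in order, as a formatter chain might."""
    for fmt in formatters:
        value = fmt(value)
    return value


print(SizeFormatterSketch.format_size(1536))                 # 1.5 KB
print(DurationFormatterSketch.format_hhmmss(3725))           # 01:02:05
print(apply_formatters(1536, [SizeFormatterSketch.format_size, str.lower]))  # 1.5 kb
```

Because the methods are static and take only the raw value, they can be listed in structures such as the `display_formatters` entries seen in the `MediaFormatter` hunks and applied one after another.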

### Future Enhancements
@@ -69,6 +103,7 @@ Key features:
- Test navigation, selection, and display
- Verify metadata extraction accuracy
- Check for any errors or edge cases
- Run unit tests with `uv run pytest`

### Contribution Guidelines
@@ -76,5 +111,6 @@ Key features:
- Update documentation as needed
- Ensure the app runs without errors
- Follow the existing code patterns
- Update tests for new functionality

This document should be updated as the project evolves.
30
README.md
@@ -6,9 +6,11 @@ A terminal-based (TUI) application for scanning directories, viewing media file

- Recursive directory scanning for video files
- Tree view navigation with keyboard and mouse support
- File details display (size, extensions, metadata)
- Detailed metadata extraction from multiple sources (MediaInfo, filename parsing, embedded metadata)
- Color-coded information display
- Command-based interface with hotkeys
- Container type detection using Mutagen
- Extensible extractor and formatter system
- Support for video, audio, and subtitle track information

## Installation
@@ -54,7 +56,23 @@ renamer /path/to/media/directory
- Mouse clicks supported
- Select a video file to view its details in the right panel

## Development
## Architecture

The application uses a modular architecture with separate extractors and formatters:

### Extractors
- **MediaInfoExtractor**: Extracts detailed track information using PyMediaInfo
- **FilenameExtractor**: Parses metadata from filenames
- **MetadataExtractor**: Extracts embedded metadata using Mutagen
- **FileInfoExtractor**: Provides basic file information

### Formatters
- **MediaFormatter**: Formats extracted data for display
- **TrackFormatter**: Formats video/audio/subtitle track information
- **SizeFormatter**: Formats file sizes
- **DateFormatter**: Formats timestamps
- **DurationFormatter**: Formats time durations
- **ResolutionFormatter**: Formats video resolutions

### Setup Development Environment
```bash
@@ -79,7 +97,7 @@ uv run python main.py /path/to/directory

### Uninstall
```bash
uv tool uninstall renamerq
uv tool uninstall renamer
```

## Supported Video Formats
@@ -96,4 +114,6 @@ uv tool uninstall renamerq

## Dependencies
- textual: TUI framework
- mutagen: Media metadata detection
- pymediainfo: Detailed media track information
- mutagen: Embedded metadata extraction
- python-magic: MIME type detection
@@ -36,7 +36,7 @@ MEDIA_TYPES = {
}

SOURCE_DICT = {
    "WEB-DL": ["WEB-DL", "WEBRip", "WEB-Rip", "WEB"],
    "WEB-DL": ["WEB-DL", "WEBRip", "WEB-Rip", "WEB", "WEB-DLRip"],
    "BDRip": ["BDRip", "BD-Rip", "BDRIP"],
    "BDRemux": ["BDRemux", "BD-Remux", "BDREMUX"],
    "DVDRip": ["DVDRip", "DVD-Rip", "DVDRIP"],
@@ -64,6 +64,15 @@ class MediaExtractor:
            'extension': [
                ('FileInfo', lambda: self.fileinfo_extractor.extract_extension())
            ],
            'video_tracks': [
                ('MediaInfo', lambda: self.mediainfo_extractor.extract_video_tracks())
            ],
            'audio_tracks': [
                ('MediaInfo', lambda: self.mediainfo_extractor.extract_audio_tracks())
            ],
            'subtitle_tracks': [
                ('MediaInfo', lambda: self.mediainfo_extractor.extract_subtitle_tracks())
            ],
        }

        # Conditions for when a value is considered valid
@@ -76,8 +85,10 @@ class MediaExtractor:
            'aspect_ratio': lambda x: x is not None,
            'hdr': lambda x: x is not None,
            'audio_langs': lambda x: x is not None,
            'metadata': lambda x: x is not None,
            'tracks': lambda x: x is not None and any(x.get(k, []) for k in ['video_tracks', 'audio_tracks', 'subtitle_tracks'])
            'tracks': lambda x: x is not None and any(x.get(k, []) for k in ['video_tracks', 'audio_tracks', 'subtitle_tracks']),
            'video_tracks': lambda x: x is not None and len(x) > 0,
            'audio_tracks': lambda x: x is not None and len(x) > 0,
            'subtitle_tracks': lambda x: x is not None and len(x) > 0,
        }

    def get(self, key: str, source: str | None = None):
@@ -31,8 +31,26 @@ class FilenameExtractor:

    def extract_year(self) -> str | None:
        """Extract year from filename"""
        year_match = re.search(r'\((\d{4})\)|(\d{4})', self.file_name)
        return (year_match.group(1) or year_match.group(2)) if year_match else None
        # First try to find year in parentheses (most common and reliable)
        paren_match = re.search(r'\((\d{4})\)', self.file_name)
        if paren_match:
            return paren_match.group(1)

        # Fallback: look for year in dots (like .1971.)
        dot_match = re.search(r'\.(\d{4})\.', self.file_name)
        if dot_match:
            return dot_match.group(1)

        # Last resort: any 4-digit number (but this is less reliable)
        any_match = re.search(r'\b(\d{4})\b', self.file_name)
        if any_match:
            year = any_match.group(1)
            # Basic sanity check: years should be between 1900 and current year + a few years
            current_year = 2025  # Update this as needed
            if 1900 <= int(year) <= current_year + 10:
                return year

        return None
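
Reviewer note: the three-stage lookup above (parenthesized year, dotted year, then any four-digit number) could also be written without the hard-coded `current_year`. The standalone function below is only a sketch under two assumptions that differ from the commit: it reads the current year from `datetime.date.today()`, and in the last stage it tries every four-digit candidate instead of only the first match. The name `extract_year_sketch` and the `max_future_years` parameter are hypothetical.

```python
from __future__ import annotations

import re
from datetime import date


def extract_year_sketch(file_name: str, max_future_years: int = 10) -> str | None:
    # Stage 1: a year in parentheses, e.g. "(2008)".
    paren = re.search(r"\((\d{4})\)", file_name)
    if paren:
        return paren.group(1)

    # Stage 2: a year delimited by dots, e.g. ".1971.".
    dotted = re.search(r"\.(\d{4})\.", file_name)
    if dotted:
        return dotted.group(1)

    # Stage 3: any standalone 4-digit number that looks like a plausible year.
    latest = date.today().year + max_future_years
    for candidate in re.findall(r"\b(\d{4})\b", file_name):
        if 1900 <= int(candidate) <= latest:
            return candidate
    return None


print(extract_year_sketch("The Killing.(1956).[SD,ukr,eng].mkv"))  # 1956
print(extract_year_sketch("Movie.1971.720p.mkv"))                  # 1971
print(extract_year_sketch("The long title 1080 2008 ENG.mp4"))     # 2008
```

Checking every candidate matters for names where a resolution such as `1080` appears before the year; with only the first match tested, such a name would yield no year at all.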

    def extract_source(self) -> str | None:
        """Extract video source from filename"""
@@ -40,14 +58,22 @@ class FilenameExtractor:

        for src, aliases in SOURCE_DICT.items():
            for alias in aliases:
                if re.search(r'\b' + re.escape(alias) + r'\b', temp_name, re.IGNORECASE):
                if alias.upper() in temp_name.upper():
                    return src
        return None
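
Reviewer note: this hunk contains both a word-boundary regex test and a plain case-insensitive substring test for source aliases. The sketch below compares the two on a couple of names. It is an illustration only: it matches against the raw file name because the construction of `temp_name` is not shown in the diff, it uses a trimmed copy of `SOURCE_DICT`, and the function names and the first sample file name are made up.

```python
from __future__ import annotations

import re

# Trimmed copy of the aliases from constants.py, for illustration only.
SOURCE_DICT = {
    "WEB-DL": ["WEB-DL", "WEBRip", "WEB-Rip", "WEB", "WEB-DLRip"],
    "BDRip": ["BDRip", "BD-Rip", "BDRIP"],
}


def match_substring(name: str) -> str | None:
    """Case-insensitive substring test: any occurrence of an alias counts."""
    upper = name.upper()
    for src, aliases in SOURCE_DICT.items():
        for alias in aliases:
            if alias.upper() in upper:
                return src
    return None


def match_word_boundary(name: str) -> str | None:
    """Regex test: the alias must stand alone between word boundaries."""
    for src, aliases in SOURCE_DICT.items():
        for alias in aliases:
            if re.search(r"\b" + re.escape(alias) + r"\b", name, re.IGNORECASE):
                return src
    return None


title = "Spider Webs (2008) BDRip.mkv"
print(match_substring(title))      # WEB-DL  (the alias "WEB" matches inside "Webs")
print(match_word_boundary(title))  # BDRip
print(match_substring("Movie.WEB-DLRip.mkv"))  # WEB-DL
```

The substring form picks up compound tags such as `WEB-DLRip` regardless of surrounding characters, but short aliases like `WEB` can then match inside ordinary title words, so the choice between the two is a trade-off worth covering with a test file name.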

    def extract_frame_class(self) -> str | None:
        """Extract frame class from filename (480p, 720p, 1080p, 2160p, etc.)"""
        # First check for specific numeric resolutions
        match = re.search(r'(\d{3,4})[pi]', self.file_name, re.IGNORECASE)
        if match:
            height = int(match.group(1))
            return self._get_frame_class_from_height(height)

        # If no specific resolution found, check for quality indicators
        unclassified_indicators = ['SD', 'LQ', 'HD', 'QHD']
        for indicator in unclassified_indicators:
            if re.search(r'\b' + re.escape(indicator) + r'\b', self.file_name, re.IGNORECASE):
                return 'Unclassified'

        return 'Unclassified'
@@ -37,10 +37,6 @@ class MetadataExtractor:
            return type(self.info).__name__
        return self._detect_by_mime()

    def extract_meta_description(self) -> str:
        """Extract meta description"""
        meta_type = self.extract_meta_type()
        return {info['meta_type']: info['description'] for info in MEDIA_TYPES.values()}.get(meta_type, f'Unknown type {meta_type}')

    def _detect_by_mime(self) -> str:
        """Detect meta type by MIME"""
@@ -132,7 +132,10 @@ class MediaFormatter:
                "label_formatters": [TextFormatter.bold, TextFormatter.uppercase],
            }
        ]
        for item in self.extractor.get("tracks").get("video_tracks"):

        # Get video tracks
        video_tracks = self.extractor.get("video_tracks", "MediaInfo") or []
        for item in video_tracks:
            data.append(
                {
                    "group": "Tracks Info",
@@ -142,9 +145,10 @@ class MediaFormatter:
                    "display_formatters": [TextFormatter.green],
                }
            )
        for i, item in enumerate(
            self.extractor.get("tracks").get("audio_tracks"), start=1
        ):

        # Get audio tracks
        audio_tracks = self.extractor.get("audio_tracks", "MediaInfo") or []
        for i, item in enumerate(audio_tracks, start=1):
            data.append(
                {
                    "group": "Tracks Info",
@@ -154,9 +158,10 @@ class MediaFormatter:
                    "display_formatters": [TextFormatter.yellow],
                }
            )
        for i, item in enumerate(
            self.extractor.get("tracks").get("subtitle_tracks"), start=1
        ):

        # Get subtitle tracks
        subtitle_tracks = self.extractor.get("subtitle_tracks", "MediaInfo") or []
        for i, item in enumerate(subtitle_tracks, start=1):
            data.append(
                {
                    "group": "Tracks Info",
@@ -195,13 +200,6 @@ class MediaFormatter:
                "value": self.extractor.get("artist", "Metadata") or "Not extracted",
                "display_formatters": [TextFormatter.grey],
            },
            {
                "label": "Description",
                "label_formatters": [TextFormatter.bold],
                "value": self.extractor.get("meta_description", "Metadata")
                or "Not extracted",
                "display_formatters": [TextFormatter.grey],
            },
        ]

        return [self._format_data_item(item) for item in data]
@@ -90,7 +90,6 @@ The Invention of Lying (2009) [720p,ukr,eng].mkv
The Island of Dr. Moreau.(1977).[720p,ukr].mp4
The Killing.(1956).[SD,ukr,eng].mkv
The Love Guru.(2008).[SD,ukr].avi
The Love Guru.(2008).[SD,ukr].avi
The Manchurian Candidate.(2004).[720p,ukr,eng].mkv
The Mortal Instruments. City of Bones.(2013).[720p,ukr,eng].mkv
The Mutant Chronicles.(2008).[SD,ukr,eng].mkv
@@ -203,3 +202,10 @@ Upgrade.(2018).[SD,eng].mkv
Человек с бульвара Капуцинов (1987) [1080p,rus] [tmdbid-45227].mkv
Человек-амфибия (1961) [SD,rus] [tmdbid-43685].avi
Чук и Гек (1953) [SD,rus] [tmdbid-148412].avi
The long title.(2008).[SD 720p,ukr].avi
The_long_title.(2008).2K.1440p.ukr.avi
The long title (2008) SD 720p UKR.avi
The long title (2008) UHD 1440p ENG.mp4
The long title (2008) UHD 1440 ENG.mp4
The long title (2008) 8K 4320p ENG.mp4
@@ -4,32 +4,35 @@ from renamer.extractors.fileinfo_extractor import FileInfoExtractor


class TestFileInfoExtractor:
    @pytest.fixture
    def extractor(self, test_file):
        return FileInfoExtractor(test_file)

    @pytest.fixture
    def test_file(self):
        """Use the filenames.txt file for testing"""
        return Path(__file__).parent / "filenames.txt"

    def test_extract_size(self, test_file):
    def test_extract_size(self, extractor):
        """Test extracting file size"""
        size = FileInfoExtractor.extract_size(test_file)
        size = extractor.extract_size()
        assert isinstance(size, int)
        assert size > 0

    def test_extract_modification_time(self, test_file):
    def test_extract_modification_time(self, extractor):
        """Test extracting modification time"""
        mtime = FileInfoExtractor.extract_modification_time(test_file)
        mtime = extractor.extract_modification_time()
        assert isinstance(mtime, float)
        assert mtime > 0

    def test_extract_file_name(self, test_file):
    def test_extract_file_name(self, extractor):
        """Test extracting file name"""
        name = FileInfoExtractor.extract_file_name(test_file)
        name = extractor.extract_file_name()
        assert isinstance(name, str)
        assert name == "filenames.txt"

    def test_extract_file_path(self, test_file):
    def test_extract_file_path(self, extractor):
        """Test extracting file path"""
        path = FileInfoExtractor.extract_file_path(test_file)
        path = extractor.extract_file_path()
        assert isinstance(path, str)
        assert "filenames.txt" in path
        assert str(test_file) == path
@@ -17,7 +17,8 @@ def load_test_filenames():
def test_extract_title(filename):
    """Test title extraction from filename"""
    file_path = Path(filename)
    title = FilenameExtractor.extract_title(file_path)
    extractor = FilenameExtractor(file_path)
    title = extractor.extract_title()
    # Print filename and extracted title clearly
    print(f"\nFilename: \033[1;36m{filename}\033[0m")
    print(f"Extracted title: \033[1;32m{title}\033[0m")
@@ -29,7 +30,8 @@ def test_extract_title(filename):
def test_extract_year(filename):
    """Test year extraction from filename"""
    file_path = Path(filename)
    year = FilenameExtractor.extract_year(file_path)
    extractor = FilenameExtractor(file_path)
    year = extractor.extract_year()
    # Print filename and extracted year clearly
    print(f"\nFilename: \033[1;36m{filename}\033[0m")
    print(f"Extracted year: \033[1;32m{year}\033[0m")
@@ -42,7 +44,8 @@ def test_extract_year(filename):
def test_extract_source(filename):
    """Test source extraction from filename"""
    file_path = Path(filename)
    source = FilenameExtractor.extract_source(file_path)
    extractor = FilenameExtractor(file_path)
    source = extractor.extract_source()
    # Print filename and extracted source clearly
    print(f"\nFilename: \033[1;36m{filename}\033[0m")
    print(f"Extracted source: \033[1;32m{source}\033[0m")
@@ -54,7 +57,8 @@ def test_extract_source(filename):
def test_extract_frame_class(filename):
    """Test frame class extraction from filename"""
    file_path = Path(filename)
    frame_class = FilenameExtractor.extract_frame_class(file_path)
    extractor = FilenameExtractor(file_path)
    frame_class = extractor.extract_frame_class()
    # Print filename and extracted frame class clearly
    print(f"\nFilename: \033[1;36m{filename}\033[0m")
    print(f"Extracted frame_class: \033[1;32m{frame_class}\033[0m")
@@ -5,8 +5,8 @@ from renamer.extractors.mediainfo_extractor import MediaInfoExtractor

class TestMediaInfoExtractor:
    @pytest.fixture
    def extractor(self):
        return MediaInfoExtractor()
    def extractor(self, test_file):
        return MediaInfoExtractor(test_file)

    @pytest.fixture
    def test_file(self):
@@ -15,18 +15,18 @@ class TestMediaInfoExtractor:

    def test_extract_resolution(self, extractor, test_file):
        """Test extracting resolution from media info"""
        resolution = extractor.extract_resolution(test_file)
        resolution = extractor.extract_resolution()
        # Text files don't have video resolution
        assert resolution is None

    def test_extract_hdr(self, extractor, test_file):
        """Test extracting HDR info"""
        hdr = extractor.extract_hdr(test_file)
        hdr = extractor.extract_hdr()
        # Text files don't have HDR
        assert hdr is None

    def test_extract_audio_langs(self, extractor, test_file):
        """Test extracting audio languages"""
        langs = extractor.extract_audio_langs(test_file)
        langs = extractor.extract_audio_langs()
        # Text files don't have audio tracks
        assert langs == ''
@@ -4,35 +4,29 @@ from renamer.extractors.metadata_extractor import MetadataExtractor


class TestMetadataExtractor:
    @pytest.fixture
    def extractor(self, test_file):
        return MetadataExtractor(test_file)

    @pytest.fixture
    def test_file(self):
        """Use the filenames.txt file for testing"""
        return Path(__file__).parent / "filenames.txt"

    def test_extract_title(self, test_file):
    def test_extract_title(self, extractor):
        """Test extracting title from metadata"""
        title = MetadataExtractor.extract_title(test_file)
        title = extractor.extract_title()
        # Text files don't have metadata, so should be None
        assert title is None

    def test_extract_duration(self, test_file):
    def test_extract_duration(self, extractor):
        """Test extracting duration from metadata"""
        duration = MetadataExtractor.extract_duration(test_file)
        duration = extractor.extract_duration()
        # Text files don't have duration
        assert duration is None

    def test_extract_artist(self, test_file):
    def test_extract_artist(self, extractor):
        """Test extracting artist from metadata"""
        artist = MetadataExtractor.extract_artist(test_file)
        artist = extractor.extract_artist()
        # Text files don't have artist
        assert artist is None

    def test_extract_all_metadata(self, test_file):
        """Test extracting all metadata"""
        metadata = MetadataExtractor.extract_all_metadata(test_file)
        expected = {
            'title': None,
            'duration': None,
            'artist': None
        }
        assert metadata == expected