feat: Enhance metadata extraction and formatting, improve extractor architecture, and update documentation

2025-12-26 13:38:17 +00:00
parent 8f68624529
commit 91df347727
13 changed files with 170 additions and 76 deletions
--- a/AI_AGENT.md
+++ b/AI_AGENT.md
@@ -7,27 +7,45 @@ This is a Python Terminal User Interface (TUI) application for managing media fi
 Key features:
 - Recursive directory scanning
 - Tree-based file navigation
- Detailed metadata extraction and display
+- Detailed metadata extraction and display from multiple sources
 - Color-coded information
 - Keyboard and mouse navigation
- Extensible for future renaming and editing features
+- Extensible extractor and formatter architecture
 ## Technology Stack
 - Python 3.11+
 - Textual (TUI framework)
 - Mutagen (audio/video metadata)
 - PyMediaInfo (detailed track information)
 - Mutagen (embedded metadata)
 - Python-Magic (MIME type detection)
 - UV (package manager)
 ## Code Structure
- `main.py`: Main application code
+- `main.py`: Main application entry point
 - `pyproject.toml`: Project configuration and dependencies
 - `README.md`: User documentation
 - `todo.txt`: Development task list
 - `AI_AGENT.md`: This file
 - `renamer/`: Main package
  - `app.py`: Main Textual application class
  - `extractor.py`: MediaExtractor class coordinating multiple extractors
  - `extractors/`: Individual extractor classes
    - `mediainfo_extractor.py`: PyMediaInfo-based extraction
    - `filename_extractor.py`: Filename parsing
    - `metadata_extractor.py`: Mutagen-based metadata
    - `fileinfo_extractor.py`: Basic file information
  - `formatters/`: Data formatting classes
    - `media_formatter.py`: Main formatter coordinating display
    - `track_formatter.py`: Track information formatting
    - `size_formatter.py`: File size formatting
    - `date_formatter.py`: Timestamp formatting
    - `duration_formatter.py`: Duration formatting
    - `resolution_formatter.py`: Resolution formatting
    - `text_formatter.py`: Text styling utilities
  - `constants.py`: Application constants
  - `screens.py`: Additional UI screens
  - `test/`: Unit tests
 ## Instructions for AI Agents
@@ -42,19 +60,35 @@ Key features:
 ### Development Workflow
-1. Read the current code and understand the structure
+1. Read the current code and understand the architecture
 2. Check the TODO list for pending tasks
 3. Implement features incrementally
 4. Test changes by running the app with `uv run python main.py [directory]`
-5. Update TODO list as tasks are completed
+5. Update tests as needed
 6. Ensure backward compatibility
 ### Key Components
 - `RenamerApp`: Main application class inheriting from Textual's App
 - `MediaTree`: Custom Tree widget with file-specific styling
- `get_media_tracks`: Function to extract media track information
+- `MediaExtractor`: Coordinates multiple specialized extractors
- Various helper functions for formatting and detection
+- `MediaFormatter`: Formats extracted data for TUI display
 - Various extractor classes for different data sources
 - Various formatter classes for different data types
 ### Extractor Architecture
 Extractors are responsible for gathering raw data from different sources:
 - Extractors do not share a base class; each follows the same convention of `__init__(file_path)` plus `extract_*()` methods
 - The `MediaExtractor` class coordinates multiple extractors and provides a unified `get()` interface
 - Extractors return raw data (strings, numbers, dicts) without formatting
 ### Formatter Architecture
 Formatters are responsible for converting raw data into display strings:
 - Each formatter provides static methods like `format_*()`
 - The `MediaFormatter` coordinates formatters and applies them based on data types
 - Formatters handle text styling, color coding, and human-readable representations
 ### Future Enhancements
@@ -69,6 +103,7 @@ Key features:
 - Test navigation, selection, and display
 - Verify metadata extraction accuracy
 - Check for any errors or edge cases
 - Run unit tests with `uv run pytest`
 ### Contribution Guidelines
@@ -76,5 +111,6 @@ Key features:
 - Update documentation as needed
 - Ensure the app runs without errors
 - Follow the existing code patterns
 - Update tests for new functionality
 This document should be updated as the project evolves.
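Note on the formatter convention described in AI_AGENT.md above: the following is a minimal sketch of what a `format_*()` static method can look like. The class bodies and the method names `format_size` and `format_hms` are assumptions made for illustration; the real implementations live in `renamer/formatters/` and are not shown in this diff.

```python
class SizeFormatter:
    """Hypothetical stand-in for renamer/formatters/size_formatter.py."""

    @staticmethod
    def format_size(num_bytes: int) -> str:
        # Step through binary units until the value drops below 1024.
        units = ("B", "KB", "MB", "GB", "TB")
        size = float(num_bytes)
        index = 0
        while size >= 1024 and index < len(units) - 1:
            size /= 1024
            index += 1
        return f"{size:.1f} {units[index]}"


class DurationFormatter:
    """Hypothetical stand-in for renamer/formatters/duration_formatter.py."""

    @staticmethod
    def format_hms(seconds: float) -> str:
        # Split whole seconds into hours, minutes and seconds.
        hours, remainder = divmod(int(seconds), 3600)
        minutes, secs = divmod(remainder, 60)
        return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(SizeFormatter.format_size(1536))      # 1.5 KB
print(DurationFormatter.format_hms(3725))   # 01:02:05
```

Because the methods are static and take raw values, an extractor can stay free of display concerns and the same formatter can be reused for any source.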
--- a/README.md
+++ b/README.md
@@ -6,9 +6,11 @@ A terminal-based (TUI) application for scanning directories, viewing media file
 - Recursive directory scanning for video files
 - Tree view navigation with keyboard and mouse support
- File details display (size, extensions, metadata)
+- Detailed metadata extraction from multiple sources (MediaInfo, filename parsing, embedded metadata)
 - Color-coded information display
 - Command-based interface with hotkeys
- Container type detection using Mutagen
+- Extensible extractor and formatter system
 - Support for video, audio, and subtitle track information
 ## Installation
@@ -54,7 +56,23 @@ renamer /path/to/media/directory
 - Mouse clicks supported
 - Select a video file to view its details in the right panel
-## Development
+## Architecture
 The application uses a modular architecture with separate extractors and formatters:
 ### Extractors
 - **MediaInfoExtractor**: Extracts detailed track information using PyMediaInfo
 - **FilenameExtractor**: Parses metadata from filenames
 - **MetadataExtractor**: Extracts embedded metadata using Mutagen
 - **FileInfoExtractor**: Provides basic file information
 ### Formatters
 - **MediaFormatter**: Formats extracted data for display
 - **TrackFormatter**: Formats video/audio/subtitle track information
 - **SizeFormatter**: Formats file sizes
 - **DateFormatter**: Formats timestamps
 - **DurationFormatter**: Formats time durations
 - **ResolutionFormatter**: Formats video resolutions
 ### Setup Development Environment
 ```bash
@@ -79,7 +97,7 @@ uv run python main.py /path/to/directory
 ### Uninstall
 ```bash
-uv tool uninstall renamerq
+uv tool uninstall renamer
 ```
 ## Supported Video Formats
@@ -96,4 +114,6 @@ uv tool uninstall renamerq
 ## Dependencies
 - textual: TUI framework
- mutagen: Media metadata detection
+- pymediainfo: Detailed media track information
 - mutagen: Embedded metadata extraction
 - python-magic: MIME type detection
--- a/todo.txt
+++ b/todo.txt
--- a/renamer/constants.py
+++ b/renamer/constants.py
@@ -36,7 +36,7 @@ MEDIA_TYPES = {
 }
 SOURCE_DICT = {
-    "WEB-DL": ["WEB-DL", "WEBRip", "WEB-Rip", "WEB"],
+    "WEB-DL": ["WEB-DL", "WEBRip", "WEB-Rip", "WEB", "WEB-DLRip"],
    "BDRip": ["BDRip", "BD-Rip", "BDRIP"],
    "BDRemux": ["BDRemux", "BD-Remux", "BDREMUX"],
    "DVDRip": ["DVDRip", "DVD-Rip", "DVDRIP"],
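Note on alias matching: this commit also switches `extract_source` from a word-boundary regex to a plain substring test (see `filename_extractor.py` further down). The sketch below compares the two strategies against the `SOURCE_DICT` entries visible in this hunk. The function names are hypothetical, and the sketch matches against the raw filename, whereas the real code uses a preprocessed `temp_name` that is not shown here.

```python
import re
from typing import Optional

# Entries copied from the hunk above; the full dictionary may contain more.
SOURCE_DICT = {
    "WEB-DL": ["WEB-DL", "WEBRip", "WEB-Rip", "WEB", "WEB-DLRip"],
    "BDRip": ["BDRip", "BD-Rip", "BDRIP"],
    "BDRemux": ["BDRemux", "BD-Remux", "BDREMUX"],
    "DVDRip": ["DVDRip", "DVD-Rip", "DVDRIP"],
}


def find_source_substring(name: str) -> Optional[str]:
    """Case-insensitive substring match (the new behaviour)."""
    upper_name = name.upper()
    for source, aliases in SOURCE_DICT.items():
        for alias in aliases:
            if alias.upper() in upper_name:
                return source
    return None


def find_source_word_boundary(name: str) -> Optional[str]:
    """Case-insensitive whole-word match (the previous behaviour)."""
    for source, aliases in SOURCE_DICT.items():
        for alias in aliases:
            pattern = r"\b" + re.escape(alias) + r"\b"
            if re.search(pattern, name, re.IGNORECASE):
                return source
    return None


print(find_source_substring("Movie.2008.WEB-DLRip.mkv"))   # WEB-DL
print(find_source_substring("Cobweb.(2023).mkv"))          # WEB-DL (false positive)
print(find_source_word_boundary("Cobweb.(2023).mkv"))      # None
```

The substring test is more permissive: the short alias `WEB` matches inside unrelated words such as "Cobweb", which the word-boundary version rejects.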
--- a/renamer/extractor.py
+++ b/renamer/extractor.py
@@ -64,6 +64,15 @@ class MediaExtractor:
            'extension': [
                ('FileInfo', lambda: self.fileinfo_extractor.extract_extension())
            ],
+            'video_tracks': [
+                ('MediaInfo', lambda: self.mediainfo_extractor.extract_video_tracks())
+            ],
+            'audio_tracks': [
+                ('MediaInfo', lambda: self.mediainfo_extractor.extract_audio_tracks())
+            ],
+            'subtitle_tracks': [
+                ('MediaInfo', lambda: self.mediainfo_extractor.extract_subtitle_tracks())
+            ],
        }
        # Conditions for when a value is considered valid
@@ -76,8 +85,10 @@ class MediaExtractor:
            'aspect_ratio': lambda x: x is not None,
            'hdr': lambda x: x is not None,
            'audio_langs': lambda x: x is not None,
-            'metadata': lambda x: x is not None,
-            'tracks': lambda x: x is not None and any(x.get(k, []) for k in ['video_tracks', 'audio_tracks', 'subtitle_tracks'])
+            'tracks': lambda x: x is not None and any(x.get(k, []) for k in ['video_tracks', 'audio_tracks', 'subtitle_tracks']),
+            'video_tracks': lambda x: x is not None and len(x) > 0,
+            'audio_tracks': lambda x: x is not None and len(x) > 0,
+            'subtitle_tracks': lambda x: x is not None and len(x) > 0,
        }
    def get(self, key: str, source: str | None = None):
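Note on the lookup tables above: each key maps to an ordered list of `(source_name, getter)` pairs, and a separate table holds a validity condition per key. The body of `get()` is not part of this diff, so the sketch below is only one plausible reading of how the two tables could combine. `MiniExtractor`, its constructor arguments, and the fallback rule "not None" for keys without a condition are all assumptions.

```python
from typing import Any, Callable, Optional

Getter = Callable[[], Any]


class MiniExtractor:
    """Hypothetical, simplified stand-in for MediaExtractor."""

    def __init__(
        self,
        sources: dict[str, list[tuple[str, Getter]]],
        conditions: dict[str, Callable[[Any], bool]],
    ) -> None:
        self._sources = sources
        self._conditions = conditions

    def get(self, key: str, source: Optional[str] = None) -> Any:
        # Assumed default: a value is valid when it is not None.
        is_valid = self._conditions.get(key, lambda value: value is not None)
        for name, getter in self._sources.get(key, []):
            # When a source is requested, skip every other source.
            if source is not None and name != source:
                continue
            value = getter()
            if is_valid(value):
                return value
        return None


extractor = MiniExtractor(
    sources={
        "year": [("Metadata", lambda: None), ("Filename", lambda: "2008")],
        "video_tracks": [("MediaInfo", lambda: [])],
    },
    conditions={"video_tracks": lambda x: x is not None and len(x) > 0},
)
print(extractor.get("year"))              # 2008 (falls through to Filename)
print(extractor.get("year", "Metadata"))  # None
print(extractor.get("video_tracks"))      # None (empty list fails the condition)
```

Wrapping each extractor call in a lambda keeps the lookup lazy: a source is only queried when an earlier one produced no valid value.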
--- a/renamer/extractors/filename_extractor.py
+++ b/renamer/extractors/filename_extractor.py
@@ -31,8 +31,26 @@ class FilenameExtractor:
    def extract_year(self) -> str | None:
        """Extract year from filename"""
-        year_match = re.search(r'\((\d{4})\)|(\d{4})', self.file_name)
-        return (year_match.group(1) or year_match.group(2)) if year_match else None
+        # First try to find year in parentheses (most common and reliable)
+        paren_match = re.search(r'\((\d{4})\)', self.file_name)
+        if paren_match:
+            return paren_match.group(1)
+        # Fallback: look for year in dots (like .1971.)
+        dot_match = re.search(r'\.(\d{4})\.', self.file_name)
+        if dot_match:
+            return dot_match.group(1)
+        # Last resort: any 4-digit number (but this is less reliable)
+        any_match = re.search(r'\b(\d{4})\b', self.file_name)
+        if any_match:
+            year = any_match.group(1)
+            # Basic sanity check: years should be between 1900 and current year + a few years
+            current_year = 2025  # Update this as needed
+            if 1900 <= int(year) <= current_year + 10:
+                return year
+        return None
    def extract_source(self) -> str | None:
        """Extract video source from filename"""
@@ -40,14 +58,22 @@ class FilenameExtractor:
        for src, aliases in SOURCE_DICT.items():
            for alias in aliases:
-                if re.search(r'\b' + re.escape(alias) + r'\b', temp_name, re.IGNORECASE):
+                if alias.upper() in temp_name.upper():
                    return src
        return None
    def extract_frame_class(self) -> str | None:
        """Extract frame class from filename (480p, 720p, 1080p, 2160p, etc.)"""
        # First check for specific numeric resolutions
        match = re.search(r'(\d{3,4})[pi]', self.file_name, re.IGNORECASE)
        if match:
            height = int(match.group(1))
            return self._get_frame_class_from_height(height)
        # If no specific resolution found, check for quality indicators
        unclassified_indicators = ['SD', 'LQ', 'HD', 'QHD']
        for indicator in unclassified_indicators:
            if re.search(r'\b' + re.escape(indicator) + r'\b', self.file_name, re.IGNORECASE):
                return 'Unclassified'
        return 'Unclassified'
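Note on `extract_year`: the new version hard-codes `current_year = 2025` and only inspects the first bare four-digit number. The standalone sketch below shows one possible variation, not the committed code: it derives the upper bound from the system clock and scans every four-digit candidate. The free function and its `max_year` parameter are hypothetical and exist only so the example is deterministic.

```python
import datetime
import re
from typing import Optional


def extract_year(file_name: str, max_year: Optional[int] = None) -> Optional[str]:
    """Three-stage year lookup mirroring the order used in the diff."""
    if max_year is None:
        # Assumption: allow release years up to ten years ahead of today.
        max_year = datetime.date.today().year + 10

    # Stage 1: year in parentheses, e.g. "(1956)".
    paren_match = re.search(r"\((\d{4})\)", file_name)
    if paren_match:
        return paren_match.group(1)

    # Stage 2: year delimited by dots, e.g. ".1971.".
    dot_match = re.search(r"\.(\d{4})\.", file_name)
    if dot_match:
        return dot_match.group(1)

    # Stage 3: any standalone 4-digit number inside a plausible range.
    for candidate in re.finditer(r"\b(\d{4})\b", file_name):
        year = candidate.group(1)
        if 1900 <= int(year) <= max_year:
            return year
    return None


print(extract_year("The Killing.(1956).[SD,ukr,eng].mkv"))        # 1956
print(extract_year("Movie.1971.720p.mkv"))                        # 1971
print(extract_year("Old clip 1080 1999.avi", max_year=2035))      # 1999
```

As in the committed version, only the last stage is range-checked, so a parenthesised or dot-delimited number is returned even when it is not a plausible year.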
--- a/renamer/extractors/metadata_extractor.py
+++ b/renamer/extractors/metadata_extractor.py
@@ -37,10 +37,6 @@ class MetadataExtractor:
            return type(self.info).__name__
        return self._detect_by_mime()
-    def extract_meta_description(self) -> str:
-        """Extract meta description"""
-        meta_type = self.extract_meta_type()
-        return {info['meta_type']: info['description'] for info in MEDIA_TYPES.values()}.get(meta_type, f'Unknown type {meta_type}')
    def _detect_by_mime(self) -> str:
        """Detect meta type by MIME"""
--- a/renamer/formatters/media_formatter.py
+++ b/renamer/formatters/media_formatter.py
@@ -132,7 +132,10 @@ class MediaFormatter:
                "label_formatters": [TextFormatter.bold, TextFormatter.uppercase],
            }
        ]
-        for item in self.extractor.get("tracks").get("video_tracks"):
+        
+        # Get video tracks
+        video_tracks = self.extractor.get("video_tracks", "MediaInfo") or []
+        for item in video_tracks:
            data.append(
                {
                    "group": "Tracks Info",
@@ -142,9 +145,10 @@ class MediaFormatter:
                    "display_formatters": [TextFormatter.green],
                }
            )
-        for i, item in enumerate(
-            self.extractor.get("tracks").get("audio_tracks"), start=1
-        ):
+        
+        # Get audio tracks
+        audio_tracks = self.extractor.get("audio_tracks", "MediaInfo") or []
+        for i, item in enumerate(audio_tracks, start=1):
            data.append(
                {
                    "group": "Tracks Info",
@@ -154,9 +158,10 @@ class MediaFormatter:
                    "display_formatters": [TextFormatter.yellow],
                }
            )
-        for i, item in enumerate(
-            self.extractor.get("tracks").get("subtitle_tracks"), start=1
-        ):
+        
+        # Get subtitle tracks
+        subtitle_tracks = self.extractor.get("subtitle_tracks", "MediaInfo") or []
+        for i, item in enumerate(subtitle_tracks, start=1):
            data.append(
                {
                    "group": "Tracks Info",
@@ -195,13 +200,6 @@ class MediaFormatter:
                "value": self.extractor.get("artist", "Metadata") or "Not extracted",
                "display_formatters": [TextFormatter.grey],
            },
-            {
-                "label": "Description",
-                "label_formatters": [TextFormatter.bold],
-                "value": self.extractor.get("meta_description", "Metadata")
-                or "Not extracted",
-                "display_formatters": [TextFormatter.grey],
-            },
        ]
        return [self._format_data_item(item) for item in data]
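Note on the `or []` guard introduced above: the old chained call `self.extractor.get("tracks").get("video_tracks")` raises `AttributeError` whenever the first `get()` returns `None`. The sketch below reproduces the new pattern in isolation. `StubExtractor`, `build_track_rows`, and the row labels are hypothetical; the real row dictionaries contain more fields, which this diff elides.

```python
from typing import Any, Optional


class StubExtractor:
    """Hypothetical extractor returning canned values, or None when missing."""

    def __init__(self, values: dict[str, Any]) -> None:
        self._values = values

    def get(self, key: str, source: Optional[str] = None) -> Any:
        return self._values.get(key)


def build_track_rows(extractor: StubExtractor) -> list[dict[str, Any]]:
    rows: list[dict[str, Any]] = []

    # "or []" turns a missing (None) result into an empty iterable.
    video_tracks = extractor.get("video_tracks", "MediaInfo") or []
    for item in video_tracks:
        rows.append({"group": "Tracks Info", "label": "Video Track", "value": item})

    audio_tracks = extractor.get("audio_tracks", "MediaInfo") or []
    for i, item in enumerate(audio_tracks, start=1):
        rows.append({"group": "Tracks Info", "label": f"Audio Track {i}", "value": item})

    return rows


print(build_track_rows(StubExtractor({})))  # []
print(len(build_track_rows(StubExtractor({"audio_tracks": ["eng", "ukr"]}))))  # 2
```

With the guard in place, a file without any track data produces an empty "Tracks Info" group instead of an exception.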
--- a/renamer/test/filenames.txt
+++ b/renamer/test/filenames.txt
@@ -90,7 +90,6 @@ The Invention of Lying (2009) [720p,ukr,eng].mkv
 The Island of Dr. Moreau.(1977).[720p,ukr].mp4
 The Killing.(1956).[SD,ukr,eng].mkv
 The Love Guru.(2008).[SD,ukr].avi
-The Love Guru.(2008).[SD,ukr].avi
 The Manchurian Candidate.(2004).[720p,ukr,eng].mkv
 The Mortal Instruments. City of Bones.(2013).[720p,ukr,eng].mkv
 The Mutant Chronicles.(2008).[SD,ukr,eng].mkv
@@ -203,3 +202,10 @@ Upgrade.(2018).[SD,eng].mkv
 Человек с бульвара Капуцинов (1987) [1080p,rus] [tmdbid-45227].mkv
 Человек-амфибия (1961) [SD,rus] [tmdbid-43685].avi
 Чук и Гек (1953) [SD,rus] [tmdbid-148412].avi
+The long title.(2008).[SD 720p,ukr].avi
+The_long_title.(2008).2K.1440p.ukr.avi
+The long title (2008) SD 720p UKR.avi
+The long title (2008) UHD 1440p ENG.mp4
+The long title (2008) UHD 1440 ENG.mp4
+The long title (2008) 8K 4320p ENG.mp4
--- a/renamer/test/test_fileinfo_extractor.py
+++ b/renamer/test/test_fileinfo_extractor.py
@@ -4,32 +4,35 @@ from renamer.extractors.fileinfo_extractor import FileInfoExtractor
 class TestFileInfoExtractor:
+    @pytest.fixture
+    def extractor(self, test_file):
+        return FileInfoExtractor(test_file)
    @pytest.fixture
    def test_file(self):
        """Use the filenames.txt file for testing"""
        return Path(__file__).parent / "filenames.txt"
-    def test_extract_size(self, test_file):
+    def test_extract_size(self, extractor):
        """Test extracting file size"""
-        size = FileInfoExtractor.extract_size(test_file)
+        size = extractor.extract_size()
        assert isinstance(size, int)
        assert size > 0
-    def test_extract_modification_time(self, test_file):
+    def test_extract_modification_time(self, extractor):
        """Test extracting modification time"""
-        mtime = FileInfoExtractor.extract_modification_time(test_file)
+        mtime = extractor.extract_modification_time()
        assert isinstance(mtime, float)
        assert mtime > 0
-    def test_extract_file_name(self, test_file):
+    def test_extract_file_name(self, extractor):
        """Test extracting file name"""
-        name = FileInfoExtractor.extract_file_name(test_file)
+        name = extractor.extract_file_name()
        assert isinstance(name, str)
        assert name == "filenames.txt"
-    def test_extract_file_path(self, test_file):
+    def test_extract_file_path(self, extractor):
        """Test extracting file path"""
-        path = FileInfoExtractor.extract_file_path(test_file)
+        path = extractor.extract_file_path()
        assert isinstance(path, str)
        assert "filenames.txt" in path
        assert str(test_file) == path
--- a/renamer/test/test_filename_extractor.py
+++ b/renamer/test/test_filename_extractor.py
@@ -17,7 +17,8 @@ def load_test_filenames():
 def test_extract_title(filename):
    """Test title extraction from filename"""
    file_path = Path(filename)
-    title = FilenameExtractor.extract_title(file_path)
+    extractor = FilenameExtractor(file_path)
+    title = extractor.extract_title()
    # Print filename and extracted title clearly
    print(f"\nFilename: \033[1;36m{filename}\033[0m")
    print(f"Extracted title: \033[1;32m{title}\033[0m")
@@ -29,7 +30,8 @@ def test_extract_title(filename):
 def test_extract_year(filename):
    """Test year extraction from filename"""
    file_path = Path(filename)
-    year = FilenameExtractor.extract_year(file_path)
+    extractor = FilenameExtractor(file_path)
+    year = extractor.extract_year()
    # Print filename and extracted year clearly
    print(f"\nFilename: \033[1;36m{filename}\033[0m")
    print(f"Extracted year: \033[1;32m{year}\033[0m")
@@ -42,7 +44,8 @@ def test_extract_year(filename):
 def test_extract_source(filename):
    """Test source extraction from filename"""
    file_path = Path(filename)
-    source = FilenameExtractor.extract_source(file_path)
+    extractor = FilenameExtractor(file_path)
+    source = extractor.extract_source()
    # Print filename and extracted source clearly
    print(f"\nFilename: \033[1;36m{filename}\033[0m")
    print(f"Extracted source: \033[1;32m{source}\033[0m")
@@ -54,7 +57,8 @@ def test_extract_source(filename):
 def test_extract_frame_class(filename):
    """Test frame class extraction from filename"""
    file_path = Path(filename)
-    frame_class = FilenameExtractor.extract_frame_class(file_path)
+    extractor = FilenameExtractor(file_path)
+    frame_class = extractor.extract_frame_class()
    # Print filename and extracted frame class clearly
    print(f"\nFilename: \033[1;36m{filename}\033[0m")
    print(f"Extracted frame_class: \033[1;32m{frame_class}\033[0m")
--- a/renamer/test/test_mediainfo_extractor.py
+++ b/renamer/test/test_mediainfo_extractor.py
@@ -5,8 +5,8 @@ from renamer.extractors.mediainfo_extractor import MediaInfoExtractor
 class TestMediaInfoExtractor:
    @pytest.fixture
-    def extractor(self):
-        return MediaInfoExtractor()
+    def extractor(self, test_file):
+        return MediaInfoExtractor(test_file)
    @pytest.fixture
    def test_file(self):
@@ -15,18 +15,18 @@ class TestMediaInfoExtractor:
    def test_extract_resolution(self, extractor, test_file):
        """Test extracting resolution from media info"""
-        resolution = extractor.extract_resolution(test_file)
+        resolution = extractor.extract_resolution()
        # Text files don't have video resolution
        assert resolution is None
    def test_extract_hdr(self, extractor, test_file):
        """Test extracting HDR info"""
-        hdr = extractor.extract_hdr(test_file)
+        hdr = extractor.extract_hdr()
        # Text files don't have HDR
        assert hdr is None
    def test_extract_audio_langs(self, extractor, test_file):
        """Test extracting audio languages"""
-        langs = extractor.extract_audio_langs(test_file)
+        langs = extractor.extract_audio_langs()
        # Text files don't have audio tracks
        assert langs == ''
--- a/renamer/test/test_metadata_extractor.py
+++ b/renamer/test/test_metadata_extractor.py
@@ -4,35 +4,29 @@ from renamer.extractors.metadata_extractor import MetadataExtractor
 class TestMetadataExtractor:
+    @pytest.fixture
+    def extractor(self, test_file):
+        return MetadataExtractor(test_file)
    @pytest.fixture
    def test_file(self):
        """Use the filenames.txt file for testing"""
        return Path(__file__).parent / "filenames.txt"
-    def test_extract_title(self, test_file):
+    def test_extract_title(self, extractor):
        """Test extracting title from metadata"""
-        title = MetadataExtractor.extract_title(test_file)
+        title = extractor.extract_title()
        # Text files don't have metadata, so should be None
        assert title is None
-    def test_extract_duration(self, test_file):
+    def test_extract_duration(self, extractor):
        """Test extracting duration from metadata"""
-        duration = MetadataExtractor.extract_duration(test_file)
+        duration = extractor.extract_duration()
        # Text files don't have duration
        assert duration is None
-    def test_extract_artist(self, test_file):
+    def test_extract_artist(self, extractor):
        """Test extracting artist from metadata"""
-        artist = MetadataExtractor.extract_artist(test_file)
+        artist = extractor.extract_artist()
        # Text files don't have artist
        assert artist is None
-    def test_extract_all_metadata(self, test_file):
-        """Test extracting all metadata"""
-        metadata = MetadataExtractor.extract_all_metadata(test_file)
-        expected = {
-            'title': None,
-            'duration': None,
-            'artist': None
-        }
-        assert metadata == expected