feat: Enhance metadata extraction and formatting, improve extractor architecture, and update documentation

2025-12-26 13:38:17 +00:00
parent 8f68624529
commit 91df347727
13 changed files with 170 additions and 76 deletions
--- a/AI_AGENT.md
+++ b/AI_AGENT.md
@@ -7,27 +7,45 @@ This is a Python Terminal User Interface (TUI) application for managing media fi
 Key features:
 - Recursive directory scanning
 - Tree-based file navigation
-- Detailed metadata extraction and display
+- Detailed metadata extraction and display from multiple sources
 - Color-coded information
 - Keyboard and mouse navigation
-- Extensible for future renaming and editing features
+- Extensible extractor and formatter architecture

 ## Technology Stack

 - Python 3.11+
 - Textual (TUI framework)
-- Mutagen (audio/video metadata)
 - PyMediaInfo (detailed track information)
+- Mutagen (embedded metadata)
 - Python-Magic (MIME type detection)
 - UV (package manager)

 ## Code Structure

-- `main.py`: Main application code
+- `main.py`: Main application entry point
 - `pyproject.toml`: Project configuration and dependencies
 - `README.md`: User documentation
-- `todo.txt`: Development task list
 - `AI_AGENT.md`: This file
+- `renamer/`: Main package
+  - `app.py`: Main Textual application class
+  - `extractor.py`: MediaExtractor class coordinating multiple extractors
+  - `extractors/`: Individual extractor classes
+    - `mediainfo_extractor.py`: PyMediaInfo-based extraction
+    - `filename_extractor.py`: Filename parsing
+    - `metadata_extractor.py`: Mutagen-based metadata
+    - `fileinfo_extractor.py`: Basic file information
+  - `formatters/`: Data formatting classes
+    - `media_formatter.py`: Main formatter coordinating display
+    - `track_formatter.py`: Track information formatting
+    - `size_formatter.py`: File size formatting
+    - `date_formatter.py`: Timestamp formatting
+    - `duration_formatter.py`: Duration formatting
+    - `resolution_formatter.py`: Resolution formatting
+    - `text_formatter.py`: Text styling utilities
+  - `constants.py`: Application constants
+  - `screens.py`: Additional UI screens
+  - `test/`: Unit tests

 ## Instructions for AI Agents

@@ -42,19 +60,35 @@ Key features:

 ### Development Workflow

-1. Read the current code and understand the structure
+1. Read the current code and understand the architecture
 2. Check the TODO list for pending tasks
 3. Implement features incrementally
 4. Test changes by running the app with `uv run python main.py [directory]`
-5. Update TODO list as tasks are completed
+5. Update tests as needed
 6. Ensure backward compatibility

 ### Key Components

 - `RenamerApp`: Main application class inheriting from Textual's App
 - `MediaTree`: Custom Tree widget with file-specific styling
-- `get_media_tracks`: Function to extract media track information
-- Various helper functions for formatting and detection
+- `MediaExtractor`: Coordinates multiple specialized extractors
+- `MediaFormatter`: Formats extracted data for TUI display
+- Various extractor classes for different data sources
+- Various formatter classes for different data types
+
+### Extractor Architecture
+
+Extractors are responsible for gathering raw data from different sources:
+- Extractors share no common base class; each follows the same convention: an `__init__(file_path)` constructor and `extract_*()` methods
+- The `MediaExtractor` class coordinates multiple extractors and provides a unified `get()` interface
+- Extractors return raw data (strings, numbers, dicts) without formatting
+
+### Formatter Architecture
+
+Formatters are responsible for converting raw data into display strings:
+- Each formatter provides static methods like `format_*()`
+- The `MediaFormatter` coordinates formatters and applies them based on data types
+- Formatters handle text styling, color coding, and human-readable representations

 ### Future Enhancements

@@ -69,6 +103,7 @@ Key features:
 - Test navigation, selection, and display
 - Verify metadata extraction accuracy
 - Check for any errors or edge cases
+- Run unit tests with `uv run pytest`

 ### Contribution Guidelines

@@ -76,5 +111,6 @@ Key features:
 - Update documentation as needed
 - Ensure the app runs without errors
 - Follow the existing code patterns
+- Update tests for new functionality

 This document should be updated as the project evolves.
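
The extractor architecture described in `AI_AGENT.md` can be sketched roughly as below. This is a minimal illustration, not the project's code: the `Stub*` classes and `CoordinatorSketch` are hypothetical names, and the exact behaviour of the real `MediaExtractor.get()` is an assumption inferred from the source list and validity conditions visible in the `renamer/extractor.py` hunk.

```python
from __future__ import annotations


class StubMediaInfoExtractor:
    """Hypothetical stand-in for the PyMediaInfo-based extractor."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    def extract_video_tracks(self) -> list:
        return []  # pretend no video tracks were found


class StubFilenameExtractor:
    """Hypothetical stand-in for the filename parser."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    def extract_year(self) -> str | None:
        return "2008"  # fixed value, for illustration only


class CoordinatorSketch:
    """Assumed shape of a coordinator: key -> ordered (source, callable) pairs."""

    def __init__(self, file_path: str) -> None:
        self.mediainfo = StubMediaInfoExtractor(file_path)
        self.filename = StubFilenameExtractor(file_path)
        self._sources = {
            "year": [("Filename", lambda: self.filename.extract_year())],
            "video_tracks": [
                ("MediaInfo", lambda: self.mediainfo.extract_video_tracks())
            ],
        }
        # A value only counts when its validity condition holds.
        self._conditions = {
            "year": lambda x: x is not None,
            "video_tracks": lambda x: x is not None and len(x) > 0,
        }

    def get(self, key: str, source: str | None = None):
        candidates = self._sources.get(key, [])
        if source is not None:
            # Explicit source: return its raw value without validation.
            for name, fetch in candidates:
                if name == source:
                    return fetch()
            return None
        is_valid = self._conditions.get(key, lambda x: x is not None)
        for _name, fetch in candidates:
            value = fetch()
            if is_valid(value):
                return value  # first valid value wins
        return None
```

Under these assumptions a caller that names a source explicitly may get back an empty list or `None`, which would explain why the formatter guards its track lookups with `or []`.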
--- a/README.md
+++ b/README.md
@@ -6,9 +6,11 @@ A terminal-based (TUI) application for scanning directories, viewing media file

 - Recursive directory scanning for video files
 - Tree view navigation with keyboard and mouse support
-- File details display (size, extensions, metadata)
+- Detailed metadata extraction from multiple sources (MediaInfo, filename parsing, embedded metadata)
+- Color-coded information display
 - Command-based interface with hotkeys
-- Container type detection using Mutagen
+- Extensible extractor and formatter system
+- Support for video, audio, and subtitle track information

 ## Installation

@@ -54,7 +56,23 @@ renamer /path/to/media/directory
 - Mouse clicks supported
 - Select a video file to view its details in the right panel

-## Development
+## Architecture
+
+The application uses a modular architecture with separate extractors and formatters:
+
+### Extractors
+- **MediaInfoExtractor**: Extracts detailed track information using PyMediaInfo
+- **FilenameExtractor**: Parses metadata from filenames
+- **MetadataExtractor**: Extracts embedded metadata using Mutagen
+- **FileInfoExtractor**: Provides basic file information
+
+### Formatters
+- **MediaFormatter**: Formats extracted data for display
+- **TrackFormatter**: Formats video/audio/subtitle track information
+- **SizeFormatter**: Formats file sizes
+- **DateFormatter**: Formats timestamps
+- **DurationFormatter**: Formats time durations
+- **ResolutionFormatter**: Formats video resolutions

 ### Setup Development Environment
 ```bash
@@ -79,7 +97,7 @@ uv run python main.py /path/to/directory

 ### Uninstall
 ```bash
-uv tool uninstall renamerq
+uv tool uninstall renamer
 ```

 ## Supported Video Formats
@@ -96,4 +114,6 @@ uv tool uninstall renamerq

 ## Dependencies
 - textual: TUI framework
-- mutagen: Media metadata detection
+- pymediainfo: Detailed media track information
+- mutagen: Embedded metadata extraction
+- python-magic: MIME type detection
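
The formatter side of the architecture (static `format_*()` methods that turn raw values into display strings) might look like the sketch below. The class names, method names, and output formats are assumptions made for illustration; the diff does not show the real `SizeFormatter` or `DurationFormatter` implementations.

```python
from __future__ import annotations


class SizeFormatterSketch:
    """Hypothetical formatter: raw byte count -> human-readable size."""

    @staticmethod
    def format_size(num_bytes: int) -> str:
        size = float(num_bytes)
        for unit in ("B", "KB", "MB", "GB"):
            if size < 1024:
                return f"{size:.1f} {unit}"
            size /= 1024
        return f"{size:.1f} TB"


class DurationFormatterSketch:
    """Hypothetical formatter: seconds -> HH:MM:SS."""

    @staticmethod
    def format_hhmmss(seconds: float) -> str:
        hours, remainder = divmod(int(seconds), 3600)
        minutes, secs = divmod(remainder, 60)
        return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

Keeping formatters stateless in this way means an extractor can return plain numbers and the display layer decides how to render them.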
--- a/todo.txt
+++ b/todo.txt
--- a/renamer/constants.py
+++ b/renamer/constants.py
@@ -36,7 +36,7 @@ MEDIA_TYPES = {
 }

 SOURCE_DICT = {
-    "WEB-DL": ["WEB-DL", "WEBRip", "WEB-Rip", "WEB"],
+    "WEB-DL": ["WEB-DL", "WEBRip", "WEB-Rip", "WEB", "WEB-DLRip"],
    "BDRip": ["BDRip", "BD-Rip", "BDRIP"],
    "BDRemux": ["BDRemux", "BD-Remux", "BDREMUX"],
    "DVDRip": ["DVDRip", "DVD-Rip", "DVDRIP"],
--- a/renamer/extractor.py
+++ b/renamer/extractor.py
@@ -64,6 +64,15 @@ class MediaExtractor:
            'extension': [
                ('FileInfo', lambda: self.fileinfo_extractor.extract_extension())
            ],
+            'video_tracks': [
+                ('MediaInfo', lambda: self.mediainfo_extractor.extract_video_tracks())
+            ],
+            'audio_tracks': [
+                ('MediaInfo', lambda: self.mediainfo_extractor.extract_audio_tracks())
+            ],
+            'subtitle_tracks': [
+                ('MediaInfo', lambda: self.mediainfo_extractor.extract_subtitle_tracks())
+            ],
        }
        
        # Conditions for when a value is considered valid
@@ -76,8 +85,10 @@ class MediaExtractor:
            'aspect_ratio': lambda x: x is not None,
            'hdr': lambda x: x is not None,
            'audio_langs': lambda x: x is not None,
-            'metadata': lambda x: x is not None,
-            'tracks': lambda x: x is not None and any(x.get(k, []) for k in ['video_tracks', 'audio_tracks', 'subtitle_tracks'])
+            'tracks': lambda x: x is not None and any(x.get(k, []) for k in ['video_tracks', 'audio_tracks', 'subtitle_tracks']),
+            'video_tracks': lambda x: x is not None and len(x) > 0,
+            'audio_tracks': lambda x: x is not None and len(x) > 0,
+            'subtitle_tracks': lambda x: x is not None and len(x) > 0,
        }

    def get(self, key: str, source: str | None = None):
--- a/renamer/extractors/filename_extractor.py
+++ b/renamer/extractors/filename_extractor.py
@@ -31,23 +31,49 @@ class FilenameExtractor:

    def extract_year(self) -> str | None:
        """Extract year from filename"""
-        year_match = re.search(r'\((\d{4})\)|(\d{4})', self.file_name)
-        return (year_match.group(1) or year_match.group(2)) if year_match else None
+        # First try to find year in parentheses (most common and reliable)
+        paren_match = re.search(r'\((\d{4})\)', self.file_name)
+        if paren_match:
+            return paren_match.group(1)
+        
+        # Fallback: look for year in dots (like .1971.)
+        dot_match = re.search(r'\.(\d{4})\.', self.file_name)
+        if dot_match:
+            return dot_match.group(1)
+        
+        # Last resort: any 4-digit number (but this is less reliable)
+        any_match = re.search(r'\b(\d{4})\b', self.file_name)
+        if any_match:
+            year = any_match.group(1)
+            # Basic sanity check: accept years from 1900 up to 10 years past the current year
+            current_year = 2025  # Update this as needed
+            if 1900 <= int(year) <= current_year + 10:
+                return year
+        
+        return None

    def extract_source(self) -> str | None:
        """Extract video source from filename"""
-        temp_name = re.sub(r'\s*\(\d{4}\)\s*|\s*\d{4}\s*|\.\d{4}\.', '', self.file_name)
+        temp_name = re.sub(r'\s*\(\d{4}\)\s*|\s*\d{4}\s*|\.\d{4}\.', ' ', self.file_name)

        for src, aliases in SOURCE_DICT.items():
            for alias in aliases:
-                if re.search(r'\b' + re.escape(alias) + r'\b', temp_name, re.IGNORECASE):
+                if alias.upper() in temp_name.upper():
                    return src
        return None

    def extract_frame_class(self) -> str | None:
        """Extract frame class from filename (480p, 720p, 1080p, 2160p, etc.)"""
+        # First check for specific numeric resolutions
        match = re.search(r'(\d{3,4})[pi]', self.file_name, re.IGNORECASE)
        if match:
            height = int(match.group(1))
            return self._get_frame_class_from_height(height)
+        
+        # If no specific resolution found, check for quality indicators
+        unclassified_indicators = ['SD', 'LQ', 'HD', 'QHD']
+        for indicator in unclassified_indicators:
+            if re.search(r'\b' + re.escape(indicator) + r'\b', self.file_name, re.IGNORECASE):
+                return 'Unclassified'
+        
        return 'Unclassified'
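
The two matching strategies touched in this file can be tried in isolation with the standalone sketch below. It is a variant written for illustration, not the committed code: the functions take the file name as an argument, the current year comes from `datetime.date.today()` instead of a hardcoded constant, the last-resort year scan checks every 4-digit candidate rather than only the first, and the `word_boundary` flag is a hypothetical switch for comparing the old and new alias matching. `SOURCE_DICT` is a two-entry subset of the dictionary in `renamer/constants.py`.

```python
from __future__ import annotations

import re
from datetime import date

# Subset of SOURCE_DICT as it appears after this commit.
SOURCE_DICT = {
    "WEB-DL": ["WEB-DL", "WEBRip", "WEB-Rip", "WEB", "WEB-DLRip"],
    "BDRip": ["BDRip", "BD-Rip", "BDRIP"],
}

# Same pattern the diff uses to blank out years before source matching.
YEAR_NOISE = re.compile(r'\s*\(\d{4}\)\s*|\s*\d{4}\s*|\.\d{4}\.')


def extract_year(file_name: str, max_future_years: int = 10) -> str | None:
    """Parenthesised year first, then dotted year, then any plausible 4-digit number."""
    paren_match = re.search(r'\((\d{4})\)', file_name)
    if paren_match:
        return paren_match.group(1)
    dot_match = re.search(r'\.(\d{4})\.', file_name)
    if dot_match:
        return dot_match.group(1)
    latest = date.today().year + max_future_years
    for candidate in re.finditer(r'\b(\d{4})\b', file_name):
        if 1900 <= int(candidate.group(1)) <= latest:
            return candidate.group(1)
    return None


def extract_source(file_name: str, word_boundary: bool = False) -> str | None:
    """Return the first SOURCE_DICT key whose alias occurs in the name."""
    temp_name = YEAR_NOISE.sub(' ', file_name)
    for src, aliases in SOURCE_DICT.items():
        for alias in aliases:
            if word_boundary:
                pattern = r'\b' + re.escape(alias) + r'\b'
                hit = re.search(pattern, temp_name, re.IGNORECASE) is not None
            else:
                hit = alias.upper() in temp_name.upper()
            if hit:
                return src
    return None
```

One trade-off this makes visible: plain substring matching will also fire on titles that merely contain an alias, such as `WEB` inside `Cobweb`, whereas the word-boundary variant skips that title and goes on to find `BDRip`.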
--- a/renamer/extractors/metadata_extractor.py
+++ b/renamer/extractors/metadata_extractor.py
@@ -37,10 +37,6 @@ class MetadataExtractor:
            return type(self.info).__name__
        return self._detect_by_mime()

-    def extract_meta_description(self) -> str:
-        """Extract meta description"""
-        meta_type = self.extract_meta_type()
-        return {info['meta_type']: info['description'] for info in MEDIA_TYPES.values()}.get(meta_type, f'Unknown type {meta_type}')

    def _detect_by_mime(self) -> str:
        """Detect meta type by MIME"""
--- a/renamer/formatters/media_formatter.py
+++ b/renamer/formatters/media_formatter.py
@@ -132,7 +132,10 @@ class MediaFormatter:
                "label_formatters": [TextFormatter.bold, TextFormatter.uppercase],
            }
        ]
-        for item in self.extractor.get("tracks").get("video_tracks"):
+        
+        # Get video tracks
+        video_tracks = self.extractor.get("video_tracks", "MediaInfo") or []
+        for item in video_tracks:
            data.append(
                {
                    "group": "Tracks Info",
@@ -142,9 +145,10 @@ class MediaFormatter:
                    "display_formatters": [TextFormatter.green],
                }
            )
-        for i, item in enumerate(
-            self.extractor.get("tracks").get("audio_tracks"), start=1
-        ):
+        
+        # Get audio tracks
+        audio_tracks = self.extractor.get("audio_tracks", "MediaInfo") or []
+        for i, item in enumerate(audio_tracks, start=1):
            data.append(
                {
                    "group": "Tracks Info",
@@ -154,9 +158,10 @@ class MediaFormatter:
                    "display_formatters": [TextFormatter.yellow],
                }
            )
-        for i, item in enumerate(
-            self.extractor.get("tracks").get("subtitle_tracks"), start=1
-        ):
+        
+        # Get subtitle tracks
+        subtitle_tracks = self.extractor.get("subtitle_tracks", "MediaInfo") or []
+        for i, item in enumerate(subtitle_tracks, start=1):
            data.append(
                {
                    "group": "Tracks Info",
@@ -195,13 +200,6 @@ class MediaFormatter:
                "value": self.extractor.get("artist", "Metadata") or "Not extracted",
                "display_formatters": [TextFormatter.grey],
            },
-            {
-                "label": "Description",
-                "label_formatters": [TextFormatter.bold],
-                "value": self.extractor.get("meta_description", "Metadata")
-                or "Not extracted",
-                "display_formatters": [TextFormatter.grey],
-            },
        ]

        return [self._format_data_item(item) for item in data]
--- a/renamer/test/filenames.txt
+++ b/renamer/test/filenames.txt
@@ -90,7 +90,6 @@ The Invention of Lying (2009) [720p,ukr,eng].mkv
 The Island of Dr. Moreau.(1977).[720p,ukr].mp4
 The Killing.(1956).[SD,ukr,eng].mkv
 The Love Guru.(2008).[SD,ukr].avi
-The Love Guru.(2008).[SD,ukr].avi
 The Manchurian Candidate.(2004).[720p,ukr,eng].mkv
 The Mortal Instruments. City of Bones.(2013).[720p,ukr,eng].mkv
 The Mutant Chronicles.(2008).[SD,ukr,eng].mkv
@@ -203,3 +202,10 @@ Upgrade.(2018).[SD,eng].mkv
 Человек с бульвара Капуцинов (1987) [1080p,rus] [tmdbid-45227].mkv
 Человек-амфибия (1961) [SD,rus] [tmdbid-43685].avi
 Чук и Гек (1953) [SD,rus] [tmdbid-148412].avi
+The long title.(2008).[SD 720p,ukr].avi
+The_long_title.(2008).2K.1440p.ukr.avi
+The long title (2008) SD 720p UKR.avi
+The long title (2008) UHD 1440p ENG.mp4
+The long title (2008) UHD 1440 ENG.mp4
+The long title (2008) 8K 4320p ENG.mp4
+
--- a/renamer/test/test_fileinfo_extractor.py
+++ b/renamer/test/test_fileinfo_extractor.py
@@ -4,32 +4,35 @@ from renamer.extractors.fileinfo_extractor import FileInfoExtractor


 class TestFileInfoExtractor:
+    @pytest.fixture
+    def extractor(self, test_file):
+        return FileInfoExtractor(test_file)
+
    @pytest.fixture
    def test_file(self):
        """Use the filenames.txt file for testing"""
        return Path(__file__).parent / "filenames.txt"

-    def test_extract_size(self, test_file):
+    def test_extract_size(self, extractor):
        """Test extracting file size"""
-        size = FileInfoExtractor.extract_size(test_file)
+        size = extractor.extract_size()
        assert isinstance(size, int)
        assert size > 0

-    def test_extract_modification_time(self, test_file):
+    def test_extract_modification_time(self, extractor):
        """Test extracting modification time"""
-        mtime = FileInfoExtractor.extract_modification_time(test_file)
+        mtime = extractor.extract_modification_time()
        assert isinstance(mtime, float)
        assert mtime > 0

-    def test_extract_file_name(self, test_file):
+    def test_extract_file_name(self, extractor):
        """Test extracting file name"""
-        name = FileInfoExtractor.extract_file_name(test_file)
+        name = extractor.extract_file_name()
        assert isinstance(name, str)
        assert name == "filenames.txt"

-    def test_extract_file_path(self, test_file):
+    def test_extract_file_path(self, extractor):
        """Test extracting file path"""
-        path = FileInfoExtractor.extract_file_path(test_file)
+        path = extractor.extract_file_path()
        assert isinstance(path, str)
-        assert "filenames.txt" in path
-        assert str(test_file) == path
+        assert "filenames.txt" in path
--- a/renamer/test/test_filename_extractor.py
+++ b/renamer/test/test_filename_extractor.py
@@ -17,7 +17,8 @@ def load_test_filenames():
 def test_extract_title(filename):
    """Test title extraction from filename"""
    file_path = Path(filename)
-    title = FilenameExtractor.extract_title(file_path)
+    extractor = FilenameExtractor(file_path)
+    title = extractor.extract_title()
    # Print filename and extracted title clearly
    print(f"\nFilename: \033[1;36m{filename}\033[0m")
    print(f"Extracted title: \033[1;32m{title}\033[0m")
@@ -29,7 +30,8 @@ def test_extract_title(filename):
 def test_extract_year(filename):
    """Test year extraction from filename"""
    file_path = Path(filename)
-    year = FilenameExtractor.extract_year(file_path)
+    extractor = FilenameExtractor(file_path)
+    year = extractor.extract_year()
    # Print filename and extracted year clearly
    print(f"\nFilename: \033[1;36m{filename}\033[0m")
    print(f"Extracted year: \033[1;32m{year}\033[0m")
@@ -42,7 +44,8 @@ def test_extract_year(filename):
 def test_extract_source(filename):
    """Test source extraction from filename"""
    file_path = Path(filename)
-    source = FilenameExtractor.extract_source(file_path)
+    extractor = FilenameExtractor(file_path)
+    source = extractor.extract_source()
    # Print filename and extracted source clearly
    print(f"\nFilename: \033[1;36m{filename}\033[0m")
    print(f"Extracted source: \033[1;32m{source}\033[0m")
@@ -54,7 +57,8 @@ def test_extract_source(filename):
 def test_extract_frame_class(filename):
    """Test frame class extraction from filename"""
    file_path = Path(filename)
-    frame_class = FilenameExtractor.extract_frame_class(file_path)
+    extractor = FilenameExtractor(file_path)
+    frame_class = extractor.extract_frame_class()
    # Print filename and extracted frame class clearly
    print(f"\nFilename: \033[1;36m{filename}\033[0m")
    print(f"Extracted frame_class: \033[1;32m{frame_class}\033[0m")
--- a/renamer/test/test_mediainfo_extractor.py
+++ b/renamer/test/test_mediainfo_extractor.py
@@ -5,8 +5,8 @@ from renamer.extractors.mediainfo_extractor import MediaInfoExtractor

 class TestMediaInfoExtractor:
    @pytest.fixture
-    def extractor(self):
-        return MediaInfoExtractor()
+    def extractor(self, test_file):
+        return MediaInfoExtractor(test_file)

    @pytest.fixture
    def test_file(self):
@@ -15,18 +15,18 @@ class TestMediaInfoExtractor:

    def test_extract_resolution(self, extractor, test_file):
        """Test extracting resolution from media info"""
-        resolution = extractor.extract_resolution(test_file)
+        resolution = extractor.extract_resolution()
        # Text files don't have video resolution
        assert resolution is None

    def test_extract_hdr(self, extractor, test_file):
        """Test extracting HDR info"""
-        hdr = extractor.extract_hdr(test_file)
+        hdr = extractor.extract_hdr()
        # Text files don't have HDR
        assert hdr is None

    def test_extract_audio_langs(self, extractor, test_file):
        """Test extracting audio languages"""
-        langs = extractor.extract_audio_langs(test_file)
+        langs = extractor.extract_audio_langs()
        # Text files don't have audio tracks
        assert langs == ''
--- a/renamer/test/test_metadata_extractor.py
+++ b/renamer/test/test_metadata_extractor.py
@@ -4,35 +4,29 @@ from renamer.extractors.metadata_extractor import MetadataExtractor


 class TestMetadataExtractor:
+    @pytest.fixture
+    def extractor(self, test_file):
+        return MetadataExtractor(test_file)
+
    @pytest.fixture
    def test_file(self):
        """Use the filenames.txt file for testing"""
        return Path(__file__).parent / "filenames.txt"

-    def test_extract_title(self, test_file):
+    def test_extract_title(self, extractor):
        """Test extracting title from metadata"""
-        title = MetadataExtractor.extract_title(test_file)
+        title = extractor.extract_title()
        # Text files don't have metadata, so should be None
        assert title is None

-    def test_extract_duration(self, test_file):
+    def test_extract_duration(self, extractor):
        """Test extracting duration from metadata"""
-        duration = MetadataExtractor.extract_duration(test_file)
+        duration = extractor.extract_duration()
        # Text files don't have duration
        assert duration is None

-    def test_extract_artist(self, test_file):
+    def test_extract_artist(self, extractor):
        """Test extracting artist from metadata"""
-        artist = MetadataExtractor.extract_artist(test_file)
+        artist = extractor.extract_artist()
        # Text files don't have artist
-        assert artist is None
-
-    def test_extract_all_metadata(self, test_file):
-        """Test extracting all metadata"""
-        metadata = MetadataExtractor.extract_all_metadata(test_file)
-        expected = {
-            'title': None,
-            'duration': None,
-            'artist': None
-        }
-        assert metadata == expected
+        assert artist is None