diff --git a/AI_AGENT.md b/AI_AGENT.md
index 84e30b1..bd50a2e 100644
--- a/AI_AGENT.md
+++ b/AI_AGENT.md
@@ -7,27 +7,45 @@ This is a Python Terminal User Interface (TUI) application for managing media fi
 Key features:
 - Recursive directory scanning
 - Tree-based file navigation
-- Detailed metadata extraction and display
+- Detailed metadata extraction and display from multiple sources
 - Color-coded information
 - Keyboard and mouse navigation
-- Extensible for future renaming and editing features
+- Extensible extractor and formatter architecture
 
 ## Technology Stack
 
 - Python 3.11+
 - Textual (TUI framework)
-- Mutagen (audio/video metadata)
 - PyMediaInfo (detailed track information)
+- Mutagen (embedded metadata)
 - Python-Magic (MIME type detection)
 - UV (package manager)
 
 ## Code Structure
 
-- `main.py`: Main application code
+- `main.py`: Main application entry point
 - `pyproject.toml`: Project configuration and dependencies
 - `README.md`: User documentation
-- `todo.txt`: Development task list
 - `AI_AGENT.md`: This file
+- `renamer/`: Main package
+  - `app.py`: Main Textual application class
+  - `extractor.py`: MediaExtractor class coordinating multiple extractors
+  - `extractors/`: Individual extractor classes
+    - `mediainfo_extractor.py`: PyMediaInfo-based extraction
+    - `filename_extractor.py`: Filename parsing
+    - `metadata_extractor.py`: Mutagen-based metadata
+    - `fileinfo_extractor.py`: Basic file information
+  - `formatters/`: Data formatting classes
+    - `media_formatter.py`: Main formatter coordinating display
+    - `track_formatter.py`: Track information formatting
+    - `size_formatter.py`: File size formatting
+    - `date_formatter.py`: Timestamp formatting
+    - `duration_formatter.py`: Duration formatting
+    - `resolution_formatter.py`: Resolution formatting
+    - `text_formatter.py`: Text styling utilities
+  - `constants.py`: Application constants
+  - `screens.py`: Additional UI screens
+  - `test/`: Unit tests
 
 ## Instructions for AI Agents
 
@@ -42,19 +60,35 @@ Key features:
 
 ### Development Workflow
 
-1. Read the current code and understand the structure
+1. Read the current code and understand the architecture
 2. Check the TODO list for pending tasks
 3. Implement features incrementally
 4. Test changes by running the app with `uv run python main.py [directory]`
-5. Update TODO list as tasks are completed
+5. Update tests as needed
 6. Ensure backward compatibility
 
 ### Key Components
 
 - `RenamerApp`: Main application class inheriting from Textual's App
 - `MediaTree`: Custom Tree widget with file-specific styling
-- `get_media_tracks`: Function to extract media track information
-- Various helper functions for formatting and detection
+- `MediaExtractor`: Coordinates multiple specialized extractors
+- `MediaFormatter`: Formats extracted data for TUI display
+- Various extractor classes for different data sources
+- Various formatter classes for different data types
+
+### Extractor Architecture
+
+Extractors are responsible for gathering raw data from different sources:
+- Each extractor inherits from no base class but follows the pattern of `__init__(file_path)` and `extract_*()` methods
+- The `MediaExtractor` class coordinates multiple extractors and provides a unified `get()` interface
+- Extractors return raw data (strings, numbers, dicts) without formatting
+
+### Formatter Architecture
+
+Formatters are responsible for converting raw data into display strings:
+- Each formatter provides static methods like `format_*()`
+- The `MediaFormatter` coordinates formatters and applies them based on data types
+- Formatters handle text styling, color coding, and human-readable representations
 
 ### Future Enhancements
 
@@ -69,6 +103,7 @@ Key features:
 - Test navigation, selection, and display
 - Verify metadata extraction accuracy
 - Check for any errors or edge cases
+- Run unit tests with `uv run pytest`
 
 ### Contribution Guidelines
 
@@ -76,5 +111,6 @@ Key features:
 - Update documentation as needed
 - Ensure the app runs without errors
 - Follow the existing code patterns
+- Update tests for new functionality
 
 This document should be updated as the project evolves.
\ No newline at end of file
diff --git a/README.md b/README.md
index c49318e..e748ffb 100644
--- a/README.md
+++ b/README.md
@@ -6,9 +6,11 @@ A terminal-based (TUI) application for scanning directories, viewing media file
 
 - Recursive directory scanning for video files
 - Tree view navigation with keyboard and mouse support
-- File details display (size, extensions, metadata)
+- Detailed metadata extraction from multiple sources (MediaInfo, filename parsing, embedded metadata)
+- Color-coded information display
 - Command-based interface with hotkeys
-- Container type detection using Mutagen
+- Extensible extractor and formatter system
+- Support for video, audio, and subtitle track information
 
 ## Installation
 
@@ -54,7 +56,23 @@ renamer /path/to/media/directory
 - Mouse clicks supported
 - Select a video file to view its details in the right panel
 
-## Development
+## Architecture
+
+The application uses a modular architecture with separate extractors and formatters:
+
+### Extractors
+- **MediaInfoExtractor**: Extracts detailed track information using PyMediaInfo
+- **FilenameExtractor**: Parses metadata from filenames
+- **MetadataExtractor**: Extracts embedded metadata using Mutagen
+- **FileInfoExtractor**: Provides basic file information
+
+### Formatters
+- **MediaFormatter**: Formats extracted data for display
+- **TrackFormatter**: Formats video/audio/subtitle track information
+- **SizeFormatter**: Formats file sizes
+- **DateFormatter**: Formats timestamps
+- **DurationFormatter**: Formats time durations
+- **ResolutionFormatter**: Formats video resolutions
 
 ### Setup Development Environment
 ```bash
@@ -79,7 +97,7 @@ uv run python main.py /path/to/directory
 
 ### Uninstall
 ```bash
-uv tool uninstall renamerq
+uv tool uninstall renamer
 ```
 
 ## Supported Video Formats
@@ -96,4 +114,6 @@ uv tool uninstall renamerq
 ## Dependencies
 
 - textual: TUI framework
-- mutagen: Media metadata detection
+- pymediainfo: Detailed media track information
+- mutagen: Embedded metadata extraction
+- python-magic: MIME type detection
diff --git a/todo.txt b/ToDo.md
similarity index 100%
rename from todo.txt
rename to ToDo.md
diff --git a/renamer/constants.py b/renamer/constants.py
index fd5ea5e..5dd3614 100644
--- a/renamer/constants.py
+++ b/renamer/constants.py
@@ -36,7 +36,7 @@ MEDIA_TYPES = {
 }
 
 SOURCE_DICT = {
-    "WEB-DL": ["WEB-DL", "WEBRip", "WEB-Rip", "WEB"],
+    "WEB-DL": ["WEB-DL", "WEBRip", "WEB-Rip", "WEB", "WEB-DLRip"],
     "BDRip": ["BDRip", "BD-Rip", "BDRIP"],
     "BDRemux": ["BDRemux", "BD-Remux", "BDREMUX"],
     "DVDRip": ["DVDRip", "DVD-Rip", "DVDRIP"],
diff --git a/renamer/extractor.py b/renamer/extractor.py
index 7a2f204..30ba8b8 100644
--- a/renamer/extractor.py
+++ b/renamer/extractor.py
@@ -64,6 +64,15 @@ class MediaExtractor:
             'extension': [
                 ('FileInfo', lambda: self.fileinfo_extractor.extract_extension())
             ],
+            'video_tracks': [
+                ('MediaInfo', lambda: self.mediainfo_extractor.extract_video_tracks())
+            ],
+            'audio_tracks': [
+                ('MediaInfo', lambda: self.mediainfo_extractor.extract_audio_tracks())
+            ],
+            'subtitle_tracks': [
+                ('MediaInfo', lambda: self.mediainfo_extractor.extract_subtitle_tracks())
+            ],
         }
 
         # Conditions for when a value is considered valid
@@ -76,8 +85,10 @@ class MediaExtractor:
             'aspect_ratio': lambda x: x is not None,
             'hdr': lambda x: x is not None,
             'audio_langs': lambda x: x is not None,
-            'metadata': lambda x: x is not None,
-            'tracks': lambda x: x is not None and any(x.get(k, []) for k in ['video_tracks', 'audio_tracks', 'subtitle_tracks'])
+            'tracks': lambda x: x is not None and any(x.get(k, []) for k in ['video_tracks', 'audio_tracks', 'subtitle_tracks']),
+            'video_tracks': lambda x: x is not None and len(x) > 0,
+            'audio_tracks': lambda x: x is not None and len(x) > 0,
+            'subtitle_tracks': lambda x: x is not None and len(x) > 0,
         }
 
     def get(self, key: str, source: str | None = None):
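Review note on `renamer/extractor.py`: the body of `get()` is not part of this diff, so the following is only a minimal sketch of how a per-key source list and a validity predicate could work together. The class name `FallbackLookup`, its constructor arguments, and the lookup rules (first valid value wins; an explicit `source` bypasses the validity check) are assumptions made for illustration, not the project's actual implementation.

```python
from __future__ import annotations

from typing import Any, Callable


class FallbackLookup:
    """Hypothetical stand-in for the lookup part of MediaExtractor."""

    def __init__(
        self,
        sources: dict[str, list[tuple[str, Callable[[], Any]]]],
        conditions: dict[str, Callable[[Any], bool]],
    ) -> None:
        self._sources = sources
        self._conditions = conditions

    def get(self, key: str, source: str | None = None) -> Any:
        # Assumed default: a value is valid when it is not None.
        is_valid = self._conditions.get(key, lambda x: x is not None)
        for name, getter in self._sources.get(key, []):
            if source is not None:
                # Explicit source requested: return its raw value unchecked.
                if name == source:
                    return getter()
                continue
            value = getter()
            if is_valid(value):
                return value
        return None


lookup = FallbackLookup(
    sources={
        "year": [("Filename", lambda: None), ("Metadata", lambda: "1956")],
        "video_tracks": [("MediaInfo", lambda: [])],
    },
    conditions={"video_tracks": lambda x: x is not None and len(x) > 0},
)
```

Under these assumptions an empty track list fails the `len(x) > 0` condition when no source is named, which is one reason a caller might still guard the result with `or []`, as the formatter change in this diff does.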
diff --git a/renamer/extractors/filename_extractor.py b/renamer/extractors/filename_extractor.py
index 8408e71..0e9f620 100644
--- a/renamer/extractors/filename_extractor.py
+++ b/renamer/extractors/filename_extractor.py
@@ -31,23 +31,49 @@ class FilenameExtractor:
 
     def extract_year(self) -> str | None:
         """Extract year from filename"""
-        year_match = re.search(r'\((\d{4})\)|(\d{4})', self.file_name)
-        return (year_match.group(1) or year_match.group(2)) if year_match else None
+        # First try to find year in parentheses (most common and reliable)
+        paren_match = re.search(r'\((\d{4})\)', self.file_name)
+        if paren_match:
+            return paren_match.group(1)
+
+        # Fallback: look for year in dots (like .1971.)
+        dot_match = re.search(r'\.(\d{4})\.', self.file_name)
+        if dot_match:
+            return dot_match.group(1)
+
+        # Last resort: any 4-digit number (but this is less reliable)
+        any_match = re.search(r'\b(\d{4})\b', self.file_name)
+        if any_match:
+            year = any_match.group(1)
+            # Basic sanity check: years should be between 1900 and current year + a few years
+            current_year = 2025  # Update this as needed
+            if 1900 <= int(year) <= current_year + 10:
+                return year
+
+        return None
 
     def extract_source(self) -> str | None:
         """Extract video source from filename"""
-        temp_name = re.sub(r'\s*\(\d{4}\)\s*|\s*\d{4}\s*|\.\d{4}\.', '', self.file_name)
+        temp_name = re.sub(r'\s*\(\d{4}\)\s*|\s*\d{4}\s*|\.\d{4}\.', ' ', self.file_name)
         for src, aliases in SOURCE_DICT.items():
             for alias in aliases:
-                if re.search(r'\b' + re.escape(alias) + r'\b', temp_name, re.IGNORECASE):
+                if alias.upper() in temp_name.upper():
                     return src
 
         return None
 
     def extract_frame_class(self) -> str | None:
         """Extract frame class from filename (480p, 720p, 1080p, 2160p, etc.)"""
+        # First check for specific numeric resolutions
         match = re.search(r'(\d{3,4})[pi]', self.file_name, re.IGNORECASE)
         if match:
             height = int(match.group(1))
             return self._get_frame_class_from_height(height)
+
+        # If no specific resolution found, check for quality indicators
+        unclassified_indicators = ['SD', 'LQ', 'HD', 'QHD']
+        for indicator in unclassified_indicators:
+            if re.search(r'\b' + re.escape(indicator) + r'\b', self.file_name, re.IGNORECASE):
+                return 'Unclassified'
+
         return 'Unclassified'
\ No newline at end of file
diff --git a/renamer/extractors/metadata_extractor.py b/renamer/extractors/metadata_extractor.py
index 1d764b5..20498d2 100644
--- a/renamer/extractors/metadata_extractor.py
+++ b/renamer/extractors/metadata_extractor.py
@@ -37,10 +37,6 @@ class MetadataExtractor:
             return type(self.info).__name__
         return self._detect_by_mime()
 
-    def extract_meta_description(self) -> str:
-        """Extract meta description"""
-        meta_type = self.extract_meta_type()
-        return {info['meta_type']: info['description'] for info in MEDIA_TYPES.values()}.get(meta_type, f'Unknown type {meta_type}')
 
     def _detect_by_mime(self) -> str:
         """Detect meta type by MIME"""
diff --git a/renamer/formatters/media_formatter.py b/renamer/formatters/media_formatter.py
index 354abad..7e16940 100644
--- a/renamer/formatters/media_formatter.py
+++ b/renamer/formatters/media_formatter.py
@@ -132,7 +132,10 @@ class MediaFormatter:
                 "label_formatters": [TextFormatter.bold, TextFormatter.uppercase],
             }
         ]
-        for item in self.extractor.get("tracks").get("video_tracks"):
+
+        # Get video tracks
+        video_tracks = self.extractor.get("video_tracks", "MediaInfo") or []
+        for item in video_tracks:
             data.append(
                 {
                     "group": "Tracks Info",
@@ -142,9 +145,10 @@ class MediaFormatter:
                     "display_formatters": [TextFormatter.green],
                 }
             )
-        for i, item in enumerate(
-            self.extractor.get("tracks").get("audio_tracks"), start=1
-        ):
+
+        # Get audio tracks
+        audio_tracks = self.extractor.get("audio_tracks", "MediaInfo") or []
+        for i, item in enumerate(audio_tracks, start=1):
             data.append(
                 {
                     "group": "Tracks Info",
@@ -154,9 +158,10 @@ class MediaFormatter:
                     "display_formatters": [TextFormatter.yellow],
                 }
             )
-        for i, item in enumerate(
-            self.extractor.get("tracks").get("subtitle_tracks"), start=1
-        ):
+
+        # Get subtitle tracks
+        subtitle_tracks = self.extractor.get("subtitle_tracks", "MediaInfo") or []
+        for i, item in enumerate(subtitle_tracks, start=1):
             data.append(
                 {
                     "group": "Tracks Info",
@@ -195,13 +200,6 @@ class MediaFormatter:
                 "value": self.extractor.get("artist", "Metadata") or "Not extracted",
                 "display_formatters": [TextFormatter.grey],
             },
-            {
-                "label": "Description",
-                "label_formatters": [TextFormatter.bold],
-                "value": self.extractor.get("meta_description", "Metadata")
-                or "Not extracted",
-                "display_formatters": [TextFormatter.grey],
-            },
         ]
 
         return [self._format_data_item(item) for item in data]
diff --git a/renamer/test/filenames.txt b/renamer/test/filenames.txt
index cc9b719..4eafdea 100644
--- a/renamer/test/filenames.txt
+++ b/renamer/test/filenames.txt
@@ -90,7 +90,6 @@ The Invention of Lying (2009) [720p,ukr,eng].mkv
 The Island of Dr. Moreau.(1977).[720p,ukr].mp4
 The Killing.(1956).[SD,ukr,eng].mkv
 The Love Guru.(2008).[SD,ukr].avi
-The Love Guru.(2008).[SD,ukr].avi
 The Manchurian Candidate.(2004).[720p,ukr,eng].mkv
 The Mortal Instruments. City of Bones.(2013).[720p,ukr,eng].mkv
 The Mutant Chronicles.(2008).[SD,ukr,eng].mkv
@@ -203,3 +202,10 @@ Upgrade.(2018).[SD,eng].mkv
 Человек с бульвара Капуцинов (1987) [1080p,rus] [tmdbid-45227].mkv
 Человек-амфибия (1961) [SD,rus] [tmdbid-43685].avi
 Чук и Гек (1953) [SD,rus] [tmdbid-148412].avi
+The long title.(2008).[SD 720p,ukr].avi
+The_long_title.(2008).2K.1440p.ukr.avi
+The long title (2008) SD 720p UKR.avi
+The long title (2008) UHD 1440p ENG.mp4
+The long title (2008) UHD 1440 ENG.mp4
+The long title (2008) 8K 4320p ENG.mp4
+
diff --git a/renamer/test/test_fileinfo_extractor.py b/renamer/test/test_fileinfo_extractor.py
index d91f294..d916cb2 100644
--- a/renamer/test/test_fileinfo_extractor.py
+++ b/renamer/test/test_fileinfo_extractor.py
@@ -4,32 +4,35 @@ from renamer.extractors.fileinfo_extractor import FileInfoExtractor
 
 
 class TestFileInfoExtractor:
+    @pytest.fixture
+    def extractor(self, test_file):
+        return FileInfoExtractor(test_file)
+
     @pytest.fixture
     def test_file(self):
         """Use the filenames.txt file for testing"""
         return Path(__file__).parent / "filenames.txt"
 
-    def test_extract_size(self, test_file):
+    def test_extract_size(self, extractor):
         """Test extracting file size"""
-        size = FileInfoExtractor.extract_size(test_file)
+        size = extractor.extract_size()
         assert isinstance(size, int)
         assert size > 0
 
-    def test_extract_modification_time(self, test_file):
+    def test_extract_modification_time(self, extractor):
         """Test extracting modification time"""
-        mtime = FileInfoExtractor.extract_modification_time(test_file)
+        mtime = extractor.extract_modification_time()
         assert isinstance(mtime, float)
         assert mtime > 0
 
-    def test_extract_file_name(self, test_file):
+    def test_extract_file_name(self, extractor):
         """Test extracting file name"""
-        name = FileInfoExtractor.extract_file_name(test_file)
+        name = extractor.extract_file_name()
         assert isinstance(name, str)
         assert name == "filenames.txt"
 
-    def test_extract_file_path(self, test_file):
+    def test_extract_file_path(self, extractor):
         """Test extracting file path"""
-        path = FileInfoExtractor.extract_file_path(test_file)
+        path = extractor.extract_file_path()
         assert isinstance(path, str)
-        assert "filenames.txt" in path
-        assert str(test_file) == path
\ No newline at end of file
+        assert "filenames.txt" in path
\ No newline at end of file
diff --git a/renamer/test/test_filename_extractor.py b/renamer/test/test_filename_extractor.py
index 49e5be7..ebf6103 100644
--- a/renamer/test/test_filename_extractor.py
+++ b/renamer/test/test_filename_extractor.py
@@ -17,7 +17,8 @@ def load_test_filenames():
 def test_extract_title(filename):
     """Test title extraction from filename"""
     file_path = Path(filename)
-    title = FilenameExtractor.extract_title(file_path)
+    extractor = FilenameExtractor(file_path)
+    title = extractor.extract_title()
     # Print filename and extracted title clearly
     print(f"\nFilename: \033[1;36m{filename}\033[0m")
     print(f"Extracted title: \033[1;32m{title}\033[0m")
@@ -29,7 +30,8 @@ def test_extract_title(filename):
 def test_extract_year(filename):
     """Test year extraction from filename"""
     file_path = Path(filename)
-    year = FilenameExtractor.extract_year(file_path)
+    extractor = FilenameExtractor(file_path)
+    year = extractor.extract_year()
     # Print filename and extracted year clearly
     print(f"\nFilename: \033[1;36m{filename}\033[0m")
     print(f"Extracted year: \033[1;32m{year}\033[0m")
@@ -42,7 +44,8 @@ def test_extract_year(filename):
 def test_extract_source(filename):
     """Test source extraction from filename"""
     file_path = Path(filename)
-    source = FilenameExtractor.extract_source(file_path)
+    extractor = FilenameExtractor(file_path)
+    source = extractor.extract_source()
     # Print filename and extracted source clearly
     print(f"\nFilename: \033[1;36m{filename}\033[0m")
     print(f"Extracted source: \033[1;32m{source}\033[0m")
@@ -54,7 +57,8 @@ def test_extract_source(filename):
 def test_extract_frame_class(filename):
     """Test frame class extraction from filename"""
     file_path = Path(filename)
-    frame_class = FilenameExtractor.extract_frame_class(file_path)
+    extractor = FilenameExtractor(file_path)
+    frame_class = extractor.extract_frame_class()
     # Print filename and extracted frame class clearly
     print(f"\nFilename: \033[1;36m{filename}\033[0m")
     print(f"Extracted frame_class: \033[1;32m{frame_class}\033[0m")
diff --git a/renamer/test/test_mediainfo_extractor.py b/renamer/test/test_mediainfo_extractor.py
index d69c43a..57aa6e3 100644
--- a/renamer/test/test_mediainfo_extractor.py
+++ b/renamer/test/test_mediainfo_extractor.py
@@ -5,8 +5,8 @@ from renamer.extractors.mediainfo_extractor import MediaInfoExtractor
 
 class TestMediaInfoExtractor:
     @pytest.fixture
-    def extractor(self):
-        return MediaInfoExtractor()
+    def extractor(self, test_file):
+        return MediaInfoExtractor(test_file)
 
     @pytest.fixture
     def test_file(self):
@@ -15,18 +15,18 @@ class TestMediaInfoExtractor:
 
     def test_extract_resolution(self, extractor, test_file):
         """Test extracting resolution from media info"""
-        resolution = extractor.extract_resolution(test_file)
+        resolution = extractor.extract_resolution()
         # Text files don't have video resolution
         assert resolution is None
 
     def test_extract_hdr(self, extractor, test_file):
         """Test extracting HDR info"""
-        hdr = extractor.extract_hdr(test_file)
+        hdr = extractor.extract_hdr()
         # Text files don't have HDR
         assert hdr is None
 
     def test_extract_audio_langs(self, extractor, test_file):
         """Test extracting audio languages"""
-        langs = extractor.extract_audio_langs(test_file)
+        langs = extractor.extract_audio_langs()
         # Text files don't have audio tracks
         assert langs == ''
\ No newline at end of file
diff --git a/renamer/test/test_metadata_extractor.py b/renamer/test/test_metadata_extractor.py
index f304a63..1c422a3 100644
--- a/renamer/test/test_metadata_extractor.py
+++ b/renamer/test/test_metadata_extractor.py
@@ -4,35 +4,29 @@ from renamer.extractors.metadata_extractor import MetadataExtractor
 
 
 class TestMetadataExtractor:
+    @pytest.fixture
+    def extractor(self, test_file):
+        return MetadataExtractor(test_file)
+
     @pytest.fixture
     def test_file(self):
         """Use the filenames.txt file for testing"""
         return Path(__file__).parent / "filenames.txt"
 
-    def test_extract_title(self, test_file):
+    def test_extract_title(self, extractor):
         """Test extracting title from metadata"""
-        title = MetadataExtractor.extract_title(test_file)
+        title = extractor.extract_title()
         # Text files don't have metadata, so should be None
         assert title is None
 
-    def test_extract_duration(self, test_file):
+    def test_extract_duration(self, extractor):
         """Test extracting duration from metadata"""
-        duration = MetadataExtractor.extract_duration(test_file)
+        duration = extractor.extract_duration()
         # Text files don't have duration
         assert duration is None
 
-    def test_extract_artist(self, test_file):
+    def test_extract_artist(self, extractor):
         """Test extracting artist from metadata"""
-        artist = MetadataExtractor.extract_artist(test_file)
+        artist = extractor.extract_artist()
         # Text files don't have artist
-        assert artist is None
-
-    def test_extract_all_metadata(self, test_file):
-        """Test extracting all metadata"""
-        metadata = MetadataExtractor.extract_all_metadata(test_file)
-        expected = {
-            'title': None,
-            'duration': None,
-            'artist': None
-        }
-        assert metadata == expected
\ No newline at end of file
+        assert artist is None
\ No newline at end of file
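Review note on `extract_year`: the new cascade (parenthesised year, then dot-delimited year, then any standalone 4-digit number) hard-codes `current_year = 2025`. A possible variant, sketched below as a standalone function, reads the year from the system clock instead. Two things here are assumptions and differ from the diff: the free function `extract_year_from_name` and its `max_future_years` parameter are hypothetical names, and the last step checks every 4-digit candidate instead of only the first match.

```python
from __future__ import annotations

import re
from datetime import date


def extract_year_from_name(file_name: str, max_future_years: int = 10) -> str | None:
    """Return a plausible 4-digit year found in file_name, or None."""
    # 1. Year in parentheses, e.g. "The Killing.(1956).[SD,ukr,eng].mkv"
    paren_match = re.search(r"\((\d{4})\)", file_name)
    if paren_match:
        return paren_match.group(1)

    # 2. Year delimited by dots, e.g. "Title.1971.BDRip.mkv"
    dot_match = re.search(r"\.(\d{4})\.", file_name)
    if dot_match:
        return dot_match.group(1)

    # 3. Any standalone 4-digit number inside a plausible range.
    #    The upper bound follows the system clock, not a constant.
    latest = date.today().year + max_future_years
    for candidate in re.findall(r"\b(\d{4})\b", file_name):
        if 1900 <= int(candidate) <= latest:
            return candidate

    return None
```

Checking all candidates means a name such as `Clip 1080 1998.avi` still yields a year, because `1080` is rejected by the range check and the scan continues to `1998`. Whether that behaviour is wanted is a design decision for the author.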