# Nov 03, 2023, 02:00 PM
# WHAT IS THIS? - This script converts Markdown files to HTML and inserts the
# meta title and body content of each file into a pre-existing HTML template.
# Output files are saved as .html and are named after the first H1 heading.
# It is useful for programmatically expanding content on HTML sites, or for
# edge-case mass-content generation and formatting.

# INSTRUCTIONS
# 1. Make sure Python is installed and on your PATH, then install the
#    third-party packages with: pip install mistune tqdm
#    (pathlib is part of the Python 3 standard library and needs no install).
# 2. Set input_dir to the directory containing your Markdown .txt files,
#    output_dir to the location for the HTML files, and error_log to the
#    path where error logs should be saved.
# 3. Check that the input directory and the folder for the error log exist.
#    The script creates output_dir itself if it is missing.
# 4. Customize the html_template string, inside the triple quotes, to match
#    the HTML structure and styles you want for your output files.
# 5. Save the script as a .py file and run it with Python from the command
#    line. It processes each Markdown file, converts it to HTML, and saves it
#    in the output directory, with any errors logged as specified. A progress
#    bar is shown, which helps when tracking very large batches.

# WARNINGS
# - html_template is filled in with str.format, which treats single curly
#   brackets as placeholders. Any literal curly brackets in the template
#   (inline CSS or JavaScript, for example) must be doubled on both sides,
#   {{ and }}, or Python raises an error. Curly brackets in the converted
#   Markdown content are inserted as-is and need no doubling.
# - Windows paths written as ordinary string literals need doubled
#   backslashes. Raw strings such as r'C:\Users' or forward slashes also work.
# - Markdown tables work, but there can be spacing oddities.

import os
import re
import mistune
from pathlib import Path
from tqdm import tqdm

# Input and output directories (backslashes must be doubled in these string literals)
input_dir = 'C:\\Users\\EXAMPLE\\Desktop\\FOLDER\\MARKDOWN1'
output_dir = 'C:\\Users\\EXAMPLE\\Desktop\\FOLDER\\HTML2'
error_log = 'C:\\Users\\EXAMPLE\\Desktop\\FOLDER\\error-log.txt'

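# Sketch (not part of the original script): the doubled backslashes above are
# one of several equivalent spellings of a Windows path. The helper name
# demo_windows_path_spellings is hypothetical, and nothing in the script calls
# it; it only illustrates that the three spellings refer to the same path.
def demo_windows_path_spellings():
    from pathlib import PureWindowsPath  # pure path class, usable on any OS

    escaped = 'C:\\Users\\EXAMPLE\\Desktop'  # doubled backslashes
    raw = r'C:\Users\EXAMPLE\Desktop'        # raw string, single backslashes
    forward = 'C:/Users/EXAMPLE/Desktop'     # forward slashes

    same_text = escaped == raw
    same_path = PureWindowsPath(forward) == PureWindowsPath(escaped)
    return same_text, same_path
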
# Your HTML template designed to accommodate title and content appropriately
html_template = """
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>{title_placeholder}</title>
    <!-- other head elements -->
</head>
<body>
    {content_placeholder}
</body>
</html>
"""

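# Sketch (not part of the original script): how str.format treats braces in a
# template like html_template. Doubled braces come out as single literal
# braces, while braces inside the substituted content are left untouched.
# The function name and the tiny template are hypothetical, for illustration
# only; nothing in the script calls this function.
def demo_brace_escaping():
    template = (
        '<style>p {{ margin: 0; }}</style>'
        '<h1>{title_placeholder}</h1>{content_placeholder}'
    )
    return template.format(
        title_placeholder='Hello',
        content_placeholder='<code>{x}</code>',  # content braces are not re-parsed
    )
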
# Function to convert Markdown to HTML using Mistune with table support
def markdown_to_html(md):
    # Create a markdown instance and enable the table plugin
    markdown = mistune.create_markdown(plugins=['table'])
    return markdown(md)

# Function to extract the title (from the first H1 heading in the Markdown)
def extract_title(md_content):
    lines = md_content.split('\n')
    title = None
    for index, line in enumerate(lines):
        if line.startswith('# '):
            # Drop only the leading '# ' marker; lstrip('# ') would also strip
            # '#' characters that belong to the title itself
            title = line[2:].strip()
            # Keep the content from the H1 line onward, discarding anything before it
            md_content = '\n'.join(lines[index:])
            break
    return title, md_content

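# Sketch (not part of the original script): the filename formatting performed
# in the processing loop below, pulled out as a standalone helper so it can be
# tried on sample titles. The name slugify_title and the 'untitled' fallback
# for an empty result are assumptions added for illustration.
def slugify_title(title):
    import re  # imported locally so the sketch is self-contained

    slug = re.sub(r'\s+', '-', title.lower())  # whitespace runs become hyphens
    slug = re.sub(r'[^\w-]', '', slug)         # keep only word characters and hyphens
    return slug or 'untitled'
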
# Ensure output directory exists
Path(output_dir).mkdir(parents=True, exist_ok=True)

# Process each markdown file in the input directory
for md_filename in tqdm(os.listdir(input_dir)):
    if md_filename.endswith('.txt'):
        try:
            with open(os.path.join(input_dir, md_filename), 'r', encoding='utf-8') as md_file:
                markdown_content = md_file.read()

            # Extract the title from the Markdown content
            title, markdown_content = extract_title(markdown_content)
            if not title:
                # No H1 found: use a default title. Files without an H1 all
                # share it, so later ones overwrite untitled.html
                title = 'Untitled'

            # Convert Markdown content to HTML
            html_content = markdown_to_html(markdown_content)

            # Replace placeholders with actual content
            final_html = html_template.format(
                title_placeholder=title,
                content_placeholder=html_content
            )

            # Format the filename based on the title
            filename_title = re.sub(r'\s+', '-', title.lower())
            filename_title = re.sub(r'[^\w-]', '', filename_title)

            # Output the final HTML to a file
            output_file_path = os.path.join(output_dir, f'{filename_title}.html')
            with open(output_file_path, 'w', encoding='utf-8') as html_file:
                html_file.write(final_html)
        except Exception as exc:
            # Record the failure in the error log and continue with the next file
            with open(error_log, 'a', encoding='utf-8') as log_file:
                log_file.write(f'{md_filename}: {exc}\n')

# Print completion message
print("All files processed. Check the error log for errors.")