at.ncvp.me

First Astro project with 'pages' and 'posts' collections

.doc(x) to .md file conversion

This is the source page doc2md.docx which the conversion process is supposed to convert into doc2md.md.

This page works through the various requirements, one at a time. Various Astro techniques may be employed, including both .md and .mdx. To facilitate this, doc2md-md.md and doc2md-mdx.mdx are developed in parallel. Where possible, MD is preferable to MDX, because MDX throws syntax errors on all sorts of innocuous-looking strings.

We need to be able to convert .doc and .docx files from my documentation store to .md(x) for Astro content creation.
We need a utility that will do the conversion automatically file by file. I don’t think we’ll ever need a block converter.
Various schemes have been tried:

See also

This is a two-column list of external links: links to other documentation files in my store and to web URLs.
Markdown can’t cope with columns, but simple HTML works in Astro MD.
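As a rough sketch of the "simple HTML" approach, the Python below renders a list of links as two side-by-side HTML lists that could be pasted into an .md file. The function name, the `two-col` class and the example.com URLs are all assumptions made for illustration; the real class names and CSS in astro-test.css are not shown here.

```python
from html import escape


def two_column_links(links, css_class="two-col"):
    """Render (label, url) pairs as two HTML lists inside one <div>.

    `css_class` is a hypothetical class name; the CSS that places the
    two lists side by side is assumed to live in the site stylesheet.
    """
    half = (len(links) + 1) // 2  # the left column takes the extra item
    parts = [f'<div class="{css_class}">']
    for column in (links[:half], links[half:]):
        parts.append("<ul>")
        for label, url in column:
            href = escape(url, quote=True)
            parts.append(f'<li><a href="{href}">{escape(label)}</a></li>')
        parts.append("</ul>")
    parts.append("</div>")
    # No blank lines inside the block, so Markdown treats it as one HTML block.
    return "\n".join(parts)


# Placeholder URLs: the real link targets are not given in this page.
example = [
    ("astro-blog home page", "https://example.com/astro-blog"),
    ("astro-test blog", "https://example.com/astro-test"),
    ("ncvp website", "https://example.com/ncvp"),
]
print(two_column_links(example))
```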

astro-blog home page

astro-test blog

ncvp website

projects/doc/doc2md

Contents

Images

Method
Paragraphs

Tables

What has to work

What has to work

1. My sort of paragraph. See Paragraphs
2. Internal anchors and links
3. Tables. See Tables
4. Images. See Images
5. Columns

Top

Tables

Markdown tables have to have a header by default:

| Feature       | Status   | Notes                 |
| ------------- | -------- | --------------------- |
| Astro Layouts | Working  | Using @layouts alias  |
| Styles        | Scoped   | Testing specificity   |
| Indentation   | 2 Spaces | Configured in VS Code |

But wrapping a table in this sort of <div>, in conjunction with CSS in astro-test.css, removes the header:

|    |    |
| -- | -- |
| 1  | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc vel massa tincidunt, aliquam elit id, venenatis tellus. |
| 2  | Nunc molestie mauris et magna placerat tempus. |
| 3  | Phasellus sodales dolor enim, vel eleifend ante facilisis semper. |
| 4  | Integer vel dictum orci. |
| 5  | Praesent cursus ligula vel nisi rutrum, sit amet mollis tortor euismod. |
| 42 | Duis sollicitudin elit sit amet quam dictum congue. |
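A minimal sketch of how such a headerless table might be generated: a pipe table with an empty header row, wrapped in a <div>. The function name and the `no-header` class are hypothetical, and the sketch assumes the stylesheet hides the table head for that class (for example with a rule along the lines of `.no-header thead { display: none; }`); the actual rule in astro-test.css is not shown on this page.

```python
def headerless_table(rows, css_class="no-header"):
    """Build a pipe table with an empty header row, wrapped in a <div>.

    `css_class` is a hypothetical name; CSS is assumed to hide the
    (empty) header row for tables inside this wrapper.
    """
    width = len(rows[0])
    lines = [
        f'<div class="{css_class}">',
        "",  # blank line so the table is parsed as Markdown, not raw HTML
        "|" + "   |" * width,     # empty header cells
        "|" + " --- |" * width,   # delimiter row required by pipe tables
    ]
    for row in rows:
        # Escape literal pipes so they do not split a cell.
        cells = [str(cell).replace("|", "\\|") for cell in row]
        lines.append("| " + " | ".join(cells) + " |")
    lines += ["", "</div>"]
    return "\n".join(lines)


print(headerless_table([
    (1, "Lorem ipsum dolor sit amet, consectetur adipiscing elit."),
    (2, "Nunc molestie mauris et magna placerat tempus."),
]))
```

The blank lines after the opening tag and before the closing tag matter in CommonMark-style parsers: without them the table text would be treated as part of the HTML block and passed through unparsed.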

Top

Images

IMAGE-PLACEHOLDER

Laus Veneris, by Edward Burne-Jones

See astro-test Textflow for more images with Fancybox effect.

Top

Paragraphs

This is my sort of paragraph which is proving so difficult to convert from .doc(x):
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Nunc vel massa tincidunt, aliquam elit id, venenatis tellus.
Nunc molestie mauris et magna placerat tempus.
Phasellus sodales dolor enim, vel eleifend ante facilisis semper.
Integer vel dictum orci.
Praesent cursus ligula vel nisi rutrum, sit amet mollis tortor euismod.
Duis sollicitudin elit sit amet quam dictum congue.
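One way to express this sort of paragraph in Markdown is a single paragraph whose lines end in hard line breaks. The sketch below assumes a CommonMark-style target, where a backslash at the end of a line is a hard break; the function name is hypothetical and this is not necessarily what the conversion scripts do.

```python
def join_with_hard_breaks(lines):
    """Join text lines into one Markdown paragraph with hard line breaks.

    Blank lines are dropped, because a blank line would start a new
    paragraph instead of a new line within the same paragraph.
    """
    cleaned = [line.strip() for line in lines if line.strip()]
    # A trailing backslash before the newline is a CommonMark hard break.
    return "\\\n".join(cleaned)


paragraph = join_with_hard_breaks([
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
    "Nunc vel massa tincidunt, aliquam elit id, venenatis tellus.",
    "Nunc molestie mauris et magna placerat tempus.",
])
print(paragraph)
```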

Top

Method

  1. % cd file-folder # the folder holding the file to be converted

  2. % doc2md file-name.doc(x)

This creates file-folder/file-name.md.
Intermediate files are created in file-folder/doc2md_intermediate. This folder may be deleted once the process is complete.
Finally, copy the MD file to the appropriate Astro folder.

~/winxp/projects/linux/bin/doc2md

Bash script controlling the whole process:

  1. Convert .doc file into intermediate .docx with LibreOffice
    OR copy .docx to intermediate folder.

  2. Convert the paragraph breaks (<Enter>) used as newlines within paragraphs to line breaks (<Shift/Enter>) with fix_word_01.py, and save the document title in a file for later.

  3. Bracket 2-column sections with special marker strings, for conversion to HTML later, with fix_word_02.py

  4. Convert to MD with pandoc

  5. Fix pandoc errors, convert section brackets to HTML, pick up document title and write frontmatter with fix_pandoc.py

  6. Move .md file back to file-folder
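The external-tool steps above (1, 4 and 6) can be sketched as a list of commands. This is an illustration in Python rather than the Bash script itself: the function names are hypothetical, the `gfm` output format is an assumption (the page only says "MD"), and the fix_word_01.py, fix_word_02.py and fix_pandoc.py steps are left out because their command lines are not given here. The sketch only builds the commands; it does not run them.

```python
from pathlib import Path


def libreoffice_cmd(src, workdir):
    """Step 1: headless LibreOffice conversion of .doc to .docx."""
    return ["soffice", "--headless", "--convert-to", "docx",
            "--outdir", str(workdir), str(src)]


def pandoc_cmd(docx, md):
    """Step 4: pandoc conversion; the 'gfm' target is an assumption."""
    return ["pandoc", "--from", "docx", "--to", "gfm",
            "--output", str(md), str(docx)]


def plan(src):
    """Return the command list for one input file, without running it."""
    src = Path(src)
    workdir = src.parent / "doc2md_intermediate"
    docx = workdir / (src.stem + ".docx")
    md = workdir / (src.stem + ".md")
    steps = []
    if src.suffix.lower() == ".doc":
        steps.append(libreoffice_cmd(src, workdir))
    else:
        # A .docx input is simply copied into the intermediate folder.
        steps.append(["cp", str(src), str(docx)])
    # Steps 2, 3 and 5 (the fix_*.py scripts) would slot in around pandoc.
    steps.append(pandoc_cmd(docx, md))
    # Step 6: move the result back next to the source file.
    steps.append(["mv", str(md), str(src.parent / md.name)])
    return steps


for step in plan("file-folder/file-name.doc"):
    print(" ".join(step))
```

Each command could then be passed to `subprocess.run(step, check=True)` so that a failing step stops the run.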

Top