Hi everyone!

I need some help with string manipulation using sed and regex. I spent a whole day on trial and error and searching the web for a solution, but it’s way over my capabilities. Maybe there are some sed/regex gurus here who are willing to give me a helping hand!

From everything I gathered around the web, it seems to be a rather complicated regex and sed substitution, so here we go!

What am I trying to achieve?

I have a lot of Markdown guides that I want to host on a self-hosted, Forgejo-based git server. However, the classic Markdown header links don’t have the same form as the ones on GitHub/Forgejo…

Convert the following string:

[Some text](#Header%20Linking%20MARKDOWN.md)

Into

[Some text](#header-linking-markdown.md)

As you can see, these are the requirements:

  • Pattern: [Some text](#link%20to%20header.md)
  • Only edit what’s between parentheses
  • Replace space (%20) with -
  • Everything as lowercase
  • Links are sometimes in nested parentheses
    • e.g. (look here [Some text](#link%20to%20header.md))
  • Do not change a line that begins with https (external links)

While the whole thing is probably a bit complex, the trickiest part is probably the nested parentheses :/

What I tried

The furthest I got was the following:

sed -Ei 's|\(([^\)]+)\)|\L&|g' test3.md #make everything between parentheses lowercase

sed -i '/https/ ! s/%20/-/g' test3.md #change every %20 occurrence to -

These sed/regex substitutions are what I put together while roaming the web, but they have a lot of flaws and don’t work with nested parentheses. Also, this would change every %20 occurrence in the file.
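
One way to keep the changes inside the link target is sketched below. It rests on a few assumptions: GNU sed is available (`\L` is a GNU extension), header links always start with `](#`, and the target itself contains no `)`. The function name `convert_anchor_links` is made up for illustration. The loop replaces one `%20` at a time inside a `](#...)` target, and the last substitution lowercases that target only, so any other `%20` in the file and external links are left alone.

```shell
#!/bin/sh
# Sketch only, assuming GNU sed. convert_anchor_links is a hypothetical name.
#   :a ... ta   repeat: replace one %20 inside a "](#...)" target per pass
#   last s      lowercase the whole "](#...)" target (\L is GNU-specific)
convert_anchor_links() {
  sed -E \
    -e ':a' \
    -e 's/(\]\(#[^)]*)%20/\1-/' \
    -e 'ta' \
    -e 's/\]\(#[^)]*\)/\L&/g' \
    "$@"
}

# Demo input: a plain link, a link nested in parentheses, an external link.
printf '%s\n' \
  '[Some text](#Header%20Linking%20MARKDOWN.md)' \
  '(look here [Some text](#Link%20To%20Header.md))' \
  '[External](https://example.com/My%20Page)' |
  convert_anchor_links
```

Because `[^)]*` stops at the first closing parenthesis, a link nested inside outer parentheses is rewritten without touching the outer pair. Passing a file name as an argument makes sed read that file instead of standard input.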

The closest solution I found on Stack Overflow looks similar, but I wasn’t able to make it fit my needs. My lack of regex/sed understanding makes it impossible for me to adapt it to my requirements.


I would appreciate any help, even if a change of tool is needed. That said, I’m mostly here for the learning process, so a script or CLI alternative would be very welcome :) Really, any help is appreciated :D

Thanks in advance.

  • N0x0n@lemmy.ml (OP)
    4 hours ago

    Hello :) Sorry for the late response! I was busy working it out with another user. Out of curiosity I gave your sed regex a try, but there seems to be a missing ( somewhere. I tried to fix the issue, but your regex is way over my capabilities! If you are a sed/regex fanatic and want to give it another try, feel free :). For now I have found a solution with another user that works great. Here’s the script in question, if you are interested:

    #!/bin/bash
    
    files="/home/USER/projects/test.md"
    
    # Grab every link target that does not start with https,
    # then keep only the part from the first # to the end of the match
    mdlinks="$(grep -Po ']\((?!https).*\)' "$files")"
    mdlinks2="$(grep -Po '#.*' <<<"$mdlinks")"
    
    # Note: each link is reused below as a sed pattern, so a link containing
    # a / or other regex metacharacters may need escaping first
    while IFS= read -r line; do
    	# Convert 1.2 to 1-2 (a third-level heading such as 1.2.3 needs an additional [0-9] group)
    	dashlink="$(echo "$line" | sed -r 's|(.+[0-9]+)\.([0-9]+.+\))|\1-\2|')"
    	sed -i "s/$line/${dashlink}/" "$files"
    
    	# Lowercase everything after the hash sign
    	lowercaselink="$(echo "$dashlink" | sed -r 's|#.+\)|\L&|')"
    	sed -i "s/$dashlink/${lowercaselink}/" "$files"
    
    	# Replace encoded spaces (%20) after the hash sign with -
    	spacelink="$(echo "$lowercaselink" | sed 's|%20|-|g')"
    	sed -i "s/$lowercaselink/${spacelink}/" "$files"
    
    done <<<"$mdlinks2"
    

    It’s not very elegant, but it does the job… While working on it with another very friendly user, I came across other things I hadn’t thought of, like:

    • Converting 1.2 to 1-2 (e.g. [Just a placeholder](#1.2%20Just%20a%20link%20to%20header))
    • Linking to another markdown file (e.g. [Just a placeholder](Another%20File.md#1.2%20Just%20a%20link%20to%20header))
    • The link to the file before the # needs to keep its original form (e.g. [Just a placeholder](Another%20File.md#1-2-just-a-link-to-header))
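
The three extra cases above could also be covered by a single GNU sed call. This is a rough sketch, not the script from this thread. It assumes GNU sed, link targets that contain no `)`, and that it is acceptable to skip every line containing an `](http` link (the same coarse filter as `/https/ !` in the first attempt). `convert_md_links` is a hypothetical name.

```shell
#!/bin/sh
# Sketch only, assuming GNU sed. convert_md_links is a hypothetical name.
#   address    skip lines that contain an external "](http" link
#   :sp loop   replace %20 with - , but only after the # of a link target
#   :dot loop  turn digit.digit into digit-digit (1.2 -> 1-2) after the #
#   last s     lowercase from the # onward, keep the file part (\1) as it is
convert_md_links() {
  sed -E \
    -e '/\]\(https?:/!{' \
    -e ':sp' \
    -e 's/(\]\([^)#]*#[^)]*)%20/\1-/' \
    -e 'tsp' \
    -e ':dot' \
    -e 's/(\]\([^)#]*#[^)]*[0-9])\.([0-9])/\1-\2/' \
    -e 'tdot' \
    -e 's/(\]\([^)#]*)(#[^)]*\))/\1\L\2/g' \
    -e '}' \
    "$@"
}

# Demo input: a link into another file, and a link within the same file.
printf '%s\n' \
  '[Just a placeholder](Another%20File.md#1.2%20Just%20a%20link%20to%20header)' \
  '[Just a placeholder](#1.2%20Just%20a%20link%20to%20header)' |
  convert_md_links
```

The file part before the # is captured separately and written back unchanged, so `Another%20File.md` keeps its original form while only the header part is rewritten.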

    Well, I think bare-bones sed/regex wasn’t the right tool on its own, but wrapped in a bash script it does exactly what I expect :)

    Thanks for your help and pointers!