Ran across this broken link here. Summit shows links differently to the webui, and probably also other apps. I think matching webui behavior is best here.
There seems to be a more complex regex at play, including checking for matching brackets etc. Probably best to dig that out of the source code.
Examples:
Summit:
https://en.m.wikipedia.org/wiki/Orion_(spacecraft)
Raw:
https://en.m.wikipedia.org/wiki/Orion_(spacecraft)
Webui:
https://en.m.wikipedia.org/wiki/Orion_(spacecraft)
Summit:
https://example.org/aa(aa
Raw:
https://example.org/aa(aa
Webui:
https://example.org/aa(aa
Summit:
https://example.org/aa(aa)
Raw:
https://example.org/aa(aa)
Webui:
https://example.org/aa(aa)
Interestingly, while writing this I noticed that the rules for Android long-press text selection also match the webui behavior.
Lemmy-ui seems to be using markdown-it, so the spec might be this re.tpl_link_fuzzy: https://github.com/markdown-it/linkify-it/blob/master/lib/re.mjs
That looks like it includes the bracket logic.
Urg. I hate dealing with hyperlinks. The reason the ) isn’t included is that I didn’t want to match the paren when the URL itself is enclosed in parens, e.g. (https://google.com/). But clearly there are edge cases where it needs to be included.
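One way to reconcile the two cases above is to match greedily and then drop a trailing ) only when it has no matching ( inside the URL. The sketch below is a minimal illustration of that idea, not Summit's or linkify-it's actual code; the function name trimUnbalancedParens is made up for this example, and it assumes the candidate string was already cut out of the surrounding text by a greedy matcher.

```javascript
// Hypothetical helper (name and approach are assumptions for illustration).
// Input: a greedily matched URL candidate that may end in ")".
// Output: the candidate with trailing ")" removed while the string
// contains more ")" than "(".
function trimUnbalancedParens(url) {
  let open = 0;
  let close = 0;
  for (const ch of url) {
    if (ch === "(") open++;
    else if (ch === ")") close++;
  }
  let end = url.length;
  // Only strip from the end, and only while the parens are unbalanced.
  while (end > 0 && url[end - 1] === ")" && close > open) {
    end--;
    close--;
  }
  return url.slice(0, end);
}

// "(https://google.com/)" -> the matcher starts at "https", so the candidate
// is "https://google.com/)" and the stray ")" is dropped.
console.log(trimUnbalancedParens("https://google.com/)"));
// The Wikipedia link keeps its closing paren because "(" and ")" balance.
console.log(trimUnbalancedParens("https://en.m.wikipedia.org/wiki/Orion_(spacecraft)"));
```

This counts parens over the whole candidate, which is cruder than what linkify-it does, so results should still be compared against the webui.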
I got linkify-it to run with nodejs with some minor modifications and this is the output of console.log(re.tpl_link_fuzzy): https://files.catbox.moe/8y1bfx.regex (tpl_link_fuzzy.regex, 18.47kiB)
Just paste 19kB of raw regex into your code, no one has ever regretted pasting 19kB of regex into their code.
This doesn’t convert cleanly to java/kotlin. At least one of the groups is messed up and I am not going to go through 19,000 characters to find each one. I found a library that looks promising and I’ll try that instead.
Understandable. I have an even worse idea then: https://files.catbox.moe/71dzf7.base64 (tpl_link_fuzzy.regex.base64, 24.63kiB)
Take this base64, and decode it in kotlin into a string variable. And then maybe make kotlin give it to you in a form you can paste back into the code idk
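For reference, the base64 round trip on the nodejs side could look roughly like the sketch below. The helper names and the small stand-in pattern are assumptions made for this example (the real tpl_link_fuzzy output is far larger). Note that base64 only protects the pattern text from string-literal escaping problems; it does not translate JavaScript regex syntax for another engine, so a group that is messed up in Java/Kotlin would probably still be messed up after decoding.

```javascript
// Hypothetical helpers; Buffer is the standard Node.js global.

// Turn a RegExp's source text into a base64 string.
function encodeRegexSource(re) {
  return Buffer.from(re.source, "utf8").toString("base64");
}

// Decode the base64 string back into the pattern text and compile it.
function decodeRegex(base64, flags) {
  const source = Buffer.from(base64, "base64").toString("utf8");
  return new RegExp(source, flags);
}

// Small stand-in pattern, NOT the real tpl_link_fuzzy regex.
const standIn = /https?:\/\/[^\s()]+(?:\([^\s()]*\))?/;
const encoded = encodeRegexSource(standIn);
const rebuilt = decodeRegex(encoded, standIn.flags);

// The decoded pattern text is identical to the original source.
console.log(rebuilt.source === standIn.source);
```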
I will consider this if all else fails, but I also don’t have high hopes this regex would even work.
What markdown-it does is match parentheses across the path only. It makes sense to parse URLs component by component; for example, the protocol and domain can’t contain those characters anyway.
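A rough sketch of that component-by-component idea: match the protocol and domain with a pattern that has no parens in it, then walk the rest of the URL one character at a time while tracking paren depth. Everything here is an assumption for illustration (the function name extractUrl, the simplified host pattern, stopping at whitespace); it is not how linkify-it is actually implemented, and it ignores other details such as trailing punctuation.

```javascript
// Hypothetical sketch. The host pattern is deliberately simplified and the
// sticky flag ("y") makes it match only at the given start index.
const SCHEME_AND_HOST = /https?:\/\/[A-Za-z0-9.-]+(?::\d+)?/y;

// Returns the URL that starts at text[start], or null if none starts there.
function extractUrl(text, start) {
  SCHEME_AND_HOST.lastIndex = start;
  const m = SCHEME_AND_HOST.exec(text);
  if (m === null) return null;

  let i = start + m[0].length; // first character after protocol + domain
  let depth = 0;               // parens opened inside the path so far
  while (i < text.length) {
    const ch = text[i];
    if (/\s/.test(ch)) break;  // whitespace ends the URL
    if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      if (depth === 0) break;  // this ")" closes a paren opened before the URL
      depth--;
    }
    i++;
  }
  return text.slice(start, i);
}

// The URL starts at index 5, right after "see (".
console.log(extractUrl("see (https://google.com/) now", 5));
console.log(extractUrl("https://en.m.wikipedia.org/wiki/Orion_(spacecraft)", 0));
```

An unmatched ( in the path is simply kept by this sketch; whether that matches the webui for a case like https://example.org/aa(aa would need checking.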
Interesting, there is some backend logic that prevents links within parens from ending in an alphanumeric character. For instance, if I send the comment (https://google.com/), Lemmy auto changes it to (https://google.com/). I wonder if this is done to make it easier to parse links.
test: (https://google.com/)