Ran across this broken link here. Summit shows links differently from the webui, and probably from other apps too. I think matching the webui behavior is best here.

There seems to be a more complex regex at play, including checks for matching brackets etc. It's probably best to dig that out of the source code.

Examples:

Summit: https://en.m.wikipedia.org/wiki/Orion_(spacecraft)
Raw: https://en.m.wikipedia.org/wiki/Orion_(spacecraft)
Webui: https://en.m.wikipedia.org/wiki/Orion_(spacecraft)

Summit: https://example.org/aa(aa
Raw: https://example.org/aa(aa
Webui: https://example.org/aa(aa

Summit: https://example.org/aa(aa)
Raw: https://example.org/aa(aa)
Webui: https://example.org/aa(aa)

Interestingly, while writing this, I noticed that the rules for Android long-press text selection also match the webui behavior.

  • idunnololz@lemmy.worldM
    3 days ago

    Urg. I hate dealing with hyperlinks. The reason the ) isn’t included is that I didn’t want to match the paren when the URL is itself enclosed in parens, e.g. (https://google.com/). But clearly there are edge cases where it needs to be included.
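One way to reconcile the two cases above is to drop a trailing `)` only when it has no matching `(` earlier in the matched text. The following is a minimal Python sketch of that idea, not Summit's actual code; the function name `trim_unbalanced_closing_parens` is hypothetical, and it assumes a greedy matcher has already produced a candidate string that may include the trailing paren.

```python
def trim_unbalanced_closing_parens(candidate: str) -> str:
    """Strip trailing ')' characters that have no matching '(' in the candidate."""
    # Keep trimming while the candidate ends in ')' and closers outnumber openers.
    while candidate.endswith(")") and candidate.count(")") > candidate.count("("):
        candidate = candidate[:-1]
    return candidate


# A paren that is part of the URL is kept; one that wraps the URL is dropped.
print(trim_unbalanced_closing_parens("https://en.m.wikipedia.org/wiki/Orion_(spacecraft)"))
print(trim_unbalanced_closing_parens("https://google.com/)"))
```

This only counts parentheses over the whole candidate, so it is a heuristic rather than a full parser.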

    • redjard@lemmy.dbzer0.comOP
      3 days ago

      I got linkify-it to run with Node.js with some minor modifications, and this is the output of console.log(re.tpl_link_fuzzy);: https://files.catbox.moe/8y1bfx.regex (tpl_link_fuzzy.regex, 18.47 KiB)

      Just paste 19 kB of raw regex into your code; no one has ever regretted pasting 19 kB of regex into their code.

      • idunnololz@lemmy.worldM
        3 days ago

        This doesn’t convert cleanly to Java/Kotlin. At least one of the groups is messed up, and I am not going to go through 19,000 characters to find each one. I found a library that looks promising and I’ll try that instead.

    • redjard@lemmy.dbzer0.comOP
      3 days ago

      What markdown-it does is match parentheses across the path only. It makes sense to parse URLs component by component; for example, the protocol and domain can’t contain those characters anyway.
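The component-by-component idea can be sketched roughly as follows. This is an illustrative Python approximation, not markdown-it's or linkify-it's actual algorithm: the name `find_link` and the simplified scheme/host pattern are assumptions, and everything after the host (path, query, fragment) is treated as one "path" part for brevity.

```python
import re
from typing import Optional

# Scheme and host cannot contain parentheses, so a simple pattern covers them.
# (Simplified on purpose: http/https only, ASCII host names, optional port.)
_PREFIX = re.compile(r"https?://[A-Za-z0-9.-]+(?::\d+)?")


def find_link(text: str) -> Optional[str]:
    """Return the first link in text, balancing parentheses in the path only."""
    match = _PREFIX.search(text)
    if match is None:
        return None
    depth = 0  # number of '(' opened inside the path and not yet closed
    i = match.end()
    while i < len(text) and not text[i].isspace():
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                break  # unmatched ')': it belongs to the surrounding text
            depth -= 1
        i += 1
    return text[match.start():i]


print(find_link("see (https://google.com/) for details"))
print(find_link("https://en.m.wikipedia.org/wiki/Orion_(spacecraft)"))
```

In this sketch an unmatched `(` inside the path is simply kept, and trailing punctuation such as `.` or `,` is not handled; a real implementation would need rules for both.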

    • idunnololz@lemmy.worldM
      3 days ago

      Interesting, there is some backend logic that prevents links within parens from ending in an alphanumeric character. For instance, if I send the comment (https://google.com/), Lemmy automatically changes it to (https://google.com/). I wonder if this is done to make it easier to parse links.

      test: (https://google.com/)