Clickable links in Chat are broken for many valid URL formats
The regex used in ChatClickData doesn't match valid URLs that use an IPv4 address, a port, a hash (#), or a comma in the query string. This makes a large variety of valid URLs un-clickable in the Chat GUI.
ChatClickData
Pattern pattern = Pattern.compile("^(?:(https?)://)?([-\\w_\\.]{2,}\\.[a-z]{2,4})(/\\S*)?$");
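To see the failures, the current pattern can be run against a few sample URLs. This is a minimal standalone sketch, not the actual ChatClickData code: only the regex is taken from the source, while the class name, the isClickable helper, and the sample URLs are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical harness; only the regex string comes from ChatClickData.
class CurrentChatLinkCheck {
    static final Pattern CURRENT = Pattern.compile(
            "^(?:(https?)://)?([-\\w_\\.]{2,}\\.[a-z]{2,4})(/\\S*)?$");

    // True when the whole text is accepted as a link by the current pattern.
    static boolean isClickable(String text) {
        return CURRENT.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://example.com/index.html", // plain domain name
            "http://192.168.0.1/",           // IPv4 host
            "http://example.com:8080/",      // explicit port
            "http://example.com#top"         // hash directly after the host
        };
        for (String url : samples) {
            System.out.println(url + " -> " + isClickable(url));
        }
    }
}
```

Of these samples, only the plain domain URL is accepted. The IPv4, port, and hash samples are all rejected.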
As a fix, the proposed regex accepts either a domain name or an IPv4 address, optionally allows a port, and accepts a hash or commas in the query string:
ChatClickData
Pattern pattern = Pattern.compile("^(https?:\\/\\/)?"+ // protocol
    "((([a-z\\d]([a-z\\d-]*[a-z\\d])*)\\.)+[a-z]{2,}|"+ // domain name
    "((\\d{1,3}\\.){3}\\d{1,3}))"+ // OR ip (v4) address
    "(\\:\\d+)?(\\/[-a-z\\d%_.~+]*)*"+ // port and path
    "(\\?[;&a-z\\d%_.~+=-]*)?"+ // query string
    "(\\#[-a-z\\d_]*)?$"); // hash
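The proposed pattern can be checked the same way. This is a sketch under the same assumptions: the regex is copied from the proposal above, while the class name, the isClickable helper, and the sample URLs are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical harness; only the regex string comes from the proposal.
class ProposedChatLinkCheck {
    static final Pattern PROPOSED = Pattern.compile(
            "^(https?:\\/\\/)?" +                                // protocol
            "((([a-z\\d]([a-z\\d-]*[a-z\\d])*)\\.)+[a-z]{2,}|" + // domain name
            "((\\d{1,3}\\.){3}\\d{1,3}))" +                      // OR IPv4 address
            "(\\:\\d+)?(\\/[-a-z\\d%_.~+]*)*" +                  // port and path
            "(\\?[;&a-z\\d%_.~+=-]*)?" +                         // query string
            "(\\#[-a-z\\d_]*)?$");                               // hash

    // True when the whole text is accepted as a link by the proposed pattern.
    static boolean isClickable(String text) {
        return PROPOSED.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://example.com/index.html",    // plain domain name
            "http://192.168.0.1/",              // IPv4 host
            "http://192.168.0.1:25565/map",     // IPv4 host with port and path
            "http://example.com:8080/",         // domain with port
            "http://example.com/page?id=1#top", // query string and hash
            "http://example.com/?q=a,b"         // comma in the query string
        };
        for (String url : samples) {
            System.out.println(url + " -> " + isClickable(url));
        }
    }
}
```

The first five samples are accepted. The last one is still rejected, because the query-string character class as quoted here contains no comma. Adding `,` to that class would be one way to cover that case.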
Here's a JSFiddle showing the current and proposed regexes in action:
http://jsfiddle.net/mwoodman/UycV9/
Note that the current regex fails to match more than half of the URLs tested (2766 out of 4704). The proposed regex matches all of the URLs tested.
2013-05-02, 11:35 PM
2016-11-07, 02:14 AM
2016-11-06, 10:16 PM