文
章
目
录
章
目
录
String string = "www.panziye.com";
byte[] bytes = string.getBytes();
Java示例,使用HTML实体来转义字符串中的字符。编码过程将Java字符串转换为等效的HTML内容,供浏览器打印。
1.使用StringEscapeUtils.escapeHtml4()
StringEscapeUtils类是Apache common-text库的一部分。它接受一个原始字符串作为参数,然后使用HTML实体来转义字符。它支持所有已知的HTML 4.0实体。
请注意,撇号转义字符(’)不是合法实体,因此不受支持。
1.1. Maven依赖
要使用StringEscapeUtils,请导入最新版本的commons-text依赖。
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-text</artifactId>
<version>1.4</version>
</dependency>
1.2. 例子
现在我们使用StringEscapeUtils.escapeHtml4()
方法如下:
String unEscapedString = "<java>public static void main(String[] args) { ... }</java>";
String escapedHTML = StringEscapeUtils.escapeHtml4(unEscapedString);
System.out.println(escapedHTML); //Browser can now parse this and print
程序输出:
<java>public static void main(String[] args) { ... }</java>
2.自定义StringUtils.encodeHtml()
如果我们有某些要求需要修改库方法提供的逻辑,我们可以编写自己的方法。大多数情况下应该避免这种方法,但当出现需求时它可能会很方便。
String unEscapedString = "<java>public static void main(String[] args) { ... }</java>";
String escapedHTML = StringUtils.encodeHtml(unEscapedString);
System.out.println(escapedHTML); //Browser can now parse this and print
StringUtils类如下。我们可以添加、修改或删除htmlEncodeChars映射中的条目来自定义编码函数的行为。
import java.util.HashMap;
public class StringUtils
{
private static final HashMap<Character, String> htmlEncodeChars = new HashMap<>();
static {
// Special characters for HTML
htmlEncodeChars.put('\u0026', "&");
htmlEncodeChars.put('\u003C', "<");
htmlEncodeChars.put('\u003E', ">");
htmlEncodeChars.put('\u0022', """);
htmlEncodeChars.put('\u0152', "Œ");
htmlEncodeChars.put('\u0153', "œ");
htmlEncodeChars.put('\u0160', "Š");
htmlEncodeChars.put('\u0161', "š");
htmlEncodeChars.put('\u0178', "Ÿ");
htmlEncodeChars.put('\u02C6', "ˆ");
htmlEncodeChars.put('\u02DC', "˜");
htmlEncodeChars.put('\u2002', " ");
htmlEncodeChars.put('\u2003', " ");
htmlEncodeChars.put('\u2009', " ");
htmlEncodeChars.put('\u200C', "‌");
htmlEncodeChars.put('\u200D', "‍");
htmlEncodeChars.put('\u200E', "‎");
htmlEncodeChars.put('\u200F', "‏");
htmlEncodeChars.put('\u2013', "–");
htmlEncodeChars.put('\u2014', "—");
htmlEncodeChars.put('\u2018', "‘");
htmlEncodeChars.put('\u2019', "’");
htmlEncodeChars.put('\u201A', "‚");
htmlEncodeChars.put('\u201C', "“");
htmlEncodeChars.put('\u201D', "”");
htmlEncodeChars.put('\u201E', "„");
htmlEncodeChars.put('\u2020', "†");
htmlEncodeChars.put('\u2021', "‡");
htmlEncodeChars.put('\u2030', "‰");
htmlEncodeChars.put('\u2039', "‹");
htmlEncodeChars.put('\u203A', "›");
htmlEncodeChars.put('\u20AC', "€");
// Character entity references for ISO 8859-1 characters
htmlEncodeChars.put('\u00A0', " ");
htmlEncodeChars.put('\u00A1', "¡");
htmlEncodeChars.put('\u00A2', "¢");
htmlEncodeChars.put('\u00A3', "£");
htmlEncodeChars.put('\u00A4', "¤");
htmlEncodeChars.put('\u00A5', "¥");
htmlEncodeChars.put('\u00A6', "¦");
htmlEncodeChars.put('\u00A7', "§");
htmlEncodeChars.put('\u00A8', "¨");
htmlEncodeChars.put('\u00A9', "©");
htmlEncodeChars.put('\u00AA', "ª");
htmlEncodeChars.put('\u00AB', "«");
htmlEncodeChars.put('\u00AC', "¬");
htmlEncodeChars.put('\u00AD', "­");
htmlEncodeChars.put('\u00AE', "®");
htmlEncodeChars.put('\u00AF', "¯");
htmlEncodeChars.put('\u00B0', "°");
htmlEncodeChars.put('\u00B1', "±");
htmlEncodeChars.put('\u00B2', "²");
htmlEncodeChars.put('\u00B3', "³");
htmlEncodeChars.put('\u00B4', "´");
htmlEncodeChars.put('\u00B5', "µ");
htmlEncodeChars.put('\u00B6', "¶");
htmlEncodeChars.put('\u00B7', "·");
htmlEncodeChars.put('\u00B8', "¸");
htmlEncodeChars.put('\u00B9', "¹");
htmlEncodeChars.put('\u00BA', "º");
htmlEncodeChars.put('\u00BB', "»");
htmlEncodeChars.put('\u00BC', "¼");
htmlEncodeChars.put('\u00BD', "½");
htmlEncodeChars.put('\u00BE', "¾");
htmlEncodeChars.put('\u00BF', "¿");
htmlEncodeChars.put('\u00C0', "À");
htmlEncodeChars.put('\u00C1', "Á");
htmlEncodeChars.put('\u00C2', "Â");
htmlEncodeChars.put('\u00C3', "Ã");
htmlEncodeChars.put('\u00C4', "Ä");
htmlEncodeChars.put('\u00C5', "Å");
htmlEncodeChars.put('\u00C6', "Æ");
htmlEncodeChars.put('\u00C7', "Ç");
htmlEncodeChars.put('\u00C8', "È");
htmlEncodeChars.put('\u00C9', "É");
htmlEncodeChars.put('\u00CA', "Ê");
htmlEncodeChars.put('\u00CB', "Ë");
htmlEncodeChars.put('\u00CC', "Ì");
htmlEncodeChars.put('\u00CD', "Í");
htmlEncodeChars.put('\u00CE', "Î");
htmlEncodeChars.put('\u00CF', "Ï");
htmlEncodeChars.put('\u00D0', "Ð");
htmlEncodeChars.put('\u00D1', "Ñ");
htmlEncodeChars.put('\u00D2', "Ò");
htmlEncodeChars.put('\u00D3', "Ó");
htmlEncodeChars.put('\u00D4', "Ô");
htmlEncodeChars.put('\u00D5', "Õ");
htmlEncodeChars.put('\u00D6', "Ö");
htmlEncodeChars.put('\u00D7', "×");
htmlEncodeChars.put('\u00D8', "Ø");
htmlEncodeChars.put('\u00D9', "Ù");
htmlEncodeChars.put('\u00DA', "Ú");
htmlEncodeChars.put('\u00DB', "Û");
htmlEncodeChars.put('\u00DC', "Ü");
htmlEncodeChars.put('\u00DD', "Ý");
htmlEncodeChars.put('\u00DE', "Þ");
htmlEncodeChars.put('\u00DF', "ß");
htmlEncodeChars.put('\u00E0', "à");
htmlEncodeChars.put('\u00E1', "á");
htmlEncodeChars.put('\u00E2', "â");
htmlEncodeChars.put('\u00E3', "ã");
htmlEncodeChars.put('\u00E4', "ä");
htmlEncodeChars.put('\u00E5', "å");
htmlEncodeChars.put('\u00E6', "æ");
htmlEncodeChars.put('\u00E7', "ç");
htmlEncodeChars.put('\u00E8', "è");
htmlEncodeChars.put('\u00E9', "é");
htmlEncodeChars.put('\u00EA', "ê");
htmlEncodeChars.put('\u00EB', "ë");
htmlEncodeChars.put('\u00EC', "ì");
htmlEncodeChars.put('\u00ED', "í");
htmlEncodeChars.put('\u00EE', "î");
htmlEncodeChars.put('\u00EF', "ï");
htmlEncodeChars.put('\u00F0', "ð");
htmlEncodeChars.put('\u00F1', "ñ");
htmlEncodeChars.put('\u00F2', "ò");
htmlEncodeChars.put('\u00F3', "ó");
htmlEncodeChars.put('\u00F4', "ô");
htmlEncodeChars.put('\u00F5', "õ");
htmlEncodeChars.put('\u00F6', "ö");
htmlEncodeChars.put('\u00F7', "÷");
htmlEncodeChars.put('\u00F8', "ø");
htmlEncodeChars.put('\u00F9', "ù");
htmlEncodeChars.put('\u00FA', "ú");
htmlEncodeChars.put('\u00FB', "û");
htmlEncodeChars.put('\u00FC', "ü");
htmlEncodeChars.put('\u00FD', "ý");
htmlEncodeChars.put('\u00FE', "þ");
htmlEncodeChars.put('\u00FF', "ÿ");
// Mathematical, Greek and Symbolic characters for HTML
htmlEncodeChars.put('\u0192', "ƒ");
htmlEncodeChars.put('\u0391', "Α");
htmlEncodeChars.put('\u0392', "Β");
htmlEncodeChars.put('\u0393', "Γ");
htmlEncodeChars.put('\u0394', "Δ");
htmlEncodeChars.put('\u0395', "Ε");
htmlEncodeChars.put('\u0396', "Ζ");
htmlEncodeChars.put('\u0397', "Η");
htmlEncodeChars.put('\u0398', "Θ");
htmlEncodeChars.put('\u0399', "Ι");
htmlEncodeChars.put('\u039A', "Κ");
htmlEncodeChars.put('\u039B', "Λ");
htmlEncodeChars.put('\u039C', "Μ");
htmlEncodeChars.put('\u039D', "Ν");
htmlEncodeChars.put('\u039E', "Ξ");
htmlEncodeChars.put('\u039F', "Ο");
htmlEncodeChars.put('\u03A0', "Π");
htmlEncodeChars.put('\u03A1', "Ρ");
htmlEncodeChars.put('\u03A3', "Σ");
htmlEncodeChars.put('\u03A4', "Τ");
htmlEncodeChars.put('\u03A5', "Υ");
htmlEncodeChars.put('\u03A6', "Φ");
htmlEncodeChars.put('\u03A7', "Χ");
htmlEncodeChars.put('\u03A8', "Ψ");
htmlEncodeChars.put('\u03A9', "Ω");
htmlEncodeChars.put('\u03B1', "α");
htmlEncodeChars.put('\u03B2', "β");
htmlEncodeChars.put('\u03B3', "γ");
htmlEncodeChars.put('\u03B4', "δ");
htmlEncodeChars.put('\u03B5', "ε");
htmlEncodeChars.put('\u03B6', "ζ");
htmlEncodeChars.put('\u03B7', "η");
htmlEncodeChars.put('\u03B8', "θ");
htmlEncodeChars.put('\u03B9', "ι");
htmlEncodeChars.put('\u03BA', "κ");
htmlEncodeChars.put('\u03BB', "λ");
htmlEncodeChars.put('\u03BC', "μ");
htmlEncodeChars.put('\u03BD', "ν");
htmlEncodeChars.put('\u03BE', "ξ");
htmlEncodeChars.put('\u03BF', "ο");
htmlEncodeChars.put('\u03C0', "π");
htmlEncodeChars.put('\u03C1', "ρ");
htmlEncodeChars.put('\u03C2', "ς");
htmlEncodeChars.put('\u03C3', "σ");
htmlEncodeChars.put('\u03C4', "τ");
htmlEncodeChars.put('\u03C5', "υ");
htmlEncodeChars.put('\u03C6', "φ");
htmlEncodeChars.put('\u03C7', "χ");
htmlEncodeChars.put('\u03C8', "ψ");
htmlEncodeChars.put('\u03C9', "ω");
htmlEncodeChars.put('\u03D1', "ϑ");
htmlEncodeChars.put('\u03D2', "ϒ");
htmlEncodeChars.put('\u03D6', "ϖ");
htmlEncodeChars.put('\u2022', "•");
htmlEncodeChars.put('\u2026', "…");
htmlEncodeChars.put('\u2032', "′");
htmlEncodeChars.put('\u2033', "″");
htmlEncodeChars.put('\u203E', "‾");
htmlEncodeChars.put('\u2044', "⁄");
htmlEncodeChars.put('\u2118', "℘");
htmlEncodeChars.put('\u2111', "ℑ");
htmlEncodeChars.put('\u211C', "ℜ");
htmlEncodeChars.put('\u2122', "™");
htmlEncodeChars.put('\u2135', "ℵ");
htmlEncodeChars.put('\u2190', "←");
htmlEncodeChars.put('\u2191', "↑");
htmlEncodeChars.put('\u2192', "→");
htmlEncodeChars.put('\u2193', "↓");
htmlEncodeChars.put('\u2194', "↔");
htmlEncodeChars.put('\u21B5', "↵");
htmlEncodeChars.put('\u21D0', "⇐");
htmlEncodeChars.put('\u21D1', "⇑");
htmlEncodeChars.put('\u21D2', "⇒");
htmlEncodeChars.put('\u21D3', "⇓");
htmlEncodeChars.put('\u21D4', "⇔");
htmlEncodeChars.put('\u2200', "∀");
htmlEncodeChars.put('\u2202', "∂");
htmlEncodeChars.put('\u2203', "∃");
htmlEncodeChars.put('\u2205', "∅");
htmlEncodeChars.put('\u2207', "∇");
htmlEncodeChars.put('\u2208', "∈");
htmlEncodeChars.put('\u2209', "∉");
htmlEncodeChars.put('\u220B', "∋");
htmlEncodeChars.put('\u220F', "∏");
htmlEncodeChars.put('\u2211', "∑");
htmlEncodeChars.put('\u2212', "−");
htmlEncodeChars.put('\u2217', "∗");
htmlEncodeChars.put('\u221A', "√");
htmlEncodeChars.put('\u221D', "∝");
htmlEncodeChars.put('\u221E', "∞");
htmlEncodeChars.put('\u2220', "∠");
htmlEncodeChars.put('\u2227', "∧");
htmlEncodeChars.put('\u2228', "∨");
htmlEncodeChars.put('\u2229', "∩");
htmlEncodeChars.put('\u222A', "∪");
htmlEncodeChars.put('\u222B', "∫");
htmlEncodeChars.put('\u2234', "∴");
htmlEncodeChars.put('\u223C', "∼");
htmlEncodeChars.put('\u2245', "≅");
htmlEncodeChars.put('\u2248', "≈");
htmlEncodeChars.put('\u2260', "≠");
htmlEncodeChars.put('\u2261', "≡");
htmlEncodeChars.put('\u2264', "≤");
htmlEncodeChars.put('\u2265', "≥");
htmlEncodeChars.put('\u2282', "⊂");
htmlEncodeChars.put('\u2283', "⊃");
htmlEncodeChars.put('\u2284', "⊄");
htmlEncodeChars.put('\u2286', "⊆");
htmlEncodeChars.put('\u2287', "⊇");
htmlEncodeChars.put('\u2295', "⊕");
htmlEncodeChars.put('\u2297', "⊗");
htmlEncodeChars.put('\u22A5', "⊥");
htmlEncodeChars.put('\u22C5', "⋅");
htmlEncodeChars.put('\u2308', "⌈");
htmlEncodeChars.put('\u2309', "⌉");
htmlEncodeChars.put('\u230A', "⌊");
htmlEncodeChars.put('\u230B', "⌋");
htmlEncodeChars.put('\u2329', "⟨");
htmlEncodeChars.put('\u232A', "⟩");
htmlEncodeChars.put('\u25CA', "◊");
htmlEncodeChars.put('\u2660', "♠");
htmlEncodeChars.put('\u2663', "♣");
htmlEncodeChars.put('\u2665', "♥");
htmlEncodeChars.put('\u2666', "♦");
}
public static String encodeHtml(String source) {
return encode(source, htmlEncodeChars);
}
private static String encode(String source, HashMap<Character, String> encodingTable) {
if (null == source) {
return null;
}
if (null == encodingTable) {
return source;
}
StringBuffer encoded_string = null;
char[] string_to_encode_array = source.toCharArray();
int last_match = -1;
int difference = 0;
for (int i = 0; i < string_to_encode_array.length; i++) {
char char_to_encode = string_to_encode_array[i];
if (encodingTable.containsKey(char_to_encode)) {
if (null == encoded_string) {
encoded_string = new StringBuffer(source.length());
}
difference = i - (last_match + 1);
if (difference > 0) {
encoded_string.append(string_to_encode_array, last_match + 1, difference);
}
encoded_string.append(encodingTable.get(char_to_encode));
last_match = i;
}
}
if (null == encoded_string) {
return source;
} else {
difference = string_to_encode_array.length - (last_match + 1);
if (difference > 0) {
encoded_string.append(string_to_encode_array, last_match + 1, difference);
}
return encoded_string.toString();
}
}
}