[JAVA] split vs StringTokenizer
⏰ 2021-06-13 16:56:01
When solving algorithm problems, you inevitably end up handling input. To cover a variety of cases, the program reads values directly from the user and processes them. In that setting, one situation comes up again and again: to send a data set, a collection of values is joined into a single string with a delimiter (a space or a comma) and passed along.

For example, to pass an array of values, each element is separated by a space and the whole thing is sent as one string. I normally use the split method for this, but while looking through other people's solutions I noticed quite a few that use a class called StringTokenizer. I had never seen the class before, and I figured there had to be a reason for going out of the way to replace the far more accessible split, so I decided to compare their performance myself. Execution speed is an important metric in algorithm problems, so it is worth shaving off even a little time. Unfortunately my code-optimization skills are terrible, so I have to trim small things like this wherever I can. Skipping the core optimization and saving time here instead feels a bit like going on a diet by eating pizza with a Coke Zero, but if StringTokenizer really does perform better, it will be worth applying to the problems I solve from now on.
Item | Details |
---|---|
Language | |
OS | Windows 10 64bit |
CPU | Intel i7-10700K |
RAM | 32GB |
The split method is the traditional way to separate a string on a given delimiter. It is not specific to Java: C-family languages (C++, C#), JavaScript, Python, and many others offer something by the same name, so in any language it is usually the first thing you reach for when you need to break up a string.

In Java, split is a method of the String class, the string data type. Any string value can call split to break itself apart. The return value is a String[].

It is used as follows.
```java
import java.util.Arrays;

/**
 * Main class
 *
 * @author RWB
 * @since 2021.06.13 Sun 22:50:57
 */
public class Main {
    /**
     * Main function
     *
     * @param args: [String[]] arguments
     */
    public static void main(String[] args) {
        String text = "A B C D";

        // Split on a single space
        String[] splited = text.split(" ");

        System.out.println(Arrays.toString(splited));
    }
}
```
The output is as follows.

```
[A, B, C, D]
```
You can see that the string A B C D has been split on spaces into [A, B, C, D]. One more notable point: Java's split method treats its delimiter argument as a regular expression. Used well, this makes compound delimiters possible.
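As a minimal sketch of a compound delimiter, the example below splits on commas and whitespace at once. The class name `SplitRegexDemo`, the helper `splitOnCommaOrWhitespace`, and the sample input are made up for illustration; only `String.split` itself comes from the text above.

```java
import java.util.Arrays;

public class SplitRegexDemo {
    /**
     * Splits on any run of commas and/or whitespace.
     * The trailing "+" makes consecutive delimiters count as one.
     */
    public static String[] splitOnCommaOrWhitespace(String text) {
        return text.split("[,\\s]+");
    }

    public static void main(String[] args) {
        // Mixed delimiters: ", " then "," then two spaces
        String text = "A, B,C  D";

        // prints [A, B, C, D]
        System.out.println(Arrays.toString(splitOnCommaOrWhitespace(text)));
    }
}
```

Note that a delimiter at the very start of the input would still produce an empty first element, so trimming the input first may be needed depending on the data.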
This is what directly prompted this post. StringTokenizer is also a class specialized for breaking up strings. Unlike split, which returns a String[], it is a separate class in its own right.

It is initialized like `StringTokenizer tokenizer = new StringTokenizer("string");`. The methods worth knowing when working with a StringTokenizer instance are listed below.
Method | Return type | Description |
---|---|---|
countTokens | int | Number of remaining tokens |
nextToken | String | The next token |
hasMoreTokens | boolean | Whether another token is available |
A delimiter can also be passed as a second constructor argument, as in `StringTokenizer tokenizer = new StringTokenizer("string", "delimiter");`, to split on whatever delimiter you want. If none is specified, the default delimiter set is `" \t\n\r\f"`: space, tab, newline, carriage return, and form feed. One thing to watch out for: because the default set includes all of these, a string that mixes spaces and newlines, such as `A B C D\nA B C D`, is split on both, producing [A, B, C, D, A, B, C, D]. Passing the delimiter explicitly to the constructor prevents this. An explicit delimiter does not have to be a space or a newline, and it may contain several characters, each of which is treated as a delimiter on its own.
```java
import java.util.Arrays;
import java.util.StringTokenizer;

/**
 * Main class
 *
 * @author RWB
 * @since 2021.06.13 Sun 23:48:14
 */
public class Test {
    /**
     * Main function
     *
     * @param args: [String[]] arguments
     */
    public static void main(String[] args) {
        String text = "A B C D";

        StringTokenizer tokenizer = new StringTokenizer(text);

        // Size the array up front from the number of tokens
        String[] splited = new String[tokenizer.countTokens()];

        for (int i = 0; i < splited.length; i++) {
            splited[i] = tokenizer.nextToken();
        }

        System.out.println(Arrays.toString(splited));
    }
}
```
The output is the same.

```
[A, B, C, D]
```
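The difference between the default delimiter set and an explicit one can be sketched as follows, using the mixed space-and-newline string mentioned earlier. The class name `TokenizerDelimiterDemo` and the helper `drain` are hypothetical; the loop uses only the `hasMoreTokens` and `nextToken` methods from the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDelimiterDemo {
    /** Collects every remaining token with the hasMoreTokens/nextToken loop. */
    public static List<String> drain(StringTokenizer tokenizer) {
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "A B C D\nA B C D";

        // Default delimiters: both the spaces and the newline separate tokens.
        // prints [A, B, C, D, A, B, C, D]
        System.out.println(drain(new StringTokenizer(text)));

        // Space only: the newline is no longer a delimiter,
        // so "D\nA" stays together as a single token (7 tokens in total).
        List<String> spaceOnly = drain(new StringTokenizer(text, " "));
        System.out.println(spaceOnly.size()); // prints 7
    }
}
```

The `while (hasMoreTokens())` loop is an alternative to sizing an array with `countTokens()` first; it is handy when the tokens are consumed one at a time and never need to be stored.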
So how do split and StringTokenizer compare in performance? To find out, I wrote a simple test program.

The source is below.
```java
import java.text.DecimalFormat;
import java.util.Arrays;
import java.util.Random;
import java.util.StringTokenizer;

/**
 * Main class
 *
 * @author RWB
 * @since 2021.06.14 Mon 00:06:32
 */
public class Main {
    /**
     * Main function
     *
     * @param args: [String[]] arguments
     */
    public static void main(String[] args) {
        // Number of iterations (data groups) to run
        int t = 10000;

        // Index 0: split, index 1: StringTokenizer
        long[] timer = { 0, 0 };
        int[] sum = { 0, 0 };

        for (int i = 0; i < t; i++) {
            // Random element count in the range [5, 20)
            int random = (int) ((Math.random() * (20 - 5)) + 5);

            String text = getTestString(random);

            // split logic ----------------------------------------
            long timeStart = System.nanoTime();

            String[] a1 = useSplit(text);

            long timeEnd = System.nanoTime() - timeStart;

            sum[0] += a1.length;
            timer[0] += timeEnd;

            System.out.println(Arrays.toString(a1) + ": " + addComma(timeEnd) + "ns");
            // split logic ----------------------------------------

            // StringTokenizer logic ----------------------------------------
            timeStart = System.nanoTime();

            String[] a2 = useStringTokenizer(text);

            timeEnd = System.nanoTime() - timeStart;

            sum[1] += a2.length;
            timer[1] += timeEnd;

            System.out.println(Arrays.toString(a2) + ": " + addComma(timeEnd) + "ns");
            // StringTokenizer logic ----------------------------------------
        }

        System.out.println(addComma(t) + " data groups processed");
        System.out.println();

        System.out.println("split result");
        System.out.println(" * Total time: " + addComma(timer[0]) + "ns");
        System.out.println(" * Average time: " + addComma((timer[0] / t)) + "ns");
        System.out.println(" * Elements split: " + addComma(sum[0]));
        System.out.println();

        System.out.println("StringTokenizer result");
        System.out.println(" * Total time: " + addComma(timer[1]) + "ns");
        System.out.println(" * Average time: " + addComma((timer[1] / t)) + "ns");
        System.out.println(" * Elements split: " + addComma(sum[1]));
        System.out.println();

        // The sign compares speed, not time: "<" is printed when split took longer (split is slower)
        System.out.println("split " + (timer[0] == timer[1] ? "==" : (timer[0] > timer[1]) ? "<" : ">") + " StringTokenizer");
    }

    /**
     * Returns the separated strings (split)
     *
     * @param text: [String] target string
     *
     * @return [String[]] separated strings
     */
    private static String[] useSplit(String text) {
        return text.split(" ");
    }

    /**
     * Returns the separated strings (StringTokenizer)
     *
     * @param text: [String] target string
     *
     * @return [String[]] separated strings
     */
    private static String[] useStringTokenizer(String text) {
        StringTokenizer tokenizer = new StringTokenizer(text, " ");

        int count = tokenizer.countTokens();

        String[] result = new String[count];

        for (int i = 0; i < count; i++) {
            result[i] = tokenizer.nextToken();
        }

        return result;
    }

    /**
     * Returns a random test string
     *
     * @param n: [int] number of characters
     *
     * @return [String] random lowercase letters separated by spaces
     */
    private static String getTestString(int n) {
        Random random = new Random();

        StringBuilder builder = new StringBuilder();

        for (int i = 0; i < n; i++) {
            // 97 is 'a', so this appends a random letter from 'a' to 'z'
            builder.append((char) ((random.nextInt(26)) + 97)).append(" ");
        }

        return builder.toString().trim();
    }

    /**
     * Returns the number with thousands separators
     *
     * @param num: [long] target number
     *
     * @return [String] number with thousands separators
     */
    private static String addComma(long num) {
        DecimalFormat format = new DecimalFormat(",###");

        return format.format(num);
    }
}
```
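I ran the program 10 times for each iteration count and summarized the results in the tables below. In the Speed column, `split < StringTokenizer` means StringTokenizer was faster.

Run | split total | StringTokenizer total | Speed |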
---|---|---|---|
1 | 80.3us | 44.8us | split < StringTokenizer |
2 | 83.7us | 46.2us | split < StringTokenizer |
3 | 136.6us | 31.8us | split < StringTokenizer |
4 | 111.3us | 40.4us | split < StringTokenizer |
5 | 93.4us | 32.2us | split < StringTokenizer |
6 | 104.5us | 28.7us | split < StringTokenizer |
7 | 40.1us | 42.7us | split > StringTokenizer |
8 | 40.1us | 42.7us | split > StringTokenizer |
9 | 104.7us | 28.3us | split < StringTokenizer |
10 | 38.3us | 29.2us | split < StringTokenizer |
With a single iteration, StringTokenizer comes out ahead 8:2.

Run | split total | StringTokenizer total | Speed |
---|---|---|---|
1 | 1.12ms | 0.602ms | split < StringTokenizer |
2 | 1.11ms | 0.612ms | split < StringTokenizer |
3 | 1.06ms | 0.562ms | split < StringTokenizer |
4 | 1.02ms | 0.595ms | split < StringTokenizer |
5 | 1.ms | 0.550ms | split < StringTokenizer |
6 | 1.16ms | 0.651ms | split < StringTokenizer |
7 | 98ms | 0.558ms | split < StringTokenizer |
8 | 1.11ms | 0.627ms | split < StringTokenizer |
9 | 0.981ms | 0.555ms | split < StringTokenizer |
10 | 1.23ms | 0.666ms | split < StringTokenizer |
With 100 iterations, StringTokenizer again comes out ahead, 10:0.

Run | split total | StringTokenizer total | Speed |
---|---|---|---|
1 | 3.00ms | 3.17ms | split > StringTokenizer |
2 | 2.53ms | 2.71ms | split > StringTokenizer |
3 | 2.79ms | 2.84ms | split > StringTokenizer |
4 | 2.53ms | 2.67ms | split > StringTokenizer |
5 | 2.67ms | 2.97ms | split > StringTokenizer |
6 | 2.58ms | 2.87ms | split > StringTokenizer |
7 | 2.48ms | 2.65ms | split > StringTokenizer |
8 | 2.69ms | 3.01ms | split > StringTokenizer |
9 | 2.50ms | 2.90ms | split > StringTokenizer |
10 | 2.62ms | 2.94ms | split > StringTokenizer |
I had been stepping up through the iteration counts and then slipped in 1,000 out of nowhere. The reason is that, oddly, split comes out ahead at that count.

Run | split total | StringTokenizer total | Speed |
---|---|---|---|
1 | 9.91ms | 9.27ms | split < StringTokenizer |
2 | 9.49ms | 9.19ms | split < StringTokenizer |
3 | 9.02ms | 8.61ms | split < StringTokenizer |
4 | 9.95ms | 9.25ms | split < StringTokenizer |
5 | 9.03ms | 8.87ms | split < StringTokenizer |
6 | 8.83ms | 9.08ms | split > StringTokenizer |
7 | 9.14ms | 8.68ms | split < StringTokenizer |
8 | 9.28ms | 9.07ms | split < StringTokenizer |
9 | 9.49ms | 9.66ms | split > StringTokenizer |
10 | 11.79ms | 11.20ms | split < StringTokenizer |
StringTokenizer is back ahead, 8:2.

Run | split total | StringTokenizer total | Speed |
---|---|---|---|
1 | 306.86ms | 373.06ms | split > StringTokenizer |
2 | 287.26ms | 262.05ms | split < StringTokenizer |
3 | 289.92ms | 255.51ms | split < StringTokenizer |
4 | 272.43ms | 267.96ms | split < StringTokenizer |
5 | 278.35ms | 322.28ms | split > StringTokenizer |
6 | 285.23ms | 264.57ms | split < StringTokenizer |
7 | 273.37ms | 268.18ms | split < StringTokenizer |
8 | 278.65ms | 264.34ms | split < StringTokenizer |
9 | 278.56ms | 266.62ms | split < StringTokenizer |
10 | 306.00ms | 256.56ms | split < StringTokenizer |
StringTokenizer is ahead 8:2.

Apart from the special case of 1,000 iterations, StringTokenizer generally performed better. I don't really understand why that case behaves the way it does. Of course, statistics become more meaningful as the sample grows, so it is hard to draw a firm conclusion from only 10 runs.

On my work computer (AMD Ryzen 2700X), StringTokenizer was faster in every case. The results, or the way the work is carried out, may differ slightly depending on the CPU.
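One factor that may contribute to the uneven results (an assumption, not something the measurements here establish) is that the JVM compiles hot code while the program is running, so early iterations can be timed in a different state than later ones, and a `System.nanoTime()` pair around each single call adds its own overhead. A minimal sketch of a variant that runs both paths untimed first and then times a whole batch at once is shown below. The class name `WarmupBenchmarkSketch`, its method names, the sample input, and the round count are all hypothetical, and a dedicated harness such as JMH would be the more rigorous choice.

```java
import java.util.StringTokenizer;

public class WarmupBenchmarkSketch {
    /** Counts the tokens produced by split. */
    public static int countWithSplit(String text) {
        return text.split(" ").length;
    }

    /** Counts the tokens produced by StringTokenizer, materializing each one. */
    public static int countWithTokenizer(String text) {
        StringTokenizer tokenizer = new StringTokenizer(text, " ");
        int count = 0;
        while (tokenizer.hasMoreTokens()) {
            tokenizer.nextToken();
            count++;
        }
        return count;
    }

    /** Times `rounds` calls as one batch, so a single nanoTime pair covers the whole loop. */
    public static long timeBatch(boolean useSplit, String text, int rounds) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            sink += useSplit ? countWithSplit(text) : countWithTokenizer(text);
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated result so the loop body cannot be discarded as dead code
        if (sink < 0) {
            throw new IllegalStateException("unexpected negative token count");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String text = "a b c d e f g h i j";
        int rounds = 100_000;

        // Warm-up: run both paths untimed before measuring
        timeBatch(true, text, rounds);
        timeBatch(false, text, rounds);

        System.out.println("split:           " + timeBatch(true, text, rounds) + "ns");
        System.out.println("StringTokenizer: " + timeBatch(false, text, rounds) + "ns");
    }
}
```

Even with a warm-up, the numbers will vary between runs and machines, so this only narrows down one possible source of noise.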
📝 According to the Java API documentation, StringTokenizer is a legacy class retained for backward compatibility. The documentation recommends using split or the regex package instead of StringTokenizer where possible.
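As a rough sketch of the regex-package route, a `java.util.regex.Pattern` can be compiled once and reused for every input line. The class name `PatternSplitDemo`, the helper `tokens`, and the choice of `\s+` as the delimiter are illustrative assumptions, not something the measurements above covered.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PatternSplitDemo {
    // Compiled once and reused, instead of passing a regex string on every call
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Splits a line on runs of whitespace after trimming both ends. */
    public static String[] tokens(String line) {
        return WHITESPACE.split(line.trim());
    }

    public static void main(String[] args) {
        // Spaces, a tab, and a newline all act as delimiters here
        // prints [A, B, C, D]
        System.out.println(Arrays.toString(tokens("  A B\tC\nD ")));
    }
}
```

Trimming first avoids an empty leading element when the input starts with whitespace.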
Going by the figures in the tables, StringTokenizer is up to roughly 20% faster than split. However, the Java API recommends using the alternatives where possible, and even a million operations only take time on the order of milliseconds. The relative difference exists, but in absolute terms it hardly matters. Rather than bringing in a separate class just to split a string, I think it is more efficient to use split, which works on the string itself.