hiwa
Below is a simple demo program for a problem I ran into during
development. java.util.regex seems unable to match when the matched
substring is too long. If you split the long line in the test data
into four or five matched substrings by inserting "><", the demo
program runs fine. The test data has no < or > inside the "..."
quoted strings.
---test data(test.txt)-number of lines is 4, meta "description" is the
long line---
<meta http-equiv="content-type" content="text/html;
charset=windows-1252">
<meta name="keywords" content=" Java, J2EE, Enterprise Java,
J2ME, Java 2 Micro Edition, perfect">
<meta name="description" content="Java is like any development
platform/language combination most developers have a love-hate
relationship with it. Sure, for Java aficionados it's better than
using .Net, LAMP, or (add your own particular poison here), but we
bemoan the complexity of Swing, the bulkiness of the Enterprise
JavaBeans (EJB) specification, performance, additional overheads
imposed on skimpy hardware by the Java 2 Platform, Micro Edition
(J2ME) platform, the 101 different ways to do things, and on and on.
If we could just address Java's weak points, we might make Java that
mythical beast?the perfect technology platform...So then, what are
those changes? Is there such a thing as the perfect technology
platform, and does Java have the potential to become it? (3,500 words;
January 2, 2004)">
<meta name="GOOGLEBOT" content="NOARCHIVE">
<style type="text/css">
---end test data-------
---demo program---
Code:
import java.io.*;
import java.util.regex.*;

public class TagMatchTest {
    public static void main(String[] args) throws IOException {
        // Read the whole file into one string. readLine() drops the line
        // terminators, so the lines are joined with nothing in between.
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new FileReader("test.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line);
            }
        }
        Pattern pat = Pattern.compile("<[^>]*>"); // find tags
        Matcher mat = pat.matcher(sb.toString());
        while (mat.find()) {
            System.out.println(mat.group()); // print each matched tag
        }
    }
}
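
To separate the regex from the file reading, a minimal in-memory check might look like the sketch below. The class name LongTagCheck and the helpers buildTag and findTags are made up for illustration. The filler content only assumes what the post states: no < or > inside the quotes. The pattern is the same one the demo program uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongTagCheck {
    // Same pattern as in the demo program.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Hypothetical helper: builds one <meta ...> tag whose content
    // attribute holds contentLength filler characters and no '<' or '>'.
    static String buildTag(int contentLength) {
        StringBuilder sb = new StringBuilder("<meta name=\"description\" content=\"");
        for (int i = 0; i < contentLength; i++) {
            sb.append('x');
        }
        return sb.append("\">").toString();
    }

    // Collects every substring the tag pattern matches, in order.
    static List<String> findTags(String input) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // Try a short, a medium, and a very long tag between two short ones.
        for (int len : new int[] {10, 1000, 100000}) {
            String tag = buildTag(len);
            List<String> found = findTags("<a>" + tag + "<b>");
            boolean ok = found.size() == 3 && found.get(1).equals(tag);
            System.out.println("content length " + len + ": "
                    + (ok ? "long tag matched" : "long tag NOT matched"));
        }
    }
}
```

If this reports a match for every length, the pattern itself copes with long substrings, and the cause is more likely somewhere else, for example in how test.txt is read or stored.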