Regex returning fewer groups than expected - where is the error?

Kimie Nakahara

Hello!

I'm trying to build a regex to count the number of modified files per
commit. For example, for the following svn log output:

------------------------------------------------------------------------
r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4 lines
Changed paths:
   A /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties
   M /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties

[MNG-3451] [MNG-3790] German localization for maven-core
Submitted by: Christian Schulte

o Applied with minor modifications
------------------------------------------------------------------------
r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2 lines
Changed paths:
   M /maven/components/branches/maven-2.0.x
   M /maven/components/branches/maven-2.0.x/maven-core/src/main/java/org/apache/maven/cli/MavenCli.java

[MNG-1830] use ISO 8601 format (not combined for readability)

------------------------------------------------------------------------

It needs to return the revision number and the number of modified files,
so the result of the match would be something like:

Result 1:
1. r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4 lines
2. A /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties
3. M /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties
Result 2:
1. r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2 lines
2. M /maven/components/branches/maven-2.0.x
3. M /maven/components/branches/maven-2.0.x/maven-core/src/main/java/org/apache/maven/cli/MavenCli.java

I tried many variations of the following regular expression:

/(^r\d+.*?)(?:^Changed paths:\n)(^\s*[MDA]\s(?:\/[\w.-]+)+)/m

but it returns incomplete results, such as:

Result 1

1. r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4 lines
2. A /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties

Result 2

1. r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2 lines
2. M /maven/components/branches/maven-2.0.x

Would someone be able to help me on this? Thanks in advance!!

Kimie
 
Heesob Park

Hi,

2009/3/16 Kimie Nakahara said:
[...]
Would someone be able to help me on this? Thanks in advance!!

I guess you want this:
/(^r\d+.*?)(?:^Changed paths:\n)((?:^\s*[MDA]\s\/[\w.\/-]+\n)+)/m
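A minimal sketch of how this pattern might be used, assuming Ruby and String#scan (the thread does not show the calling code). Each match yields two groups: the revision header and one string holding all the changed-path lines, so the count comes from splitting that second group into lines. The names log, HEADER_AND_PATHS and counts are made up for illustration. One guess about the split problem mentioned in the thread: in Ruby, the single-quoted '\n' is a backslash followed by n, not a newline, so split("\n") or String#lines is needed.

```ruby
# Sample `svn log -v` output copied from the thread.
log = <<~SVN
  ------------------------------------------------------------------------
  r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4 lines
  Changed paths:
     A /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties
     M /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties

  [MNG-3451] [MNG-3790] German localization for maven-core
  Submitted by: Christian Schulte

  o Applied with minor modifications
  ------------------------------------------------------------------------
  r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2 lines
  Changed paths:
     M /maven/components/branches/maven-2.0.x
     M /maven/components/branches/maven-2.0.x/maven-core/src/main/java/org/apache/maven/cli/MavenCli.java

  [MNG-1830] use ISO 8601 format (not combined for readability)

  ------------------------------------------------------------------------
SVN

# The pattern from the reply: group 1 is the header, group 2 is the whole
# run of changed-path lines (a repeated group cannot yield one capture per
# repetition, so the lines arrive together in one string).
HEADER_AND_PATHS = /(^r\d+.*?)(?:^Changed paths:\n)((?:^\s*[MDA]\s\/[\w.\/-]+\n)+)/m

counts = log.scan(HEADER_AND_PATHS).map do |header, paths|
  revision = header[/\Ar\d+/]        # leading "r" plus digits of the header
  files = paths.lines.map(&:strip)   # one entry per changed path
  [revision, files.size]
end

counts.each { |revision, n| puts "#{revision}: #{n} changed path(s)" }
```

On the sample log this gives two changed paths for each of the two revisions.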

Regards,

Park Heesob
 
Kimie Nakahara

Hi Park, thank you for the quick answer. Using your regex, I get the
following result:

Result 1

1. r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4 lines
2. A /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties
   M /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties

Result 2

1. r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2 lines
2. M /maven/components/branches/maven-2.0.x
   M /maven/components/branches/maven-2.0.x/maven-core/src/main/java/org/apache/maven/cli/MavenCli.java

It returns all the modified lines, but in a single group. So, for example,
to know how many files were modified in revision 727998, I would have to
work on result[0][1] and result[1][1] (or 1.2 and 2.2 in the
example above) to extract the number of modified files, maybe by splitting
the text and counting the lines (one file per line).
Actually, I tried splitting by \n, but it didn't work. So I was
wondering whether it is possible to return each modified line as a different
group, as I said in the first email.

Or, if you can suggest a way to break up the lines of results[0][1] and
results[1][1], that would work too!

Thank you!
Kimie
 
Heesob Park

2009/3/16 Kimie Nakahara said:
Hi Park, thank you for the quick answer. Using your regex, I get the
following result:
[...]

Or try this simpler regex:
/(^r\d+.*?)(?:^Changed paths:\n)|(^\s*[MDA]\s\/[\w.\/-]+\n)/m
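A rough sketch of how the alternation might be consumed, again assuming Ruby and String#scan. Each match fills only one of the two groups: either a revision header (second group nil) or a single changed-path line (first group nil), so every path arrives as its own match and can be filed under the most recent header. The names HEADER_OR_PATH, paths_by_revision and by_rev are hypothetical. One caveat: a commit-message line that happens to look like "M /some/path" would be counted as well.

```ruby
# The alternation from the reply: a match is either a header or one path line.
HEADER_OR_PATH = /(^r\d+.*?)(?:^Changed paths:\n)|(^\s*[MDA]\s\/[\w.\/-]+\n)/m

# Returns a hash mapping each revision to the list of its changed paths.
def paths_by_revision(log)
  result = {}
  current = nil
  log.scan(HEADER_OR_PATH) do |header, path_line|
    if header
      current = header[/\Ar\d+/]          # start collecting for a new revision
      result[current] = []
    elsif current
      result[current] << path_line.strip  # e.g. "M /maven/..."
    end
  end
  result
end

# Sample `svn log -v` output copied from the thread.
log = <<~SVN
  ------------------------------------------------------------------------
  r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4 lines
  Changed paths:
     A /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties
     M /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties

  [MNG-3451] [MNG-3790] German localization for maven-core
  Submitted by: Christian Schulte

  o Applied with minor modifications
  ------------------------------------------------------------------------
  r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2 lines
  Changed paths:
     M /maven/components/branches/maven-2.0.x
     M /maven/components/branches/maven-2.0.x/maven-core/src/main/java/org/apache/maven/cli/MavenCli.java

  [MNG-1830] use ISO 8601 format (not combined for readability)

  ------------------------------------------------------------------------
SVN

by_rev = paths_by_revision(log)
by_rev.each { |revision, paths| puts "#{revision}: #{paths.size} changed path(s)" }
```

Compared with the single-group pattern, this trades one regex match per revision for a small amount of bookkeeping in the loop.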

Regards,

Park Heesob