Wondering about JCR

Tom Anderson

Hay guise,

I'm wondering if JCR, aka the Content Repository API for Java, could be
useful to me. I've been reading about it - including reading the spec -
but i'm still not really sure. Does anyone have any hands-on experience
with it?

The thing is, my use is not managing anything really document-like, but
nonetheless, it looks like the fundamentals of JCR are still a good fit. I
want to manage a catalogue for an e-commerce site - there's a
hierarchical structure comprising a root category at the top, with
subcategories below it, then at some level products, which in turn contain
SKUs. Each of these has a number of properties, things like name,
description, price, and so on. Possibly also images, although those could
be handled externally to the repository.
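
To make the shape of that catalogue concrete, here is a minimal plain-Java sketch of the hierarchy. Each item is a node with a name, a property map and named children, and lookups go by relative path, roughly as JCR's Node.getNode(relPath) does. It is an illustrative stand-in, not the javax.jcr API; the class name, node names and property names are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a JCR node: a name, some properties,
// and named children. This is NOT the javax.jcr API.
class CatalogueNode {
    final String name;
    final Map<String, Object> properties = new LinkedHashMap<>();
    final Map<String, CatalogueNode> children = new LinkedHashMap<>();

    CatalogueNode(String name) { this.name = name; }

    // Adds a child with the given name, or returns the existing one.
    CatalogueNode addChild(String childName) {
        return children.computeIfAbsent(childName, CatalogueNode::new);
    }

    // Sets a property and returns this node, so calls can be chained.
    CatalogueNode set(String property, Object value) {
        properties.put(property, value);
        return this;
    }

    // Resolves a relative path such as "shoes/trainers/runner-1/sku-42";
    // returns null if any segment is missing.
    CatalogueNode getNode(String relPath) {
        CatalogueNode current = this;
        for (String segment : relPath.split("/")) {
            if (segment.isEmpty()) continue;
            current = current.children.get(segment);
            if (current == null) return null;
        }
        return current;
    }
}
```

For example, root.getNode("shoes/trainers/runner-1/sku-42") walks category, subcategory, product and SKU in one call, and returns null for a path that does not exist.
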

Now, on the site itself, this stuff is all read-only, with some simple
access patterns based on individual item lookups plus a few kinds of
query, followed by reading properties from the objects found. Here,
something like JPA or another ORM or OODB approach (or even just POJOs
stored with serialisation) is probably right - there's no need for the
complexity of JPA.

However, the site is only half the story. There's also a backoffice
server, on which the merchandising team prepare the data that gets used on
the site. Lots of this is coming from feeds from other backoffice systems,
but there's still manual editing on top of that. To let people do their
work without stepping on each other's toes, there's a rather CVS-like
model, where there's a single canonical version of the data, plus a set of
workspaces for each user: users can update their workspace from the
canonical version, and commit changes from their workspace into it, with
merge conflicts being detected and handled. The system maintains a history
of these commits, and the versions of the data that existed at each stage.
There's a workflow mechanism associated with this, because edits might
need to be approved by a supervisor before being committed. There's then a
mechanism for pushing the contents of the canonical version to the site -
the two use separate databases, so this is a final checkpoint for making
sure everything is okay.
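
The "merge conflicts being detected" step of that CVS-like model can be illustrated for a single item with a three-way merge of its properties. Each property in the user's workspace ("mine") and in the canonical version ("theirs") is compared against their common ancestor ("base"). This is a hypothetical sketch, not something JCR does for you: JCR's own Node.merge decides per node using version history rather than per property, and this sketch ignores added or removed child nodes, moves and reordering, which a real merge would also have to handle.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical three-way merge of one item's properties.
// base = common ancestor, mine = user's workspace, theirs = canonical version.
// A missing key (null value) means "property absent".
class PropertyMerge {
    static final class Result {
        final Map<String, Object> merged = new HashMap<>();
        final Set<String> conflicts = new TreeSet<>();
    }

    static Result merge(Map<String, Object> base, Map<String, Object> mine, Map<String, Object> theirs) {
        Set<String> keys = new HashSet<>(base.keySet());
        keys.addAll(mine.keySet());
        keys.addAll(theirs.keySet());
        Result result = new Result();
        for (String key : keys) {
            Object b = base.get(key), m = mine.get(key), t = theirs.get(key);
            Object chosen;
            if (Objects.equals(m, t)) {
                chosen = m;                 // both sides agree (or neither changed)
            } else if (Objects.equals(m, b)) {
                chosen = t;                 // only the canonical version changed
            } else if (Objects.equals(t, b)) {
                chosen = m;                 // only the user changed it
            } else {
                result.conflicts.add(key);  // both changed it differently
                chosen = t;                 // keep the canonical value until resolved
            }
            if (chosen != null) result.merged.put(key, chosen);
        }
        return result;
    }
}
```

An empty conflicts set means the user's changes can be committed as the merged map; otherwise the listed properties need a human decision.
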

This is all rather more complicated than the 'objects which live in a box'
model embodied by things like JPA. It does, however, seem like a pretty
good fit for JCR, which has these ideas of workspaces, updates, merges,
and so on. Am i right in thinking that?

The things i'm not sure about are:

- Whether JCR implementations will work well with large numbers (on the
order of 200 000 items all told) of fairly small items, rather than the
smaller numbers of larger items that are more typical.

- Whether there's any kind of ready-made generic metadata-driven UI (free
or commercial) for the backoffice server that i can slap on top of my
repository, rather than having to handwrite it.

- Whether JCR's workspace and versioning model really works the way i want
it to. I don't think JCR has the idea of a canonical workspace, but that's
no big deal - i'd just set one up and designate it canonical by fiat.

- Whether JCR implementations out there actually support the bits of the
workspace and versioning model i need.

- Whether, and how, JCR implementations would let me specify the quite
rigid node type definitions i need. I don't want users to be able to add
random extra properties to items, for instance.

- How i'd handle the push from the backoffice JCR store to the site's JPA
or whatever, or whether i even would - should i just use JCR at the front
too? The API is much less fun to work with than JPA, by the look of it.

- How i'd integrate workflow with the editing and pushing process.

- That there are N other showstopping problems i haven't even thought of!

Any thoughts welcome!

tom

PS Bonus for those who read to the end - an article about JCR that posits
that it's the kind of data architecture Sir Thomas More would have wanted:

http://www.artima.com/lejava/articles/contentrepository.html

Certainly, when i'm choosing software, the hypothetical opinions of
Renaissance intellectuals weigh heavily in my evaluation. For example, we
chose CentOS Linux as our preferred development platform because we
thought Francis Bacon would have been cool with it.
 
Tom Anderson


Good work, Jeff! The wikipedia article is also a reasonable one-minute
summary, and has links to the specs:

http://en.wikipedia.org/wiki/Content_repository_API_for_Java

Apologies if i've baffled anyone - although JCR is a few years old now,
and has a number of implementations, it does seem to have kept under the
radar. I only heard of it a couple of weeks ago, and rapidly became quite
interested. This might be a subject where cljp is not the best place to
ask (gasp!) - maybe Stack Overflow or theserverside.com or somewhere
similarly enterprisey would be better.

Since posting my question, one thing i've discovered is that there are a
couple of ORMs which sit on top of JCR:

http://jackrabbit.apache.org/jackrabbit-ocm.html
http://code.google.com/p/jcrom/

Although it's not entirely clear to me that this is a good or necessary
thing. And it doesn't really address my particular problem.

tom
 
Daniel Pitts

If it is any help to you, I asked basically the same question last week.
The primary responses I got were "These commercial CMS use it."

Nobody seems to be able to comment on the ease of use or quality of
implementation.
 
Michael Marth

Hi Tom,

I'm wondering if JCR, aka the Content Repository API for Java, could be
useful to me. I've been reading about it - including reading the spec -
but i'm still not really sure. Does anyone have any hands-on experience
with it?

Yes, my colleagues and I do (I work for Day Software, which contributed a
lot to the JCR spec and the Jackrabbit reference implementation).

The thing is, my use is not managing anything really document-like, but

There are a number of projects using JCR for non-content mgmt stuff. I
compiled some here:

http://dev.day.com/microsling/content/blogs/main/jcrisformore.html

nonetheless, it looks like the fundamentals of JCR are still a good fit. I
want to manage a catalogue for an e-commerce site - there's a
hierarchical structure comprising a root category at the top, with
subcategories below it, then at some level products, which in turn contain
SKUs. Each of these has a number of properties, things like name,
description, price, and so on. Possibly also images, although those could
be handled externally to the repository.

You might also want to add the images there. JCR repos handle binary
and non-binary content equally well.

Now, on the site itself, this stuff is all read-only, with some simple
access patterns based on individual item lookups plus a few kinds of
query, followed by reading properties from the objects found. Here,
something like JPA or another ORM or OODB approach (or even just POJOs
stored with serialisation) is probably right - there's no need for the
complexity of JPA.

My advice is to stay away from object-content mapping until you think
that you really need it. For JCR repositories, the impedance mismatch
between data and OOP is nowhere near as big as it is for relational data.
You might get away with just working on the nodes directly.

However, the site is only half the story. There's also a backoffice
server, on which the merchandising team prepare the data that gets used on
the site. Lots of this is coming from feeds from other backoffice systems,
but there's still manual editing on top of that. To let people do their
work without stepping on each other's toes, there's a rather CVS-like
model, where there's a single canonical version of the data, plus a set of
workspaces for each user: users can update their workspace from the
canonical version, and commit changes from their workspace into it, with
merge conflicts being detected and handled. The system maintains a history
of these commits, and the versions of the data that existed at each stage.
There's a workflow mechanism associated with this, because edits might
need to be approved by a supervisor before being committed. There's then a
mechanism for pushing the contents of the canonical version to the site -
the two use separate databases, so this is a final checkpoint for making
sure everything is okay.

While you can use workspaces for that, another approach might be to
simply use different branches in one workspace (and set the ACLs such
that each user can only see his own branch). You would have to do the
merge yourself, but the whole setup would be simpler to handle, in my
opinion.
btw: "committing" changes could be done by setting an attribute on the
user's node and having a JCR Observer listen for such changes and
distribute them to the other workspaces.
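
The one-workspace alternative relies on access control to keep each user inside his own branch. The check itself amounts to a path-prefix test, sketched below in plain Java under the assumption that branches live at /branches/<userId> and that a canonical area at /canonical is readable by everyone. Both paths are hypothetical, and in a real repository this would be configured through the implementation's ACLs rather than hand-coded.

```java
// Hypothetical branch layout: /canonical/... is shared and read-only here,
// /branches/<userId>/... is private to that user.
class BranchAccess {
    static boolean canRead(String userId, String path) {
        return isUnder(path, "/canonical") || isUnder(path, "/branches/" + userId);
    }

    static boolean canWrite(String userId, String path) {
        return isUnder(path, "/branches/" + userId);
    }

    // True if path equals root or lies below it; requiring the "/" stops
    // "/branches/bob" from also matching "/branches/bobby".
    private static boolean isUnder(String path, String root) {
        return path.equals(root) || path.startsWith(root + "/");
    }
}
```
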

This is all rather more complicated than the 'objects which live in a box'
model embodied by things like JPA. It does, however, seem like a pretty
good fit for JCR, which has these ideas of workspaces, updates, merges,
and so on. Am i right in thinking that?

The things i'm not sure about are:

- Whether JCR implementations will work well with large numbers (on the
order of 200 000 items all told) of fairly small items, rather than the
smaller numbers of larger items that are more typical.

This should be no problem.
The only issue I am aware of is that the Jackrabbit implementation
does not like too many child nodes (more than, say, 1000) at the same
hierarchy level. Usually, one can design the node hierarchy to avoid
this problem.
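
One common way to design around the many-children issue is to insert intermediate "bucket" nodes derived from each item's ID, so that no single level grows large. The sketch below assumes a hypothetical flat collection of items under one root and takes two hex bytes of the ID's hash as two bucket levels (256 buckets per level); with 200 000 items that averages roughly 3 items per leaf bucket. A category/product/SKU tree like the one described may well not need this at all.

```java
// Hypothetical bucketing scheme: <root>/<b1>/<b2>/<id>, where b1 and b2
// are two hex bytes taken from the ID's hash code.
class Buckets {
    static String bucketPath(String root, String id) {
        // String.hashCode is defined by the Java spec, so paths are stable across JVMs.
        int h = id.hashCode();
        String b1 = String.format("%02x", (h >>> 8) & 0xff);
        String b2 = String.format("%02x", h & 0xff);
        return root + "/" + b1 + "/" + b2 + "/" + id;
    }
}
```

Because the path is computed from the ID alone, a lookup by ID never needs a query: the same function gives the location for reads and writes.
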
- Whether there's any kind of ready-made generic metadata-driven UI (free
or commercial) for the backoffice server that i can slap on top of my
repository, rather than having to handwrite it.

Not sure about "backoffice server", but there are a number of UIs for
JCRs:

http://dev.day.com/microsling/content/blogs/main/jcrtools.html

- Whether JCR's workspace and versioning model really works the way i want
it to. I don't think JCR has the idea of a canonical workspace, but that's
no big deal - i'd just set one up and designate it canonical by fiat.

See above.

- Whether JCR implementations out there actually support the bits of the
workspace and versioning model i need.

Jackrabbit fully implements the spec, so if the model itself is right
for you then Jackrabbit will have you covered.

- Whether, and how, JCR implementations would let me specify the quite
rigid node type definitions i need. I don't want users to be able to add
random extra properties to items, for instance.

Yes, the node types can be defined as rigid as you like.
In CRX (our commercial version of Jackrabbit) you'll find a user
interface to design node types.

CRX download: http://www.day.com/content/day/en/products/crx/download/registration.html
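
To illustrate what a "rigid" node type means in practice: the type lists the properties it allows and their value types, and anything undeclared is rejected. In JCR this is declarative (a node type without a residual "*" property definition refuses unknown properties); the plain-Java sketch below only mimics that behaviour. Its names (RigidType, Item, shop:sku) are hypothetical. Building the definition from a map also fits the idea of driving node types from metadata held elsewhere in the system.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a strict node type: declared property names and
// their Java value types, with no residual (wildcard) definitions.
class RigidType {
    final String name;
    final Map<String, Class<?>> declared;

    RigidType(String name, Map<String, Class<?>> declared) {
        this.name = name;
        this.declared = new LinkedHashMap<>(declared);
    }
}

// An item whose properties are checked against its type on every write.
class Item {
    final RigidType type;
    final Map<String, Object> properties = new LinkedHashMap<>();

    Item(RigidType type) { this.type = type; }

    void setProperty(String property, Object value) {
        Class<?> expected = type.declared.get(property);
        if (expected == null) {
            throw new IllegalArgumentException(type.name + " has no property '" + property + "'");
        }
        if (!expected.isInstance(value)) {
            throw new IllegalArgumentException(property + " must be a " + expected.getSimpleName());
        }
        properties.put(property, value);
    }
}
```
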
- How i'd handle the push from the backoffice JCR store to the site's JPA
or whatever, or whether i even would - should i just use JCR at the front
too? The API is much less fun to work with than JPA, by the look of it.

Not sure if I get the question 100% right, but have a look at JCR
listeners. They catch events when something in the repository changes and
could be used to push data from the repo to some outside system.

- How i'd integrate workflow with the editing and pushing process.

You will have to integrate or develop your own workflow tool; workflow
is not part of JCR. Having said that, workflow state can easily be
handled by setting property values on nodes.

- That there are N other showstopping problems i haven't even thought of!

Don't know any.

In case you have not looked at it yet, check out Apache Sling, which
is an application framework for developing on top of JCR repositories.

HTH
Michael
 
Tom Anderson

Yes, my colleagues and I do (I work for Day Software, which contributed a
lot to the JCR spec and the Jackrabbit reference implementation).

Excellent! Thanks for your response - apologies for not replying sooner,
but i wanted to go off and get my hands dirty first. I spent yesterday
playing with Jackrabbit, and i like it so far.

There are a number of projects using JCR for non-content mgmt stuff. I
compiled some here:

http://dev.day.com/microsling/content/blogs/main/jcrisformore.html

Several of those are still documentish - images, maven artifacts and
emails all look like kinds of documents to me. Some of the other things
could also be documenty, but it's impossible to tell from a quick look at
their websites. Still, from what i understand of JCR, there seems to be
absolutely no barrier to using it for smaller, more structured objects.

You might also want to add the images there. JCR repos handle binary and
non-binary content equally well.

Absolutely! And managing that inside the catalogue would definitely be
better from a business user point of view. The only reason i hesitate is
that i probably don't want to put the image data in a repository of any
sort in production, i want to push it out to the webservers at the front
end, so putting it in a repository at the back end makes things more
complicated for me. Simpler for my users, but who cares about them, eh?

Now, on the site itself, this stuff is all read-only, with some simple
access patterns based on individual item lookups plus a few kinds of
query, followed by reading properties from the objects found. Here,
something like JPA or another ORM or OODB approach (or even just POJOs
stored with serialisation) is probably right - there's no need for the
complexity of JPA [i meant JCR - tom].

My advice is to stay away from object-content mapping until you think
that you really need it. For JCR repositories, the impedance mismatch
between data and OOP is nowhere near as big as it is for relational data.
You might get away with just working on the nodes directly.

I am very dubious indeed about this. I like being able to say:

int listPrice = product.getListPrice();

Rather than:

int listPrice = (int)product.getProperty("listPrice").getLong();

And:

int currentPrice = product.getPrice();

Rather than:

int currentPrice;
boolean isOnSale = product.getProperty("onSale").getBoolean();
if (isOnSale) currentPrice = (int)product.getProperty("salePrice").getLong();
else currentPrice = (int)product.getProperty("listPrice").getLong();

(my point here is that you can't write methods in JCR nodes, so you have
to put your business logic outside the objects where it naturally belongs)

Domain objects need to be real objects. JCR nodes don't cut the mustard.
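
One middle ground between raw node access and a full object-content mapper is a thin hand-written wrapper that keeps business logic, such as the sale-price rule above, in a real domain class while reading from the node underneath. The sketch uses a plain Map as a stand-in for the node so it runs on its own; with JCR the private helper would call node.getProperty(name).getLong() instead, as in the snippets above. The property names are the ones from the example; the class itself is hypothetical.

```java
import java.util.Map;

// Thin domain wrapper over a node's properties. The Map stands in for a
// javax.jcr.Node so the sketch is self-contained.
class Product {
    private final Map<String, Object> node;

    Product(Map<String, Object> node) { this.node = node; }

    int getListPrice() { return (int) longProperty("listPrice"); }

    // Business logic lives on the domain object, not in calling code.
    int getPrice() {
        boolean onSale = Boolean.TRUE.equals(node.get("onSale"));
        return onSale ? (int) longProperty("salePrice") : getListPrice();
    }

    private long longProperty(String name) {
        Object value = node.get(name);
        if (!(value instanceof Long)) {
            throw new IllegalStateException("missing or non-long property: " + name);
        }
        return (Long) value;
    }
}
```
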
While you can use workspaces for that, another approach might be to
simply use different branches in one workspace (and set the ACLs such
that each user can only see his own branch). You would have to do the
merge yourself, but the whole setup would be simpler to handle, in my
opinion.

Could you expand on why you think that? Writing the merge code myself is
something i'd like to avoid, as it could be very complicated.

btw: "committing" changes could be done by setting an attribute on the
user's node and having a JCR Observer listen for such changes and
distribute them to the other workspaces.

Ah, interesting. Again, i assume this means doing the merge logic myself?

This should be no problem. The only issue I am aware of is that the
Jackrabbit implementation does not like too many child nodes (more than,
say, 1000) at the same hierarchy level. Usually, one can design the node
hierarchy to avoid this problem.

Yes, that shouldn't be a problem. Categories may have hundreds of
products, and products may have dozens of SKUs, but there shouldn't be
1000 of anything at any level. Indeed, JPA won't like that any more than
JCR will!

Not sure about "backoffice server", but there are a number of UIs for
JCRs:

http://dev.day.com/microsling/content/blogs/main/jcrtools.html

Excellent, thank you, this is exactly what i need to look into.

Jackrabbit fully implements the spec, so if the model itself is right
for you then Jackrabbit will have you covered.

But then there's the question of whether Jackrabbit is suitable for
production use. Presumably, if it was, there wouldn't be things like CRX,
for which you have to pay some nontrivial dollar.

Yes, the node types can be defined as rigid as you like.

Ideal.

In CRX (our commercial version of Jackrabbit) you'll find a user
interface to design node types.

Cool - could be useful for experimentation. In my current master plan, i
need to drive node type definition from metadata elsewhere in the system,
though, so a programmatic interface is what i need. The JSR 283 node type
definition stuff looks perfect, and as soon as JCR 2 is widely supported,
it will be portable.

Not sure if I get the question 100% right, but have a look at JCR
listeners. They catch events when something in the repository changes and
could be used to push data from the repo to some outside system.

Yes, i'd come to this conclusion too. Attach a listener, do a merge,
convert the events into changes to the foreign system. The only trouble is
that, if i've read the spec right, events only get delivered *after* the
repository changes are committed, ie the transaction ends. That means that
if the changing of the foreign system fails, i'm in a pickle, because it's
too late to roll back the changes to the repository. I guess i could
maintain a pair of branches/workspaces for the foreign system, one of the
goal state, and one of the current state, so i can manually roll back the
JCR repo if the update fails. Or do something else - the manual merge
starts to look more attractive at this point.
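
The "goal state / current state" idea can be sketched as follows: the repository side keeps what the site should contain (goal) and what it is known to contain (current). A push sends only the differences, and if the foreign system rejects them the goal is reset to the current state, which is the manual rollback described above. Everything here is hypothetical, including the ForeignSystem interface and its assumed all-or-nothing apply contract; item values are simplified to strings.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

class Pusher {
    // Hypothetical target system: applies upserts and deletes atomically,
    // or throws if the update fails (an assumed contract).
    interface ForeignSystem {
        void apply(Map<String, String> upserts, Set<String> deletes) throws Exception;
    }

    final Map<String, String> goal = new HashMap<>();     // what the site should hold
    final Map<String, String> current = new HashMap<>();  // what the site is known to hold

    boolean push(ForeignSystem site) {
        Map<String, String> upserts = new HashMap<>();
        for (Map.Entry<String, String> e : goal.entrySet()) {
            if (!Objects.equals(current.get(e.getKey()), e.getValue())) {
                upserts.put(e.getKey(), e.getValue());
            }
        }
        Set<String> deletes = new HashSet<>(current.keySet());
        deletes.removeAll(goal.keySet());
        try {
            site.apply(upserts, deletes);
        } catch (Exception failure) {
            goal.clear();
            goal.putAll(current);   // roll the repository-side goal back
            return false;
        }
        current.clear();
        current.putAll(goal);       // the site now matches the goal
        return true;
    }
}
```
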
You will have to integrate or develop your own workflow tool, workflow
is not part of JCR.

Yes, that's what i expected.

Having said that, workflow state can easily be handled by setting
property values on nodes.

Aha, good point. I need to learn to think more in terms of how i can make
the nodes work for me.

Don't know any.

In case you have not looked at it yet, check out Apache Sling, which is
an application framework for developing on top of JCR repositories.

I'd come across it, but not looked into it. I will do, but i'm not keen to
commit to a web framework at this point - i'd like to have an
infrastructure solution that leaves the front-end developers with their
choices open.

tom
 
Arne Vajhøj

If it is any help to you, I asked basically the same question last week.
The primary responses I got were "These commercial CMS use it."

Nobody seems to be able to comment on the ease of use or quality of
implementation.

I thought it was just me arguing that the commercial usage somehow
indicated quality. Calling that the primary response is perhaps
giving it too much credit.

The reason there were not more answers is probably that as
Tom states - it is flying below the radar screen.

Arne
 
Arne Vajhøj

I'm wondering if JCR, aka the Content Repository API for Java, could be
useful to me. I've been reading about it - including reading the spec -
but i'm still not really sure. Does anyone have any hands-on experience
with it?

No hands-on experience.

The thing is, my use is not managing anything really document-like, but
nonetheless, it looks like the fundamentals of JCR are still a good fit.
I want to manage a catalogue for an e-commerce site - there's a
hierarchical structure comprising a root category at the top, with
subcategories below it, then at some level products, which in turn
contain SKUs. Each of these has a number of properties, things like
name, description, price, and so on. Possibly also images, although
those could be handled externally to the repository.

I would seriously consider JCR if I had data that were
fundamentally tree-structured, because handling that via
JCR is a lot easier than via SQL, ORM, etc.

Now, on the site itself, this stuff is all read-only, with some simple
access patterns based on individual item lookups plus a few kinds of
query, followed by reading properties from the objects found. Here,
something like JPA or another ORM or OODB approach (or even just POJOs
stored with serialisation) is probably right - there's no need for the
complexity of JPA.

I don't consider JPA particularly complex.

Arne
 
Arne Vajhøj

This might be a subject where cljp is not the best
place to ask (gasp!) - maybe Stack Overflow or theserverside.com or
somewhere similarly enterprisey would be better.

Even though the SE/EE ratio is much higher here than
among Java professionals in general, there are still
several EE guys around here.

Arne
 
Tom Anderson

No hands-on experience.


I would seriously consider JCR if I had data that were fundamentally
tree-structured, because handling that via JCR is a lot easier than via
SQL, ORM, etc.

That's exactly what i thought too.

I don't consider JPA particularly complex.

Quite right - that last JPA was a typo, and should have been JCR. My
mistake.

tom
 
Michael Marth

Tom, sorry for the belated reply - xmas hols.

Several of those are still documentish - images, maven artifacts and
emails all look like kinds of documents to me. Some of the other things
could also be documenty, but it's impossible to tell from a quick look at
their websites. Still, from what i understand of JCR, there seems to be
absolutely no barrier to using it for smaller, more structured objects.

Don't see a barrier here either.

Absolutely! And managing that inside the catalogue would definitely be
better from a business user point of view. The only reason i hesitate is
that i probably don't want to put the image data in a repository of any
sort in production, i want to push it out to the webservers at the front
end, so putting it in a repository at the back end makes things more
complicated for me. Simpler for my users, but who cares about them, eh?

In our CMS we keep the images etc. in the repository, but put a reverse
proxy server in front that caches them for production load.


I am very dubious indeed about this. I like being able to say:

int listPrice = product.getListPrice();

Rather than:

int listPrice = (int)product.getProperty("listPrice").getLong();

And:

int currentPrice = product.getPrice();

Rather than:

int currentPrice;
boolean isOnSale = product.getProperty("onSale").getBoolean();
if (isOnSale) currentPrice = (int)product.getProperty("salePrice").getLong();
else currentPrice = (int)product.getProperty("listPrice").getLong();

(my point here is that you can't write methods in JCR nodes, so you have
to put your business logic outside the objects where it naturally belongs)

Domain objects need to be real objects. JCR nodes don't cut the mustard.

OK, I guess this is a matter of taste. I don't use it myself, but from
hearsay [1], JCROM [2] seems to be good:

[1] http://dev.day.com/discussion-groups/content/lists/jackrabbit-users.html?q=ocm
[2] http://code.google.com/p/jcrom/wiki/TwoMinuteIntro

Could you expand on why you think that? Writing the merge code myself is
something i'd like to avoid, as it could be very complicated.

Well, dealing with concurrent changes on copies of the same data is a
pain in any case. So it would be simpler if you could partition your
data into branches such that authors do not overwrite each other's
edits. However, if that is a requirement, workspaces would be the way
to go.

Ah, interesting. Again, i assume this means doing the merge logic myself?

Even if you use workspaces it might be beneficial to use observers.
For example a node could have a property "approved". If the value is
true an observer gets triggered that merges into the main workspace.
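
The "approved" trigger can be sketched as a plain observer: when an item's approved property is set to true, a listener copies the item from the author's workspace into the main one (a simple stand-in for the merge; a real one would also check for conflicts). This only mimics the shape of JCR observation. In a repository the listener would implement javax.jcr.observation.EventListener and be registered through the workspace's ObservationManager, and, as noted earlier in the thread, events arrive only after the change has been saved. All class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory workspace that notifies listeners after a property is set.
class Workspace {
    interface PropertyListener {
        void propertyChanged(Workspace source, String path, String property, Object value);
    }

    final Map<String, Map<String, Object>> items = new HashMap<>();
    private final List<PropertyListener> listeners = new ArrayList<>();

    void addListener(PropertyListener listener) { listeners.add(listener); }

    void setProperty(String path, String property, Object value) {
        items.computeIfAbsent(path, p -> new HashMap<>()).put(property, value);
        // As with JCR observation, listeners hear about the change after it is made.
        for (PropertyListener l : listeners) l.propertyChanged(this, path, property, value);
    }
}

// Copies an item into the main workspace once it is marked approved.
class ApprovalMerger implements Workspace.PropertyListener {
    private final Workspace main;

    ApprovalMerger(Workspace main) { this.main = main; }

    @Override
    public void propertyChanged(Workspace source, String path, String property, Object value) {
        if ("approved".equals(property) && Boolean.TRUE.equals(value)) {
            Map<String, Object> copy = new HashMap<>(source.items.get(path));
            copy.remove("approved");  // the flag is workflow state, not content
            main.items.put(path, copy);
        }
    }
}
```
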
But then there's the question of whether Jackrabbit is suitable for
production use. Presumably, if it was, there wouldn't be things like CRX,
for which you have to pay some nontrivial dollar.

Jackrabbit is suitable for production use, no doubt. (see e.g. the
users in [3])
With CRX you get a better persistence manager[4], admin tools and
commercial support.

[3] http://wiki.apache.org/jackrabbit/JcrLinks
[4] http://dev.day.com/discussion-groups/content/lists/jackrabbit-users.html?q=ocm


Cool - could be useful for experimentation. In my current master plan, i
need to drive node type definition from metadata elsewhere in the system,
though, so a programmatic interface is what i need. The JSR 283 node type
definition stuff looks perfect, and as soon as JCR 2 is widely supported,
it will be portable.

Jackrabbit 2.0 already supports a lot (all?) of JSR 283.

Yes, i'd come to this conclusion too. Attach a listener, do a merge,
convert the events into changes to the foreign system. The only trouble is
that, if i've read the spec right, events only get delivered *after* the
repository changes are committed, ie the transaction ends. That means that
if the changing of the foreign system fails, i'm in a pickle, because it's
too late to roll back the changes to the repository. I guess i could
maintain a pair of branches/workspaces for the foreign system, one of the
goal state, and one of the current state, so i can manually roll back the
JCR repo if the update fails. Or do something else - the manual merge
starts to look more attractive at this point.

You could make all your nodes versionable. For rollback you could then
restore the last version.
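
This suggestion relies on JCR versioning: each check-in records a snapshot of a versionable node, and a restore puts an earlier snapshot back. The plain-Java sketch below mimics only that check-in/restore cycle for a single item so that it runs without a repository. In JCR itself the node would carry the mix:versionable mixin and the calls would be Node.checkin() and Node.restore(...) (JCR 2.0 also offers them through VersionManager). The class and method names here are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical versionable item: checkin() snapshots the current properties,
// restoreLastVersion() discards later edits by returning to the latest snapshot.
class VersionedItem {
    final Map<String, Object> properties = new HashMap<>();
    private final Deque<Map<String, Object>> history = new ArrayDeque<>();

    void checkin() { history.push(new HashMap<>(properties)); }

    void restoreLastVersion() {
        Map<String, Object> last = history.peek();
        if (last == null) throw new IllegalStateException("no version to restore");
        properties.clear();
        properties.putAll(last);
    }

    int versionCount() { return history.size(); }
}
```

In the push scenario, one would check in after each successful push and restore the last version if the foreign system rejects the next one.
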

Aha, good point. I need to learn to think more in terms of how i can make
the nodes work for me.

One way to do this could be to create a "mixin" "mix:myWorkflowItem"
that contains the properties you need to steer the workflow state. All
your nodes would then have this mixin.
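
The mixin idea can be sketched by keeping the workflow status in an ordinary string property and checking a fixed set of allowed transitions before changing it. Everything here is hypothetical: the workflowState property name, the state names, and the rule that only in-progress items may be edited. In a repository the property would come from the workflow mixin rather than from a Java class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical workflow states and the transitions allowed between them.
enum WorkflowState {
    IN_PROGRESS, PENDING_APPROVAL, APPROVED, REJECTED;

    boolean canMoveTo(WorkflowState next) {
        switch (this) {
            case IN_PROGRESS:      return next == PENDING_APPROVAL;
            case PENDING_APPROVAL: return next == APPROVED || next == REJECTED;
            case REJECTED:         return next == IN_PROGRESS;
            default:               return false;  // APPROVED is final
        }
    }

    boolean isEditable() { return this == IN_PROGRESS; }
}

// The state is stored as a plain string property, as it would be on a node.
class WorkflowItem {
    final Map<String, Object> properties = new HashMap<>();

    WorkflowItem() { properties.put("workflowState", WorkflowState.IN_PROGRESS.name()); }

    WorkflowState state() { return WorkflowState.valueOf((String) properties.get("workflowState")); }

    void moveTo(WorkflowState next) {
        if (!state().canMoveTo(next)) {
            throw new IllegalStateException(state() + " -> " + next + " not allowed");
        }
        properties.put("workflowState", next.name());
    }

    // Content edits are refused unless the item is still in progress.
    void setContent(String property, Object value) {
        if (!state().isEditable()) throw new IllegalStateException("item is " + state());
        properties.put(property, value);
    }
}
```
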



One more thing: many JCR people hang out on the Jackrabbit users list.
You will probably get good answers there as well.

Cheers
Michael
 
Tom Anderson

Tom, sorry for the belated reply - xmas hols.

No worries! Thanks for your continued help.
Even if you use workspaces it might be beneficial to use observers. For
example a node could have a property "approved". If the value is true an
observer gets triggered that merges into the main workspace.

Ah, interesting idea. So you could commit changes piecemeal, without
having to explicitly trigger a merge.

But then there's the question of whether Jackrabbit is suitable for
production use. Presumably, if it was, there wouldn't be things like
CRX, for which you have to pay some nontrivial dollar.

Jackrabbit is suitable for production use, no doubt. (see e.g. the users
in [3]) With CRX you get a better persistence manager[4], admin tools
and commercial support.

Okay, makes sense.

Jackrabbit 2.0 already supports a lot (all?) of JSR 283.

Which is what i'm writing my code against! Although i note that Jackrabbit
puts a lot of the new JSR 283 stuff in the org.apache.jackrabbit.api.jsr283
package, which is presumably not where it will be when it's finalised; as
such, code i write now isn't compatible with real JSR 283, although the
changes required to make it so will be trivial.

You could make all your nodes versionable. For rollback you could then
restore the last version.

Aha! Of course! That makes it rather straightforward, then.

One way to do this could be to create a "mixin" "mix:myWorkflowItem"
that contains the properties you need to steer the workflow state. All
your nodes would then have this mixin.

Could do. I like the idea of workflow being orthogonal to the content of a
project - rather than nodes being in various states, the whole project is
either in progress, pending approval, approved, etc. That would require
some way to prevent it being edited. But again, this could be a question
of having multiple workspaces with different access control rules -
editors edit in their own workspace, then submit for approval by
triggering a copy into some kind of holding bin which they don't have
write access to, then approval triggers a merge into the mainline. Or
maybe it should just be an access control thing. I really just need to sit
down and actually try some of this!

One more thing: many JCR people hang out on the Jackrabbit users list.
You will probably get good answers there as well.

Okay, if i have more questions, that's where i'll ask.

Thanks again,
tom
 
