database advice

C

ccc31807

I'm building a database app (in Perl) and have a database question.

Ordinarily, when you have a many to many relationship, such as patient/
diagnosis, you deal with it by having a conversion table, where (for
example) patients are not unique, diagnoses are not unique, but a
patient/diagnosis combination is unique.

I my app, I have this situation, but this approach seems to cumbersome
and I quite frankly think it will take too much of an effort for the
little bit of utility that I need, because I don't need to access
diagnoses, just display them, and only display them on infrequent
occasions. I've been using Postgres and have been spoiled by the array
datatype that PG has. If I were using PG, I'd use the array datatype.
(One problem is that the diagnoses are free form and it's just too
difficult to shoe-horn a multiplicity of diagnoses into a manageable
number.)

My client insists on using MySQL. I'm about to create a table, which
has a column 'diagnoses' and a datatype of text. I will append values
to this colum in a CSV format, such as (e.g.) "broken finger|headache|
sprained ankle|influenza". I can then access this cell as a string,
split on the pipe, and manipulate the resulting list as an array.

Has anyone done this? Does it work? What are the disadvantages?
Comments in general?

Thanks, CC.
 
B

Bigus

My client insists on using MySQL. I'm about to create a table, which
has a column 'diagnoses' and a datatype of text. I will append values
to this colum in a CSV format, such as (e.g.) "broken finger|headache|
sprained ankle|influenza". I can then access this cell as a string,
split on the pipe, and manipulate the resulting list as an array.

My only comment is an alternative suggestion to the above bit. That is,
rather than storing as a string and splitting up etc, you could freeze the
array and store in a binary blob.

e.g. something like this:

use Storable qw(freeze thaw);
my @array = (1,2,3,4);
my $frozenarray = freeze(@array);
[code to store $frozenarray in the blob field]

then after you've retrieved the frozen array from the database and back into
$frozenarray

my $thawedarrayref = thaw($frozenarray);
my @array = @$thawedarrayref;

at least that way you are preserving your array structure and I guess a blob
is more space efficient than a text field?

Regards
Bigus
 
K

Klaus

I'm building a database app (in Perl) and ...

[ snip non-perl related stuff ]
... I will append values
to this colum in a CSV format, such as (e.g.) "broken finger|headache|
sprained ankle|influenza". I can then access this cell as a string,
split on the pipe, and manipulate the resulting list as an array.

Has anyone done this? Does it work? What are the disadvantages?
Comments in general?

perldoc -f split
 
S

smallpond

I'm building a database app (in Perl) and have a database question.

Ordinarily, when you have a many to many relationship, such as patient/
diagnosis, you deal with it by having a conversion table, where (for
example) patients are not unique, diagnoses are not unique, but a
patient/diagnosis combination is unique.

Wow. A patient can never get the same injury or illness
twice? That's news to me. You might want to ask a few
folks with chronic illnesses about that. I know people
who have been in the emergency room twice in the same
day for the same thing.

Why do you care if it's unique? Surely you aren't making
the beginner mistake of not creating unique keys for your
DB rows are you?


My client insists on using MySQL. I'm about to create a table, which
has a column 'diagnoses' and a datatype of text. I will append values
to this colum in a CSV format, such as (e.g.) "broken finger|headache|
sprained ankle|influenza". I can then access this cell as a string,
split on the pipe, and manipulate the resulting list as an array.

Has anyone done this? Does it work? What are the disadvantages?
Comments in general?

Thanks, CC.


What happens when they enter "sprained ankle"?
Is that different from "sprained ankle"?
(the first one has two spaces)

Do you ever expect to select on diagnosis?
fuhgeddaboutit with this scheme.
Searching every record for a matching text string
is much slower than selecting just records with a
matching field.

How about three tables:
Diagnoses - row created every time a new value is used
Patient - created by whatever form creates a patient
Ailments - row created for each patient/diagnosis event

The poor guy in your example would get one Patients record
and 4 Ailments rows linked to existing or newly
created Diagnoses. The key for the Diagnoses would be the
standard insurance company billing codes. They don't
want to see "sprained ankle" they want something
like "3772".
 
C

ccc31807

It sounds trivial - I would simply spend the effort to "do it right"
(which in this case means the three tables you mention)
rather than expend effort in trying to find a shortcut.

I wish I could show you the data. The diagnoses are sometimes codes,
but more often transcribed from the physician notes. There possibly
could be more(!) discrete, individual diagnoses than patients!

In this case, IMO, 'doing it right' means doing it simply and cheaply
rather than following the rules.

Thoughts? Thanks for your input. CC
 
C

ccc31807

Wow.  A patient can never get the same injury or illness
twice?  That's news to me.  You might want to ask a few
folks with chronic illnesses about that.  I know people
who have been in the emergency room twice in the same
day for the same thing.

Actually, these are 'accounts.' The unique datum is the account
balance, and that's tied to an an admission rather than a diagnosis or
an admit date. In fact, sometimes the data comes in with ALL(!) of the
data identical except for a diagnosis, so that the patient name,
hospital name, hospital account ID, etc., occupy multiple rows in the
data file, the only difference being the diagnosis.
Why do you care if it's unique?  Surely you aren't making
the beginner mistake of not creating unique keys for your
DB rows are you?

I very frequently create tables without unique keys in my job, and I
don't consider that a beginner mistake. If you're processing data and
using a database to help, it doesn't matter if each row is distinct,
frex, if you're counting values, you expect many of the rows to be
identical so you can count them. Like anything else, how you handle
the data depends on what you want to do with it.

What happens when they enter "sprained  ankle"?
Is that different from "sprained ankle"?
(the first one has two spaces)

Doesn't matter. The diagnosis is merely informational, and very rarely
accessed. I don't programatically manipulate this data, just display
it when requested. Actually, the app would probably work better
without the diagnosis, but the client wants it available to look at.
Do you ever expect to select on diagnosis?
Never.

fuhgeddaboutit with this scheme.
Searching every record for a matching text string
is much slower than selecting just records with a
matching field.

As I said, they just want to look at it. I read records into a
hashref, so I can do something like this: $hashref->{$acctno}
{$diagnosis}.
How about three tables:
  Diagnoses - row created every time a new value is used
  Patient - created by whatever form creates a patient
  Ailments - row created for each patient/diagnosis event

Yeah, this is the textbook solution. But it's much too much effort for
the reward, since I would never actually need to query the Ailments
table for anything -- except if I implemented the textbook solution.
As I said, I have used the PG array datatype to 'violate' first normal
form, such as by having columns like 'office hours' or 'academic
degrees.' Being able to violate first normal form can be quite helpful
on those occasions when you need to.
The poor guy in your example would get one Patients record
and 4 Ailments rows linked to existing or newly
created Diagnoses.  The key for the Diagnoses would be the
standard insurance company billing codes.  They don't
want to see "sprained ankle" they want something
like "3772".

Not a requirement for my app. The central component is something
called the 'original balance' which is tied to a particular
individual, hospital, and hospital account number. The diagnosis or
diagnoses are irrelevant to the purpose of the app, except they want
the info in the database.

I'm just being lazy and hubristic. ;-)

CC
 
X

Xho Jingleheimerschmidt

ccc31807 said:
Doesn't matter. The diagnosis is merely informational, and very rarely
accessed. I don't programatically manipulate this data, just display
it when requested.

How did the data get there in the first place, if it never undergoes
anything other than retrieval and display?

Xho
 
C

ccc31807

How did the data get there in the first place, if it never undergoes
anything other than retrieval and display?

Xho

Understand that this is client driven. The diagnoses come from the
client's client, and the client's client requires that the diagnoses
be retained as part of the data. To me as the builder of the app, the
diagnoses are irrelevant. To my client, the customer, they are only
marginally useful. To the client's client, they form an essential part
of a patient record.

Anyway, I've made the decision. I'm concating all the diagnosis data
into a database column of type text, and I'm not even going to worry
about a nice display. The only concession I'm going to make is to
provide a key word search function, which I can easily do with like:
$search = said:
execute($search);

CC
 
K

Klaus

Anyway, I've made the decision. I'm concating all the diagnosis data
into a database column of type text, and I'm not even going to worry
about a nice display. The only concession I'm going to make is to
provide a key word search function, which I can easily do with like:

I don't think that the above SQL is syntactically correct.

Anyway, with Perl 5.10 you could, for example, do the following:

use strict;
use warnings;
use 5.010;
....
my $search = <>;

$sth = $dbh->prepare("SELECT diagnosis FROM account");
$sth->execute();

while (my ($diag) = $sth->fetchrow_array) {
if ($search ~~ [split /\|/, $diag]) {
say "Found $diag";
}
}
 
S

smallpond

Anyway, I've made the decision. I'm concating all the diagnosis data
into a database column of type text, and I'm not even going to worry
about a nice display. The only concession I'm going to make is to
provide a key word search function, which I can easily do with like:
$search = <>;
$sth->prepare("select * from account with diagnosis like %?%"); $sth-

I don't think that the above SQL is syntactically correct.

Anyway, with Perl 5.10 you could, for example, do the following:

use strict;
use warnings;
use 5.010;
...
my $search = <>;

$sth = $dbh->prepare("SELECT diagnosis FROM account");
$sth->execute();

while (my ($diag) = $sth->fetchrow_array) {
    if ($search ~~ [split /\|/, $diag]) {
        say "Found $diag";
    }

}

That doesn't give you the rest of the row, just
whether any rows match.
It also fetches the whole column and matches in perl,
which is slower than doing it in MySQL:

SELECT * FROM account WHERE MATCH (diagnosis)
AGAINST ("+$search" IN BOOLEAN MODE);

would select the matching rows.
 
M

Michael Austin

ccc31807 wrote:
My client insists on using MySQL. I'm about to create a table, which
has a column 'diagnoses' and a datatype of text. I will append values

Hopefully you are following all of the precautions prescribed in your
country's (not evident by your id...) laws pertaining to the storing of
Personal Identifiable Information, Medical Information (in the US, that
would be HIPPA) and the encryption of sensitive data. Hopefully you are
also building security into your app and not trying to bolt it on as an
after-thought. That could be disastrous for both you and your client
should something go wrong and you get hacked.

Not sure I trust MySQL in its current state to provide the security
necessary...
 
C

ccc31807

Hopefully you are following all of the precautions prescribed in your
country's (not evident by your id...)  laws pertaining to the storing of
Personal Identifiable Information, Medical Information (in the US, that
would be HIPPA) and the encryption of sensitive data.  Hopefully you are
also building security into your app and not trying to bolt it on as an
after-thought.  That could be disastrous for both you and your client
should something go wrong and you get hacked.

Thanks for this. The first thing I did was build the security
apparatus. However, the app belongs to the client, and he is the one
who giveth and taketh access. I didn't think to put a HIPPA clause in
my contract, but I'm going to remedy that defect Monday.
Not sure I trust MySQL in its current state to provide the security
necessary...

Agree totally. I think the PG security model is much better that
MySQL, but still, a server is only secure as the sys admin makes it,
and all I've been hired to do is build the app, not administer the
server.

CC
 
C

Charlton Wilbur

cc> In this case, IMO, 'doing it right' means doing it simply and
cc> cheaply rather than following the rules.

cc> Thoughts? Thanks for your input. CC

We've had this discussion before. Once again, you're choosing to do it
using the quick and dirty method -- and in this case, doing it right
isn't even that much more effort -- and I'd bet money that once again,
in six months or a year, you will be in here whining about how your
managers don't understand why your code base is so crappy and hard to
maintain.

Do it right. The amount of time doing it right will add to this project
is measurable in *hours*.

Charlton
 
K

Klaus

Quoth Klaus said:
Anyway, with Perl 5.10 you could, for example, do the following:
use strict;
use warnings;
use 5.010;
...
my $search = <>;
$sth = $dbh->prepare("SELECT diagnosis FROM account");
$sth->execute();
while (my ($diag) = $sth->fetchrow_array) {
    if ($search ~~ [split /\|/, $diag]) {

I would be a little wary of using smartmatch (~~) until 5.10.1 is out.
p5p have decided some of the match semantics were wrong, so they're
changing in an incompatible way.

Ok, until 5.10.1 is out, the "if ($search ~~ [split /\|/, $diag])"
would then be

if (grep {$search eq $_} split /\|/, $diag) {

However, I keep the say command, even under 5.10.0
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,213
Messages
2,571,108
Members
47,700
Latest member
Naveed baloch

Latest Threads

Top