Some questions on Ant

subhabangalore

> You are not stuck. You have succeeded. Everything is working as it should.
>
> Kind regards,
> Joerg
>
> --
> I do not read my emails, so replies sent by email will unfortunately go unread.

Dear Group,

Thanks, but I am unable to run commands such as "bin/mallet train-classifier --input data.mallet...".

Regards,
Subhabrata.
 
Joerg Meier

> Thanks. But I am unable to do these commands like "bin/mallet train-classifier --input data.mallet...".

Why? Does your keyboard not work? If you get errors, you should consider
sharing them. From what you posted, everything looks fine and in working
order.

Kind regards,
Joerg
 
subhabangalore

> Why? Does your keyboard not work? If you get errors, you should consider
> sharing them. From what you posted, everything looks fine and in working
> order.
>
> Kind regards,
> Joerg

Dear Group,

As you suggested, everything was fine up to that point. Now I want to test some data, either built-in or user-defined. I carried on but got stuck, as shown here:

Directory of C:\Users\subhabrata\Documents\mallet-2.0.7\bin

05/09/2013 03:00 AM <DIR> .
05/09/2013 03:00 AM <DIR> ..
09/02/2011 12:50 PM 635 classifier2info
09/02/2011 12:50 PM 632 csv2classify
09/02/2011 12:50 PM 631 csv2vectors
09/02/2011 12:50 PM 2,347 mallet
09/02/2011 12:50 PM 2,471 mallet.bat
09/02/2011 12:50 PM 1,771 mallethon
09/02/2011 12:50 PM 63 prepend-license.sh
09/02/2011 12:50 PM 636 svmlight2vectors
09/02/2011 12:50 PM 633 text2classify
09/02/2011 12:50 PM 632 text2vectors
09/02/2011 12:50 PM 636 vectors2classify
09/02/2011 12:50 PM 632 vectors2info
09/02/2011 12:50 PM 631 vectors2topics
09/02/2011 12:50 PM 635 vectors2vectors
14 File(s) 12,985 bytes
2 Dir(s) 413,130,571,776 bytes free

C:\Users\subhabrata\Documents\mallet-2.0.7\bin>mallet
Mallet 2.0 commands:
  import-dir        load the contents of a directory into mallet instances (one per file)
  import-file       load a single file into mallet instances (one per line)
  import-svmlight   load a single SVMLight format data file into mallet instances (one per line)
  train-classifier  train a classifier from Mallet data files
  train-topics      train a topic model from Mallet data files
  infer-topics      use a trained topic model to infer topics for new documents
  estimate-topics   estimate the probability of new documents given a trained model
  hlda              train a topic model using Hierarchical LDA
  prune             remove features based on frequency or information gain
  split             divide data into testing, training, and validation portions
Include --help with any option for more information

C:\Users\subhabrata\Documents\mallet-2.0.7\bin>mallet train-classifier
java.io.FileNotFoundException: text.vectors (The system cannot find the file specified)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(Unknown Source)
        at cc.mallet.types.InstanceList.load(InstanceList.java:787)
        at cc.mallet.classify.tui.Vectors2Classify.main(Vectors2Classify.java:259)
Exception in thread "main" java.lang.IllegalArgumentException: Couldn't read InstanceList from file text.vectors
        at cc.mallet.types.InstanceList.load(InstanceList.java:794)
        at cc.mallet.classify.tui.Vectors2Classify.main(Vectors2Classify.java:259)
C:\Users\subhabrata\Documents\mallet-2.0.7\bin>

Regards,
Subhabrata.
 
JLP

On 16/05/2013 10:33, (e-mail address removed) wrote:
> C:\Users\subhabrata\Documents\mallet-2.0.7\bin>mallet train-classifier
> java.io.FileNotFoundException: text.vectors (The system cannot find the file specified)
>         at java.io.FileInputStream.open(Native Method)
>         at java.io.FileInputStream.<init>(Unknown Source)
>         at cc.mallet.types.InstanceList.load(InstanceList.java:787)
>         at cc.mallet.classify.tui.Vectors2Classify.main(Vectors2Classify.java:259)
> Exception in thread "main" java.lang.IllegalArgumentException: Couldn't read InstanceList from file text.vectors
>         at cc.mallet.types.InstanceList.load(InstanceList.java:794)
>         at cc.mallet.classify.tui.Vectors2Classify.main(Vectors2Classify.java:259)
> C:\Users\subhabrata\Documents\mallet-2.0.7\bin>
>
> Regards, Subhabrata.

The message is clear: the "mallet" software needs a file, text.vectors,
which is missing, so it fails.
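
As a minimal sketch of that point (not part of MALLET), a small Python pre-flight check can confirm that the input file exists before the tool is launched. The helper name `build_train_classifier_command` is hypothetical; `--input` and `--output-classifier` are the options that `mallet train-classifier --help` lists elsewhere in this thread. The function only assembles the argument list and never runs MALLET.

```python
import os
import tempfile


def build_train_classifier_command(input_file, output_classifier=None, mallet="mallet"):
    """Return the argument list for `mallet train-classifier`.

    Hypothetical helper: raises FileNotFoundError when the input file is
    missing instead of letting the tool fail with a stack trace.
    """
    if not os.path.isfile(input_file):
        raise FileNotFoundError(
            f"{input_file} does not exist; create it first with an import command"
        )
    command = [mallet, "train-classifier", "--input", input_file]
    if output_classifier is not None:
        command += ["--output-classifier", output_classifier]
    return command


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        # text.vectors is the default input name and is absent in this empty folder.
        missing = os.path.join(folder, "text.vectors")
        try:
            build_train_classifier_command(missing)
        except FileNotFoundError as error:
            print("Cannot start:", error)
```

The returned list could be handed to `subprocess.run` once the file really exists; that step is left out here so the sketch runs without MALLET installed.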
 
subhabangalore

> The message is clear: the "mallet" software needs a file, text.vectors,
> which is missing, so it fails.

Dear Group,

Thank you for pointing that out. Could you kindly suggest the command I have to give? Sorry to ask such a silly question.

Regards,
Subhabrata.
 
Joerg Meier

> Thank you for pointing out. But if you can kindly suggest the command I have to give. Sorry to ask this silly question.

Why don't you read what you yourself pasted?

> Include --help with any option for more information

So why didn't you try --help with your train-classifier?

Your total lack of effort is quickly becoming tiresome and frustrating to
those of us trying to help. Do you really need someone to hold your hand
and explain "Include --help with any option for more information" to you?

For what it's worth, your next step will require actual data files
compatible with mallet, as the page that you have linked to plenty of times
now clearly shows. It even gives example commands. You likely cannot continue
at all without getting or generating those mallet data files.
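
To make the point about data files concrete, here is a minimal Python sketch that writes a plain-text file with one instance per line, which `import-file` (described in the command list as loading "one per line") could then convert into a mallet data file. The three-column layout (name, label, text) is assumed here to match MALLET's default input format; treat that as an assumption and check `mallet import-file --help`. The toy documents and the helper names `write_instances` and `read_labels` are hypothetical.

```python
import os
import tempfile

# Hypothetical toy documents: (instance name, label, text).
DOCUMENTS = [
    ("doc1", "sports", "the team won the match in the final over"),
    ("doc2", "politics", "parliament passed the bill after a long debate"),
    ("doc3", "sports", "the striker scored twice in the second half"),
]


def write_instances(path, documents):
    """Write one instance per line as: name, label, text (space separated)."""
    with open(path, "w", encoding="utf-8") as handle:
        for name, label, text in documents:
            # Names and labels must not contain whitespace, or the columns shift.
            if any(ch.isspace() for ch in name + label):
                raise ValueError(f"whitespace in name or label: {name!r} {label!r}")
            # Collapse line breaks so each document stays on a single line.
            handle.write(f"{name} {label} {' '.join(text.split())}\n")


def read_labels(path):
    """Return the second whitespace-separated field of every non-empty line."""
    with open(path, encoding="utf-8") as handle:
        return [line.split(None, 2)[1] for line in handle if line.strip()]


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "mydata.txt")
    write_instances(target, DOCUMENTS)
    print(read_labels(target))
```

Reading the labels back is a cheap way to see what the second column really contains before converting the file.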

Kind regards,
Joerg
 
subhabangalore

> Why don't you read what you yourself pasted?
>
> So why didn't you try --help with your train-classifier?
>
> Your total lack of effort is quickly becoming tiresome and frustrating to
> those of us trying to help. Do you really need someone to hold your hand
> and explain "Include --help with any option for more information" to you?
>
> For what it's worth, your next step will require actual data files
> compatible with mallet, as the page that you have linked to plenty of times
> now clearly shows. It even gives example commands. You likely cannot continue
> at all without getting or generating those mallet data files.
>
> Kind regards,
> Joerg

Dear Group,

I understand that you are upset, but I did try and could not get it to work. I will try a bit more and post my efforts tonight. It would be nice if things work out. Let us see.

Regards,
Subhabrata.
 
subhabangalore

Dear Group,

The only progress I could make was to write one file, "mydata.txt", and convert it into mallet format as "output.mallet", as can be seen in REPORT NO.2.

My attempts are copied and pasted below as REPORT NO.1 and REPORT NO.2, since I could not attach them. Apologies if they look like "junk". The only other change you may notice is that I changed the path of Mallet, as I was trying to follow a tutorial I found (http://programminghistorian.org/lessons/topic-modeling-and-mallet).

REPORT NO.1:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\subhabrata>cd\

C:\>cd mallet

C:\mallet>
C:\mallet>ant
Buildfile: C:\mallet\build.xml

init:
[copy] Copying 1 file to C:\mallet\class

compile:
[javac] C:\mallet\build.xml:60: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

BUILD SUCCESSFUL
Total time: 6 seconds

C:\mallet>ant jar
Buildfile: C:\mallet\build.xml

init:
[copy] Copying 1 file to C:\mallet\class

compile:
[javac] C:\mallet\build.xml:60: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

jar:
[jar] Building jar: C:\mallet\dist\mallet.jar

BUILD SUCCESSFUL
Total time: 1 second

C:\mallet>dir
Volume in drive C is Acer
Volume Serial Number is 7A35-B119

Directory of C:\mallet

05/17/2013 01:02 AM <DIR> .
05/17/2013 01:02 AM <DIR> ..
05/17/2013 01:02 AM <DIR> bin
09/02/2011 12:50 PM 2,897 build.xml
05/17/2013 01:02 AM <DIR> class
05/17/2013 01:02 AM <DIR> dist
05/17/2013 01:02 AM <DIR> lib
09/02/2011 12:50 PM 11,918 LICENSE
09/02/2011 12:50 PM 3,566 Makefile
09/02/2011 12:50 PM 2,519 pom.xml
05/17/2013 01:02 AM <DIR> sample-data
05/17/2013 01:02 AM <DIR> src
05/17/2013 01:02 AM <DIR> stoplists
09/02/2011 12:50 PM <DIR> test
4 File(s) 20,900 bytes
10 Dir(s) 415,472,402,432 bytes free

C:\mallet>cd bin

C:\mallet\bin>dir
Volume in drive C is Acer
Volume Serial Number is 7A35-B119

Directory of C:\mallet\bin

05/17/2013 01:02 AM <DIR> .
05/17/2013 01:02 AM <DIR> ..
09/02/2011 12:50 PM 635 classifier2info
09/02/2011 12:50 PM 632 csv2classify
09/02/2011 12:50 PM 631 csv2vectors
09/02/2011 12:50 PM 2,347 mallet
09/02/2011 12:50 PM 2,471 mallet.bat
09/02/2011 12:50 PM 1,771 mallethon
09/02/2011 12:50 PM 63 prepend-license.sh
09/02/2011 12:50 PM 636 svmlight2vectors
09/02/2011 12:50 PM 633 text2classify
09/02/2011 12:50 PM 632 text2vectors
09/02/2011 12:50 PM 636 vectors2classify
09/02/2011 12:50 PM 632 vectors2info
09/02/2011 12:50 PM 631 vectors2topics
09/02/2011 12:50 PM 635 vectors2vectors
14 File(s) 12,985 bytes
2 Dir(s) 415,471,616,000 bytes free

C:\mallet\bin>mallet import-dir --input pathway\to\the\directory\with\the\files --output tutorial.mallet --keep-sequence --remove-stopwords
Labels =
   pathway\to\the\directory\with\the\files
Exception in thread "main" java.lang.IllegalArgumentException: C:\mallet\bin\pathway\to\the\directory\with\the\files is not a directory.
        at cc.mallet.pipe.iterator.FileIterator.<init>(FileIterator.java:108)
        at cc.mallet.pipe.iterator.FileIterator.<init>(FileIterator.java:145)
        at cc.mallet.classify.tui.Text2Vectors.main(Text2Vectors.java:312)
C:\mallet\bin>svmlight2vectors
'svmlight2vectors' is not recognized as an internal or external command,
operable program or batch file.

C:\mallet\bin>mallet
Mallet 2.0 commands:
  import-dir        load the contents of a directory into mallet instances (one per file)
  import-file       load a single file into mallet instances (one per line)
  import-svmlight   load a single SVMLight format data file into mallet instances (one per line)
  train-classifier  train a classifier from Mallet data files
  train-topics      train a topic model from Mallet data files
  infer-topics      use a trained topic model to infer topics for new documents
  estimate-topics   estimate the probability of new documents given a trained model
  hlda              train a topic model using Hierarchical LDA
  prune             remove features based on frequency or information gain
  split             divide data into testing, training, and validation portions
Include --help with any option for more information

C:\mallet\bin>mallet train-classifier --help
A tool for training, saving and printing diagnostics from a classifier on vectors.
--help TRUE|FALSE
  Print this command line option usage information. Give argument of TRUE for longer documentation
  Default is false
--prefix-code 'JAVA CODE'
  Java code you want run before any other interpreted code. Note that the text is interpreted without modification, so unlike some other Java code options, you need to include any necessary 'new's when creating objects.
  Default is null
--config FILE
  Read command option values from a file
  Default is null
--report [train|test|validation]:[accuracy|f1:label|confusion|raw]
  Default is test:accuracy test:confusion train:accuracy
--trainer ClassifierTrainer constructor
  Java code for the constructor used to create a ClassifierTrainer. If no '(' appears, then "new " will be prepended and "Trainer()" will be appended. You may use this option mutiple times to compare multiple classifiers.
  Default is new NaiveBayesTrainer()
--output-classifier FILENAME
  The filename in which to write the classifier after it has been trained.
  Default is classifier.mallet
--input FILENAME
  The filename from which to read the list of training instances. Use - for stdin.
  Default is text.vectors
--training-file FILENAME
  Read the training set instance list from this file. If this is specified, the input file parameter is ignored
  Default is text.vectors
--testing-file FILENAME
  Read the test set instance list to this file. If this option is specified, the training-file parameter must be specified and the input-file parameter is ignored
  Default is text.vectors
--validation-file FILENAME
  Read the validation set instance list to this file. If this option is specified, the training-file parameter must be specified and the input-file parameter is ignored
  Default is text.vectors
--training-portion DECIMAL
  The fraction of the instances that should be used for training.
  Default is 1.0
--validation-portion DECIMAL
  The fraction of the instances that should be used for validation.
  Default is 0.0
--unlabeled-portion DECIMAL
  The fraction of the training instances that should have their labels hidden. Note that these are taken out of the training-portion, not allocated separately.
  Default is 0.0
--random-seed INTEGER
  The random seed for randomly selecting a proportion of the instance list for training
  Default is 0
--num-trials INTEGER
  The number of random train/test splits to perform
  Default is 1
--classifier-evaluator CONSTRUCTOR
  Java code for constructing a ClassifierEvaluating object
  Default is null
--verbosity INTEGER
  The level of messages to print: 0 is silent, 8 is most verbose. Levels 0-8 correspond to the java.logger predefined levels off, severe, warning, info, config, fine, finer, finest, all. The default value is taken from the mallet logging.properties file, which currently defaults to INFO level (3)
  Default is -1
--noOverwriteProgressMessages true|false
  Suppress writing-in-place on terminal for progess messages - repetitive messages of which only the latest is generally of interest
  Default is false
--cross-validation INT
  The number of folds for cross-validation (DEFAULT=0).
  Default is 0
C:\mallet\bin>mallet train-classifier --training-file text.vectors
java.io.FileNotFoundException: text.vectors (The system cannot find the file specified)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(Unknown Source)
        at cc.mallet.types.InstanceList.load(InstanceList.java:787)
        at cc.mallet.classify.tui.Vectors2Classify.main(Vectors2Classify.java:262)
Exception in thread "main" java.lang.IllegalArgumentException: Couldn't read InstanceList from file text.vectors
        at cc.mallet.types.InstanceList.load(InstanceList.java:794)
        at cc.mallet.classify.tui.Vectors2Classify.main(Vectors2Classify.java:262)
C:\mallet\bin>

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
REPORT NO.2:
Exception in thread "main" java.lang.IllegalArgumentException: Unrecognized option 2: --keep-sequence
        at cc.mallet.util.CommandOption$List.process(CommandOption.java:344)
        at cc.mallet.util.CommandOption.process(CommandOption.java:146)
        at cc.mallet.topics.tui.Vectors2Topics.main(Vectors2Topics.java:200)
C:\mallet\bin>dir
Volume in drive C is Acer
Volume Serial Number is 7A35-B119

Directory of C:\mallet\bin

05/17/2013 01:20 AM <DIR> .
05/17/2013 01:20 AM <DIR> ..
09/02/2011 12:50 PM 635 classifier2info
09/02/2011 12:50 PM 632 csv2classify
09/02/2011 12:50 PM 631 csv2vectors
09/02/2011 12:50 PM 2,347 mallet
09/02/2011 12:50 PM 2,471 mallet.bat
09/02/2011 12:50 PM 1,771 mallethon
05/17/2013 01:19 AM 85 mydata.txt
05/17/2013 01:20 AM 7,379 output.mallet
09/02/2011 12:50 PM 63 prepend-license.sh
09/02/2011 12:50 PM 636 svmlight2vectors
09/02/2011 12:50 PM 633 text2classify
09/02/2011 12:50 PM 632 text2vectors
09/02/2011 12:50 PM 636 vectors2classify
09/02/2011 12:50 PM 632 vectors2info
09/02/2011 12:50 PM 631 vectors2topics
09/02/2011 12:50 PM 635 vectors2vectors
16 File(s) 20,449 bytes
2 Dir(s) 415,176,773,632 bytes free

C:\mallet\bin>text2vectors --input output.mallet --output x1.vectors
'text2vectors' is not recognized as an internal or external command,
operable program or batch file.

C:\mallet\bin>text2vectors
'text2vectors' is not recognized as an internal or external command,
operable program or batch file.

C:\mallet\bin>mallet
Mallet 2.0 commands:
  import-dir        load the contents of a directory into mallet instances (one per file)
  import-file       load a single file into mallet instances (one per line)
  import-svmlight   load a single SVMLight format data file into mallet instances (one per line)
  train-classifier  train a classifier from Mallet data files
  train-topics      train a topic model from Mallet data files
  infer-topics      use a trained topic model to infer topics for new documents
  estimate-topics   estimate the probability of new documents given a trained model
  hlda              train a topic model using Hierarchical LDA
  prune             remove features based on frequency or information gain
  split             divide data into testing, training, and validation portions
Include --help with any option for more information

C:\mallet\bin>mallet train-topics --input output.mallet --output x1.mallet
A tool for estimating, saving and printing diagnostics for topic models, such as LDA.
--help TRUE|FALSE
  Print this command line option usage information. Give argument of TRUE for longer documentation
  Default is false
--prefix-code 'JAVA CODE'
  Java code you want run before any other interpreted code. Note that the text is interpreted without modification, so unlike some other Java code options, you need to include any necessary 'new's when creating objects.
  Default is null
--config FILE
  Read command option values from a file
  Default is null
--input FILENAME
  The filename from which to read the list of training instances. Use - for stdin. The instances must be FeatureSequence or FeatureSequenceWithBigrams, not FeatureVector
  Default is null
--language-inputs FILENAME [FILENAME ...]
  Filenames for polylingual topic model. Each language should have its own file, with the same number of instances in each file. If a document is missing in one language, there should be an empty instance.
  Default is (null)
--testing FILENAME
  The filename from which to read the list of instances for empirical likelihood calculation. Use - for stdin. The instances must be FeatureSequence or FeatureSequenceWithBigrams, not FeatureVector
  Default is null
--output-model FILENAME
  The filename in which to write the binary topic model at the end of the iterations. By default this is null, indicating that no file will be written.
  Default is null
--input-model FILENAME
  The filename from which to read the binary topic model to which the --input will be appended, allowing incremental training. By default this is null, indicating that no file will be read.
  Default is null
--inferencer-filename FILENAME
  A topic inferencer applies a previously trained topic model to new documents. By default this is null, indicating that no file will be written.
  Default is null
--evaluator-filename FILENAME
  A held-out likelihood evaluator for new documents. By default this is null, indicating that no file will be written.
  Default is null
--output-state FILENAME
  The filename in which to write the Gibbs sampling state after at the end of the iterations. By default this is null, indicating that no file will be written.
  Default is null
--output-topic-keys FILENAME
  The filename in which to write the top words for each topic and any Dirichlet parameters. By default this is null, indicating that no file will be written.
  Default is null
--topic-word-weights-file FILENAME
  The filename in which to write unnormalized weights for every topic and word type. By default this is null, indicating that no file will be written.
  Default is null
--word-topic-counts-file FILENAME
  The filename in which to write a sparse representation of topic-word assignments. By default this is null, indicating that no file will be written.
  Default is null
--xml-topic-report FILENAME
  The filename in which to write the top words for each topic and any Dirichlet parameters in XML format. By default this is null, indicating that no file will be written.
  Default is null
--xml-topic-phrase-report FILENAME
  The filename in which to write the top words and phrases for each topic and any Dirichlet parameters in XML format. By default this is null, indicating that no file will be written.
  Default is null
--output-doc-topics FILENAME
  The filename in which to write the topic proportions per document, at the end of the iterations. By default this is null, indicating that no file will be written.
  Default is null
--doc-topics-threshold DECIMAL
  When writing topic proportions per document with --output-doc-topics, do not print topics with proportions less than this threshold value.
  Default is 0.0
--doc-topics-max INTEGER
  When writing topic proportions per document with --output-doc-topics, do not print more than INTEGER number of topics. A negative value indicates that all topics should be printed.
  Default is -1
--num-topics INTEGER
  The number of topics to fit.
  Default is 10
--num-threads INTEGER
  The number of threads for parallel training.
  Default is 1
--num-iterations INTEGER
  The number of iterations of Gibbs sampling.
  Default is 1000
--random-seed INTEGER
  The random seed for the Gibbs sampler. Default is 0, which will use the clock.
  Default is 0
--num-top-words INTEGER
  The number of most probable words to print for each topic after model estimation.
  Default is 20
--show-topics-interval INTEGER
  The number of iterations between printing a brief summary of the topics so far.
  Default is 50
--output-model-interval INTEGER
  The number of iterations between writing the model (and its Gibbs sampling state) to a binary file. You must also set the --output-model to use this option, whose argument will be the prefix of the filenames.
  Default is 0
--output-state-interval INTEGER
  The number of iterations between writing the sampling state to a text file. You must also set the --output-state to use this option, whose argument will be the prefix of the filenames.
  Default is 0
--optimize-interval INTEGER
  The number of iterations between reestimating dirichlet hyperparameters.
  Default is 0
--optimize-burn-in INTEGER
  The number of iterations to run before first estimating dirichlet hyperparameters.
  Default is 200
--use-symmetric-alpha true|false
  Only optimize the concentration parameter of the prior over document-topic distributions. This may reduce the number of very small, poorly estimated topics, but may disperse common words over several topics.
  Default is false
--use-ngrams true|false
  Rather than using LDA, use Topical-N-Grams, which models phrases.
  Default is false
--use-pam true|false
  Rather than using LDA, use Pachinko Allocation Model, which models topical correlations. You cannot do this and also --use-ngrams.
  Default is false
--alpha DECIMAL
  Alpha parameter: smoothing over topic distribution.
  Default is 50.0
--beta DECIMAL
  Beta parameter: smoothing over unigram distribution.
  Default is 0.01
--gamma DECIMAL
  Gamma parameter: smoothing over bigram distribution
  Default is 0.01
--delta DECIMAL
  Delta parameter: smoothing over choice of unigram/bigram
  Default is 0.03
--delta1 DECIMAL
  Topic N-gram smoothing parameter
  Default is 0.2
--delta2 DECIMAL
  Topic N-gram smoothing parameter
  Default is 1000.0
--pam-num-supertopics INTEGER
  When using the Pachinko Allocation Model (PAM) set the number of supertopics. Typically this is about half the number of subtopics, although more may help.
  Default is 10
--pam-num-subtopics INTEGER
  When using the Pachinko Allocation Model (PAM) set the number of subtopics.
  Default is 20
Exception in thread "main" java.lang.IllegalArgumentException: Unrecognized option 2: --output
        at cc.mallet.util.CommandOption$List.process(CommandOption.java:344)
        at cc.mallet.util.CommandOption.process(CommandOption.java:146)
        at cc.mallet.topics.tui.Vectors2Topics.main(Vectors2Topics.java:200)
C:\mallet\bin>mallet --input output.mallet --trainer NaiveBayes
Mallet 2.0 commands:
  import-dir        load the contents of a directory into mallet instances (one per file)
  import-file       load a single file into mallet instances (one per line)
  import-svmlight   load a single SVMLight format data file into mallet instances (one per line)
  train-classifier  train a classifier from Mallet data files
  train-topics      train a topic model from Mallet data files
  infer-topics      use a trained topic model to infer topics for new documents
  estimate-topics   estimate the probability of new documents given a trained model
  hlda              train a topic model using Hierarchical LDA
  prune             remove features based on frequency or information gain
  split             divide data into testing, training, and validation portions
Include --help with any option for more information

C:\mallet\bin>

Regards,
Subhabrata.
 
subhabangalore

> Dear Room,
>
> I was trying to learn Apache Ant.
>
> I got the following information:
>
> i) Ant is a build tool; it helps to create a .exe file.
> ii) Compiling is a subtask of building.
>
> Now I am confused by a few questions.
>
> I was exploring the "Ant build" in Eclipse. I could create a "build.xml" and run it successfully.
>
> The questions are:
>
> i) Do I have to write the "build.xml" (or another build file) every time I want to build a project? Can't the .xml file be generated automatically?
>
> ii) After the "build.xml" gives a report like
>
> Hello:
>      [echo] Hello
> BUILD SUCCESSFUL
> Total time: 477 milliseconds
>
> where can I find the .exe file? And how should I use it?
>
> Could any of the learned members kindly suggest?
>
> Regards,
> Subhabrata.

Dear Group,

After a lot of experiments and web surfing, I could get the results.

C:\mallet\bin>mallet train-classifier --input output.mallet --output-classifier my.classifier
Training portion = 1.0
Unlabeled training sub-portion = 0.0
Validation portion = 0.0
Testing portion = 0.0

-------------------- Trial 0 --------------------

Trial 0 Training NaiveBayesTrainer with 1 instances
Trial 0 Training NaiveBayesTrainer finished
Trial 0 Trainer NaiveBayesTrainer training data accuracy= 1.0
Trial 0 Trainer NaiveBayesTrainer Test Data Confusion Matrix
Trial 0 Trainer NaiveBayesTrainer test data accuracy= NaN

NaiveBayesTrainer
Summary. train accuracy mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test accuracy mean = NaN stddev = NaN stderr = NaN
C:\mallet\bin>
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\subhabrata>cd\

C:\>cd mallet

C:\mallet>cd bin

C:\mallet\bin>mallet train-classifier --input revised.mallet --output-classifier mine.classifer
Training portion = 1.0
Unlabeled training sub-portion = 0.0
Validation portion = 0.0
Testing portion = 0.0

-------------------- Trial 0 --------------------

Trial 0 Training NaiveBayesTrainer with 21 instances
Trial 0 Training NaiveBayesTrainer finished
Trial 0 Trainer NaiveBayesTrainer training data accuracy= 0.5238095238095238
Trial 0 Trainer NaiveBayesTrainer Test Data Confusion Matrix
Trial 0 Trainer NaiveBayesTrainer test data accuracy= NaN

NaiveBayesTrainer
Summary. train accuracy mean = 0.5238095238095238 stddev = 0.0 stderr = 0.0
Summary. test accuracy mean = NaN stddev = NaN stderr = NaN
C:\mallet\bin>
Microsoft Windows [Version 6.1.7601]

C:\mallet\bin>mallet train-classifier --input revised.mallet --training-portion 0.9
Training portion = 0.9
Unlabeled training sub-portion = 0.0
Validation portion = 0.0
Testing portion = 0.09999999999999998

-------------------- Trial 0 --------------------

Trial 0 Training NaiveBayesTrainer with 19 instances
Trial 0 Training NaiveBayesTrainer finished
Trial 0 Trainer NaiveBayesTrainer training data accuracy= 0.5789473684210527
Trial 0 Trainer NaiveBayesTrainer Test Data Confusion Matrix
Confusion Matrix, row=true, column=predicted accuracy=0.0
label 0 1 2 3 4 5 6 7 8 9 10 11 |total
0 plot . . . . . . . . . . . . |0
1 Chauhan, . . . . . . . . . . . . |0
2 2 . . . . . . . . . . . |2
3 DELHI: . . . . . . . . . . . . |0
4 It's . . . . . . . . . . . . |0
5 A . . . . . . . . . . . . |0
6 "Pichle . . . . . . . . . . . . |0
7 all . . . . . . . . . . . . |0
8 Another . . . . . . . . . . . . |0
9 However . . . . . . . . . . . . |0
10 While . . . . . . . . . . . . |0
11 On . . . . . . . . . . . . |0

Trial 0 Trainer NaiveBayesTrainer test data accuracy= 0.0

NaiveBayesTrainer
Summary. train accuracy mean = 0.5789473684210527 stddev = 0.0 stderr = 0.0
Summary. test accuracy mean = 0.0 stddev = 0.0 stderr = 0.

C:\mallet\bin>mallet train-classifier --input revised.mallet --output-classifier newly.classifier --trainer MaxEnt
Training portion = 1.0
Unlabeled training sub-portion = 0.0
Validation portion = 0.0
Testing portion = 0.0

-------------------- Trial 0 --------------------

Trial 0 Training MaxEntTrainer,gaussianPriorVariance=1.0 with 21 instances
Value (labelProb=52.18303964554802 prior=0.0) loglikelihood = -52.18303964554802
Value (labelProb=35.07842643418387 prior=0.5000000000000047) loglikelihood = -35
Value (labelProb=11.041269237873143 prior=7.57975726871467) loglikelihood = -18.
Value (labelProb=8.845302279528045 prior=8.283595234589693) loglikelihood = -17.
Value (labelProb=7.684014543706313 prior=8.67398223273065) loglikelihood = -16.3
Value (labelProb=7.620229834138587 prior=8.637738965621395) loglikelihood = -16.
Value (labelProb=7.6930087796076805 prior=8.55144842287889) loglikelihood = -16.
Value (labelProb=7.721825299388809 prior=8.520013417234695) loglikelihood = -16.
Value (labelProb=7.710910984716437 prior=8.529963512671845) loglikelihood = -16.240874497388283
Exiting L-BFGS on termination #1:
value difference below tolerance (oldValue: -16.241838716623505 newValue: -16.240874497388283
Value (labelProb=8.706998275573689 prior=10.051741121362099) loglikelihood = -18
Value (labelProb=7.624790808427014 prior=8.637141273540822) loglikelihood = -16.
Value (labelProb=7.700549530911232 prior=8.540231288758795) loglikelihood = -16.
Value (labelProb=7.700990051603592 prior=8.539733279705375) loglikelihood = -16.240723331308967
Exiting L-BFGS on termination #1:
value difference below tolerance (oldValue: -16.24078081967003 newValue: -16.240723331308967

Trial 0 Training MaxEntTrainer,gaussianPriorVariance=1.0 finished
Trial 0 Trainer MaxEntTrainer,gaussianPriorVariance=1.0 training data accuracy= 0.9523809523809523
Trial 0 Trainer MaxEntTrainer,gaussianPriorVariance=1.0 Test Data Confusion Matrix
Trial 0 Trainer MaxEntTrainer,gaussianPriorVariance=1.0 test data accuracy= NaN

MaxEntTrainer,gaussianPriorVariance=1.0
Summary. train accuracy mean = 0.9523809523809523 stddev = 0.0 stderr = 0.0
Summary. test accuracy mean = NaN stddev = NaN stderr = NaN
C:\mallet\bin>

THANKS FOR PULLING ME UP. I ENJOYED.

Regards,
Subhabrata.
 
