Next release

Topics: Developer Forum
Sep 4, 2006 at 9:02 PM
From the Google Groups list

> Right now, to run any of the demos, the user has to go through the
> following procedure:
> 1, download the sources and re-compile
> 2, download the .bin files from opennlp
> 3, run the modelconverter on most of the .bin files
> 4, realize that they don't have a
> C:\Projects\DotNet\OpenNLP\OpenNLP\Models\ directory, and modify it
> to match their own file system
> ...which is slightly painful.
> Solutions:
> 1, pre-compiled binaries

Yep, this just comes from having a release.

> 2-3, including the pre-converted binaries into the release version

How about releasing the .nbin model files as a separate release from
the binaries? They change much less frequently than the source code, and
they are big files, so it makes sense to save people from having to
re-download the same model files each time we do a new release of the
library DLLs etc.

> Now, I gave some thought about the fourth one, and basically it all boils
> down to the fact, that we all have our own preferance for directory
> structuring. The least complicated solution I could come up with, is adding
> a settings file to the OpenNLP project, containing a DataDir string, which
> points to the directory containing the .nbin files. This would enable
> developers to use whatever directory they already have, while making the
> development version - release version shift a matter of changing one line in
> the app.config file.

The settings file wouldn't be for the library - it would be for the
example executables. The library is (or should be) designed so that you
can use models in any format supported by SharpEntropy, not just the
.nbin versions. I think we need to add an app.config to ToolsExample and
ParseTree, then change the constructor of Mainform.cs in each case to
read mModelPath from the app.config rather than from a hardcoded
string. Best to keep the setting name in app.config consistent; call it
ModelPath. (Which reminds me, we should stick to the naming conventions
for variables etc. that I have established in the code; it took me a
long time to change all the variable names from the Java ones to make
them more readable and consistent with Microsoft's .NET guidelines and
FxCop. The one wrinkle is using the "m" prefix on class-level variables
to distinguish them from local ones. This is the convention we use at
my workplace, so that's why I used it. I haven't done the FxCop sweep
for the coreference code, so expect a lot of variable name changes -
and changes generally - in that part of the code over the next few
weeks.)
> Also, how do you feel about the version number? Personally, having used the
> last stable version for more than four months of experimenting, I haven't
> come across any crashes or major malfunctions (some minor glitches with the
> Parser tool's ability to parse sentences with more than 2 composite
> sentences, though....)

Well, that's good news. I am still aware of the nasty SharpEntropy bug
that makes the Smoothing option for training not work. Tell me some
more about the problem with the Parser.

> If you give the OK sign, I'll package the release version, and upload it to
> codeplex.

I would prefer not to release a version with the half-finished
coreference tool. Could we release the .NET 2.0 binaries that I provide
currently via the CodeProject article, and then do a release with
coreference included in a few weeks' time?

> (yes, these are all minor details towards the holy grail ;) , but they still
> need to be addressed)

Talking of minor details, do you have any time to look into using
Sandcastle to produce .NET 2.0 documentation, now that the NDoc person
has decided not to do an NDoc for .NET 2.0?

> SDr

Sep 7, 2006 at 8:00 AM

Thanks for copying across my message from the SharpNLP Google group to here.

Did you have any comments on any of the things I wrote?

Sep 10, 2006 at 3:17 PM
>How about releasing the .nbin model files as a separate release from
>the binaries.


>The library (should be) designed so you can use
>models in any format supported by SharpEntropy, not just the .nbin
>versions.
This brings up another question regarding the config files: if the models are stored in a database, how do we store it in the config files? A "file://" / "sqlite://" prefix maybe?

I'll look into the rest of the stuff as soon as I've got a stable internet connection - probably in a week or so.

Sep 12, 2006 at 12:07 PM
>This brings up another question regarding the config files: if the
>models are stored in a database, how do we store it in the config
>files? A "file://" / "sqlite://" prefix maybe?

SQLite is file-based, so a file path to the SQLite database would work fine. There would be an issue, however, if I put in my code to use SQL Server as the model store, because then you would need the database connection string. These issues would be handled by the client application, however, not the OpenNLP or SharpEntropy libraries. The OpenNLP library will just take a GisModel object; it doesn't care how that object was constructed or the internals of where it's getting its data from.
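To make that separation concrete, here is a rough sketch of what the client side might look like. BinaryGisModelReader follows SharpEntropy's reader naming for .nbin files; SqlServerGisModelReader is purely hypothetical, standing in for a future database-backed reader, so treat this as an illustration of the design rather than working code:

```csharp
// Sketch: the client application decides where model data lives and
// hands a finished GisModel to the OpenNLP tools. The library never
// sees a file path or a connection string.
// SqlServerGisModelReader is hypothetical (no such class exists yet).
using SharpEntropy;
using SharpEntropy.IO;

public static class ModelLoader
{
    public static GisModel Load(string modelPath, string connectionString)
    {
        if (!string.IsNullOrEmpty(connectionString))
        {
            // hypothetical: pull the model data out of SQL Server
            return new GisModel(new SqlServerGisModelReader(connectionString));
        }
        // file-based: covers .nbin files, and a SQLite database is itself
        // just a file path as far as the client is concerned
        return new GisModel(new BinaryGisModelReader(modelPath));
    }
}
```

Either way, the OpenNLP tools receive only the constructed GisModel, which is what keeps the libraries free of any storage-specific configuration.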