An easy(ish) alternative to porting OpenNlp to C#

Topics: User Forum
Jul 1, 2011 at 8:42 PM
Edited Aug 1, 2012 at 4:11 PM

 

**PLEASE NOTE: THIS IS NOW OUT OF DATE.**
**FOR V 1.5.2 PLEASE SEE THE "UPDATE FOR VERSION 1.5.2" POST FURTHER DOWN FOR NEWER INSTRUCTIONS.**

Hi,
Here is an alternative to porting the Java OpenNlp code to C#.

Using the amazing IKVM (http://www.ikvm.net/index.html), the Java files can be converted to a .NET assembly (dll),
allowing you to use the latest releases of OpenNlp from C# (or any other .NET language).
I have done this, and so far it is working very well.

Here is a quick guide if you're interested.  (Don't forget to unblock any downloaded files)

  • Download & extract the latest OpenNlp release from http://opennlp.apache.org/cgi-bin/download.cgi . At the time of writing this is  apache-opennlp-1.5.1-incubating-bin.zip
  • The three .jar  files (opennlp-maxent-3.0.1-incubating.jar,   jwnl-1.3.3.jar,   opennlp-tools-1.5.1-incubating.jar) in the lib folder can be used to compile a .net assembly as follows.
  • Download & extract the latest IKVM from http://sourceforge.net/projects/ikvm/files/  At the time of writing this is ikvmbin-0.46.0.1.zip
  • For simplicity, copy the three .jar files above into the ikvmbin-0.46.0.1/bin folder
  • From a command window in the ikvmbin-0.46.0.1/bin folder, use ikvmc and the three .jar files above to make opennlp.dll as follows:
  • ikvmc -target:library -assembly:opennlp opennlp-maxent-3.0.1-incubating.jar jwnl-1.3.3.jar  opennlp-tools-1.5.1-incubating.jar
  • Copy the following from the ikvmbin-0.46.0.1/bin folder to your project folder (or the folder of your choice)
    •  opennlp.dll (the assembly you have just created)
    • IKVM.Runtime.dll
    • IKVM.OpenJDK.Core.dll
    • IKVM.OpenJDK.Jdbc.dll
    • IKVM.OpenJDK.Text.dll
    • IKVM.OpenJDK.Util.dll
    • IKVM.OpenJDK.XML.API.dll
    (I used reflection to find which IKVM dlls are referenced by opennlp.dll.)

Add references to these assemblies in your project & use at will :-)

You will need the models for your language from http://opennlp.sourceforge.net/models-1.5/

The OpenNlp manual is a good place to start http://incubator.apache.org/opennlp/documentation/manual/opennlp.html 

Note: This is still Java in .NET clothes, so care has to be taken over some things.
e.g. when loading models, the input streams are Java types (referenced from the assemblies above):

string modelPath = @"C:\models\";  // Wherever you've stored your downloaded models
java.io.FileInputStream modelInpStream = new java.io.FileInputStream(modelPath + "en-sent.bin");
opennlp.tools.sentdetect.SentenceModel sentenceModel = new opennlp.tools.sentdetect.SentenceModel(modelInpStream);
modelInpStream.close();  // The constructor has read the whole model, so the stream can be closed
opennlp.tools.sentdetect.SentenceDetectorME sentenceDetector = new opennlp.tools.sentdetect.SentenceDetectorME(sentenceModel);

Mostly though it seems to be very straightforward & works well.

Once set up it takes only a few seconds to create an opennlp.dll assembly from the latest releases
and so it is very easy to keep it bang up to date.

Hope this may be of use to someone.

Cheers

Paul

 

Jul 7, 2011 at 1:42 AM

I've followed your instructions and had success with the Sentence Detector, but I can't use the POS Tagger. When I load the model (I tried both Maxent and Perceptron), it throws the exception "The profile data stream has an invalid format!" at the following step:

java.io.FileInputStream modelInpStreamPOS = new java.io.FileInputStream("en-pos-maxent.bin");
opennlp.tools.postag.POSModel posmodel = new opennlp.tools.postag.POSModel(modelInpStreamPOS);  // <-- exception is thrown here

Hope you can check this. Thank you!

Inuris

Jul 7, 2011 at 4:59 PM

Hi,
Ah yes, i had this problem too, but forgot to mention it.  (oops!)
I'm not sure if this will affect things, but i fixed this as follows.
The file en-pos-maxent.bin is actually a zip archive.

If you examine the contents of this zip file, it has three files (the others seem to have only two):
manifest.properties, tags.tagdict & pos.model
Delete tags.tagdict from the zip file so that it only contains manifest.properties & pos.model.
Note: Don't actually unzip en-pos-maxent.bin; just delete tags.tagdict, so that en-pos-maxent.bin remains a zip archive.
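
The same edit can be scripted instead of being done by hand in an archive tool. What follows is only a minimal C# sketch, assuming .NET 4.5 or later (for System.IO.Compression, with references to System.IO.Compression and System.IO.Compression.FileSystem). The names ModelArchive and RemoveEntry are made up for illustration and are not part of OpenNlp or IKVM. Work on a copy of the model file, because the archive is modified in place.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: not part of OpenNlp or IKVM.
static class ModelArchive
{
    // Deletes a single entry from a zip archive without unpacking it.
    // Returns true if the entry existed and was removed, false otherwise.
    public static bool RemoveEntry(string zipPath, string entryName)
    {
        // Update mode is required for deleting entries.
        using (ZipArchive archive = ZipFile.Open(zipPath, ZipArchiveMode.Update))
        {
            ZipArchiveEntry entry = archive.GetEntry(entryName);
            if (entry == null)
            {
                return false;
            }
            entry.Delete();
            return true;
        }
    }
}
```

It could be called as ModelArchive.RemoveEntry(@"C:\models\en-pos-maxent.bin", "tags.tagdict"), where the path is only an example.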

It then works fine, hopefully :-)
Hope that helps.
Paul

P.S.
I've successfully used Splitting, Tokenising, POS Tagging & Chunking, but have not had time to figure out using the parser yet.
The parser seems to have changed quite a bit  in 1.5.1 & the docs seem a bit vague.
If anyone has got a working example, i would be very grateful if you could share it.
Cheers

 

Mar 6, 2012 at 8:00 AM

Thank you! this has been really helpful! 

Mar 6, 2012 at 9:09 AM

You are very welcome.  8-)

May 1, 2012 at 10:19 PM

Just as an update here, I followed your instructions but also needed to include the IKVM.OpenJDK.Charsets dll for the models to actually read in the FileInputStream successfully. Otherwise, I got a charset exception. Everything else was perfect, this was really useful. Thanks!

May 2, 2012 at 5:33 PM

Thanks for the heads up on Charsets.

Glad it was useful to you.

Cheers

May 4, 2012 at 6:38 PM

Do you have a working copy of this that you are willing to share? I'm looking for some libraries I can use to extract verbs and nouns (in that order) from paragraphs of text. This would help me identify potential functions. Functions are made up of action verbs and target (measurable) nouns such as: Lift Weight, Support Weight, Open Circuit, Close Circuit, Control Flow, Separate Fluids, etc.

I have very limited time and resources, and was hoping I could find some libraries to help with this.

Thanks.
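
For the verb-then-noun extraction described above, the output of a POS tagger could be filtered along these lines. This is a rough sketch only, assuming the English models emit Penn Treebank tags, where verb tags start with "VB" and noun tags start with "NN". The class and method names (FunctionPhrases, VerbNounPairs) are hypothetical, and the pairing rule (each verb is paired with the next noun that follows it) is a simplification chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: not part of OpenNlp.
static class FunctionPhrases
{
    // tokens[i] is a word and tags[i] is its POS tag (Penn Treebank style assumed).
    // Each verb is paired with the first noun that comes after it.
    public static List<string> VerbNounPairs(string[] tokens, string[] tags)
    {
        var pairs = new List<string>();
        string pendingVerb = null;
        int count = Math.Min(tokens.Length, tags.Length);

        for (int i = 0; i < count; i++)
        {
            if (tags[i].StartsWith("VB", StringComparison.Ordinal))
            {
                // Remember the most recent verb until a noun turns up.
                pendingVerb = tokens[i];
            }
            else if (pendingVerb != null && tags[i].StartsWith("NN", StringComparison.Ordinal))
            {
                pairs.Add(pendingVerb + " " + tokens[i]);
                pendingVerb = null;
            }
        }
        return pairs;
    }
}
```

With the hand-written (not tagger-produced) input tokens { "Open", "the", "circuit", "and", "control", "flow" } and tags { "VB", "DT", "NN", "CC", "VB", "NN" }, this returns "Open circuit" and "control flow".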

May 5, 2012 at 6:03 PM
Edited May 5, 2012 at 6:18 PM

I have a VERY basic demo program that i quickly knocked up. It might be of some use to you.
It doesn't do any parsing, but does do basic Splitting, Tokenizing, Pos Tagging & very basic Chunking.
It's VERY quick 'n' dirty, but you're welcome to a copy if it helps (i'm not sure if it will)
The opennlp.dll made using the method outlined in my original post was made with 1.5.1, so it's a bit out of date.
Unfortunately, I'm too busy to spend any time on this at the moment.

I've bundled it all up as a c# Winforms project already compiled so you can just extract it (don't forget to unblock the zip file first)  & run the Nlp.exe In the Bin\debug folder. Of course the source code is included as well.
The models are in there too in the debug folder, the program expects the model folder to be in the same folder as nlp.exe (i said it was quick & dirty ;-)

Here's the file (it's ~95Mb !! as it includes the models)  http://dl.dropbox.com/u/24630720/Nlp.zip
Hope you find it useful.

Cheers
Greeny

Jul 30, 2012 at 4:40 PM

I cannot seem to get your solution to work.  I keep getting this handy little error.  Any ideas what i'm missing?

 

...\NLP\ikvm-7.0.4335.0\bin> ikvmc -target:library-assembly:opennlp opennlp-maxent-3.0.2-incubating.jar jwnl-1.3.3.jar opennlp-tools-1.5.2-incubating.jar


Unhandled Exception: System.ArgumentException: Index was out of range. Must be non-negative and less than the size of the collection.   

at System.Security.Util.Hex.ConvertHexDigit(Char val)
at System.Security.Util.Hex.DecodeHexString(String hexString)   
at System.Security.Policy.StrongNameMembershipCondition.ParseKeyBlob()   
at System.Security.Policy.StrongNameMembershipCondition.get_PublicKey()   
at System.Security.Policy.StrongNameMembershipCondition.System.Security.Policy.IReportMatchMembershipCondition.Check(Evidence evidence, Object& usedEvidence)

at System.Security.Policy.PolicyLevel.IsFullTrustAssembly(ArrayList fullTrustAssemblies, Evidence evidence)   
at System.Security.Policy.PolicyLevel.Resolve(Evidence evidence, Int32 count, Char[] serializedEvidence)   
at System.Security.PolicyManager.CodeGroupResolve(Evidence evidence, Boolean systemPolicy)   
at System.Security.PolicyManager.ResolveHelper(Evidence evidence)   
at System.Security.PolicyManager.Resolve(Evidence evidence)   
at System.Security.SecurityManager.ResolvePolicy(Evidence evidence, PermissionSet reqdPset, PermissionSet optPset, PermissionSet denyPset, PermissionSet& denied, Boolean checkExecutionPermission)   
at System.Security.SecurityManager.ResolvePolicy(Evidence evidence, PermissionSet reqdPset, PermissionSet optPset, PermissionSet denyPset, PermissionSet& denied, Int32& securitySpecialFlags, Boolean checkExecutionPermission)

Jul 31, 2012 at 11:22 AM
Edited Jul 31, 2012 at 11:31 AM

Hi,

I'm not sure what the problem is. Two things spring to mind.
Did you unblock all the zip files you downloaded? (some of the messages are security related)
There is a more current version of IKVM available (7.1.4532.2) that might help.

Also, 1.5.2 introduces dependencies that make the whole job a bit more complicated.
I have posted an updated method below which works OK for me.
Hopefully it will work for you too.

Cheers
Paul

Jul 31, 2012 at 11:25 AM
Edited Aug 1, 2012 at 4:01 PM

UPDATE FOR VERSION 1.5.2

 

Hi,
Here is an alternative to porting the Java OpenNlp code to C#.

Using the amazing IKVM (http://www.ikvm.net/index.html), the Java files can be converted to a .NET assembly (dll),
allowing you to use the latest releases of OpenNlp from C# (or any other .NET language).
I have done this, and so far it is working very well.

Here is a quick guide if you're interested. 

(Don't forget to unblock any downloaded files, by right clicking on Zip file – Properties - Unblock)

  • Download & extract the latest OpenNlp release from http://opennlp.apache.org/cgi-bin/download.cgi . At the time of writing this is  apache-opennlp-1.5.2-incubating-bin.zip

  • The four .jar  files (opennlp-maxent-3.0.2-incubating.jar,   jwnl-1.3.3.jar,   opennlp-tools-1.5.2-incubating.jar, opennlp-uima-1.5.2-incubating.jar ) in the lib folder can be used to compile a .net assembly as follows.

  • Download & extract the latest IKVM from http://sourceforge.net/projects/ikvm/files/  At the time of writing this is  ikvmbin-7.1.4532.2.zip

  • For simplicity, copy the following six .jar files (the four above plus uima-core.jar and mail.jar) into the ikvmbin-7.1.4532.2/bin folder

    • opennlp-maxent-3.0.2-incubating.jar
    • jwnl-1.3.3.jar
    • opennlp-tools-1.5.2-incubating.jar
    • opennlp-uima-1.5.2-incubating.jar
    • uima-core.jar
    • mail.jar
  • From a command window in the ikvmbin-7.1.4532.2/bin folder, use ikvmc and the six .jar files above to make opennlp.dll as follows:

  • ikvmc -target:library -assembly:opennlp opennlp-maxent-3.0.2-incubating.jar jwnl-1.3.3.jar opennlp-uima-1.5.2-incubating.jar opennlp-tools-1.5.2-incubating.jar uima-core.jar mail.jar

    Note: There are some warnings during compilation due to the missing references log4j-x.x.x.jar & javax.jms.jar, but they don't cause a problem & I'm trying to keep things as simple as possible.

  • Copy the following from the ikvmbin-7.1.4532.2/bin folder to your project folder (or the folder of your choice)

    •  opennlp.dll (the assembly you have just created)

    • IKVM.Runtime.dll

    • IKVM.OpenJDK.Core.dll

    • IKVM.OpenJDK.Jdbc.dll

    • IKVM.OpenJDK.Text.dll

    • IKVM.OpenJDK.Util.dll

    • IKVM.OpenJDK.XML.API.dll

    • IKVM.OpenJDK.Charsets.dll  (Thanks carl4os)

    (I used reflection to find which IKVM dlls are referenced by opennlp.dll.)
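
One way such a reflection check might look is sketched below. This is not necessarily how the list above was produced; it is a minimal C# example that uses the standard Assembly.GetReferencedAssemblies call, and the names ReferenceLister and GetReferenceNames are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical helper: not part of OpenNlp or IKVM.
static class ReferenceLister
{
    // Returns the simple names of the assemblies that the given assembly
    // references statically, sorted alphabetically.
    public static string[] GetReferenceNames(string assemblyPath)
    {
        Assembly assembly = Assembly.LoadFrom(assemblyPath);
        return assembly.GetReferencedAssemblies()
                       .Select(reference => reference.Name)
                       .OrderBy(name => name, StringComparer.OrdinalIgnoreCase)
                       .ToArray();
    }
}
```

Calling ReferenceLister.GetReferenceNames with the full path to opennlp.dll and printing each name would give the list of dlls to copy. A list like this only shows static references; an assembly that is loaded dynamically at run time would not appear in it, which might be why IKVM.OpenJDK.Charsets.dll had to be added separately.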

Add references to these assemblies in your project & use at will :-)

You will need the models for your language from http://opennlp.sourceforge.net/models-1.5/

The OpenNlp manual is a good place to start http://opennlp.apache.org/documentation/manual/opennlp.html

Note: This is still Java in .NET clothes, so care has to be taken over some things.
e.g. when loading models, the input streams are Java types (referenced from the assemblies above):

string modelPath = @"C:\models\";  // Wherever you've stored your downloaded models
java.io.FileInputStream modelInpStream = new java.io.FileInputStream(modelPath + "en-sent.bin");
opennlp.tools.sentdetect.SentenceModel sentenceModel = new opennlp.tools.sentdetect.SentenceModel(modelInpStream);
modelInpStream.close();  // The constructor has read the whole model, so the stream can be closed
opennlp.tools.sentdetect.SentenceDetectorME sentenceDetector = new opennlp.tools.sentdetect.SentenceDetectorME(sentenceModel);

Mostly though it seems to be very straightforward & works well.

Once set up it takes only a few seconds to create an opennlp.dll assembly from the latest releases
and so it is very easy to keep it bang up to date.

I have a VERY basic demo program that i quickly knocked up. I've bundled it all up as a C# Winforms project already compiled so you can just extract it (don't forget to unblock the zip file first)  & run the Nlp.exe In the Bin\debug folder. Of course the source code is included as well.
The models are in there too in the debug folder, the program expects the model folder to be in the same folder as nlp.exe (i said it was quick & dirty ;-)
Here's the file (it's ~100Mb !! as it includes the models)  https://dl.dropbox.com/u/24630720/nlp.zip

Hope this may be of use to someone.

Cheers

Paul



Jul 31, 2012 at 4:11 PM
Edited Jul 31, 2012 at 4:16 PM

Hi Paul.

 

Thanks for the Update!

I'm still running into the same issue as earlier (I've double-checked that the zips are unblocked and done everything I can to make sure the files are free and clear), but I'll keep plugging away at it.

Edit:  I was able to d/l your demo app and run that though.  Thanks for all the great help!


One thing I did note is that the download of UIMA doesn't seem to have mail.jar in its lib folder (or any folder, for that matter). Any advice on where it could be found?

 

Thanks!
Dave. 

Aug 1, 2012 at 3:53 PM
Edited Aug 1, 2012 at 4:14 PM

Hi Dave,
You are most welcome.

Sorry, i got myself in a bit of a muddle with mail.jar there.
You can download javamail1_4_5.zip from
http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archive-downloads-eeplat-419426.html#javamail-1.4.5-oth-JPR
mail.jar is in that zip file (sorry, I can't give a direct download link as you have to accept their T&Cs).
It's probably not necessary to include it, but at least you will know it's not that causing your problems.

I can't really help with your main problem, although i have a strong hunch it's something to do with permissions.
Perhaps running a command window as administrator might help.


Glad you could at least run the little demo, you can always use the  opennlp.dll  from that in your own projects.

Cheers
Paul

Aug 15, 2012 at 4:25 AM

For what it's worth, i was able to take Paul's version of Open NLP and use it to generate some visuals that are a little easier to interpret (at least for my brain).  You can find an example image here

(http://dl.dropbox.com/u/38604282/output.png)

The debugging code is kind of rooted in my application, but i'd be happy to share if anyone is interested.

 

Thanks!
Dave. 

Nov 26, 2012 at 11:13 AM

Much obliged Paul!  Exactly what I was looking for.

Is anyone else using OpenNlp in .Net and potentially looking to collaborate on a library?

Nov 26, 2012 at 2:53 PM
Edited Nov 26, 2012 at 2:53 PM

Cheers Zuvvy, you're most welcome.

I'm afraid i don't have any spare time to devote to any collaboration at the moment,
but i hope you get some interest.

Cheers
Paul

Feb 21, 2013 at 6:59 PM

Hello, folks - using the parser is straightforward, the same way as you would do it in Java. Here is an example:
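        // Needs: using System; using System.IO; using System.Reflection;
        //        using java.io; using opennlp.tools.parser; using opennlp.tools.cmdline.parser;
        // FParser is a field of type opennlp.tools.parser.Parser; InputText is the string to parse.

        // Locate the Models folder next to the executing assembly.
        string directoryName = Path.GetDirectoryName(Assembly.GetExecutingAssembly().GetName().CodeBase);
        var ModelPath = string.Concat((new Uri(directoryName)).LocalPath, "\\Models\\");

        // Load the parser model through a Java input stream, as with the other models.
        var fileInputStream = new FileInputStream(string.Concat(ModelPath, "en-parser-chunking.bin"));
        try
        {
            ParserModel model = new ParserModel(fileInputStream);
            this.FParser = ParserFactory.create(model);
        }
        finally
        {
            fileInputStream.close();
        }

        // Parse one line of text and write the top parse into a buffer.
        var strBldr = new java.lang.StringBuffer();
        var topParses = ParserTool.parseLine(InputText, FParser, 1);
        foreach (var topParse in topParses)
        {
            topParse.show(strBldr);
        }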
It works perfectly fine. Loading the model takes about 3-4 seconds, however, which seems longer than in the native Java solution. Parsing time seems fine, but no performance testing has been done so far.

Feb 22, 2013 at 7:15 PM

Cheers k0ss,

Not really got as much time as i'd like for NLP at the moment, but will give it a try when i can.

Cheers

Mar 20 at 10:32 AM

Hi,
I am getting the following exception. Can anyone tell me why?

Could not load type 'opennlp.tools.sentdetect.SentenceDetectorME' from assembly 'OpenNlp, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'.