Writing plugins

Post by **Conor** » Fri Jun 16, 2006 7:47 am

Thanks for the plug-in. I have placed it in an automatic installer and added it to our extras page, so other users can download it. I have also updated CDpedia so it selects the last entered result via a plug-in just like it does for regular searches.

Glad the plug-in got your appetite going for Cocoa programming. If you have any Cocoa questions, let me know; the Cocoa mailing list is also full of talented and helpful Mac programmers. Looking forward to some great OCR software for Mac. Thanks for supporting the Mac platform: any client of yours who switches to Mac because of your software is one more potential CDpedia buyer.

Since you are into OCR, if you would like to become a developer on the open source iSight barcode scanner, all help is welcome. It needs to be updated to work with the new integrated fixed-focus iSight, which makes the image harder to recognize.

Post by **aramp** » Fri Jun 16, 2006 1:06 pm

I think mentioning me as a developer of the plug-in is a bit too much. I only wrote a small part of the code. It would be more correct to say something like "with contribution of" or the like. The next time you revisit the page, if you are in a good mood for a few more keystrokes, I would feel more comfortable if you changed that.

Thanks for links. I have already bookmarked them all.

Too many projects at the start could be overwhelming, so I can't promise I will join the barcode scanner project right away. I have already noticed that barcode scanning does not really work with my built-in iSight cameras (neither on the MacBook Pro nor on the iMac). But at some point I promise to have a look, because off the top of my head I think the solution should come more from clever binarization than from the barcode reading algorithm itself.

Actually, getting used to Cocoa and Mac development is a lot of work. Although I have been in the industry for about 15 years, I have been a real Mac user only since January. I used to work on a Mac Classic II for a while, and I had a chance to play with NeXT boxes when they first appeared, but only now am I getting a real feeling for what the Mac is. And I have to fight through the totally Microsoft-based environment in the office (I share my main responsibilities with a CIO position, which does not help a lot). So I have enough thrill for the next few months.

Another challenge for me is to make use of the Russian State Library's catalogue for my copy of BookPedia. My daughter has already managed to enter all the English-language books we have, but dozens of times as many Russian books are still waiting on the shelves to be catalogued. BTW, is it possible to make a plug-in for BookPedia the same way as for CDPedia? If not, then putting everything into XML and importing it will be the only choice. Also, for a BookPedia plug-in I would prefer to use Perl, to simplify searching through HTML pages. What about Perl scripts in a Cocoa app?

Post by **Conor** » Sat Jun 17, 2006 6:22 am

with contribution of

Done

clever binarization

I don't even know what that means.

Truth be told, I haven't looked at the code yet. I made a new Tiger-compatible version of the code and then handed it off to a novice programmer to get him going. But I underestimated the complexity of the problem and he was unable to solve it. So I have to sit down with him and see if we can work it out.

is it possible to make a plug-in for BookPedia

Yes, what you want here is not a menu plug-in but a search plug-in. It will show up in the search site pop-up like Amazon and the rest of the sites. Here is a download of the internal code for one of the French sites. As I mentioned before, we want to open source the plug-ins, but we would have to clean up the code and comment it properly; right now it's only me and one other user who work with it. So sorry if it's a bit confusing, but it gives you a starting point. This page also has some information on plug-ins. The HTML parsing is done via Cocoa, but if you are more comfortable with Perl there is a Cocoa Perl bridge. Also, it's trivial to send a command to the shell and get a response if that is easier. Bookpedia also has support for z39.50 servers, under the advanced menu. With those you don't get cover images and the information is limited to the basics, but it does get information from a wide variety of libraries and institutions.

Post by **aramp** » Mon Jun 19, 2006 5:47 am

Thanks.

As to binarization: that's converting the full-color image the camera yields to black and white, because most recognition algorithms deal with b&w. It is important to make the conversion purposefully, which is not easy. Probably a good idea would be to focus on the center of the image to find a global threshold (the brightness level at which we separate "white" from "black"), because most probably the user holds the barcode up to the center of the image. Then build a vertical histogram from the b&w image, and first adjust the threshold level to get the maximum peaks (representing the black and white stripes in the barcode), and then adjust the angle (rotating a little bit to the left and to the right to see if that gives better peaks). The image around the barcode should only contribute more or less uniform "noise" to the histogram. Still, once the horizontal size of the barcode is found (based on the placement of all the peaks on the histogram), it makes a lot of sense to crop the barcode area at the top and bottom to make recognition easier. And then the remaining job is to count the peaks and the distances between them. That's how barcode recognition should generally work, to my understanding.
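A minimal Python sketch of the first steps described above, not the scanner's actual code. It assumes the image is already a grayscale grid (a list of rows of 0-255 values). The function names, the use of the mean brightness of the central window as the threshold, and the `min_height` cutoff are all illustrative assumptions. Threshold tuning, the angle search, and cropping are left out.

```python
def center_threshold(gray, fraction=0.5):
    """Mean brightness of the central window, used as a global threshold.

    Assumption: the barcode is held near the image center, so the center
    contains a mix of bars and background.
    """
    h, w = len(gray), len(gray[0])
    dh, dw = int(h * fraction / 2), int(w * fraction / 2)
    rows = gray[h // 2 - dh : h // 2 + dh + 1]
    window = [p for row in rows for p in row[w // 2 - dw : w // 2 + dw + 1]]
    return sum(window) / len(window)


def binarize(gray, threshold):
    """Map each pixel to 1 ("black", darker than threshold) or 0 ("white")."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def column_histogram(bw):
    """Count black pixels in every column (the "vertical histogram")."""
    return [sum(column) for column in zip(*bw)]


def count_bars(hist, min_height):
    """Count runs of adjacent columns whose black count reaches min_height."""
    bars, inside = 0, False
    for value in hist:
        if value >= min_height and not inside:
            bars += 1
        inside = value >= min_height
    return bars


# Synthetic 6x10 image: light background (220), dark bars (20) in columns 2, 3 and 6.
image = [[20 if x in (2, 3, 6) else 220 for x in range(10)] for _ in range(6)]
threshold = center_threshold(image)
hist = column_histogram(binarize(image, threshold))
print(threshold, hist, count_bars(hist, min_height=3))
```

On a real camera frame the peaks would be far noisier, so the bar counting here is only a stand-in for measuring peak positions and the distances between them.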

Thanks for the idea of using z39.50 servers. I have never heard of them before. It looks like there is a whole infrastructure of such servers in Russia, but I was unable to use BookPedia's interface with them -- no book was found (don't know the reason). So I will probably have to experiment a little with these servers by writing some scripts to figure out how to deal with them properly.

Post by **aramp** » Mon Jun 19, 2006 5:51 am

I looked through my own post and realized that I used the wrong wording. You normally don't need to rotate the image; it is possible to build the histogram without that, just by shifting the pixels in each row before counting them.
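One way to read the row-shifting idea, as a hedged Python sketch: each row is shifted by an amount proportional to its row index before its black pixels are added to the column counts, which approximates a small rotation for nearly vertical bars. The names `sheared_histogram` and `best_slope`, and the choice of "tallest peak" as the score, are assumptions made for illustration.

```python
def sheared_histogram(bw, slope):
    """Column histogram where row y is shifted left by round(y * slope) pixels.

    bw is a grid of 0/1 values (1 = black). Pixels shifted outside the
    image width are ignored.
    """
    width = len(bw[0])
    hist = [0] * width
    for y, row in enumerate(bw):
        shift = round(y * slope)
        for x, pixel in enumerate(row):
            target = x - shift
            if pixel and 0 <= target < width:
                hist[target] += 1
    return hist


def best_slope(bw, slopes):
    """Pick the candidate slope whose histogram has the tallest peak."""
    return max(slopes, key=lambda s: max(sheared_histogram(bw, s)))


# A slanted one-pixel bar: row y has its black pixel at column 2 + y.
slanted = [[1 if x == 2 + y else 0 for x in range(8)] for y in range(4)]
print(sheared_histogram(slanted, 0.0))   # bar smeared over four columns
print(sheared_histogram(slanted, 1.0))   # bar collapsed into one column
print(best_slope(slanted, [-1.0, 0.0, 1.0]))
```

Shifting whole rows is cheaper than resampling a rotated image, which is presumably the appeal of the approach.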

Post by **Conor** » Thu Jun 22, 2006 7:32 am

Thanks for the tips. We had not thought about adjusting the B&W threshold based on the image. We were using an established algorithm for changing RGB to gray scale and using a medium value as the threshold.
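To illustrate the difference, here is a small Python sketch. The thread does not say which RGB-to-gray algorithm was used, so the ITU-R BT.601 luma weights below are an assumption, as are the function names and the fixed threshold of 128. The sample row imitates a dimly lit barcode.

```python
def to_gray(rgb):
    """Weighted RGB-to-gray conversion (BT.601 luma weights, an assumption)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def binarize_row(pixels, threshold=128):
    """1 for pixels darker than the threshold, 0 otherwise."""
    return [1 if to_gray(p) < threshold else 0 for p in pixels]


# Dim lighting: "white" background is only about 100, bars are about 30.
dim_row = [(100, 100, 100), (30, 30, 30), (100, 100, 100), (30, 30, 30)]

fixed = binarize_row(dim_row)                      # fixed middle threshold
mean = sum(to_gray(p) for p in dim_row) / len(dim_row)
adaptive = binarize_row(dim_row, threshold=mean)   # threshold taken from the image

print(fixed)
print(adaptive)
```

With the fixed threshold every pixel in the dim row ends up black, while a threshold derived from the image itself still separates bars from background.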

With z39.50 servers, as some libraries give little information, one has to use trial and error with the different encodings and MARC formats. Also remember that in the server section the entry should include the port number and the database name: "www.library.com:210/database".
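For anyone scripting against these servers, the entry format above splits into three parts. The Python helper below is a hypothetical sketch for checking an entry before use; it is not how Bookpedia parses the field.

```python
def parse_z3950_target(entry):
    """Split "host:port/database" into (host, port, database).

    Hypothetical helper: raises ValueError when any part is missing,
    e.g. when the database name after the slash was left out.
    """
    host_port, _, database = entry.strip().partition("/")
    host, _, port = host_port.partition(":")
    if not host or not port or not database:
        raise ValueError(f"expected host:port/database, got {entry!r}")
    return host, int(port), database


print(parse_z3950_target("www.library.com:210/database"))
```

A missing database name is rejected up front rather than surfacing later as a search that finds no books.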

Post by **aramp** » Thu Jun 22, 2006 1:04 pm

Thanks for the hint on z39.50, I totally missed the name of the collection after the slash... I was actually wondering where to put that info.

I am now all set to browse the catalogue of the Russian State Library, which is more than enough for me, and it works very fast.

The only issue is that BookPedia normalizes the ISBN string (by cutting out all the intermediate dashes), and that seems to be a bad idea in my case. Using a web interface that communicates directly with the z39.50 server, I am able to find a book by ISBN only when all the dashes are exactly in their places. Otherwise the server doesn't find anything. Is it possible to tell BookPedia not to "normalize" the ISBN string before sending the query?

BTW, I am going to consult our imaging team on barcodes tomorrow and will probably post some more hints, probably more helpful than the ones I have already given.

Post by **Conor** » Sat Jun 24, 2006 10:22 am

Is it possible to tell BookPedia not to "normalize" the ISBN string before sending the query?

Bookpedia takes away the dashes because most databases will do a search on the ISBN without the dashes. (I would have said all, before you posted the information that the Russian State Library can't.) I have been thinking about the idea of still storing the ISBNs normalized without dashes, as it is now, but displaying them with dashes to the users to make them easier to read. But that is another matter, not relevant to searching. If it were a search plug-in the dashes could be re-inserted, but as most z39.50 servers take the number without dashes and they share a common query interface, it has to stay that way.
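The split between a query form and a display form can be sketched in a few lines of Python. This is an illustration only, with hypothetical function names, and it simply keeps the dashes as the user typed them rather than computing the correct hyphenation.

```python
def normalize_isbn(isbn):
    """Drop dashes and spaces; keep digits and a possible trailing X."""
    return "".join(ch for ch in isbn if ch.isalnum()).upper()


def isbn_forms(entered):
    """Keep the ISBN as entered for display, and a dash-free form for queries."""
    return {"display": entered.strip(), "query": normalize_isbn(entered)}


forms = isbn_forms(" 0-306-40615-2 ")
print(forms["display"])
print(forms["query"])
```

A search plug-in for a server that needs the dashes could send the display form instead, which is the case described for the Russian State Library's web interface.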

Will take all the tips and help we can get, but we haven't even gotten started on the tips you gave us before so you are way ahead of us.