Goose parser

Probably every web developer has, at some point in their career, tried to write two things: their own framework and a web page parser. We spent some time analyzing existing parser packages and realized we needed to create our own. We had a specific goal: to create something easily extendable, with the ability to save parsing rules in storage. That way, parsers can be built as external modules that know HOW to parse but don’t know WHAT to parse. And Goose is a revolution in the parsing industry.

This fall I started working with Node.js. It began with a simple task: I needed to parse a simple website where user actions, such as clicking and scrolling, had to be performed on the page before parsing could start.

Unfortunately, I realized this task was NOT possible with PHP. So I started my research and found a lot of libraries, such as CasperJS, CrawlerNinja, and others. But all of them were incomplete or hard to extend for my needs. Another point: I wanted to describe parsing rules as a JSON schema, so that I could add inheritance in the future and store them in MongoDB.
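To make the idea of storable, inheritable rules more concrete, here is a minimal sketch of what it could look like. Everything in it is an assumption made for illustration: the `Rule` and `RuleSet` shapes, the `extends` field, and the `save`, `load`, and `resolve` helpers are hypothetical and are not Goose's actual schema or API. A plain `Map` stands in for a MongoDB collection.

```typescript
// Hypothetical rule shape, for illustration only; not Goose's real schema.
interface Rule {
  name: string;     // key under which the parsed value would be stored
  selector: string; // CSS selector the value would be read from
}

interface RuleSet {
  id: string;
  extends?: string; // id of a parent rule set, if any
  rules: Rule[];
}

// A Map stands in for a MongoDB collection keyed by id.
// Rule sets are stored as JSON strings to show they are plain data.
const store = new Map<string, string>();

function save(ruleSet: RuleSet): void {
  store.set(ruleSet.id, JSON.stringify(ruleSet));
}

function load(id: string): RuleSet {
  const raw = store.get(id);
  if (raw === undefined) throw new Error(`Unknown rule set: ${id}`);
  return JSON.parse(raw) as RuleSet;
}

// Resolve inheritance: parent rules come first, child rules override by name.
function resolve(id: string, seen: Set<string> = new Set()): Rule[] {
  if (seen.has(id)) throw new Error(`Inheritance cycle at: ${id}`);
  seen.add(id);
  const ruleSet = load(id);
  const merged = new Map<string, Rule>();
  if (ruleSet.extends !== undefined) {
    for (const rule of resolve(ruleSet.extends, seen)) merged.set(rule.name, rule);
  }
  for (const rule of ruleSet.rules) merged.set(rule.name, rule);
  return Array.from(merged.values());
}

save({
  id: "article",
  rules: [
    { name: "title", selector: "h1" },
    { name: "body", selector: ".content" },
  ],
});
save({
  id: "blog-post",
  extends: "article",
  rules: [
    { name: "body", selector: "article .post" }, // overrides the parent's "body"
    { name: "author", selector: ".author" },     // adds a new field
  ],
});

// Rule names after resolving, in order: title, body, author
const resolved = resolve("blog-post");
console.log(resolved);
```

Because the rules are plain data, the code that applies them never needs to know which site it is working on, and a new site only needs a new rule set.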

The first commit to the project, code-named “RedParser”, was made on October 7, 2015.

After some time it was renamed to Goose Parser, because Goose really impressed me with its passion for the project. For some, Goose is just a toy or a puppet, but for us it stands for the idea of a free web and free, useful scraping tools…

Coming soon…