Goose parser

Probably every web developer has, at some point in their career, tried to write two things: their own framework and a web page parser. We spent some time analyzing existing parser packages and realized we needed to create our own. We had a specific goal: to build something easily extendable that can save parsing rules in storage. That way, parsers can be built as external modules that know HOW to parse but not WHAT to parse. And Goose is a revolution in the parsing industry.

Who is Goose?

Goose is a talented web developer, expressive leader, and just a “guy” who loves pretty birds. Goose also likes to nibble grass in the garden and parse web pages. He made a trailer and is working on his own film, which shows all the benefits of the software.

Why use Goose?

You might ask a reasonable question: “Why should I use the Goose parser instead of a million others?” The answer is simple: it has a lot of rich features, and all of them are fully “promised”. So I can say, “I Promise, you will like it”.

Let’s look at the features list.

Key features

  • Declarative approach to defining parsing rules, actions, and transformations.
  • Multiple environments, so the parser runs both in the browser and on the server.
  • Clear and consistent API with promises all the way.
  • Improved Sizzle selector format.
  • Ajax and multi-page parsing modes.
  • Easily extendable.
  • 90%+ test-covered.
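
To make the declarative and promise-based bullets more concrete, here is a minimal sketch in plain JavaScript. It does not use Goose's real API: `applyRules`, `parse`, the rule fields `name`, `key` and `transform`, and the `page` object are all hypothetical stand-ins. The only point is that rules written as plain data can be saved as JSON, loaded back, and handed to a generic engine that returns a promise.

```javascript
// Toy illustration only: none of these names come from Goose itself.

// Named transformations the engine knows HOW to perform.
const transforms = {
  trim: (value) => value.trim(),
  toNumber: (value) => Number(value),
};

// Rules say WHAT to extract. They are plain data, so they serialize cleanly.
const rules = [
  { name: 'title', key: 'h1', transform: 'trim' },
  { name: 'price', key: '.price', transform: 'toNumber' },
];

// Simulate saving the rules to storage and loading them back.
const stored = JSON.stringify(rules);
const loadedRules = JSON.parse(stored);

// Generic engine: walks the rules and pulls values from a page-like object.
function applyRules(page, ruleList) {
  const result = {};
  for (const rule of ruleList) {
    const raw = page[rule.key];
    const transform = transforms[rule.transform] || ((value) => value);
    result[rule.name] = raw === undefined ? null : transform(raw);
  }
  return result;
}

// Promise-returning wrapper, in the spirit of "promises all the way".
function parse(page, ruleList) {
  return Promise.resolve().then(() => applyRules(page, ruleList));
}

// Hypothetical page content, keyed by selector for simplicity.
const page = { h1: '  Grey goose  ', '.price': '42' };

parse(page, loadedRules).then((data) => console.log(data));
```

Because the engine never hard-codes what to extract, swapping in a different stored rule set changes the output without touching the parsing code.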

Let’s see how simple it is to start and parse your first Internet page.

Habitat

First of all, I want to introduce you to the different environments Goose knows how to work with:

  • PhantomEnvironment (most popular, for server usage)
  • BrowserEnvironment (for browser usage, for example in a browser extension)
  • SeleniumEnvironment (for tests, in development)

Simply put, you can run Goose anywhere you want and get the same result.
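
One way to picture "same result anywhere" is a parser that talks only to a small environment interface. The sketch below is an assumption about the general design, not Goose's actual classes: `FakeServerEnvironment`, `FakeBrowserEnvironment`, their `getText` method and `extractTitle` are invented for illustration.

```javascript
// Toy illustration only: these classes are hypothetical, not Goose's environments.

// Each environment hides where the page lives behind the same async method.
class FakeServerEnvironment {
  constructor(pages) {
    this.pages = pages; // stand-in for content loaded by a headless browser
  }
  getText(selector) {
    return Promise.resolve(this.pages[selector] ?? null);
  }
}

class FakeBrowserEnvironment {
  constructor(lookup) {
    this.lookup = lookup; // stand-in for a DOM query function
  }
  getText(selector) {
    return new Promise((resolve) => resolve(this.lookup(selector)));
  }
}

// The parsing code depends only on getText(), never on a concrete environment.
async function extractTitle(environment) {
  const text = await environment.getText('h1');
  return text === null ? null : text.trim();
}

const server = new FakeServerEnvironment({ h1: ' Goose ' });
const browser = new FakeBrowserEnvironment((sel) => (sel === 'h1' ? ' Goose ' : null));

Promise.all([extractTitle(server), extractTitle(browser)]).then(([a, b]) => {
  console.log(a === b); // both environments yield the same value
});
```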

Goose preparation

It’s time to prepare your first run with Goose. In this article I look at PhantomEnvironment, because it is the most stable and popular one.

By passing a URL to the environment, we tell Goose where to start.

Before the parsing process

Usually we need to perform some actions on the page before parsing starts. For example, say Goose wants to find a goose babe on a dating site.
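
Pre-parse actions can themselves be described as data and executed in order. This is a minimal sketch of that idea under assumed names: the action types `click` and `wait`, the `scope` and `timeout` fields, the `handlers` table and the in-memory `page` object are hypothetical and do not reproduce Goose's action format.

```javascript
// Toy illustration only: action names and fields are assumptions, not Goose's format.

// In-memory stand-in for a page: clicking "show more" reveals extra profiles.
const page = {
  visibleProfiles: 1,
  log: [],
};

// Each handler knows HOW to perform one kind of action and returns a promise.
const handlers = {
  click: (state, action) => {
    state.log.push(`click ${action.scope}`);
    if (action.scope === '.show-more') state.visibleProfiles += 2;
    return Promise.resolve();
  },
  wait: (state, action) =>
    new Promise((resolve) => {
      state.log.push(`wait ${action.timeout}ms`);
      setTimeout(resolve, action.timeout);
    }),
};

// Declarative list of steps to run before parsing starts.
const actions = [
  { type: 'click', scope: '.show-more' },
  { type: 'wait', timeout: 10 },
];

// Runs the actions strictly one after another.
async function runActions(state, actionList) {
  for (const action of actionList) {
    const handler = handlers[action.type];
    if (!handler) throw new Error(`Unknown action type: ${action.type}`);
    await handler(state, action);
  }
  return state;
}

runActions(page, actions).then((state) => console.log(state.visibleProfiles, state.log));
```

Keeping the steps in a list like `actions` means the same runner can serve any site; only the list changes.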

Time to tweak something

For example, suppose the markup looks like this:

Guide Goose by passing rules that tell him which data matters:

Run Goose!

And get results:
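
As a rough picture of the whole flow above (markup, rules, run, results), here is a self-contained sketch. It is not Goose's selector engine or rule format: a hand-built node tree stands in for real HTML, class names stand in for selectors, and `node`, `findAll`, `parse`, the `scope` and `collection` fields, and the sample profiles are all assumptions made for illustration.

```javascript
// Toy illustration only: the tree, the rule shape and the data are invented.

// Helper to build a node with a class name, text and children.
const node = (className, text, children = []) => ({ className, text, children });

// Stand-in for page markup: a list of profiles, each with a name and an age.
const markup = node('profiles', '', [
  node('profile', '', [node('name', 'Gussie'), node('age', '3')]),
  node('profile', '', [node('name', 'Greta'), node('age', '4')]),
]);

// Depth-first search for all descendants with the given class name.
function findAll(root, className) {
  const found = [];
  for (const child of root.children) {
    if (child.className === className) found.push(child);
    found.push(...findAll(child, className));
  }
  return found;
}

// Rules: for every node matching `scope`, collect the listed fields.
const rules = {
  scope: 'profile',
  collection: [
    { name: 'name', scope: 'name' },
    { name: 'age', scope: 'age' },
  ],
};

// Applies the rules and resolves with one result row per matched scope node.
function parse(root, ruleSet) {
  const rows = findAll(root, ruleSet.scope).map((scopeNode) => {
    const row = {};
    for (const field of ruleSet.collection) {
      const [match] = findAll(scopeNode, field.scope);
      row[field.name] = match ? match.text : null;
    }
    return row;
  });
  return Promise.resolve(rows);
}

parse(markup, rules).then((results) => console.log(results));
```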

The beginning

This is just a short article about Goose’s abilities, but he is much stronger than you can imagine.

See more info in the official documentation.

See the original trailer for the project.

Patiently wait for the full movie, which will be released soon.

Give Goose a star and he will serve the world on a plate for you!

 
