
Unit Miner Documentation

Documentation for Unit Miner is available in PDF format for download (approx. 200KB)

Tutorials

The tutorials below show you, step by step, how to write your own script.

Tutorial 1 - grabbing a story from a news website

Tutorial 2 - parsing access.log

Tutorial 3 - grabbing PR articles from www.prweb.com and storing them to files

For more examples of extraction scripts, check out our live demos/examples.

Examples of use

  • Gather Financial Data
  • Gather Real Estate Data
  • Get Auction Info
  • Dating Site Info
  • Implement personalized news services
  • New innovative services

Unit Miner is a robust, flexible, and easy-to-use system for monitoring, retrieving, and mining content from web sites, documents, or any unstructured source of data.

Web Extraction Scheme

I used Unit Miner to automate the extraction of betting info. So far I am very satisfied with its performance. I'd like to thank you for all your help with my project. I would recommend your software to everybody who is searching for an information-retrieval solution.

Unit Miner service

We create a data extraction application according to your request. Receive the desired data without any software, hardware, or installation needed.

Introduction to web data extraction: How does it work?

With the explosion of the World Wide Web, a wealth of data on many different subjects has become available online. Users usually retrieve web data by browsing and keyword searching, but these traditional methods have their limitations. Browsing is not suitable for locating particular items of data: following numerous links often results in getting lost, and the method is very time-consuming. Keyword searching can be more efficient, but it often returns vast amounts of data. Data on the internet are also not structured the way they are in, for example, a database. Simply put, all the data are out there on the net, but gathering and formatting them in the desired way, and doing so often enough, is beyond human capabilities.

This is where data extraction comes into play, with the ability to retrieve data from the web, transform them, and transfer them in the desired form to websites, XML files, databases, spreadsheets, etc.

Websites differ. Data are presented differently, and the structure of the data changes heavily from one website to another. Additionally, websites use different encodings and different HTML elements to display their content. This is the first challenge for web extraction tools: the ability to extract data from varied sources.

The traditional approach to extracting data from the web is to write specialized programs called wrappers. Each wrapper is specific to one extraction. It maps data from a website, transforms them, and provides them as output (which can be another website, an XML file, a database, a spreadsheet, input for a third-party application, or virtually any structured data target). The main challenges for wrappers are that they need to distinguish between interesting data (i.e. the data we want to extract) and uninteresting data (such as code snippets, links, and ads), and to handle the multi-hierarchical, non-rigid structure in which data are presented on the web.
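To make the idea of a wrapper concrete, here is a minimal sketch in Python, using only the standard library. It is not Unit Miner's scripting language. The page markup, the `headline` class, and the names `StoryWrapper` and `extract_headlines` are assumptions made up for illustration. The wrapper keeps the interesting data (story headlines) and ignores the uninteresting data (ads, scripts, links).

```python
from html.parser import HTMLParser
import json


class StoryWrapper(HTMLParser):
    """Hypothetical wrapper: collects headline text, ignores everything else."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._in_headline = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Assumption: headlines are marked up as <h2 class="headline">.
        if tag == "h2" and dict(attrs).get("class") == "headline":
            self._in_headline = True
            self._buffer = []

    def handle_data(self, data):
        # Text is kept only while we are inside a headline element.
        if self._in_headline:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_headline:
            self.headlines.append("".join(self._buffer).strip())
            self._in_headline = False


def extract_headlines(html):
    """Map raw HTML to a structured list of headline strings."""
    parser = StoryWrapper()
    parser.feed(html)
    parser.close()
    return parser.headlines


# Made-up sample page mixing interesting and uninteresting content.
SAMPLE_PAGE = """
<html><body>
<div class="ad">Buy now!</div>
<h2 class="headline">Markets <b>rally</b> on earnings</h2>
<script>var x = 1;</script>
<h2 class="headline">Local team wins final</h2>
<a href="/more">More links</a>
</body></html>
"""

if __name__ == "__main__":
    # Provide the extracted data as structured output (JSON here).
    print(json.dumps(extract_headlines(SAMPLE_PAGE), indent=2))
```

Note that this wrapper is tied to one page layout: a different site would need its own tag and class rules, which is exactly why hand-written wrappers are costly to maintain.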

Developing wrappers manually has many shortcomings, mainly time-consuming development and maintenance. That is why sophisticated software such as Unit Miner uses a scripting language, which can radically shorten the development process. Another advantage is greater flexibility: the scripting language can handle minor modifications in document structure without any maintenance work. Our aim is to offer a data extraction solution at a reasonable price, which is why we created basic templates that help develop custom web data extraction applications even faster.
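As a rough illustration of what tolerating minor modifications in document structure can mean, the sketch below uses a Python regular expression rather than Unit Miner's own scripting language. The rule `PRICE_RULE`, the function `extract_price`, and both sample layouts are hypothetical. The rule locates a value by its label instead of by its exact position in the markup, so it keeps working when extra tags are wrapped around the value.

```python
import re

# Hypothetical extraction rule: find the number that follows the "Price:" label,
# skipping any HTML tags that sit between the label and the value.
PRICE_RULE = re.compile(
    r"Price:\s*(?:<[^>]+>\s*)*\$?(\d+(?:\.\d{2})?)",
    re.IGNORECASE,
)


def extract_price(html):
    """Return the price as a float, or None if the label is not found."""
    match = PRICE_RULE.search(html)
    return float(match.group(1)) if match else None


# Two made-up versions of the same page fragment: the second one wraps the
# value in additional elements, as might happen after a site redesign.
OLD_LAYOUT = "<td>Price: $19.99</td>"
NEW_LAYOUT = '<div class="p">Price: <span class="amount"><b>$19.99</b></span></div>'

if __name__ == "__main__":
    print(extract_price(OLD_LAYOUT))
    print(extract_price(NEW_LAYOUT))
```

A rule like this survives small markup changes, but it would still break if the label text itself changed, so it reduces maintenance work rather than removing it in every case.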
