Example - news from WALL STREET JOURNAL in Chinese

The script retrieves all news items from the left column at chinese.wsj.com and writes them to an HTML file. A cron job refreshes the output every 15 minutes.

Output:

Source code of script:

# File: expekt_main.w
# Name: WALL STREET JOURNAL in Chinese
# Description: retrieves all news from the left column at http://chinese.wsj.com/gb/strhrd.asp and writes it as HTML
# Input: URL [http://chinese.wsj.com/gb/strhrd.asp]
# Output format: HTML file
# Output fields: linked url, title, text(description)

#<Logger File>
#	Global
#	FileName wsj_log.log
#	# log all messages up to debug messages
#	Level debug
#</Logger>

<Section>
	Name wsj_main
	
	# define name of output file
	Define $output_file wsj_output.html
	
	
	
	# clear the output file
	<Action Print>
		FileName {$output_file}
		FileMode Write  
	</Action>
	
	
	
	# load content    
	<Action ContentURL>
		URL http://chinese.wsj.com/gb/strhrd.asp
		RemoveNewLine
	</Action>
    
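	# start the output with a charset header so browsers decode the GB2312 text
	# (PHP_EOL is used because '\n' in a single-quoted PHP string is not a newline)
	<Action Php>
		Code $context->setVariable('$output', $context->getVariable('$output').'<head><meta http-equiv="Content-Type" content="text/html; charset=GB2312"></head>'.PHP_EOL);
	</Action>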
	
	# find all news items (headline link plus optional summary)
	<Section While>
		NoContext
		
		# pattern for linked url and title
		<Pattern>
			RegExp <h3 class="WSJChinaTheme__headline{:re([^"]*)}"><a class="" href="{$url:re([^"]*)}">{$title:re([^<]*)}</a></h3>
		</Pattern>
		
		# pattern of text under title
		<Pattern>
			Optional
			RegExp <p class="WSJChinaTheme__summary{:re([^"]*)}"><span>{$text:re([^<]*)}</span></p>
		</Pattern>
        
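		# append one "url - title - text" line per item
		# (PHP_EOL is used because '\n' in a single-quoted PHP string is not a newline)
		<Action Php>
			Code $context->setVariable('$output', $context->getVariable('$output').$context->getVariable('$url').' - '.$context->getVariable('$title').' - '.$context->getVariable('$text').' '.PHP_EOL.'<br>');
		</Action>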
	</Section> 
	
	<Action Print>
		FileName {$output_file}
		Text {$output}
	</Action>
</Section>

Main wsj_main
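
The extraction loop in the script above can also be tried outside the scraping tool. The following is only a minimal sketch in Python, not the tool's own implementation: the sample markup, the class suffixes, the URLs, and the function names extract_items and render are all hypothetical, and it assumes that the optional summary pattern must directly follow the headline it belongs to.

```python
import re

# Hypothetical sample markup; class suffixes and URLs are made up for illustration.
SAMPLE_HTML = (
    '<h3 class="WSJChinaTheme__headline--x"><a class="" href="/articles/one">'
    'First headline</a></h3>'
    '<p class="WSJChinaTheme__summary--x"><span>First summary</span></p>'
    '<h3 class="WSJChinaTheme__headline--x"><a class="" href="/articles/two">'
    'Second headline</a></h3>'
)

# Same regular expressions as the two <Pattern> blocks, with plain capture groups.
HEADLINE = re.compile(
    r'<h3 class="WSJChinaTheme__headline[^"]*"><a class="" href="([^"]*)">([^<]*)</a></h3>'
)
SUMMARY = re.compile(
    r'<p class="WSJChinaTheme__summary[^"]*"><span>([^<]*)</span></p>'
)


def extract_items(html):
    """Return (url, title, text) tuples; text is empty when no summary follows."""
    items = []
    pos = 0
    while True:
        head = HEADLINE.search(html, pos)
        if head is None:
            break  # no more headlines: the loop ends, like <Section While>
        pos = head.end()
        # Assumption: the optional summary sits immediately after the headline.
        summary = SUMMARY.match(html, pos)
        text = ""
        if summary is not None:
            text = summary.group(1)
            pos = summary.end()
        items.append((head.group(1), head.group(2), text))
    return items


def render(items):
    """Build the HTML text: charset header first, then one line per news item."""
    lines = ['<head><meta http-equiv="Content-Type" content="text/html; charset=GB2312"></head>']
    for url, title, text in items:
        lines.append(f"{url} - {title} - {text} <br>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render(extract_items(SAMPLE_HTML)))
```

Resetting the summary text to an empty string on every iteration keeps a headline without a summary from inheriting the text of the previous item.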