ASP.NET Data Scraping

Maintaining a points table of F1 drivers by hand on a hobby website has been annoying me for a while. Recently I figured there had to be a way to pull the drivers' standings from the official website, and there was.

Using the WebClient class, a few String functions and a little Regex, I was able to pull down the drivers' standings from the official website… nice! Here is the code; it's very easy to do. Enjoy 😉

Default.aspx:

<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default.aspx.cs" Inherits="_Default" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>
</head>
<body>
    <form id="form1" runat="server">
        <asp:Label ID="OfficialF1DriverStandings" runat="server" Text="Label"></asp:Label>
    </form>
</body>
</html>

Default.aspx.cs:

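using System;
using System.Net;
using System.Text.RegularExpressions;

public partial class _Default : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        string html;

        // Download the raw HTML of the official standings page.
        using (WebClient wc = new WebClient())
        {
            html = wc.DownloadString("http://www.formula1.com/results/driver/2011/");
        }

        // Locate the container that holds the standings and the first closing </div> after it.
        int startDriversPointsIndex = html.IndexOf("<div class=\"contentContainer\">");
        int endDriversPointsIndex = startDriversPointsIndex < 0 ? -1 : html.IndexOf("</div>", startDriversPointsIndex);

        // IndexOf returns -1 when the markup is not found (e.g. the page layout changed).
        if (endDriversPointsIndex < 0)
        {
            OfficialF1DriverStandings.Text = "Driver standings are currently unavailable.";
            return;
        }

        int lengthOfDriverPointsContent = endDriversPointsIndex - startDriversPointsIndex;
        string officialF1DriverStandingsWithLinks = html.Substring(startDriversPointsIndex, lengthOfDriverPointsContent);

        // Strip the opening <a href=...> and closing </a> tags, keeping the link text.
        string officialF1DriverStandings = Regex.Replace(officialF1DriverStandingsWithLinks, @"<a\s+href[^>]+>|</a>", "", RegexOptions.IgnoreCase);

        OfficialF1DriverStandings.Text = officialF1DriverStandings;
    }
}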
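If you want to play with the substring and regex steps without hitting the official site, here is a rough standalone sketch of the same idea as a console program. The StandingsExtractor class, its Extract method and the sample HTML string are all made up for illustration; they are not part of the page above, and the real markup on the official site may well look different.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original page code.
public static class StandingsExtractor
{
    // Returns the markup from startMarker up to (but not including) the next
    // endMarker, with <a href=...> and </a> tags stripped.
    // Returns null when either marker cannot be found.
    public static string Extract(string html, string startMarker, string endMarker)
    {
        int start = html.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0)
        {
            return null;
        }

        int end = html.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end < 0)
        {
            return null;
        }

        string fragment = html.Substring(start, end - start);

        // Remove the link tags but keep the text between them.
        return Regex.Replace(fragment, @"<a\s+href[^>]+>|</a>", "", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Invented sample markup, only loosely modelled on a standings table.
        string sample =
            "<html><div class=\"contentContainer\"><table><tr>" +
            "<td><a href=\"/drivers/1\">Driver A</a></td><td>100</td>" +
            "</tr></table></div></html>";

        Console.WriteLine(Extract(sample, "<div class=\"contentContainer\">", "</div>"));
    }
}
```

Note that the fragment stops just before the first closing div tag after the marker, so it would be cut short if the container held nested div elements. For anything more involved than this, a proper HTML parser is probably the safer choice.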



Published by

Alan Feekery

Developer, Gamer, Musician, Cyclist and big Motorsport fan... enjoys the odd cup of coffee :)
