Archive for the 'Utilities' Category

SkyDrive: Why aren’t thou nicer to me? or “How to crawl pages and download all images using C#”

Ok, SkyDrive is still a baby, and personally I’ve used other services providing file space in the cloud and enjoyed them a little better. But this post isn’t about whether SkyDrive is good or bad, it’s just about a missing feature that is very painful. Someone wanted to share some photos, uploaded them to SkyDrive and all I wanted was to download them all to my PC. Tough luck: you have to click on each and every image to get to the preview page, where you click on the preview picture to finally get at the actual picture. Multiply that by about 100. I have better things to do than waste my time on that.

So a dev does what he does best: fires up Visual Studio 2008 and hacks away. (Did I just say I had something better to do? Well, I lied partially, but before I go off to do that, there is always time for some good ol’ C#.)

I’ve posted it here not as a finished utility (there are no binaries) but as a small sample. Using WebClient, Regex and some other stuff it downloads the list page of the SkyDrive folder, fetches each preview page and then downloads the actual image to a folder on the hard disk. Not really rocket science and of course there are a few quirks (no real error handling for example), but it’s just a sample. Feel free to extend it as you wish, but don’t blame me if it starts downloading gigabytes of files overnight because you accidentally crawled a honeypot. (And yes, it only downloads jpgs at the moment. I didn’t need any other types.)

May those SkyDrive bytes be with you…


/**********************************************************************************
 *
 * Example Application for crawling web pages and downloading images.
 *
 * This code works if you pass in a SkyDrive Folder Url (http://.... /browse.aspx/...)
 * and will download any jpg images it finds in there.
 *
 * Permission to use, copy, modify, distribute and sell this software and its
 * documentation for any purpose is hereby granted without fee.
 * I make no representations about the suitability of this software for any purpose.
 * It is provided "as is" without express or implied warranty.
 *
 * Alex Duggleby - 24.05.08 - V0.9 - http://alexduggleby.com
 *
 **********************************************************************************/
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.Text.RegularExpressions;
using System.IO;
using System.Web;
using System.ComponentModel;

namespace Tools.SkyDrive.DownloadAll
{
    class Program
    {
        // Used for tracking how many items we have left
        private static int _wcInnerCount = 0;
        private static int _wcInnerCompleted = 0;

        // We have to start somewhere
        private static Uri _uriStart;

        // Work we have already done
        private readonly static List<string> _urisCrawled = new List<string>();
        private readonly static List<string> _imagesDownloaded = new List<string>();

        // Download images to?
        private readonly static DirectoryInfo _diDownloadTo = new DirectoryInfo(Path.Combine(Path.Combine(System.Environment.GetFolderPath(Environment.SpecialFolder.Personal), "Downloads"),"Images"));

        // This finds urls in the page
        private readonly static Regex _regexUrl = new Regex("href\\s*=\\s*(?:(?:\\\"(?<url>[^\\\"]*)\\\")|(?<url>[^\\s]*))");

        // This finds the open url in the image page
        private readonly static Regex _regexUrlOpen = new Regex("href\\s*=\\s*(?:(?:\\\"(?<url>[^\\\"]*)\\\")|(?<url>[^\\s]*)) title=\\\"Open\\\""); 

        /// <summary>
        /// Takes the url to a skydrive folder page and downloads all jpg images.
        /// </summary>
        static void Main(string[] args)
        {
            // Usage check
            if (args.Length != 1)
            {
                Console.WriteLine("Usage: App.exe http://theUrlToThe/SkyDrive/FolderPage");
                return;
            }

            try
            {
                // First parameter is url
                _uriStart = new Uri(args[0]);
            }
            catch (Exception _ex)
            {
                Console.WriteLine("Invalid Url. " + _ex.Message);
                return;
            }

            // Make sure download directory exists
            if (!_diDownloadTo.Exists) _diDownloadTo.Create();

            using (WebClient _wc = new WebClient())
            {
                // This is the index with all the images
                string _pageContents = _wc.DownloadString(_uriStart);

                // Each image has a preview page, so we get the url to that, before we get the url to the actual image
                foreach (Match _matchUrlToImagePage
                    in _regexUrl.Matches(_pageContents))
                {
                    Uri _uriToImagePage =
                        new Uri(_uriStart, HttpUtility.HtmlDecode(_matchUrlToImagePage.Groups["url"].Value));

                    CrawlPreviewPage(_uriToImagePage);
                }
            }

            // Wait for the async web clients to complete. All downloads have been
            // started by now, so the total count no longer changes.
            Console.WriteLine("Waiting for images to complete...");
            while (_wcInnerCompleted < _wcInnerCount)
            {
                System.Threading.Thread.Sleep(250);
            }

            Console.WriteLine("Should be finished!");
            Console.ReadLine();
        }

        /// <summary>
        /// Parses the preview page and finds the actual image link
        /// </summary>
        /// <param name="uriToImagePage">The url to the preview page</param>
        private static void CrawlPreviewPage(Uri uriToImagePage)
        {
            using (WebClient _wc = new WebClient())
            {
                if (!_urisCrawled.Contains(uriToImagePage.ToString()))
                {
                    _urisCrawled.Add(uriToImagePage.ToString());

                    if (uriToImagePage.ToString().ToLower().EndsWith(".jpg"))
                    {
                        string _pageContents = _wc.DownloadString(uriToImagePage);

                        // Find the image we want to download... There should be
                        // only one link with title="Open" in it.
                        foreach (Match _matchImage in _regexUrlOpen.Matches(_pageContents))
                        {
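                            // Decode the attribute value and resolve it against the preview page,
                            // so relative and HTML-encoded links work too
                            Uri _uriToImage = new Uri(uriToImagePage,
                                HttpUtility.HtmlDecode(_matchImage.Groups["url"].Value));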

                            DownloadImage(_uriToImage);
                        }
                    }
                }
            }
        }

        /// <summary>
        /// Downloads async'ly an image from a Uri
        /// </summary>
        /// <param name="uriToImage">The uri to download</param>
        private static void DownloadImage(Uri uriToImage)
        {
            // Output the url
            Console.WriteLine("{0}{1}", uriToImage.ToString(), Environment.NewLine);

            if (!_imagesDownloaded.Contains(uriToImage.ToString()))
            {
                _imagesDownloaded.Add(uriToImage.ToString());
                string _lowerUrl = uriToImage.ToString().ToLower();

                // Simple checking
                if (_lowerUrl.EndsWith(".jpg") &&
                   (!_lowerUrl.Contains("browse")) &&
                   (!_lowerUrl.Contains("self")))
                {
                    // Unescape here because Uri.Segments returns percent-encoded text (e.g. %20)
                    string _localFilename = Uri.UnescapeDataString(
                        uriToImage.Segments[uriToImage.Segments.Length - 1]);

                    // Create a valid local filename (file name chars, not path chars:
                    // the path set does not cover ':', '?' or '*')
                    foreach (char _c in Path.GetInvalidFileNameChars())
                    {
                        _localFilename = _localFilename.Replace(_c, '_');
                    }

                    Console.Write("Downloading {0}...{1}", _localFilename, Environment.NewLine);

                    // Create a separate web client for each image (uses async, and you can't
                    // issue two downloads at the same time for the same client). Of course
                    // here we should be using some kind of pooling but this is the quickest
                    // way to do it. The client must not be disposed before the download has
                    // finished, so there is no using block here.
                    WebClient _wcInner = new WebClient();

                    // Attach the handler before starting, otherwise a fast download
                    // could complete before anyone is listening.
                    _wcInner.DownloadFileCompleted += new AsyncCompletedEventHandler(_wcInner_DownloadFileCompleted);
                    System.Threading.Interlocked.Increment(ref _wcInnerCount);
                    _wcInner.DownloadFileAsync(uriToImage, Path.Combine(_diDownloadTo.FullName, _localFilename));
                }
            }
        }

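        // Is fired when a download completes. We output status and check if we are finished!
        private static void _wcInner_DownloadFileCompleted(object sender, AsyncCompletedEventArgs e)
        {
            // The download is over, so the client can be released now
            WebClient _wcDone = sender as WebClient;
            if (_wcDone != null) _wcDone.Dispose();

            // Report failures instead of silently counting them as successes
            if (e.Error != null)
            {
                Console.WriteLine("Download failed: " + e.Error.Message);
            }

            // Increase the completed counter (handlers may run on different threads)
            int _completed = System.Threading.Interlocked.Increment(ref _wcInnerCompleted);

            // Ok, we could do some more extensive checking, this could trigger
            // even if there are still items to download... but hey, it's just a
            // quick utility!
            if (_completed == _wcInnerCount)
            {
                Console.WriteLine("{0}{1}{2}", Environment.NewLine, "Finished all files!", Environment.NewLine);
            }
            else
            {
                Console.WriteLine("File {0} of {1} completed!", _completed, _wcInnerCount);
            }
        }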
    }
}
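
To see the link-extraction step in isolation, here is a minimal sketch that runs a simplified href pattern over an HTML string and resolves each hit against a base address. It is not part of the utility above: the class name LinkExtractionSketch and the method FindJpgLinks are made up for illustration, the pattern only handles quoted attributes, and System.Net.WebUtility (available from .NET 4.0) stands in for HttpUtility so no System.Web reference is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class LinkExtractionSketch
{
    // Simplified, quoted-only variant of the href pattern used above.
    private static readonly Regex HrefPattern =
        new Regex("href\\s*=\\s*\"(?<url>[^\"]*)\"", RegexOptions.IgnoreCase);

    // Returns the absolute URIs of all links whose path ends in ".jpg".
    public static List<Uri> FindJpgLinks(Uri baseUri, string html)
    {
        List<Uri> result = new List<Uri>();

        foreach (Match match in HrefPattern.Matches(html))
        {
            // Attribute values are HTML-encoded (&amp; etc.), so decode first.
            string href = WebUtility.HtmlDecode(match.Groups["url"].Value);

            // Resolve relative links against the page they were found on.
            Uri absolute;
            if (!Uri.TryCreate(baseUri, href, out absolute))
                continue; // skip anything that is not a usable URI

            // Check the path only, so a query string does not hide the extension.
            if (absolute.AbsolutePath.EndsWith(".jpg", StringComparison.OrdinalIgnoreCase)
                && !result.Contains(absolute))
            {
                result.Add(absolute);
            }
        }

        return result;
    }
}
```

Calling FindJpgLinks with the folder address and the downloaded page text gives back the list of candidate links. Unlike the listing above, this sketch tests the path rather than the whole address, which is a design choice of the sketch and not something the original code does.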

A Reusable Secure Login Form

For the impatient, source and binaries and quick explanation available at: http://code.msdn.microsoft.com/UtilsCommonViews/

How many of the last ten projects you implemented had a login form of some kind? Thanks to all those wonderful APIs that you coded applications against, probably the majority of them did. Now try to remember how many of those login forms used strings for storing passwords in memory and how many saved the password to a settings file in plain text. I know most of my small utilities did and I’m not proud of it, but it happened and I’m almost sure you have some of those lying around in your src folder.

Anyway, today I had a couple of hours of spare time and decided to implement a reusable secure login form. My goals were to use SecureString for storing the password in memory and to encrypt the username and password in a settings file for storing them between sessions. I started off with a little research about the components I’d need to use.

SecureString is a class that was introduced in .NET Framework 2.0. Why not simply use a string for storing a password? Well, let’s have MSDN do the explaining:

An instance of the System.String class is both immutable and [...] cannot be programmatically [...] deleted from computer memory. Consequently [...] there is a risk the information [stored in it] could be revealed after it is used [...].

A SecureString object is similar to a String object in that it has a text value. However, the value of a SecureString object is automatically encrypted, can be modified until your application marks it as read-only, and can be deleted from computer memory by either your application or the .NET Framework garbage collector.

That’s a great scenario for our password. Next up I found this great secure textbox control that handles the user input into a SecureString object. The original control and source can be downloaded at: http://www.theglavs.com/DownloadItem.aspx?FileID=46. (Thanks to Glav and Dominik Zemp who created or extended the control.)
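
For readers who have not used the class before, here is a minimal sketch of how a SecureString gets filled. It is not taken from the textbox control mentioned above; the class name SecureStringSketch and the method FromChars are hypothetical, and a real form would append each character as the key is pressed instead of starting from an array.

```csharp
using System;
using System.Security;

// Hypothetical helper, for illustration only.
public static class SecureStringSketch
{
    // Copies the characters one by one into a SecureString and locks it.
    // The caller owns the result and should dispose it when done.
    public static SecureString FromChars(char[] chars)
    {
        if (chars == null)
            throw new ArgumentNullException("chars");

        SecureString secure = new SecureString();
        foreach (char c in chars)
        {
            secure.AppendChar(c);
        }

        // After this call any attempt to modify the value throws.
        secure.MakeReadOnly();
        return secure;
    }
}
```

The result is best wrapped in a using block, since disposing a SecureString is what removes its contents from memory.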

Then I wrote this simple extension method to get the characters out of the SecureString, because it can get quite dirty to do it inline. It requires execution of unmanaged code and some marshalling.


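// Needs: using System; using System.Runtime.InteropServices; using System.Security;
// As an extension method it has to be declared inside a static class.
public static Char[] GetCharacters(this SecureString secureString)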
{
	if (secureString == null)
		throw new ArgumentNullException("secureString");

	lock (secureString)
	{
		char[] _chars = new char[secureString.Length];
		IntPtr _ptrToChars = IntPtr.Zero;
		try
		{
			_ptrToChars = Marshal.SecureStringToBSTR(secureString);
			Marshal.Copy(_ptrToChars, _chars, 0, secureString.Length);
		}
		finally
		{
			if (_ptrToChars != IntPtr.Zero)
				Marshal.ZeroFreeBSTR(_ptrToChars);
		}

		return _chars;
	}
}

The second feature was to encrypt the username and password in a file. I looked at encrypting the configuration file but decided against that path and went with encrypting the data using the ProtectedData API (in System.Security.Cryptography). It’s a simple call to the static method:


byte[] ProtectedData.Protect(byte[] userData, byte[] optionalEntropy, DataProtectionScope scope);

The entropy acts as a kind of salt for your application, and the DataProtectionScope determines whether only the current user can decrypt the data or any user on the local machine. The library will save a file UserData.bin to the local application directory (the path includes the calling assembly’s name). Additionally I decided to encrypt the file using the FileInfo.Encrypt method. This works only on NTFS, so the library will currently only work on NTFS systems. Now that I’ve written that sentence I think I’ll make that optional in the next release.

Which brings me to the release and the source code. I want to try out MSDN code gallery so I’ve published the project here: http://code.msdn.microsoft.com/UtilsCommonViews/
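
The ProtectedData call described above can be sketched as a round trip. This is not the library’s actual code: the class name ProtectedDataSketch, the two method names and the entropy value are assumptions made for the example. ProtectedData wraps the Windows data protection API, so the sketch only runs on Windows, and on the .NET Framework the project needs a reference to System.Security.dll.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, for illustration only.
public static class ProtectedDataSketch
{
    // Made-up per-application entropy. Any fixed byte sequence will do,
    // but exactly the same bytes must be passed to Unprotect later.
    private static readonly byte[] Entropy = Encoding.UTF8.GetBytes("MyApp.LoginForm");

    // Encrypts the bytes so that only the current Windows user can read them.
    public static byte[] Encrypt(byte[] plain)
    {
        return ProtectedData.Protect(plain, Entropy, DataProtectionScope.CurrentUser);
    }

    // Reverses Encrypt. Throws a CryptographicException if the data was
    // protected by another user or with different entropy.
    public static byte[] Decrypt(byte[] cipher)
    {
        return ProtectedData.Unprotect(cipher, Entropy, DataProtectionScope.CurrentUser);
    }
}
```

The byte array returned by Encrypt is what would be written to a file such as UserData.bin. Switching to DataProtectionScope.LocalMachine would let any account on the same machine decrypt it.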

Using the library is very simple:


LoginController _loginController = new LoginController();
_loginController.GetCredentials();

string _username = _loginController.Username;
char[] _password = _loginController.Password;
// _password needs to be zero'ed a.s.a.p. after usage

That last comment is important. If you don’t zero out that char array, the password will be floating around in memory.
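
One way to make sure the array really is wiped is a try/finally around whatever uses it. The following is only a sketch and not part of the library: the class name PasswordBufferSketch and the method UseAndWipe are hypothetical.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class PasswordBufferSketch
{
    // Runs the given action with the password and overwrites the array
    // with zero characters afterwards, even if the action throws.
    public static void UseAndWipe(char[] password, Action<char[]> action)
    {
        if (password == null)
            throw new ArgumentNullException("password");
        if (action == null)
            throw new ArgumentNullException("action");

        try
        {
            action(password);
        }
        finally
        {
            Array.Clear(password, 0, password.Length);
        }
    }
}
```

With the two-line usage shown earlier, _loginController.Password would be passed as the first argument and the actual login call would go into the action. This clears the array itself and nothing more: any copy the action makes, for example by building a string from it, stays in memory.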

Comments and suggestions are of course welcome! Here or in the MSDN code gallery.

Note: The project uses another utility library of mine (Utils.Extensions) that I’ve built up over the last weeks. It contains a few extension methods for common stuff I was missing in the BCL. The source is not clean enough yet for publishing but it’s in the pipe. For now it’s only available in this project as a release dll.

Tech-Ed: Download session files now

My last Tech-Ed report is still missing and will be coming shortly, but I wanted to share a quick tool that I wrote to download all of the session slides and things that were available while I was still at Tech-Ed. I thought using the network there might be faster than doing it at home, but I was wrong, so I ended up downloading it from here.

I know some companies can’t wait for the DVD to be delivered and want the slides now, so this tool will simply let you log in to your MS Events site, then press “Start” to download the files. It will parse the download pages and then download the files to a directory of your choice. It’s quick and dirty, but it works. There isn’t a tremendous amount of error handling and you will have to follow the process of logging in first and then pressing “Start”. If you want to change something, feel free to do so; a link back to here would be nice if you use or change anything.

For the sake of having a license at all, code and binaries are subject to the Common Public License.

Download Binaries or Source.

Update: Some people have to log in, then click on “PPTX files” and then press start to download the files. But you should only have to log in and press start.
