PowerShell for Unix nerds
(This post was inspired by a question on ServerFault)
Windows has had an increasingly useful scripting language in PowerShell since 2006. After Microsoft apparently fell in love with backend developers a while back, they’ve even ported its modern version to GNU/Linux and macOS. This is actually a big deal for those of us who prefer our workstations to run Unix but have Windows servers to manage on a regular basis.
Coming from a background in Unix shell scripting, how do we approach the PowerShell mindset? In theory it’s simple to say that Unix shells are string-based while PowerShell is object-oriented, but what does that mean in practice? Let me try to present a concrete example to illustrate the difference in philosophy between the two worlds.
We will parse some system logs on an Ubuntu server and on a Windows server respectively to get a feel for each system.
Task 1, Ubuntu
The first task we shall accomplish is to find events that occur between 04:00 and 04:30 every morning.
On Ubuntu, logs are regular text files. Each line clearly consists of predefined fields delimited by space characters, and starts with a timestamp: the date followed by the time in hh:mm:ss format. We can find anything that happens during the hour “04” of any day in our retention period with a naïve grep for “ 04:”:
(Note that I use zgrep to also analyze the archived, rotated log files.)
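A minimal sketch of such a search, run against throwaway files instead of the real logs. The log lines, the hostname web01 and the temporary directory are made up for illustration; on a real Ubuntu server the file arguments would be the system logs themselves.

```shell
#!/bin/sh
# Build a throwaway directory with made-up syslog-style lines (not real log data).
workdir=$(mktemp -d)
cat > "$workdir/syslog" <<'EOF'
Mar 12 03:59:58 web01 cron[811]: job finished
Mar 12 04:05:01 web01 CRON[902]: session opened for user root
Mar 12 04:45:10 web01 sshd[977]: Accepted publickey for admin
EOF
# Simulate a rotated, compressed log similar to what log rotation leaves behind.
printf 'Mar 11 04:17:33 web01 systemd[1]: Started daily cleanup\n' | gzip > "$workdir/syslog.1.gz"

# zgrep searches plain and gzip-compressed files alike. When given several
# files it prefixes each hit with "<filename>:".
hour_hits=$(zgrep " 04:" "$workdir"/syslog*)
printf '%s\n' "$hour_hits"

# Clean up the sample files.
rm -rf "$workdir"
```

The leading space in the pattern matters: it anchors “04” to the start of the time field, so a minute or second value of 04 elsewhere in the timestamp is not mistaken for the hour.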
Of course this particular search results in twice as much data to sift through as we originally wanted, which may be cumbersome on a busy server. Let’s complement our command with a simple regular expression to filter the results:
Mission accomplished: We’re seeing all system log events between 04:00:00 and 04:29:59 for each day stored in our log retention period. To clarify the command, each bracket represents one position in our search string and defines the valid characters for this specific position.
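The bracket-per-position idea can be sketched as follows. The sample lines are invented, and plain grep on a here-string stands in for zgrep on the real log files; the pattern itself is one plausible way to express “04:00:00 through 04:29:59”, not necessarily the only one.

```shell
#!/bin/sh
# Made-up sample lines in the classic syslog layout (not real log data).
sample='Mar 12 03:59:58 web01 cron[811]: job finished
Mar 12 04:00:00 web01 CRON[902]: session opened for user root
Mar 12 04:29:59 web01 sshd[977]: Accepted publickey for admin
Mar 12 04:30:00 web01 sshd[978]: Connection closed
Mar 12 14:04:12 web01 kernel: eth0 link up'

# One bracket per character position:
#   " 04:"      the hour, preceded by a space so that "14:04:12" cannot match
#   [0-2][0-9]  minutes 00 through 29
#   [0-5][0-9]  seconds 00 through 59
half_hour_hits=$(printf '%s\n' "$sample" | grep " 04:[0-2][0-9]:[0-5][0-9]")
printf '%s\n' "$half_hour_hits"
```

Only the 04:00:00 and 04:29:59 lines survive the filter; 04:30:00 is rejected because its first minute digit falls outside [0-2].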
Bonus knowledge:
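[0-9] can be substituted with \d, which translates into “any digit”, provided grep is run with Perl-compatible regular expressions (the -P option in GNU grep); the default syntax does not understand \d. I used the longer form here for clarity.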
Task 2, Ubuntu
Now let’s identify the process that triggered each event. We’ll look at a line from the output of the last command to get a feeling for how to parse it:
This can be translated into a general form:
<filename>:<MMM DD hh:mm:ss> <hostname> <procname[procID]>: <message>
Let’s say we want to filter the output from the previous command and only see the process information and message. Since everything is a string, we’ll pipe grep to a string manipulation command. This particular job looks like a good use case for GNU cut. With this command we need to define a delimiter, which we know is a space character, and then we need to count spaces in our log file format to see that we’re interested in what corresponds to “fields” number 5 and 6. The message part of each line, of course, may contain spaces, so once we reach that field we’ll want to show the entire rest of the line. The required command looks like this:
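A sketch of the cut step, using invented lines shaped like the general form above. The file names, hostname and messages are assumptions for illustration; in practice the input would be piped in from the zgrep command.

```shell
#!/bin/sh
# Made-up lines following the general form:
# <filename>:<MMM DD hh:mm:ss> <hostname> <procname[procID]>: <message>
hits='/var/log/syslog:Mar 12 04:05:01 web01 CRON[902]: session opened for user root
/var/log/syslog.1:Mar 11 04:17:33 web01 systemd[1]: Started daily cleanup'

# -d ' ' splits on single spaces; -f 5- keeps field 5 (the process)
# and everything after it, so spaces inside the message survive.
proc_and_msg=$(printf '%s\n' "$hits" | cut -d ' ' -f 5-)
printf '%s\n' "$proc_and_msg"
```

One caveat with counting spaces: the classic syslog timestamp pads single-digit days with an extra space (“Mar  5”), and cut treats that as an empty field, which shifts every later field by one. A whitespace-aware tool such as awk is a possible alternative if that becomes a problem.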
Now let’s do the same in Windows:
Task 1, Windows
Again our task is to find events between 04:00 and 04:30 on any day. As opposed to our Ubuntu server, Windows treats each line in our log as an object, and each field as a property of that object. This means that we will get no results at best and unpredictable results at worst if we treat our log as a searchable mass of text.
Let’s look at two examples that won’t work and why:
Wrong answer 1
This looks nice, but it implicitly only gives us log events between the given times on the current day.
Wrong answer 2
Windows can use regular expressions just fine in this context, so that’s not the issue here. What’s wrong is that we’re searching the actual object instance for the pattern, not the contents of the object’s properties.
Right answer
If we remember that PowerShell works with objects rather than plain text, the conclusion is that we should be able to query for properties within each line object. Enter the Where-Object cmdlet, usually written with its alias where or ?:
What did we do here?
The first few characters after the pipe can be read as “For each line, check whether this line’s property ‘TimeGenerated’ matches…”.
One of the things we “just have to know” to understand what happened here is that the column name “Time” in the output of the Get-EventLog command doesn’t represent the actual name of the property. Looking at the output of get-eventlog | fl shows us that there’s one property called “TimeWritten”, and one property called “TimeGenerated”. We’re naturally looking for the latter one.
This was it for the first task. Now let’s see how we pick up the process and message information in PowerShell.
Task 2, Windows
By looking at the headers from the previous command, we see that we’re probably interested in the Source and Message columns. Let’s try to extract those:
The only addition here is that we pipe the query hits to the Format-Table cmdlet and tell it to include the contents of the Source and Message properties of each passed object.
Summary
PowerShell is different from traditional Unix shells, and by trying to accomplish a specific task in both we’ve gained some understanding of how they differ:
- When piping commands together in Unix, we’re sending one command’s string output to be parsed by the next command.
- When piping cmdlets together in PowerShell, we’re instead sending entire objects with properties and all to the next cmdlet.
Anyone who has tried object-oriented programming understands how the latter is potentially powerful, just as anyone who has “gotten” Unix understands how the former is potentially powerful. I would argue that it’s easier for a non-developer to learn Unix than to learn PowerShell, that Unix allows for a more concise syntax than PowerShell, and that Unix shells execute commands faster than PowerShell in many common cases. However, I’m glad that there’s actually a useful, first-party scripting language available in Windows.
Getting things done in PowerShell is mainly a matter of turning around and working with entire objects and their properties (whose values may, but need not, be strings) rather than with strings directly.