I ordered some tools and parts yesterday, from Adafruit. The parts are just more of the same: lights, resistors, etc. The tool is a USBtinyISP, an in-system/in-circuit programmer for most of the 8-bit Atmel microcontrollers.
The USBtinyISP is $22, compared to the official Atmel AVRISP mk2, at $33. Neither supports JTAG or debugWIRE, but I find it hard to complain at those prices. They seem comparable at the feature level. The Atmel AVRISP mk2 is potentially faster, since Lady Ada's design uses a software USB implementation, which only runs in USB low-speed mode, or 1.5Mbps. The Atmel ISP is full speed (12Mbps), although programming only takes a few seconds either way.
The Adafruit design does support some bit-banging operations, though, and the ISP is based on a generic ATtiny2313, so I can ransack and pillage the firmware as necessary. Perhaps debugWIRE can be added on this way.
I'll have a tidy review up after I receive it and get a chance to get down to business with it.
With the ISP, I can grow a bit away from the Arduino projects. I'm looking forward to programming bare ATmega and ATtiny microcontrollers, and making them do my bidding!
A sort of drunkard's walk of the IT industry, geeky fun, science and whatever other trivia catches my interest.
Tuesday, July 7, 2009
Monday, June 15, 2009
Low Rent Oscilloscope
A post found via Hackedgadget gives a description and code for an Arduino-based oscilloscope. It looks pretty cool, and I must confess I certainly need an oscilloscope (a logic analyzer would be awesome, too) so I had a look. Before delving too deep here, have a look at the original post, please.
As discussed here before, the Arduino is based on one of the Atmel ATmega-8 family of chips. The various ATmega-8 family members all seem to use the same analog-to-digital (A/D) conversion module, which can sample at up to 10 bits of resolution, at sample rates from somewhere between DC and 75kHz, depending on resolution.
The oscilloscope code uses basic Arduino calls to read from the analog pin, and send the data out the Arduino serial port. The Arduino analog input functions don't allow you to specify a rate or resolution. You get 10 bits, and you get one sample per call. The Arduino docs say it takes about 100µs to get a sample, so about a 10kHz sample rate, for a Nyquist frequency of 5kHz. So assuming you can get your data out of the chip fast enough, we have an oscilloscope with a 5kHz bandwidth. Honestly, that isn't so bad for $25 worth of parts and a few dozen lines of code.
Unfortunately, there are some other problems. The code sets up the serial port at 9600 bps. An RS232 serial port (typically) transmits 10 bits per byte: 8 data bits plus a start and a stop bit. The code sends two bytes to get all 10 bits of data across.
9600/10 = 960 bytes per second
960/2 = 480 samples per second
480/2 = Nyquist frequency of 240Hz.
So this 'scope has a bandwidth of 240Hz. Despite this low frequency, this is still useful! If you re-coded both ends to support multiple analog pins you could use it as a logic analyzer for step-by-step debugging, etc. You can use it for measuring the kind of low-frequency stuff that's handy around the house. Like, you know, 50- or 60Hz house wiring.
But some comments in the original post say that they've bumped the serial rate up to 38400 bps, for a bandwidth of 960Hz. The comments indicate that they've had some trouble going beyond that. The A/D converter should have no trouble sampling beyond those rates, so it may be a matter of pipelining the conversions and the serial sends appropriately, or of modifying the underlying Arduino code (or writing your own handlers) to reduce the resolution, switch to interrupt-based A/D, etc.
I don't see many situations for a low-rent solution like this where one would need more than 8-bits of resolution. That would automatically double your sample rate over the serial port, and reduce the A/D settling time. Faster sampling, faster sending. With this (hopefully) simple change, we should see a bandwidth of 1.92kHz with a serial rate of 38400.
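The arithmetic above is easy to put into a few lines of Python to try other serial rates. This is only a sketch: the function name `nyquist_hz` is made up here, and it assumes the serial link is the only bottleneck, with 10 bits on the wire per byte.

```python
def nyquist_hz(baud, bytes_per_sample, bits_per_byte=10):
    """Highest signal frequency the serial link can carry, in Hz.

    Assumes the serial port is the only bottleneck and that each byte
    costs bits_per_byte bits on the wire (8 data + start + stop).
    """
    bytes_per_second = baud / bits_per_byte
    samples_per_second = bytes_per_second / bytes_per_sample
    return samples_per_second / 2  # Nyquist: half the sample rate


# The three cases discussed in the post: the original code, the
# faster serial rate, and the faster rate with 8-bit samples.
for baud, width in [(9600, 2), (38400, 2), (38400, 1)]:
    print(f"{baud} bps, {width} byte(s)/sample: {nyquist_hz(baud, width):g} Hz")
```

Running it should reproduce the 240Hz, 960Hz and 1.92kHz figures worked out by hand.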
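Additionally, there seems to be a 64-byte serial buffer to work with, although as far as I can tell it lives in the serial library's software rather than in the ATmega's hardware UART, which only holds a byte or two itself. There seems to be no reason we couldn't use the A/D in interrupt mode, and have it stuff bytes in the buffer, for automatic sending. The sample rate and serial rate would have to be carefully balanced, though: if samples arrive faster than the serial port can drain them, the buffer overflows. Oh, and the Arduino code environment doesn't directly support it...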
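To get a feel for that balance, here is a rough Python sketch that runs the earlier arithmetic backwards: given a sample rate, what serial rate is needed to keep up? The function names and the list of baud rates are my own choices for illustration, not anything from the Arduino environment.

```python
# Common serial rates to choose from (an assumed, illustrative list).
STANDARD_BAUDS = (9600, 19200, 38400, 57600, 115200)


def required_baud(sample_rate_hz, bytes_per_sample=1, bits_per_byte=10):
    """Bits per second needed to ship every sample as it is taken."""
    return sample_rate_hz * bytes_per_sample * bits_per_byte


def slowest_sufficient_baud(sample_rate_hz, bytes_per_sample=1):
    """Lowest rate in STANDARD_BAUDS that keeps up, or None if none does."""
    needed = required_baud(sample_rate_hz, bytes_per_sample)
    for baud in STANDARD_BAUDS:
        if baud >= needed:
            return baud
    return None
```

For example, streaming the full 10kHz sample rate at one byte per sample needs 100,000 bps, so `slowest_sufficient_baud(10000)` picks 115200. At two bytes per sample nothing in the list is fast enough and the function returns `None`.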
Sunday, June 14, 2009
Project Complete(?)
Finished writing the code for my little project. The Arduino code needed a couple of tweaks to correct some flaws: one typo, and one poor design choice. Then I tweaked the serial output code on my Processing front end. Now I have a three-slider UI to control the 24-bit RGB output of one I2C-controlled LED.
150 lines of Processing code (I could get this down by about a third with some tuning, but who cares? It's Java, it's supposed to be bloated.)
50 lines of code in the Arduino, although all the handy little Arduino libraries that make the C/C++ so compact mean the binary on the controller is 3700+ bytes long.
Pictures and video later.
Saturday, June 13, 2009
Arduino Project I
I've spent most of the evenings of the last week working on code, and experimenting with I2C communication between my Arduino and one of the BlinkMs I bought. (The other one is defective, and I'm not getting much response from Sparkfun, the company I bought it from, in getting it replaced. I'm going to start sending angry email next week.)
Having nailed down the I2C communications, I started working on a simple user interface, written in Processing. The UI displays a set of simple sliders that represent the level of each of the primary additive colors: red, green and blue. The sliders run from 0 to 254 (255 is a special marker character in the serial protocol I wrote for my Processing UI and my Arduino code to talk to each other). It also has a box in the UI panel that shows an approximation of the color. Whenever you adjust the sliders, the preview box updates, and the update string is sent out the serial port to the Arduino. The Arduino then sends the appropriate color change command over I2C to the BlinkM.
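A marker-byte protocol like this can be sketched in a few lines of Python. The frame layout here (one 255 marker followed by red, green and blue) is an assumption for illustration, as are the names `encode_frame` and `decode_stream`; the real Processing and Arduino code may order things differently.

```python
MARKER = 255     # reserved start-of-frame byte, never a valid level
MAX_LEVEL = 254  # sliders stop here so data can't be mistaken for MARKER


def encode_frame(red, green, blue):
    """Build one update: marker byte, then three clamped channel levels."""
    levels = [min(max(int(v), 0), MAX_LEVEL) for v in (red, green, blue)]
    return bytes([MARKER] + levels)


def decode_stream(data):
    """Pull complete (r, g, b) frames out of a raw byte stream.

    Bytes seen before the first marker are ignored, and a marker in the
    middle of a frame restarts it, so the receiver can resynchronize.
    """
    frames = []
    pending = None  # None until a marker has been seen
    for byte in data:
        if byte == MARKER:
            pending = []
        elif pending is not None:
            pending.append(byte)
            if len(pending) == 3:
                frames.append(tuple(pending))
                pending = None
    return frames
```

Reserving 255 is what makes resynchronizing possible: because no slider value can ever equal the marker, the receiver can always find the start of the next frame, even after a dropped byte.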
I'll probably adapt the Arduino code to directly run a common-cathode LED off the PWM pins as well. I can update the protocol and the UI to run both the BlinkM and the direct LED at once, with two different colors. Longer term will be to code Arduino light scripts, a sequencer (I don't like the one that is supplied with the BlinkM) etc.
Tuesday, June 2, 2009
Arduino Pazzia
I'm going to start posting my microcontroller adventures here, since this is an appropriate place to do it.
So about a month ago, I bought an Arduino from the Maker Shed, since it was on sale.
The Arduino is a broadly open-source project. The circuit board is open source, and available in just about any format, in several different types. The software is open source, and even includes a friendly and easy to learn development environment.
The core chip in the Arduino family of boards is the Atmel ATmega. The ATmega is an 8-bit microcontroller family. It's both low-power and high-performance. It's available in several versions with different amounts of flash and RAM on-board. The board I have uses the ATmega 328P, which has 32KB of flash and 2KB of SRAM.
Almost all of the pins on the chip are multipurpose IO pins. Some of them can be analog inputs, digital inputs, digital outputs, or PWM outputs with 8 bits of PWM control.
I've been playing around with the analog and digital output side with some LEDs. I did the usual Arduino "hello world" program, which is just to connect a single LED to an output and ground, and make it blink. (Most of the Arduinos actually have an LED and resistor on one output for this sort of thing.)
I bought a dozen high-brightness green LEDs, and connected 6 of them up to outputs on the Arduino.
- I got them doing the Cylon/Knight Rider LED sweep.
- I got them doing a sine wave (at several frequencies and phases)
- I got them doing a cool analog sweep back and forth
- I made them into a 60-second binary timer. Tick tick tick tick...
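The logic behind two of those patterns is small enough to model in Python before committing it to the board. This is a sketch only: the function names are made up, and treating LED 0 as the least significant bit is an assumption.

```python
NUM_LEDS = 6  # six LEDs hold values 0..63, enough for a 60-second count


def binary_timer_leds(seconds):
    """On/off state of each LED for the binary timer (index 0 = lowest bit)."""
    value = seconds % 60  # wrap around every minute
    return [bool((value >> bit) & 1) for bit in range(NUM_LEDS)]


def cylon_position(step):
    """Index of the lit LED for the back-and-forth sweep at a given step."""
    period = 2 * (NUM_LEDS - 1)  # 10 steps to go there and back
    phase = step % period
    return phase if phase < NUM_LEDS else period - phase
```

On the Arduino itself, each entry of `binary_timer_leds` would turn into a digital write to one pin, and `cylon_position` would pick the single pin to drive high on each step.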
I bought a couple of RGB LEDs, which I already posted about. After a great deal of experimenting, I found a mix of resistors that got the colors balanced and workable. I've got one of them running a constant color changer, slowly sweeping each of the colors through a 100-step sine wave, with three different frequencies (all relatively prime) so that the three channels take a long time to line up again and the LED eventually wanders all over the 24-bit color space (about 16.7 million combinations). Some of the stuff it does is really pretty.
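Here is a rough Python model of that color changer, showing why relatively prime rates matter. The specific periods (7, 11 and 13 ticks per step) are hypothetical stand-ins, since the post doesn't give the real ones, and the names are invented for the sketch.

```python
import math

STEPS = 100  # steps in one sine cycle, as in the post

# One full sine cycle scaled to 0..255 PWM levels.
SINE_TABLE = [
    round(127.5 + 127.5 * math.sin(2 * math.pi * i / STEPS))
    for i in range(STEPS)
]

# Hypothetical ticks per table step for red, green, blue (pairwise coprime).
PERIODS = (7, 11, 13)


def color_at(tick):
    """(r, g, b) PWM levels at a given tick of the main loop."""
    return tuple(SINE_TABLE[(tick // p) % STEPS] for p in PERIODS)


def repeat_length():
    """Ticks until all three channels are back in step with each other."""
    length = 1
    for p in PERIODS:
        cycle = p * STEPS  # ticks for one full sine cycle on this channel
        length = length * cycle // math.gcd(length, cycle)  # running LCM
    return length
```

With these periods the individual channels repeat every 700, 1100 and 1300 ticks, but the combined pattern only repeats every 100,100 ticks. If the periods shared a common factor, the pattern would start repeating sooner.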
I'm going to hook up the second LED, and have two! I also ordered some tiny LED modules that have on-board power control and their own tiny processor to run scripted light shows. They're controlled by the Arduino via I2C, which the Arduino environment supports natively.
I also ordered some real common-cathode RGB LEDs to play with. Much easier to deal with than these cheap-ass Radio Shack common-anode LEDs.
I've also found a 4-channel I2C controlled DAC with current controlled outputs, etc. It'd be great for an even cheaper Arduino-controlled LED light show sort of thing. I need to get some samples and see how cheap and small I can build such a thing.
Monday, March 23, 2009
ZFS second look
I thought I'd see if I could improve performance by compiling my own ZFS. I pulled the source from the ZFS-Fuse site and edited the make files for -O3 instead of -O2. Byte rates jumped to 90MB/sec at only 45% CPU. I don't know what options the original APT package was compiled with, but the binaries I built are four times the size of the originals and significantly faster.
Sunday, March 22, 2009
ZFS on Linux
ZFS on Linux. I love ZFS, I do. There's no argument there.
But currently it runs in userspace on Linux, and performance is terrible. Copying from one non-ZFS disk to a new single-disk ZFS volume hit 50MB/sec, which isn't bad, but which soaked up 60% CPU on a 2.0GHz dual-core Core 2 Celeron.
Unfortunately LVM and MD don't do what I want, so I think ZFS is still the direction I'm going to go. Hopefully performance will improve in future releases. Or they'll find a way to get ZFS in the kernel.
I could install OpenSolaris, but my Solaris admin skills are beyond rusty. Between that and shitty driver support, OpenSolaris is pretty much a no-go.
Friday, January 2, 2009
A Recent History of High Performance Computing
"High Performance Computing" is traditionally the purview of universities, governments and Big Science. The difference between what most people have at home and high-performance computing (HPC) is the difference between bicycles and the space race.
But the scene has been changing for quite some time. Clusters of cheap desktops running Linux and one or another parallel-computing software interface were the first to come. A showcase example was Oak Ridge National Laboratory's "Stone SouperComputer", put together from surplus PCs and components that would otherwise have been thrown away. It was used for real work, doing the computational heavy lifting for several ecological modeling projects. Over time clustering became the primary means of achieving high performance. A quick look at the TOP500 list shows just how prevalent clustering is.
Over the last few years, multicore CPUs have expanded this in a somewhat new direction. First, by doubling or quadrupling the number of CPU cores in a computing node, and later by adding highly-parallel processors to the clusters. By highly-parallel, I refer to the TOP500's current #1 supercomputer, the Roadrunner, which combines standard multicore AMD Opteron processors (sibling to the Athlon 64 processors common in desktop computers) with IBM PowerXCell 8i processors, which are a slightly modified version of the Cell processor that powers every Sony PlayStation 3. The PowerXCell features one normal PowerPC processor, and 8 "Synergistic Processing Elements," essentially smaller sub-processors which focus on simple parallelizable tasks.
With the HPC market recognizing the utility of many small, simple processing cores, many turned their attention to modern graphics processing units (GPUs). Modern GPUs contain dozens or even hundreds of small processing pipelines, optimized for the types of math and operations that 3D graphics require. As time progressed, these 3D graphics, especially games, required more complex computation, to such a degree that each of the pipelines started to resemble a general purpose, if graphics-optimized, CPU.
Soon, hobbyists and programmers started taking advantage of this, programming their GPUs to perform computations that would normally run on a CPU. By moving the highly-parallel parts of their computation to the GPU, they could perform dozens of operations in the time that a single CPU could do one.
ATI (now AMD) and nVidia smelled money. ATI released some of their low-level programming interfaces, and a set of software extensions that allowed programmers easier access to the GPU. nVidia developed a whole programming language called CUDA, that allows developers to write C code directly for their GPUs. Now nVidia even has a version of their most powerful graphics card, with all the graphics hardware removed. Called "Tesla", it's a pure computation engine. You can fill your PC with as many Tesla cards as you have slots to stick 'em in. They have dedicated chassis full of Tesla cards with dedicated high-performance connections to PCs. Tokyo Tech University has begun adding Tesla units to their Tsubame supercomputer. The US National Center for Atmospheric Research has begun testing Tesla for accelerating particularly obnoxious computation, and found significant gains.
And really, this is sort of taking us full circle. Some of the dead technological offshoots in computing's past looked in this direction. Many hundreds, even thousands of small processors were used in Thinking Machines' Connection Machine computers, each processor being only one bit wide. INMOS developed a radically different type of computing with their transputer. Each chip was a small microprocessor with a small amount of RAM and several inter-transputer network links, and they were designed from the ground up as massively parallel computing engines.
The Connection Machine started with the ideas introduced in the variable-width "bit-slice" processors of its predecessors, and took them to their logical extreme: a machine an arbitrary number of bits wide, and code-reconfigurable. It proved to be unsuccessful in implementation, though. With the later CM-5, Thinking Machines designed a large parallel machine powered by hundreds of SPARC processors.
The INMOS Transputer is the evolutionary forebear to today's massively parallel supercomputers, though. Each transputer was a small processor, memory and enough glue logic to allow the device to stand mostly on its own. With network links to multiple other transputers on the same board, in the same case, or even spread throughout multiple cases, a cluster of transputers could appear to be a single virtual system, with individual threads running on each transputer. It was only ahead of its time in so far as there was a great deal of headroom left in getting higher performance out of traditional computing, and programming for parallel computing is hard.
Today, the modern single-threaded microprocessor has hit a performance barrier. Power requirements and the law of diminishing returns have made it difficult to wring higher performance out of more transistors and higher clock speeds. To improve returns on Moore's Law, today's CPU makers divided their transistor budget across two or more fully functional CPU cores on a single die. Dual-core CPUs are commonplace, and quad-core CPUs are about to become so, as well.
With the ubiquity of multicore computing and the aforementioned implicitly parallel nature of graphics processing, parallel programming has been thrust into the forefront of development. In order to achieve adequate performance for any demanding task, developers now must parallelize their computing tasks. What was a good idea in hardware, and optional in software before, has become ubiquitous in hardware, and required in software now.