I will point out that tests meant to produce general-to-specific characterizations of large, complex systems don't lend themselves well to double-blind testing concepts. I mean, would you have the MERs bring along a suite of various rock types with them to Mars, selected by a group of people who have no communication with the PIs, and have every measurement taken on Mars include this test suite, with the PIs not being informed of which set of results belonged to native Martian rock and which to the terrestrial samples?
As you see, the specific double-blind process doesn't lend itself to the work at hand. Not that I don't see a need for some way to try to reduce the Rosenthal effect.
That said, what I note strongly in the process of designing science payloads for planetary probes is that it seems to reward those who have developed very detailed models of their expected findings, and have thence designed their instruments to most effectively collect the expected data.
It seems as if any experiment proposal that includes the phrase, in any form, "We don't know what we'll find" is automatically rejected because of the possibility that, by not meeting some preselected expectation, the experiment runs a high risk of being viewed as a "failure."
That's a process that not only allows a fair amount of the Rosenthal effect, it fairly demands it. When you design your instruments to show you only what you expect to see, it's awfully hard to see those things that *are* there that you never expected.
One of the worst examples of this effect, I think, was the life detection suite aboard the Viking landers. They were designed to say Yes or No to a very specific (and very terrestrial) set of life-bounded conditions, so the PIs didn't look closely enough at what Maybe results might mean, or how they might be interpreted.
I think the worst unflown example of this effect would have to be a proposal for the 2001 lander program whose team, if I'm remembering the details from Squyres' "Roving Mars" correctly, wanted to devote an entire science payload to positively identifying amino acids within the Martian regolith. That would have been a good portion of a billion dollars to answer what is probably not *nearly* the most useful question to be asking.
The spacecraft that suffered the least from this effect? IMHO, at least among fairly recent probes, I would say Stardust. Yes, the designers of the Stardust collectors had to make some assumptions about the size of the particles they were going for, and the density of particles in their collection location. But the whole point of Stardust was "Let's go grab some comet dust, bring it back, and then see whatever we see when we get our hands on it." That mission design, since it brought samples back to where a great multitude of tests could be run on them as appropriate, was able to follow a simpler paradigm of "grab what you can and then see what you've got."
It seems to me, though, that until we can bring samples back and have the luxury of running whatever tests on them seem appropriate (to answer all the new questions that the initial test results pose), you have to narrow your data collection based on some form of triage theory. You can't fly all of the tools that would truly enable you to just follow up on what you find rather than looking for what you expect. That's a given, considering mass budgets and funding budgets.
So, you *have* to narrow the focus to what you can afford to place in situ. Granted, the current process encourages that narrowing more than it should... but I'm not coming up with any good ideas on how to change the process to reduce the Rosenthal effect.
-the other Doug