LITS Puzzle #1590 by mathgrant
Forum Index -> Puzzle Feedback
mucha



Joined: 19/05/2009 17:40:58
Messages: 104
Offline

Number: Puzzle #1590
Genre: LITS
Author: mathgrant
Appeared at: November 16, 2009

It's a very nice puzzle, but I feel that the grading is very inconsistent. Puzzle no. 1543 is way harder than this one (in fact it's one of the hardest LITS puzzles I have ever solved) and it has the same number of stars. I know it is hard to make a good grading system for puzzles; perhaps it would be easier if the rating somehow automatically took into account the number of solvers, or the ratio of solvers to openers, or other such quantities.

Marcin
mathgrant



Joined: 19/08/2008 20:52:44
Messages: 71
Offline

While I'm not familiar with the grading system, when I view the statistics page for my own puzzle (as opposed to someone else's), it shows a "difficulty" thing at the bottom:

Difficulty proposed by the author (mathgrant): 0.5
Difficulty proposed by the judge (Bram): 0.44
Difficulty derived from user times (13 users) : 0.5589
Resulted difficulty: 0.4802

This means that the amount of time it takes for users to solve the puzzle plays a role in determining the displayed difficulty; sometimes the judge will place the proposed difficulty halfway between two star ratings, and let the user times determine the displayed difficulty.

I agree that #1543 is much more difficult than this one. Perhaps the puzzles having the same three-star difficulty is a weakness of having only five difficulties.

Grading the difficulty of a puzzle is a tough issue. :/

A Cleverly-Titled Logic Puzzle Blog
Johan



Joined: 22/12/2006 20:08:51
Messages: 1046
Offline

The current method takes into account the impression of the puzzle maker, the judge and the solving time for registered members who solved it without printing or revealing. The author and the judge each count as 50 solvers to avoid volatility shortly after release. For difficult puzzles that fewer people successfully solve the first time, this indeed doesn't always work so well.
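
As a rough illustration of the weighting described here, the sketch below assumes the combined value is a plain weighted mean in which the author and the judge each count as 50 solvers. The function and parameter names are hypothetical, and the site's actual formula is not shown in this thread. With the figures mathgrant quoted for #1590 it gives about 0.4802, which agrees with the "Resulted difficulty" line.

```python
def combined_difficulty(author, judge, user_mean, n_users, expert_weight=50):
    """Weighted mean of three difficulty estimates (assumed formula, not confirmed)."""
    # Author and judge each count as `expert_weight` solvers.
    total_weight = 2 * expert_weight + n_users
    weighted_sum = expert_weight * (author + judge) + n_users * user_mean
    return weighted_sum / total_weight

# Figures quoted earlier in the thread for puzzle #1590.
result = combined_difficulty(author=0.5, judge=0.44, user_mean=0.5589, n_users=13)
print(round(result, 4))  # 0.4802

# A smaller expert weight lets the solver times count for more.
lighter = combined_difficulty(0.5, 0.44, 0.5589, 13, expert_weight=10)
print(round(lighter, 4))
```

With 13 solvers against 100 "virtual" ones, the user times move the result only slightly, which is the effect a lower factor would change.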

I can think of several alternatives and adjustments, but they might also have drawbacks. As of now, the factor 50 might just be unreasonably large and this could be changed. Also we could take into account the ratio of people who solved it with and without revealing perhaps.

Puzzle 1543 is in fact very close to being rated 4 stars. A weighting factor lower than 50 would solve this; one might also benefit from a real rather than an integer number of displayed stars, although in a way this is quite ugly.

Having two judges check a puzzle before its release could in principle be an option. This would help to avoid the enormous volatility that we can have now if the judge and the author disagree by a star (we introduced real-number ratings for judges precisely to patch this), and it would also move us towards an expert majority vote. At this moment, however, the judge team cannot yet handle double the verification work.

Another concern of mine is that our stars might have started to reflect the opinion of expert puzzlers, without taking much account of what a beginner or a less skilled puzzler would like to see.

At the very beginning we considered merely asking every solver to grade the puzzle difficulty and its fun factor, but this might be annoying, so if things can be automated, this is preferred.

Concluding, I think some datamining and user input could be very helpful in improving the grading system. All your thoughts are welcome.

achan1058


Joined: 19/04/2008 05:22:28
Messages: 17
Offline

Among some of the things I noticed (for puzzles on this site in general):
mathgrant's puzzle style is relatively close to Nikoli's, while I can't figure out Bram's/Johan's styles at all, and their puzzles often end in branch and bound. As Nikoli's methods are usually based on certain rules you can figure out, they tend to be a bit easier. In fact, I tend to mentally decrease the number of stars by 1 if the puzzle is by mathgrant. (Not to say they are not enjoyable, but the rules used tend not to be as messy, and besides I know Nikoli's style relatively well, except for their nastiest puzzles.)

If anything, I say a ranking system by user would be useful.
Cyclone



Joined: 12/10/2009 10:28:36
Messages: 27
Offline

Yes, maybe an (additional) option to vote for a difficulty after solving the puzzle would be more accurate than time taken. I for instance don't push myself to solve for time, I sometimes do other things simultaneously, so for me personally, the time spent would not match the difficulty I had with the puzzle.
I see the rating system like the one on youtube, in contrast to the one on puzzlemix (if you know the site) for example where you are almost compelled to rate the puzzle you solved, which is unnecessary pressure.
Of course a manual rating system can function alongside the automatic one. If the math is done right it can only get more accurate.
mathgrant



Joined: 19/08/2008 20:52:44
Messages: 71
Offline

Here's another LITS puzzle that I'd like to bring up:

http://www.puzzlepicnic.com/puzzle?1289

It's rated one-star, but I don't think it's that easy at all. I had to use some semi-advanced techniques and use trial and error twice. This is far from easy, for an LITS puzzle. Now look at this puzzle:

http://www.puzzlepicnic.com/puzzle?1515

I don't think this is a one-star puzzle, either, but it's more straightforward, and I think a beginner would have an easier time starting this one than the former.

(I just realized I haven't composed any 1-star LITS puzzles. . .!)

A Cleverly-Titled Logic Puzzle Blog
achan1058


Joined: 19/04/2008 05:22:28
Messages: 17
Offline

Indeed, I still do not see how to logically reason it out, without any trial and error.
mathgrant



Joined: 19/08/2008 20:52:44
Messages: 71
Offline

Okay, here's a 10x10 LITS that I believe is definitely one-star material: http://mathgrant.blogspot.com/2009/11/puzzle-327-tetra-firma-22.html [Edit Dec 6, 2009: This puzzle is now on PuzzlePicnic at http://www.puzzlepicnic.com/puzzle?1603 .]

The solver never needs to use the rule that all the black cells are connected, and the rule that no two congruent tetrominoes may share an edge is only used once. This is the type of LITS puzzle I would recommend for a total newbie to solve, not the one-star puzzle I mentioned above.

(I am submitting this puzzle to my studio now.)

A Cleverly-Titled Logic Puzzle Blog
mucha



Joined: 19/05/2009 17:40:58
Messages: 104
Offline

Johan wrote:
The current method takes into account the impression of the puzzle maker, the judge and the solving time for registered members who solved it without printing or revealing.
 

Wow, I didn't know that you take all that into account! But the problem here is that I rarely try to solve the puzzlepicnic puzzles fast. I do that at nikoli, because they have time measurements in the user interface and actually store the time for you so you can compare it to your previous run or to others. At puzzlepicnic I usually take my time, often doing some other things at the same time (as the name "picnic" suggests), so while my timings are in general correlated with the difficulty of the puzzle, there's a lot of noise. I would guess it might be similar with other people too. That's why I suggested the ratio of solvers/openers, but I guess that only works for really hard puzzles...

I like the idea of voting. The reliability of this idea could probably be increased by weighting the votes. Someone who solved plenty of puzzles and gave a lot of "reasonable" votes could be weighted higher thus improving the convergence rate.
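
To make the weighted-vote idea concrete, here is a minimal sketch in which each vote carries a reliability weight and the puzzle's rating is the weighted mean. The function name, the sample votes and the weights are all hypothetical; how a solver's weight would be earned (number of puzzles solved, history of "reasonable" votes) is left open, as it is in the post.

```python
def weighted_vote(votes):
    """votes: list of (stars, weight) pairs; returns the weighted mean star rating."""
    total_weight = sum(weight for _, weight in votes)
    if total_weight <= 0:
        raise ValueError("need at least one vote with positive weight")
    return sum(stars * weight for stars, weight in votes) / total_weight

# Made-up example: an experienced solver (weight 3.0) and two newer voters.
votes = [(4, 3.0), (3, 1.0), (5, 0.5)]
print(weighted_vote(votes))               # pulled towards the heavily weighted vote
print(sum(s for s, _ in votes) / len(votes))  # unweighted mean, for comparison
```

With equal weights the function reduces to the ordinary average, so the weighting only matters once solvers have built up different track records.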

Marcin
Maarten


Joined: 22/12/2006 20:10:10
Messages: 619
Offline

There is indeed a lot of noise in the measured user-times, because people don't puzzle for time. We also don't want them to: we want this to be a place where you can solve a puzzle "at your leisure". This is why we don't say we time you, and why the times are not visible to anyone except the admins, and even for us only anonymised. So please don't start feeling any pressure after reading this thread.

That being said, there is some information to be extracted from the time data, which is why we do record them. In principle we try to judge the difficulty ourselves, but if the measured times consistently are much shorter or longer than our estimate, this may indicate something. Estimating difficulty is one of the hardest parts of puzzle judging...

Still, the current "weighting" solution is probably not optimal. Asking users to rate the difficulty may be an option to get more data. But there is more to the problem than just getting enough data; different users experience different puzzles / styles / genres differently, so getting the "right" number of stars is probably not always possible...

It's good to see some feedback on the system though.
achan1058


Joined: 19/04/2008 05:22:28
Messages: 17
Offline

Even if there is bias with user feedback, the puzzle difficulty is at least positively correlated with the rating, for a given user, so we don't need a huge data set to make it work.
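
One way to use that per-user correlation, sketched here under assumptions, is to subtract each user's own average rating before averaging per puzzle, so that a habitually harsh or generous voter no longer shifts the result. The function name and the sample ratings are made up for illustration; nothing in this thread says the site does this.

```python
from collections import defaultdict
from statistics import mean

def debiased_scores(ratings):
    """ratings: list of (user, puzzle, stars). Returns puzzle -> mean user-centred rating."""
    # Each user's personal average captures how harsh or generous they are.
    by_user = defaultdict(list)
    for user, _, stars in ratings:
        by_user[user].append(stars)
    user_mean = {user: mean(values) for user, values in by_user.items()}

    # Average the deviations from each user's own mean, per puzzle.
    by_puzzle = defaultdict(list)
    for user, puzzle, stars in ratings:
        by_puzzle[puzzle].append(stars - user_mean[user])
    return {puzzle: mean(values) for puzzle, values in by_puzzle.items()}

# Made-up data: two users who disagree on the scale but agree on the ordering.
ratings = [
    ("harsh", "A", 1), ("harsh", "B", 3),
    ("generous", "A", 3), ("generous", "B", 5),
]
scores = debiased_scores(ratings)
print(scores)  # puzzle B scores higher than puzzle A for both users
```

The centred scores only give a relative ordering; mapping them back to a star scale would need a further step.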