Compile Directions

Subforum for discussion and help with ScummVM's Nintendo DS port

Moderator: ScummVM Team

krunkster
Posts: 2
Joined: Mon Feb 19, 2007 11:03 pm

Post by krunkster »

Hello,
I looked in the wiki for compile directions for ScummVM DS and there were none, so I tried it myself. I have devKitPro installed and have successfully built many other NDS projects... unfortunately not this one.

I have the code checked out from SVN and I'm attempting to compile from the backends/platform/ds directory.

This starts out okay and builds the arm7 directory fine, but then I end up with compile errors in the arm9 code for dsmain.cpp.

I tried using the 0.9.1 Tag instead of Trunk but I get the same types of errors:
dsmain.cpp: At global scope:
dsmain.cpp:216: error: 'f32' does not name a type
dsmain.cpp: In function 'void DS::displayMode8Bit()':
dsmain.cpp:439: error: 'VRAM_A_MAIN_BG_0x6000000' was not declared in this scope
dsmain.cpp:440: error: 'VRAM_B_MAIN_BG_0x6020000' was not declared in this scope
dsmain.cpp:442: error: 'VRAM_C_SUB_BG_0x6200000' was not declared in this scope
dsmain.cpp:443: error: 'VRAM_D_MAIN_BG_0x6040000' was not declared in this scope
dsmain.cpp: In function 'void DS::addEventsToQueue()':
dsmain.cpp:1050: error: 'penY' was not declared in this scope
dsmain.cpp:1051: error: 'penX' was not declared in this scope

This makes me think I'm missing some prerequisites needed to compile the NDS binaries. If anyone has done this successfully and can let me know what prerequisites are needed besides devKitPro (which I've updated already), I would appreciate the help.

Thanks.

agentq
ScummVM Porter
Posts: 806
Joined: Wed Dec 21, 2005 4:29 pm
Location: London, UK

Post by agentq »

Your problem is that you're compiling with DevkitPro r20. This was only released a few weeks ago and the current SVN of ScummVM DS only supports r19b.

I have a version which supports r20, but it isn't quite ready to check in yet. So for now, download r19b and compile with that, and you should be fine.

krunkster
Posts: 2
Joined: Mon Feb 19, 2007 11:03 pm

Post by krunkster »

Thanks for the info.

I was able to build 0.9.1 from
https://scummvm.svn.sourceforge.net/svn ... ease-0-9-1

Using devKitPro r19b
http://sourceforge.net/project/showfile ... _id=434282

And libnds-20060718
http://sourceforge.net/project/showfile ... _id=432975

The Trunk version requires libfat-20060813 as well:
http://sourceforge.net/project/showfile ... _id=439134

BTW I've asked Ender for a wiki account and I plan to document how to compile for Nintendo DS once I get the account.

Thanks again for all your work.

Edited: Figured out my errors

P.S. My reason for compiling on my own was to change the 200% zoom to 100% zoom, a very quick hack since I was playing MI and wanted that feature... but now with the new CPU scaler I don't even need that option... very nice work.
It definitely kills the FPS, but the text is much more readable... great work.

agentq
ScummVM Porter
Posts: 806
Joined: Wed Dec 21, 2005 4:29 pm
Location: London, UK

Post by agentq »

The trunk version doesn't actually require libfat. There was a header in libcartreset which mistakenly included it. I've got rid of that dependency now, so if you update from SVN you will no longer need libfat.

Hopefully the software scaler should get a lot faster before release, although it's still impressively fast now considering the amount of work it has to do.

Tramboi
Posts: 42
Joined: Fri Sep 22, 2006 7:20 pm

Post by Tramboi »

Hi guys,
and welcome aboard krunkster!

I put a bit of energy into optimizing the scaler in C++ (no ARM assembly yet) with various techniques, and for now I can't get better than the version on the trunk (33 ms).
I'm open to any new approach (not asm yet, just higher-level optimizations), so come on guys, flood me with ideas!
I'm not experienced with this platform.

The current technique relies on this:

* Unpack the palette from 1555 to 8888
* For each line
  * For each block of 5 pixels
    * Do the 5 lookups in the unpacked palette
    * Compute the 4 filtered 8888 pixels, except for the division by 5
    * Isolate each component and do the division by 5 with a table lookup
    * Repack the resulting pixels
    * Store

The benefit of unpacking the palette is that it allows the filtering (4x+y, 3x+2y, 2x+3y, and x+4y) to be done SIMD-like on all components at once.
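
For readers following along, the steps above can be sketched roughly as below. This is only an illustration and not the code in the trunk: the names `unpack1555`, `Div5Table`, `divideAndRepack` and `scaleLine5to4` are made up here, the 555 channel order (R in the low bits) is an assumption, the rounded division is assumed to be `(v + 2) / 5`, and the alpha bit is simply left clear.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch, not ScummVM DS code.
// Spread a 1555 colour so each 5-bit channel sits in its own byte
// (0x00BBGGRR). Assumes R in bits 0-4, G in 5-9, B in 10-14.
static inline uint32_t unpack1555(uint16_t c) {
    return (c & 0x001Fu) | ((c & 0x03E0u) << 3) | ((c & 0x7C00u) << 6);
}

// Lookup table for a rounded division by 5. The largest weighted sum
// per channel is 5 * 31 = 155, so 156 entries are enough.
struct Div5Table {
    uint8_t v[156];
    Div5Table() {
        for (int i = 0; i < 156; ++i)
            v[i] = static_cast<uint8_t>((i + 2) / 5); // assumed rounding
    }
};
static const Div5Table kDiv5;

// Isolate each channel of a weighted sum, divide it by 5 through the
// table, and repack to 555 (alpha bit left clear in this sketch).
static inline uint16_t divideAndRepack(uint32_t sum) {
    uint32_t r = kDiv5.v[sum & 0xFF];
    uint32_t g = kDiv5.v[(sum >> 8) & 0xFF];
    uint32_t b = kDiv5.v[(sum >> 16) & 0xFF];
    return static_cast<uint16_t>(r | (g << 5) | (b << 10));
}

// Shrink `blocks` groups of 5 paletted pixels to 4 pixels each.
// `pal` holds palette entries already passed through unpack1555().
void scaleLine5to4(const uint8_t *src, uint16_t *dst, size_t blocks,
                   const uint32_t *pal) {
    for (size_t i = 0; i < blocks; ++i, src += 5, dst += 4) {
        uint32_t x0 = pal[src[0]], x1 = pal[src[1]], x2 = pal[src[2]];
        uint32_t x3 = pal[src[3]], x4 = pal[src[4]];
        // One 32-bit multiply-add filters all three channels at once;
        // no byte can overflow because each stays <= 155.
        dst[0] = divideAndRepack(4 * x0 + x1);
        dst[1] = divideAndRepack(3 * x1 + 2 * x2);
        dst[2] = divideAndRepack(2 * x2 + 3 * x3);
        dst[3] = divideAndRepack(x3 + 4 * x4);
    }
}
```

The point of the byte-per-channel layout is that the weighted sums never carry from one channel into the next, which is what makes the "SIMD-like" arithmetic safe.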

I'm wondering if we couldn't leverage the pixel processing hardware to do some stuff for instance.
Or the ARM7.
Or another approach :)

Cheers,
Bertrand

BTW : I need a better ARM assembler than the GCC inline assembler. What do you use? gas? another one?

fingolfin
Retired
Posts: 1466
Joined: Wed Sep 21, 2005 4:12 pm

Post by fingolfin »

I am not 100% sure what code we are talking about here. But it sounds a lot like aspect ratio correction code (which stretches 4 lines to 5), resp. its "inverse" (i.e. merge 5 pixels into 4). If that is indeed what it is, here's another suggestion: The code in graphics/scaler/aspect.cpp by default does not use this exact method to scale 4 to 5 (dubbed kSlowAndPerfectAspectMode). Rather it's using the "kFastAndNiceAspectMode" setting, which is a *lot* faster, at the cost of accuracy. Basically, it uses different weights and a scale of 4 not 5, and this allows the code to be a *lot* faster.

Applied to this case, assume you want to scale x0, x1, x2, x3, x4 to produce y0, y1, y2, y3. Ordinarily, you would do:
y0 = (4*x0 + x1)/5;
y1 = (3*x1 + 2*x2)/5;
y2 = (2*x2 + 3*x3)/5;
y3 = ( x3 + 4*x4)/5;

To actually code this, though, you have to unpack, scale the R, G, B components separately, and repack. The scaling is expensive -- which is why you use a table lookup for it, I guess. (Side note: it's of course not always a good idea for generic code to use table lookups like this, unless you know definitely that the (cached) memory access will be faster than the division. Which depends a lot on the specific target CPU. Which, luckily, in your case is known in advance :-).

Back to the topic: Applying the trick of the aspect ratio code to your case would change the above transformation to the following ones:
y0 = (3*x0 + x1)/4;
y1 = (2*x1 + 2*x2)/4;
y2 = (2*x2 + 2*x3)/4;
y3 = ( x3 + 3*x4)/4;

This is of course not exact anymore -- but at least for the ASR code, it proved to be "good enough". Whether this is also the case for downscaling would need to be tested experimentally, of course.
Anyway, now one can perform various optimizations. For starters, only a simple shift is now needed instead of a division/table lookup. More optimizations might be possible: If you are in the 555->555 case, for example, you could now try to interpolate the R and B components together. This "trick" is demonstrated by the interpolate16_2 template in intern.h; this reduces the work to 2/3 of what you need when processing each component on its own. Alas, whether this is usable once again depends on the target architecture and how efficiently it can deal with immediates, resp. whether you have enough free registers to store the needed AND masks across calls to your interpolation code.
Even for the 1888->555 you might be able to apply this at least partially to save some shifts; the same caveats apply, though.
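
To make the 555->555 idea concrete, here is a rough sketch of the divide-by-4 variant with R and B interpolated together. It is written in the spirit of the trick described above and is not the actual interpolate16_2 code from intern.h; the names `blend555Quarter` and `scaleBlock5to4Approx`, the masks, and the channel order (R in bits 0-4, G in 5-9, B in 10-14) are assumptions for illustration. The result is truncated rather than rounded.

```cpp
#include <cstdint>

// Hypothetical sketch, not ScummVM code.
// Assumed 555 layout: R in bits 0-4, G in bits 5-9, B in bits 10-14.
static const uint32_t kRBMask = 0x7C1F; // R and B together
static const uint32_t kGMask  = 0x03E0; // G alone

// Computes (wa*a + wb*b) / 4 per channel, truncating. The weights must
// sum to 4 so that the division is a plain shift by 2.
template<int wa, int wb>
static inline uint16_t blend555Quarter(uint16_t a, uint16_t b) {
    static_assert(wa + wb == 4, "weights must sum to 4");
    // R and B are 5 bits apart, so their weighted sums (at most 7 bits
    // each) cannot collide and can share one multiply-add.
    uint32_t rb = ((a & kRBMask) * wa + (b & kRBMask) * wb) >> 2;
    uint32_t g  = ((a & kGMask)  * wa + (b & kGMask)  * wb) >> 2;
    // Masking drops the fraction bits that were shifted downwards.
    return static_cast<uint16_t>((rb & kRBMask) | (g & kGMask));
}

// Approximate 5 -> 4 pixel shrink using the weights 3:1, 2:2, 2:2, 1:3.
void scaleBlock5to4Approx(const uint16_t x[5], uint16_t y[4]) {
    y[0] = blend555Quarter<3, 1>(x[0], x[1]);
    y[1] = blend555Quarter<2, 2>(x[1], x[2]);
    y[2] = blend555Quarter<2, 2>(x[2], x[3]);
    y[3] = blend555Quarter<1, 3>(x[3], x[4]);
}
```

Two multiply-adds per output pixel instead of three is where the "2/3 of the work" figure comes from; whether the compiler keeps both masks in registers on the target is something only a profile can tell.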

Tramboi
Posts: 42
Joined: Fri Sep 22, 2006 7:20 pm

Post by Tramboi »

Hi Fingolfin,

you're right, avoiding the div-by-5 (which is a rounded div, so in fact it is more a x -> (x + 5) / 10 in C terminology) would lead to many nice optimizations, but I'd love to go the furthest possible with "exact" computations to preserve the original colors :)

Yet,

f0(x, y) = (4x+y) / 5
f0approx(x, y) = (3x+y) / 4

f1(x, y) = (3x+2y) / 5
f1approx(x, y) = (2x+2y) / 4

The errors for both functions are:
e0(x, y) = abs(x-y)/20
e1(x, y) = 2*e0(x,y)

for x in [0,31] and y in [0,31],
e0max = 1.55 (5%)
e1max = 3.1 (10%)

The errors appear only on the most contrasted edges, so it might be good enough, I will code it as soon as I can compile the trunk again!
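
A quick brute-force check of those bounds could look like the sketch below. It works on real numbers and ignores the integer rounding or truncation a real scaler would add, and the names `MaxErrors` and `maxApproxErrors` are invented for this example.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: largest difference between the exact /5 filters
// and their /4 approximations for 5-bit channel values.
struct MaxErrors {
    double e0; // |(4x+y)/5 - (3x+y)/4|
    double e1; // |(3x+2y)/5 - (2x+2y)/4|
};

MaxErrors maxApproxErrors() {
    MaxErrors m = { 0.0, 0.0 };
    for (int x = 0; x <= 31; ++x) {
        for (int y = 0; y <= 31; ++y) {
            double f0  = (4.0 * x + y) / 5.0;
            double f0a = (3.0 * x + y) / 4.0;
            double f1  = (3.0 * x + 2.0 * y) / 5.0;
            double f1a = (2.0 * x + 2.0 * y) / 4.0;
            m.e0 = std::max(m.e0, std::fabs(f0 - f0a));
            m.e1 = std::max(m.e1, std::fabs(f1 - f1a));
        }
    }
    return m;
}
```

The maxima are reached when one pixel is 0 and the other is 31, i.e. 31/20 and 31/10, which is roughly 5% and 10% of the channel range.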

Bertrand

BTW, shifts are quite inexpensive on ARM as they can be included in many opcodes for just one added cycle.

PS : Now I do Xbox360 at work, I understand why you like Altivec :)

fingolfin
Retired
Posts: 1466
Joined: Wed Sep 21, 2005 4:12 pm

Post by fingolfin »

Tramboi wrote:Hi Fingolfin,

you're right, avoiding the div-by-5 (which is a rounded div, so in fact it is more a x -> (x + 5) / 10 in C terminology) would lead to many nice optimizations, but I'd love to go the furthest possible with "exact" computations to preserve the original colors :)

Oh, I fully understand that, I'd do the same! If the exact code is fast enough, by all means use it. But in the end, if one has to choose between exact code that's too slow to run smoothly and slightly not exact code that runs fast enough, well the choice is clear.

Of course the choice gets more complicated when the exact code is fast enough in some cases but too slow in others. For the aspect ratio code, for example, the exact code always was fast enough on my system, but on lower end machines it just was too slow (using -O2 made a *huge* difference, BTW).

Well I guess at least that variable (machine speed) can be eliminated on the DS :-). But I would tend to assume that the load still differs depending on which game is being run, with what audio code etc... But certainly you know about it more than I, given that I don't even have a DS.

One way to deal with this choice is to offer a config option to the user. Alas, I tend to believe that one shouldn't make something configurable just because one is afraid to make a decision. My personal gut feeling is: If the non-exact code looks good in all cases (i.e. also in hard-contrast example) and is a lot faster, use it, always (and keep around the exact code, #ifdef'ed out, like we do with the aspect ratio code). (And just to make it crystal clear: this is just my gut feeling, not at all an "order", please do as you feel is appropriate :-).
Tramboi wrote: The errors appear only on the most contrasted edges, so it might be good enough, I will code it as soon as I can compile the trunk again!

Good luck, I am looking forward to seeing it (even though I have no DS). If the code is in pure C/C++, maybe we could even move it to graphics/scalers, so that other platforms can benefit from it, too...

Tramboi wrote:PS : Now I do Xbox360 at work, I understand why you like Altivec :)
:lol:

agentq
ScummVM Porter
Posts: 806
Joined: Wed Dec 21, 2005 4:29 pm
Location: London, UK

Post by agentq »

Tramboi, are you sure that a table lookup is faster than doing the arithmetic? As far as I know, shifts that are inline with another instruction are free on the ARM9. Can't get much faster than that!

Good work on the scaler so far!

Tramboi
Posts: 42
Joined: Fri Sep 22, 2006 7:20 pm

Post by Tramboi »

agentq wrote:Tramboi, are you sure that a table lookup is faster than doing the arithmetic? As far as I know, shifts that are inline with another instruction are free on the ARM9. Can't get much faster than that!

Good work on the scaler so far!

Yep, if we ditch accuracy for performance and use Max's approximation, we can leverage this to get a lot faster with the div-by-4! :)
Hope I'll be satisfied with the error :oops:

blackcoat
Posts: 1
Joined: Wed May 09, 2007 12:44 am

Post by blackcoat »

Back to Krunkster's original question for a moment...

After upgrading to devKitARM r20, I ran into the exact same problem. Then, I noticed this line in nds/arm9/video.h:
Revision 1.32 2007/01/14 11:41:51 wntrmute
add leading zero to VRAM macros with addresses
So, VRAM_A_MAIN_BG_0x6000000 was renamed to VRAM_A_MAIN_BG_0x06000000, and so on. Just change the names in your source to get it working with r20.

agentq
ScummVM Porter
Posts: 806
Joined: Wed Dec 21, 2005 4:29 pm
Location: London, UK

Post by agentq »

If you're using the source downloaded from the downloads page, this is not the development version and you will have problems getting it to work with Devkitpro r20.

In order to build the code, always check out the latest source from Subversion.

See here: http://sourceforge.net/svn/?group_id=37116

bjorkmann
Posts: 3
Joined: Mon Mar 31, 2008 3:18 pm
Location: London

Post by bjorkmann »

I know this topic is old, but I thought I'd resurrect it, rather than start a new one. I was looking for krunkster's wiki update on how to build, but not much joy there.

I'm trying to build scummvm ds 0.11.1, but I always get the following error. It's driving me crazy! For some reason make thinks scummvm.nds is a make target rather than output?

I managed to get 0.9.1 going once I commented out the user_debugger.h (which I see has the right ifdef around it in the later version) but the 0.11.1 and TRUNK elude me. Both report the same issue. Also, you've checked in your libnds path change in the arm7, arm9 Makefiles for TRUNK - /home/neil/devkitpror21/libnds :)

I've tried versions of devKitARM from 19b-21 to no avail. Any assistance would be greatly appreciated. I'm sure I'm doing something dumb, but what?

built ... arm7.bin
makefile:494: warning: overriding commands for target `.cpp.o'
/Users/andy/Development/scummvm/r0-11-1/backends/platform/ds/../../../Makefile.common:94: warning: ignoring old commands for target `.cpp.o'
make -C ./ -f ../makefile scummvm.nds
make[2]: *** No rule to make target `scummvm.nds'.  Stop.
make[1]: *** [ndsall] Error 2
make: *** [all] Error 2

bjorkmann
Posts: 3
Joined: Mon Mar 31, 2008 3:18 pm
Location: London

Post by bjorkmann »

Haha!! Ignore me because I am lame! (just a little).

It's in arm9/makefile

# Select the build you want by uncommenting one of the following lines:

Never mind me...

agentq
ScummVM Porter
Posts: 806
Joined: Wed Dec 21, 2005 4:29 pm
Location: London, UK

Post by agentq »

Glad you worked it out! You may also have to comment out the USE_MAD = 1 line, unless you have compiled libmad for ARM and placed it in the libs folder.
