Kandu.dk - memset in a loop ..or not?



memset in a loop ..or not?
From: Jake

Date: 15-09-10 15:08

I have a workbuffer with values that need to be rearranged,
so initially I did it like this:

for (i = (N - 1); i >= 0; i--)
{
    workbuffer[Q * i] = workbuffer[i];
    memset(&workbuffer[(Q * i) + 1], 0, (Q-1) * sizeof(int16));
}

But I was told not to use memset. I don't know exactly why I am not allowed
to use memset in a loop. I guess it's not efficient enough?
So I changed the code to this:

for (i = (N - 1); i >= 0; i--)
{
    workbuffer[Q * i] = workbuffer[i];
    for (j = 0; j < (Q - 1); j++)
    {
        workbuffer[(Q * i) + 1 + j] = 0;
    }
}

Let's say we have a 16-cell workbuffer B.

Four values have been stored in the first 4 cells of the workbuffer: B[0],
B[1], B[2] and B[3]; the remaining cells B[k] for k = 4 to 15 are undefined. The
code must rearrange the 4 values so the workbuffer looks like this:

B[0],0,0,0,B[1],0,0,0,B[2],0,0,0,B[3],0,0,0

The code is for an interpolator. In the above example, N is the number of
samples in the workbuffer before rearrangement, so N would be 4, and Q is
the interpolation factor, which is also 4.

Any suggestions for improvement?

Comments about not using memset in a loop are also welcome.

Thank you.

Bertel Brander (15-09-2010)

Comment
From: Bertel Brander

Date: 15-09-10 19:00

On 15-09-2010 16:07, Jake wrote:
> I have a workbuffer with values that needs to be re-arranged,
> so...initially...I did it like this:
>
> for (i = (N - 1); i >= 0; i--)
> {
> workbuffer[Q * i] = workbuffer[i];
> memset(&workbuffer[(Q * i) + 1], 0, (Q-1) * sizeof(int16));
> }
>
> but I was told not to use memset. I don't know exactly why I am not allowed
> to use memset in a loop.

The compiler has every chance to make memset at least as efficient
as anything else you can do, so if you need to set some memory
to something, go ahead and use memset.
It is in general not a good idea to loop backwards from N to 0.

> I guess it's not efficient enough? So I changed the code to this:
>
> for (i = (N - 1); i >= 0; i--)
> {
> workbuffer[Q * i] = workbuffer[i];
> for (j = 0; j < (Q - 1); j++)
> {
> workbuffer[(Q * i) + 1 + j] = 0;
> }
> }
>
> Let's say we have a 16 cell workbuffer B.
>
> Four values have been stored in the first 4 cells in the workbuffer: B[0],
> B[1], B[2] and B[3] the remaining B[k] for k=4 to k=15 are undefined. The
> code must re-arrange the 4 values so the workbuffer looks like this:
>
> B[0],0,0,0,B[1],0,0,0,B[2],0,0,0,B[3],0,0,0
>
> The code is for an interpolator and in the above example N is the number of
> samples in the workbuffer before re-arrangement. So N would be 4! And Q
> would be an interpolation factor equal to 4.
>
> Any suggestions for improvement?

For small blocks of memory, it can be a good idea to
"unroll" loops, so if Q in your case is small, it might
be better to:

for(i = 0; i < N; ++i)
{
    workbuffer[Q * i] = workbuffer[i];
    workbuffer[(Q * i) + 1 + 0] = 0;
    workbuffer[(Q * i) + 1 + 1] = 0;
    workbuffer[(Q * i) + 1 + 2] = 0;
    workbuffer[(Q * i) + 1 + 3] = 0;
}

But as with any optimization, first check whether you need
to do it at all, and then measure which solution is the
most efficient.

Arne Vajhøj (16-09-2010)

Comment
From: Arne Vajhøj

Date: 16-09-10 02:31

On 15-09-2010 13:59, Bertel Brander wrote:
> On 15-09-2010 16:07, Jake wrote:
>> I guess it's not efficient enough? So I changed the code to this:
>>
>> for (i = (N - 1); i >= 0; i--)
>> {
>> workbuffer[Q * i] = workbuffer[i];
>> for (j = 0; j < (Q - 1); j++)
>> {
>> workbuffer[(Q * i) + 1 + j] = 0;
>> }
>> }
>>
>> Let's say we have a 16 cell workbuffer B.
>>
>> Four values have been stored in the first 4 cells in the workbuffer:
>> B[0],
>> B[1], B[2] and B[3] the remaining B[k] for k=4 to k=15 are undefined. The
>> code must re-arrange the 4 values so the workbuffer looks like this:
>>
>> B[0],0,0,0,B[1],0,0,0,B[2],0,0,0,B[3],0,0,0
>>
>> The code is for an interpolator and in the above example N is the
>> number of
>> samples in the workbuffer before re-arrangement. So N would be 4! And Q
>> would be an interpolation factor equal to 4.
>>
>> Any suggestions for improvement?
>
> For small blocks of memory, it can be a good idea to
> "unroll" loops, so if Q in your case is small, it might
> be better to:
>
> for(i = 0; i < N; ++i)
> {
> workbuffer[Q * i] = workbuffer[i];
> workbuffer[(Q * i) + 1 + 0] = 0;
> workbuffer[(Q * i) + 1 + 1] = 0;
> workbuffer[(Q * i) + 1 + 2] = 0;
> workbuffer[(Q * i) + 1 + 3] = 0;
> }

I consider manual loop unrolling a thing of the
past (late '80s, early '90s).

Today I would expect the compiler to do that type
of optimization (possibly controlled by a compiler
directive).

Arne

Arne Vajhøj (16-09-2010)

Comment
From: Arne Vajhøj

Date: 16-09-10 02:29

On 15-09-2010 10:07, Jake wrote:
> I have a workbuffer with values that needs to be re-arranged,
> so...initially...I did it like this:
>
> for (i = (N - 1); i >= 0; i--)
> {
> workbuffer[Q * i] = workbuffer[i];
> memset(&workbuffer[(Q * i) + 1], 0, (Q-1) * sizeof(int16));
> }
>
> but I was told not to use memset. I don't know exactly why I am not allowed
> to use memset in a loop.

I think you should ask why.

> I guess it's not efficient enough? So I changed the code to this:
>
> for (i = (N - 1); i >= 0; i--)
> {
> workbuffer[Q * i] = workbuffer[i];
> for (j = 0; j < (Q - 1); j++)
> {
> workbuffer[(Q * i) + 1 + j] = 0;
> }
> }
>
> Let's say we have a 16 cell workbuffer B.
>
> Four values have been stored in the first 4 cells in the workbuffer: B[0],
> B[1], B[2] and B[3] the remaining B[k] for k=4 to k=15 are undefined. The
> code must re-arrange the 4 values so the workbuffer looks like this:
>
> B[0],0,0,0,B[1],0,0,0,B[2],0,0,0,B[3],0,0,0
>
> The code is for an interpolator and in the above example N is the number of
> samples in the workbuffer before re-arrangement. So N would be 4! And Q
> would be an interpolation factor equal to 4.
>
> Any suggestions for improvement?

I am skeptical about this being faster than memset.

I think it is safe to assume that the memset code has been
optimized; it cannot be less optimized than your loop.

memset may use a special instruction for the specific CPU
architecture instead of a loop.

The only drawback of memset I can think of is function-call
overhead, but many compilers can inline that call.

Arne
