How to eliminate duplicate lines in a file

I have a file, file.txt, containing lines like the following:

1. 0. 3.21 
1. 1. 2.11 
1. 2. 1.554 
1. 0. 3.21 
1. 3. 1.111 
1. 2. 1.554

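As you can see, some lines are equal to each other (the first and the fourth, and the third and the sixth). I am trying to eliminate the repeated lines so as to obtain: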

1. 0. 3.21 
1. 1. 2.11 
1. 2. 1.554 
1. 3. 1.111

My attempt with a Fortran program is:

 program mean 
     implicit none 
     integer :: i,j,n,s,units 
     REAL*8,allocatable:: x(:),y(:),amp(:) 

      ! open the file I want to change 

      OPEN(UNIT=10,FILE='oldfile.dat') 
      n=0 
      DO 
       READ(10,*,END=100)   
       n=n+1 
      END DO 

    100  continue 
      rewind(10) 
     allocate(x(n),y(n),amp(n)) 
    s=0 

     ! save the numbers from the file in three different vectors 

     do s=1, n 
      read(10,*) x(s), y(s),amp(s) 
     end do 
     !---------------------! 

    ! Open the file that should contains the new data without repetition  
    units=107 
    open(unit=units,file='newfile.dat') 

    ! THIS SHOULD WRITE ONLY NOT EQUAL ELEMENTS of THE oldfile.dat: 
    ! scan the elements in the third column and write only the elements for which 
    ! the if statement is true, namely: write only the elements (x,y,amp) that have 
    ! different values in the third column. 

    do i=1,n 
     do j = i+1,n 
     if (amp(i) .ne. amp(j)) then ! 
     write(units,*),x(j),y(j),amp(j) 
     end if 
     end do 
    end do 
    end program

But the output file looks like this:

    1.000000  1.000000  2.110000  
    1.000000  2.000000  1.554000  
    1.000000  3.000000  1.111000  
    1.000000  2.000000  1.554000  
    1.000000  2.000000  1.554000  
    1.000000  0.0000000E+00 3.210000  
    1.000000  3.000000  1.111000  
    1.000000  2.000000  1.554000  
    1.000000  0.0000000E+00 3.210000  
    1.000000  3.000000  1.111000  
    1.000000  3.000000  1.111000  
    1.000000  2.000000  1.554000  
    1.000000  2.000000  1.554000

I don't understand what the problem with the if condition is. Could you please help me?

Thanks a lot!


2014-10-09 Panichi Pattumeros PapaCastoro

Much better. Now, is the file you've shown really representative of the actual input file? How many lines will a typical input file have? – 2014-10-09 13:34:15

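@HighPerformanceMark yes, it is exactly the same: a matrix with three real columns and n rows, where n = 100000 (more or less; that is the typical number of lines of the output). – 2014-10-09 14:01:03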

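Whatever the algorithm, consider doing the whole thing with string operations (assuming that "equal" lines are equal in their text representation). It will simplify the code and be faster, and your output will automatically be formatted exactly like the input. – agentp 2014-10-09 15:56:10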
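A minimal sketch of the string-based idea in Fortran, assuming every line fits in 256 characters (`LINE_LEN` is an illustrative limit, not from the post). The module `text_dedup` and its procedures `read_lines` and `keep_first_text` are hypothetical names. `read_lines` counts the records and rewinds, as the question's program does; `keep_first_text` marks the first copy of each distinct line. Because each line is stored blank-padded to a fixed length and Fortran's `==` compares blank-padded values, a trailing space does not make two lines differ, but `1.554` and `1.5540` still do, which is exactly the assumption stated in the comment.

```fortran
! Sketch: treat every line as text and keep the first copy of each
! distinct line. LINE_LEN = 256 is an illustrative limit, not from the post.
module text_dedup
  implicit none
  private
  public :: read_lines, keep_first_text
  integer, parameter, public :: LINE_LEN = 256

contains

  ! Read all lines of an already-open formatted unit into LINES.
  ! Count the records first, then rewind and read them, as the question does.
  subroutine read_lines(unit, lines)
    integer, intent(in) :: unit
    character(len=LINE_LEN), allocatable, intent(out) :: lines(:)
    character(len=LINE_LEN) :: buf
    integer :: n, i, ios

    n = 0
    do
      read(unit, '(a)', iostat=ios) buf
      if (ios /= 0) exit          ! end of file (or a read error)
      n = n + 1
    end do
    rewind(unit)
    allocate(lines(n))
    do i = 1, n
      read(unit, '(a)') lines(i)
    end do
  end subroutine read_lines

  ! keep(i) becomes .true. when lines(i) differs from every earlier line.
  ! KEEP must have at least size(lines) elements. Still O(n**2) comparisons.
  subroutine keep_first_text(lines, keep)
    character(len=*), intent(in) :: lines(:)
    logical, intent(out) :: keep(:)
    integer :: i, j

    do i = 1, size(lines)
      keep(i) = .true.
      do j = 1, i - 1
        ! blank-padded comparison: trailing spaces never make lines differ
        if (lines(i) == lines(j)) then
          keep(i) = .false.
          exit
        end if
      end do
    end do
  end subroutine keep_first_text
end module text_dedup
```

Writing `trim(lines(i))` with format `'(a)'` for every `i` where `keep(i)` is true then reproduces the surviving lines with their original text formatting.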

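I wouldn't fix your approach; I'd abandon it altogether. What you have is an O(n^2) algorithm, which is fine for a small number of lines, but on 10^5 lines you will execute the if statement 0.5 * 10^10 times. Fortran is fast, but this is unnecessary waste.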
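The 0.5 * 10^10 figure follows from counting the iterations of the question's nested loop, whose inner loop runs from `i+1` to `n`:

```latex
\sum_{i=1}^{n} (n - i) = \frac{n(n-1)}{2},
\qquad
n = 10^5 \;\Rightarrow\; \frac{10^5\,(10^5 - 1)}{2} \approx 5 \times 10^{9} = 0.5 \times 10^{10}
```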

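I would first sort the file (O(n log n)) and then scan it (O(n)), eliminating duplicates as I go. I probably wouldn't use Fortran for the sorting; I'd use one of the Linux utilities such as sort. Then I'd probably use uniq, and end up doing no Fortran programming at all.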
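The O(n) scan step, which is what `uniq` does, is short in Fortran. This sketch assumes the lines have already been sorted (for instance by `sort`) and read as text; the names `uniq_scan` and `uniq_sorted` are hypothetical. Because equal lines are adjacent after sorting, comparing each line with the last one kept is enough.

```fortran
! Sketch of the single pass that `uniq` performs on sorted input.
! Names are illustrative; the input must already be sorted.
module uniq_scan
  implicit none
  private
  public :: uniq_sorted

contains

  ! Compact LINES in place so that lines(1:nkeep) holds one copy of each
  ! distinct line. Only adjacent duplicates are removed, as with uniq.
  subroutine uniq_sorted(lines, nkeep)
    character(len=*), intent(inout) :: lines(:)
    integer, intent(out) :: nkeep
    integer :: i

    nkeep = 0
    do i = 1, size(lines)
      ! nested ifs: Fortran does not guarantee short-circuit evaluation
      if (nkeep > 0) then
        if (lines(i) == lines(nkeep)) cycle   ! same as the last kept line
      end if
      nkeep = nkeep + 1
      lines(nkeep) = lines(i)
    end do
  end subroutine uniq_sorted
end module uniq_scan
```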

If you want to write the deduplicated file in the original order, I would add a line number, then sort, uniq, and sort again by line number.
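An in-memory sketch of the same add-line-number, sort, uniq, re-sort idea, under hypothetical names (`ordered_dedup`, `keep_first_sorted`). The line numbers live in an index array that is sorted by a stable merge sort keyed on the line text, so within each run of equal lines the earliest line number comes first. A logical mask indexed by line number then stands in for the final re-sort.

```fortran
! Sketch: "add line number, sort, uniq, re-sort" without touching files.
! Names are illustrative; lines are compared as text.
module ordered_dedup
  implicit none
  private
  public :: keep_first_sorted

contains

  ! keep(i) is .true. for the first occurrence of each distinct line.
  ! O(n log n): one stable merge sort of line numbers plus one O(n) scan.
  subroutine keep_first_sorted(lines, keep)
    character(len=*), intent(in) :: lines(:)
    logical, intent(out) :: keep(:)
    integer, allocatable :: idx(:), tmp(:)
    integer :: n, i

    n = size(lines)
    idx = [(i, i = 1, n)]          ! the "added" line numbers
    allocate(tmp(n))
    call merge_sort(idx, tmp, 1, n)
    keep = .false.
    do i = 1, n
      ! stable sort: the first entry of each run has the smallest line number
      if (i == 1) then
        keep(idx(i)) = .true.
      else if (lines(idx(i)) /= lines(idx(i - 1))) then
        keep(idx(i)) = .true.
      end if
    end do

  contains

    ! Stable top-down merge sort of p(lo:hi) by lines(p(:)); w is scratch.
    recursive subroutine merge_sort(p, w, lo, hi)
      integer, intent(inout) :: p(:), w(:)
      integer, intent(in) :: lo, hi
      integer :: mid, a, b, k

      if (hi <= lo) return
      mid = (lo + hi) / 2
      call merge_sort(p, w, lo, mid)
      call merge_sort(p, w, mid + 1, hi)
      a = lo
      b = mid + 1
      do k = lo, hi
        if (b > hi) then
          w(k) = p(a); a = a + 1
        else if (a > mid) then
          w(k) = p(b); b = b + 1
        else if (lines(p(a)) <= lines(p(b))) then   ! ties go left: stable
          w(k) = p(a); a = a + 1
        else
          w(k) = p(b); b = b + 1
        end if
      end do
      p(lo:hi) = w(lo:hi)
    end subroutine merge_sort
  end subroutine keep_first_sorted
end module ordered_dedup
```

Writing each `lines(i)` with `keep(i)` true, for increasing `i`, gives the deduplicated file in its original order.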

I believe recent versions of Windows, the ones that support PowerShell, have equivalent commands.

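If I absolutely had to do all of this in Fortran, I would write a sort routine (or rather, pull one out of my bag of tricks) and carry on. I'd be inclined to read the lines as strings and sort them as text, without getting tangled up with reals and their tricky notion of equality. For 10^5 lines I'd read the whole file into an array, sort it into another array, and carry on.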
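One possible shape for that routine, as a sketch: copy the lines (already read as text) into a second array, sort the copy with a recursive merge sort, then drop adjacent duplicates. The names `text_sort_unique`, `sorted_unique` and `msort` are hypothetical, and merge sort is just one O(n log n) choice among several.

```fortran
! Sketch of the all-Fortran route: sort a copy of the text lines with a
! merge sort (O(n log n)), then drop adjacent duplicates (O(n)).
! Comparing text avoids testing reals for equality. Names are illustrative.
module text_sort_unique
  implicit none
  private
  public :: sorted_unique

contains

  ! dst(1:nkeep) receives the distinct lines of src in sorted order.
  ! src is not modified; dst needs at least size(src) elements and
  ! a length of at least len(src) to avoid truncation.
  subroutine sorted_unique(src, dst, nkeep)
    character(len=*), intent(in) :: src(:)
    character(len=*), intent(out) :: dst(:)
    integer, intent(out) :: nkeep
    character(len=len(dst)), allocatable :: work(:)
    integer :: i, n

    n = size(src)
    dst(1:n) = src                 ! sort into another array
    allocate(work(n))
    call msort(dst, work, 1, n)
    nkeep = 0
    do i = 1, n
      if (nkeep > 0) then
        if (dst(i) == dst(nkeep)) cycle   ! duplicates are now adjacent
      end if
      nkeep = nkeep + 1
      dst(nkeep) = dst(i)
    end do
  end subroutine sorted_unique

  ! Recursive top-down merge sort of a(lo:hi), using w as scratch space.
  recursive subroutine msort(a, w, lo, hi)
    character(len=*), intent(inout) :: a(:), w(:)
    integer, intent(in) :: lo, hi
    integer :: mid, i, j, k

    if (hi <= lo) return
    mid = (lo + hi) / 2
    call msort(a, w, lo, mid)
    call msort(a, w, mid + 1, hi)
    i = lo
    j = mid + 1
    do k = lo, hi
      if (j > hi) then
        w(k) = a(i); i = i + 1
      else if (i > mid) then
        w(k) = a(j); j = j + 1
      else if (a(i) <= a(j)) then
        w(k) = a(i); i = i + 1
      else
        w(k) = a(j); j = j + 1
      end if
    end do
    a(lo:hi) = w(lo:hi)
  end subroutine msort
end module text_sort_unique
```

Merge sort needs an O(n) scratch array but has a guaranteed O(n log n) worst case: for 10^5 lines that is fewer than 2 × 10^6 string comparisons, against roughly 5 × 10^9 for the nested loop.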

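Finally, I think the logic of your if statement is unsound. It decides whether to write a line to the new file based only on the equality of the third field, amp. Surely it should consider all three fields of lines i and j, something more like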

if (any([ x(i)/=x(j), y(i)/=y(j), amp(i)/=amp(j) ])) then
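A sketch of the brute-force loop built around this three-field test, keeping the first copy of each row. It is still O(n^2) and still compares reals exactly, so it only suits small files. The names `row_dedup` and `keep_first_rows` are hypothetical, and `dp` assumes the question's `REAL*8` corresponds to `kind(1.0d0)`, which holds on common compilers.

```fortran
! Sketch: row i is kept only if it differs, in at least one column, from
! every earlier row. Names are illustrative; reals are compared exactly.
module row_dedup
  implicit none
  private
  public :: keep_first_rows
  ! assumed to match REAL*8 in the question (true on common compilers)
  integer, parameter, public :: dp = kind(1.0d0)

contains

  ! keep(i) is .true. for the first occurrence of each (x, y, amp) row.
  subroutine keep_first_rows(x, y, amp, keep)
    real(dp), intent(in) :: x(:), y(:), amp(:)
    logical, intent(out) :: keep(:)
    integer :: i, j

    do i = 1, size(amp)
      keep(i) = .true.
      do j = 1, i - 1
        ! rows are duplicates only when no field differs
        if (.not. any([x(i) /= x(j), y(i) /= y(j), amp(i) /= amp(j)])) then
          keep(i) = .false.
          exit
        end if
      end do
    end do
  end subroutine keep_first_rows
end module row_dedup
```

A loop over `i` with `if (keep(i)) write(units,*) x(i), y(i), amp(i)` then writes each distinct row once, in its original order.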


2014-10-09 14:30:20

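It works! It works very well, and it's fast. I used 'sort -n -k 3 oldfile.txt >> sort.txt' to sort the file so that all lines with equal numbers in the third column end up next to each other. Then I just used 'uniq sort.txt >> newfile.txt', and that's it! Thanks a lot! – 2014-10-09 16:08:49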

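Just to fix the brute-force loop (as written, yours writes row j once for every earlier row i whose amp differs from amp(j), which is why rows appear several times in the output), it should look like this: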

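do i=1,n 
    ! advance j while the earlier row j has a different amp
    ! (.and. may not short-circuit, but j never exceeds i, so amp(j) is in bounds)
    j=1 
    do while(j.lt.i.and.amp(i) .ne. amp(j)) 
        j=j+1 
    enddo 
    ! j reached i: no earlier row shares amp(i), so this is a first occurrence
    if(j.eq.i)write(units,*)x(i),y(i),amp(i) 
end do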

or

do i=1,n 
    do j=1,i-1 
        if (amp(i) .eq. amp(j)) exit   ! an earlier row has the same amp
    enddo 
    ! after a completed DO loop j = i; after EXIT, j < i
    if(j.eq.i)write(units,*)x(i),y(i),amp(i) 
end do


2014-10-09 16:04:22 agentp
